Assessing Self-Care and Social Function Using a Computer Adaptive Testing Version of the Pediatric Evaluation of Disability Inventory
Abstract
Coster WJ, Haley SM, Ni P, Dumas HM, Fragala-Pinkham MA. Assessing self-care and social function using a computer adaptive testing version of the Pediatric Evaluation of Disability Inventory.
Objective: To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the self-care and social function scales of the Pediatric Evaluation of Disability Inventory compared with the full-length version of these scales.
Design: Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study.
Setting: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children’s homes.
Participants: Children with disabilities (n=469) and children without disabilities (n=412) (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample).
Interventions: Not applicable.
Main Outcome Measures: Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length self-care and social function scales; time (in seconds) to complete assessments and respondent ratings of burden.
Results: Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (r range, .94–.99). In computer simulations of retrospective data, the discriminant validity and sensitivity to change of the CATs closely approximated those of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study, the time to administer both CATs was 4 minutes, compared with over 16 minutes to complete the full-length scales.
Conclusions: Self-care and social function score estimates from CAT administration are highly comparable with those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time.
THE PAST DECADE HAS SEEN significant effort directed to improving the measures used to examine health and function in children with disabilities.1, 2 These efforts reflect the convergence of multiple forces, including increased appreciation that the child’s ability to perform important daily activities and to participate in important life situations is the outcome that matters most to families3 and increased emphasis by payers on documentation that services provided have resulted in progress toward these goals. The importance of sound measures of function has been further illustrated by research findings that interventions may be associated with meaningful functional improvement even in the absence of measurable changes in impairments.4 Measurement development has also been advanced by the introduction of newer methodologies, in particular those using item response theory (IRT).5 These methods have supported clearer construct and item definition and the construction of scales that are sensitive to the smaller degrees of change across time often seen in children with disabilities. Nevertheless, IRT methods alone have been insufficient to address a key challenge for functional assessment: balancing comprehensiveness of coverage against practicality. To obtain sufficient coverage of the full range of function across the continuum of development and across degrees of disability, traditional fixed-length instruments tend to be so long as to be impractical for routine use in clinical settings. 
Alternatively, shorter instruments must sacrifice coverage, either by limiting the number of items (and therefore reducing sensitivity to change) or by limiting the age span covered by the instrument (and thereby reducing the ability to track change across the full period of child development using the same instrument). Recently, computer adaptive testing (CAT) methods have been proposed as a potential solution to this measurement dilemma.6, 7, 8 Adaptive testing approaches tailor the assessment to the current level of function of the child so that only items that yield useful information (ie, are neither too hard nor too easy) are administered. In CAT administration, the program uses the response to an initial question to establish a general range of likely function. Subsequent questions are selected through application of algorithms to progressively refine the estimated score to the range of precision established a priori by the examiner. Regardless of the actual items administered all scores are on the same scale, which supports comparisons across time or across groups of people with different levels of current functional performance. Although CAT offers a potential solution to the conflict between comprehensiveness and practicality, the reliability, validity, and acceptability of any application must still be shown through appropriate testing. The purpose of this article is to present results from a comparison of CAT results to full-length administration of 2 functional scales for children, one measuring self-care activity performance and the second measuring social function. Although there is some previous work examining CAT applications in the domain of functional mobility,9, 10 to our knowledge there are no reports of investigation of the feasibility of CAT for measuring these other important domains in children. 
The development of a CAT requires: (1) a large set of items (item pool) examining the functional area of interest; (2) items that scale consistently on a single dimension from low to high functional achievement; and (3) rules to guide starting, stopping, and scoring. IRT methods are used to create hierarchically organized item pools, after which software algorithms select items that match the child’s estimated functional level. All respondents answer the same first question, which has been selected a priori based on its broad coverage of the range of function. The response to the first question is used to estimate an initial score and confidence interval (CI) and guides selection of a second item within the estimated range. The response to this second question is used to re-estimate the score and CI. The process continues in an iterative fashion until the computer algorithm determines that the stopping rule has been satisfied (either a preset number of items or a minimum CI). The stopping rule can be altered to suit the specific purpose of measurement; for example, a larger confidence interval may be acceptable for large population studies, whereas a narrow CI might be important for the precision required in a clinical trial. In the present study, we created prototype CATs using the self-care and social function functional skills items from the Pediatric Evaluation of Disability Inventory (PEDI).11 Two phases of testing were conducted using the prototype CATs: computer simulation studies of retrospective data and a prospective validation study. In addition to examining the accuracy and precision of the CATs compared with the standard fixed-form assessment, we also examined perceived respondent burden for each method. Methods  Samples Analytic sample We used an existing database of 881 children who had complete data on the 73-item self-care and the 65-item social function scales of functional skills part of the PEDI. 
This retrospective analytic sample included 2 groups: (1) a normative sample of 412 healthy children between the ages of 6 months and 7.5 years that was also used to create the initial standardization and normative scoring of the PEDI, and (2) a clinical sample of 469 children and youth (age range, 6mo–17y) who had received inpatient, outpatient, or school-based rehabilitation services at Franciscan Hospital for Children, Boston, MA. Of the 469 clinical cases, 249 had longitudinal data appropriate for sensitivity analyses for the self-care scale and 200 had data for the social function scale. Approximately 48% of the children in the clinical sample had congenital or inherited diseases, 21% had growth and maturation disorders, 16% had acquired conditions, and 15% were diagnosed with traumatic injuries. Demographic characteristics of the analytic sample are presented in table 1. The sample size of 881 is acceptable for initial calibration work for a prototype CAT.12

| Characteristics | Analytic Sample | Cross-Validation Sample |
|---|---|---|
| Age range | 6mo–17y | 6mo–18y |
| % Female | 45.2 | 49.3 |
| % Hispanic or Latino | 9.3 | 5.5 |
| % Asian | 1.5 | 5.5 |
| % Other | 5.8 | 4.1 |
| % Black or African American | 14.6 | 2.7 |
| % White | 68.8 | 82.2 |
| Total sample size | 881 | 73 |

Cross-validation sample We recruited a convenience sample of 73 children and youth for the prospective cross-validation study. Thirty-eight children with disabilities, ages 1 year to 17 years, were recruited from the clinical programs (inpatient, outpatient, early intervention, and hospital-based school) at Franciscan Hospital for Children. Ethnic representation corresponding to the current United States census was targeted for recruitment; however, respondents who did not speak English as a primary language were excluded because of the prohibitive cost of translating and interpreting. Children were further selectively recruited to assure representation of each of the following 4 impairment groups: congenital or inherited disease, growth and maturation disorders, acquired conditions, and traumatic injuries. Thirty-five children without disabilities, ages 6 months to 7.5 years, were recruited through the Franciscan Family Child Care Center and the home communities of the 2 field-test coordinators. Instrument The PEDI11 is a comprehensive functional assessment instrument that measures both capability and performance of functional activities. The self-care and the social function functional skills scales were used in the present investigation. Results of a CAT application for the mobility domain of the PEDI have been reported elsewhere.10 The self-care domain includes 73 activities involved in eating and drinking, grooming, dressing, and toileting tasks, which are assessed with a series of items using a dichotomous capable or unable scoring criterion. The social function domain includes 65 items related to communication (expression and comprehension), problem solving, interactions with peers and adults, and safety at home and in the community. 
Several studies have supported the reliability and validity of the PEDI scales in a wide variety of clinical samples.13, 14 Evidence of construct validity has been obtained by showing the ability of the PEDI to correctly identify children with and without disabilities15 and to discriminate between different types of acquired brain injury.16, 17 Studies also have reported successful outcome monitoring using the PEDI in children with cerebral palsy,18, 19 myelodysplasia,20, 21 osteogenesis imperfecta,22 and traumatic brain injury (TBI).23, 24, 25, 26, 27 The ability of the PEDI functional skills scales to detect meaningful clinical changes has also been shown.28 Because the development of the PEDI scales and construction of summary scores are based on Rasch rating scale methodology,29, 30, 31 these scales provide an excellent starting point for the development of prototype CATs. Development of the CAT Unidimensionality and local independence IRT and CAT methods assume certain measurement properties of item sets that purport to represent a functional construct (latent variable). These include the assumptions of unidimensionality, local independence, and stability of item parameters across groups (eg, clinical vs normative samples). Item sets that violate these assumptions may be less effective in modeling the latent variable and may limit the accuracy of a CAT instrument. A key assumption of the latent variable models that serve as the basis for CAT is that all items in a scale measure a single, unitary concept; that is, the items are unidimensional. 
The latent variable alone should explain how items are related to one another.32, 33 We tested the latent structure of the self-care and the social function items in a series of confirmatory factor analyses34 and evaluated item loadings and residual correlations between items using MPlus software.35, a We used weighted least squares means and variance adjusted estimation methods, which are more precise when analyzing moderate-size samples with skewed categorical data.34, 36 To determine the extent to which a unidimensional model adequately represented scale structure, we considered the eigenvalues associated with each factor extracted; item loadings on the primary factor; and results from overall model fit tests. To ensure adequate sample size for estimation of model parameters we combined the normative and clinical PEDI samples. Assuming the item parameters are similar across groups, combining the samples enhances generalizability of results across both groups and provides a greater number of persons at the moderate to low end of the scale to enhance precision of estimated scores in this region. In the self-care domain, 1 factor explained 87.9% of the item variance and all the factor loadings were very high (range, .778–.974). The comparative fit index (CFI) value of .995 indicated very good fit and can be interpreted as an indicator that 99% of covariance in the data is reproducible by the model. This conclusion was supported by the Tucker-Lewis index (TLI) value of .997, also indicating good fit. The root mean square error of approximation (RMSEA) of .078 is in the acceptable range. In the social function domain 1 factor explained 87.8% of the item variance. All the factor loadings were very high, ranging from .77 to .987 and the fit indexes also supported the 1-factor model (CFI=.994, TLI=.997, RMSEA=.104). The requirement of local independence means that scale items must be independent, or unrelated, to each other at a given score level. 
One indicator that items share more than the latent trait is high residual correlations. High residual correlations (> ±0.2) were observed between 9 pairs of items on the self-care scale and 24 pairs on the social function scale.37 These correlations likely reflect the structure of the PEDI, which groups similar items into skill sets that have an implicit hierarchical relation to each other. For example, the item “eats all textures of table food” implies accomplishment of the previous item “eats cut up/chunky/diced foods” and thus the response to the more challenging item is not independent of the response to the easier item. This violation of model assumptions may affect the estimation of test information and item discrimination parameters, but cannot be rectified in an existing database. Item calibrations The item parameters for each scale were estimated using the Rasch model, which estimates the item difficulty parameters.38, 39, 40 The Rasch model was selected as the best solution for this phase of the project because of simplicity in interpretation and flexibility about the underlying form of the population or trait distributions. The item parameters and fit statistics were calculated using ConQuest,41, b which is based on marginal maximum likelihood estimation. We evaluated fit using the fit statistics for each item based on the comparison of expected and observed value. To maximize sample size and the distribution of item difficulty, data for the total analytic sample were used to generate item calibrations. Note that the original item calibration and instrument standardization for the PEDI was conducted using the normative sample alone (n=412).11 In the self-care domain there were 4 items that did not fit the model: “allows nose to be wiped” (infit=1.52), “removes socks and unfastened shoes” (infit=1.6), “manages tangles and parts hair” (infit=1.72), and “brushes or combs hair” (infit=1.68). 
Those items were removed from the item set to be used for the CAT prototype. In the social function domain only 1 item did not fit the model: “if upset because of a problem, child must be helped immediately or behavior deteriorates” (infit=1.81). Because of the important content reflected in this item we chose to keep it in the item pool. We estimated the individual scores using weighted maximum likelihood42 estimation. Weighted maximum likelihood is preferable to the expected a posteriori methods because it adjusts the first-order bias. The individual scores were standardized to a mean of 50 and standard deviation (SD) of 10. Differential item functioning In IRT, the child’s score on an item should depend entirely on the latent variable being measured. Significant differential item functioning (DIF) indicates that variables other than the latent variable, such as diagnosis, age, or sex, are likely influencing the response.43 We used logistic regression to determine the extent to which item responses to the self-care and social function items differed by clinical diagnosis or age. The diagnosis variable was treated dichotomously (clinical, typical) and age was treated as a continuous variable. If diagnosis or age produced significant model coefficients and the child variable explained more than 2% of variance, considering the total score, then an item was considered to exhibit DIF. A Bonferroni-corrected P value was applied for significance testing (self-care domain, P<.05/73 items=.000685; social function domain, P<.05/65 items=.00077). We also assessed the amount of model variance explained by the group variables. One of the 73 self-care items (“removes socks and unfastened shoes”) exhibited DIF by diagnosis. This item also showed misfit on the previous analyses, thus supporting the decision to remove this item. Sixteen of the 65 social function items exhibited DIF by diagnosis or age. 
There were 2 items that functioned differently for both diagnosis and age: “if upset because of a problem, child must be helped immediately or behavior deteriorates” and “explores and functions in familiar community settings without supervision.” Because the problematic items represent important content, we did not remove them. However, these items are clearly candidates for future revision. Development of the CAT program We based the self-care and social function CAT algorithms on the HDRI software,c developed at the Health & Disability Research Institute. The CATs were designed to be completed by a child’s clinician or parent and can be administered from a stand-alone computer. We programmed the CATs to use weighted maximum likelihood score estimation.7 We selected the items “puts on pants with an elastic waist” and “provides names and descriptive information about family members” to be the first items administered to all respondents for the self-care and social function CATs, respectively. These items were chosen because their difficulty parameters were in the middle of the range, they did not exhibit DIF, and the content seemed appropriate for most respondents. The response to the first item is fed into the engine and the application calculates a probable score as well as a person-specific measure of how precise that score is. If the score is not estimated with sufficient precision, according to internal guidelines, additional questions are selected and administered until either the precision standard is reached or the defined maximum number of items has been administered. To be able to compare results from the simulation and cross-validation studies we used a fixed-stopping rule of 15 items in the present project. However, we expected that only a few respondents would need to complete that many items to attain desirable levels of precision. 
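The administration loop described above can be illustrated with a minimal sketch. It is not the HDRI software: the function names, the item pool, and the respondent are hypothetical, and it assumes a dichotomous Rasch model with weighted maximum likelihood scoring (solved here by simple bisection) and maximum-information item selection. Scores are in logits; the rescaling to a mean of 50 and SD of 10 is omitted.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a 'capable' response at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of one dichotomous Rasch item at ability theta."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)


def wle_score(responses, difficulties, lo=-10.0, hi=10.0, iterations=60):
    """Weighted maximum likelihood estimate and its standard error (logits).

    Finds a root of sum(x - P) + I'(theta) / (2 * I(theta)) by bisection.
    The bounds assume item difficulties lie well inside (lo, hi).
    """
    def weighted_score(theta):
        ps = [rasch_p(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in ps)
        d_info = sum(p * (1.0 - p) * (1.0 - 2.0 * p) for p in ps)
        return sum(x - p for x, p in zip(responses, ps)) + d_info / (2.0 * info)

    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if weighted_score(mid) > 0:
            lo = mid
        else:
            hi = mid
    theta = (lo + hi) / 2.0
    info = sum(item_information(theta, b) for b in difficulties)
    return theta, 1.0 / math.sqrt(info)


def run_cat(pool, answer, first_item, max_items=15, se_target=None):
    """Administer items until max_items or, if given, the SE target is reached.

    pool: dict mapping item id to Rasch difficulty (hypothetical values).
    answer: callable returning 1 (capable) or 0 (unable) for an item id.
    """
    administered, responses = [], []
    item = first_item
    while True:
        administered.append(item)
        responses.append(answer(item))
        theta, se = wle_score(responses, [pool[i] for i in administered])
        precise_enough = se_target is not None and se <= se_target
        if len(administered) >= min(max_items, len(pool)) or precise_enough:
            return theta, se, administered
        # Next item: the unused item with the most information at the current score.
        unused = [i for i in pool if i not in administered]
        item = max(unused, key=lambda i: item_information(theta, pool[i]))


# Hypothetical 61-item pool with difficulties from -3.0 to 3.0 logits.
pool = {f"item{k}": k / 10 for k in range(-30, 31)}
# Hypothetical respondent who is capable of every item easier than 1.05 logits.
respond = lambda item_id: 1 if 1.05 > pool[item_id] else 0
theta, se, used = run_cat(pool, respond, first_item="item0", max_items=15)
```

Passing `se_target` instead of relying on `max_items` corresponds to a precision-based stopping rule, under which some respondents finish with fewer items.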
Accuracy of the CAT Computer simulations We evaluated the IRT-based algorithms for each CAT using computer simulation methods for the analytic sample. The simulations compare the psychometric merits of alternative strategies for programming assessments. In these simulations, responses to items selected by the CAT software were obtained for cases in the analytic data set and fed to the computer to simulate the conditions of an actual CAT assessment. As in an actual CAT, the simulation uses the IRT model to select the best item to administer next, for example, the one with the highest information function given the current score level, re-estimates the domain score and CI, and decides whether or not to continue testing. In the present study, in order to be able to compare results from the simulation and cross-validation studies, we used a fixed-stopping rule of 15 items. We developed 3 CAT scores in the simulations to reflect 3 potential item-stopping rules (self-care or social function CAT-15, self-care or social function CAT-10, and self-care or social function CAT-5). These simulated scores were compared with a criterion standard—the actual IRT latent trait score (self-care or social function) estimated by the full model. Cross-validation field test The self-care or social function CATs and full-length scales for each domain were completed on a sample of children with disabilities from the Franciscan Hospital for Children clinical programs through parent interview conducted by the field test coordinators. For children without disabilities, we also administered both instruments through interview with the parent or the parent’s designee (in some cases the child’s teacher or day care worker). The CAT was completed using the preset 15-item stopping rule to enable comparison with scores from the full-length scale. For all children, both the CAT and full-length scale were completed during 1 session. 
For both groups (children with and without disabilities), the order of assessment type was counterbalanced to avoid an order effect. After administration, we obtained verbal feedback from the physical therapist and/or parent respondent about the relative merits or limitations of both modes of administration. We collected the actual time (to the closest minute) required for administration of the full-length scale in 73% of the cases; each CAT had an internal clock to track the amount of time and the number of items needed to meet preset levels of precision. Demographic information (ethnicity, sex, age, and diagnosis when applicable) was collected for each child. All procedures were approved by the institutional review boards at Boston University and Franciscan Hospital for Children. Data Analysis Pearson correlations were calculated between each of the CAT scores and the optimal IRT-based latent trait score (full-length scale) to assess the extent to which simulated CAT scores were consistent with scores from the full-length form. The ability of each CAT version to discriminate between groups of children on the basis of diagnosis (normative vs clinical) as compared with the full-length scale was evaluated by comparing average scores and relative validity (RV) coefficients based on F ratios, as in previous studies.44 RV is the ratio of the F statistic for the measure in question divided by that for the best measure. The full-length scale for each domain was established as the criterion standard and the RV ratio was set to 1. The comparability of simulated CAT-based estimates in measuring change over time was examined within a subsample of the analytic clinical sample (n=249 for self-care; n=200 for social function) who had been administered each PEDI scale more than once during their rehabilitation program. Average scores and relative validity coefficients based on F ratios were compared. 
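As an illustration of the relative validity calculation for a known-groups comparison, the following minimal sketch computes a one-way analysis of variance F statistic for each measure and divides it by the F statistic of the criterion measure. The function names and the scores are hypothetical and are not taken from the study data; the sketch covers only the between-groups comparison, not the change-over-time analysis.

```python
def f_statistic(*groups):
    """One-way ANOVA F statistic for two or more groups of scores."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def relative_validity(measure_groups, criterion_groups):
    """RV = F for the measure in question / F for the criterion measure."""
    return f_statistic(*measure_groups) / f_statistic(*criterion_groups)


# Hypothetical scores for two groups on the criterion and on a shorter measure.
criterion = ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
short_form = ([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])

rv_criterion = relative_validity(criterion, criterion)   # 1 by definition
rv_short = relative_validity(short_form, criterion)      # below 1: less discrimination
```

An RV close to 1 indicates that the shorter measure separates the groups almost as well as the criterion measure.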
To compare the relative precision of the CAT scores with scores from the full-length scales, we plotted the CIs in relation to the person ability scores. A series of paired t tests was used to examine differences in the amount of time needed for each CAT (internal clock) and full-length scale (timing by test administrators) in the cross-validation study. Results  Score Precision Examination of the standard errors (SEs) and corresponding CIs of different scores showed that the CAT-15 and CAT-10 had a similar pattern; however, SEs of the CAT-5 were larger across all ranges. As expected, CAT-15 and CAT-10 SEs are somewhat larger than those from the full-length version because fewer items were used to calculate the overall score. These patterns are illustrated in Fig 1 and Fig 2. For all methods, the SEs were greater at extreme score ranges. Cross-Validation Study Results from administration of the prototype CATs and previous results from simulation studies were very similar. With administration of 10 or more items, the results from the CAT were very close to scores obtained with the full item pool in terms of precision. Correlations between prototype CAT scores and scores generated from the total item pool were only very slightly lower than the correlations obtained previously with the simulated CATs (table 5).

| Scales | Self-Care Mean ± SD | Self-Care Range | Self-Care Correlation | Social Function Mean ± SD | Social Function Range | Social Function Correlation |
|---|---|---|---|---|---|---|
| Full item pool | 52.32±7.61 | 35.33–62.79 | NA | 55.55±8.86 | 34.99–67.23 | NA |
| Actual CAT-15 | 52.45±7.52 | 35.56–62.49 | .99 | 55.59±9.31 | 33.78–67.21 | .98 |
| Actual CAT-10 | 52.39±7.79 | 34.53–62.19 | .98 | 55.53±9.40 | 33.78–67.18 | .98 |
| Actual CAT-5 | 51.83±7.79 | 37.08–61.52 | .95 | 54.73±8.18 | 37.88–62.18 | .94 |

There were 38 children in the clinical group (mean age, 8.7y; range, 1.23–17.7y) and 35 typical children (mean age, 4.09y; range, .42–7.5y) in the sample (see table 1). A general linear model that included age, group (1: clinical group, 0: typical group), and the interaction of age and group was used for analysis. Results showed a positive main effect of age, indicating that scores increased with chronologic age. However, the slope of the increase was much steeper in the typical group than in the clinical group. There was no main effect of group, but there was a significant age by group interaction (ie, whether age had an effect depended on which group the child was in). These results may reflect the fact that most of the children in the clinical group were older, so the expected age effect would be much less. Comparing the response burden of the CAT administration with that of the paper form (full item pool), 81% of respondents said the paper version was more burdensome compared with 3% who found the CAT more burdensome. In fact, the average total time to administer both CATs was 3.9 minutes, compared with 16.49 minutes to complete both long forms (difference significant at P<.001). In addition, 84% of respondents answered that the paper version asked more irrelevant questions than the CAT but only 4% gave the opposite response. Equal percentages (37%–38%) selected the CAT or the paper version as providing more meaningful information. Finally, 70% answered that they would be more likely to use the CAT in the future, compared with 6% who preferred the long paper form and 23% who said they would be equally likely to use either. Discussion  The results of our analyses indicate that CAT models built from the PEDI self-care and social function item pools can provide accurate and valid estimates of children’s functional capabilities while substantially reducing the administrative burden compared with the full-length instruments. 
These results are consistent with previous research with CAT models for functional mobility10 and confirm that effective and efficient models can be developed for other domains of function important to children and families. Results from the field study were highly similar to those from the simulation studies in spite of the smaller number of participants in the cross-validation sample. These findings suggest that simulations may provide very good approximations of actual CAT administration. Most disabling conditions in children affect self-care skill acquisition or performance, and/or social development. There are also a number of significant clinical disorders that may affect these functional domains almost exclusively, such as autism spectrum disorders, emotional disorders, and intellectual disabilities, and others such as TBI that may have significant impact across all 3 of the areas examined by the PEDI. Thus, it is important that measures developed to document outcomes of rehabilitation services examine content in each of these areas in order to provide an accurate and comprehensive picture of function and disability. The results from the present study are encouraging because they show that the goal of comprehensive coverage may be achievable without loss of precision or excessive administrative burden. Although further research is clearly needed, the results suggest that the PEDI CAT offers the possibility of an outcome measure that could be usefully applied across diverse populations of children with disabilities. As was found previously for the mobility CAT, the present results suggest that very little sensitivity to change or ability to discriminate across known groups is lost as long as the CAT program has between 10 and 15 items. However, the 5-item CATs were notably less accurate and sensitive and therefore would not be recommended for most purposes. 
In a CAT model using a stopping rule based on a desired level of score precision, it is quite possible that the scores of some people might be estimated with fewer than 10 items. One of the advantages of CAT is that it allows users to specify the level of score precision necessary for their current purpose. Thus, in individual assessment, where high precision is desirable, a 15-item stopping rule or a criterion reflecting a smaller degree of measurement error could be applied. On the other hand, for large scale studies where efficiency of administration is essential and less precision is required, even the 5-item CAT may be acceptable. It is noteworthy that even the 15-item CAT substantially reduced the administration time required to complete both scales to an average of 4 minutes (combined). In contrast, completion of the entire PEDI questionnaire through parent interview typically takes between 30 and 45 minutes. The brief administration time of the CAT makes it far more feasible to conduct regular assessment of a child’s functional status and may support alternative methods for administration such as telephone follow-up interviews that are not practical with the longer survey format. Parent respondents may also respond more positively to the assessment in the CAT format because they are asked fewer questions that are clearly irrelevant for their child. Study Limitations The present analyses also identified a number of areas where further revision of the item pools would be appropriate. There were a substantial number of item pairs in the social function pool that did not meet the criterion for local independence as well as a smaller number in the self-care pool. This finding likely reflects the hierarchical organization of the 5-item sets within each original scale and suggests that some of these items should be dropped or reworded to capture more distinct aspects of function in their respective areas. 
Further exploration should also be undertaken to understand the possible reasons for DIF by group in 16 of the social function items so that this problem can be addressed either by rewriting or dropping the items. Although such revisions would likely improve performance of the PEDI CAT, our results suggest that the CAT is robust even when some items that violate scaling assumptions are retained. More direct investigation of the impact of various violations of Rasch and IRT assumptions on the performance of CAT algorithms would be extremely useful to guide future measurement efforts. In a previous study with the mobility CAT,10 clinician respondents reported that they often used the context of completing the full-length PEDI in a parent interview to establish rapport and initiate discussion with families around the needs of their child. In the present study, when asked which version they found most informative, approximately equal percentages selected the CAT and the full-length version. These findings suggest that factors other than the time required for administration may be important determinants of clinicians’ acceptance and use of assessments. These factors need to be considered carefully in future CAT work so that the CAT interface, interpretative supports, and reports are optimally designed to meet the needs of clinicians and families seeking information about a child’s functioning for various purposes. Conclusions  The results of the present study confirm that CAT methods can be applied successfully in 2 important domains of children’s functioning that have not been examined previously. Although the content of the self-care and social function item pools was substantially different from the previously examined mobility domain, the results of the simulation and cross-validation studies were very similar. 
Thus, application of CAT methodology can substantially reduce the time required for administration without significant loss of precision or sensitivity to change. Although further work is recommended to refine the item pools in these 2 domains, the results suggest that the CAT approach offers a valid and viable solution to the long-standing conflict between the need for accuracy in clinical assessment and the equal need for practicality of administration.
References
1. Msall M. Tools for measuring daily activities in children: promoting independence and developing a language for child disability. Pediatrics. 2002;109:317–319.
2. Lollar D, Simeonsson R, Nanda U. Measures of outcome in children and youth. Arch Phys Med Rehabil. 2000;81(12 Suppl 2):S46–S52.
3. Butler C. Outcomes that matter [editorial]. Dev Med Child Neurol. 1995;37:753–754.
4. Nordmark E, Jarnlo GG, Hägglund G. Comparison of the Gross Motor Function Measure and Paediatric Evaluation of Disability Inventory in assessing motor function in children undergoing selective dorsal rhizotomy. Dev Med Child Neurol. 2000;42:245–252.
5. Hays R, Morales L, Reise S. Item response theory and health outcomes measurement in the 21st century. Med Care. 2000;38(9 Suppl):II28–II42.
6. Ware J, Bjorner J, Kosinski M. Practical implications of item response theory and computerized adaptive testing. Med Care. 2000;38:II73–II82.
7. Wainer H, Dorans N, Flaugher R. Computerized adaptive testing: a primer. 2nd ed. Mahwah: Erlbaum; 2000.
8. Revicki DA, Cella DF. Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing. Qual Life Res. 1997;6:595–600.
9. Dijkers M. A computer adaptive testing simulation applied to the FIM instrument motor component. Arch Phys Med Rehabil. 2003;84:384–393.
10. Haley SM, Raczek AE, Coster WJ, Dumas HM, Fragala-Pinkham MA. Assessing mobility in children using a computer adaptive testing version of the pediatric evaluation of disability inventory. Arch Phys Med Rehabil. 2005;86:932–939.
11. Haley SM, Coster WJ, Ludlow LH. Pediatric evaluation of disability inventory: development, standardization and administration manual. Boston: Trustees of Boston University; 1992.
12. Embretson SE, Reise SP. Item response theory for psychologists. Mahwah: Lawrence Erlbaum; 2000.
13. Wright FV, Boschen KA. The Pediatric Evaluation of Disability Inventory (PEDI): validation of a new functional assessment outcome instrument. Can J Rehabil. 1993;7:41–42.
14. Nichols DS, Case-Smith J. Reliability and validity of the pediatric evaluation of disability inventory. Pediatr Phys Ther. 1996;8:15–24.
15. Feldman AB, Haley SM, Coryell J. Concurrent and construct validity of the Pediatric Evaluation of Disability Inventory. Phys Ther. 1990;70:602–610.
16. Fragala MA, Haley SM, Dumas HM, Rabin JP. Classifying mobility recovery in children and youth with brain injury during hospital-based rehabilitation. Brain Inj. 2002;16:149–160.
17. Dumas HM, Haley SM, Ludlow LH, Rabin JP. Functional recovery in pediatric brain injury during inpatient rehabilitation. Am J Phys Med Rehabil. 2002;81:661–669.
18. Ostensjo S, Strinnholm M, Carlsson M, Dahl M. Everyday functioning in young children with cerebral palsy: functional skills, caregiver assistance, and modifications of the environment. Dev Med Child Neurol. 2003;45:603–612.
19. Ketelaar M, Vermeer A, Hart H, van Petegem-van Beek E, Helders PJ. Effects of a functional therapy program on motor abilities of children with cerebral palsy. Phys Ther. 2001;81:1534–1545.
20. Norrlin S, Strinnholm M, Carlsson M, Dahl M. Factors of significance for mobility in children with myelomeningocele. Acta Paediatr. 2003;92:204–210.
21. Tsai PY, Yang TF, Chan RC, Huang PH, Wong TT. Functional investigation in children with spina bifida-measured by the Pediatric Evaluation of Disability Inventory (PEDI). Childs Nerv Syst. 2002;18:48–53.
22. Engelbert RH, Custers JW, van der Net J, et al. Functional outcome in osteogenesis imperfecta: disability profiles using the PEDI. Pediatr Phys Ther. 1997;9:18–22.
23. Haley SM, Dumas HM, Ludlow LH. Mobility outcomes of children and adolescents in an inpatient rehabilitation program: variation by diagnostic and practice pattern groups. Phys Ther. 2001;81:1425–1436.
24. Kothari DH, Haley SM, Gill-Body KM, Dumas HM. Measuring functional change in children with acquired brain injury: comparison of normative and disease-specific scoring models using the Pediatric Evaluation of Disability Inventory (PEDI). Phys Ther. 2003;83:776–785.
25. Dumas H, Haley S, Rabin J. Short term durability and improvement of function in traumatic brain injury: a pilot study using the Paediatric Evaluation of Disability Inventory (PEDI) classification levels. Brain Inj. 2001;15:891–902.
26. Dumas HM, Haley SM, Bedell GM, Hull EM. Social function changes in children and adolescents with acquired brain injury during inpatient rehabilitation. Pediatr Rehabil. 2001;4:177–185.
27. Dumas HM, Haley SM, Fragala MA, Steva BJ. Self-care recovery of children with brain injury: descriptive analysis using the Pediatric Evaluation of Disability Inventory (PEDI) functional classification levels. Phys Occup Ther Pediatr. 2001;21:17–27.
28. Iyer LV, Haley SM, Watkins MP, Dumas HM. Establishing minimal clinically important differences for scores on the Pediatric Evaluation of Disability Inventory for inpatient rehabilitation. Phys Ther. 2003;83:888–898.
29. Ludlow L, Haley S. New directions in pediatric rehabilitation measurement: the growing challenge. J Outcome Meas. 2000;4:482–490.
30. Ludlow L, Haley S. Effect of context in rating of mobility activities in children with disabilities: an assessment using the Pediatric Evaluation of Disability Inventory. Educ Psychol Meas. 1996;56:122–129.
31. Haley SM, Ludlow LH, Coster WJ. Pediatric Evaluation of Disability Inventory: clinical interpretation of summary scores using Rasch rating scale methodology. Phys Med Rehabil Clin N Am. 1993;4:529–540.
32. Hambleton RK, Swaminathan H, Rogers HJ. Fundamentals of item response theory. Newbury Park: Sage; 1991.
33. Van der Linden W, Hambleton R. Handbook of modern item response theory. Berlin: Springer; 1997.
34. Mislevy RJ. Recent developments in the factor analysis of categorical variables. J Educ Stat. 1986;11:3–31.
35. Muthen B, Muthen L. MPlus user’s guide. Los Angeles: Muthen & Muthen; 1998.
36. Beauducel A, Herzberg PY. On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Struct Equat Model. 2006;13:186–203.
37. Tjur T. A connection between Rasch’s item analysis model and a multiplicative Poisson model. Scand J Stat. 1982;9:23–30.
38. Fischer G, Molenaar I. Rasch models: foundations, recent developments, and applications. Berlin: Springer-Verlag; 1995.
39. Andrich D. Rasch models for measurement. Beverly Hills: Sage; 1998.
40. Masters GN. A Rasch model for partial credit scoring. Psychometrika. 1982;47:149–174.
41. Wu ML, Adams RJ. ConQuest [computer software and manual]. Melbourne: Australian Council for Educational Research; 1998.
42. Warm TA. Weighted likelihood estimation of ability in item response theory. Psychometrika. 1989;54:427–450.
43. Swaminathan H, Rogers HJ. Detecting differential item functioning using logistic regression procedures. J Educ Meas. 1990;27:361–370.
44. McHorney CA, Ware JE, Lu JF, Sherbourne CD. The MOS 36-item short-form health survey (SF-36): III (Tests of data quality, scaling assumptions and reliability across diverse patient groups). Med Care. 1994;32:40–66.
45. Jacobson NS, Truax P. Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol. 1991;59:12–19.
a Department of Occupational Therapy and Rehabilitation Counseling, Boston University Sargent College, Boston, MA
b Health and Disability Research Institute, Boston University School of Public Health, Boston, MA
c Research Center for Children with Special Health Care Needs, Franciscan Hospital for Children, Boston, MA
Correspondence to Wendy J. Coster, PhD, OTR/L, Dept of Occupational Therapy and Rehabilitation Counseling, Boston University Sargent College, 635 Commonwealth Ave, Boston, MA 02215.
Supported by the National Center on Medical Rehabilitation Research, National Institute of Child Health and Human Development, National Institutes of Health (grant nos. R43 HD42388-01, K02 HD45354-01A1). A commercial party having a direct financial interest in the results of the research supporting this article has conferred or will confer a financial benefit upon 1 or more of the authors. Haley has stock interest in CRE Care LLC, which distributes the Pediatric Evaluation of Disability Inventory (PEDI) products. Coster and Haley have a financial interest in the distribution of PEDI products. Reprints are not available from the authors. PII: S0003-9993(08)00033-6 doi:10.1016/j.apmr.2007.09.053 © 2008 American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved. | |
|