Archives of Physical Medicine and Rehabilitation
Volume 87, Issue 9 , Pages 1223-1229, September 2006

Measurement Precision and Efficiency of Multidimensional Computer Adaptive Testing of Physical Functioning Using the Pediatric Evaluation of Disability Inventory

  • Stephen M. Haley, PhD, PT

      Affiliations

    • Health and Disability Research Institute, Boston University, Boston, MA
    • Reprint requests to Stephen M. Haley, PhD, PT, Health and Disability Research Institute, Boston University, 53 Bay State Rd, Boston, MA 02215
  • Pengsheng Ni, MD, MPH

      Affiliations

    • Health and Disability Research Institute, Boston University, Boston, MA
  • Larry H. Ludlow, PhD

      Affiliations

    • Educational Research, Measurement and Evaluation Department, Lynch School of Education, Boston College, Boston, MA
  • Maria A. Fragala-Pinkham, MS, PT

      Affiliations

    • Research Center for Children with Special Health Care Needs, Franciscan Hospital for Children, Boston, MA

Abstract 

Haley SM, Ni P, Ludlow LH, Fragala-Pinkham MA. Measurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the Pediatric Evaluation of Disability Inventory.

Objective

To compare the measurement efficiency and precision of a multidimensional computer adaptive testing (M-CAT) application to a unidimensional CAT (U-CAT) comparison using item bank data from 2 of the functional skills scales of the Pediatric Evaluation of Disability Inventory (PEDI).

Design

Using existing PEDI mobility and self-care item banks, we compared the stability of item calibrations and model fit between unidimensional and multidimensional Rasch models and compared the efficiency and precision of the U-CAT– and M-CAT–simulated assessments to a random draw of items.

Setting

Pediatric rehabilitation hospital and clinics.

Participants

Clinical and normative samples.

Interventions

Not applicable.

Main Outcome Measures

Not applicable.

Results

The M-CAT had greater levels of precision and efficiency than the separate mobility and self-care U-CAT versions when using a similar number of items for each PEDI subdomain. Equivalent estimation of mobility and self-care scores can be achieved with a 25% to 40% item reduction with the M-CAT compared with the U-CAT.

Conclusions

M-CAT applications appear to have both precision and efficiency advantages compared with separate U-CAT assessments when content subdomains have a high correlation. Practitioners may also realize interpretive advantages of reporting test score information for each subdomain when separate clinical inferences are desired.

Key Words:  Outcome assessment (health care) , Pediatrics , Psychometrics , Rehabilitation

 

PHYSICAL FUNCTIONING IN CHILDREN is a fundamental component of normal development and is a domain embedded in most pediatric functional and quality of life assessments. Physical functioning constitutes a broad concept, including mobility skills, personal-care routines, play that involves movement and sports, and other aspects of the child interacting with a physical environment. Assessment of physical functioning helps health care providers and educators identify limitations in a child’s repertoire of activities in the context of daily routines at home and school.1, 2, 3

Physical-functioning assessments used for routine screening or evaluation should be practical yet provide rich information about the child’s ability to conduct daily routines. Assessments of physical functioning conducted in health care environments, often administered through parent or clinician report, are challenging for several reasons.4, 5 First, there is an inverse relation between response burden and the likelihood that an assessment will actually be completed and used for clinical or research purposes. Second, items that are relevant for 1 age group may not be appropriate for another, such as items that evaluate floor movement (creeping, crawling) for older children and items that address sports skills for infants. This often leads to the development of parallel forms for different age groups, for which scores across the different versions cannot be compared. Finally, there is a lack of conceptual and empirical work to identify meaningful subdimensions of physical functioning, which can lead to difficulty in interpreting global summary scores and assessing the effects of intended interventions.

Computer adaptive testing (CAT) has been proposed as an alternative to fixed-format instruments that have the limitations noted previously. In contrast to an assessment in which all items must be scored for every child, CAT6 selects only questions that are appropriate to a child’s functional level based on previous responses and skips items that are clearly too easy or too hard. CAT uses an algorithm7 that selects questions tailored to each child and shortens or lengthens the test to achieve either the desired precision or a preassigned stopping rule based on a maximum number of items. In our previous work,8, 9 we showed the utility of CAT assessments in pediatric health care applications by building unidimensional CATs (U-CATs) for mobility and self-care functional skills (physical functioning) scales of the Pediatric Evaluation of Disability Inventory (PEDI).

We were interested in determining if a multidimensional CAT (M-CAT) model might be a more parsimonious and efficient approach toward administering the 2 PEDI subdomains of physical functioning. M-CAT models have been advocated in educational and credentialing applications when subdomains are highly correlated yet still define concepts for which information on each component is desired separately.10, 11, 12, 13 M-CAT applications have not yet been widely used in health care assessments, mainly because of the complexity of the algorithms and the lack of available software programs for analyses. Many health and functional concepts, however, appear to have more than 1 dimension; hence, there is interest in exploring these methods. One notable example is the multidimensional CAT developed by Gardner et al,14 who created an M-CAT screening assessment to identify a broad range of mental health problems in pediatric primary-care clinics.

Multidimensional item-response theory (IRT) models provide important flexibility by allowing information about other test domains to inform the estimation of item difficulty levels and person scores.15 However, to date, most demonstrations of the advantages of M-CAT have been conducted on data-generated simulations (data are created with known item and sampling properties) rather than on real datasets (retrospective patient data) using “empirical simulations.” Wang and Chen,16 for example, recently conducted a series of data-generated simulations to test the potential advantages of M-CAT over 2 or more separate unidimensional CATs. Under conditions in which subdomains were highly correlated, the M-CAT condition needed fewer items than a unidimensional model to obtain person scores with a similar level of accuracy and precision.

Empirical simulations, in contrast, use item responses from previously collected data to simulate, in this case, a computer adaptive testing program, as new responses are fed into the CAT program based on how persons answered items from the previously completed functional assessment. In the present research, we used an empirical simulation approach for investigating the merits of M-CAT by using the complete set of the actual item responses of parents reporting about their child’s physical functioning in both well clinics and health care settings. As items were selected for administration to a child by the CAT software, responses were extracted from the child’s actual dataset.

The main objective of our study was to determine if M-CAT improves the accuracy and precision of person score estimates over separate U-CAT administrations of each physical functioning (mobility, self-care) domain and if fewer items are needed with M-CAT than U-CAT for similar levels of precision and accuracy. The results of this main objective are especially pertinent when there is a practical restriction on the number of items that would be administered in a busy clinical environment. A secondary objective was to compare the accuracy and precision of the M-CAT and U-CAT, which select items based on previous responses, with those of a purely random draw of items.

Methods 

Participants 

This study used secondary data from normative samples (both the original standardization [n=412] and expanded-age samples [n=378]) and a clinical sample (n=469) drawn from Franciscan Hospital for Children. Specifics of the sampling and demographics of the original standardization cohort are provided in detail in the PEDI administration and standardization manual.17 The original functional skills scales of the PEDI contain 73 self-care and 59 mobility items. We developed the multidimensional CAT model on only the self-care and mobility items of the PEDI because both domains, although recognized as distinct, have traditionally been highly correlated and can also be conceptualized as part of a broader concept of physical functioning.

We combined this original standardization sample with an expanded-age sample to increase the age range of the instrument up through 14 years of age.18 The expanded-age sample included 378 healthy children for whom data were collected by parent proxy. Details of the sampling procedure and data-collection procedures are reported elsewhere.18 For this new cohort, 50 self-care and 100 mobility items were added to the original PEDI to increase the functional level through approximately 14 years and to create smaller skill increments between items to improve scoring precision and sensitivity to change. Full details of new item development strategies are provided in an earlier report.19 The full normative sample in this study includes the original and expanded-age sample of the PEDI (n=790).

A clinical sample of 469 children and youths (age range, 6–17y) who had received inpatient, outpatient, or school-based rehabilitation services at Franciscan Hospital for Children was also included. The clinical sample was administered the original PEDI with 73 self-care and 59 mobility items. Approximately 48% of the children in the clinical sample had congenital or inherited diseases, 21% had growth and maturation disorders, and 31% had acquired conditions, the majority of whom were diagnosed with traumatic head and extremity injuries. Thus, the full sample for these analyses was n=1259. See figure 1 for a schematic of the combined sample for this study. Large sample sizes are needed for multidimensional modeling; thus, for this study, we combined samples who had taken at least the core 73 self-care and 59 mobility PEDI items. The new items (50 self-care, 100 mobility) were administered only to the expanded-age sample. The institutional review boards at Boston University and Franciscan Hospital for Children approved the study.

Unidimensionality and Local Independence 

We examined unidimensionality by using both substantive and statistical criteria.13 We asked 8 content experts (physical and occupational therapists, each with over 10 years of clinical experience) to categorize the full list of mobility and self-care items into 1 of the 2 content categories. Over 91% of the 282 items were categorized as hypothesized into either mobility or self-care subdomains by all 8 experts. For all of the remaining items, the majority of experts classified the items into the a priori mobility or self-care content subdomains.

Both exploratory (EFA) and confirmatory factor analysis (CFA) supported a 2-factor model. To maximize the unique variance of the common factors, we used the principal axis method of EFA (using polychoric correlations) for the initial extraction of factors, followed by an oblique (promax) rotation.20 The subsequent factor solution, consistent with the break in the magnitude of the eigenvalues in the scree plot, revealed 2 factors. The correlation between the obliquely rotated self-care and mobility factors was r=.70.

Using the same data, we also performed separate CFAs on both the self-care and mobility subdomains. Model fit was evaluated using the comparative fit index (CFI),21 the Tucker-Lewis index (TLI),22 and the root mean square error of approximation (RMSEA).23, 24 Values of CFI and TLI greater than .90 are indicative of good model fit25; RMSEA values lower than 0.1 reflect adequate fit.25 Both the self-care (CFI=.997, TLI=.997, RMSEA=.075) and mobility (CFI=.994, TLI=.993, RMSEA=.090) fit indices were all within acceptable ranges.

Furthermore, we found that less than 9% of the residual correlations (correlations between pairs of items using the item residuals after the 2 factors were extracted from the data) for self-care and mobility items were greater than 0.1. This finding means the scale has relatively few items that may be locally dependent. Locally dependent items occur when the response to any one item is necessarily dependent on the response to a previous item, a violation of a fundamental IRT assumption. The level of item dependency in these data is considered too slight to affect the performance of a CAT program.26

Self-Care and Mobility Item Banks 

The functional mobility and self-care items on the PEDI are scored on a dichotomous scale (capable/unable). We conducted the unidimensional and multidimensional Rasch analyses through the multidimensional random coefficients multinomial logit model27 as implemented in the ConQuest software package.28,a Following the notation of Wang and Chen,16 the probability of a response for person n on item i in category k may be expressed as

$$
P(X_{nik}=1;\,\xi \mid \theta_n)=\frac{\exp\left(b'_{ik}\theta_n+a'_{ik}\xi\right)}{\sum_{u=1}^{K_i}\exp\left(b'_{iu}\theta_n+a'_{iu}\xi\right)}
$$

where $X_{nik}$ is 1 if the response by person n to item i is in category k and is 0 otherwise, $K_i$ is the number of categories (2, in this case) in item i, $\xi$ is a vector of item difficulty parameters, $\theta_n$ is person n’s level on the D latent traits (self-care, mobility) describing their test performance, $b'_{ik}$ is a scoring vector for category k of item i across the D latent traits, and $a'_{ik}$ is a design vector for category k of item i describing the linear relationship among the elements of $\xi$ (ie, which items belong to which trait). The scoring vector $b'_{ik}$ and design vector $a'_{ik}$ were specified for a compensatory multidimensional random coefficients multinomial logit model for dichotomous scoring of a between-item multidimensional test.16

The compensatory model specifies the probability of a given response as a simple linear combination of the latent traits in contrast to some nonadditive function linking the 2 traits. A between-item multidimensional test (ie, the PEDI) is one in which items are intended to measure an overall physical functioning latent trait, whereas the test consists of more than 1 latent trait (ie, self-care, mobility). The 2 latent traits for self-care and mobility were estimated simultaneously in the M-CAT situation.
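The response model described above can be illustrated with a minimal sketch. The function name, the example vectors, and the sign convention (a design entry of -1 so that the logit is trait level minus item difficulty) are assumptions made for illustration; this is not the ConQuest implementation.

```python
import numpy as np

def mrcml_category_probs(theta, xi, B, A):
    """Category probabilities for one item under the logit model in the text.

    theta : (D,) latent trait vector for one person
    xi    : (P,) item difficulty parameter vector
    B     : (K, D) scoring matrix, row k is the scoring vector b'_ik
    A     : (K, P) design matrix, row k is the design vector a'_ik
    """
    logits = B @ theta + A @ xi
    logits = logits - logits.max()      # stabilize the exponentials
    expv = np.exp(logits)
    return expv / expv.sum()

# Hypothetical dichotomous self-care item (trait 0) with difficulty xi[0].
theta = np.array([0.5, -0.2])           # (self-care, mobility), illustrative values
xi = np.array([0.0, 1.0])               # two item difficulties, illustrative values
B = np.array([[0.0, 0.0],               # category 0 ("unable") scores nothing
              [1.0, 0.0]])              # category 1 ("capable") scores on self-care only
A = np.array([[0.0, 0.0],
              [-1.0, 0.0]])             # assumed sign: logit = theta_0 - xi_0
probs = mrcml_category_probs(theta, xi, B, A)
```

Because each item scores on a single trait (a between-item design), the mobility level in `theta` does not enter this item's probability directly; it affects the self-care estimate only through the correlation between the traits.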

We assessed model fit using the information-weighted mean square (the infit mean square) statistic, which is relatively insensitive to extreme unexpected responses. It has an expected value of 1; higher values indicate poor fit resulting from unexpected responses, and lower values indicate responses that fit the model predictions more closely than would be expected by chance. A range in values of 0.7 to 1.4 is generally deemed acceptable,29 but the statistic has no known exact distribution. Hence, investigator experience always plays a role in its interpretation.
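For dichotomous items, the infit mean square is commonly computed as the sum of squared residuals divided by the sum of the model variances. The sketch below assumes that standard form; the function name and inputs are illustrative, and the fit statistics reported by ConQuest may be computed differently in detail.

```python
import numpy as np

def infit_mean_square(responses, probs):
    """Information-weighted mean square for one item (or one person).

    responses : observed 0/1 responses
    probs     : model-expected probabilities of a 1 for the same observations
    """
    x = np.asarray(responses, dtype=float)
    p = np.asarray(probs, dtype=float)
    variance = p * (1.0 - p)                 # binomial variance of each response
    return np.sum((x - p) ** 2) / np.sum(variance)

# Illustrative values only: two responses, each with a model probability of .5
example = infit_mean_square([1, 0], [0.5, 0.5])
```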

Before finalizing the item bank for self-care and mobility, we examined differential item functioning (DIF) between the normative and clinical samples using a logistic regression model.30 DIF is the extent to which persons in different groups but at the same level of functioning have different probabilities of success on an item, an undesirable characteristic of an item. Based on the item fit and DIF analysis, we removed 4 self-care and 3 mobility items that exceeded tolerance limits of infit and DIF. We retained a final item bank of 119 self-care and 156 mobility items. All score estimates were transformed to a standard scale mean of 50 with a standard deviation (SD) of 10.
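The rescaling to a mean of 50 and SD of 10 can be sketched as a linear transformation. The source does not state which reference mean and SD were used, so standardizing against the sample at hand is an assumption here, as is the function name.

```python
import numpy as np

def to_standard_scale(logits, target_mean=50.0, target_sd=10.0):
    """Linearly rescale logit person scores to a chosen mean and SD."""
    logits = np.asarray(logits, dtype=float)
    z = (logits - logits.mean()) / logits.std()   # assumed: sample-based standardization
    return target_mean + target_sd * z

scores = to_standard_scale([-1.0, 0.0, 1.0])      # illustrative logit values
```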

CAT Empirical Simulations 

Following the work of Wang and Chen,16 the CAT simulation software for both the U-CAT and M-CAT programs uses the Segall formulation11 for Bayesian modal estimation of latent traits and subsequent adaptive item selection. Item selection depended on responses to earlier items in the test, and responses to each item were taken from the empirical data collected during the assessment of children’s physical functioning. In using actual data to estimate the CAT scores, we assume that persons respond in much the same way to items regardless of their testing context (ie, item order or the number of items administered should not affect how a person responds to items).

Because the correlation between the self-care and mobility subdomains was approximately r=.70, we assumed that the prior distribution of the latent traits representing self-care and mobility was multivariate normal with a mean vector equal to 0, variance equal to 1, and covariance equal to 0.7. At each step of item selection, the Bayesian modal estimation procedure estimates the latent trait level that maximizes the posterior distribution based on the current likelihood of the data and the assumed prior distribution.11 A Newton-Raphson procedure then drives the iterative convergence process. The final step locates the item that maximizes the determinant of the information matrix at the provisional latent trait level. Hence, each selected item is matched as closely as statistically possible to the person’s estimated level of functioning. The CAT algorithms and software were developed at the Health and Disability Research Institute, Boston University.
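The estimation and selection steps described above can be sketched as follows for dichotomous Rasch items in a between-item design. This is an illustrative sketch, not the authors' software: the function names, the small item bank, and the convergence settings are all assumptions.

```python
import numpy as np

def rasch_prob(theta, dims, deltas):
    """P(capable) when each item loads on one trait: logit = theta[dim] - delta."""
    return 1.0 / (1.0 + np.exp(-(theta[dims] - deltas)))

def posterior_info(theta, dims, deltas, prior_prec):
    """Test information at theta plus the prior precision matrix."""
    p = rasch_prob(theta, dims, deltas)
    info = prior_prec.copy()
    np.add.at(info, (dims, dims), p * (1.0 - p))   # between-item: diagonal terms only
    return info

def map_estimate(x, dims, deltas, prior_cov, n_iter=50, tol=1e-8):
    """Bayesian modal estimate of theta by Newton-Raphson, with posterior SEs."""
    prior_prec = np.linalg.inv(prior_cov)
    theta = np.zeros(prior_cov.shape[0])
    for _ in range(n_iter):
        grad = -prior_prec @ theta                  # gradient of the log prior
        np.add.at(grad, dims, x - rasch_prob(theta, dims, deltas))
        step = np.linalg.solve(posterior_info(theta, dims, deltas, prior_prec), grad)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    info = posterior_info(theta, dims, deltas, prior_prec)
    return theta, np.sqrt(np.diag(np.linalg.inv(info)))

def select_item(theta, admin, bank_dims, bank_deltas, prior_cov):
    """Unused item that maximizes the determinant of the updated information matrix."""
    prior_prec = np.linalg.inv(prior_cov)
    admin = np.asarray(admin, dtype=int)
    base = posterior_info(theta, bank_dims[admin], bank_deltas[admin], prior_prec)
    p = rasch_prob(theta, bank_dims, bank_deltas)
    best, best_det = None, -np.inf
    for k in range(len(bank_deltas)):
        if k in admin:
            continue
        cand = base.copy()
        cand[bank_dims[k], bank_dims[k]] += p[k] * (1.0 - p[k])
        det = np.linalg.det(cand)
        if det > best_det:
            best, best_det = k, det
    return best

prior_cov = np.array([[1.0, 0.7], [0.7, 1.0]])      # prior stated in the text
bank_dims = np.array([0, 0, 0, 1, 1, 1])            # 0 = self-care, 1 = mobility (hypothetical bank)
bank_deltas = np.array([-1.0, 0.0, 1.0, -1.0, 0.5, 1.5])

first_item = select_item(np.zeros(2), [], bank_dims, bank_deltas, prior_cov)
theta, se = map_estimate(np.array([1.0, 1.0, 1.0]), bank_dims[:3], bank_deltas[:3], prior_cov)
```

In the last line only self-care items have been answered, yet the mobility estimate also moves away from 0 and its SE falls below the prior SD of 1. This is the borrowing of information across correlated traits that the M-CAT relies on.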

Theoretically, M-CAT uses the correlation between the 2 latent traits to obtain more accurate and precise person estimates (smaller standard errors [SEs]) than in a unidimensional condition, which is constrained by measuring only 1 latent trait at a time. To test this hypothesis, we conducted 2 comparative analyses of the M-CAT–simulation procedure. First, we conducted an empirical simulation by using a U-CAT model for the self-care and mobility subdomains. In the U-CAT simulation, the standard normal distribution was used as the prior distribution and the 2 latent traits were estimated separately, 1 latent trait at a time in 2 separate CAT runs. In both U-CAT and M-CAT conditions, the item calibrations were identical. Second, we developed multidimensional (M-RAN) and unidimensional (U-RAN) model simulations, in which items selected during the empirical simulation were chosen by a random item generator.31 This type of comparison performed by using a random item selection process provides an important baseline for evaluating CAT performance.12 Theoretically, the U-CAT and the M-CAT simulations should generate more accurate and precise estimates than their respective U-RAN and M-RAN simulations. To examine differences in accuracy and precision over a wide range of administration conditions, we established item-stop rules of 3, 5, 10, 15, and 20 items per content dimension. The intent here was to mimic a clinical situation in which only a limited number of items could be administered.
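A minimal sketch of one empirical simulation run under a fixed item-stop rule is given below, for the unidimensional case only. It contrasts adaptive selection (U-CAT) with random selection (U-RAN), and answers are looked up in a stored response vector rather than generated. All names and values are hypothetical, complete stored responses are assumed, and the standard normal prior follows the U-CAT description in the text.

```python
import numpy as np

def map_theta(x, deltas, n_iter=25):
    """Posterior mode and SE for one trait with a standard normal prior."""
    theta = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta - deltas)))
        grad = np.sum(x - p) - theta             # likelihood score plus prior term
        info = np.sum(p * (1.0 - p)) + 1.0       # test information plus prior precision
        theta += grad / info
    return theta, 1.0 / np.sqrt(info)

def simulate_cat(stored, deltas, stop_rule, adaptive=True, rng=None):
    """Administer `stop_rule` items, reading answers from stored responses."""
    if rng is None:
        rng = np.random.default_rng(0)
    remaining = list(range(len(deltas)))
    used = []
    theta, se = 0.0, 1.0
    while len(used) < stop_rule and remaining:
        if adaptive:
            # Rasch information p(1-p) peaks where difficulty is closest to theta
            k = min(remaining, key=lambda i: abs(deltas[i] - theta))
        else:
            k = remaining[rng.integers(len(remaining))]
        remaining.remove(k)
        used.append(k)
        theta, se = map_theta(stored[used], deltas[used])
    return theta, se, used

stored = np.array([1, 1, 0, 1, 0, 0])            # one child's stored answers (illustrative)
deltas = np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 2.0])
cat_theta, cat_se, cat_used = simulate_cat(stored, deltas, stop_rule=3)
ran_theta, ran_se, ran_used = simulate_cat(stored, deltas, stop_rule=3, adaptive=False)
```

Repeating such runs over every child and each stop rule (3, 5, 10, 15, and 20 items) would yield the per-condition score estimates and SEs that the comparisons are based on.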

Analyses of Accuracy and Precision 

Accuracy and precision should be optimized in a CAT application when the content of items is matched to the child’s functional level.31, 32 We defined accuracy as the level of correspondence (using a Pearson product moment correlation) between the CAT-based person estimate and the “best” possible estimate on the full set of items answered by each parent responding to questions about his/her child’s level of physical functioning. Two best estimates were used as a criterion-standard frame of reference: the “best” person score estimate based on a unidimensional IRT model (U-IRT) and the “best” person score estimate based on a multidimensional IRT model (M-IRT). We calculated correlations between the M-CAT, M-RAN, U-CAT, and U-RAN estimates and the M-IRT and U-IRT “best” person estimates for various item-stop rules (3, 5, 10, 15, and 20 items per domain). In the case of the U-CAT, each set of items per domain was administered separately, whereas the self-care and mobility items in the M-CAT were administered together.

We defined precision as the average SE associated with person estimates.31 We calculated average SEs for item-stop rules of 3, 5, 10, 15, and 20 items per domain. Because CAT-based procedures may improve precision more in the extreme ranges of person estimates than in the middle of the distribution,33 we compared precision results at fixed-stop rules of 3, 5, 10, 15, and 20 items per domain for persons whose M-IRT and U-IRT estimates fell more than 1 SD below the mean score (low range), between –1 and 1 SDs of the mean estimate (mid-range), and more than 1 SD above the mean estimate (high range). Finally, we calculated the root mean square error (RMSE)34 between the person score estimated by the CAT and the “best” person score estimated by the full item set for both the unidimensional and multidimensional models. This statistic shows how closely the CAT person score estimates agree with the “best” estimates from the full item set. The RMSE is calculated as:

$$
\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\hat{\theta}_n^{\mathrm{CAT}}-\hat{\theta}_n^{\mathrm{full}}\right)^2}
$$

where N is the number of subjects, $\hat{\theta}_n^{\mathrm{CAT}}$ is the person score estimated by the CAT, and $\hat{\theta}_n^{\mathrm{full}}$ is the person score estimated by the full item set (set as reference).
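The accuracy and precision summaries described above can be sketched as follows. The function names and example scores are hypothetical; the range cut points assume the standard scale (mean 50, SD 10) described earlier.

```python
import numpy as np

def pearson_r(cat_scores, full_scores):
    """Accuracy: Pearson correlation between CAT and full-item-set scores."""
    return np.corrcoef(cat_scores, full_scores)[0, 1]

def rmse(cat_scores, full_scores):
    """Root mean square error between CAT and full-item-set scores."""
    diff = np.asarray(cat_scores, dtype=float) - np.asarray(full_scores, dtype=float)
    return np.sqrt(np.mean(diff ** 2))

def mean_se_by_range(full_scores, ses, mean=50.0, sd=10.0):
    """Precision: average SE in the low, mid, and high score ranges."""
    full_scores = np.asarray(full_scores, dtype=float)
    ses = np.asarray(ses, dtype=float)
    low = full_scores < mean - sd
    high = full_scores > mean + sd
    masks = {"low": low, "mid": ~(low | high), "high": high}
    return {name: float(ses[m].mean()) if m.any() else float("nan")
            for name, m in masks.items()}

# Illustrative values only
full = [35.0, 50.0, 55.0, 65.0]
cat = [37.0, 49.0, 56.0, 63.0]
by_range = mean_se_by_range(full, [0.8, 0.6, 0.6, 0.7])
```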

Results 

The correlations between the 4 sets of empirical estimates and their respective “best” estimates are plotted in figures 2 and 3. Both the U-CAT and M-CAT correlations of person scores with the full item set (by using their own respective models as a frame of reference) are high and positive for both self-care and mobility subdomains. Even for 5 items per domain, the correlations of both U-CAT and M-CAT are above .92, and with 10 items per domain, the correlations are all above .95. In addition, the U-CAT and the M-CAT estimates are always more accurate than their respective random item selection comparisons.

Fig 2. PEDI self-care domain. Correlations of self-care CATs (unidimensional, multidimensional) and random person scores as compared with unidimensional full item set (left axis) and multidimensional full item set (right axis) as a function of number of items administered.

Fig 3. PEDI mobility domain. Correlations of mobility CATs (unidimensional, multidimensional) and random person scores as compared with unidimensional full item set (left axis) and multidimensional full item set (right axis) as a function of number of items administered.

The person score SEs, as a function of the number of items per domain, are plotted in figures 4 and 5. At each item-stop rule, the SEs for the M-CAT are lower than those for the U-CAT. In addition, the U-CAT and the M-CAT SEs are always smaller than those of their respective random item selection comparisons. In table 1, we highlight that the SEs are always lower with the M-CAT than with the U-CAT across the entire range of self-care and mobility person scores.

Fig 4. PEDI self-care domain. Average self-care SEs for unidimensional and multidimensional CAT models and corresponding random items selection models as a function of items administered.

Fig 5. PEDI mobility domain. Average mobility SEs for unidimensional and multidimensional CAT models and corresponding random items selection models as a function of items administered.

Table 1. SEs for Person Scores at 3 Levels of Self-Care and Mobility Scales for U-CAT and M-CAT Programs

Self-Care Domain (Average SEs)
                                            Low Range (n=246)   Mid Range (n=802)   High Range (n=221)
CAT Stop Rules                              U-CAT    M-CAT      U-CAT    M-CAT      U-CAT    M-CAT
3 items per domain (U-CAT-3, M-CAT-6)       .83      .72        .83      .71        .83      .73
5 items per domain (U-CAT-5, M-CAT-10)      .74      .63        .73      .61        .74      .65
10 items per domain (U-CAT-10, M-CAT-20)    .71      .52        .61      .49        .62      .55
15 items per domain (U-CAT-15, M-CAT-30)    .54      .48        .53      .42        .56      .48
20 items per domain (U-CAT-20, M-CAT-40)    .51      .46        .48      .39        .52      .45

Mobility Domain (Average SEs)
                                            Low Range (n=251)   Mid Range (n=768)   High Range (n=240)
CAT Stop Rules                              U-CAT    M-CAT      U-CAT    M-CAT      U-CAT    M-CAT
3 items per domain (U-CAT-3, M-CAT-6)       .77      .72        .81      .72        .81      .73
5 items per domain (U-CAT-5, M-CAT-10)      .69      .63        .72      .62        .73      .65
10 items per domain (U-CAT-10, M-CAT-20)    .57      .52        .59      .50        .61      .55
15 items per domain (U-CAT-15, M-CAT-30)    .52      .47        .52      .44        .55      .51
20 items per domain (U-CAT-20, M-CAT-40)    .49      .44        .48      .41        .51      .49

The RMSE estimates of the congruence between the full item set “best” person scores and the U-CAT and M-CAT person scores are summarized in table 2. The number of items for the U-CAT is displayed as the number of items per domain, whereas the M-CAT items are listed as the total number of items across the mobility and self-care subdomains. For a corresponding total number of items, the RMSEs are lower for the M-CAT estimates than for the U-CAT estimates. For example, note the boldface rows (U-CAT with 15 items per domain, M-CAT with 20 total items). We achieved the same or better person score estimation with 20 total items of the M-CAT than with 15 items per subdomain (a total of 30 items) with the U-CAT. Overall, the M-CAT uses 25% to 40% fewer items than the U-CAT to achieve the same level of accuracy in person score estimates.

Table 2. RMSE Between the CAT Estimation and the Full Item Set Estimation

U-CAT                                                M-CAT
No. of Items for Each Domain  Self-Care  Mobility    No. of Total M-CAT Items  Self-Care  Mobility
3                             2.88       2.89        6                         2.47       2.3
5                             2.45       2.44        10                        1.92       1.77
10                            1.65       1.64        20                        1.06       0.93
15                            1.11       1.10        30                        0.64       0.55
20                            0.72       0.71        40                        0.45       0.40
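
As a worked check of the example in the text, using only the stop rules reported in table 2: the M-CAT with 20 total items (RMSE 1.06 and 0.93) matches or betters the U-CAT with 15 items per domain, or 30 total items (RMSE 1.11 and 1.10), so the item reduction for that comparison is

$$
\frac{30-20}{30}\approx 33\%,
$$

which falls within the reported 25% to 40% range.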

Discussion 

The results of these simulation analyses revealed that M-CAT models yield accurate and precise estimates of physical functioning skills in children. We are encouraged that in every comparison, the M-CAT, as predicted, shows more accurate and precise scoring estimates than its unidimensional comparison and by far exceeds the performance of random item selection. These results are consistent with simulation studies in other fields that have examined the potential accuracy and precision advantages of the M-CAT.10, 11, 16 These advantages appear to be optimal when the number of items is restricted by practical considerations and the correlation between subdomains is high enough to include both subdomains in the model.10, 15

As instruments based on IRT models become used more routinely in health care, a concern has been noted regarding the requirement to adhere to a unidimensional set of items.35 Meeting strict unidimensionality assumptions within complex health and functional constructs is difficult to achieve and may be seen to restrict the future use of IRT testing models. Yet, many important health and functional concepts may comprise different abilities or composites of abilities.13, 16 The multidimensional IRT models provide flexibility for allowing more than 1 subdimension to be tested at the same time. M-CAT models may provide an optimal approach to assessing content across related subdomains, providing both adequate content coverage and a spectrum of item difficulty levels that may not be possible with unidimensional IRT models.11 Clinicians may find that scores on multiple domains are preferable to an overall aggregate score across domains that loses its meaning and interpretability in patient diagnoses and care planning.14 An additional advantage of the M-CAT approach may be the variation in content exposure that is part of the item-selection process. Items that come from different aspects of a functional instrument may be more appealing to the user and provide sufficient variety in content to keep the assessment process interesting.16

The efficiency gains of the M-CAT were substantial (25%−40% fewer items than U-CAT). We are currently using the 20-item M-CAT stop rule for prospective work because we found that the 20-item M-CAT provides us with good estimation of individual scores and is just as accurate and precise as collecting 30 total items (15 per dimension).36 The favorable results of the M-CAT in this simulation study should be generalized to prospective studies with appropriate caution. The same data were used for (1) the factor analysis, (2) estimation of the IRT item parameters, (3) the score estimations, and (4) the real-data simulations with the CATs. Thus, an IRT model that fits the dataset may also make the CAT look more precise and accurate than anticipated. This likely enabled both the U-CAT and M-CAT to be more efficient than is to be expected in real clinical situations in which CATs are administered at the point of clinical contact. Prospective studies performed by using the CAT software are needed in the future to estimate scores from an independent sample. We have found, however, that simulation studies only marginally overestimate actual CAT performances9; thus, we expect to see the same type of favorable results with the M-CAT in future prospective use.

A number of limitations of the study should be mentioned. First, we used a sample in which all persons were administered the core item set of the PEDI (73 self-care and 59 original mobility items), but only a subgroup (expanded-age sample) was administered the new items. Ideally, all persons would complete all items; however, this was not feasible in conducting this retrospective study. IRT procedures have been shown to be robust in the face of missing data, and in the past we have been able to develop stable IRT models in the presence of significant missing data.8, 19 By including only the self-care and mobility domains of the PEDI in this multidimensional work, we do not preclude the development of a separate CAT program for the third domain of the PEDI, social function. The social function domain of the PEDI was not examined in this article, although our suspicion is that because social function does not have high correlations with the self-care and mobility domains of the PEDI, social function may not work as well under the multidimensional model as the self-care and mobility domains that are part of a broader physical function concept. The results of this study are limited to implications for estimating summary scores for children’s self-care and mobility function. Prospective work is planned to see if these CAT models are effective in examining sensitivity to change. In the future, it will be necessary to develop means of interpreting these scores for clinicians, either by using concepts of minimal detectable change or by using some form of item map analysis to identify the specific content meaning of summary scores for individual children.

Conclusions 

Multidimensional IRT models using computer adaptive testing appear to be an efficient and precise method to estimate person scores. These models appear to have most promise when subdomains are highly correlated, accurate and precise estimates of person scores are required, and respondent burden needs to be minimized.

Suppliers

Acknowledgment 

The contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health.

References 

  1. Ziviani J, Ottenbacher K, Shephard K, Foreman S, Astbury W, Ireland P. Concurrent validity of the Functional Independence Measure for children (WeeFIM) and the Pediatric Evaluation of Disabilities Inventory in children with developmental disabilities and acquired brain injuries. Phys Occup Ther Pediatr. 2001;21:91–101.
  2. Ottenbacher KJ, Msall ME, Lyon N, et al. Functional assessment and care of children with neurodevelopmental disabilities. Am J Phys Med Rehabil. 2000;79:114–123.
  3. Ostensjo S, Carlberg EB, Vollestad NK. Everyday functioning in young children with cerebral palsy (functional skills, caregiver assistance, and modifications of the environment). Dev Med Child Neurol. 2003;45:603–612.
  4. Msall ME. Tools for measuring daily activities in children (promoting independence and developing a language for child disability). Pediatrics. 2002;109:317–319.
  5. Lollar DJ, Simeonsson RJ, Nanda U. Measures of outcome for children and youth. Arch Phys Med Rehabil. 2000;81(12 Suppl 2):S46–S52.
  6. Revicki DA, Cella DF. Health status assessment for the twenty-first century (item response theory, item banking and computer adaptive testing). Qual Life Res. 1997;6:595–600.
  7. Wainer H. Computerized adaptive testing (a primer). Mahwah: Lawrence Erlbaum Associates; 2000.
  8. Haley SM, Ni PS, Fragala-Pinkham MA, Skrinar AM, Corzo D. A computer adaptive testing approach for assessing physical functioning in children and adolescents. Dev Med Child Neurol. 2005;47:113–120.
  9. Haley SM, Raczek AE, Coster WJ, Dumas HM, Fragala-Pinkham MA. Assessing mobility in children using a computer adaptive testing version of the Pediatric Evaluation of Disability Inventory (PEDI). Arch Phys Med Rehabil. 2005;86:932–939.
  10. DeMars C. Measuring higher education outcomes with a multidimensional Rasch model. J Appl Meas. 2004;5:350–361.
  11. Segall D. Multidimensional adaptive testing. Psychometrika. 1996;61:331–354.
  12. Luecht R. Multidimensional computerized adaptive testing in a certification or licensure context. Appl Psychol Meas. 1996;20:389–404.
  13. Ackerman TA, Gierl MJ, Walker CR. An NCME instructional module on using multidimensional item response theory to evaluate educational and psychological tests. Educ Meas Issues Pract. 2003;Fall:37–53.
  14. Gardner W, Kelleher KJ, Pajer KA. Multidimensional adaptive testing for mental health problems in primary care. Med Care. 2002;40:812–823.
  15. de la Torre J, Patz R. Making the most of what we have (a practical application of multidimensional item response theory in test scoring). J Educ Behav Stat. 2005;30:295–311.
  16. Wang WC, Chen PH. Implementation and measurement efficiency of multidimensional computerized adaptive testing. Appl Psychol Meas. 2004;28:295–316.
  17. Haley SM, Coster WJ, Ludlow LH, Haltiwanger JT, Andrellos PA. Pediatric Evaluation of Disability Inventory (development, standardization and administration manual). Boston: Trustees of Boston Univ; 1992.
  18. Haley SM, Fragala-Pinkham MA, Ni P, Skrinar AM, Kaye EM. Pediatric physical functioning reference curves. Pediatr Neurol. 2004;31:333–341.
  19. Haley S, Fragala MA, Aseltine R, Ni PS, Skrinar AM. Development of a disease-specific instrument for Pompe disease. Pediatr Rehabil. 2003;6:77–84.
  20. Muthen B, Muthen L. Mplus user’s guide. Los Angeles: Muthen & Muthen; 2001.
  21. Bentler P. Comparative fit indices in structural models. Psychol Bull. 1990;107:238–246.
  22. Tucker L, Lewis C. A reliability coefficient for maximum likelihood factor analysis. Psychometrika. 1973;38:1–10.
  23. Steiger JH, Lind J. Statistically-based tests for the number of common factors. Presented to: The Psychometric Society; May 30, 1980; Iowa City (IA).
  24. Marsh H, Balla J, Hau K. An evaluation of incremental fit indices (a clarification of mathematical and empirical properties). In: Marcoulides G, Schumacker RE, editors. Advanced structural equation modeling (issues and techniques). Mahwah: Lawrence Erlbaum; 1996. p. 315–353.
  25. Browne M, Cudeck R. Alternative ways of assessing model fit. In: Long K, editors. Testing structural equation modeling. Thousand Oaks: Sage; 1993.
  26. Yen WM. Scaling performance assessments (strategies for managing local item dependence). J Educ Meas. 1993;30:187–213.
  27. Adams RJ, Wilson MR, Wang WC. The multidimensional random coefficients multinomial logit. Appl Psychol Meas. 1997;21:46–75.
  28. Wu ML, Adams RJ. ConQuest (computer software and manual). Melbourne: Australian Council for Educational Research; 1998.
  29. Bond TG, Fox CM. The question of model fit. In: Bond T, Fox C, editors. Applying the Rasch model (fundamental measurement in the human sciences). Mahwah: Lawrence Erlbaum Associates; 2001. p. 173–187.
  30. Zumbo B. A handbook on the theory and methods of differential item functioning (DIF). Ottawa: Directorate of Human Resources Research and Evaluation; 1999.
  31. Luecht R. Computer-adaptive testing. In: Everett B, Howell D, editors. Encyclopedia of statistics in behavioral science. New York: Wiley; 2004.
  32. Hambleton R, Zaal J. Advances in educational and psychological testing (theory and applications). Boston: Kluwer Academic; 1991.
  33. Haley SM, Coster WJ, Andres PL, Kosinski M, Ni PS. Score comparability of short forms and computerized adaptive testing (simulation study with the activity measure for post-acute care). Arch Phys Med Rehabil. 2004;85:661–666.
  34. Wang S, Wang T. Precision of Warm’s weighted likelihood estimates for a polytomous model in computerized adaptive testing. Appl Psychol Meas. 2001;25:317–331.
  35. Cook KF, Monahan PO, McHorney CA. Delicate balance between theory and practice (health status assessment and item response theory). Med Care. 2003;41:571–574.
  36. Haley S, Ni P, Fragala-Pinkham M, Skrinar AM, Corzo D. A computer adaptive testing approach for assessing physical functioning in children and adolescents. Dev Med Child Neurol. 2005;47:113–120.
  • a. ConQuest; Australian Council for Educational Research (ACER) Ltd, 19 Prospect Hill Rd, Camberwell, Melbourne, Victoria, 3124, Australia.

Supported by the National Institute for Child Health and Development, National Institutes of Health (independent scientist award no. K02 HD45354-01) and Genzyme Corporation.

A commercial party having a direct financial interest in the results of the research supporting this article has conferred or will confer a financial benefit upon the author or 1 or more of the authors. Haley has a stock interest in CRE Care LLC, which distributes the Pediatric Evaluation of Disability Inventory products.

PII: S0003-9993(06)00470-9

doi:10.1016/j.apmr.2006.05.018
