Archives of Physical Medicine and Rehabilitation
Volume 87, Issue 8, Pages 1033-1042, August 2006

Computerized Adaptive Testing for Follow-Up After Discharge From Inpatient Rehabilitation: I. Activity Outcomes

  • Stephen M. Haley, PhD, PT
    Health and Disability Research Institute, Boston University, Boston, MA
    Corresponding author. Reprint requests to Stephen M. Haley, PhD, PT, Health and Disability Research Institute, Boston University, 53 Bay State Rd, Boston, MA 02215
  • Hilary Siebens, MD
    Department of Rehabilitation Medicine, University of Virginia, Charlottesville, VA
  • Wendy J. Coster, PhD, OTR
    Department of Occupational Therapy and Rehabilitation Counseling, Sargent College of Health and Rehabilitation Sciences, Boston University, Boston, MA
  • Wei Tao, BS
    Health and Disability Research Institute, Boston University, Boston, MA
  • Randie M. Black-Schaffer, MD, MA
    Spaulding Rehabilitation Hospital and the Department of Physical Medicine and Rehabilitation, Harvard Medical School, Boston, MA
  • Barbara Gandek, MS
    Health Assessment Lab, Waltham, MA
  • Samuel J. Sinclair, MEd
    Health Assessment Lab, Waltham, MA
  • Pengsheng Ni, MD, MPH
    Health and Disability Research Institute, Boston University, Boston, MA


Abstract 

Haley SM, Siebens H, Coster WJ, Tao W, Black-Schaffer RM, Gandek B, Sinclair SJ, Ni P. Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes.

Objective

To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home.

Design

Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit.

Setting

Follow-up visits conducted in patients’ home setting.

Participants

Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions.

Interventions

Not applicable.

Main Outcome Measures

Summary scores from the AM-PAC-CAT, covering 3 activity domains (movement and physical, personal care and instrumental, and applied cognition), were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66).

Results

AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77–.86) with scores from the AM-PAC-66. On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than for the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity for the CAT and fixed formats. Using patients' own report as an anchor-based measure of change, the CAT and fixed-length formats were comparable in responsiveness to patient-reported change over a 3-month interval.

Conclusions

Accurate estimates for functional activity group-level changes can be obtained from CAT administrations, with a considerable reduction in administration time.

Key Words: Outcome assessment (health care), Psychometrics, Rehabilitation

 

COMPUTERIZED ADAPTIVE TESTING (CAT) has been proposed as an alternative to fixed-format instruments that traditionally have been applied to monitoring functional progress in rehabilitation programs.1, 2, 3 In contrast to an assessment in which all items must be scored for every person, CAT4 selects only questions that are appropriate to a person’s functional level based on previous responses and skips items that are obviously too easy or too hard.5 In previous work, investigators have examined the potential of CAT for group-level adult rehabilitation assessments by conducting empirical simulations. These simulations use item responses from previously collected data to replicate a CAT session by using the most informative items (items that have good discrimination and represent a level of functioning that is near the person’s functional ability) for each individual. The simulations clearly indicate that CAT software has potential to minimize response burden and produce accurate group-level scores in samples from a general rehabilitation population,2, 6, 7 and in persons with stroke,8 medically complex conditions,9 lower-extremity functional deficits,10 and persons with chronic headaches.11 These simulation studies, however, may tend to overestimate CAT results in real patient care settings because the same item responses used to estimate the item parameters also are being used to estimate person scores.

As a next step, prospective studies of CAT that replicate prospective outcome assessment conditions in patient care environments are needed to further evaluate the accuracy, validity, and responsiveness of the activity measures generated using CAT software. Reports of prospective CAT applications have been much less common than simulation studies in the field of rehabilitation to date, and those prospective studies have had relatively small samples. For example, Ware et al2 conducted a cross-sectional prospective pilot study of 20 adult rehabilitation patients using a CAT with a selected set of physical functioning content, and reported a high level of agreement with an alternative short-form and improved discriminant validity over the comparison short-form. In a small sample of children with and without disabilities using a CAT program measuring physical functioning, the CAT program was found to approximate closely the discriminant validity and scoring estimates of the full instrument.12 In the only prospective longitudinal study of CAT conducted in a rehabilitation environment of which we are aware, Haley et al13 found that both the full-length functional mobility instrument and the CAT version were able to detect statistically significant functional changes during a 16-week fitness intervention for children with disabilities, with a large reduction in test burden as compared with the full-length instrument.

One of the largest assessment challenges in rehabilitation is conducting follow-up of patients once they have returned to the community. The predominant functional assessment system used in rehabilitation, the FIM instrument, may not be the most effective measure for long-term follow-up of patients who have been discharged from inpatient settings.14 Recent work by Coster et al15 using patient-reported outcomes suggests that a short-form version of the Activity Measure for Post-Acute Care (AM-PAC)16 may be more sensitive than the FIM in assessing functional gains and losses once a patient returns to the community. The AM-PAC17 is a patient-reported outcome system that includes 3 functional activity domains: movement & physical,6 personal care & instrumental,18 and applied cognitive.19 Our research group has built a CAT version of the AM-PAC and tested its psychometric properties against a fixed-format version in a prospective follow-up study of patients who were recently discharged from inpatient rehabilitation.

CAT applications require: (1) a large set of items (item banks) that are empirically calibrated with fixed item parameters (difficulty, discrimination) for each functional area of interest, (2) items that scale consistently along a dimension of low to high functional proficiency and that target the range of functional ability in the intended sample, and (3) rules that guide starting, stopping, and scoring procedures. Item response theory (IRT) methods are used to create hierarchically ordered item pools, and then software algorithms select items to match the person’s functional level. We have built and tested CAT software based on earlier versions of the AM-PAC item bank.2, 7 In the current study, we have revised some items from previous work and have collected new item calibration data for each of the 3 AM-PAC domains.

Our objective in the current project was to examine essential psychometric properties of CAT scores used to measure functional recovery of patients who have recently been discharged from inpatient rehabilitation. First, we examined the agreement between CAT-generated scores (AM-PAC-CAT) and those derived from a 66-item fixed-length form of the AM-PAC (AM-PAC-66), and we report the amount of time needed for each assessment format. Second, we examined the ability of CAT scores to discriminate among patient groups classified using a known severity index. Finally, we examined the sensitivity of the CAT to detect change and its responsiveness in relation to patients' own estimates of their functional change during the follow-up period. We expected that the advantage of a CAT-based assessment of activity outcomes for follow-up after inpatient discharge would be reduced respondent burden, with only marginal losses in score accuracy, discriminant validity, sensitivity, and responsiveness compared with the longer fixed-length form.


Methods 

Sample 

This study is a longitudinal, prospective 1-group cohort study of patients followed within approximately 2 weeks of hospital discharge and then 3 months later. The final sample comprised 94 patients (mean age ± standard deviation [SD], 61.7±17.0y; range, 20–90y) who had recently been discharged from the inpatient rehabilitation program at a major rehabilitation hospital (designated as a long-term acute care facility). The average inpatient rehabilitation hospital length of stay (LOS) for the study sample was 28.6±30.2 days. All 94 participants completed both the initial and follow-up assessments after returning home. To achieve the final study sample, we originally recruited 149 patients, of whom 111 completed the initial home visit approximately 2 to 6 weeks after hospital discharge and 94 completed the final follow-up visit approximately 3 months after the initial home visit. For the final sample, the average time between hospital discharge and the initial visit was 22.2±11.8 days (range, 5–42d), and the average time between visits was 87.1±27.4 days (range, 29–185d).

Eligibility criteria included: 18 years of age or older, receiving inpatient rehabilitation services at the time of recruitment, ability to speak English, and having a planned discharge back home. Participants also needed to pass a cognitive screen20 to ensure that each person could report reliably on his or her own functional status. In addition, patients were excluded if the facility recruiter judged that they were unable to give informed consent, based on information in the medical record and/or discussions with treating clinicians. Specifically, the presence of any of the following criteria indicated ineligibility: (1) any orientation deficit, (2) difficulty remembering the day's events, or (3) receptive or expressive communication deficits that precluded the patient from communicating responses reliably (verbally or nonverbally). The final study sample was stratified to include approximately equal numbers of subjects in 3 major patient groups: (1) 38.7% with neurologic disorders (eg, stroke, multiple sclerosis, Parkinson's disease, brain injury, spinal cord injury, neuropathy); (2) 33.3% with musculoskeletal disorders (eg, fractures, joint replacements, orthopedic surgery, joint or muscular pain); and (3) 28.0% with medically complex disorders (eg, debility resulting from illness, cardiopulmonary conditions, postsurgical recovery). To ensure good representation of levels of functional severity, recruitment was also stratified to yield 2 distinct severity levels: slight to mild (41.5%) and moderate to severe (58.5%), based on scores from an adapted Modified Rankin Scale (MRS),21 in which we converted the original 4 categories into 2. This was done so that we could make meaningful comparisons between severity groups with the available sample size. The sample was heterogeneous and reflects the racial and ethnic distribution of the recruitment site, although we made a strong effort to over-recruit minorities.
See table 1 for a full description of the demographic characteristics of the final study sample. Eighty-three percent of the patients were receiving some form of home or outpatient skilled rehabilitation services at the time of the initial visit, and 55% continued to receive services at the follow-up visit. The institutional review boards of Boston University and the recruitment facility approved the study and all persons signed informed consent forms prior to participation.

Table 1. Demographic Characteristics of Final Study Sample (N=94)
Characteristics                         Values
Mean age ± SD (y)                       61.7±17.0
Sex (%)
  Male                                  45.2
  Female                                54.8
Marital status (%)
  Married                               67.6
Race (%)
  White                                 82.6
  Black/African American                14.1
  Asian                                 3.3
Education (%)
  High school or less                   44.1
  Bachelor/certificate                  40.9
  Graduate degree                       15.1
Impairment group (%)
  Neurology                             38.7
  Orthopedics                           33.3
  Medically complex                     28.0
MRS (%)
  Severe/moderate                       58.5
  Mild                                  41.5
Living with (%)
  Alone                                 25.5
  Spouse/partner                        28.7
  Family                                42.6
  Nonfamily                             3.2
Living location (%)
  House                                 54.3
  Apartment/condominium                 38.3
  Senior housing                        4.3
  Assisted living                       3.2
Problems with (%)
  Eyesight                              22.3
  Hearing                               13.8
  Speech                                13.8
  Thinking/understanding/remembering    14.9
  Use of legs                           80.9
  Use of arms                           36.2
  Grasping and use of fingers           38.3
  Walking                               79.8
Mean FIM discharge ± SD
  Motor score                           77.50±9.15
  Cognitive score                       34.09±1.79
  Total score                           111.59±9.30

AM-PAC Item Banks 

We developed the item banks used to build the CAT in this study from a calibration study that included 535 patients from inpatient and transitional care rehabilitation units (48%), outpatient (26%), and home care settings (25%). To reduce response burden, and to avoid administering irrelevant items to patients in either a hospital or community setting, data on core items were obtained on all patients across settings, and data on additional items were collected depending on whether the patient was in an inpatient or community rehabilitation setting. These procedures have been outlined in previous item calibration studies on the AM-PAC.6 SF-8 Health Survey22 data indicated that the health of the item calibration sample based on the physical component summary (mean ± SD, 38.3±10) was below the U.S. population norms (mean, 50±10), although the mental component summary (mean, 49.1±10.8) was consistent with U.S. population norms (mean, 50±10).

The full AM-PAC item pool consisted of 233 activity items, of which 117 movement & physical, 52 personal care & instrumental, and 47 applied cognitive items were retained for the final analyses. The majority of items removed were items that involved use of wheelchairs due to the small number of persons using wheelchairs in the calibration sample; other items were deleted due to poor fit or redundancy of content. Item parameter estimations and model fit tests were conducted using Parscale software.23a A generalized partial credit model was used for all 3 domains, because the data did not support the requirement of common item slopes. However, to obtain convergence for 2 domains (movement & physical, applied cognitive), we used a variation of the generalized partial credit model, in which a 1-parameter model first was estimated on a core set of items completed by all patients (15 for movement & physical, 18 for applied cognition). We then estimated a second 1-parameter model (for each domain) for the remaining items, anchoring on the core items. We evaluated fit to the model based on the comparison of expected and observed values across the distribution of the latent variable, following a method described by Mislevy and Bock24 and adapted with slight modification for use in Parscale. Bonferroni-adjusted P values were used for significance testing. 
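To make the model choice concrete, the following minimal Python sketch computes generalized partial credit model (GPCM) category probabilities for a single item. The slope and step values are hypothetical, not AM-PAC or Parscale estimates, and the scaling constant D is assumed to be 1 (Parscale's own parameterization may differ). Giving every item the same slope yields the partial credit model, which is the common-slope constraint that the calibration data did not support.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = 0..m | theta) for one GPCM item.

    theta: latent activity level; a: item slope (discrimination);
    steps: step (threshold) parameters b_1..b_m. D = 1 is assumed.
    """
    # Cumulative logits: category k gets sum_{v<=k} a * (theta - b_v); category 0 gets 0.
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)  # subtract the maximum before exponentiating, for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical 4-category item (scored 0-3) with slope 1.4.
    for theta in (-2.0, 0.0, 2.0):
        probs = gpcm_probs(theta, 1.4, [-1.0, 0.2, 1.1])
        print(theta, [round(p, 3) for p in probs])
```

With a single step parameter the function reduces to the 2-parameter logistic curve, and a higher θ always shifts probability toward higher categories.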
Assumptions of unidimensionality and local independence were evaluated before the item banks were finalized, using factor analysis of categorical data,25, 26 because violations of these model assumptions can affect the estimation of item information and discrimination parameters.27 We estimated IRT-based scores for the item banks using weighted maximum likelihood estimation.28 Weighted maximum likelihood estimation is less biased than maximum likelihood estimation while having the same asymptotic variance and normal distribution, and it is more accurate than other procedures with a CAT fixed-item stop rule.29 The final IRT-based AM-PAC scores were standardized to a mean of 50 and an SD of 10 based on the current rehabilitation sample, so that new items added in the future could be integrated easily onto the same scale metric.
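The final linear rescaling can be sketched as below, assuming the rehabilitation sample's mean and SD of the latent trait estimates are computed once and then frozen (our reading of why later items can be placed on the same metric). The function name is hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, since the text does not say which SD was used.

```python
import statistics


def make_t_score_scaler(reference_thetas):
    """Return a function that maps theta estimates onto a mean-50, SD-10 metric.

    The reference mean and SD are computed once and then frozen, so estimates
    obtained later stay on the same metric.
    """
    ref_mean = statistics.mean(reference_thetas)
    ref_sd = statistics.stdev(reference_thetas)  # sample SD (n - 1); an assumption

    def to_t_score(theta):
        return 50.0 + 10.0 * (theta - ref_mean) / ref_sd

    return to_t_score


if __name__ == "__main__":
    # Hypothetical theta estimates from a reference sample.
    to_t = make_t_score_scaler([-1.2, -0.4, 0.0, 0.3, 1.3])
    print(round(to_t(0.8), 1))
```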

AM-PAC-CAT 

We based the AM-PAC algorithms on the DYNHA softwareb developed at QualityMetric Inc.11, 30 The AM-PAC-CAT was designed to be completed by patients and was administered from a stand-alone laptop computer running a Windows operating system. For each of the 3 activity domains, we selected an initial item with a high information function in the middle of the scoring range and with content that seemed appropriate for most respondents. We chose an item in the middle of the range because test information usually peaks in this range. The response to the first item is fed into the DYNHA engine, which calculates a provisional score and a person-specific measure of score precision. Rules for stopping each AM-PAC-CAT domain were based on score precision or a maximum number of items: if the score is not estimated with sufficient precision, additional items are selected and administered until the standard error of the score on the 50±10 metric falls to the set limit (±5 points, corresponding to a 95% confidence interval [CI] of roughly ±10 points) or the defined maximum number of items (10 per domain) has been administered.
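The paragraph above describes a start rule, an estimate-and-check cycle, and a stop rule. The Python sketch below strings these together for one domain. It is not the DYNHA engine, whose internals are not described here: the GPCM item model (D = 1), maximum-information selection of the next item, the grid-search weighted likelihood estimate (which, for the GPCM with fixed item parameters, maximizes log L(θ) + ½ log I(θ)), and the 13-item bank are all assumptions for illustration. It also assumes θ has SD 1 in the reference sample, so the ±5-point standard error limit on the 50±10 metric corresponds to 0.5 in θ units.

```python
import math


def gpcm_probs(theta, a, steps):
    """GPCM category probabilities P(X = 0..m | theta); D = 1 is assumed."""
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def item_information(theta, a, steps):
    """GPCM item information: a**2 times the conditional variance of the item score."""
    p = gpcm_probs(theta, a, steps)
    mean = sum(k * pk for k, pk in enumerate(p))
    return a * a * (sum(k * k * pk for k, pk in enumerate(p)) - mean ** 2)


def wle_theta(responses, bank, lo=-4.0, hi=4.0, step=0.01):
    """Grid-search weighted likelihood estimate of theta.

    For the GPCM with fixed item parameters, Warm's WLE maximizes
    log L(theta) + 0.5 * log I(theta); the grid is bounded to [lo, hi].
    """
    best_theta, best_value = lo, -math.inf
    for g in range(int(round((hi - lo) / step)) + 1):
        theta = lo + g * step
        log_lik = sum(math.log(gpcm_probs(theta, *bank[i])[x]) for i, x in responses.items())
        info = sum(item_information(theta, *bank[i]) for i in responses)
        value = log_lik + 0.5 * math.log(info)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta


def run_cat(bank, answer, se_limit_t=5.0, max_items=10):
    """Administer one domain; return (T-score, SE on the T metric, items used in order).

    answer(item_index) returns the respondent's category (0..m). The T-score SE
    is taken as 10 * SE(theta), assuming theta has SD 1 in the reference sample.
    """
    # Starting item: the most informative item in the middle of the range (theta = 0).
    item = max(range(len(bank)), key=lambda i: item_information(0.0, *bank[i]))
    responses = {}
    while True:
        responses[item] = answer(item)
        theta = wle_theta(responses, bank)
        se_t = 10.0 / math.sqrt(sum(item_information(theta, *bank[i]) for i in responses))
        unused = [i for i in range(len(bank)) if i not in responses]
        if se_t <= se_limit_t or len(responses) >= max_items or not unused:
            return 50.0 + 10.0 * theta, se_t, list(responses)
        # Next item: maximum information at the current estimate (an assumed rule).
        item = max(unused, key=lambda i: item_information(theta, *bank[i]))


# Hypothetical 13-item bank: slope 1.5, three steps centred at -3.0, -2.5, ..., 3.0.
BANK = [(1.5, [c - 1.0, c, c + 1.0]) for c in (-3.0 + 0.5 * j for j in range(13))]

if __name__ == "__main__":
    score, se, used = run_cat(BANK, answer=lambda i: 2)  # always answers category 2
    print(f"T = {score:.1f}, SE = {se:.2f}, items = {used}")
```

Because each new item depends on the responses so far, two respondents can receive different item sets of different lengths; the 10-item cap bounds the worst case.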

AM-PAC Short-Forms 

We selected 66 items (AM-PAC-66) from the AM-PAC item banks for inclusion on the AM-PAC short forms to maximize content coverage and the information value of items across the range of content within each activity domain. We felt that these 66 items represented all aspects of content and the full range of functional ability expected in the sample. Approximately 20 items per activity domain were selected to approximate the IRT latent trait scores estimated from the full set of items in each activity bank, and items at or near the extreme score ranges were included to minimize ceiling and floor effects. Twenty-six items were selected for the movement & physical domain (22% of the item bank), 20 for the personal care & instrumental domain (38% of the item bank), and 20 for the applied cognitive domain (43% of the item bank).

Testing Procedures 

We collected AM-PAC-66 and AM-PAC-CAT data from patient interviews at 2 time points: approximately 2 weeks after discharge from the inpatient rehabilitation hospital and 3 months after the initial home visit. An on-site recruiter at the inpatient hospital explained the study to potential participants, answered any questions, and obtained signed consent forms prior to hospital discharge. The data collector abstracted information from the medical record including basic demographic, medical and diagnostic information, and all FIM inpatient rehabilitation admission and discharge data, as well as selected information from the Uniform Data System for Medical Rehabilitation31 data fields. All data were entered into files on laptop computers without personal identifiers.

Patient interviews were conducted by trained interviewers at the subjects' current living location. Research staff contacted each subject 1 to 2 weeks before each scheduled interview to set up a convenient time. A 6-week window from the due date was applied to each interview; subjects not interviewed within 6 weeks of hospital discharge were dropped from the study. The administration sequence of the AM-PAC-66 and AM-PAC-CAT was alternated systematically to avoid an order effect. Each enrolled person was consecutively assigned 1 of 2 test order patterns: CAT first and short form second at the initial visit, then short form first and CAT second at the follow-up visit; or the reverse pattern, starting with the short form at the initial visit. Because of dropouts, the pattern with the CAT first at the initial visit occurred 52.1% of the time and the reverse order 47.9% of the time.

We collected the AM-PAC-66 and all other data except the CAT by interview. During CAT administration, participants viewed the computer screen along with the data collectors. If the person was computer literate and chose to interact with the computer directly, the respondent entered item responses with the mouse or touch pad. Most often, the data collector used the mouse to record the patient's responses.

At the follow-up visit, and after both CAT and AM-PAC-66 administrations, we asked each person to rate his/her functional status (worse, about the same, better) in each activity domain, since the start of the study, using a standard Likert scale.32, 33, 34 For each domain, the patient first rated him-/herself as worse, about the same, or better compared with 3 months earlier, and then scored the amount of change using a 15-point scale ranging from −7 (a very great deal worse) through 0 (no change) to 7 (a very great deal better). Participants completed all 3 global rating questions at the end of the interview. Each interview lasted about 45 minutes to an hour. We collected the actual time (to the closest minute) required for administration of the AM-PAC-66; the AM-PAC-CAT had an internal clock to track the amount of time and the number of items needed to meet preset levels of precision. Additional data were collected at each visit if time permitted, including participation outcomes using short forms and CAT, the results of which are reported in a companion article.

Analyses 

Intraclass correlation coefficients, model 3,1 (ICC3,1),35 between the CAT scores at the initial and follow-up visits and the best-estimate IRT-based latent trait scores from the AM-PAC-66 were calculated to assess how accurately CAT scores reproduced the item bank score as estimated from the 66-item short form. ICCs were calculated as the ratio of the variance of scores between subjects to the total variance of scores (between and within subjects). For group estimates, reliability is considered high if the ICC is greater than .80, substantial if it is between .61 and .80, moderate if it is between .41 and .60, and poor to fair if it is .40 or less.35 The ability of the AM-PAC-CAT, compared with the AM-PAC-66, to discriminate between groups of patients on the basis of severity of disability (adapted MRS score36) was evaluated with a series of independent-samples t tests. Because the initial and follow-up data showed similar patterns, we report only the follow-up data for the discriminant validity comparisons, to decrease the number of tests and limit dependencies.
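As a minimal sketch, ICC(3,1) for a subjects × formats table (here the columns would be each patient's CAT and AM-PAC-66 scores) can be computed from a 2-way analysis of variance as (BMS − EMS)/(BMS + (k − 1)EMS), where BMS is the between-subjects mean square, EMS the residual mean square, and k the number of formats. This is the standard consistency, single-measure form; the function name is hypothetical and the code is not taken from the study's software.

```python
def icc_3_1(ratings):
    """ICC(3,1): two-way mixed-effects, consistency, single-measure intraclass correlation.

    ratings: one row per subject, one column per measurement (e.g., CAT and fixed form).
    ICC(3,1) = (BMS - EMS) / (BMS + (k - 1) * EMS).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between formats
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    bms = ss_rows / (n - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems)


if __name__ == "__main__":
    # Hypothetical (CAT, fixed-form) scores for five patients on the 50±10 metric.
    pairs = [[48.2, 50.1], [55.0, 53.4], [61.3, 63.0], [42.7, 45.5], [52.9, 51.8]]
    print(round(icc_3_1(pairs), 3))
```

Because the format (column) variance is removed before the ratio is formed, a constant offset between the CAT and the fixed form does not lower ICC(3,1).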

We used a series of paired t tests to examine differences in sensitivity between the AM-PAC-66 and CAT versions. We defined sensitivity as the amount of positive change detected by an instrument. Because no single index of sensitivity is used consistently, we calculated 2 sensitivity indices. The simple effect size (ES) for correlated samples is the average change between the initial and follow-up measurements divided by the SD of the initial measurement.37 The standardized response mean (SRM) is the ratio of the mean change to the SD of the change scores.38 Within the same sample, the SRM is identical to the Cohen statistic (d) for correlated samples, which is calculated by dividing the paired t statistic by the square root of the sample size.39 The ES and SRM often yield the same ranking of measures, but their absolute values can differ. The magnitude of the ES depends on the variability of scores at the initial measurement. When the correlation between the initial and follow-up scores is large, the SRM is considerably larger than the ES.40 To compare the ES and SRM of the AM-PAC-66 and AM-PAC-CAT within each activity domain, we generated 95% CIs from 5000 bootstrap random samples drawn with replacement. To examine the efficiency and difference in response burden of the CAT, we used 1-sample t tests to compare the number of items required by the AM-PAC-CAT with the fixed number in the AM-PAC-66, and paired t tests to compare the administration time of the AM-PAC-CAT (internal computer clock) and the AM-PAC-66 (timed by test administrators).
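The sensitivity indices and the bootstrap step can be sketched as follows. The ES and SRM formulas follow the text; the paired resampling (drawing patient indices once and applying them to both formats), the percentile method for the 95% CIs, and the overlap check (mirroring the note under table 5) are our assumptions about details the text does not specify, and all function names are hypothetical.

```python
import math
import random


def _mean(xs):
    return math.fsum(xs) / len(xs)


def _sd(xs):
    """Sample standard deviation (n - 1 denominator)."""
    m = _mean(xs)
    return math.sqrt(math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def effect_size(initial, follow_up):
    """Simple ES: mean change divided by the SD of the initial scores."""
    return _mean([f - i for i, f in zip(initial, follow_up)]) / _sd(initial)


def srm(initial, follow_up):
    """Standardized response mean: mean change divided by the SD of the change scores."""
    change = [f - i for i, f in zip(initial, follow_up)]
    return _mean(change) / _sd(change)


def paired_t(initial, follow_up):
    """Paired t statistic; dividing it by sqrt(n) gives the SRM."""
    change = [f - i for i, f in zip(initial, follow_up)]
    return _mean(change) / (_sd(change) / math.sqrt(len(change)))


def bootstrap_cis(stat, fixed, cat, n_boot=5000, seed=2006):
    """Percentile 95% CIs of `stat` for two formats given to the same patients.

    fixed, cat: (initial, follow_up) score lists. Each resample draws patient
    indices with replacement once and applies them to both formats.
    Returns (ci_fixed, ci_cat, overlap).
    """
    rng = random.Random(seed)
    n = len(fixed[0])
    draws_fixed, draws_cat = [], []
    while len(draws_fixed) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            d_fixed = stat([fixed[0][k] for k in idx], [fixed[1][k] for k in idx])
            d_cat = stat([cat[0][k] for k in idx], [cat[1][k] for k in idx])
        except ZeroDivisionError:  # degenerate resample with zero SD; draw again
            continue
        draws_fixed.append(d_fixed)
        draws_cat.append(d_cat)

    def percentile_ci(draws):
        draws = sorted(draws)
        return draws[int(0.025 * n_boot)], draws[int(0.975 * n_boot) - 1]

    ci_fixed, ci_cat = percentile_ci(draws_fixed), percentile_ci(draws_cat)
    overlap = not (ci_fixed[1] < ci_cat[0] or ci_cat[1] < ci_fixed[0])
    return ci_fixed, ci_cat, overlap
```

Passing `effect_size` instead of `srm` as `stat` gives the corresponding ES intervals from the same resampling scheme.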

Finally, we examined responsiveness using 2 methods. We reserved the term "responsiveness" for change judged against an external anchor, in this case the patients' own global ratings of change between the initial and follow-up visits. We collapsed data from the original 15-point Likert scale of change that has been used in previous responsiveness studies34 (appendix 1), grouping the absolute values of the ratings (either worse or better) as follows: 0 and ±1 as no change, ±2 to ±5 as small to medium change (some), and ±6 or ±7 as large change. We limited our analyses to 3 rather than the customary 4 categories32, 34 because of negligible numbers in the smallest change category. We compared the mean AM-PAC-CAT and AM-PAC-66 scores for each change category. We then contrasted the responsiveness of the 2 test formats by producing receiver operating characteristic (ROC) curves41, 42 for detecting at least a small to medium change (either worse or better) based on patient-centered ratings. Paired ROC curves for the AM-PAC-CAT and AM-PAC-66 were constructed by plotting sensitivity against (1 − specificity) across multiple cutpoints of the absolute change in patient scores on each AM-PAC-CAT or AM-PAC-66 domain. A series of absolute change cutpoints (≈89 per domain) was used to develop the ROC curves. The true positive rate (sensitivity) is the proportion of patients who reported at least a small to medium change on their own global rating whose absolute change exceeded each cutpoint. The false positive rate (ie, 1 − specificity) is the proportion of patients who reported no change on their own global rating whose absolute change nonetheless exceeded each cutpoint.
Chi-square tests were conducted to examine whether the areas under the ROC curves differed from those expected by chance alone (a curve that follows the diagonal) and whether the areas under the paired AM-PAC-CAT and AM-PAC-66 ROC curves differed statistically. When 2 ROC curves are constructed on the same individuals, statistical comparisons between the curves should take into account the correlated nature of the data.43
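The anchor collapsing and the ROC construction described above can be sketched as below. Each distinct observed absolute change is used as a cutpoint, and a patient is counted as changed by the test when the absolute change is at least the cutpoint; both choices are assumptions, as is the trapezoidal area under the curve. The chi-square comparison of correlated areas cited in the text is not reproduced here, and the function names are hypothetical.

```python
def change_category(rating):
    """Collapse a -7..+7 global rating of change into the 3 categories used here."""
    magnitude = abs(rating)
    if magnitude <= 1:
        return "none"
    if magnitude <= 5:
        return "some"
    return "large"


def roc_points(abs_change, changed):
    """(1 - specificity, sensitivity) pairs, one per observed cutpoint.

    abs_change: absolute change scores from one test format.
    changed: True if the patient rated at least a small to medium change.
    The test 'detects' change when abs_change >= cutpoint (an assumption).
    """
    n_pos = sum(changed)
    n_neg = len(changed) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(abs_change), reverse=True):
        tp = sum(1 for x, c in zip(abs_change, changed) if c and x >= cut)
        fp = sum(1 for x, c in zip(abs_change, changed) if not c and x >= cut)
        points.append((fp / n_neg, tp / n_pos))
    return points  # the lowest cutpoint includes everyone, so the curve ends at (1, 1)


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical patients: own global rating (-7..+7) and CAT absolute change score.
    ratings = [0, 3, -1, 6, -4, 1, 2, 0]
    cat_change = [1.2, 6.5, 2.0, 11.3, 4.1, 3.0, 2.2, 0.4]
    changed = [change_category(r) != "none" for r in ratings]
    print(round(auc(roc_points(cat_change, changed)), 3))
```

Running the same `roc_points` call on the AM-PAC-66 change scores for the same patients gives the paired curve.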


Results 

Sample Follow-Up 

There were no significant differences in average age, proportions of sex or race, or average inpatient LOS between the study group (n=94) and persons who enrolled but dropped out of the study prior to the first visit (n=38). Persons who dropped out had a higher average discharge FIM total (mean difference, 4.1; t=2.67, P=.008) and discharge FIM motor scores (mean difference, 4.3; t=2.93, P=.004) than the study group. A greater percentage of persons who dropped out had less severe levels of physical disability as measured by an adapted severity classification of the MRS21 (χ² test=10.95, P<.004). There were no differences in demographic or initial activity scores from the CAT or the fixed-length form between those subjects who completed both visits and those who were lost to follow-up (n=17). Reasons for loss to follow-up included: death (n=3), unable to contact or no longer living at home (n=6), refused (n=6), and missing data (n=2).

Score Comparability and Validity 

Intraclass correlations between score estimates of all of the CATs and the AM-PAC-66 indicate a high to substantial degree of correspondence (ICC range, .77–.86) for group-level data. For the entire sample, the average ICC across all 3 domains and test occasions was .82 (table 2). We saw no substantial differences in score agreement between the initial (mean ICC, .810) and follow-up (mean ICC, .826) test occasions. With extreme outliers removed (difference >10 points, or 1 SD, between the AM-PAC-66 and the AM-PAC-CAT; n=5 to 7 per domain for the movement & physical and applied cognition domains; n=16 for the personal care & instrumental domain), more than 75% of all individual scores from the CAT and fixed-format instruments were within a margin of ±5 points (0.5 SD).
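The outlier screen and the ±5-point agreement figure amount to a small calculation. In this sketch, a gap of exactly 10 points is kept and a gap of exactly 5 points counts as agreement; these boundary choices, like the function name, are assumptions.

```python
def agreement_summary(cat_scores, fixed_scores, outlier_gap=10.0, margin=5.0):
    """Return (number of outlier pairs, share of remaining pairs within `margin`).

    Pairs whose CAT and fixed-form scores differ by more than `outlier_gap`
    points (1 SD on the 50±10 metric) are set aside before the share is computed.
    """
    gaps = [abs(c - f) for c, f in zip(cat_scores, fixed_scores)]
    kept = [g for g in gaps if g <= outlier_gap]
    within = sum(1 for g in kept if g <= margin)
    return len(gaps) - len(kept), within / len(kept)


if __name__ == "__main__":
    # Hypothetical CAT and fixed-form scores for five patients.
    print(agreement_summary([50, 55, 62, 40, 71], [52, 49, 60, 52, 70]))
```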

Table 2. Score Agreement Between the AM-PAC-CAT and the AM-PAC-66
Activity Domain               Interview   ICC3,1 (N=94)   ICC3,1 (average within domain)   ICC3,1 (average across domains)
Movement & physical           Initial     .829            .850
                              Follow-up   .864
Personal care & instrumental  Initial     .794            .780                             .820
                              Follow-up   .771
Applied cognition             Initial     .807            .830
                              Follow-up   .844

Fifty-five (58.5%) patients were coded as having moderate to severe disability on the initial visit, and 42 (44.7%) patients were coded as moderate to severe on the follow-up visit; almost 60% of the sample did not change disability status between visits. The AM-PAC-66 and the AM-PAC-CAT were successful in discriminating between known severity groups in each of the 3 activity domains; that is, both were able to detect statistically significant differences in scores between the 2 severity groups. As expected, because the MRS emphasizes primarily physical functioning, the movement & physical and personal care & instrumental domains in general (both CAT and fixed-form) were more discriminating between physical disability severity groups than was applied cognitive (table 3).

Table 3. CAT Versus Fixed-Form Test Format Discrimination by Severity of Physical Disability (Follow-Up Data)
Activity Domain               Version      Mean Severe/Moderate ± SD (n=42)   Mean Mild/Slight ± SD (n=52)   Difference   t        df   P
Movement & physical           AM-PAC-66    50.28±7.26                         60.68±9.01                     10.399       −6.056   92   <.001
                              AM-PAC-CAT   51.03±6.16                         60.08±10.69                    9.046        −5.136   84   <.001
Personal care & instrumental  AM-PAC-66    50.62±8.46                         56.14±7.95                     5.522        −3.253   92   .002
                              AM-PAC-CAT   50.54±10.80                        57.39±13.28                    6.841        −2.695   92   .008
Applied cognition             AM-PAC-66    49.86±8.88                         55.09±6.13                     5.226        −3.241   70   .002
                              AM-PAC-CAT   48.25±7.91                         52.38±6.51                     4.125        −2.774   92   .007
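The t values in table 3 can be recomputed from the group summaries. The df column is 92 for most rows but 84 and 70 for two, which is consistent with a Welch (unequal-variance) correction in those rows; that is our inference, not something the text states. Small discrepancies reflect rounding of the reported means and SDs, and the function name is hypothetical.

```python
import math


def t_from_summary(m1, s1, n1, m2, s2, n2, welch=False):
    """Independent-samples t statistic and df from group means, SDs, and sizes."""
    if welch:
        v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
        se = math.sqrt(v1 + v2)
        # Welch-Satterthwaite approximate degrees of freedom.
        df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    else:
        pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    return (m1 - m2) / se, df


# Severe/moderate (n=42) vs mild/slight (n=52), movement & physical rows of table 3.
print(t_from_summary(50.28, 7.26, 42, 60.68, 9.01, 52))               # pooled, df = 92
print(t_from_summary(51.03, 6.16, 42, 60.08, 10.69, 52, welch=True))  # Welch, df ≈ 84
```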

CAT Item Selection and Time Burden 

The average number of items ± SD required for the AM-PAC-CAT (averaged across the initial and follow-up visits) was 6.0±1.3 for movement & physical, 7.4±1.8 for personal care & instrumental, and 8.2±2.9 for applied cognition domains. The minimum number of items per domain was 2 and the maximum (established by the item-stop rule) was 10. The average number of items per person for the 3 domains was 21.6±5.4. For the movement & physical domain, 50 of the 118 items were selected 1 or more times across both initial and follow-up visits, 22 of the 52 items were selected from the personal care & instrumental domain, and 14 of 47 items were selected from the applied cognition domain. Within each of the activity domains, the 10 most frequently selected items accounted for 62% of the item administrations for movement & physical, 81% for personal care & instrumental, and 97% for applied cognition.

Table 4 summarizes the relative burden for the 94 respondents with complete initial and follow-up visits. Overall, the AM-PAC-CAT yielded large decreases in respondent burden compared with the AM-PAC-66, requiring 33% of the number of items and 43% of the administration time of the full-length survey. The differences in the number of items and the amount of time required to complete the 2 formats were significant, favoring the more efficient AM-PAC-CAT.

Table 4. Respondent Burden of the AM-PAC Versions in a Longitudinal Sample (N=94)
Version | Measure | Mean ± SD | Range | CAT as % of Fixed
AM-PAC-66 | Time (min) | 13.09±5.64 | 4–47 |
AM-PAC-CAT | Time (min)* | 5.65±2.38 | 1.5–17.9 | 43.2
AM-PAC-CAT | No. of items† | 21.6±5.41 | 9–30 | 32.7

* Paired t test (t=16.59, P<.001).

† One-sample t test (t=111.09, P<.001).
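The last column of table 4 is the ratio of mean CAT burden to mean fixed-form burden. As a quick arithmetic check using the tabled means (the 66-item count follows from the AM-PAC-66 name), not the authors' code:

```python
def cat_as_percent_of_fixed(cat_mean, fixed_mean):
    """'CAT as % of Fixed' column: 100 x (CAT mean / fixed-form mean)."""
    return 100.0 * cat_mean / fixed_mean

time_pct = cat_as_percent_of_fixed(5.65, 13.09)  # minutes per administration
items_pct = cat_as_percent_of_fixed(21.6, 66)    # the AM-PAC-66 always administers 66 items
print(f"time: {time_pct:.1f}%, items: {items_pct:.1f}%")  # time: 43.2%, items: 32.7%
```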

Sensitivity 

Using bootstrap methods to generate 95% CIs for each ES and SRM, we found only 1 statistically significant paired difference: the movement & physical AM-PAC-66 SRM was significantly higher than the corresponding AM-PAC-CAT SRM (table 5). None of the ES comparisons was statistically significant. The personal care & instrumental domain yielded similar ES and SRM values for the 2 testing formats. Neither the CAT nor the fixed-length format of the applied cognition domain detected significant change between the initial and follow-up visits, likely owing largely to a high ceiling effect.

Table 5. Sensitivity of the AM-PAC-CAT Versus the AM-PAC-66 Over 3-Month Interval
Activity Domain | Test Format | Initial Visit, Mean ± SD | Follow-Up, Mean ± SD | Change, Mean ± SD | t (df=93) | P | ES | SRM
Movement & physical | AM-PAC-66 | 51.47±8.74 | 56.03±9.74 | 4.56±6.26 | 7.067 | <.001 | .522 | .728*
Movement & physical | AM-PAC-CAT | 52.84±9.07 | 56.04±9.99 | 3.20±7.26 | 4.272 | <.001 | .353 | .441*
Personal care & instrumental | AM-PAC-66 | 50.82±8.75 | 53.68±8.59 | 2.86±6.26 | 4.426 | <.001 | .327 | .457
Personal care & instrumental | AM-PAC-CAT | 50.84±10.71 | 54.33±12.64 | 3.48±8.58 | 3.938 | <.001 | .325 | .406
Applied cognition | AM-PAC-66 | 52.24±7.88 | 52.76±7.89 | 0.52±5.51 | 0.915 | .363 | .066 | .094
Applied cognition | AM-PAC-CAT | 49.31±7.11 | 50.53±7.42 | 1.23±6.09 | 1.952 | .054 | .173 | .202

* The 95% CIs of the AM-PAC-66 SRM and the AM-PAC-CAT SRM do not overlap, indicating a significant difference; all other comparisons of ES and SRM are nonsignificant.
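The ES and SRM values in table 5 follow from the tabled summaries as ES = mean change / SD at the initial visit and SRM = mean change / SD of change (movement & physical, AM-PAC-66: 4.56/8.74 ≈ .522 and 4.56/6.26 ≈ .728). For paired data the t value is SRM × √n (.728 × √94 ≈ 7.06). The sketch below computes both indices from paired scores and a percentile bootstrap CI for the SRM difference between formats, resampling patients with replacement. The resampling scheme, the 2000 replicates, and the use of a CI on the paired difference (rather than comparing 2 separate CIs, as in the table footnote) are assumptions for illustration; the simulated scores are hypothetical.

```python
import math
import random

def _mean(xs):
    return sum(xs) / len(xs)

def _sd(xs):
    """Sample standard deviation (n - 1 denominator)."""
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def es_srm(initial, follow_up):
    """Effect size (mean change / SD at initial visit) and SRM (mean change / SD of change)."""
    change = [f - i for i, f in zip(initial, follow_up)]
    return _mean(change) / _sd(initial), _mean(change) / _sd(change)

def bootstrap_srm_difference(fixed_init, fixed_fu, cat_init, cat_fu, reps=2000, seed=1):
    """Percentile 95% CI for SRM(fixed) - SRM(CAT), resampling the same patients for both formats."""
    rng = random.Random(seed)
    n = len(fixed_init)
    diffs = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        _, srm_fixed = es_srm([fixed_init[i] for i in idx], [fixed_fu[i] for i in idx])
        _, srm_cat = es_srm([cat_init[i] for i in idx], [cat_fu[i] for i in idx])
        diffs.append(srm_fixed - srm_cat)
    diffs.sort()
    return diffs[int(0.025 * reps)], diffs[int(0.975 * reps) - 1]

# Hypothetical simulated scores for 94 patients; the CAT score is the fixed score plus extra noise.
rng = random.Random(0)
fixed_init = [rng.gauss(51, 9) for _ in range(94)]
fixed_fu = [x + rng.gauss(4.5, 6) for x in fixed_init]
cat_init = [x + rng.gauss(0, 4) for x in fixed_init]
cat_fu = [x + rng.gauss(0, 4) for x in fixed_fu]
print(es_srm(fixed_init, fixed_fu), es_srm(cat_init, cat_fu))
ci = bootstrap_srm_difference(fixed_init, fixed_fu, cat_init, cat_fu)
print(ci)
```

Because both formats are resampled with the same patient indices, the CI reflects the within-patient correlation between formats.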

Responsiveness 

Figure 1 depicts a series of bar charts showing the amount of absolute change on the AM-PAC-66 and AM-PAC-CAT that corresponds to the “no change,” “some change” (small to medium), and “large change” categories, as defined by patients’ global ratings of change within each functional domain. We defined “some change” as a minimal level of change that appeared to have meaning to patients. We used absolute values in these calculations because a number of persons reported worse functional activity status at the follow-up visit than at the initial home visit after hospital discharge. In general, the AM-PAC-CAT and AM-PAC-66 performed similarly in detecting absolute change across the 3 anchor-based change categories: no statistically significant differences were found between them in any of the paired comparisons within each change category. We also separately examined change across the 3 categories for only the persons who changed in a positive direction (better function at follow-up than at the initial visit) and again found no statistically significant differences between AM-PAC-CAT and AM-PAC-66 scores.
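The anchor-based analysis groups patients by their global rating of change and then compares mean absolute change on each format within each group. The cutoffs that map the −7 to +7 rating onto the 3 categories are not given in this section; the sketch below assumes Jaeschke-style cutoffs (|rating| ≤1 no change, 2 to 5 some change, 6 to 7 large change), and the data are hypothetical.

```python
def anchor_category(rating):
    """Map a global rating of change (-7..+7) to a category.
    ASSUMED cutoffs, not confirmed by the source: |r| <= 1 no change, 2-5 some change, 6-7 large."""
    r = abs(rating)
    if r <= 1:
        return "no change"
    if r <= 5:
        return "some change"
    return "large change"

def mean_abs_change_by_category(ratings, cat_change, fixed_change):
    """Return {category: (mean |CAT change|, mean |fixed-form change|, n)}."""
    groups = {}
    for rating, c, f in zip(ratings, cat_change, fixed_change):
        groups.setdefault(anchor_category(rating), []).append((abs(c), abs(f)))
    return {k: (sum(c for c, _ in v) / len(v), sum(f for _, f in v) / len(v), len(v))
            for k, v in groups.items()}

# Hypothetical ratings and change scores (follow-up minus initial) for 6 patients.
ratings = [0, 1, -3, 4, 6, -7]
cat_change = [0.5, -1.0, -4.0, 6.0, 9.0, -12.0]
fixed_change = [1.0, 0.0, -5.0, 5.0, 10.0, -11.0]
print(mean_abs_change_by_category(ratings, cat_change, fixed_change))
```

Taking absolute values before averaging keeps a decliner (rating −3, change −4) from cancelling an improver in the same category.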

Among the 38 individuals who reported “some change” in movement & physical, the mean ± SD change values for the AM-PAC-CAT (6.1±5.0) and the AM-PAC-66 (6.3±4.5) were nearly identical. Among the 40 persons who reported “some change” in the personal care & instrumental domain, change on the AM-PAC-CAT (5.2±5.5) and the AM-PAC-66 (5.4±4.3) was likewise comparable. Similarly, although the magnitude of change was smaller, the 20 individuals who reported “some change” in the applied cognition domain showed comparable change on the AM-PAC-CAT (4.3±4.4) and the AM-PAC-66 (3.9±4.8).

Using the “some change” category as the cutpoint for paired-ROC curves to detect minimal levels of patient-reported change, we found that the AM-PAC-CAT and the AM-PAC-66 performed equally well in 2 of the 3 activity domains. For the movement & physical domain, the ROC curves of both the AM-PAC-66 (area under the curve [AUC], .667±.125; P=.017) and the AM-PAC-CAT (AUC, .692±.116; P=.006) differed significantly from chance, and the paired-ROC curves did not differ significantly from each other (AUC difference, .024±.149; χ²(1)=.103, P=.748). Similarly, for the personal care & instrumental domain, the ROC curves of both the AM-PAC-66 (AUC, .680±.153; P=.025) and the AM-PAC-CAT (AUC, .676±.137; P=.028) differed significantly from chance, with no significant difference between the paired-ROC curves (AUC difference, .004±.145; χ²(1)=.003, P=.959). Because relatively little change was detected in the applied cognition domain, neither the fixed-form nor the CAT ROC curve differed significantly from chance.
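The paired-ROC comparison cites DeLong et al,43 whose nonparametric test yields a χ² statistic with 1 df. A minimal pure-Python sketch of that test, assuming patients are labeled 1 if they reported at least “some change” and 0 otherwise, and that each format's absolute change score serves as the classifier. The 8 patients in the usage example are made up.

```python
import math

def _psi(pos_score, neg_score):
    """Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 for a tie, else 0."""
    if pos_score > neg_score:
        return 1.0
    return 0.5 if pos_score == neg_score else 0.0

def _placements(labels, scores):
    """AUC plus DeLong structural components for positive (v10) and negative (v01) cases."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    return sum(v10) / len(v10), v10, v01

def _cov(a, b):
    """Sample covariance (n - 1 denominator)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def auc(labels, scores):
    return _placements(labels, scores)[0]

def delong_paired(labels, scores_a, scores_b):
    """Compare 2 correlated AUCs on the same cases; returns (auc_a, auc_b, chi-square 1 df, P)."""
    auc_a, v10a, v01a = _placements(labels, scores_a)
    auc_b, v10b, v01b = _placements(labels, scores_b)
    m, n = len(v10a), len(v01a)
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero; test is undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, z * z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical: 1 = reported at least "some change"; scores = absolute change on each format.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
cat = [8, 5, 3, 6, 2, 4, 1, 3]
fixed = [7, 6, 2, 5, 3, 1, 2, 4]
print(delong_paired(labels, cat, fixed))  # AUCs 0.90625 and 0.84375; chi-square about 0.25
```

The covariance terms are what make this a paired test: both AUCs are computed on the same patients, so their errors are correlated.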


Discussion 

Previous simulation studies using real patient data have suggested that CAT programs can substantially reduce response burden with relatively small compromises in accuracy or sensitivity to change. In this study, we examined this question prospectively in a cohort of patients who were recently discharged from inpatient rehabilitation and followed over a 3-month interval during which, in keeping with standard rehabilitation practice, most received additional physical, occupational, and/or speech therapy services at home or in an outpatient setting. At the time of the initial visit, 83% were receiving services; this proportion declined to 55% by the second visit. The results suggest that CAT programs achieve good score correspondence with longer, fixed-length instruments. Average score agreement across the 3 activity domains was approximately ICC=.81 for the full sample, a level considered acceptable for group-level studies. Because the use of CAT in patient assessment is still at an early stage, it is not yet clear whether the higher correlations seen in recent empirical simulations (>.90) will be realized in real-world patient care assessments. The level of score agreement found in this study may not be acceptable when change scores of individual patients are of interest. Future studies could address this by setting a more stringent precision criterion in the CAT stop rule, at the cost of administering more items. Future studies also need to examine the test-retest reliability of the CAT programs over short intervals to estimate the level of measurement error. Previous studies have suggested that test-retest reliability of aggregate scores of functional items is quite acceptable,44 yet this needs to be confirmed with the CAT platform.
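Score agreement between formats is summarized with an intraclass correlation (Shrout and Fleiss35). The specific ICC form is not stated in this section; the sketch below assumes ICC(2,1), the two-way random-effects, absolute-agreement, single-measure form, treating the 2 formats as the “raters.” Unlike a Pearson correlation, this form penalizes a constant offset between formats.

```python
def icc_2_1(rows):
    """ICC(2,1): two-way random effects, absolute agreement, single measure (Shrout & Fleiss).
    rows: one list per patient with one score per format, e.g. [[fixed, cat], ...]."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)   # between patients
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)   # between formats
    sse = sum((rows[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    ms_err = sse / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # 1.0: identical scores on both formats
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))  # 0.666...: a constant 1-point offset lowers agreement
```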

Several explanations may account for score disagreements between the CAT and fixed-format instruments. Most of the largest disagreements between the 2 test formats occurred at the extremes of the score range. This is where the CAT may have important advantages over the fixed form: when a person has very low or very high functional ability, CAT items can be better tailored to that individual, so the fixed form, even though it uses more items, may provide a less accurate estimate of function than the CAT. Both fixed forms and CATs are typically most precise for persons who score in the middle of the range, but CATs have more flexibility to match the content level of persons at the extremes and thus may offer better score precision there. In the personal care & instrumental domain, score disagreements for 14 persons can be attributed to the greater content breadth of the CAT: because the fixed form did not cover as wide a content range as the CAT version, these persons obtained higher ceiling scores on the CAT than on the fixed form. We also identified a number of individuals whose responses to several CAT items were unexpected and inconsistent with the IRT model. On average across the 3 domains, 30% of the identical items answered in both the AM-PAC-66 and the AM-PAC-CAT were answered inconsistently. These individuals, who tended to have relatively low functional scores, made it more difficult to obtain a precise estimate of activity function with a CAT based on few items. Finally, differences between the 2 testing formats (interview vs computer) and the particular sequence of item administration may also lead to score differences between the CAT and the fixed forms.

We found that CAT-based scores have good discriminant validity with respect to severity of disability across all 3 activity domains. In only 1 comparison (SRMs for movement & physical) did the fixed-length form show greater sensitivity than the CAT. This can be attributed mainly to the larger standard deviations of CAT change scores, because scores estimated from substantially fewer items than the fixed forms are more variable. This reduced sensitivity in the movement & physical domain is not entirely unexpected, given that the CAT used 6 items on average versus the 26 movement & physical items within the AM-PAC-66. Although we set a maximum stop rule of 10 items per activity domain, on average only 6 items were needed to meet the precision requirement (±5 points or ±0.5 SD) established for each activity CAT. In retrospect, this setting may not have been adequate to detect the full amount of change in this functional domain. In simulation studies, we have seen that sensitivity can improve markedly as the number of items administered increases.12 In future work, we intend to explore different item-stop rules and precision levels to strike an appropriate balance between sensitivity and test burden.
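To make the item-stop rule concrete, here is a minimal sketch of a CAT loop. It selects the unused item with maximum Fisher information at the current estimate and stops once the posterior SD reaches 0.5 (θ scaled with SD 1, so 0.5 SD corresponds to ±5 points on a scale whose SD is 10 points) or 10 items have been given. Several parts are assumptions, not the AM-PAC-CAT implementation: the item bank is a hypothetical dichotomous 2PL bank (the AM-PAC banks use polytomous items), and θ is scored by EAP on a grid with a standard-normal prior, chosen for brevity.

```python
import math
import random

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item (hypothetical dichotomous model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [-4.0 + 0.05 * i for i in range(161)]  # theta grid from -4 to 4

def eap(responses, bank):
    """EAP estimate and posterior SD under a standard-normal prior."""
    post = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for item, endorsed in responses:
            p = p_endorse(t, *bank[item])
            w *= p if endorsed else 1.0 - p
        post.append(w)
    total = sum(post)
    mean = sum(t * w for t, w in zip(GRID, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, post)) / total
    return mean, math.sqrt(var)

def run_cat(bank, respond, max_items=10, se_stop=0.5):
    """Give max-information items until posterior SD <= se_stop or max_items is reached."""
    responses, used = [], set()
    theta, se = eap(responses, bank)
    while len(responses) < max_items and se > se_stop and len(used) < len(bank):
        item = max((i for i in range(len(bank)) if i not in used),
                   key=lambda i: information(theta, *bank[i]))
        used.add(item)
        responses.append((item, respond(item)))
        theta, se = eap(responses, bank)
    return theta, se, len(responses)

# Hypothetical 60-item bank of (discrimination, difficulty) pairs and a simulated respondent.
rng = random.Random(42)
bank = [(rng.uniform(1.0, 2.5), rng.uniform(-3.0, 3.0)) for _ in range(60)]
true_theta = 1.0
theta, se, n_items = run_cat(bank, lambda i: rng.random() < p_endorse(true_theta, *bank[i]))
print(f"theta={theta:.2f}, posterior SD={se:.2f}, items={n_items}")
```

Lowering `se_stop` or raising `max_items` trades more items for more precise scores, which is the balance between sensitivity and burden that the paragraph above discusses.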

In the full study sample, neither the AM-PAC-CAT nor the AM-PAC-66 version of the applied cognition domain detected statistically significant group change. This finding likely reflects a very high ceiling effect (≈38%) on both the AM-PAC-66 and AM-PAC-CAT at the initial home visit. The applied cognition scale was developed to measure cognitive functional activities with limited movement requirements, including reading, communication, problem solving, organization and management of routines, telephone use, money management, and management of medications.19 In our initial calibration work, we defined the scale using only patients with neurologic disorders such as stroke and brain injury. However, in subsequent work with a short-form version of the applied cognition scale, we found that many individuals with complex medical conditions were also limited in a number of cognitive functional skills.15 In patients with neurologic disorders (n=36), change on the AM-PAC-CAT (mean, 2.16; ES=.29) was nearly twice that seen in the entire longitudinal sample (mean, 1.23; ES=.17). In contrast, persons with complex medical disorders showed on average very little change on the CAT version of this scale (mean, .59; ES=.08). Based on these results, we may focus use of the applied cognition scale on patients with neurologic impairments or individuals with suspected or clear signs of cognitive deficits.

Over the roughly 3-month follow-up, approximately 25% of patients obtained lower AM-PAC-CAT or AM-PAC-66 scores (indicating less functional ability) in 1 or more domains at the follow-up visit than at the initial visit. This is not surprising, because 23 of 94 (24.5%) patients in the final study sample reported significant new illnesses or injuries between study visits, and 17 of these patients reported having been rehospitalized during the study interval. Patients with declining activity scores came primarily from the neurologic (42%) and complex medical (55%) subgroups. Because of the relatively large number of patients who lost function during the study, we analyzed the responsiveness of the AM-PAC-66 and AM-PAC-CAT using absolute change values. Using absolute values assumes that change in either direction has a similar meaning; for a general rehabilitation population, this assumption is probably warranted. However, for conditions that almost always result in some functional deterioration over time, conditions in which small improvements are unexpected, or cohorts with much larger samples in which the direction of functional change is critical, separate analyses of patients with functional improvement and those with deterioration may be warranted.45 Using absolute change in this study allowed us to include all subjects and to compare the ability of the CAT to detect both deterioration and improvement. In contrast, the overall change scores on which the sensitivity analyses were based combined patients who deteriorated, who were stable, and who improved into a single mean change score.
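The difference between signed and absolute change can be seen in a toy example: when some patients decline, the signed mean change (the basis of the sensitivity analyses) shrinks, whereas the mean absolute change (the basis of the responsiveness analyses) counts movement in either direction. The scores below are hypothetical.

```python
def change_summary(initial, follow_up):
    """Signed mean change, mean absolute change, and proportion of patients who declined."""
    change = [f - i for i, f in zip(initial, follow_up)]
    return {
        "mean_signed": sum(change) / len(change),
        "mean_absolute": sum(abs(c) for c in change) / len(change),
        "prop_declined": sum(c < 0 for c in change) / len(change),
    }

# Hypothetical: 3 patients improve by 6 points and 1 declines by 6 points.
print(change_summary([50, 50, 50, 50], [56, 56, 56, 44]))
# {'mean_signed': 3.0, 'mean_absolute': 6.0, 'prop_declined': 0.25}
```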

The comparison of the responsiveness of the AM-PAC-66 and AM-PAC-CAT in this study was based on a global rating of change made by the patient at the follow-up home visit. Although patients are considered the “ideal” respondents regarding their own functioning, patient-based anchors of change nevertheless raise several concerns, including recall bias and accommodation to the illness or condition.46 Methodologic issues aside, patient-based anchors are a very important consideration in evaluating the relative responsiveness of an instrument. Importantly, we found that the AM-PAC-CAT was just as responsive to patient-reported change as the much longer AM-PAC-66 in both the movement & physical and personal care & instrumental domains.

We note substantial efficiency gains with the AM-PAC-CAT, which required only one third of the items in the longer fixed-length form, with relatively minor losses in accuracy, sensitivity, or responsiveness. These results suggest that the CAT format is a very promising technology for long-term follow-up. We may actually have underestimated the efficiency gains from the CAT in this study. CAT administration time was calculated by an internal computer clock that could not be stopped, and during interviews in the home a number of interruptions (eg, phone calls, persons coming to the door, greetings by spouses and family members) occurred during CAT administration. Because we were unable to account for these interruptions, the recorded CAT time is likely an overestimate, although it may represent the realistic time required to administer the CAT in a home setting, where interruptions are less controllable than in a clinical setting. For the fixed-length forms, by contrast, data collectors were instructed to record administration time, record the length of any interruption longer than 1 minute, and subtract that time from the overall administration time. Future studies in both clinical and home environments should continue to examine the potential efficiency gains of the CAT platform for functional assessment. A potential limitation of collecting CAT data primarily by interview is that the responses may not generalize to situations in which patients respond to questions directly on the computer. We chose to err on the side of data completeness and quality in this early CAT work on activity functioning; future studies could compare item responses obtained by an interviewer with those obtained by direct patient self-report.
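The fixed-form timing rule described above (subtract interruptions longer than 1 minute) amounts to a one-line adjustment, whereas the CAT clock recorded raw elapsed time. A minimal sketch with hypothetical durations; the function name is illustrative only.

```python
def net_administration_minutes(elapsed_min, interruptions_min, threshold_min=1.0):
    """Elapsed time minus interruptions longer than threshold_min (the fixed-form protocol)."""
    return elapsed_min - sum(d for d in interruptions_min if d > threshold_min)

# Hypothetical interview: 16 minutes on the clock, interruptions of 0.5, 2, and 1.5 minutes.
print(net_administration_minutes(16.0, [0.5, 2.0, 1.5]))  # 12.5
```

Applying the same adjustment to the CAT clock would only lower recorded CAT times, which is why the reported efficiency gain is plausibly conservative.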

Future CAT development will also need to balance the utility of generating scores for groups of patients against the perceived need among some clinicians for these assessments to provide usable information for individual treatment planning and monitoring. For example, future CAT item selection programs could choose items based on content considerations in addition to maximizing information value (the criterion used in this study). A challenge for CAT applications in rehabilitation is to provide enough information at the individual level while still minimizing response burden, so that CAT remains feasible in rehabilitation practice.

There was considerable attrition from the time of initial recruitment into the study, before hospital discharge, to the 3-month follow-up visit, and persons with higher-level functional skills were disproportionately represented among the dropouts. It appears that individuals who have fully recovered or have other community responsibilities, such as return to work, are less likely to stay involved in a longitudinal follow-up study. This factor has important implications for the generalizability of the findings and for retention of individuals in long-term follow-up studies. It should also be noted that, because this was an observational design, the final functional status of each patient reflected changes due to rehabilitation services, the passage of time, and other factors specific to that patient.


Conclusions 

The results of this study support the continued evaluation and development of CAT-based functional assessments in rehabilitation. The current CAT programs were developed for group-level analyses and not for care planning or identifying limitations of individuals in specific areas of functioning. The efficiency gains coupled with very strong psychometric performance of the activity scales suggest that CAT assessments may provide an important tool for future follow-up studies and group monitoring in postacute care.

Suppliers


Acknowledgments 

We thank Jakob B. Bjorner, MD, PhD, for statistical support with the item bank development and subsequent analyses. Appreciation is extended to Kristen Foget for recruitment of patients for this study. We also acknowledge Maryann McGerigle, Cindy Garven, and Susanne Fantasia for their data collection activities, Christine Cahalan, Erika Wright, and Jeanne McGerigle for data entry, and Julie Cam and Ashley Harper for help with manuscript preparation.


Appendix 1. Patient-reported global rating of change* 

We would like you to think about how much functional change in (movement & physical; personal care & instrumental; applied cognition; each domain described in lay language) has occurred since you entered the study.

Overall, would you say that your functioning is: 1. Worse, 2. About the same, or 3. Better?

Patients who stated that they were worse were asked to rate how much worse on the following scale: −7 A very great deal worse, −6 A great deal worse, −5 A good deal worse, −4 Moderately worse, −3 Somewhat worse, −2 A little worse, or −1 Almost the same, hardly worse at all.

Patients who stated that they were better were asked to rate how much better on the following scale: +1 Almost the same, hardly better at all, +2 A little better, +3 Somewhat better, +4 Moderately better, +5 A good deal better, +6 A great deal better, or +7 A very great deal better.

Those who indicated that they were about the same were given a score of zero (0=no change).
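The 2-step questions above combine into a single score from −7 to +7. A minimal sketch; the string labels for the first question ("worse", "same", "better") are hypothetical encodings of the 3 response options.

```python
def global_rating_score(direction, magnitude=None):
    """Score the 2-step global rating of change on a -7..+7 scale.
    direction: 'worse', 'same', or 'better' (hypothetical encodings); magnitude: 1..7."""
    if direction == "same":
        return 0  # "About the same" is scored 0 (no change)
    if magnitude not in range(1, 8):
        raise ValueError("magnitude must be an integer from 1 to 7")
    if direction == "worse":
        return -magnitude
    if direction == "better":
        return magnitude
    raise ValueError(f"unknown direction: {direction!r}")

print(global_rating_score("better", 3), global_rating_score("worse", 2), global_rating_score("same"))
# 3 -2 0
```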


References 

1. Dijkers MP. A computer adaptive testing simulation applied to the FIM instrument motor component. Arch Phys Med Rehabil. 2003;84:384–393.
2. Ware J, Gandek B, Sinclair S, Bjorner B. Item response theory in computer adaptive testing: implications for outcomes measurement in rehabilitation. Rehabil Psychol. 2005;50:71–78.
3. Jette AM, Haley SM. Contemporary measurement techniques for rehabilitation outcome assessment. J Rehabil Med. 2005;37:339–345.
4. Revicki DA, Cella DF. Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing. Qual Life Res. 1997;6:595–600.
5. Wainer H. Computerized adaptive testing: a primer. Mahwah: Lawrence Erlbaum Associates; 2000.
6. Haley SM, Ni P, Hambleton RK, Slavin MD, Jette AM. Computer adaptive testing improves accuracy and precision of scores over random item selection in a physical functioning item bank. J Clin Epidemiol. In press.
7. Haley SM, Coster WJ, Andres PL, Kosinski M, Ni P. Score comparability of short forms and computerized adaptive testing: simulation study with the Activity Measure for Post-Acute Care. Arch Phys Med Rehabil. 2004;85:661–666.
8. Andres PL, Black-Schaffer RM, Ni PS, Haley SM. Computer adaptive testing: a strategy for monitoring stroke rehabilitation across settings. Top Stroke Rehabil. 2004;11(2):33–39.
9. Siebens H, Andres PL, Ni P, Coster WJ, Haley SM. Measuring physical function in patients with complex medical and postsurgical conditions: a computer adaptive approach. Am J Phys Med Rehabil. 2005;84:741–748.
10. Hart DL, Mioduski JE, Stratford PW. Simulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments. J Clin Epidemiol. 2005;58:629–638.
11. Ware J, Kosinski M, Bjorner J, et al. Applications of computerized adaptive testing (CAT) to the assessment of headache impact. Qual Life Res. 2003;12:935–952.
12. Haley SM, Raczek AE, Coster WJ, Dumas HM, Fragala-Pinkham MA. Assessing mobility in children using a computer adaptive testing version of the Pediatric Evaluation of Disability Inventory. Arch Phys Med Rehabil. 2005;86:932–939.
13. Haley SM, Fragala-Pinkham MA, Ni P. Sensitivity of a computer adaptive assessment for measuring functional mobility changes in children enrolled in a community fitness program. Clin Rehabil. In press.
14. Andresen EM. Criteria for assessing the tools of disability outcomes research. Arch Phys Med Rehabil. 2000;81(12 Suppl 2):S15–S20.
15. Coster WJ, Haley SM, Jette AM. Measuring patient-reported outcomes after discharge from inpatient rehabilitation settings. J Rehabil Med. 2006;38:237–242.
16. Haley SM, Andres PL, Coster WJ, Kosinski M, Ni PS, Jette AM. Short-form activity measure for post-acute care. Arch Phys Med Rehabil. 2004;85:649–660.
17. Haley SM, Coster WJ, Andres PL, et al. Activity outcome measurement for post-acute care. Med Care. 2004;42(Suppl 1):I49–I61.
18. Coster WJ, Haley SM, Andres PL, Ludlow LH, Bond TL, Ni PS. Refining the conceptual basis for rehabilitation outcome measurement: personal care and instrumental activities domain. Med Care. 2004;42(Suppl 1):I62–I72.
19. Coster WJ, Haley SM, Ludlow LH, Andres PL, Ni PS. Development of an applied cognition scale to measure rehabilitation outcomes. Arch Phys Med Rehabil. 2004;85:2030–2035.
20. Callahan CM, Unverzagt FW, Hui SL, Perkins AJ, Hendrie HC. Six-item screener to identify cognitive impairment among potential subjects for clinical research. Med Care. 2002;40:771–781.
21. van Swieten JC, Koudstaal PJ, Visser MC, Schouten HJ, van Gijn J. Interobserver agreement for the assessment of handicap in stroke patients. Stroke. 1988;19:604–607.
22. Ware J, Kosinski M, Dewey J, Gandek B. How to score and interpret single-item health status measures: a manual for users of the SF-8 Health Survey. Lincoln: QualityMetric; 1999.
23. Muraki E, Bock RD. Parscale: IRT item analysis and test scoring for rating-scale data. Chicago: Scientific Software International; 1997.
24. Mislevy RJ, Bock RD. BILOG-3: item analysis and test scoring with binary logistic models. Chicago: Scientific Software International; 1990.
25. Hambleton R, Swaminathan H, Rogers H. Fundamentals of item response theory. Newbury Park: Sage; 1991.
26. van der Linden W, Hambleton R. Handbook of modern item response theory. Berlin: Springer; 1997.
27. Yen WM. Scaling performance assessments: strategies for managing local item dependence. J Educ Meas. 1993;30:187–213.
28. Warm TA. Weighted likelihood estimation of ability in item response theory. Psychometrika. 1989;54:427–450.
29. Wang SD, Wang TY. Precision of Warm’s weighted likelihood estimates for polytomous model in computerized adaptive testing. Appl Psychol Meas. 2001;25:317–331.
30. Ware JE, Bjorner JB, Kosinski M. Practical implications of item response theory and computerized adaptive testing: a brief summary of ongoing studies of widely used headache impact scales. Med Care. 2000;38(9 Suppl):II73–II82.
31. Guide for the Uniform Data System for Medical Rehabilitation (including the FIM instrument), version 5.1. Buffalo: State Univ New York; 1997.
32. Jaeschke R, Singer J, Guyatt GH. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10:407–415.
33. Osoba D, Rodrigues G, Myles J, Zee B, Pater J. Interpreting the significance of changes in health-related quality-of-life scores. J Clin Oncol. 1998;16:139–144.
34. Wiebe S, Matijevic S, Eliasziw M, Derry PA. Clinically important change in quality of life in epilepsy. J Neurol Neurosurg Psychiatry. 2002;73:116–120.
35. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420–428.
36. de Haan R, Horn J, Limburg M, Van Der Meulen J, Bossuyt P. A comparison of five stroke scales with measures of disability, handicap, and quality of life. Stroke. 1993;24:1178–1181.
37. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care. 1989;27(Suppl):S178–S189.
38. Liang M, Lew R, Stucki G, Fortin P, Daltroy L. Measuring clinically important changes with patient-oriented questionnaires. Med Care. 2002;40(4 Suppl):II45–II51.
39. Rosenthal R, Rosnow R. Essentials of behavioral research: methods and data analysis. 2nd ed. New York: McGraw-Hill; 1991.
40. Dunlap W, Cortina J, Vaslow J, Burke M. Meta-analysis of experiments with matched groups or repeated measures designs. Psychol Methods. 1996;1:170–177.
41. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology. 1983;148:839–843.
42. Weinstein MC, Berwick DM, Goldman PA, Murphy JM, Barsky A. A comparison of three psychiatric screening tests using receiver operating characteristic (ROC) analysis. Med Care. 1989;27:593–607.
43. DeLong E, DeLong D, Clarke-Pearson D. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–845.
44. Andres PL, Haley SM, Ni PS. Is patient-reported function reliable for monitoring post-acute outcomes? Am J Phys Med Rehabil. 2003;82:614–621.
45. Cella D, Hahn EA, Dineen K. Meaningful change in cancer-specific quality of life scores: differences between improvement and worsening. Qual Life Res. 2002;11:207–221.
46. Cella D, Eton DT, Lai JS, Peterman AH, Merkel DE. Combining anchor and distribution-based methods to derive minimal clinically important differences on the Functional Assessment of Cancer Therapy (FACT) anemia and fatigue scales. J Pain Symptom Manage. 2002;24:547–561.
  • a Scientific Software International, 7383 N Lincoln Ave, Ste 100, Lincolnwood, IL 60712-1747.
  • b QualityMetric Inc, 640 George Washington Hwy, Lincoln, RI 02865.

Supported by the National Institute of Child Health and Human Development (grant no. R01 HD043568) and the Agency for Healthcare Research and Quality, and an independent scientist award (grant no. K02 HD45354-01). A commercial party having a direct financial interest in the results of the research supporting this article has conferred or will confer a financial benefit upon the author or 1 or more of the authors. Haley has stock interest in CRE Care LLC, which distributes the Activity Measure for Post-Acute Care products.

PII: S0003-9993(06)00402-3

doi:10.1016/j.apmr.2006.04.020
