Archives of Physical Medicine and Rehabilitation
Volume 89, Issue 11, Pages 2146–2155, November 2008

Performance-Based or Self-Report Measures of Physical Function: Which Should Be Used in Clinical Trials of Hip Fracture Patients?

Presented as an abstract to the American Congress of Rehabilitation Medicine, October 3–7, 2007, Washington DC, and to the Gerontological Society of America, November 17, 2007.

  • Nancy K. Latham, PT, PhD; Health and Disability Research Unit, Boston University School of Public Health, Boston, MA
  • Vinay Mehta, PhD; Merck & Co, Whitehouse Station, NJ
  • Allison Martin Nguyen, MS; Merck & Co, Whitehouse Station, NJ
  • Alan M. Jette, PT, PhD; Health and Disability Research Unit, Boston University School of Public Health, Boston, MA
  • Sippy Olarsch, ScD, PT; Health and Disability Research Unit, Boston University School of Public Health, Boston, MA
  • Dimitris Papanicolaou, MD; Merck & Co, Whitehouse Station, NJ
  • Julie Chandler, PT, PhD; Merck & Co, Whitehouse Station, NJ

Reprint requests to Nancy K. Latham, PT, PhD, Health and Disability Research Institute, School of Public Health, Boston University Medical Campus, 4th Fl, 580 Harrison Ave, Boston, MA 02118-2639.


Abstract 

Latham NK, Mehta V, Nguyen AM, Jette AM, Olarsch S, Papanicolaou D, Chandler J. Performance-based or self-report measures of physical function: which should be used in clinical trials of hip fracture patients?

Objectives

To assess the validity, sensitivity to change, and responsiveness of 3 self-report and 4 performance-based measures of physical function: the activity measure for postacute care (AM-PAC) Physical Mobility and Personal Care scales; the Medical Outcomes Study 36-Item Short Form Health Survey Physical Function scale (SF-36 PF); the Physical Functional Performance test (PFP-10); the Short Physical Performance Battery (SPPB); 4-meter gait speed; and the six-minute walk test (6MWT).

Design

A prospective observational study of patients after a hip fracture. Assessments were performed at baseline and 12 weeks postenrollment.

Setting

Inpatient and outpatient rehabilitation facilities in Norway, the United Kingdom, Sweden, Israel, Germany, the United States, Denmark, and Spain.

Participants

Patients (N=108) recovering from a hip fracture.

Interventions

Not applicable.

Main Outcome Measures

Assessments of validity (known-groups, concurrent, construct, and predictive), sensitivity to change (effect size, standardized response mean [SRM], SE of measure, and minimal detectable change [MDC]), and responsiveness (optimal operating cut-points and area under the curve) between baseline and 12-week follow-up.

Results

All physical function measures achieved comparably acceptable levels of validity. Odds ratios for predicting a patient Global Assessment of Improvement at 12 weeks were as follows: AM-PAC Physical Mobility scale, 5.3; AM-PAC Personal Care scale, 3.6; SF-36 PF, 4.3; SPPB, 2.0; PFP-10, 2.5; gait speed, 1.9; and 6MWT, 2.4. Effect sizes and SRMs exceeded 1 SD for all 7 measures. The percentages of patients who exceeded the MDC90 at week 12 were as follows: AM-PAC Physical Mobility scale, 90%; AM-PAC Personal Care scale, 74%; SF-36 PF, 66%; SPPB, 36%; PFP-10, 75%; gait speed, 69%; and 6MWT, 75%. When responsiveness was evaluated using the area under the receiver operating characteristic curve for each measure, all measures had acceptable responsiveness, and no pattern of superior responsiveness emerged by type of measure.

Conclusions

Findings reveal that the validity, sensitivity, and responsiveness of self-report measures of physical function are comparable to performance-based measures in a sample of patients followed after fracturing a hip. From a psychometric perspective, either type of functional measure would be suitable for use in clinical trials where improvement in function is an endpoint of interest. The selection of the most appropriate type of functional measure as the primary endpoint for a clinical trial will depend on other factors, such as the measure's feasibility or the strength of the association between the hypothesized mechanism of action of the study intervention and a functional outcome measure.

Key Words: Hip fractures, Rehabilitation

List of Abbreviations: ADLs, activities of daily living; AM-PAC, activity measure for postacute care; AUC, area under the curve; CAT, computer adaptive test; CI, confidence interval; ES, effect size; MDC, minimal detectable change; MDC90, minimal detectable change at the 90% CI; OR, odds ratio; PFP-10, Physical Functional Performance test; ROC, receiver operating characteristic; SF-36, Medical Outcomes Study 36-Item Short Form Health Survey; SF-36 PF, Medical Outcomes Study 36-Item Short Form Health Survey Physical Function scale; SPPB, Short Physical Performance Battery; 6MWT, six-minute walk test; SRM, standardized response mean

 

FUNCTIONAL ASSESSMENTS are a key component of evaluations of health and well-being in older people. Loss of physical function is associated with greater disability, more reliance on others, and an increased likelihood of hospitalization and death.1, 2 Less certain, though, is the best approach to measuring physical function. For clinical trials with function as a primary endpoint, choosing the most appropriate functional measure is challenging, because the ease of use and psychometric properties of different instruments may vary substantially across levels of disease severity or types of functional limitation.

There are 2 main approaches to functional assessment: self-report and performance-based testing of physical function. Self-report measures are completed by the subject and rely on self-perception of mobility status and performance of daily activities.3, 4, 5 They typically assess the subject's performance difficulties, restrictions, or need for assistance associated with functional activity. Performance-based measures of physical function rely on a rater's assessment of a subject's performance of specific physical tasks, typically measured in controlled environments.5, 6, 7 These measures typically involve the completion or timing of basic strength, balance, or mobility tasks.

There has been debate about the relative advantages of performance-based versus self-report approaches to functional assessment. Proposed theoretical advantages of performance-based measures over self-report measures include better reproducibility, greater sensitivity to change, and less vulnerability to external influences such as cognition, culture, language, and education.8 Numerous studies have compared self-report and physical performance measures,4, 5, 7, 9, 10, 11, 12, 13, 14, 15 and most have found a moderate correlation between the 2 approaches, although reported correlations have ranged from low to high.4, 9 The psychometric properties of self-report and performance-based measures have been reported to be comparable.7 Performance-based measures, whether a full battery of tests or gait speed alone, predict subsequent disability.1, 2 There is also support for the predictive validity of self-report measures of preclinical disability.16, 17

For clinical trials, it is particularly important that measures are sensitive to change. The term sensitivity to change refers to the ability of an instrument to detect change that exceeds what can be attributed to chance or error, regardless of its clinical relevance or meaningfulness, whereas responsiveness refers to the detection of change that is clinically relevant or meaningful from a patient, provider, or researcher perspective.18, 19, 20, 21 Early studies suggested that self-report measures lacked sensitivity to change,11 but subsequent studies have shown that self-report measures are more sensitive than performance-based measures to functional change in higher-functioning populations,16 or in certain condition-specific populations such as people with low back pain.22 Performance-based measures have also been shown to be highly sensitive to change,23, 24 although some common performance-based measures may not distinguish ability at the higher levels of functioning.25

Based on the mixed evidence in this area, many researchers have recommended the use of both self-report and performance-based functional measures in gerontologic research because self-report and performance-based measures appear to impart distinct but complementary information regarding functional status.4, 5, 9, 10, 11 Studies have found that using both types of measures provides a more specific prediction of mobility difficulty,16, 26 rates of hospitalization, change in health, and change in function27 than using either alone.16, 26, 27

A major limitation of previous studies that have compared the psychometric characteristics of self-report and performance-based measures of function has been that the measures often have evaluated different types of tasks, or entirely different constructs, capturing information at different levels on the disablement spectrum. The disablement process as conceptualized by Verbrugge and Jette28 progresses from pathology to impairment to functional limitation to disability. Impairment represents abnormality or dysfunction at a body system level, and functional limitation indicates bodily restrictions when performing tasks, whereas disability involves the interaction of the body and the environment in life roles. Self-report instruments typically ask about complex ADLs, such as instrumental ADLs,13 capturing information at the disability end of the spectrum.4 Performance-based measures, in contrast, have almost always assessed basic mobility or balance tasks, such as repetitive chair stands, timed walk, and tandem stance, reflecting more of an impairment or functional limitation measure.1, 4 Therefore, it has been unclear whether differences found between self-report and performance-based measures have resulted from the mode of assessment or from the differing tasks and constructs being assessed.

The aim of this analysis was to compare the validity, sensitivity, and responsiveness of 3 self-report and 4 performance-based measures of physical function frequently used in clinical trials involving older adults to determine the relative psychometric advantage of using one measurement approach compared with the other.


Methods 

Subjects 

Data used for these analyses were obtained from a 24-week, randomized, double-blind, placebo-controlled, multicenter study of an investigational drug to treat muscle wasting among patients recovering from a unilateral hip fracture with noncomplicated surgical repair. The patients were age 65 years or older (men or women), within 17 days of surgical repair of a unilateral hip fracture, and at least partially weight-bearing. Surgical repair occurred no more than 4 days posthip fracture. Prior to the hip fracture, patients were living in the community (ie, not residing in a nursing home), able to ambulate independently at home (ie, able to walk indoors in a familiar setting with little or no aid from another person), and without major cognitive impairment (Folstein's Mini-Mental State Examination score ≥24). Exclusion criteria included patients with hip fracture caused by bone pathology (other than osteoporosis) or major trauma, any disease or condition felt to affect recovery from the surgery, uncontrolled thyroid disease, type I diabetes, uncontrolled type II diabetes, type II diabetes with diabetic retinopathy or requiring combination therapy or insulin, systemic or anabolic steroid use, comorbid conditions felt to put the patient at risk or impair the ability to complete the study (eg, cardiovascular, neuromuscular, or neurologic disease; coronary heart disease; class III or IV congestive heart failure; severe peripheral vascular disease; uncontrolled hypertension; cancer), and drug or alcohol abuse. Prior to starting the study, patients were enrolled in a rehabilitation program providing a minimum of 10 hours of physical therapy over a period of 2 to 4 weeks. Informed consent was obtained from each participant prior to performing any study procedures, and the study was approved by the ethical review committees in each of the 8 countries included in the study (Norway, United Kingdom, Sweden, Israel, Germany, United States, Denmark, Spain).

Study Design 

Participants were randomized to receive either an experimental medication or a matching placebo daily for 24 weeks. The study consisted of 6 visits at the study site (a screening visit, a randomization or baseline visit, and visits at weeks 4, 8, 12, and 24) and 6 telephone visits (weeks 2, 6, 10, 16, 20, and 26). Data from the baseline and week 12 visits form the main data set for these analyses, and the treatment and control groups were treated as a single cohort. We chose week 12 because it provided an interval from baseline long enough for change to occur while maximizing the sample available for analysis. Approximately 320 patients were targeted for enrollment; this blinded, pooled analysis is based on a subset comprising the first 108 randomized patients.

Measures 

Prior to study start, site staff were trained and certified to conduct all assessments according to the developers' guidelines and appropriate safety precautions. All patient-completed questionnaires, instructions, and interview guides were translated and culturally adapted into the local languages for each country following international guidelines for translation. All site staff conducting the performance-based measures were required to be either physical therapists or orthopedic nurse specialists.

The following measures were administered to all participants at each clinic visit.

Continuous Scale Physical Functional Performance test 

The PFP-10 was developed to address a broad range of upper-extremity and lower-extremity strength and endurance activities important to independence in older adults. The tool includes 10 ADL tasks, including carrying a pot, picking a scarf up off the floor, sitting on the floor, carrying a bag of groceries, sweeping the floor, loading laundry into and unloading it from a washer, reaching overhead, putting on and taking off a jacket, and walking for 6 minutes. Each task contributes to the total score and to 1 or more of the 5 domains: Lower Body Strength, Upper Body Strength, Upper Body Flexibility, Balance and Coordination, and Endurance. The examiner rates observed physical performance by measuring the time to complete a task, the amount of weight lifted, or the distance walked, adjusted for a patient's capacity. Each domain is scored on a 0 to 100 scale (0 indicating poor function and 100 indicating excellent function). The PFP-10 total score is the average of these 5 domain scores.29

Short Physical Performance Battery 

The SPPB is a lower-extremity performance-based test composed of 3 components: standing balance, gait speed, and chair rise. Testing begins with the balance component, which consists of 3 tests of increasing difficulty (ie, each test has a narrower base of support than the one before): a side-by-side stand, a semitandem stand, and a tandem stand. Gait speed was tested over a 4-m walkway at the patient's usual speed. The final component, chair rise, was conducted among patients able to complete 1 chair rise without using their hands and consisted of a series of 5 timed chair rises. Both total and composite scores can be calculated for the SPPB, with total scores ranging from 0 to 12 (0 indicating poor function and 12 indicating excellent function).10
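The text gives only the 0 to 12 range of the SPPB total. The sketch below shows how the 3 component scores (each 0 to 4) are commonly combined, using the cutpoints published for the 4-m version of the battery. Treat the thresholds, the function names, and the convention of passing None for a test the patient could not complete as assumptions to check against the official scoring instructions before use. Gait speed is simply the 4-m distance divided by the walking time.

```python
WALK_DISTANCE_M = 4.0  # 4-m walkway, as described in the text


def balance_score(side_by_side_s, semitandem_s, tandem_s):
    """Balance component (0-4); each argument is seconds held, capped at 10."""
    if side_by_side_s < 10:
        return 0  # testing stops after a failed side-by-side stand
    if semitandem_s < 10:
        return 1  # testing stops after a failed semitandem stand
    if tandem_s >= 10:
        return 4
    if tandem_s >= 3:
        return 3
    return 2


def gait_score(walk_time_s):
    """Gait component (0-4) from the 4-m walk time; None means unable."""
    if walk_time_s is None:
        return 0
    if walk_time_s < 4.82:
        return 4
    if walk_time_s < 6.21:
        return 3
    if walk_time_s < 8.71:
        return 2
    return 1


def chair_score(five_rises_s):
    """Chair-rise component (0-4) from the time for 5 rises; None means unable."""
    if five_rises_s is None or five_rises_s > 60:
        return 0
    if five_rises_s < 11.20:
        return 4
    if five_rises_s < 13.70:
        return 3
    if five_rises_s < 16.70:
        return 2
    return 1


def sppb_total(side_by_side_s, semitandem_s, tandem_s, walk_time_s, five_rises_s):
    """Total SPPB score (0-12) as the sum of the 3 component scores."""
    return (balance_score(side_by_side_s, semitandem_s, tandem_s)
            + gait_score(walk_time_s)
            + chair_score(five_rises_s))


def gait_speed_m_per_s(walk_time_s):
    """Usual gait speed over the 4-m course."""
    return WALK_DISTANCE_M / walk_time_s
```

For example, a hypothetical patient who holds the side-by-side and semitandem stands for 10 s and the tandem stand for 5 s, walks 4 m in 7.0 s, and cannot complete the chair rises scores 3 + 2 + 0 = 5.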

Six-minute walk test 

This endurance measure assesses the distance a person can walk on a measured walkway in 6 minutes. This standardized measure is 1 of the 10 tasks in the PFP-10, and its score can be derived separately.

Activity measure for postacute care 

The AM-PAC is a self-report questionnaire that assesses a subject's degree of difficulty or need for assistance in performing specific functional tasks. It was developed using item response theory methods to evaluate functional status in adults receiving postacute care in various settings.30 In the CAT version of the AM-PAC, an algorithm selects each question from a large item pool based on the responses to previous questions. The tool can therefore focus on the range of functioning relevant to each patient with relatively few questions and estimate the level of functioning with minimal floor or ceiling effects. The AM-PAC CAT continues to administer questions until the CI around the score estimate is narrower than a prespecified width or the maximum allowed number of questions is reached; higher scores indicate better function. The AM-PAC CAT has been validated against several widely used disability measures. Two physical function domain scores were evaluated in this analysis: Physical and Movement (AM-PAC Physical Mobility scale) and Personal Care and Instrumental (AM-PAC Personal Care scale).
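The adaptive logic described above (choose the next item from a bank based on earlier answers, and stop when the score estimate is precise enough or a maximum number of items is reached) can be sketched as follows. This is a toy illustration, not the AM-PAC's actual item bank, measurement model, or stopping values: it assumes a hypothetical 21-item bank under a Rasch model, expected a posteriori (EAP) scoring with a standard normal prior, maximum-information item selection, and a stopping rule based on the posterior SD.

```python
import math

# Hypothetical item bank: item id -> Rasch difficulty (logits).
ITEM_BANK = {f"item{i:02d}": -2.0 + 0.2 * i for i in range(21)}
GRID = [i / 10 for i in range(-40, 41)]  # quadrature points for ability (theta)


def p_able(theta, difficulty):
    """Rasch probability that a person at theta reports being able to do the task."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def eap_estimate(responses):
    """Posterior mean and SD of theta under a N(0, 1) prior; responses = [(b, x)]."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for b, x in responses:
            p = p_able(t, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def information(theta, difficulty):
    """Fisher information of a Rasch item at theta."""
    p = p_able(theta, difficulty)
    return p * (1.0 - p)


def run_cat(answer, se_target=0.6, max_items=10):
    """Administer items until the posterior SD <= se_target or max_items is reached."""
    remaining = dict(ITEM_BANK)
    responses, asked = [], []
    theta, se = eap_estimate(responses)
    while remaining and len(asked) < max_items and se > se_target:
        item = max(remaining, key=lambda k: information(theta, remaining[k]))
        difficulty = remaining.pop(item)
        responses.append((difficulty, answer(item, difficulty)))
        asked.append(item)
        theta, se = eap_estimate(responses)
    return theta, se, asked


# Hypothetical respondent who can do every task easier than 0.5 logits.
theta, se, asked = run_cat(lambda item, b: 1 if b < 0.5 else 0)
```

In this example the estimate moves toward the respondent's level within a handful of items, which is the efficiency the AM-PAC CAT is designed to exploit.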

Medical Outcomes Trust Short Form Health Survey 

The SF-36 is a self-administered patient questionnaire that measures general health-related quality of life in 8 domains of health: physical functioning (SF-36 PF), role limitations caused by physical health (role-physical), bodily pain, general health perceptions, vitality, social functioning, role limitations caused by emotional problems (role-emotional), and mental health. The SF-36 yields a score for each domain, summary scores for physical and mental health, and a single health utility index. Each domain is scored from 0 to 100, with 0 indicating extreme difficulty and 100 indicating no difficulty. The domain of primary interest for this study is the physical functioning subscale (SF-36 PF).31
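As a worked illustration of the 0 to 100 scale, the SF-36 PF subscale is conventionally scored by summing its 10 items (each coded 1 = limited a lot, 2 = limited a little, 3 = not limited at all) and linearly rescaling the raw sum from its possible range of 10 to 30. The sketch assumes complete responses and omits the scoring manual's rules for missing items; the function name is hypothetical.

```python
PF_ITEMS = 10
LOWEST_RAW, HIGHEST_RAW = 10, 30  # 10 items, each coded 1 to 3


def sf36_pf_score(item_codes):
    """Transform 10 PF item codes (1=limited a lot, 2=limited a little,
    3=not limited at all) to the 0-100 scale; higher means better function."""
    if len(item_codes) != PF_ITEMS or any(c not in (1, 2, 3) for c in item_codes):
        raise ValueError("expected 10 item codes, each 1, 2, or 3")
    raw = sum(item_codes)
    return 100.0 * (raw - LOWEST_RAW) / (HIGHEST_RAW - LOWEST_RAW)
```

A respondent answering "limited a little" on 5 items and "not limited at all" on the other 5 has a raw score of 25 and a transformed score of 75.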

Lower-extremity muscle strength 

Lower-extremity isometric strength is a measurement of force produced by muscles. A strain gauge dynamometer attached to a standardized chair was used at each site to measure bilateral knee extension force in kilograms. To perform the test, the patient was seated in the chair with hips at 90° and knees at 70° to 80°. A stationary hook on one end of the dynamometer was connected to the crossbar of the chair and a strap on the other end positioned approximately 10 cm above the patient's lateral malleolus. The patient was then instructed to extend his or her knee and forcefully push against the strap as hard as possible. Three trials on each leg were performed with 10 to 20 seconds of rest given between trials. The patient's nonaffected leg was tested first, followed by the affected leg.32

Lower-extremity power 

Power, defined as force × velocity, is the ability to generate force quickly. Lower-extremity power was assessed from the time a patient took to ascend a set of 4 stairs as quickly as possible and was calculated as the product of the patient's body mass (kg), gravitational acceleration (g=9.8 m/s²), and vertical velocity (staircase height/time). The stairs used for this test were standardized across all sites. To perform the test, the patient stood at the bottom of the stairs with both hands on the stair rails. The patient was allowed to use a cane or other assistive device, if needed. The patient was instructed to climb the stairs safely and as fast as possible and to stop at the top platform. The tester started the stopwatch on saying “go” and stopped it when both of the patient's feet were on the top platform.33
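The stair-climb power calculation described above reduces to one line of arithmetic. In the sketch, the staircase height is a hypothetical input, because the text states only that the 4-stair set was standardized across sites; body mass in kilograms and time in seconds give power in watts.

```python
G = 9.8  # gravitational acceleration in m/s^2, as stated in the text


def stair_climb_power_w(body_mass_kg, staircase_height_m, climb_time_s):
    """Power (W) = body mass x g x vertical velocity (staircase height / time)."""
    vertical_velocity_m_per_s = staircase_height_m / climb_time_s
    return body_mass_kg * G * vertical_velocity_m_per_s
```

For example, a hypothetical 70-kg patient climbing a 0.68-m staircase in 3.0 s generates about 155 W.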

Patient Global Assessment of Improvement and Physician Global Assessment of Improvement 

These are single-item questionnaires that ask the patient and the physician, respectively, to rate how much worse or better the patient's ability to get around in their home and community has become. Each question uses a 9-point categoric rating scale anchored by “A Great Deal Worse” and “A Great Deal Better” and refers to the change in the patient's ability since the start of the study.

Hip Pain Status scale 

This is a patient self-reported 11-point numeric rating scale (0–10) that asks the patient to rate level of pain over the past 7 days. The rating scale ranges from no pain to pain as bad as you can imagine.

Postfracture condition 

At each clinic visit, patients were asked to report whether they were currently using any assistive devices (cane, walker, wheelchair, and so forth) for ambulation.

In order to limit the influence of performance-based measures on self-reported measures of physical functioning, the measures were administered in the following order at all clinic visits: AM-PAC, SF-36, Hip Pain Status Scale, PFP-10, SPPB, lower-extremity muscle strength, lower-extremity power, patient Global Assessment of Improvement, and physician Global Assessment of Improvement.

Statistical Methods 

Descriptive statistics (mean, median, SD, minimum, and maximum for continuous variables; frequency and percentage for categoric variables) were calculated at baseline and, when available, at week 12 for demographic variables, assistive device use, the patient and investigator Global Assessments of Improvement, and each of the performance-based and self-report measures.

The statistical analyses focused on describing the psychometric characteristics of 4 physical performance measures (PFP-10, SPPB, 6MWT, and gait speed) and 3 self-reported measures (SF-36 PF, AM-PAC Personal Care scale, and AM-PAC Physical Mobility scale). The following analyses were conducted to test the validity, sensitivity to change, and responsiveness of the different measures.

Concurrent validity 

Spearman correlation coefficients with 95% CIs were computed between each pair of performance-based and self-report measures to assess concurrent validity at week 12. Correlation coefficients whose CIs did not span 0 were considered statistically significant.
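The text does not state how the CIs around the Spearman coefficients were obtained. One common approach, assumed in the sketch below, is to compute Spearman's rho as the Pearson correlation of average ranks and build the CI with the Fisher z transformation, using an SE of 1/√(n−3); the CI is then checked against 0 as described above. The function names and example data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to ranks
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_with_ci(x, y, z_crit=1.96):
    """Spearman rho with an approximate Fisher-z CI (needs n > 3 and |rho| < 1)."""
    rho = pearson(average_ranks(x), average_ranks(y))
    if len(x) <= 3 or abs(rho) >= 1:
        raise ValueError("Fisher z CI undefined for n <= 3 or |rho| = 1")
    z = math.atanh(rho)
    half_width = z_crit / math.sqrt(len(x) - 3)
    return rho, math.tanh(z - half_width), math.tanh(z + half_width)


# Hypothetical scores on 2 measures for 6 patients.
rho, ci_low, ci_high = spearman_with_ci([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
significant = not (ci_low <= 0 <= ci_high)
```

Because the Fisher transformation is nonlinear, the resulting CI is asymmetric around rho, narrowing toward ±1.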

Known-groups analysis 

The mean or distribution of each performance-based and self-report measure at week 12 was compared between known groups using a Student t test (for normally distributed measures) or a Wilcoxon rank-sum test (for non-normally distributed measures).

The known groups were defined according to whether patients (1) used an assistive device at week 12, (2) were above or below the sex-specific median affected-leg strength at week 12, or (3) were above or below the sex-specific median stair-climbing power at week 12.

For these analyses, the median strength at week 12 was 20 kg for men and 13.5 kg for women, and the median power was 128.3 W for men and 72.6 W for women. Mean differences between groups were compared for SPPB total, gait speed, and SF-36 PF (normally distributed measures). Differences in distributions were tested for the AM-PAC Physical Mobility scale, AM-PAC Personal Care scale, PFP-10, and 6MWT (non-normally distributed measures).
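The sex-specific median split used to form the strength and power known groups can be sketched as below. Following the layout of the comparison tables, patients at or above the median for their sex form the upper group. The record format, field names, and example values are hypothetical.

```python
from statistics import median


def split_by_sex_specific_median(records, value_key):
    """Return {sex: (median, below, at_or_above)} for the given measure."""
    groups = {}
    for sex in sorted({r["sex"] for r in records}):
        members = [r for r in records if r["sex"] == sex]
        cut = median([r[value_key] for r in members])
        below = [r for r in members if r[value_key] < cut]
        at_or_above = [r for r in members if r[value_key] >= cut]
        groups[sex] = (cut, below, at_or_above)
    return groups


# Hypothetical week 12 affected-leg strength values (kg).
records = ([{"sex": "F", "strength_kg": v} for v in (8, 11, 14, 17)]
           + [{"sex": "M", "strength_kg": v} for v in (15, 19, 24)])
groups = split_by_sex_specific_median(records, "strength_kg")
```

Because values equal to the median go to the upper group, the 2 groups can differ in size when the median is an observed value, as in the men's groups of the strength comparison.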

Construct validity 

Spearman correlation coefficients with 95% CIs were calculated between measures of lower-extremity muscle performance (affected-side knee extension strength, stair-climbing power) and measures of physical function, including performance-based measures (SPPB, PFP-10, 6MWT, gait speed) and self-report measures (AM-PAC Personal Care scale, AM-PAC Physical Mobility scale, SF-36 PF). Correlations between other domains of the SF-36 (vitality, bodily pain, role physical, and role social) and the performance-based and self-report physical function measures were also estimated using week 12 values. Correlation coefficients whose CIs did not span 0 were considered statistically significant.

Predictive validity 

Logistic regression models were used to assess the validity of the physical function measures (performance-based and self-report) in predicting a Global Assessment of Improvement (patient and investigator) score of 8 or higher (much better) at week 12. The cutoff of 8 was chosen because this score has face validity as a clinically relevant improvement and because the distribution of scores left enough respondents above and below the cutoff to allow meaningful analysis. ORs and 95% CIs were computed to express the increase in the odds of a Global Assessment score of 8 or higher per 1-SD increase in each performance-based or self-report measure. ORs whose CIs did not span 1 were considered statistically significant.
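An OR per 1-SD increase is obtained by standardizing the predictor before fitting the logistic model, so that exp(β) refers to a 1-SD change. The sketch below fits a single-predictor model by Newton-Raphson using only the Python standard library and reports a Wald 95% CI. The data are hypothetical, and the article does not state which software or CI method was used; in practice a statistics package would normally be used.

```python
import math
from statistics import mean, stdev


def _score_and_information(b0, b1, z, y):
    """Gradient (g0, g1) and information matrix [[a, b], [b, d]] of the log-likelihood."""
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * zi))) for zi in z]
    w = [pi * (1.0 - pi) for pi in p]
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * zi for yi, pi, zi in zip(y, p, z))
    a = sum(w)
    b = sum(wi * zi for wi, zi in zip(w, z))
    d = sum(wi * zi * zi for wi, zi in zip(w, z))
    return g0, g1, a, b, d


def odds_ratio_per_sd(x, y, iterations=25, z_crit=1.96):
    """OR and Wald CI for a 1-SD increase in x predicting binary y (1 = GAI >= 8)."""
    mx, sx = mean(x), stdev(x)
    z = [(v - mx) / sx for v in x]  # standardize so the slope is per 1 SD
    b0 = b1 = 0.0
    for _ in range(iterations):  # Newton-Raphson updates
        g0, g1, a, b, d = _score_and_information(b0, b1, z, y)
        det = a * d - b * b
        b0 += (d * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    _, _, a, b, d = _score_and_information(b0, b1, z, y)
    se_b1 = math.sqrt(a / (a * d - b * b))  # from the inverse information matrix
    return math.exp(b1), math.exp(b1 - z_crit * se_b1), math.exp(b1 + z_crit * se_b1)


# Hypothetical measure values and improvement flags (not study data).
odds_ratio, ci_low, ci_high = odds_ratio_per_sd(
    [1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 1, 0, 1, 0, 1, 1])
```

Standardizing each measure first is what makes ORs comparable across instruments with very different units, such as meters walked and 0 to 100 scale scores.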

Distribution-based measures of sensitivity to change 

Distribution-based statistics assess sensitivity by relating the magnitude of change to sample variation, as with ES and SRM, or to instrument variation, as with the SE of measure. The SE of measure can be used to calculate an MDC, a change large enough to be considered true change because it exceeds the bounds of measurement error or noise.34, 35 Identifying the error associated with scores through the SE of measure, and estimating an MDC, enhances the interpretation of scores and change scores and establishes benchmarks to help monitor change.36, 37 The following statistics were calculated to determine each measure's sensitivity to change from baseline to week 12.

Cohen's ES 

Cohen's ES was calculated as $ES = (M_1 - M_2)/S_P$, where $M_1$ is the mean score at week 12, $M_2$ is the mean score at baseline, and $S_P = \sqrt{(S_1^2 + S_2^2)/2}$ is the pooled SD, with $S_1$ and $S_2$ the SDs at week 12 and baseline, respectively. Cohen describes an ES of 0.8 or higher as large.38

Standardized response mean 

SRM was calculated as $SRM = (M_1 - M_2)/S_\Delta$, where $M_1$ is the mean score at week 12, $M_2$ is the mean score at baseline, and $S_\Delta$ is the SD of the individual change scores (week 12 minus baseline).
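The ES and SRM formulas differ only in the denominator: the ES divides the mean change by the pooled SD of the 2 time points, whereas the SRM divides it by the SD of the individual change scores. The sketch below computes both from paired scores for patients with data at both visits (an assumption, since the text does not say how missing week 12 data were handled); the example scores are hypothetical.

```python
from statistics import mean, stdev


def cohens_es(baseline, week12):
    """(M1 - M2) / pooled SD, with M1 = week 12 mean and M2 = baseline mean."""
    pooled_sd = ((stdev(week12) ** 2 + stdev(baseline) ** 2) / 2) ** 0.5
    return (mean(week12) - mean(baseline)) / pooled_sd


def standardized_response_mean(baseline, week12):
    """Mean change divided by the SD of the individual change scores."""
    changes = [w - b for b, w in zip(baseline, week12)]
    return mean(changes) / stdev(changes)


# Hypothetical paired scores for 5 patients with data at both visits.
baseline = [4, 5, 3, 6, 2]
week12 = [7, 8, 5, 9, 6]
es = cohens_es(baseline, week12)
srm = standardized_response_mean(baseline, week12)
```

When patients improve by similar amounts, the change scores vary much less than the scores themselves, so the SRM can be considerably larger than the ES for the same data, as in this example.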

SE of the measurement 

SE of measure was calculated as $SEM = S_B\sqrt{1 - r}$, where $S_B$ is the SD at baseline and $r$ is the test-retest reliability coefficient. Reliability coefficients were obtained from previous studies of the SPPB,39 gait speed,39 PFP-10,40 SF-36 PF,41 AM-PAC,30 and 6MWT.39

Minimal detectable change 

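MDC90 was calculated as $MDC_{90} = SEM \times 1.645 \times \sqrt{2}$. The MDC90 can be interpreted as the smallest change that falls outside the measurement error of the instrument.36 In the formula, 1.645 is the z value corresponding to a two-sided 90% CI around no change, and $\sqrt{2}$ accounts for measurement error in both the baseline and the follow-up score.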

The percentage of patients who exceeded the MDC90 was calculated as the percentage of patients whose change from baseline to week 12 was at least as large as the MDC90.
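The SE of measure, MDC90, and percentage-exceeding calculations chain together as in the sketch below. The baseline SD and reliability coefficient in the example are hypothetical, not values from the cited reliability studies, and a change is counted when it is at least as large as the threshold, matching the definition above.

```python
import math

Z_90 = 1.645  # z value for a two-sided 90% CI


def se_of_measure(baseline_sd, reliability):
    """SEM = baseline SD x sqrt(1 - test-retest reliability)."""
    return baseline_sd * math.sqrt(1.0 - reliability)


def mdc90(baseline_sd, reliability):
    """MDC90 = SEM x 1.645 x sqrt(2); sqrt(2) reflects error at both time points."""
    return se_of_measure(baseline_sd, reliability) * Z_90 * math.sqrt(2.0)


def percent_exceeding(changes, threshold):
    """Percentage of patients whose change is at least the threshold."""
    return 100.0 * sum(1 for c in changes if c >= threshold) / len(changes)


# Hypothetical inputs: baseline SD of 10 points, test-retest reliability of 0.90.
threshold = mdc90(10.0, 0.90)
share = percent_exceeding([2.0, 8.0, 10.0, -1.0], threshold)
```

Because the SEM shrinks as reliability rises, a more reliable instrument has a smaller MDC90, and more patients can clear it for the same observed change.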

ROC to Evaluate Responsiveness 

ROC curve analysis was used to determine whether a measure could discriminate between patients who had a true change in function and those who did not. The area under the ROC curve provides a quantitative summary of a measure's diagnostic or predictive performance.42 ROC curves were constructed to calculate each measure's sensitivity and specificity in predicting a patient or investigator Global Assessment of Improvement score of 8 or higher. For each physical function measure (performance-based and self-report), a separate ROC curve was constructed using the change from baseline to week 12, and the AUC was calculated for each curve.
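The AUC for a change score can be computed without plotting the curve: it equals the probability that a randomly chosen patient who reported improvement (Global Assessment of 8 or higher) has a larger change than a randomly chosen patient who did not, with ties counted as one half (the Mann-Whitney formulation). The sketch also picks a cutpoint by maximizing the Youden index (sensitivity + specificity − 1); the text does not say how optimal cutpoints were chosen, so that criterion is an assumption, and the example data are hypothetical.

```python
def auc(improved, not_improved):
    """P(change of an improved patient > change of a non-improved one); ties = 1/2."""
    wins = 0.0
    for p in improved:
        for n in not_improved:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(improved) * len(not_improved))


def youden_cutpoint(improved, not_improved):
    """Cutpoint c (change >= c predicts improvement) maximizing sens + spec - 1."""
    best = None
    for c in sorted(set(improved) | set(not_improved)):
        sensitivity = sum(1 for s in improved if s >= c) / len(improved)
        specificity = sum(1 for s in not_improved if s < c) / len(not_improved)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[1]:
            best = (c, j, sensitivity, specificity)
    return best


# Hypothetical change scores for patients with and without GAI >= 8.
area = auc([5, 7, 9], [1, 3, 6])
cut, youden_j, sens, spec = youden_cutpoint([5, 7, 9], [1, 3, 6])
```

An AUC of 0.5 means the change score does no better than chance at separating improved from non-improved patients, and 1.0 means perfect separation.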


Results 

Table 1 shows descriptive statistics of the sample at baseline and week 12. At week 12, 26 patients (24.1%) had been lost to follow-up. The mean age ± SD of the sample (N=108) was 78.9±8.1 years, and 73.2% (n=79) were women. For all performance-based and self-report measures, mean and median scores increased significantly from baseline to week 12.

Table 1. Study Sample Patient Characteristics
Characteristic | n | %
Sex | |
  Women | 79 | 73.2
  Men | 29 | 26.9
Baseline assistive device use | |
  Yes | 106 | 98.2
  No | 2 | 1.8
Week 12 assistive device use | |
  Yes | 63 | 76.8
  No | 19 | 23.2
Week 12 patient GAI ≥8 | |
  Yes | 36 | 43.9
  No | 46 | 56.1
Week 12 investigator GAI ≥8 | |
  Yes | 46 | 56.1
  No | 36 | 43.9

Characteristic | Mean ± SD | Median | Range
Baseline age (y) | | |
  Women | 79.4±8.0 | 80.0 | 64.0–95.0
  Men | 77.0±7.5 | 78.0 | 65.0–92.0
  Total | 78.9±8.1 | 79.5 | 64.0–95.0
Hip Pain Status scale (11-point scale) | | |
  Baseline | 4.2±2.2 | 5.0 | 0–9.0
  Week 12 | 2.5±2.0 | 2.0 | 0–7.0
Lower-extremity muscle strength (kg) | | |
  Baseline | 11.2±7.3 | 10.0 | 0–38.0
  Week 12 | 16.3±6.8 | 15.0 | 4.0–43.0
Lower-extremity muscle power (W) | | |
  Baseline | 54.3±33.1 | 46.3 | 4.1–165.2
  Week 12 | 95.1±54.8 | 85.4 | 0.4–306.4
SF-36 PF (0–100) | | |
  Baseline | 23.0±21.8 | 15.0 | 0–90.0
  Week 12 | 57.5±25.5 | 57.5 | 0–100
AM-PAC Physical Mobility (0–100) | | |
  Baseline | 48.0±10.0 | 51.9 | 19.2–62.7
  Week 12 | 59.7±8.2 | 60.2 | 29.0–82.5
AM-PAC Personal Care (0–100) | | |
  Baseline | 49.2±8.0 | 48.5 | 22.0–68.0
  Week 12 | 57.0±8.8 | 58.2 | 35.0–68.1
PFP-10 (0–100) | | |
  Baseline | 10.4±9.9 | 6.8 | 0–45.2
  Week 12 | 24.5±18.4 | 21.3 | 0–78.1
SPPB (0–12) | | |
  Baseline | 4.7±2.8 | 5.0 | 0–12.0
  Week 12 | 7.9±2.6 | 8.0 | 0–12.0
Gait speed (m/s) | | |
  Baseline | 0.50±0.28 | 0.45 | 0.01–1.33
  Week 12 | 0.77±0.35 | 0.76 | 0.003–1.83
6MWT (m) | | |
  Baseline | 121.0±103.0 | 109.6 | 0–408.1
  Week 12 | 251.0±155.5 | 259.0 | 0–651.9

Abbreviation: GAI, Global Assessment of Improvement.

Concurrent Validity 

Spearman correlations between the different performance and self-report measures are displayed in table 2. At week 12, all of the physical function measures were significantly correlated with each other. In general, self-reported measures were more strongly correlated with other self-reported measures than with performance-based measures. Performance-based measures were more strongly correlated with other performance-based measures than with self-reported measures.

Table 2. Spearman Correlations Between Performance-Based and Self-Report Function Measures at Week 12
Values are correlation coefficients (95% CI).
Assessment | AM-PAC Physical Mobility | AM-PAC Personal Care | SF-36 PF | PFP-10 | SPPB total | Gait speed | 6MWT
AM-PAC Physical Mobility | | 0.71 (0.58–0.81) | 0.84 (0.76–0.90) | 0.64 (0.48–0.75) | 0.65 (0.50–0.77) | 0.65 (0.49–0.76) | 0.67 (0.52–0.78)
AM-PAC Personal Care | | | 0.68 (0.54–0.79) | 0.63 (0.47–0.75) | 0.55 (0.37–0.69) | 0.49 (0.30–0.65) | 0.61 (0.44–0.73)
SF-36 PF | | | | 0.69 (0.55–0.79) | 0.67 (0.52–0.77) | 0.68 (0.53–0.78) | 0.73 (0.61–0.82)
PFP-10 | | | | | 0.73 (0.60–0.82) | 0.80 (0.70–0.87) | 0.85 (0.77–0.90)
SPPB total | | | | | | 0.84 (0.76–0.89) | 0.75 (0.64–0.84)
Gait speed | | | | | | | 0.82 (0.73–0.88)
6MWT | | | | | | |

Abbreviation: coeff, coefficient.

Known-Groups Analysis 

All of the measures differed significantly between patients who were and were not using an assistive device at week 12 (table 3). Patients using an assistive device at week 12 had lower scores on all measures than patients not using one.

Table 3. Comparison of Performance-Based and Self-Report Measures of Function According to Assistive Device Use at Week 12
Assessment | Device use: yes (n=63), mean ± SD | Device use: yes, median | Device use: no (n=19), mean ± SD | Device use: no, median | P
AM-PAC PM | 57.4±7.4 | 59.5 | 67.5±6.1 | 65.7 | <.001
AM-PAC PC | 55.3±8.8 | 54.7 | 62.7±5.9 | 65.5 | .001
SF-36 PF | 50.0±23.3 | 55.0 | 82.1±15.3 | 90.0 | <.001
PFP-10 | 18.9±14.7 | 15.0 | 42.5±17.8 | 45.3 | <.001
SPPB total | 7.2±2.5 | 7.0 | 10.2±1.2 | 10.0 | <.001
Gait speed (m/s) | 0.7±0.3 | 0.6 | 1.1±0.3 | 1.0 | <.001
6MWT (m) | 206.8±133.2 | 199.7 | 412.2±122.9 | 391.0 | <.001

Abbreviations: PC, Personal Care domain; PM, Physical Mobility domain.

Table 4 describes the performance-based and self-report measures according to sex-specific median isometric knee extension strength (lower-extremity muscle strength) on the affected side. In women, scores for all measures were significantly lower for patients below the median strength value of 13.5 kg than for patients at or above the median. In men, 6MWT and PFP-10 total scores were significantly lower for patients below the median strength value of 20 kg than for patients at or above the median; for all other measures, there was no significant difference between groups defined by median strength.

Table 4. Comparison of Performance-Based and Self-Report Measures of Function According to Median Knee Extension Strength (kg) at Week 12
Women
Assessment | LE strength <13.5 kg (n=26), mean ± SD | LE strength <13.5 kg, median | LE strength ≥13.5 kg (n=26), mean ± SD | LE strength ≥13.5 kg, median | P
AM-PAC PM | 57.4±7.4 | 59.5 | 67.5±6.1 | 65.7 | <.001
AM-PAC PC | 55.3±8.8 | 54.7 | 62.7±5.9 | 65.5 | .001
SF-36 PF | 50.0±23.3 | 55.0 | 82.1±15.3 | 90.0 | <.001
PFP-10 | 18.9±14.7 | 15.0 | 42.5±17.8 | 45.3 | <.001
SPPB total | 7.2±2.5 | 7.0 | 10.2±1.2 | 10.0 | <.001
Gait speed (m/s) | 0.7±0.3 | 0.6 | 1.1±0.3 | 1.0 | <.001
6MWT (m) | 206.8±133.2 | 199.7 | 412.2±122.9 | 391.0 | <.001
Men
Assessment | LE strength <20 kg (n=9), mean ± SD | LE strength <20 kg, median | LE strength ≥20 kg (n=14), mean ± SD | LE strength ≥20 kg, median | P
AM-PAC PM | 61.1±5.6 | 63.6 | 60.9±4.3 | 61.0 | .78
AM-PAC PC | 56.8±8.6 | 57.0 | 62.1±6.4 | 64.6 | .24
SF-36 PF | 61.7±19.5 | 65.0 | 67.9±17.0 | 65.0 | .43
PFP-10 | 20.6±18.3 | 13.8 | 36.0±19.6 | 32.6 | .05
SPPB total | 7.3±2.9 | 8.0 | 8.9±2.2 | 9.0 | .17
Gait speed (m/s) | 0.7±0.4 | 0.6 | 0.9±0.4 | 0.9 | .26
6MWT (m) | 203.7±124.6 | 166.0 | 380.8±113.0 | 348.0 | .01

Abbreviations: LE, lower extremity; PC, Personal Care domain; PM, Physical Mobility domain.
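The sex-specific median split behind the known-groups comparisons in tables 4 and 5 can be sketched as follows. This is a minimal Python illustration, not the authors' analysis code: the record keys (`sex`, `strength_kg`, `gait`), the function names, and the example data are hypothetical. The sketch only forms the groups and produces the mean ± SD and median summaries shown in the tables; the test used to obtain the P values is not described in this section and is not reproduced.

```python
from statistics import mean, median, stdev

def describe(values):
    """Return n, mean, SD (n - 1 denominator), and median of a list of scores.
    Raises statistics.StatisticsError if the list is empty."""
    sd = stdev(values) if len(values) > 1 else float("nan")
    return len(values), mean(values), sd, median(values)

def median_split_summary(records, outcome, split_on="strength_kg"):
    """Split each sex at its own median of `split_on` and summarise `outcome`
    in the below-median and at-or-above-median groups (hypothetical keys)."""
    summary = {}
    for sex in sorted({r["sex"] for r in records}):
        group = [r for r in records if r["sex"] == sex]
        cut = median([r[split_on] for r in group])  # sex-specific cutpoint
        below = [r[outcome] for r in group if r[split_on] < cut]
        at_or_above = [r[outcome] for r in group if r[split_on] >= cut]
        summary[sex] = {
            "cutpoint": cut,
            "below": describe(below),
            "at_or_above": describe(at_or_above),
        }
    return summary

# Hypothetical example data: strength (kg) and gait speed (m/s)
patients = [
    {"sex": "F", "strength_kg": 10, "gait": 0.5},
    {"sex": "F", "strength_kg": 12, "gait": 0.6},
    {"sex": "F", "strength_kg": 15, "gait": 0.9},
    {"sex": "F", "strength_kg": 18, "gait": 1.1},
    {"sex": "M", "strength_kg": 15, "gait": 0.6},
    {"sex": "M", "strength_kg": 25, "gait": 1.0},
    {"sex": "M", "strength_kg": 30, "gait": 1.2},
]
print(median_split_summary(patients, "gait"))
```

Patients tied exactly at the median fall in the at-or-above group, matching the "<" and "≥" column headings, so the groups can be unequal when ties occur. Passing a power variable as `split_on` would give the grouping used in table 5.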

The same pattern was found in men and women when the performance-based and self-report measures were compared between groups defined by sex-specific median lower-extremity power (128.3 W for men and 72.6 W for women) (table 5).

Table 5. Comparison of Performance-Based and Self-Report Measures of Function According to Median Power (W) at Week 12
Women
Assessment | LE Power <72.6 W (n=25) | | LE Power ≥72.6 W (n=25) | | P
 | Mean ± SD | Median | Mean ± SD | Median |
AM-PAC Physical Mobility | 56.7 ± 7.0 | 57.2 | 65.0 ± 7.3 | 62.6 | .002
AM-PAC Personal Care | 52.3 ± 7.4 | 54.1 | 62.2 ± 6.7 | 64.0 | .001
SF-36 PF | 45.5 ± 21.4 | 40.0 | 72.0 ± 23.5 | 65.0 | .001
PFP-10 | 13.2 ± 8.2 | 13.0 | 36.8 ± 15.1 | 32.6 | <.001
SPPB total | 6.6 ± 2.1 | 7.0 | 9.6 ± 1.6 | 9.0 | <.001
Gait speed (m/s) | 0.6 ± 0.2 | 0.6 | 0.9 ± 0.3 | 0.8 | <.001
6MWT (m) | 184.7 ± 102.9 | 188.5 | 339.0 ± 134.1 | 316.2 | .001
Men
Assessment | LE Power <128.3 W (n=11) | | LE Power ≥128.3 W (n=11) | | P
 | Mean ± SD | Median | Mean ± SD | Median |
AM-PAC Physical Mobility | 61.4 ± 2.8 | 60.0 | 61.8 ± 4.8 | 64.2 | .37
AM-PAC Personal Care | 59.2 ± 7.3 | 60.2 | 62.4 ± 6.4 | 64.3 | .35
SF-36 PF | 60.9 ± 15.5 | 60.0 | 71.8 ± 18.7 | 70.0 | .15
PFP-10 | 20.7 ± 9.8 | 22.7 | 41.9 ± 21.7 | 44.2 | .03
SPPB total | 8.1 ± 1.4 | 8.0 | 9.1 ± 2.5 | 10.0 | .27
Gait speed (m/s) | 0.8 ± 0.3 | 0.8 | 1.0 ± 0.5 | 1.0 | .33
6MWT (m) | 241.8 ± 97.8 | 211.5 | 406.6 ± 116.4 | 384.3 | .01

Abbreviation: LE, lower extremity.

Construct Validity 

Table 6 shows Spearman correlations between the performance-based and self-report physical function measures and lower-extremity strength, lower-extremity power, hip pain, and different domains of the SF-36 (bodily pain, vitality, role physical, role social) at week 12. Strength and power were significantly positively correlated with each of the measures. Hip pain was significantly negatively correlated with all measures except the 6MWT.

Table 6. Spearman Correlations Between Performance-Based and Self-Report Measures of Function and Strength, Power, and SF-36 Domains
Correlation Coefficient (95% CI)
Assessment | LE Strength (affected leg) | LE Power | Hip Pain | Bodily Pain | Vitality | Role-Physical | Role-Social
AM-PAC Physical Mobility | 0.46 (0.26–0.62) | 0.44 (0.23–0.62) | −0.43 (−0.60 to −0.23) | 0.51 (0.32–0.65) | 0.49 (0.30–0.64) | 0.71 (0.58–0.80) | 0.67 (0.52–0.77)
AM-PAC Personal Care | 0.54 (0.35–0.68) | 0.57 (0.39–0.71) | −0.34 (−0.52 to −0.13) | 0.44 (0.24–0.61) | 0.49 (0.31–0.65) | 0.65 (0.49–0.76) | 0.67 (0.53–0.78)
SF-36 PF | 0.49 (0.29–0.64) | 0.49 (0.29–0.65) | −0.33 (−0.51 to −0.13) | 0.43 (0.23–0.59) | 0.42 (0.23–0.59) | 0.75 (0.64–0.83) | 0.68 (0.54–0.78)
PFP-10 | 0.59 (0.42–0.72) | 0.72 (0.58–0.81) | −0.25 (−0.44 to −0.04) | 0.33 (0.12–0.51) | 0.35 (0.14–0.53) | 0.54 (0.36–0.68) | 0.48 (0.29–0.63)
SPPB | 0.44 (0.24–0.61) | 0.55 (0.36–0.69) | −0.27 (−0.46 to −0.05) | 0.40 (0.19–0.57) | 0.40 (0.20–0.57) | 0.60 (0.44–0.73) | 0.50 (0.30–0.65)
Gait speed (m/s) | 0.51 (0.32–0.66) | 0.58 (0.41–0.72) | −0.23 (−0.43 to −0.01) | 0.30 (0.09–0.49) | 0.26 (0.04–0.46) | 0.54 (0.36–0.68) | 0.42 (0.22–0.59)
6MWT | 0.62 (0.46–0.75) | 0.72 (0.59–0.82) | −0.18 (−0.39 to 0.04) | 0.35 (0.14–0.53) | 0.24 (0.02–0.43) | 0.60 (0.43–0.72) | 0.50 (0.31–0.65)

Abbreviation: LE, lower extremity.
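Spearman correlations with 95% CIs like those in table 6 are often computed by correlating average ranks and building the interval on Fisher's z scale. The Python sketch below assumes that approach; this section does not state how the authors derived their intervals, and the sample size in the usage line (n = 75) is an illustrative assumption, not a value reported here. Variants of the Fisher z standard error exist for Spearman's rho; the simple 1/√(n − 3) form is used.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def fisher_z_ci(r, n, z_crit=1.96):
    """Approximate 95% CI: transform with atanh, add +/- z_crit/sqrt(n - 3), back-transform."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Usage with an assumed sample size (n = 75 is illustrative only)
lo, hi = fisher_z_ci(0.46, 75)
print(round(lo, 2), round(hi, 2))  # 0.26 0.62
```

Because the interval is symmetric on the z scale but not on the correlation scale, the back-transformed limits are asymmetric around rho, as in table 6.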

All of the SF-36 domains were significantly positively correlated with all of the physical function measures. The role-physical and role-social domains were more strongly correlated with the measures than were the vitality and bodily pain domains.

Predictive Validity 

Table 7 presents logistic regression models of the odds of a patient or investigator Global Assessment of Improvement score of 8 or higher at week 12 for each 1-SD increase in performance-based or self-report score. All of the physical function measures were significant predictors of both patient and investigator Global Assessment of Improvement; lower-extremity power predicted the patient but not the investigator rating. The self-report measures (AM-PAC Physical Mobility scale, AM-PAC Personal Care scale, and SF-36 PF) were more predictive of patient than of investigator Global Assessment of Improvement. The performance-based measures were equally or more predictive of investigator than of patient Global Assessment of Improvement.

Table 7. Logistic Regression Models Describing the Relationship Between Performance-Based and Self-Report Measures of Function with Global Assessment of Improvement at Week 12
Assessment | Global Assessment of Improvement (patient): OR (95% CI) | Global Assessment of Improvement (investigator): OR (95% CI)
AM-PAC Physical Mobility | 5.34 (2.03–14.08) | 3.25 (1.53–6.90)
AM-PAC Personal Care | 3.64 (1.93–6.87) | 1.83 (1.12–2.99)
SF-36 PF | 4.33 (2.18–8.63) | 2.71 (1.56–4.70)
PFP-10 | 2.58 (1.48–4.50) | 2.82 (1.54–5.19)
SPPB | 2.03 (1.20–3.43) | 2.85 (1.58–5.12)
Gait speed (m/s) | 1.93 (1.15–3.25) | 3.20 (1.66–6.19)
6MWT | 2.46 (1.41–4.30) | 2.45 (1.41–4.26)
Power | 2.26 (1.25–4.09) | 1.15 (0.70–1.87)
Power2.26(1.25–4.09)1.15(0.70–1.87)

Models show the odds ratio for a week 12 Global Assessment of Improvement score ≥8 per 1-SD increase in performance-based or self-report score.
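A standardized OR expresses the change in odds per 1-SD increase in a score. The minimal Python sketch below assumes a single-predictor logistic model fitted by Newton-Raphson and standardization with the sample SD; the authors' software, any covariates, and their SD definition are not given in this section, and the data and names are hypothetical. It shows that fitting on z scores gives the same OR as exponentiating the raw-scale slope multiplied by the SD.

```python
import math
from statistics import mean, stdev

def fit_logistic_1d(x, y, max_iter=100, tol=1e-12):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson (one predictor)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score vector (log-likelihood gradient)
            g1 += (yi - p) * xi
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # inverse information times score
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1

def or_per_sd(x, y):
    """Odds ratio for a 1-SD increase in x: standardize, fit, exponentiate slope."""
    m, s = mean(x), stdev(x)
    z = [(xi - m) / s for xi in x]
    _, slope = fit_logistic_1d(z, y)
    return math.exp(slope)

# Hypothetical data: week 12 scores and a 0/1 indicator of a rating >= 8
scores = [1, 2, 3, 4, 5, 6, 7, 8]
improved = [0, 0, 1, 0, 1, 0, 1, 1]
_, b_raw = fit_logistic_1d(scores, improved)
print(or_per_sd(scores, improved), math.exp(b_raw * stdev(scores)))  # same value
```

Scaling per SD puts measures with very different units (metres walked, scale points) on a common footing, which is what allows the ORs in table 7 to be compared across measures.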

Sensitivity to Change 

The distribution-based statistics of sensitivity to change for each measure are shown in table 8. Within this sample, the ES ranged from 0.85 to 1.45, and the SRM ranged from 1.04 to 1.48 for the physical function measures (the SRM for lower-extremity power was 0.99). Both of these statistics are standardized and can be compared across measures. The standard error of measurement (SEM) and the MDC90 are expressed in the units of each measure and thus cannot be compared directly across measures. However, the percentage of the sample whose change from baseline to week 12 exceeded the MDC90 can be compared across measures (fig 1). Within this sample, this percentage was greater than 65% for all of the measures except the SPPB total score, for which it was 37%.

Table 8. Distribution-Based Measures of Responsiveness (Baseline to Week 12)
Assessment | ES (Cohen's d) | SRM | SEM (standard error of measurement) | MDC90 | Percentage Who Exceed MDC90, Baseline to Week 12 (%)
AM-PAC Physical Mobility | 1.28 | 1.43 | 1.73 | 4.02 | 90.9
AM-PAC Personal Care | 0.93 | 1.22 | 1.60 | 3.72 | 74.0
SF-36 PF | 1.45 | 1.48 | 9.81 | 22.82 | 66.7
PFP-10 | 0.95 | 1.13 | 2.21 | 5.14 | 75.0
SPPB total | 1.18 | 1.28 | 1.47 | 3.42 | 36.5
Gait speed (m/s) | 0.85 | 1.04 | 0.075 | 0.17 | 69.1
6MWT | 0.99 | 1.11 | 23.0 | 53.51 | 75.7
Power | 0.90 | 0.99 | 3.31 | 7.70 | 88.7
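The distribution-based statistics in table 8 can be outlined with the Python sketch below. It assumes the common definitions ES = mean change / baseline SD, SRM = mean change / SD of change, and MDC90 = 1.645 × √2 × SEM; the MDC90 values in table 8 are consistent with the last formula (eg, 1.645 × √2 × 1.73 ≈ 4.02 for the AM-PAC Physical Mobility scale). The reliability-based SEM formula, and counting only improvements when computing the percentage exceeding the MDC90, are assumptions; this section does not state how the authors obtained the SEM.

```python
import math
from statistics import mean, stdev

Z_90 = 1.645  # two-sided 90% confidence multiplier

def change_scores(baseline, week12):
    return [after - before for before, after in zip(baseline, week12)]

def effect_size(baseline, week12):
    """ES: mean change divided by the SD of the baseline scores."""
    return mean(change_scores(baseline, week12)) / stdev(baseline)

def standardized_response_mean(baseline, week12):
    """SRM: mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, week12)
    return mean(changes) / stdev(changes)

def sem_from_reliability(sd, reliability):
    """Assumed SEM formula: SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1.0 - reliability)

def mdc90(sem):
    """MDC90 = 1.645 * sqrt(2) * SEM; sqrt(2) reflects error at both assessments."""
    return Z_90 * math.sqrt(2.0) * sem

def percent_exceeding(baseline, week12, threshold):
    """Percentage whose improvement exceeds the threshold (improvements only: an assumption)."""
    changes = change_scores(baseline, week12)
    return 100.0 * sum(c > threshold for c in changes) / len(changes)

print(round(mdc90(1.73), 2))  # 4.02
```

Because the MDC90 is in the units of each measure, the percentage of patients exceeding it, rather than the MDC90 itself, is the quantity that can be compared across measures.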

When the percentages of subjects at the floor (ie, the lowest possible score for a measure) or the ceiling (ie, the highest possible score for a measure) were compared across measures, considerable variability was found. Floor effects were more common than ceiling effects and were seen in the SF-36 PF (10.7% at baseline, 3.9% at 12 weeks), PFP-10 (2.6% at baseline, 2.1% at 12 weeks), and SPPB (3.9% at baseline, 0% at 12 weeks). Ceiling effects were seen in the SF-36 PF only at 12 weeks (2.9%) but in the SPPB at both baseline (2.9%) and 12 weeks (9.8%). No subjects reached the floor or the ceiling at either time point on the AM-PAC Personal Care scale or the AM-PAC Physical Mobility scale.

Responsiveness 

ROC curves were created to illustrate the ability of each measure to capture change at week 12, using a patient or investigator Global Assessment of Improvement score of 8 or higher as the anchor. For each measure, a curve was plotted against each anchor (patient and investigator Global Assessment of Improvement) describing the change in performance-based or self-report score from baseline to week 12. The AUC ranged from .556 to .725 for predicting an investigator Global Assessment of Improvement of 8 or higher and from .535 to .771 for predicting a patient Global Assessment of Improvement of 8 or higher (fig 2). An AUC greater than .50 indicates that the instrument discriminates better than chance between those who have changed and those who have not, as defined by the anchor (the external criterion indicating that change has occurred). AUC values of .70 or higher are desirable.
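The AUC of an empirical ROC curve equals the probability that a randomly chosen patient who improved according to the anchor has a larger change score than a randomly chosen patient who did not, with ties counted as one half (the Mann-Whitney form). A minimal Python sketch follows; the input names and data are hypothetical, and the article does not state which software was used to compute its AUCs.

```python
def auc_mann_whitney(change, improved):
    """AUC as P(change of an improved patient > change of a non-improved
    patient), counting ties as 1/2."""
    pos = [c for c, flag in zip(change, improved) if flag]
    neg = [c for c, flag in zip(change, improved) if not flag]
    if not pos or not neg:
        raise ValueError("the anchor must classify some patients each way")
    score = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(pos) * len(neg))

# Hypothetical data: baseline-to-week-12 change scores and anchor ratings
changes = [12, 5, 9, 2, 7, 1]
ratings = [9, 8, 6, 5, 8, 7]
improved = [r >= 8 for r in ratings]   # anchor: rating of 8 or higher
print(auc_mann_whitney(changes, improved))
```

The pairwise loop is quadratic in sample size, which is unproblematic for a trial of this size; an AUC of .50 corresponds to change scores that do not separate the two anchor groups at all.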

Fig 2. Comparison of AUC for each measure based on each anchor. Abbreviations: GAI-i, physician (investigator) Global Assessment of Improvement; GAI-p, patient Global Assessment of Improvement; PC, Personal Care domain; PM, Physical Mobility domain.


Discussion 

The main finding of this study is that the validity, sensitivity to change, and responsiveness of self-report and performance-based measures of physical function are comparable in a sample of patients with recent hip fracture.

Consistent with previous studies, all measures included in this study of patients recovering from hip fracture had a high degree of concurrent, known-groups, construct, and predictive validity as measures of physical function. When concurrent validity was assessed, all measures had significant correlations with one another. The finding that measures were more strongly correlated with other measures that used the same mode of assessment (ie, self-report with self-report) was consistent with previous findings.9

Differences in all measures were found between groups known to differ. All measures demonstrated adequate known-groups validity when assistive device use was the criterion used to create the groups. However, the measures performed differently in men and women when known-groups validity was explored using a sex-specific median cutpoint of strength and power. All measures had good known-groups validity in women, but only the PFP-10 and the 6MWT differed significantly in men. The relationship between strength and physical performance in elderly subjects has been shown to be nonlinear.43 It is possible that the median strength and power values for the men in this study were above a threshold value, and that a lower cutpoint might have better discriminated performance on simple performance tasks or self-reported physical function. In men, there was a trend toward a difference between the strength and power groups for all of the measures except the AM-PAC Physical Mobility scale. It is not clear why this trend was seen for the AM-PAC Personal Care scale but not for the AM-PAC Physical Mobility scale. Given the small sample size for men, these findings may have been influenced by the unusual performance of a small number of men.

When construct validity was explored, the 3 self-report measures tended to be more strongly correlated than the performance-based measures with self-reported pain and vitality. This is not surprising, because self-report instruments capture the activities that people actually do in their day-to-day lives (what people do do), while performance-based instruments capture how well people can perform a task during a single assessment (what people can do). People may be able to ignore pain or fatigue during a performance-based assessment, yet that pain or fatigue may still limit what they accomplish in the routine activities captured by a self-report assessment.

To explore the predictive validity of the measures, regression models were created to estimate the odds of a patient or investigator Global Assessment of Improvement score of 8 or higher for each 1-SD increase in performance-based or self-report score. All measures significantly predicted both patient and investigator ratings of global improvement at 12 weeks. For the patient global assessment, the standardized ORs were higher for the self-report measures than for the performance-based measures. Although a self-report instrument (the AM-PAC Physical Mobility scale) had the highest OR for predicting the investigator global rating of improvement, the other 2 self-report measures did not perform as well as most of the performance-based measures. This could suggest that, in general, the investigators' ratings were influenced more by observing how the patients moved than by the symptoms or other information the subjects reported.

In contrast with some earlier studies that suggested superior sensitivity to change of performance-based measures,11, 12 in this study, self-report measures of physical function were as sensitive to change as performance-based measures, or slightly more so. Although all measures demonstrated large ES and SRM values, those of some self-report measures tended to be slightly higher. The AM-PAC Physical Mobility scale also had the highest percentage of subjects exceeding the MDC90. For example, more than 90% of patients exceeded the MDC90 on the AM-PAC Physical Mobility scale (a self-report measure), whereas only 37% did so on the SPPB (a widely used physical performance measure). Thus, over 90% of the subjects in this study had a change in their AM-PAC Physical Mobility scale score that was likely to reflect true change rather than measurement error, whereas just over one third had a change in their SPPB score that was likely to be real.

In the assessment of responsiveness, the AUC values suggest that each measure had acceptable ability to discriminate functional change. Although some of the measures had slightly higher AUCs and results varied with the anchor used, there was no consistent pattern suggesting that one type of measure is superior to the other.

A major strength of this study is the inclusion of a performance-based measure that assessed a physical functional construct similar to that of the self-report measures. The PFP-10, which measures performance of everyday activities such as walking, climbing stairs, and carrying groceries, could be compared with the self-reported AM-PAC Physical Mobility scale, which assesses physical difficulty with similar tasks. Thus, the comparison of the 2 modes of assessment was not confounded by differences in the difficulty or complexity of the tasks being assessed. This analysis therefore provides, for the first time, a more valid head-to-head comparison of measures that capture advanced functional mobility limitations.

Unlike previous studies comparing self-report and performance-based measures, this study went beyond assessing sensitivity to change and also evaluated responsiveness. Assessing responsiveness against 2 anchors, based on the patient's and the investigator's reports, allowed a more complete exploration of this property. Knowing the responsiveness of an instrument can further inform decisions about which tools are most likely to detect a treatment effect in a clinical trial.

Study Limitations 

Limitations of this study include the relatively small sample size. Although the sample size was adequate for most analyses, it became quite small when the analyses were broken down into subgroups (eg, men and women). In addition, only 1 performance-based measure assessed a construct similar to that of the self-report measures; the other physical performance measures assessed more basic mobility and balance activities than the self-report measures did. Another limitation was that all the participants were recruited to take part in a clinical trial. People enrolled in controlled trials are probably more homogeneous than the general population, which might result in narrower variability in performance than would be seen across the full spectrum of illness in real life. Therefore, these results might not represent what would be observed in a nonresearch setting. Finally, it might not be appropriate to generalize these findings to other patient populations or to self-report or performance measures other than those examined in this study. There is evidence that some measures perform differently in people with chronic disability than in those with acute-onset disability such as hip fracture.16, 26 The specific content of self-report and performance instruments can vary considerably, with more recently developed measures providing a more comprehensive assessment of higher-functioning patients. The wording of self-report instruments is also important, because there is evidence that measures asking about dependency might be less responsive than measures asking about difficulty.11 Therefore, the specific content of measures must be considered when comparisons are made.


Conclusions 

This study did not support prior findings that performance-based measures of physical function have superior psychometric properties when compared with self-report measures of physical function, and, in particular, did not find evidence that performance measures have superior sensitivity to change. Both measurement approaches (self-report and performance-based) produced similar and acceptable results in terms of validity, sensitivity to change, and responsiveness.

Because the psychometric properties were comparable, selection of the most appropriate type of measure as the primary endpoint for a clinical trial should focus on other criteria. In particular, feasibility of implementing a measurement protocol is a key criterion and includes considerations such as the cost of administration in terms of equipment and time, the need for the assessment to be conducted in person, and the burden on the patient. A second key criterion is the strength of the evidence linking the hypothesized mechanism of action of the study intervention to a given measure. For example, a clinical trial focused on improving lower-extremity muscle strength might choose an outcome measure that focuses on lower-extremity function, such as the SPPB or gait speed, rather than a more general functional measure such as the SF-36. Finally, it is important to decide where on the disablement spectrum to measure: regardless of the mode of administration, the complexity of the functional tasks assessed (basic vs advanced mobility) should be considered. Depending on the study, either self-reported or performance-based measures of physical function may be most appropriate. Numerous studies have concluded that self-report and performance measures offer distinct but complementary information about function, suggesting that in some trials, including both approaches to functional assessment might provide the most comprehensive assessment of function in older adults.


References 

  1. Guralnik JM, Ferrucci L, Pieper CF, et al. Lower extremity function and subsequent disability: consistency across studies, predictive models, and value of gait speed alone compared with the short physical performance battery. J Gerontol A Biol Sci Med Sci. 2000;55:M221–M231
  2. Guralnik JM, Ferrucci L, Simonsick EM, Salive ME, Wallace RB. Lower-extremity function in persons over the age of 70 years as a predictor of subsequent disability. N Engl J Med. 1995;332:556–561
  3. Haywood KL, Garratt AM, Fitzpatrick R. Quality of life in older people: a structured review of generic self-assessed health instruments. Qual Life Res. 2005;14:1651–1668
  4. Hoeymans N, Feskens EJ, van den Bos GA, Kromhout D. Measuring functional status: cross-sectional and longitudinal associations between performance and self-report (Zutphen Elderly Study 1990-1993). J Clin Epidemiol. 1996;49:1103–1110
  5. Reuben DB, Seeman TE, Keeler E, et al. Refining the categorization of physical functional status: the added value of combining self-reported and performance-based measures. J Gerontol A Biol Sci Med Sci. 2004;59:1056–1061
  6. Hoenig H, Ganesh SP, Taylor DH, Pieper C, Guralnik JM, Fried LP. Lower extremity physical performance and use of compensatory strategies for mobility. J Am Geriatr Soc. 2006;54:262–269
  7. Hoeymans N, Wouters ER, Feskens EJ, van den Bos GA, Kromhout D. Reproducibility of performance-based and self-reported measures of functional status. J Gerontol A Biol Sci Med Sci. 1997;52:M363–M368
  8. Guralnik J, Branch L, Cummings S, Curb J. Physical performance measures in aging research. J Gerontol. 1989;44:M141–M146
  9. Stretton CM, Latham NK, Carter KN, Lee AC, Anderson CS. Determinants of physical health in frail older people: the importance of self-efficacy. Clin Rehabil. 2006;20:357–366
  10. Guralnik JM, Simonsick EM, Ferrucci L, et al. A short physical performance battery assessing lower extremity function: association with self-reported disability and prediction of mortality and nursing home admission. J Gerontol. 1994;49:M85–M94
  11. Kivinen P, Sulkava R, Halonen P, Nissinen A. Self-reported and performance-based functional status and associated factors among elderly men: the Finnish cohorts of the Seven Countries Study. J Clin Epidemiol. 1998;51:1243–1252
  12. Seeman TE, Charpentier PA, Berkman LF, et al. Predicting changes in physical performance in a high-functioning elderly cohort: MacArthur studies of successful aging. J Gerontol. 1994;49:M97–M108
  13. Simonsick EM, Kasper JD, Guralnik JM, et al. Severity of upper and lower extremity functional limitation: scale development and validation with self-report and performance-based measures of physical function (WHAS Research Group. Women's Health and Aging Study). J Gerontol B Psychol Sci Soc Sci. 2001;56:S10–S19
  14. Jette A, Assmann S, Rooks D, Harris B, Crawford S. Interrelationships among disablement concepts. J Gerontol A Biol Sci Med Sci. 1998;53:M395–M404
  15. Cress M, Schechtman K, Mulrow C, Fiatarone M, Gerety M, Buchner D. Relationship between physical performance and self-perceived physical function. J Am Geriatr Soc. 1995;43:93–101
  16. Fried LP, Bandeen-Roche K, Chaves PH, Johnson BA. Preclinical mobility disability predicts incident mobility disability in older women. J Gerontol A Biol Sci Med Sci. 2000;55:M43–M52
  17. Fried LP, Tangen CM, Walston J, et al. Frailty in older adults: evidence for a phenotype. J Gerontol A Biol Sci Med Sci. 2001;56:M146–M156
  18. Liang MH. Evaluating measurement responsiveness. J Rheumatol. 1995;22:1191–1192
  19. Liang MH. Longitudinal construct validity: establishment of clinical meaning in patient evaluative instruments. Med Care. 2000;38(9):II84–II90
  20. Liang MH, Lew RA, Stucki G, Fortin PR, Daltroy L. Measuring clinically important changes with patient-oriented questionnaires. Med Care. 2002;40(4):II45–II51
  21. Terwee CB, Dekker FW, Wiersinga WM, Prummel MF, Bossuyt PM. On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res. 2003;12:349–362
  22. Fritz JM, Piva SR. Physical impairment index: reliability, validity, and responsiveness in patients with acute low back pain. Spine. 2003;28:1189–1194
  23. Onder G, Penninx BW, Lapuerta P, et al. Change in physical performance over time in older women: the Women's Health and Aging Study. J Gerontol A Biol Sci Med Sci. 2002;57:M289–M293
  24. Ostir GV, Volpato S, Fried LP, Chaves P, Guralnik JM. Reliability and sensitivity to change assessed for a summary measure of lower body function: results from the Women's Health and Aging Study. J Clin Epidemiol. 2002;55:916–921
  25. Simonsick EM, Newman AB, Nevitt MC, et al. Measuring higher level physical function in well-functioning older adults: expanding familiar approaches in the Health ABC Study. J Gerontol A Biol Sci Med Sci. 2001;56:M644–M649
  26. Fried LP, Young Y, Rubin G, Bandeen-Roche K. Self-reported preclinical disability identifies older women with early declines in performance and early disease. J Clin Epidemiol. 2001;54:889–901
  27. Studenski S, Perera S, Wallace D, et al. Physical performance measures in a clinical setting. J Am Geriatr Soc. 2003;51:314–322
  28. Verbrugge LM, Jette AM. The disablement process. Soc Sci Med. 1994;38:1–14
  29. Cress ME, Buchner DM, Questad KA, Esselman PC, deLateur BJ, Schwartz RS. Continuous-scale physical functional performance in a broad range of older adults: a validation study. Arch Phys Med Rehabil. 1996;77:1243–1250
  30. Haley SM, Andres PL, Coster WJ, Kosinski M, Ni PS, Jette A. Short-form activity measure for post-acute care. Arch Phys Med Rehabil. 2004;85:649–660
  31. Ware JE, Sherbourne CD. The MOS 36-item Short Form Health Survey (SF-36), I: conceptual framework and item selection. Med Care. 1992;30:473–483
  32. Sherrington C, Lord SR. Reliability of simple portable tests of physical performance in older people after hip fracture. Clin Rehabil. 2005;19:496–504
  33. Bean JF, Kiely DK, LaRose S, Alian J, Frontera WR. Is stair climb power a clinically relevant measure of leg power impairments in at-risk older adults? Arch Phys Med Rehabil. 2007;88:604–609
  34. Beaton DE. Understanding the relevance of measured change through studies of responsiveness. Spine. 2000;25:3192–3199
  35. Beaton DE, Bombardier C, Katz JN, Wright JG. A taxonomy for responsiveness. J Clin Epidemiol. 2001;54:1204–1217
  36. Haley SM, Fragala-Pinkham MA. Interpreting change scores of tests and measures used in physical therapy. Phys Ther. 2006;86:735–743
  37. Kennedy DM, Stratford PW, Wessel J, Gollish JD, Penney D. Assessing stability and change of four performance measures: a longitudinal study evaluating outcome following total hip and knee arthroplasty. BMC Musculoskelet Disord. 2005;6:3
  38. Altman DG. Practical statistics for medical research. 1st ed. London: Chapman and Hall; 1991
  39. Perera S, Mody SH, Woodman RC, Studenski SA. Meaningful change and responsiveness in common physical performance measures in older adults. J Am Geriatr Soc. 2006;54:743–749
  40. Cress ME, Petrella JK, Moore TL, Schenkman ML. Continuous-scale physical functional performance test: validity, reliability and sensitivity of data for the short version. Phys Ther. 2005;85:323–335
  41. McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health surveys accurate? Qual Life Res. 1995;4:293–307
  42. Ward MM, Marx AS, Barry NN. Identification of clinically important changes in health status using Receiver Operating Characteristic Curves. J Clin Epidemiol. 2000;53:279–284
  43. Buchner DM, Larson EB, Wagner EH, Koepsell TD, De Lateur BJ. Evidence for a non-linear relationship between leg strength and gait speed. Age Ageing. 1996;25:386–391

 Supported by Merck and Co.

 A commercial party having a direct financial interest in the results of the research supporting this article has conferred or will confer a financial benefit on the author or one or more of the authors. Jette has stock interests in CRE Care, LLC, which distributes the activity measure for postacute care products.

PII: S0003-9993(08)00550-9

doi:10.1016/j.apmr.2008.04.016
