Crossley KM, Bennell KL, Cowan SM, Green S. Analysis of outcome measures for persons with patellofemoral pain: which are reliable and valid? Arch Phys Med Rehabil 2004;85:815-22.
To examine the test-retest reliability, validity, and responsiveness of several outcome measures in the treatment of patellofemoral pain.
Evaluation of the clinimetric properties of individual outcome measures for patellofemoral pain treatment, using data collected from a previously published randomized controlled trial (RCT).
General community and private practice.
The data from 71 persons enrolled in an RCT of a conservative intervention for patellofemoral pain were used to evaluate the measures’ validity and responsiveness. A subset of this cohort (n=20) was used to assess reliability.
Main outcome measures
Three 10-cm visual analog scales (VASs) for usual pain (VAS-U), worst pain (VAS-W), and pain on 6 aggravating activities (walking, running, squatting, sitting, ascending and descending stairs) (VAS-activity); the Functional Index Questionnaire (FIQ); the Anterior Knee Pain Scale (AKPS); and the global rating of change.
The test-retest reliability ranged from poor (intraclass correlation coefficient [ICC]=.49) to good (ICC=.83), and the measures correlated moderately with each other (r range, .56–.72). Median change scores differed significantly between improved and unimproved persons for all measures. The effect sizes for VAS-U (.79), VAS-W (.88), and the AKPS (.98) were large, indicating greater responsiveness than the FIQ (.37) and VAS-activity (.66). Similarly, the AKPS and VAS-W were the most efficient measures for detecting a treatment effect when compared with a reference measure (VAS-U, which was assigned a value of 1). The minimal difference that patients or clinicians consider clinically important for the AKPS is 10 (out of 100) points and for the VAS it is 2cm (out of 10cm).
The AKPS and VAS for usual or worst pain are reliable, valid, and responsive and are therefore recommended for future clinical trials or clinical practice in assessing treatment outcome in persons with patellofemoral pain.
PATELLOFEMORAL PAIN IS A COMMON musculoskeletal condition that is characterized by anterior- or retropatellar pain associated with activities that load the patellofemoral joint, such as stair climbing, squatting, running, and kneeling. Thus, this common condition has an impact on many aspects of daily life, including the ability to perform exercise or work-related activities without pain. Disability, pain, and function are most often assessed in trials of interventions for patellofemoral pain. Appropriate measurement tools are essential for the clinician to effectively monitor treatment and patient response and for the researcher to make informed decisions about treatment effects in clinical trials. There is a paucity of studies that have evaluated the usefulness of outcome measures in the assessment of persons treated for patellofemoral pain.
An outcome measure must be able to evaluate change in patient or participant status over time, whether improvement or worsening.
Therefore, in addition to being reliable and valid, an outcome measure must be able to detect clinically relevant change. Indices or measures that are sensitive to change (responsive), and can thus demonstrate a greater change in a treatment group than in a placebo or control group when such a difference exists, are essential in the evaluation of interventions. There is considerable debate in the literature about the optimal method for determining the responsiveness of outcome measures and for comparing different measures; therefore, several methods are used. One method, called the relative efficiency of a measure, assesses how well an outcome measure can detect changes that result from interventions with different efficacies: the effect size (magnitude of difference between 2 treatments) obtained with an instrument is expressed as a ratio to the effect size of a reference measure. However, in patellofemoral pain (and in many other musculoskeletal disorders in which less tangible constructs such as pain and quality of life are measured), there is no criterion standard measure that can serve as the reference against which responsiveness to treatment is established. In this analysis of the relative efficiency of outcome measures in the treatment of patellofemoral pain, we therefore chose as the reference measure one that was a primary outcome measure in our previously published randomized controlled trial.
When assessing the responsiveness of a measure, its minimal clinically important difference (either improvement or worsening) can be calculated. This is important because it provides the clinician with meaningful information about the significance of differences obtained with the measure. Translating statistically significant changes in outcome measure scores into clinically relevant terms is essential in the interpretation of study results. The minimal clinically important difference quantifies the minimal difference that patients or practitioners consider clinically important and thus will indicate a relevant change in the patient’s symptoms.
Outcome measures used in patellofemoral pain trials are mostly self-administered questionnaires that address pain and disability. Despite the small number of clinical trials that have evaluated treatment effectiveness in patellofemoral pain, many outcome measures have been used to assess the effects of treatment, which probably reflects the lack of a criterion standard assessment tool for this condition. Pain is the dominant feature of patellofemoral pain, and thus the majority of trials used a pain scale to assess outcome, mostly a 10-cm visual analog scale (VAS). Some of the disability scales
The majority of clinical trials included impairment measures of muscle function, whereas participants' ability to perform functional activities, global ratings of improvement, participant satisfaction, physical activity level, and clinical evaluation were assessed less frequently.
Of the available measurement tools, several self-administered scales were selected as outcome measures in our RCT on the basis of the reliability and validity of the tool, its use in previous trials, and its relevance to patellofemoral pain. These measures included VASs for worst (VAS-W) and usual pain (VAS-U), a series of VASs measuring pain on aggravating activities (VAS-activity), the Anterior Knee Pain Scale (AKPS), and the Functional Index Questionnaire (FIQ). However, the measurement properties of these outcome measures had not been clarified. In particular, the reliability of some of the tests was debated, and no information was available regarding the responsiveness of the outcome measures.
In this study, we investigated the reliability, validity, and responsiveness of several outcome measures used to assess results of treatment for patellofemoral pain. The responsiveness was assessed on the data obtained from our RCT in which a differential response was expected in the active treatment (physiotherapy) group versus the placebo treatment group. This study provides recommendations for the use of a core set of outcome measures and determines the minimal clinically relevant difference required to detect a treatment effect for future trials and for use in clinical settings.
The specific aims of this study were to confirm the test-retest reliability of a set of outcome measures for patellofemoral pain, to evaluate the concurrent validity of the outcome measures with a participant-perceived global rating of change, to examine the responsiveness of the outcome measures, and to identify the minimal clinically important difference required to detect a treatment effect for each measure.
The analyses for this study were performed using the results of a randomized, double-blind, placebo-controlled trial that evaluated the efficacy of conservative treatment for patellofemoral pain.
In that trial, 71 participants (age range, 12–40y) were recruited from among health professionals, by advertisements, and through the media in Melbourne, Australia. For all participants, outcome assessment was performed by 1 of the 2 investigators (KMC, SMC), who were blinded to group allocation. For each questionnaire, standardized instructions were given to all participants about its purpose, and the requirements for its completion were explained. Investigators were trained to provide standard responses to questions about each measure and to check the completed questionnaires for missing data points and for appropriateness of the responses. These measures were completed at baseline and at the conclusion of the 6-week treatment program. The participants were randomized into 1 of 2 treatment groups: the conservative treatment was a physiotherapy intervention (quadriceps muscle retraining, patellar tape, stretching, education) and the placebo treatment consisted of a sham ultrasound.
To evaluate test-retest reliability, a subset of 20 consecutive participants was selected from the larger cohort. In addition to the baseline assessment performed for all trial participants, this subset completed a second set of questionnaires within 7 days of the original assessment but before their first appointment for a treatment session. Participants also recorded in a yes-no format whether their pain had increased or decreased after they had completed the first set of questionnaires. The questionnaires were sealed in envelopes and returned to the investigator. The outcome measures are summarized as follows.
Overall assessment of pain
Participants’ overall assessment of pain was measured with a 10-cm VAS for their worst (VAS-W) and usual (VAS-U) pain in the past week.
The AKPS is a 13-item questionnaire with discrete categories related to various levels of current knee function. Categories within each item are weighted, and responses are summed to provide an overall index in which 100 represents no disability.
Global ratings of change by the participant
A 5-point scale was used to determine the global rating of change compared with baseline (1, marked worsening; 2, moderate worsening; 3, same; 4, moderate improvement; 5, marked improvement).
To calculate the degree of error in absolute terms and to express it in the scale of each outcome measure, the standard error (SE) of measurement was calculated.
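The article does not state which formula it used for the SE of measurement. A minimal sketch, assuming the common formula SEM = SD × √(1 − ICC) (SD being the between-participant SD of the scores) and the usual 95% minimal detectable change of 1.96 × √2 × SEM, might look like this; the function names and inputs are illustrative only.

```python
import math


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC).

    The article does not name its formula; this is a common choice.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem_value: float, z: float = 1.96) -> float:
    """Smallest test-retest difference that exceeds measurement error ~95% of the time.

    The sqrt(2) term reflects that both the test and the retest carry error.
    """
    return z * math.sqrt(2.0) * sem_value


if __name__ == "__main__":
    # Hypothetical inputs: SD of 2.0 cm on a 10-cm VAS and ICC = 0.75.
    s = sem(2.0, 0.75)
    print(f"SEM = {s:.2f} cm, MDC95 = {minimal_detectable_change(s):.2f} cm")
```

Whether a published threshold uses 1.96 × SEM or 1.96 × √2 × SEM changes its size by about 41%, so the convention should be stated whenever such thresholds are reported.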
For each outcome measure, the change from baseline to final assessment was calculated. Concurrent validity of the outcome measures was established by correlating the mean change scores with the participant-perceived global rating of change, using the Spearman correlation coefficient.
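As a sketch of this validity analysis, Spearman's ρ can be computed as the Pearson correlation of average ranks, which handles the many ties in a 5-point global rating. This stdlib-only version is illustrative, not the software used in the study, and the data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with at least 3 values")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical AKPS scores and 5-point global ratings for 5 participants.
    baseline = [60, 70, 65, 72, 55]
    final = [80, 72, 66, 90, 60]
    rating = [5, 3, 3, 5, 4]
    change = [f - b for b, f in zip(baseline, final)]  # final minus baseline
    print(round(spearman_rho(change, rating), 3))  # 0.949
```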
Baseline comparability of groups
Comparison of characteristics of the treatment groups at baseline was performed with chi-square analyses.
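For a 2-group × 2-category characteristic, the chi-square comparison reduces to a 2×2 table. The article does not say whether a continuity correction was applied; this sketch applies none. As an illustration it uses the counts of women and men per group given in the Results, and the p-value uses the identity that a 1-df chi-square is the square of a standard normal.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and its 1-df p-value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row = [a + b, c + d]
    col = [a + c, b + d]
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row[i] * col[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 df, chi2 = z**2, so p = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


if __name__ == "__main__":
    # Rows: physiotherapy, placebo; columns: women, men.
    chi2, p = chi_square_2x2([[23, 13], [23, 12]])
    print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
```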
Responsiveness (sensitivity to change)
For this study, participant-perceived rating of change was used as the global status reference measure for correlation with outcome measures.
The VAS-U outcome measure was selected as the reference measure for determining the relative efficiency of each outcome measure because it was the primary outcome measure in our clinical trial, it is a traditional end point in other patellofemoral pain trials, and it is known to be sensitive to change.
The 3 methods of determining responsiveness were as follows.
Comparison of median scores
The results of the participant-perceived global rating of change were dichotomized into 1 (significantly worse, moderately worse, same) or 2 (moderately improved, significantly improved). Mann-Whitney U tests were then used to determine whether median changes in the outcome measures discriminated between participants who reported being worse or the same and those who reported being improved.
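The dichotomization and the Mann-Whitney comparison can be sketched as follows. This assumes change scores are oriented so that a larger value means improvement (for a VAS, baseline minus final), counts ties as ½, and uses a normal approximation without tie correction, so p-values may differ slightly from a statistics package. The function names and data are hypothetical.

```python
import math
from statistics import median


def dichotomize(rating: int) -> int:
    """Map the 5-point global rating to 1 (worse or same) or 2 (improved)."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("global rating must be 1..5")
    return 1 if rating <= 3 else 2


def mann_whitney_u(improved, not_improved):
    """U for 'improved' vs 'not improved' (ties count 1/2), with a two-sided
    normal-approximation p-value and no tie correction."""
    n1, n2 = len(improved), len(not_improved)
    u = 0.0
    for x in improved:
        for y in not_improved:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, z, p


if __name__ == "__main__":
    # Hypothetical change scores (positive = improvement) and global ratings.
    changes = [12, 3, -2, 15, 0, 10, 8, 1]
    ratings = [5, 3, 2, 4, 3, 5, 4, 3]
    improved = [c for c, r in zip(changes, ratings) if dichotomize(r) == 2]
    not_improved = [c for c, r in zip(changes, ratings) if dichotomize(r) == 1]
    print(median(improved), median(not_improved))  # 11.0 0.5
    print(mann_whitney_u(improved, not_improved))
```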
Effect sizes were compared across the outcome measures, with effect size calculated by 2 methods: relative treatment effect and standardized effect size (SES).
The relative treatment effect is the ratio of the observed treatment effect (mean change score of active treatment [$d_a$] minus mean change score of placebo treatment [$d_p$]) to the mean change from baseline in the placebo group, ie, $(d_a - d_p)/d_p$.
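The relative treatment effect is a single ratio; a minimal sketch with hypothetical mean changes is shown below. Because the numerator and denominator share the placebo change, the ratio is positive whenever the active change exceeds the placebo change in the same direction, whether the scale improves upward (AKPS) or downward (VAS). It becomes unstable when the placebo change is close to zero.

```python
def relative_treatment_effect(d_active: float, d_placebo: float) -> float:
    """(d_a - d_p) / d_p: treatment effect relative to the mean placebo change."""
    if d_placebo == 0:
        raise ZeroDivisionError("undefined when the mean placebo change is zero")
    return (d_active - d_placebo) / d_placebo


if __name__ == "__main__":
    # Hypothetical mean VAS-U changes (cm): active -3.0, placebo -1.5.
    print(relative_treatment_effect(-3.0, -1.5))  # 1.0
```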
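The SES is the ratio of the treatment effect (mean change score of active treatment minus mean change score of placebo treatment) to the pooled standard deviation (SD) of the change scores. The pooled SD combines the SDs of the changes in scores from baseline to final assessment in the active and placebo groups, ie,

$$\mathrm{SES} = \frac{d_a - d_p}{SD_{\text{pooled}}}, \qquad SD_{\text{pooled}} = \sqrt{\frac{(n_a - 1)\,SD_a^2 + (n_p - 1)\,SD_p^2}{n_a + n_p - 2}}$$

where $d_a$ and $d_p$ are the mean change scores, $SD_a$ and $SD_p$ are the SDs of the change scores, and $n_a$ and $n_p$ are the numbers of participants in the active and placebo groups, respectively.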
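A minimal sketch of the SES, assuming the standard (n − 1)-weighted pooled SD of the change scores (the article's original formula did not survive extraction), with hypothetical data:

```python
import math
from statistics import mean, stdev


def pooled_sd(sd_a: float, n_a: int, sd_p: float, n_p: int) -> float:
    """Pooled SD of change scores, weighting each group's variance by n - 1."""
    return math.sqrt(((n_a - 1) * sd_a**2 + (n_p - 1) * sd_p**2) / (n_a + n_p - 2))


def standardized_effect_size(change_active, change_placebo) -> float:
    """(mean active change - mean placebo change) / pooled SD of the change scores."""
    d_a, d_p = mean(change_active), mean(change_placebo)
    sd = pooled_sd(stdev(change_active), len(change_active),
                   stdev(change_placebo), len(change_placebo))
    return (d_a - d_p) / sd


if __name__ == "__main__":
    # Hypothetical AKPS change scores (final minus baseline) in each group.
    active = [4, 6, 8]
    placebo = [1, 3, 5]
    print(standardized_effect_size(active, placebo))
```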
The relative efficiency represents the ability of an outcome measure to detect a change with interventions that have different efficacies. The effect size of each tool is expressed as a ratio to the effect size of a reference measure; the reference measure chosen was the VAS-U. The relative efficiency to detect a difference among treatment groups was calculated as the square of the ratio of the SES of each instrument to the SES of VAS-U, ie,

$$\mathrm{RE} = \left( \frac{\mathrm{SES}_{\text{instrument}}}{\mathrm{SES}_{\text{VAS-U}}} \right)^2$$

A relative efficiency greater than 1 implies that the instrument is better than usual pain (10-cm VAS) at detecting a treatment effect. The relative efficiency can be directly compared between instruments.
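The relative efficiency calculation can be sketched in a few lines. The SES values below are hypothetical, and the reference measure's own relative efficiency is 1 by construction. Squaring is commonly motivated by the fact that the sample size needed to detect an effect scales roughly with 1/SES², so the relative efficiency approximates how many times larger a trial using the reference measure would need to be.

```python
def relative_efficiency(ses_instrument: float, ses_reference: float) -> float:
    """Square of the ratio of an instrument's SES to the reference measure's SES."""
    if ses_reference == 0:
        raise ZeroDivisionError("reference SES must be non-zero")
    return (ses_instrument / ses_reference) ** 2


if __name__ == "__main__":
    # Hypothetical SES values; VAS-U is the reference measure.
    ses = {"VAS-U": 0.80, "measure A": 1.00, "measure B": 0.40}
    for name, value in ses.items():
        print(f"{name}: RE = {relative_efficiency(value, ses['VAS-U']):.2f}")
```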
Ascertaining the minimal clinically important difference
The evaluation of responsiveness permits a determination of the minimal clinically relevant change required to detect a treatment effect. As with responsiveness, there is no accepted method of determining this score. The minimal clinically important difference for each outcome measure was calculated using 2 methods: median change scores and receiver operating characteristic (ROC) curves.
The median change scores (and 95% confidence intervals [CIs] of these scores) were calculated for each of the participant-perceived global rating of change categories (significantly worse, moderately worse, same, moderately improved, significantly improved).
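The article does not say how the 95% CIs of the medians were computed. One common distribution-free option, assumed in this sketch, takes order statistics $X_{(r)}$ and $X_{(n-r+1)}$, choosing the largest $r$ for which $2P(B \le r-1) \le 0.05$ with $B \sim \mathrm{Binomial}(n, 0.5)$. The grouping helper and its data are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

CATEGORIES = {1: "significantly worse", 2: "moderately worse", 3: "same",
              4: "moderately improved", 5: "significantly improved"}


def median_ci(values, confidence=0.95):
    """Distribution-free CI for a median from order statistics (binomial, p = 0.5).

    Returns (low, high), or None when n is too small for the requested coverage.
    """
    x = sorted(values)
    n = len(x)
    alpha = 1.0 - confidence
    r = 0
    cumulative = 0.0
    # Find the largest r with 2 * P(B <= r - 1) <= alpha.
    for k in range(n):
        cumulative += math.comb(n, k) / 2**n
        if 2 * cumulative <= alpha:
            r = k + 1
        else:
            break
    if r == 0:
        return None
    return x[r - 1], x[n - r]


def summarize_by_category(changes, ratings):
    """Median change score and its CI for each global-rating category present."""
    groups = defaultdict(list)
    for change, rating in zip(changes, ratings):
        groups[rating].append(change)
    return {CATEGORIES[r]: (median(v), median_ci(v)) for r, v in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical AKPS changes for 9 participants who all rated themselves "moderately improved".
    print(summarize_by_category([2, 5, 8, 9, 10, 11, 12, 15, 20], [4] * 9))
```

Small categories (fewer than 6 participants) cannot reach 95% coverage with this method, which is why the sketch returns None for them rather than a misleadingly narrow interval.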
The sensitivity and specificity of change scores for each outcome measure to detect improvement were calculated (fig 1). The sensitivity (true-positive rate) was plotted against the false-positive rate (1 − specificity). The change score closest to the upper left-hand corner of the graph represents the best cutoff for distinguishing improvement from no improvement.
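A sketch of the ROC cutoff selection, assuming each observed change score is tried as a cutoff ("improved if change ≥ cutoff") and "closest to the upper left-hand corner" means the smallest Euclidean distance from the point (0, 1). The article does not specify either detail, and the data are hypothetical.

```python
import math


def roc_points(changes, improved):
    """(cutoff, sensitivity, specificity) for each candidate rule 'change >= cutoff'.

    `changes` are oriented so larger means better; `improved` holds the dichotomized
    global rating (True = moderately or significantly improved).
    """
    positives = sum(improved)
    negatives = len(improved) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both improved and unimproved participants")
    points = []
    for cutoff in sorted(set(changes)):
        tp = sum(1 for c, y in zip(changes, improved) if y and c >= cutoff)
        tn = sum(1 for c, y in zip(changes, improved) if not y and c < cutoff)
        points.append((cutoff, tp / positives, tn / negatives))
    return points


def best_cutoff(changes, improved):
    """Point whose (1 - specificity, sensitivity) lies closest to (0, 1)."""
    return min(
        roc_points(changes, improved),
        key=lambda p: math.hypot(1.0 - p[1], 1.0 - p[2]),
    )


if __name__ == "__main__":
    # Hypothetical AKPS change scores and dichotomized global ratings.
    changes = [0, 1, 2, 3, 7, 5, 8, 10, 12]
    improved = [False] * 5 + [True] * 4
    print(best_cutoff(changes, improved))  # (5, 1.0, 0.8)
```

If 2 cutoffs are equally close to the corner, `min` keeps the smaller one; a study should state how it breaks such ties.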
Of the 71 participants who enrolled in the trial, 67 (33 physiotherapy, 34 placebo) completed the final assessment and were included in the analyses of the validity and responsiveness of outcome measures. Subject characteristics are described in table 1. There were 23 women and 13 men in the physiotherapy group and 23 women and 12 men in the placebo group. There were no differences in the frequency of leg dominance or side of worst pain between the 2 groups. Visual analysis of baseline data revealed no difference between the 67 participants who did and the 4 who did not complete the trial.
Of the 20 participants who completed the reliability study, 3 reported that their symptoms had changed between the first and second assessments. The results of the paired t test, ICC(3,1), and SE of measurement for the 17 participants whose symptoms were unchanged are presented in table 2. There was no significant difference in mean scores between test 1 and test 2, indicating negligible systematic error between the 2 administrations. The ICC(3,1) values ranged from .49 for the FIQ to .83 for the VAS-activity scale, indicating poor to excellent agreement between scores on test 1 and test 2.
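For readers reproducing the reliability analysis, ICC(3,1) (Shrout and Fleiss: 2-way mixed model, consistency, single measures) can be computed from a 2-way ANOVA decomposition, and the paired t statistic from the test-retest differences. This stdlib-only sketch is illustrative; the data layout (1 row per participant, 1 column per administration) is an assumption.

```python
import math
from statistics import mean


def icc_3_1(scores):
    """ICC(3,1) = (BMS - EMS) / (BMS + (k - 1) * EMS).

    `scores` is a list of rows, one per participant, each with k repeated measurements.
    """
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                # between-participants mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


def paired_t(test1, test2):
    """Paired t statistic for test 2 minus test 1 (df = n - 1); needs nonconstant differences."""
    diffs = [b - a for a, b in zip(test1, test2)]
    n = len(diffs)
    d_bar = mean(diffs)
    sd = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
    return d_bar / (sd / math.sqrt(n))


if __name__ == "__main__":
    # Hypothetical VAS-U scores (cm) at test 1 and test 2 for 5 participants.
    test1 = [4.0, 6.5, 3.0, 7.0, 5.5]
    test2 = [4.5, 6.0, 3.5, 6.5, 5.0]
    print(icc_3_1([list(p) for p in zip(test1, test2)]), paired_t(test1, test2))
```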
Based on the size of the SE of measurement, a change of 1.0cm (12%) in the total possible score for the VAS-U, VAS-W, and VAS-activity; 7 points (6%) in the AKPS score; and 6 points (16%) in the FIQ are required to represent a change greater than the error associated with the questionnaire 95% of the time in this patellofemoral pain cohort.
Table 2. Summary of Test-Retest Reliability of the Outcome Measures for Patellofemoral Pain
The mean changes (final minus baseline assessment scores) for the outcomes correlated moderately with the participant-perceived global rating of change (table 3). In addition, mean changes for the VAS-U and AKPS correlated with mean changes in the other outcome measures. Except for the correlation between the VAS-U and the FIQ, the other outcome measures correlated slightly more strongly with the VAS-U and AKPS than with the global rating of change. All correlations were significant and moderate.
Table 3. Spearman ρ Correlation Coefficient Demonstrating Relationship Between Change Scores of Outcome Measures and Global Rating of Change (n=67)
Responsiveness was calculated using 3 different methods that yielded slightly different results. For each measure, the median change score could distinguish between the group of participants who improved and those who were worse or remained the same (table 4). The relative treatment effect was calculated for each measure. The AKPS (1.15), VAS-W (1.09), and VAS-U (.95) demonstrated the highest effect sizes, indicating greater responsiveness than the VAS-activity (.76) and the FIQ (.49). Similar results were obtained by calculating the SESs, and they are represented by the relative efficiency of the outcome measures (fig 2) to detect a treatment effect when compared with the difference in usual pain on a 10-cm VAS. The VAS-activity and FIQ were the least responsive outcome measures.
Table 4. Changes in Outcome Measures Corresponding to Global Ratings of Change
Minimal clinically important difference of outcome measures
For most of the outcome measures, the participant-perceived response category “same” corresponded to a mean change that approximated zero. The larger the change, as assessed by the global rating, the larger the change in each outcome measure (table 5). The median change score required for a participant to describe his/her condition as improved was 2cm for VAS-U and VAS-W, 13cm for VAS-activity, 10 points for AKPS, and 2 points for FIQ. These figures may be more meaningful when described as a percentage of the total scale. The change score required to detect an improvement represented 20% of the total score for the VAS-U and VAS-W, 21% of the VAS-activity, 10% of the AKPS, and 13% of the FIQ.
Table 5. Summary of Empirical Testing of Patellofemoral Pain Specific Outcome Measures
The ROC curves (fig 1) enable visual identification of the change score that represents the best cutoff for distinguishing improved from unimproved participants. The change score closest to the upper left-hand corner of the figure was 1.5cm for VAS-U, 2cm for VAS-W, 8cm for VAS-activity, 8 points for the AKPS, and 1 point for the FIQ. These change scores are similar to, but slightly smaller than, those identified by the median change method.
This study evaluated the outcome measures used in a clinical trial of treatment for patellofemoral pain. The results indicate that the VAS-U and the AKPS were the most responsive outcome measures in this patient population. The minimal clinically relevant difference required to detect a treatment effect was calculated for each measure, thus enabling recommendations for future trials. The following is a discussion of the results of the empirical testing of the outcome measures (reliability, validity, responsiveness), the importance of these findings for patellofemoral pain trials, and recommendations for future studies.
Reliability of outcome measures for patellofemoral pain
The outcome measures demonstrated moderate to good reliability, with the exception of the FIQ. The variability in reliability between outcome measures may indicate that the measures have different stability over time. This may reflect a varying responsiveness to minor changes in symptoms, which may not have been identified by the participants when asked to indicate if their patellofemoral pain symptoms had changed. Because patellofemoral pain is associated with subtle daily variations in pain, it is unlikely that high reliability correlations will be found in this population.
The SEs of measurement and their 95% CIs indicate the magnitude of the error in absolute terms, in the units of each scale, which is more useful to both clinicians and researchers than a relative index such as the ICC. The SEs of measurement were reasonably modest and indicate the amount of change required to detect a change greater than that expected from measurement error alone. The results suggest that, in a homogeneous cohort of patellofemoral pain patients with similar demographics, the amount of change needed may not be as large as that suggested by previous trials.
However, these results may not generalize to other populations. Test-retest reliability matters in relation to the desired or obtained effect size: a low ICC and a large SE of measurement mean that a larger score change is needed to be confident that the effect is due to the intervention. Given the lack of consistency in test results, we recommend that future trials assume a higher SE of measurement than those obtained in this trial, or conduct reliability studies to establish population-specific ICCs and SEs of measurement.
Changes in the outcome measures correlated only moderately with improvement as measured by the global rating of change. Thus, no outcome measure perfectly reflected the participants' perception of change due to treatment. Because the perceived global response encompasses many aspects of participants' symptoms (including perceptions of pain, disability, and functional impairment), strong correlations were not expected with outcome measures that focus on 1 or 2 aspects of patellofemoral pain or that emphasize different domains. Clinical trials and clinical practice should therefore continue to incorporate a participant-perceived response to treatment in addition to other outcome measures.
Comparison between outcome measures revealed that changes in usual pain and the AKPS correlated moderately with changes in the other measures, indicating that change in 1 outcome measure cannot be fully explained by change in another. The most closely correlated measures were the VAS-U with the VAS-W (.72) and the AKPS with the VAS-activity (.74). This suggests that the outcome measures capture different features of patellofemoral pain. It was expected that the 2 pain measures (VAS-U, VAS-W) would correlate. However, because the AKPS combines questions about function and disability with pain, whereas the VAS-U and VAS-W describe pain without reference to function, it is not surprising that high correlations were not found. In addition, the different formats (numeric scale vs multiple choice) and different recall timeframes might affect the correlations.
Because there is no accepted means of establishing responsiveness of the outcome measures, this was examined using several methods. Comparison of the median changes in scores indicated that all of the outcome measures could distinguish between those participants who did or did not improve. Further analyses of the effect sizes revealed that of the outcome measures, the VAS-U, VAS-W, and AKPS exhibited the largest effect sizes. Last, the SESs for each outcome measure were compared with the SES for the VAS-U. This revealed that the most efficient measure in detecting a treatment effect relative to the VAS-U was the AKPS (relative efficiency = 1.24). It also had a large SES (.98). The FIQ performed poorly, exhibiting the least efficiency (.18) and a small relative effect size. The FIQ is a simple, 8-question index, but the response categories are narrow, offering only “no problem,” “can do with problems,” or “cannot perform.” It is not surprising that this scale was the least responsive of the outcome measures.
Issues impacting on choice of outcome measures
In addition to the clinimetric properties of an outcome measure, the mode of data collection and the practicalities of administration must be considered when choosing outcome measures. The literature is divided on whether the mode of data collection (self- or interviewer-administered) affects the results; most studies have found no differences when the interviewers are well trained.
The advantages of self-administered questionnaires include persons' willingness to describe their complaints, fewer time constraints on thinking about their responses, avoidance of interviewer bias, and conservation of interviewer time. Their disadvantages, which include more missing data points and language or reading restrictions, can be minimized by having the interviewer check the form and clarify any questions.
Another factor to consider is the practicality of administering the questionnaire. A useful outcome measure must have low responder burden: it should be easy to understand, concise, and time-efficient, and it should encompass the patient's symptoms. In our recent study, participants described the AKPS as easy to understand and as reflecting their symptoms well.
Recommendations for choice of outcome measures for patellofemoral pain
Table 5 summarizes the results of the empirical testing performed on the outcome measures. Based on the 3 methods of determining responsiveness, it appears that the VAS-U, VAS-W, and AKPS are the most valid and responsive outcome measures for patellofemoral pain.
Because pain is the dominant feature of patellofemoral pain, the amount of knee pain is paramount in assessing treatment outcome. A 10-cm VAS is commonly used to assess pain; we chose to evaluate both VAS-U and VAS-W, the primary measures of pain. Both have been used as outcome measures in previous trials of interventions for patellofemoral pain.
Although their reliability appears moderate, they seem responsive to change in participants’ conditions. They both exhibit large effect sizes and can distinguish between improved and nonimproved participants. The VAS-W was less efficient than the VAS-U in determining change resulting from treatment, but the VAS-U was less reliable. Assessments of the minimal clinically important difference for the VAS indicate that a change of 1.5 to 2.0cm (15%–20%) is required to detect improvement.
Patellofemoral pain frequently leads to disability, which often means difficulty performing activities that load the patellofemoral joint, including stair climbing, squatting, and running. These activities are evaluated in the AKPS. The test-retest reliability of this index was high, and it appears to be a useful measure because it was responsive to change. It had a large effect size, could differentiate between improved and nonimproved participants, and was more efficient than the VAS-U as an outcome measure. A change of between 8 and 10 points in the AKPS is required to detect an improvement.
Rationale for exclusion of selected outcome measures for patellofemoral pain
Of the outcome measures evaluated, the FIQ and the VAS-activity index were the least responsive and should not be included in an outcome set for patellofemoral pain. Intuitively, the VAS-activity appeared to be an appropriate outcome measure for patellofemoral pain. Although it had been used previously in only 1 trial, it targeted 6 common aggravating activities and asked participants to indicate their pain severity on each activity. Although it was reliable and valid, it was a less efficient measure than the AKPS and VAS and thus appears to contribute little to an outcome measures set.
Implications for clinical practice
Based on the results of this study, 3 outcome measures (AKPS, VAS-U, VAS-W) can be used with confidence in the clinical setting. Consider a hypothetical patient with patellofemoral pain who consults a clinician. At baseline, the 2 questionnaires (AKPS, VAS) can be administered and filed in 5 minutes. Scores of 70 on the AKPS and 6cm on the VAS would imply a moderate amount of pain and disability. If, after treatment, the AKPS score has increased to 80 and the VAS score has decreased to 4cm, the clinician can be confident that these changes are greater than the error associated with the measures (SE of measurement), that they reflect real change that can be attributed to the intervention, and that they are meaningful to the patient.
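The clinical example above can be expressed as a small decision rule. This sketch takes its thresholds from this article: 7 points (AKPS) and 1.0cm (VAS) as the change exceeding measurement error, and 10 points and 2cm as the minimal clinically important difference. The function name, the "AKPS"/"VAS" keys, and the 3 labels are hypothetical, and the single "VAS" entry assumes the VAS-U and VAS-W thresholds are interchangeable.

```python
# Thresholds reported in this article; the dictionary keys are illustrative names.
ERROR_THRESHOLD = {"AKPS": 7.0, "VAS": 1.0}  # change exceeding measurement error (95%)
MCID = {"AKPS": 10.0, "VAS": 2.0}            # minimal clinically important difference


def interpret_change(measure: str, baseline: float, final: float) -> str:
    """Classify a patient's change on the AKPS (0-100) or a 10-cm pain VAS."""
    if measure not in MCID:
        raise ValueError("measure must be 'AKPS' or 'VAS'")
    # Orient so positive = improvement: AKPS scores rise, VAS pain scores fall.
    improvement = final - baseline if measure == "AKPS" else baseline - final
    if improvement >= MCID[measure]:
        return "clinically important improvement"
    if improvement > ERROR_THRESHOLD[measure]:
        return "improvement beyond measurement error, below MCID"
    return "no improvement beyond measurement error"


if __name__ == "__main__":
    # The hypothetical patient from the text: AKPS 70 -> 80, VAS 6cm -> 4cm.
    print(interpret_change("AKPS", 70, 80))   # clinically important improvement
    print(interpret_change("VAS", 6.0, 4.0))  # clinically important improvement
```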
There is no criterion standard outcome measure to assess outcomes of treatment for patellofemoral pain. Therefore, clinical trials have traditionally used a combination of outcome measures designed to address the pain, disability, and functional capacity of participants with patellofemoral pain. This study investigated the reliability, validity, and responsiveness of several outcome measures. The results of these tests indicate that the VAS for usual or worst pain and AKPS are the most valid and responsive outcome measures of treatment for patellofemoral pain. A change in these measures of 10 points (out of 100) on the AKPS and 2cm on a 10-cm VAS reflects real change in patient symptoms.
a. Microsoft Corp, One Microsoft Way, Redmond, WA 98052-6399.
b. SPSS Inc, 233 S Wacker Dr, 11th Fl, Chicago, IL 60611.