A Pilot Study to Modify the SF-36V Physical Functioning Scale for Use With Veterans With Spinal Cord Injury

Presented to the VA Health Services Research and Development Service Annual Research Conference, 2005, Baltimore, MD, and the American Paraplegia Society, 2005, Las Vegas, NV.

Abstract

Luther SL, Kromrey J, Powell-Cope G, Rosenberg D, Nelson A, Ahmed S, Quigley P. A pilot study to modify the SF-36V physical functioning scale for use with veterans with spinal cord injury.

Objective: To develop a valid and reliable spinal cord injury (SCI) specific physical functioning (PF) scale for the Veterans Health Administration (VHA) version of the 36-Item Short-Form Health Survey.

Design: A mixed qualitative and quantitative research design was used. In phase 1, a pool of SCI-specific PF items was generated based on focus groups with patients and health care providers. In phase 2, the psychometric properties of the SCI-specific PF scale were established.

Participants: The sample consisted of valid responses from 359 veterans with traumatic SCI who were seen at a VHA SCI center during the prior year (2002).

Interventions: Not applicable.

Main Outcome Measure: Physical functioning in people with SCI.

Results: Exploratory factor analysis was conducted separately on respondents with lower neurologic-level injuries (paraplegia, 53% [n=190]) and those with higher neurologic-level injuries (tetraplegia, 45% [n=163]) and identified 9 items loading on 1 factor in both groups. These 9 items were included in separate item response theory (IRT) model analyses for each subgroup. Based on the IRT analysis, 1 item was eliminated, resulting in an 8-item, SCI-specific PF scale.

Conclusions: Although several of the items in the SCI-specific PF scale showed floor effects, particularly in people with tetraplegia, we found excellent reliability and strong support of convergent and divergent validity of the scale.
THE MEDICAL OUTCOMES STUDY 36-Item Short-Form Health Survey (SF-36) is a generic measure of health-related quality of life (HRQOL). The tool comprises 36 items that yield 8 scales (physical functioning [PF], role–physical, bodily pain, general health, vitality, social functioning, role–emotional, mental health) and 2 summary scores (the physical component score [PCS], the mental component score [MCS]). Validity and reliability for the SF-36 have been established through a series of studies both in the United States and internationally.1, 2, 3, 4, 5, 6 The SF-36 has been used to compare the relative burden of disease and differentiate the health benefits across a wide range of diseases. Currently there are more than 5000 published articles of research using the SF-36. Among the diseases most commonly studied with the SF-36, with more than 100 articles each, were arthritis, asthma, back pain, cancer, cardiovascular disease, migraine headache, human immunodeficiency virus and acquired immune deficiency syndrome, kidney disease, low back pain, multiple sclerosis, musculoskeletal conditions, osteoarthritis, renal disease, rheumatoid arthritis, stroke, and surgical procedures.7
A version of the SF-36 (SF-36V) has been designed for use with Veterans Health Administration (VHA) ambulatory care populations.8, 9 The SF-36V modified the original SF-36 by increasing the number of response options for items measuring role, thereby reducing problems with floor and ceiling effects in the 2 subscales measuring role functioning. Discriminant validity indicated that the modified scales have greater explanatory power than the unmodified scales for measuring disease burden in VHA populations.10
The VHA uses the SF-36V primarily as a monitor of self-reported health in patient populations and as an outcome measure to evaluate programs, population health, health services, and clinical interventions. Research has shown that wording on the SF-36 PF scale may be inappropriate (and even offensive) for people with spinal cord injury (SCI) who use a wheelchair for mobility,11 a large and important patient population in the VHA. For example, the PF scale includes items about the respondents’ assessment of their own abilities to “walk” 90m (100y), several hundred meters, and 1.6km (1 mile). Respondents are also asked about their abilities to “climb” 1 or several flights of stairs. Meyers and Andresen12 suggested that the PF scale could be made appropriate for people with SCI with relatively minor modifications such as substituting the word “go” for “walk” or “climb.” Although others have observed this need, to date no SCI-specific modifications of the SF-36 have been published. Adapting the SF-36V PF scale to be appropriate for people with SCI ensures that these people will not be excluded from important research studies and program evaluation efforts12 in the VHA.
The purposes of this study were to modify the SF-36V PF scale for use with veterans with SCI and to evaluate the psychometric properties of the new scale.
Methods  Study Design A 2-phase mixed qualitative and quantitative research design was used. Approval from appropriate research groups at the Tampa VHA and the institutional review board of the University of South Florida was obtained before commencing the study. Phase 1: sample Convenience samples of health care providers working in the SCI center and veterans with SCI at the Tampa VHA were recruited to participate in focus groups. Inclusion criteria for the patient focus groups included discharge from the spinal cord service and a score of 5 or higher on the Uniform Data System for Medical Rehabilitation’s FIM instrument for comprehension, problem solving, and memory. Exclusion criteria included people on ventilators and those with a score of 1 to 4 (totally dependent to minimum assistance) on the FIM measure for comprehension, problem solving, and memory. All professional health care providers were eligible for the provider focus groups including physicians, registered nurses, physical therapists, kinesiotherapists, social workers, and occupational therapists. The inclusion criterion for providers was at least 1 year of experience working with SCI patients. Three focus groups were conducted with SCI inpatients and 3 with groups of professionals. Each group had 3 to 8 members, with a total of 22 patients and 13 providers participating. The focus group facilitator described each item from the standard version of the SF-36V PF scale and elicited “SCI-appropriate” wording. Focus groups lasted 1 to 1.5 hours and were audiotape recorded and transcribed for subsequent analysis. The focus group assistant used a flip chart in full view of the respondents to document comments generated by the group. Wording for the PF scale items was modified based on the results of the focus groups. Phase 2: sample In phase 2, a cross-sectional survey using a self-administered written questionnaire was conducted of people with SCI who receive care at the Tampa VHA. 
Inclusion criteria included discharge from the inpatient spinal cord service, English literacy, and the ability to complete the questionnaire. People on ventilators were excluded. Data Collection Instrument Based on the results of the focus groups, a reworded pool of PF scale items was developed. Because we were concerned that the traditional PF items might unduly emphasize lower-body mobility, we added additional items from the function component of the Late Life Function and Disability Instrument, a 32-item tool measuring 3 dimensions—upper-extremity, basic lower-extremity, and advanced lower-extremity functions.13 The instructions to respondents when completing the PF scale were modified to require respondents to consider each item as if they were using their assistive devices. A group of experts in SCI reviewed the preliminary instrument and recommended 2 additional items. The pool of 30 items was included in a written questionnaire with the other components of the SF-36V and information about activities of daily living (ADLs), level of injury, physical functioning, and demographics. The SF-36V was modified from its original published version to reflect minor wording and response scale modifications recommended in the most recent version of the SF-36 designed for non-VHA populations.14, 15 The VHA Spinal Cord Dysfunction Registry Questionnaire (SCDRQ) was used to obtain a self-reported measure of disease, impairment, and function.16 Items from the SCDRQ were designed to measure the impact of disease, including the cause of the spinal cord condition, the level of injury, areas of the body affected, loss of function, and comorbid conditions. Items from the SCDRQ designed to measure impairment included those measuring basic ADLs and instrumental ADLs (IADLs). Items related to resource utilization such as mobility aids and assistants were also included. 
Items from the SCDRQ used to classify either paraplegia or tetraplegia included self-reported neurologic level, description of movement loss, and area of the body affected. The resultant questionnaire was pilot tested with 20 patients from the inpatient unit at the Tampa VHA. Data collection procedures Subjects were mailed a survey packet that contained an invitation letter, the questionnaire, and instructions for completing and returning it via a self-addressed stamped envelope. Nonresponders were mailed 2 reminders, 1 at 3 weeks and 1 at 6 weeks after the initial invitation. Data Analysis A series of analytic steps was conducted to evaluate the validity and reliability of the revised PF scale. To establish unidimensionality of the underlying construct of physical functioning, regardless of injury level, we conducted exploratory factor analysis using principal factor extraction followed by orthogonal and oblique rotation. Because the factors were correlated, we interpreted the oblique (promax) rotated factor pattern. We also provide the oblique rotated factor structure as a reference for those readers who prefer this matrix when interpreting correlated factors. Because we were concerned that the construct of physical functioning might be operationally different in respondents with less severe injuries compared with those with more severe injuries, we conducted the factor analysis separately on respondents with cervical-level injury (tetraplegia) and those with lower-level injury (paraplegia). Similar factor solutions would support that the items represent an underlying construct in both groups and that the additional analysis could be conducted on the whole sample. Dissimilar results would suggest that separate analyses should be conducted. The subset of items identified through factor analyses was then analyzed using item response theory (IRT) techniques. 
We used a generalized partial credit (GPCM) IRT model and the marginal maximum likelihood estimation procedures of the Parscale software17a as recommended by Ware.18 This program is specifically designed to analyze data contained in 2 or more predefined categories. The GPCM, formulated by Muraki,19 is based on Masters’s partial credit model but relaxes the assumption of uniform discrimination of test items. It also differs from Masters’s model in the basic assumptions it makes about the latent trait. The GPCM is viewed as a member of a family of latent variable models.19 We restrict our analysis here to a 1-parameter GPCM model. Item calibration and scaling were conducted as described by Muraki.19 We examined goodness of fit for each item and for the total set of items. Based on the model specifications, respondents’ scores were placed into 10 groups with expected versus observed frequency distributions and were tested using a likelihood-ratio chi-square statistic. We also interpreted the item-category response functions (ICRF), item information curves, and test information and standard error curves for the set of items. It is beyond the scope of this study to provide an extensive overview of the ICRFs generated by a GPCM IRT model. For such a description, we refer readers to Ware.18 However, we will provide a brief description of the basic components of the ICRF to enable readers unfamiliar with the method to interpret our results. Figure 1 provides an ICRF for one of the items from the item pool calibrated with paraplegic respondents. This item, “lifting or carrying groceries,” has 3 categoric response options: “limited a little,” “limited a lot,” and “not limited at all.” The solid lines represent the model’s prediction of the probability of choosing each of the item response categories for varying levels of ability (physical functioning). 
The horizontal axis is physical functioning normalized with the mean equal to 0 and the standard deviation equal to 1. The vertical dashed lines represent the intersection points of the first and third item-category response lines, where respondents are equally likely to choose the 2 adjacent response categories (thresholds). These thresholds are used to define the “difficulty” of each response category.18 The midpoint of the interval between the thresholds represents the mean threshold, which we use to represent the item’s location on the PF scale. Figure 2 presents the item information curve for the same item, which for the GPCM represents the change in response with the change in ability level. Item information curves of individual items are summarized in the test information curve (fig 3).19 The total IRT scores for each respondent were converted to 0-to-100 SF-36V scoring conventions. Finally, along with the scores from the other SF-36V scales, the SCI-specific PF scores were converted to standardized scores, and the PCS and MCS scores were calculated as described by Kazis.20 To establish reliability the Cronbach α was calculated for the scale. To further establish the validity of the SCI-specific PF scale, a series of bivariate analyses were conducted. Pearson product-moment correlation coefficients were calculated between the SCI-specific PF scale and the other scales of the SF-36V, the PCS and MCS, and the functional status and independence and ADL scales. Data were scanned into a database using Teleform software.b SASc was used to conduct factor analysis and other descriptive and bivariate analyses. Parscale, version 4.1, was used to conduct IRT analysis.
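The item-category response functions and thresholds described above can be illustrated with a short sketch. It is a minimal illustration of the GPCM category probabilities for a single 3-category item, not the Parscale implementation. The slope `a`, the threshold values, and the function names are hypothetical and are not estimates from this study.

```python
import math


def gpcm_probabilities(theta, a, thresholds):
    """Category probabilities for one item under a generalized partial credit model.

    theta      -- ability (here, physical functioning) on a mean 0, SD 1 scale
    a          -- item slope (hypothetical value; held common across items
                  in a 1-parameter model)
    thresholds -- b_1..b_m for an item with m + 1 ordered response categories
    """
    # Cumulative logits: z_0 = 0 and z_k = sum over v <= k of a * (theta - b_v).
    z = [0.0]
    for b in thresholds:
        z.append(z[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    z_max = max(z)
    weights = [math.exp(value - z_max) for value in z]
    total = sum(weights)
    return [w / total for w in weights]


def item_location(thresholds):
    """Mean category threshold, used as the item's location on the scale."""
    return sum(thresholds) / len(thresholds)


# Hypothetical thresholds for a 3-category item.
example_thresholds = [-0.5, 0.8]
example_probs = gpcm_probabilities(theta=0.0, a=1.0, thresholds=example_thresholds)
```

At an ability equal to a threshold, the two adjacent categories are equally likely, which is how the thresholds in the item-category response plot are read.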
Results  Phase 1 The pool of revised PF items is outlined in table 1. Three items were unaltered SF-36V PF items, 12 were reworded PF scale items, and 15 were adapted or new items. In revising items we changed the wording as little as possible. For example, in the item describing “vigorous activities,” the term “running” was replaced with “wheelchair jogging/racing.” Based on the focus groups we could not identify the best way to reword the items that include the word “walking” or “climbing” in the original PF scale. Therefore, 2 options of alternative words for walking (“wheeling” and “going”) and 2 options for “climbing” (“climbing” and “going”) were used. We relied on empiric evidence from phase 2 to choose which of these wordings to include in the final scale.

| Item Pool | Original SF-36 Item | Reworded SF-36 Item | New Item |
|---|---|---|---|
| Pouring from a large pitcher | | | X |
| Unscrewing a lid | | | X |
| Bathing or dressing yourself | X | | |
| Washing dishes by hand | | | X |
| Making a bed | | | X |
| Bending as if to pick something from the floor | | | X |
| Holding a full glass of water | | | X |
| Reaching overhead | | | X |
| Opening a heavy outside door | | | X |
| Transferring to bed | | | X |
| Lifting or carrying groceries | X | | |
| Shopping for groceries | | | X |
| Getting into and out of a car | | | X |
| Transferring to toilet | | | X |
| Moderate activities, such as moving a table, pushing a vacuum, or bowling | | X | |
| Getting up and down from a curb | | | X |
| Bending or stooping | X | | |
| Wheeling several hundred yards | | X | |
| Wheeling more than one mile | | X | |
| Going up several wheelchair ramps | | X | |
| Wheeling one hundred yards | | X | |
| Going up one wheelchair ramp | | X | |
| Climbing several wheelchair ramps | | X | |
| Climbing one wheelchair ramp | | X | |
| Going several hundred yards | | X | |
| Going more than one mile | | X | |
| Going one hundred yards | | X | |
| Vigorous activities such as wheelchair jogging/racing, lifting heavy objects, and participating in strenuous sports | | X | |
| Moving around one floor of home | | | X |
| Shopping for two or more hours | | | X |
Phase 2 Questionnaires were sent to all veterans with SCI who had been seen at the Tampa VHA SCI center in the previous year (N=787). A total of 392 usable questionnaires were obtained, yielding a response rate of 49.8%. Thirty-three of the respondents were found to have diagnoses other than SCI (eg, multiple sclerosis) and were eliminated from the analysis. Of the 359 patients in the analytic data set (table 2) most were white non-Hispanic (n=287 [80%]), aged 40 to 59 years (n=171 [51%]), and male (n=330 [94%]). Slightly more respondents were categorized as being paraplegic (n=190 [53%]) than tetraplegic (n=163 [45%]), and there were 6 (2%) respondents for whom data were not available to accurately assign them to one of the categories. Respondents used a wide range of devices to aid mobility. Most used a manual device (n=184 [51%]), and 166 (46%) indicated use of a scooter or electric wheelchair. Relatively few respondents (n=22 [6%]) reported that they could walk without the help of assistive devices, and even fewer (n=6 [2%]) reported being bedridden. Bivariate analysis of items with 2 different wordings (“going” vs “wheeling,” etc) found no significant difference in response patterns in these items. Because “going” was an acceptable replacement for both “walking” and “climbing” in the original PF items, it was chosen for inclusion in the SCI-specific PF scale. Exploratory factor analysis of the revised PF items for the severity of injury groups (paraplegic and tetraplegic) and interpretation of the eigenvalues and scree plots for each analysis supported a 2-factor solution for each group. For respondents with tetraplegia (n=121) the 2-factor solution explained 81% of the unique variance, whereas for respondents with paraplegia (n=141) 79% of the variance was explained by the factors. (Factor analysis was restricted to respondents with no missing values on any item.) 
The 2 factors were highly correlated for respondents both with tetraplegia (r=.49) and with paraplegia (r=.57); therefore, oblique factor pattern coefficients were interpreted. The factor structure matrix for each factor is provided for readers who prefer to interpret that matrix with correlated factors. The pattern of loadings on the first and second factors in each group was substantially different in the 2 severity groups. For respondents with tetraplegia, the highest loadings in factor 1 involved items measuring upper-body activities, whereas the highest loadings in factor 1 for respondents with paraplegia related to mobility. This pattern reversed itself in the factor loadings for factor 2 in each group. Although these results suggest that the whole item pool did not represent a unidimensional construct in both groups, a close inspection of the results identified a set of 9 items with similar loadings on factor 1 in both groups (table 3, shaded area). All of the items had a simple structure (loading on only 1 factor) based on responses of those with tetraplegia, but 3 of the 9 item loadings were complex (loading on both factors) among those with paraplegia. The presence of simple structure in the results supports the validity of the factor solution.

| Item | Tetraplegia, Factor Pattern, Factor 1 | Tetraplegia, Factor Pattern, Factor 2 | Tetraplegia, Factor Structure, Factor 1 | Tetraplegia, Factor Structure, Factor 2 | Paraplegia, Factor Pattern, Factor 1 | Paraplegia, Factor Pattern, Factor 2 | Paraplegia, Factor Structure, Factor 1 | Paraplegia, Factor Structure, Factor 2 |
|---|---|---|---|---|---|---|---|---|
| Pouring from a large pitcher | .91 | — | .87 | .37 | — | .78 | .41 | .77 |
| Bathing or dressing yourself | .89 | — | .88 | .41 | — | .73 | .52 | .78 |
| Bending as if to pick something from the floor | .83 | — | .85 | .45 | — | .53 | .58 | .70 |
| Washing dishes by hand | .83 | — | .85 | .38 | — | .54 | .58 | .70 |
| Making a bed | .83 | — | .81 | .38 | .41 | .45 | .67 | .69 |
| Transferring to bed | .82 | — | .77 | .32 | — | .78 | .45 | .78 |
| Unscrewing a lid | .81 | — | .79 | .36 | — | .84 | .37 | .77 |
| Transferring to toilet | .80 | — | .76 | .31 | — | .76 | .33 | .76 |
| Holding a full glass of water in one hand | .77 | — | .75 | .34 | — | .64 | .30 | .61 |
| Shopping for groceries | .77 | — | .84 | .52 | .56 | — | .72 | .60 |
| Lifting or carrying groceries | .74 | — | .82 | .54 | .60 | — | .75 | .62 |
| Moderate activities, such as moving a table, pushing a vacuum, or bowling | .72 | — | .78 | .48 | .49 | — | .65 | .56 |
| Reaching overhead | .71 | — | .78 | .49 | .39 | .39 | .61 | .61 |
| Bending or stooping | .70 | — | .75 | .44 | .39 | — | .53 | .48 |
| Opening a heavy outside door | .69 | | .78 | .53 | .47 | — | .71 | .69 |
| Vigorous activities such as wheelchair jogging/racing, lifting heavy objects, and participating in strenuous sports | .54 | — | .61 | .42 | .58 | — | .59 | .35 |
| Getting up and down from a curb | .54 | — | .62 | .44 | .26 | .35 | .46 | .50 |
| Getting into and out of a car | .47 | — | .61 | .52 | — | .56 | .51 | .68 |
| Going more than one mile | — | .90 | — | .80 | .89 | — | .76 | .28 |
| Going several hundred yards | — | .89 | .39 | .87 | .93 | — | .87 | .42 |
| Going one hundred yards | — | .85 | .46 | .87 | .81 | — | .82 | .47 |
| Going up one wheelchair ramp | — | .77 | .49 | .83 | .63 | — | .75 | .48 |
| Going up several wheelchair ramps | — | .75 | .46 | .79 | .82 | — | .77 | .39 |
| Moving around one floor of home | — | .53 | .49 | .64 | .34 | — | .75 | .58 |
| Shopping for two or more hours | — | .51 | .54 | .65 | .67 | — | .73 | .48 |
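The selection of items that load on factor 1 in both groups can be sketched in a few lines. This is an illustrative reconstruction of the selection logic, not the procedure the authors ran, which they describe as a close inspection of the results. It assumes that a dash (or a blank cell) in the factor pattern columns means no loading was reported, and the item keys for the two long "activities" items are shortened for readability.

```python
# Factor pattern coefficients transcribed from Table 3, in the order
# (tetraplegia F1, tetraplegia F2, paraplegia F1, paraplegia F2).
# None stands for a dash or blank cell; reading those cells as
# "no reported loading" is an assumption of this sketch.
PATTERN = {
    "Pouring from a large pitcher": (.91, None, None, .78),
    "Bathing or dressing yourself": (.89, None, None, .73),
    "Bending as if to pick something from the floor": (.83, None, None, .53),
    "Washing dishes by hand": (.83, None, None, .54),
    "Making a bed": (.83, None, .41, .45),
    "Transferring to bed": (.82, None, None, .78),
    "Unscrewing a lid": (.81, None, None, .84),
    "Transferring to toilet": (.80, None, None, .76),
    "Holding a full glass of water in one hand": (.77, None, None, .64),
    "Shopping for groceries": (.77, None, .56, None),
    "Lifting or carrying groceries": (.74, None, .60, None),
    "Moderate activities": (.72, None, .49, None),
    "Reaching overhead": (.71, None, .39, .39),
    "Bending or stooping": (.70, None, .39, None),
    "Opening a heavy outside door": (.69, None, .47, None),
    "Vigorous activities": (.54, None, .58, None),
    "Getting up and down from a curb": (.54, None, .26, .35),
    "Getting into and out of a car": (.47, None, None, .56),
    "Going more than one mile": (None, .90, .89, None),
    "Going several hundred yards": (None, .89, .93, None),
    "Going one hundred yards": (None, .85, .81, None),
    "Going up one wheelchair ramp": (None, .77, .63, None),
    "Going up several wheelchair ramps": (None, .75, .82, None),
    "Moving around one floor of home": (None, .53, .34, None),
    "Shopping for two or more hours": (None, .51, .67, None),
}

# Items with a reported factor 1 loading in both severity groups.
shared_factor1 = [
    item
    for item, (tetra_f1, _tetra_f2, para_f1, _para_f2) in PATTERN.items()
    if tetra_f1 is not None and para_f1 is not None
]

# Among those, items that also load on factor 2 (complex structure).
complex_tetraplegia = [i for i in shared_factor1 if PATTERN[i][1] is not None]
complex_paraplegia = [i for i in shared_factor1 if PATTERN[i][3] is not None]
```

Under these assumptions the sketch recovers 9 shared items, none of them complex for tetraplegia and 3 complex for paraplegia, which matches the counts reported in the text.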
The 9 items loading on the first factor for respondents with tetraplegia and paraplegia were then included in separate IRT model analyses for each severity group. Based on a review of item fit statistics of these models, 1 item (“moderate activities, such as moving a table, pushing a vacuum, or bowling”) was eliminated. For the remaining 8 items both individual item and total model fit statistics supported model fit based on likelihood-ratio chi-square statistics, with all of the P values being greater than .05. Inspection of the item category response functions supported the ordinal relation of the response options for all items. The rank order of the items, as described by the mean category threshold, was very similar, with the most obvious difference being the item “reaching overhead,” which was the easiest item for the respondents in the paraplegic severity category and the fifth most difficult item for the tetraplegic respondents (table 4). Also provided in table 4 are polyserial correlations between responses to the individual items and the total score for the scale. The correlations were very high for respondents with tetraplegia and paraplegia (≥.59). The total information curves and standard error curves for the SCI-specific PF scales for respondents with tetraplegia and paraplegia (fig 3) suggest that the set of items provides substantial information in the middle two thirds of the ability scale, with a rapid decrease for more extreme abilities. Standard errors were lowest in the middle range of the ability scales. Neither model resulted in a wide range of mean category threshold values. Descriptive responses to the items showed a floor effect on many of the items (table 5), particularly for tetraplegic respondents who chose the “yes, limited a lot” option for at least half of the revised PF items.

| Item | Tetraplegia⁎: Yes, Limited a Lot | Tetraplegia⁎: Yes, Limited a Little | Tetraplegia⁎: No, Not Limited at All | Paraplegia⁎: Yes, Limited a Lot | Paraplegia⁎: Yes, Limited a Little | Paraplegia⁎: No, Not Limited at All |
|---|---|---|---|---|---|---|
| Vigorous activities (eg, wheelchair jogging/racing, lifting heavy objects, participating in strenuous sports) | 119 (77.3) | 24 (15.6) | 11 (7.1) | 116 (64.0) | 50 (27.6) | 15 (8.3) |
| Getting up and down from the curb | 102 (66.7) | 34 (22.2) | 17 (11.1) | 85 (47.8) | 73 (41.0) | 20 (11.2) |
| Bending or stooping | 102 (66.2) | 36 (23.4) | 16 (10.4) | 80 (44.7) | 69 (38.6) | 30 (16.8) |
| Making a bed | 95 (62.5) | 37 (24.3) | 20 (13.2) | 64 (35.8) | 78 (43.6) | 37 (20.7) |
| Opening a heavy outside door | 95 (62.9) | 39 (25.8) | 17 (11.3) | 61 (33.3) | 85 (46.5) | 37 (20.2) |
| Lifting or carrying groceries | 91 (58.7) | 44 (28.4) | 20 (12.9) | 56 (30.8) | 91 (50.0) | 35 (19.2) |
| Shopping for groceries | 74 (47.7) | 56 (36.1) | 25 (16.3) | 50 (27.2) | 79 (42.9) | 55 (29.9) |
| Reaching overhead | 86 (55.8) | 54 (35.0) | 14 (9.0) | 40 (21.7) | 98 (53.3) | 46 (25.0) |

Values are n (%).
⁎ Paraplegia n range, 178–183; tetraplegia n range, 151–155.
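The floor effect visible in table 5 can be quantified directly from the response counts. The sketch below is illustrative only: the 50% cutoff and the function names are assumptions chosen for the example, not a definition used by the authors, and the item labels are shortened.

```python
# Response counts from Table 5, in the order
# (yes, limited a lot; yes, limited a little; no, not limited at all).
TETRAPLEGIA = {
    "Vigorous activities": (119, 24, 11),
    "Getting up and down from the curb": (102, 34, 17),
    "Bending or stooping": (102, 36, 16),
    "Making a bed": (95, 37, 20),
    "Opening a heavy outside door": (95, 39, 17),
    "Lifting or carrying groceries": (91, 44, 20),
    "Shopping for groceries": (74, 56, 25),
    "Reaching overhead": (86, 54, 14),
}
PARAPLEGIA = {
    "Vigorous activities": (116, 50, 15),
    "Getting up and down from the curb": (85, 73, 20),
    "Bending or stooping": (80, 69, 30),
    "Making a bed": (64, 78, 37),
    "Opening a heavy outside door": (61, 85, 37),
    "Lifting or carrying groceries": (56, 91, 35),
    "Shopping for groceries": (50, 79, 55),
    "Reaching overhead": (40, 98, 46),
}


def floor_share(counts):
    """Proportion of respondents choosing the lowest category ("limited a lot")."""
    return counts[0] / sum(counts)


def items_at_floor(table, cutoff=0.5):
    """Items where more than `cutoff` of respondents chose the lowest category.

    The cutoff is a hypothetical threshold for illustration.
    """
    return [item for item, counts in table.items() if floor_share(counts) > cutoff]


tetraplegia_floor_items = items_at_floor(TETRAPLEGIA)
paraplegia_floor_items = items_at_floor(PARAPLEGIA)
```

With this cutoff, most items cluster at the floor for respondents with tetraplegia, while far fewer do for respondents with paraplegia.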
Finally, a series of analyses were conducted to establish reliability and validity of the PF scores based on the models developed. Internal consistency reliability (Cronbach α) calculated for the SCI-specific PF scale was .90. Pearson product-moment correlation coefficients (table 6) between the SCI-specific PF score and the raw scores from the other 7 scales on the SF-36V found that the strongest association (r=.55) was between the revised PF scale and the role–physical subscale. The weakest association (r=.18) was between the PF score and the mental health subscale. The SCI-specific PF scale correlated more highly with the PCS (r=.64) than with the MCS (r=.16). The SCI-specific PF scale score was very strongly associated with ADLs (r=.63) and even more strongly associated with IADLs (r=.70).
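For readers unfamiliar with the internal consistency statistic reported above, the following is a minimal sketch of how Cronbach α is computed from item-level responses. It is not the SAS procedure used in the study, and the response data shown are hypothetical (coded 1 = limited a lot, 2 = limited a little, 3 = not limited at all).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a list of respondents' item-score rows.

    rows -- one list of item scores per respondent; all rows the same length.
    """
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(n_items)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in rows])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from 3 respondents to 3 items.
toy_responses = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
toy_alpha = cronbach_alpha(toy_responses)
```

Because the toy items move together perfectly, the statistic reaches its upper bound of 1; real scales fall below that as item responses diverge.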
Discussion  Previous research has shown that wording of the SF-36 PF scale is not ideal for people with SCI,11 leading researchers to call for alternative wording to measure this construct.12 The purpose of this pilot study was to develop an alternative set of items that might supplement or replace the original scale for use with this population. Multiple large well-designed studies have established the unidimensional structure and scoring strategies of the original PF scale. For example, Haley et al21 used IRT analysis to show empiric support for item hierarchy and confirmed the unidimensionality for the PF scale in multiple patient groups. Building on this work, McHorney et al22 found that IRT scoring discriminated better between patients with different diseases and for those patient groups that most approximated the extremes of the score distribution. Our goal was to change the wording in the original PF items as little as possible, thereby building on this previous work and making comparisons between the original scale and our new SCI-specific PF scale as direct as possible. We also restricted our analysis to 1-parameter models, as have been used with previous work with the PF scale. Unfortunately, the results of the factor analyses showed that many of the original PF items (even after rewording) functioned very differently in tetraplegic and paraplegic respondents, leading us to identify a set of items for our SCI-specific PF that were very different from the original items. Nonetheless, we were able to identify a set of 8 items that appear to validly and reliably measure PF in both patient groups. Recently, Forchheimer et al23 published a study using the SF-36 with a sample of 215 people with SCI who had been discharged at least 1 year previously from their initial hospitalizations from a major university hospital. The SF-36 was included, along with other items, as part of follow-up care and was conducted via telephone. 
They reported that participation in the study was high, no respondent refused to complete the SF-36, and high reliabilities of the 8 SF-36 scales were obtained. They found support for construct validity of the SF-36 PF scale by showing that the resultant PCS score (which included the results of the PF score) varied significantly across levels of neurologic impairment whereas the MCS score did not. The investigators suggested that their findings provide support for the use of the SF-36 as a measure of HRQOL among people with SCI. Given these results they concluded that although changing the wording of items for people with SCI (eg, from “walking” to “going”) might improve the sensitivity of the PF scale, such changes might alter the dynamics of the items, bringing into question the interpretation of the scale. They also raised concerns that allowing respondents to answer items on the basis of the availability of assistive devices might reflect access to care rather than physical functioning. However, Forchheimer et al23 did not explore the factor structure of the PF scale in their cohort nor did they provide values of the PF scale across neurologic impairment levels. Based on factor analysis in the current study, the reworded PF scale did not appear to measure a unidimensional construct across severity levels, calling into question the use of a summative scale calculated either through Likert or IRT methods. Because of concerns with respondent burden, we did not include the original wording of the SF-36V PF items in our item pool. If we had done so we could have compared the factor structure of the original items with that of the revised items. In addition, Forchheimer et al23 conducted telephone interviews, but we used a mailed questionnaire. Our research provides preliminary evidence for an SCI-specific PF scale for the SF-36V. 
One could argue that because we asked respondents to answer as if they were using their assistive devices, we measured something other than physical functioning. Our decision to instruct respondents to answer the SCI-specific PF items based on the use of all available assistive devices was driven by our intent to measure “capacity” rather than “performance,” as defined by the World Health Organization’s International Classification of Functioning, Disability and Health (ICF).24 The ICF describes functioning (and disability) as having 2 components: (1) bodily functions and structures and (2) activities and participation. Impairments of bodily functions or structures can lead to limitations of activities and/or participation. In the ICF framework, activities and participation can be qualified as reflecting performance (what a person does in his/her current environment) or capacity (the highest probable level of functioning that a person may reach in a given domain or environment), and it reflects the environmentally adjusted ability of an individual. The concept of “standardized” environment is said to neutralize the varying impacts of the different environments on the ability of the person.25 We used this convention because access to resources in the veteran population should ensure that all assistive devices needed are provided, thereby creating a standardized environment. This strategy also should reduce the impact of floor and ceiling effects. Although the results of the current study support the validity and reliability of the revised PF scale for use with people with SCI, several study limitations should be kept in mind. First, our sample was drawn from a single institution rather than a national representative sample. Results may reflect the characteristics unique to the respondents from that institution. 
Second, although the sample size is relatively large, after splitting the sample into 2 groups (paraplegia, tetraplegia), both the factor analysis and the IRT analysis were conducted with relatively small samples. Replication of this study in a larger group of representative veterans would allow for more stable estimates of both factor analytic and IRT analyses. Last, although the resultant PF scale had excellent psychometric properties, the problem of floor effects may still occur. This problem might be solved in several ways. One solution would be to expand the item pool to include more sensitive items; another would be to develop separate PF scales for people with higher or lower levels of neurologic injury. Finally, expanding the number of response options for the items might improve sensitivity to physical functioning in this population. The major advantage of using a generic HRQOL instrument with people with SCI is that it allows for comparisons with people from the general population and other patient groups. The major disadvantages of a generic HRQOL instrument with people with SCI are that condition-specific areas of life are often not measured and that floor and ceiling effects are more of a problem than in the general population. The SF-36 (and the SF-36V) exemplifies both the strengths and weaknesses of generic HRQOL tools. Because of its wide use in multinational settings, the extensive evidence of validity and reliability, and the normative data available for interpretation, the SF-36 would be an ideal tool to use. However, the aforementioned problems with the wording of the PF scale potentially exacerbate the issues of range of response and floor and ceiling effects inherent in all generic measures of HRQOL by calling into question the validity of the items for the SCI population. 
Given these strengths and weaknesses of the SF-36 PF scale, a number of researchers26, 27 have recommended its use with SCI populations while calling for studies to further validate the tool. Although the current study represents a first step toward accomplishing this goal and we believe it provides important information about the issue, more work is needed to refine this instrument.
Conclusions
To our knowledge, the current study represents the first attempt to revise items for the SF-36V PF scale to be more appropriate for use with people with SCI. We administered the items to a relatively large sample and applied rigorous psychometric analyses. Validation in a larger, representative, national sample is warranted before definitively recommending the revised PF scale as part of the SF-36V when surveying veterans with SCI.
Acknowledgements
We thank Maria Mullins, MD, for her support for the project. The views expressed in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs.
References
1. Ware JE, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30:473–483.
2. McHorney CA, Ware JE, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36). II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care. 1993;31:247–263.
3. McHorney CA, Ware JE, Lu JF, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36). III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care. 1994;32:40–66.
4. Ware JE, Gandek B, Kosinski M, et al. The equivalence of SF-36 summary health scores estimated using standard and country-specific algorithms in 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol. 1998;51:1167–1170.
5. Ware JE, Kosinski M, Gandek B, et al. The factor structure of the SF-36 Health Survey in 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol. 1998;51:1159–1165.
6. Gandek B, Ware JE, Aaronson NK, et al. Tests of data quality, scaling assumptions, and reliability of the SF-36 in eleven countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol. 1998;51:1149–1158.
7. Ware JE. SF-36 health survey update. In: Maruish ME, editor. The use of psychological testing for treatment planning and outcomes assessment. 3rd ed. Mahwah: Lawrence Erlbaum Associates; 2004. p. 693–718.
8. Kazis LE, Miller DR, Clark J, et al. Health-related quality of life in patients served by the Department of Veterans Affairs (results from the Veterans Health Study). Arch Intern Med. 1998;158:626–632.
9. Kazis LE, Miller DR, Clark J, et al. Improving the response choices on the Veterans SF-36 health survey role functioning scales (results from the Veterans Health Study). J Ambul Care Manage. 2004;27:3.
10. Kazis LE, Ren XS, Lee A, et al. Health status in VA patients (results from the Veterans Health Study). Am J Med Qual. 1999;14:28–38.
11. Andresen EM, Fouts BS, Romeis JC, Brownson CA. Performance of health-related quality-of-life instruments in a spinal cord injured population. Arch Phys Med Rehabil. 1999;80:877–884.
12. Meyers AR, Andresen EM. Enabling our instruments (accommodation, universal design, and access to participation in research). Arch Phys Med Rehabil. 2000;81(Suppl 2):S5–S9.
13. Haley SM, Jette AM, Coster WJ, et al. Late Life Function and Disability Instrument: II. Development and evaluation of the function component. J Gerontol A Biol Sci Med Sci. 2002;57:M217–M222.
14. Ware JE, Kosinski M, Dewey JE. How to score version two of the SF-36 health survey. Lincoln: QualityMetric; 2000.
15. Ware JE, Kosinski M, Dewey JE. Version 2 of the SF-36 health survey. Lincoln: QualityMetric; 2003.
16. Hoenig H, McIntyre L, Sloane R, Branch LG, Truncali A, Horner RD. The reliability of a self-reported measure of disease, impairment, and function in persons with spinal cord dysfunction. Arch Phys Med Rehabil. 1998;79:378–387.
17. Muraki E, Bock RD. Parscale (parameter scaling of rating data). Chicago: Scientific Software; 1991.
18. Ware JE. Conceptualization and measurement of health-related quality of life (comments on an evolving field). Arch Phys Med Rehabil. 2003;84(4 Suppl 2):S43–S51.
19. Muraki E. A generalized partial credit model. In: van der Linden WJ, Hambleton RK, editors. Handbook of modern item response theory. New York: Springer-Verlag; 1997. p. 153–164.
20. Kazis LE. Short Form-36 for veterans (SF-36V). Bedford: Center for Health Quality, Outcomes, & Economic Research; 1993.
21. Haley SM, McHorney CA, Ware JE. Evaluation of the MOS SF-36 physical functioning scale (PF-10): I. Unidimensionality and reproducibility of the Rasch item scale. J Clin Epidemiol. 1994;47:671–684.
22. McHorney CA, Haley SM, Ware JE. Evaluation of the MOS SF-36 Physical Functioning Scale (PF-10): II. Comparison of relative precision using Likert and Rasch scoring methods. J Clin Epidemiol. 1997;50:451–461.
23. Forchheimer M, McAweeney M, Tate DG. Use of the SF-36 among persons with spinal cord injury. Am J Phys Med Rehabil. 2004;83:390–395.
24. World Health Organization. ICF (International classification of functioning, disability, and health). Geneva: WHO; 2001.
25. World Health Organization. Toward a common language for functioning, disability and health (ICF). Geneva: WHO; 2002.
26. Hallin P, Sullivan M, Kreuter M. Spinal cord injury and quality of life measures (a review of instrument psychometric quality). Spinal Cord. 2000;38:509–523.
27. Wood-Dauphinee S, Exner G, Bostanci B, et al. Quality of life in patients with spinal cord injury—basic issues, assessment, and recommendations. Restor Neurol Neurosci. 2002;20:135–149.
a VISN 8 Patient Safety Research Center, James A. Haley Veterans Hospital, Tampa, FL
b College of Public Health, University of South Florida, Tampa, FL
c College of Education, University of South Florida, Tampa, FL
d College of Nursing, University of South Florida, Tampa, FL
Reprint requests to Stephen L. Luther, PhD, Patient Safety Center, 11605 N Nebraska Ave, Tampa, FL 33612-5738.
Supported by the Department of Veterans Affairs, Veterans Health Administration VISN 8, VISN 8 Patient Safety Research Center, and VISN 8 Measurement and Evaluation Team. No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the author(s) or upon any organization with which the author(s) is/are associated. PII: S0003-9993(06)00441-2 doi:10.1016/j.apmr.2006.05.010 © 2006 American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved. | |
|