Volume 83, Issue 1, Pages 10-18, January 2002
The Wheelchair Skills Test: A pilot study of a new outcome measure
Abstract
Kirby RL, Swuste J, Dupuis DJ, MacLeod DA, Monroe R. The Wheelchair Skills Test: a pilot study of a new outcome measure. Arch Phys Med Rehabil 2002;83:10-8. Objective: To evaluate the practicality, safety, reliability, validity, and usefulness of a new Wheelchair Skills Test (WST). Design: A pilot study with within-subject comparisons. Setting: Rehabilitation center. Patients: Twenty-four wheelchair users (11 with amputations, 4 with stroke, 3 with musculoskeletal disorders, 3 with spinal cord injury, 3 with neuromuscular disorders). Intervention: The WST. Main Outcome Measures: Subjects were videotaped while performing 33 skills twice (>10d apart). Their ability to perform each skill was rated on a 3-point ordinal scale. The test-retest, intra-, and interrater reliabilities were determined. Each subject's occupational therapist completed a visual analog scale (VAS), reflecting a global rating of the subject's manual wheelchair skills. We assessed validity by evaluating whether the WST detected expected changes (construct validity) and how well the total WST scores correlated with the occupational therapists' global ratings (concurrent validity). Each occupational therapist also used a VAS to quantify the usefulness of the WST. Results: The mean time required to administer the WST was 29 minutes. There were no adverse incidents. For the test-retest, intra-, and interrater reliabilities, the correlations for the total scores were .65 (P = .001), .96 (P < .001), and .95 (P < .001), respectively. The 9 therapists unanimously endorsed 30 (91%) of the 33 WST skills. The correlation between the mean changes in the WST and global rating scores was .45 (P < .05). There was a slight negative relationship between total WST score and age (P < .05). There were no significant differences related to the diagnoses accounting for wheelchair use. Wheelchair users with more than 3 weeks of experience with their wheelchairs scored higher than those with less experience (P = .0085). 
The correlations between the WST and global rating scores ranged from .40 to .54 (P < .05). Through Rasch analysis, we eliminated 6 skills, with the remaining skills comprising a unidimensional screening test of wheelchair ability. The mean VAS score for perceived usefulness was 59%. Conclusions: The WST is practical, safe, well tolerated, exhibits good to excellent reliability, excellent content validity, fair construct and concurrent validity, and moderate usefulness. This pilot study makes an important contribution toward meeting the need for a well-validated outcome measure of manual wheelchair ability. © 2002 by the American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation
Keywords: Motor skills, Occupational therapy, Rehabilitation, Reproducibility of results, Wheelchairs
The wheelchair is among the most important therapeutic devices used in rehabilitation.1 There were 1.4 million wheelchair users in the United States in 1992,2 a number that was expected to have increased to about 2 million by 2000.3 Although much is now known about wheelchairs, there is much room for improvement in the process through which wheelchairs are prescribed, and in the environment in which wheelchair users function. Prescribing the most appropriate wheelchair for an individual, adjusting it for that person, and training the user in its safe use are important elements in the rehabilitation process.4, 5, 6
Outcome measures in rehabilitation range from global overviews of function7 and community reintegration8 to highly specific kinesiologic measures.9 Between these extremes are intermediate-level measures, among the most useful of which (at least during periods of intensive rehabilitation) are profiles of specific functions. Although there are some such profiles for the operation of powered wheelchairs,10, 11, 12, 13, 14, 15 there is no widely accepted measure of manual wheelchair skills.
Webster et al16, 17 have reported on a wheelchair obstacle course (consisting of a series of left and right turns on a level surface) designed specifically for patients who have had strokes. Harvey et al18 recently published a test of 6 core wheelchair skills (moving from lying to sitting, horizontal transfer, vertical transfer, push on flat, push on ramp, negotiating curbs), each of which was evaluated by using a 6-point scale. Although both the Webster and Harvey tests have value, the former as a means of identifying perceptual problems and the latter as an overview of manual wheelchair function, neither test provides the level of detail that corresponds to wheelchair function in daily life.
The ultimate outcome measure of wheelchair use is how safe and effective the wheelchairs are for users in their own environments. However, the physical challenges that face wheelchair users are widely dispersed in both space and time. It may be only after months of use that a serious accident occurs or a major limitation presents itself. Simulation is an option used successfully in several other settings, including the evaluation of ambulation disabilities.19 Simulation in the context of wheelchair use involves challenging their users to perform tasks in a standardized and obstacle-laden environment. In that way, the challenges can be presented in a reasonable period of time, in an environment where the results can be documented, in a setting where safety can be assured, and where there is access to the expertise needed to explore alternative wheelchair components or adjustments when difficulties are identified.
Such a measure is needed to document initial status and subsequent improvements for individual wheelchair users, and also to use as an outcome measure for rehabilitation programs, to assist in testing research hypotheses, and to assist in the development of new technologies. We have developed a Wheelchair Skills Test (WST) that we believe meets these needs. This pilot study tested the hypotheses that the WST is practical, safe, well tolerated by wheelchair users, has good reliability and validity, and is useful to clinicians.
Methods
Subjects
We studied 24 wheelchair users who were either inpatients or outpatients in the tertiary care rehabilitation program at the Queen Elizabeth II Health Sciences Centre in Halifax, NS. Their demographic and clinical characteristics are presented in table 1.
Table 1. Study participants' demographic and clinical data (N = 24)
| Parameter | Value |
|---|---|
| Age (yr)* | 59 ± 19 (range, 18-85) |
| Gender (M/F) | 16/8 |
| Diagnosis (n) | |
| Amputation | 11 |
| Stroke | 4 |
| Musculoskeletal disorder | 3 |
| Spinal cord injury | 3 |
| Neuromuscular disorder | 3 |
| Time using any wheelchair (wk)* | 51 ± 144 (range, 3-678) |
| Time using current wheelchair (wk)* | 17 ± 42 (range, 0.3-182) |
| Where wheelchair used | |
| 24 | |
| 10 | |
| 10 | |
| 3 | |
| 1 | |
| 1 | |
| 0 | |
| Method of wheelchair propulsion | |
| 18 | |
| 4 | |
| 2 | |
| * Values presented as mean ± SD (range). | |
Each participant met all of the following inclusion criteria: 18 years of age or older; alert, cooperative, and able to answer questions about wheelchair use; competent to give an informed consent; willing to participate; user of a manual wheelchair; being treated by an occupational therapist who was willing to participate; and had been using the current wheelchair for at least 2 days at the time of the first WST. The criteria also required that occupational therapists consider their patients to be candidates for some independent wheelchair skills, that the therapists expected their clients to be using a wheelchair 2 to 4 weeks after the initial WST, and that the therapists anticipated that their clients would show some changes in their skills in the 2 to 4 weeks after the initial WST. Although training in wheelchair skills would have been appropriate for our study population at some stage before the data collection, we did not record the nature and extent of any such training. Excluded were persons who had an unstable medical condition (eg, angina, seizures) or who had emotional or psychiatric problems that might make testing unpleasant.
Wheelchairs
Participants were evaluated in the wheelchairs that they had used for at least 2 days before the study. All were manual, rear-wheel drive, with a wide range of components (table 2).
Table 2. Manual wheelchair characteristics (N = 24)
| Parameter | N |
|---|---|
| Style | |
| 22 | |
| 1 | |
| 1 | |
| Frame | |
| 19 | |
| 4 | |
| Seat | |
| 17 | |
| 6 | |
| 24 | |
| 11 | |
| Armrests | |
| 24 | |
| 21 | |
| 17 | |
| 5 | |
| 3 | |
| 2 | |
| Front rigging | |
| 21 | |
| 19 | |
| 19 | |
| 19 | |
| 2 | |
| Brakes | |
| 23 | |
| 1 | |
| 19 | |
| 4 | |
| 16 | |
| 2 | |
| 4 | |
| 1 | |
| Rear antitip devices | 22 |
Wheelchair Skills Test
WST, version 1.0,a consisted of 33 skills (table 3) spanning the spectrum from skills as easy as applying the brakes to skills as difficult as performing a wheelie. The WST was administered twice, with a mean time between tests of 17.9 days (range, 12-28d). Each subject was dressed and equipped (eg, wearing prostheses or orthoses) in his/her usual manner when using the wheelchair. The order in which the skills were tested reflected both their difficulty (beginning with the easier skills and progressing to the more difficult) and their natural groupings. For each skill, the users were oriented to the test expectations. If the first attempt to perform a skill was unsuccessful, a second try was permitted if requested.
Table 3. WST (version 1.0): Success rates and total scores
| Group | Skill No. | Skill | WST 1: N* | WST 1: % Success† | WST 2: N | WST 2: % Success |
|---|---|---|---|---|---|---|
| Brakes | 1 | On | 24 | 100 | 21 | 100 |
| | 2 | Off | 24 | 100 | 21 | 100 |
| | 3 | Brake extensions | 4 | 25 | 5 | 40 |
| | 4 | Caster locks | 1 | 100 | 1 | 100 |
| Footrests | 5 | Flip up/down | 19 | 32 | 17 | 29 |
| | 6 | Swing away/back | 19 | 84 | 17 | 71 |
| | 7 | Remove/replace | 19 | 63 | 17 | 65 |
| | 8 | Elevate/lower | 2 | 0 | 2 | 100 |
| Armrests | 9 | Flip or swing | 19 | 47 | 17 | 35 |
| | 10 | Remove/replace | 22 | 23 | 19 | 26 |
| | 11 | Elevate/lower | 5 | 20 | 5 | 40 |
| Transfer | 12 | Unweighting | 24 | 71 | 21 | 67 |
| | 13 | To/from | 24 | 33 | 21 | 48 |
| Folding | 14 | Cushion, and so forth | 24 | 63 | 21 | 57 |
| | 15 | Fold/open | 22 | 37 | 20 | 45 |
| | 16 | Quick-release | 4 | 0 | 4 | 50 |
| Reaching | 17 | Floor | 24 | 75 | 21 | 90 |
| | 18 | Knapsack | 5 | 80 | 8 | 75 |
| | 19 | High object | 23 | 96 | 21 | 95 |
| Maneuvering | 20 | Slalom | 24 | 92 | 21 | 100 |
| | 21 | 3-point turns | 24 | 67 | 21 | 57 |
| | 22 | Parallel parking | 24 | 67 | 21 | 81 |
| Doors | 23 | Open toward | 24 | 88 | 21 | 81 |
| | 24 | Open away | 24 | 54 | 21 | 71 |
| | 25 | Threshold | 24 | 63 | 21 | 67 |
| Level | 26 | 50m | 24 | 96 | 21 | 95 |
| Surfaces | 27 | Soft | 24 | 92 | 21 | 81 |
| | 28 | Gravel | 24 | 67 | 21 | 43 |
| Incline (5°) | 29 | Ascend | 24 | 63 | 21 | 57 |
| | 30 | Descend | 24 | 75 | 21 | 81 |
| Curb (10cm) | 31 | Ascend | 24 | 0 | 21 | 0 |
| | 32 | Descend | 24 | 38 | 21 | 43 |
| Wheelie | 33 | Pop/hold | 24 | 0 | 21 | 0 |
| Overall (raw) | | Mean ± SD | 24 | 36 ± 8 | 21 | 38 ± 10 |
| Overall (%) | | Mean ± SD | 24 | 51 ± 11 | 21 | 52 ± 13 |

* The N columns reflect cases in which a 0, 1, or 2 was assigned (ie, not applicable cases were ignored). † Success was defined as a score of 2.
Rests were permitted, unless precluded by the nature of the skill being tested. There was no time limit, but, for descriptive purposes, we recorded the time required for individuals to propel their wheelchairs 50 meters during the level-propulsion skill. Subjects were asked to propel their wheelchairs to a pylon 25 meters away and to return to the starting point, but they were unaware that they were being timed. For the dynamic tests, in which there was the risk of a wheelchair tip, a spotter maintained contact with the subject by using a spotter strap looped around the chair frame.20 We videotaped the WST for subsequent scoring, but videotaping is not necessary in the routine administration of the WST.
Each skill was scored with a 3-point ordinal scale—0 for failure to complete the test criteria safely, 1 for partial completion (eg, for the brake application skill, if the subject was successful on 1 side but not the other), and 2 for successful and safe completion. The setting and criteria for success for each skill (eg, if necessary, pulling on the footrest, after it has been replaced, to ensure that the lock has been engaged) were specified in a test manual. If the subject's wheelchair was not equipped with a component (eg, armrests), the associated skill was considered “not applicable.” Total scores (both raw and as a percentage of the total possible with the wheelchair used) were calculated and a graphic report was generated.
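The scoring rule described above can be sketched in a few lines. This is only an illustration, not the authors' scoring software: the function name `wst_total` and the use of `None` to mark a "not applicable" skill are assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def wst_total(scores: Sequence[Optional[int]]) -> Tuple[int, float]:
    """Return (raw total, percentage of the maximum possible score).

    Each entry is 0, 1, or 2, or None for a skill that is "not applicable"
    because the wheelchair lacks the relevant component (assumed encoding).
    """
    applicable = [s for s in scores if s is not None]
    if any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("scores must be 0, 1, 2, or None")
    if not applicable:
        raise ValueError("no applicable skills to score")
    raw = sum(applicable)
    # The maximum is 2 points per applicable skill, so skills that are
    # not applicable do not lower the percentage score.
    max_possible = 2 * len(applicable)
    return raw, 100.0 * raw / max_possible


# Hypothetical subject: five skills, one of them not applicable.
raw, pct = wst_total([2, 2, 1, None, 0])
print(raw, pct)  # 5 62.5
```

Expressing the total as a percentage of the possible score lets subjects whose wheelchairs have different components be compared on the same scale.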
Global rating of manual wheelchair skills
Within 24 hours after witnessing the WST (either in person or by viewing the videotape), the subjects' occupational therapists completed a 100mm visual analog scale (VAS), reflecting their global assessments of the subjects' manual wheelchair skills. There were 9 occupational therapists involved, with clinical experience that ranged from 2 to 19 years. The extremes of the VAS were labeled 0 (no wheelchair skills at all) and 100 (perfect wheelchair skills). The VAS has been widely used to quantify subjective impressions.21 The therapists were unaware of the WST test scores until after they completed the VAS.
Practicality
We designed the WST to require as little equipment as necessary, and to keep costs, space requirements, and set-up time to a minimum. Although we strove to make the test as comprehensive as possible (to allow wheelchair users with a wide range of skill levels to be measured on the same scale), we limited the test to representative skills rather than attempting to evaluate every possible skill. To evaluate practicality, we made qualitative observations and recorded time taken by each subject.
Safety and tolerance
We monitored the test sessions for adverse incidents. After the testing, subjects were asked whether the tests were uncomfortable or stressful, and if they would perform the test again if recommended by their clinicians.22
Reliability
Reliability is a quantitative expression of consistency, repeatability, or precision. Test-retest reliability was evaluated by comparing the subjects' performances on the 2 WSTs. Although we anticipated that improvement would have occurred (see Construct validity below), we also expected each participant's initial performance to correlate moderately well with the subsequent performance. The intrarater reliability was determined by having a single investigator score the same videotaped recordings of WST 2 twice, at least 2 weeks apart. On the second occasion, the investigator was not permitted to review the results of the initial evaluation and was blind as to whether the second videotape was from WST 1 or 2. For interrater reliability, 2 investigators independently scored the same videotapes of WST 1. Neither rater had witnessed the initial recording of the data. For each of the 3 evaluations of reliability, we calculated the percentage concordance for the individual skills and computed Spearman's rank correlation coefficient for the total scores. Our purpose in performing the reliability studies in the way we did was to separate the effect of subject variability from the effect of rater variability.
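The two reliability statistics named above can be illustrated with a short standard-library sketch. The function names, the `None` encoding for missing or not applicable scores, and the choice to drop any pair with a missing value are assumptions for illustration; the study's own software is not described in the text.

```python
from typing import List, Optional, Sequence


def percent_concordance(a: Sequence[Optional[int]],
                        b: Sequence[Optional[int]]) -> float:
    """Percentage of paired scores that are identical.

    Pairs in which either score is None are ignored (assumed handling).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no complete pairs to compare")
    agree = sum(1 for x, y in pairs if x == y)
    return 100.0 * agree / len(pairs)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    vx = sum((p - mx) ** 2 for p in rx)
    vy = sum((q - my) ** 2 for q in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

In this sketch, `percent_concordance` would be applied to one skill at a time across subjects, and `spearman_rho` to the subjects' total scores from the two scorings being compared.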
Validity
Validity is the degree to which a test truly measures what it is intended to measure.
Content validity
Sometimes called face validity, credibility, or comprehensiveness, content validity is a nonnumeric judgment of whether the test measures what it claims to and whether the domain has been adequately covered. The WST is intended to provide insights into the ability of wheelchair users to perform a set of skills that are relevant to their daily lives; the appropriateness of the content of the WST measure to its purpose is self-evident. The content validity was additionally assessed by a review of the literature5, 16, 18, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41 that had been completed while the test was developed. As an additional assessment of the content validity of the WST, we asked each subject's occupational therapist to comment on the validity of the skills that were included in the WST and to identify any skills that he/she believed should be added.
Construct validity
Construct validity is a measure of the extent to which the test detects expected differences. One such expected difference was a treatment-related improvement in the total WST scores between the administration of the first and second tests. The scoring of the second test from the videotapes was performed by an investigator who was unaware of the order of testing. To assess whether the WST could detect change in a subject's status, we calculated Spearman's rank correlation coefficient between the mean changes in the total raw WST scores and the mean changes in the occupational therapists' global ratings (n = 21). In addition to the extent of change in the WST scores and global ratings, change was determined by the therapists' categoric opinions before WST 2 that their patients' skills were “improved,” “the same,” or “worse” than at the time of WST 1. We used t tests on the total raw WST scores and the global ratings to determine whether significant change had occurred. Another expected difference was that the total WST score (the dependent measure) would be inversely related to the subjects' ages (the independent measure). We used regression analysis for this evaluation. We also anticipated a relationship between the total WST scores and wheelchair experience. By using a 2-sample t test, we compared the total WST scores for participants who had used their wheelchairs for 1 to 21 days with those who had used them for 22 or more days. Finally, we anticipated differences in the total WST scores according to diagnosis and we evaluated this with analysis of variance.
Concurrent validity
Concurrent validity involves comparison with a criterion measure (or gold standard). Unfortunately, no well validated criterion measure exists for manual wheelchair skills; global evaluation by a therapist comes nearest to such a measure. Spearman's rank correlation coefficient between the subjects' total raw WST results and the occupational therapists' global ratings served as our means of evaluating concurrent validity. We were not assessing whether the occupational therapists were aware of their patients' wheelchair skills before the WST, but rather whether the occupational therapist, viewing the same performances that the testers did (but without the test criteria and scoring method) would quantify the performances in a similar manner (ie, a Gestalt impression vs a criterion-based approach).
Usefulness to clinicians
After viewing their patients' second WST and reviewing the graphic report, each therapist used a VAS to quantify his/her perception of the WST's overall usefulness in that patient's care (n = 21). The extremes of the VAS were labeled 0 (not useful at all) and 100 (extremely useful).
Data analysis
The analyses for specific components of the study have been described. We also generated descriptive statistics for the quantitative data. We defined statistical significance as P less than .05, with a Bonferroni adjustment to .0015 (.05/33) for the evaluation of the reliability of individual skills. This was performed to avoid difficulties resulting from multiple comparisons. We used Rasch analysis42, 43, 44, 45 to determine if the skill set in a screening WST could be reduced.
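As a quick check of the adjusted threshold, dividing the overall significance level by the 33 skills gives the stated value (the symbols α and m are notation introduced here, not the paper's):

```latex
\alpha_{\text{adj}} = \frac{\alpha}{m} = \frac{0.05}{33} \approx 0.0015
```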
Results
The percentage of subjects who were successful on individual skills and their total scores are shown in table 3. The mean wheeling speeds for the 50-meter, level-propulsion skill (including a 180° turn) were .60m/s (range, .26-.98m/s) and .62m/s (range, .20-1.03m/s) for WSTs 1 and 2.
Regarding practicality, we found that it was feasible to perform the WST in a single session without unduly tiring the subject or tester. The raters identified a need to better define the starting and finishing positions and the success criteria for several skills, a need to separate the scoring of skills on the left and right sides, and a need to improve the flow from 1 skill to the next. The mean time ± standard deviation (SD) taken to administer WST 1 was 31 ± 9 minutes (range, 18-50min), an average of 5.3 ± 9 minutes longer than the 27 ± 9 minutes (range, 17-50min) for WST 2 (P = .013, paired t test).
From the safety and tolerance perspectives, there were no adverse incidents. One woman with stroke became frustrated about two thirds of the way through WST 1 and refused to finish it, but she completed WST 2 uneventfully. Although 2 subjects found the WST uncomfortable (eg, during the 10-cm curb test), none who completed the test found it stressful, and all indicated that they would perform the test again if their clinicians suggested that it would be appropriate.
As for reliability, table 4 shows the percentage concordance scores on individual skills.
Table 4. Reliability of the WST (version 1.0)
| Group | Skill No. | Skill | Test-Retest: N* | Test-Retest: %† | Intrarater: N | Intrarater: % | Interrater: N | Interrater: % |
|---|---|---|---|---|---|---|---|---|
| Brakes | 1 | On | 21 | 100‡ | 21 | 100‡ | 24 | 100‡ |
| | 2 | Off | 21 | 100‡ | 21 | 100‡ | 24 | 100‡ |
| | 3 | Brake extensions | 3 | 67 | 5 | 100 | 4 | 50 |
| | 4 | Caster locks | 1 | 100 | 1 | 100 | 1 | 100 |
| Footrests | 5 | Flip up/down | 16 | 75‡ | 17 | 82‡ | 19 | 89‡ |
| | 6 | Swing away/back | 16 | 69 | 17 | 94‡ | 19 | 100‡ |
| | 7 | Remove/replace | 16 | 56 | 17 | 88‡ | 19 | 95‡ |
| | 8 | Elevate/lower | 1 | 0 | 2 | 100 | 2 | 100 |
| Armrests | 9 | Flip or swing | 16 | 44 | 16 | 56 | 19 | 84‡ |
| | 10 | Remove/replace | 18 | 67 | 19 | 89‡ | 22 | 91‡ |
| | 11 | Elevate/lower | 4 | 0 | 5 | 40 | 5 | 100 |
| Transfer | 12 | Unweighting | 21 | 62 | 21 | 81‡ | 24 | 79‡ |
| | 13 | To/from | 21 | 67 | 21 | 62 | 24 | 58 |
| Folding | 14 | Cushion, and so forth | 21 | 76‡ | 21 | 90‡ | 24 | 79‡ |
| | 15 | Fold/open | 19 | 68 | 19 | 89‡ | 22 | 91‡ |
| | 16 | Quick-release | 4 | 25 | 4 | 75 | 4 | 100 |
| Reaching | 17 | Floor | 21 | 86‡ | 21 | 95‡ | 24 | 88‡ |
| | 18 | Knapsack | 5 | 80 | 8 | 75 | 5 | 100 |
| | 19 | High object | 20 | 100‡ | 21 | 95‡ | 23 | 78‡ |
| Maneuvering | 20 | Slalom | 21 | 90‡ | 21 | 100‡ | 24 | 92‡ |
| | 21 | 3-point turns | 21 | 43 | 21 | 57 | 24 | 67‡ |
| | 22 | Parallel parking | 21 | 81‡ | 21 | 86‡ | 24 | 79‡ |
| Doors | 23 | Open toward | 21 | 86‡ | 21 | 86‡ | 24 | 75‡ |
| | 24 | Open away | 21 | 71‡ | 21 | 86‡ | 24 | 96‡ |
| | 25 | Threshold | 21 | 71‡ | 21 | 90‡ | 24 | 83‡ |
| Level | 26 | 50m | 21 | 100‡ | 21 | 95‡ | 24 | 100‡ |
| Surfaces | 27 | Soft | 21 | 81‡ | 21 | 95‡ | 24 | 96‡ |
| | 28 | Gravel | 21 | 62 | 21 | 90‡ | 24 | 83‡ |
| Incline | 29 | Ascend | 21 | 76‡ | 21 | 81‡ | 24 | 79‡ |
| | 30 | Descend | 21 | 71‡ | 21 | 81‡ | 24 | 71‡ |
| Curb | 31 | Ascend | 21 | 90‡ | 21 | 100‡ | 24 | 100‡ |
| | 32 | Descend | 21 | 57 | 21 | 67 | 24 | 92‡ |
| Wheelie | 33 | Pop/hold | 21 | 100‡ | 21 | 100‡ | 24 | 100‡ |

* The N columns reflect cases in which a 0, 1, or 2 was assigned (ie, not applicable cases were ignored). In some cases, N differs from those in table 3 because of missing data in 1 of the 2 data sets needed for reliability statistics. † Percentage concordance. ‡ Statistically significant at the P < .0015 level (Bonferroni adjusted).

Fig. 1.
Reliability of the WST. In each figure, the total raw scores are plotted and the unity line is shown. (A) Test-retest reliability. Subjects' scores on WSTs 1 and 2 (n = 21, r = .65). (B) Intrarater reliability. Scores from 2 evaluations by the same rater (n = 21, r = .96). Two points are superimposed (at x = 41, y = 42). (C) Interrater reliability. Scores from the evaluations of 2 different raters (N = 24, r = .95).
Regarding content validity, the subjects' occupational therapists unanimously endorsed 30 (91%) of the 33 WST skills. Two suggested that the wheelie skill be eliminated, 1 recommended elimination of the curb skill, and 1 said the soft-surface skill should be omitted. One therapist suggested adding a “lateral threshold to simulate a grass/sidewalk difference” and 1 suggested that the test door should have a “hydraulic close mechanism.”
For construct validity and the question of whether a significant change had occurred between the 2 WSTs, by using the categoric responses, the therapists felt that 13 (62%) of 21 subjects had improved, 8 (38%) had not changed, and none had worsened. The paired t tests between the subjects' total raw WST scores on tests 1 and 2 showed a mean improvement of 2.1 ± 5.7 (P < .05). The mean improvement in the total WST scores for those in the “improved” category was 3.2 ± 5.9, compared with 0.5 ± 5.2 for persons in the “unchanged” category (P = .15). The mean improvement in the global ratings was 6% ± 14% (P < .05); for users in the “improved” category, it was 12% ± 14%, significantly greater than the −3% ± 10% for persons in the “unchanged” category (P = .012). Spearman's rank correlation coefficient between the mean changes in the total raw WST scores and global ratings was .45 (P = .025).
We reviewed the health records of the 8 subjects whose WST 2 scores were lower than their WST 1 scores. Three were clinically stable and 5 had experienced significant clinical problems between the 2 test administrations. These problems included progressive myelopathy from a spinal cord tumor, severe dermatitis of the stump of a patient with a transtibial amputation, newly diagnosed metastatic carcinoma of the prostate, new gangrene in a leg scheduled for amputation, new urinary tract infection, new onset of deep vein thrombosis, and new onset of back pain. Three subjects were using a different wheelchair when tested the second time. Five patients had therapeutic priorities other than manual wheelchair skills that occupied their time in the intervening period (eg, extensive bowel and bladder management in 1, assessment for powered mobility in 1, gait training in 3). One subject had a 2-week hiatus from rehabilitation when he went away for Christmas.
Regarding the relationship of WST scores with age, 1 outlying value was excluded from the analysis because it violated the model assumptions. The variability was great, but the slope was slightly negative (P < .05).
The mean ± SD total WST score for the 6 subjects who had used their wheelchairs for 1 to 21 days was 31.3 ± 3.1; for the 18 persons who had used wheelchairs for 22 or more days, it was 38.1 ± 8.3 (P = .0085).
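A 2-sample t statistic can be computed directly from summary values like these. The sketch below assumes the unequal-variance (Welch) form, which the text does not specify, and the function name `welch_t` is hypothetical. It stops at the t statistic and degrees of freedom, because converting them to a P value needs a t-distribution function that the Python standard library does not provide.

```python
import math
from typing import Tuple


def welch_t(mean1: float, sd1: float, n1: int,
            mean2: float, sd2: float, n2: int) -> Tuple[float, float]:
    """Welch's t statistic and approximate degrees of freedom.

    Assumption: unequal-variance form; the paper only says "2-sample t test".
    """
    v1 = sd1 ** 2 / n1  # squared standard error of the first group mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second group mean
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary values reported above: 1 to 21 days vs 22 or more days of use.
t, df = welch_t(31.3, 3.1, 6, 38.1, 8.3, 18)
print(round(t, 2), round(df, 1))  # approximately 2.92 21.4
```

The same summary values also give the relative difference quoted in the Discussion: 38.1 / 31.3 is about 1.22, or roughly 22% higher.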
The mean ± SD total WST scores, from lowest to highest, were 28.3 ± 11.6 for 4 wheelchair users with stroke, 35.0 ± 2.7 for 3 persons with neuromuscular disorders, 36.9 ± 6.3 for 11 patients with amputations, 39.8 ± 6.7 for 3 persons with spinal cord injury, and 45.5 ± 3.5 for 3 patients with musculoskeletal disorders. These differences were not significant.
For concurrent validity, Spearman's rank correlation coefficient between the total raw WST scores and the occupational therapists' global ratings was .40 (P = .01) for WST 1 and .54 (P = .008) for WST 2.
For usefulness, subjects' therapists perceived the WST to be moderately useful in caring for their patients, with a mean VAS score of 59% ± 22% after WST 2. Several therapists commented that their patients' performances were better than they had expected, particularly for skills that the therapists had not yet witnessed in the treatment setting.
There were sufficient data for Rasch analysis on 20 skills (shown in fig 2).

Fig. 2.
Rasch analysis of the 20 skills from the WST for which there were sufficient data for the analysis. Six items (indicated by the open boxes) could be eliminated, leaving a unidimensional test of wheelchair ability. The dimension line is plotted.
Discussion
Our qualitative observations and the subjects' mean time of 29 minutes to perform the WST confirmed its practicality. Some refinements on the basis of this study (eg, the addition of prerequisites for some tasks) should reduce the performance time. The WST was safe and generally well tolerated by patients.
The test-retest reliability of .65 for the total scores was good, despite the lengthy interval between tests and the intervening changes for some subjects. Had we administered the WST twice within a 24-hour period, we would have had a better measure of subject variability. The intra- and interrater reliabilities of .96 and .95 for the total scores were excellent. The almost identical r values imply that the masking we used for the intrarater reliability scoring was adequate. Our use of a single rater for intrarater reliability evaluation and 1 pair of raters for interrater reliability are limitations of the study; however, this was practical and consistent with other studies of the reliability of clinical rehabilitation measures.22
The reliability results for individual skills were less consistent. Low sample sizes (n < 5 for 5 of the 33 skills) probably contributed to the lack of statistically significant concordance in some skills. The low n value for these test items resulted from the fact that the subjects' wheelchairs were not equipped with the relevant component (eg, caster locks, elevating footrests). These low n values have implications for the content validity of the WST. However, scoring dilemmas were also a factor in the low concordance values for some skills. To determine whether the reliability would have been improved if a binary pass-fail scoring system had been used instead of the 0, 1, or 2 scoring, we combined the 0 and 1 scores and recalculated the percentage concordance. Some of the recalculated values were improved and we will use pass-fail scoring in future versions of the WST. However, there appeared to be more room for improvement in the percentage concordance values by clarifying the scoring criteria rather than by simply narrowing the number of scoring options.
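The pass-fail recalculation described above can be illustrated as follows. The data are invented purely to show the mechanics, and the helper names and `None` encoding for not applicable scores are assumptions for this sketch.

```python
from typing import List, Optional, Sequence


def to_pass_fail(score: Optional[int]) -> Optional[int]:
    """Collapse the 3-point scale: 2 becomes pass (1); 0 and 1 become fail (0)."""
    if score is None:
        return None
    return 1 if score == 2 else 0


def percent_concordance(a: Sequence[Optional[int]],
                        b: Sequence[Optional[int]]) -> float:
    """Percentage of complete pairs with identical scores."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no complete pairs to compare")
    return 100.0 * sum(1 for x, y in pairs if x == y) / len(pairs)


# Invented scores for one skill from two scorings of six subjects.
first: List[Optional[int]] = [2, 1, 0, 2, None, 1]
second: List[Optional[int]] = [2, 0, 1, 2, 2, 1]

three_point = percent_concordance(first, second)
binary = percent_concordance([to_pass_fail(s) for s in first],
                             [to_pass_fail(s) for s in second])
print(three_point, binary)  # 60.0 100.0
```

In this made-up case, the disagreements were all between scores of 0 and 1, so collapsing them removes every discrepancy. Disagreements that involve a score of 2 would remain after recoding.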
Regarding content validity, the occupational therapists unanimously endorsed most of the WST skills, with 1 to 2 dissenting opinions on 3 of the most difficult skills (the soft surface, the curb ascent, the wheelie). Regarding skills that therapists believed should have been included but were not, 1 therapist suggested a new test and another suggested a modification in the test equipment. The skills included in the WST covered a broad spectrum: all subjects had success with at least some skills (ie, no floor effect) and no subject was successful with them all (ie, no ceiling effect). A patient's failure to successfully ascend the curb or to perform a wheelie, 2 skills that clinical practice suggests many young wheelchair users (eg, with paraplegia) can perform, probably reflects the user's characteristics more than a lack of content validity for these skills. The actual number of wheelchair users capable of performing these and other skills is not known, a fact that underscores the need for the WST and for further studies of it.
Our findings suggest 2 important implications for the WST content. First, its design criteria should require that a significant proportion of wheelchair users have the equipment that is related to a specific test item. Second, a significant number of wheelchair users should be capable of performing each test. By using these and other criteria, some skills, such as the caster-lock skill, might be excluded from future versions of the WST.
In evaluating construct validity, the changes in total WST and global rating scores between WSTs 1 and 2 were statistically, but not clinically, significant. Although it could be argued that it is self-evident that there would be a strong correlation between the changes in the WST and the global rating scores (given that the occupational therapists witnessed the WSTs), the fact that we found only a moderate correlation refutes that assumption. We had hoped for a greater improvement in the WST scores between tests 1 and 2, but the average time that subjects had spent using a wheelchair was almost 1 year. It is likely that many of the subjects had already reached a plateau in their skill levels by the time they were recruited for the study. Also, our review of the health records of subjects who had slightly lower WST scores on the second occasion (the greatest being ~12%) provided several plausible explanations for the unimproved WST scores, including changes in clinical status and in the wheelchairs used. Future attempts to evaluate the construct validity of the WST by relating changes in it to therapeutic rehabilitation programs should be made with subjects who are clinically stable and have only recently begun using a wheelchair. An alternative explanation of the slight apparent worsening of the WST scores is subject variability in the performance of the test; future test-retest trials within 24 hours should answer this question. Despite these study limitations, Spearman's rank correlation coefficient between the mean changes in the WST and global rating scores was fair (.45).
We found a significant but slightly negative relation between the total scores on WST 1 and the subjects' ages. There was also a significant relation between the total WST scores and the duration of wheelchair use; subjects who had used their wheelchairs for more than 3 weeks scored an average of 22% higher than did those with less experience. Both findings support the construct validity of the WST. There was no significant relation between the total scores on WST and the diagnoses accounting for wheelchair use, though there were some trends that might be significant with a larger sample size.
The correlation between the total WST and global rating scores (concurrent validity) was fair (r value range, .40-.54). This was not entirely unexpected; indeed, if the correlation between the WST and the criterion measure had been excellent, there would be no need for a WST. Further study should explore the reasons for the discrepancies between the WST scores and the perceptions of the occupational therapists.
The therapists' qualitative and quantitative perceptions of the usefulness of the WST were also encouraging. The qualitative observations support our assumption that objective WST testing is needed, rather than leaving therapists to rely on their impressions. We anticipate that the WST's usefulness will improve when it has been refined, when it is used as an integral part of the rehabilitation program (rather than with subjects recruited just for a research study), and when it is used during training rather than for just pre- and posttraining comparisons. In addition to its clinical usefulness (which, as noted, remains to be convincingly shown), the WST has helped us identify several wheelchair design flaws that lend themselves to engineering solutions. We have also used elements of the WST as a research tool, to evaluate the safety and efficacy of a new wheelie aid.46
This pilot study has helped us to refine the WST. By using version 2.4, we are replicating the initial study with a larger, more diverse sample. If the test is to become more widely adopted, we must develop the objectives, curriculum, and evaluation methods needed to train WST testers. Also, the test makes possible several other interesting studies: (1) using the WST for a multicenter comparison of rehabilitation outcomes in specific diagnostic categories; (2) testing the skills of rehabilitation professionals in wheelchair use, and extracting the educational implications of any identified deficits in those skills; (3) testing the combined WST abilities of wheelchair users and their attendants as a way to determine any implications for the training of attendants; (4) researching how well performance on the WST predicts a person's ability to use a powered wheelchair; (5) evaluating the impact of wheelchair innovations on WST performance; (6) determining the legal and ethical implications of using the WST as a wheelchair “driver's license” in long-term care facilities; (7) comparing the cost effectiveness of various training environments (eg, rehabilitation centers vs community); and (8) using the WST to identify the functional implications of the neglect syndrome in persons with hemiplegia.16, 17, 24
With the current emphasis on evidence-based and cost-effective practice, it has become increasingly important that rehabilitation practitioners document the effects of their interventions. Well-validated measurement instruments are needed to document program outcomes, to assist in testing research hypotheses, and to assist in the development of new technologies. Although there are many excellent measurement instruments available, there has been no instrument designed specifically to evaluate manual wheelchair function in daily living activities.
Conclusion
The WST is practical, safe, and well tolerated, and exhibits good to excellent reliability, excellent content validity, fair construct and concurrent validity, and moderate usefulness. This pilot study, though performed with a small and diverse sample of wheelchair users, reports an important development in the quest for a well-validated outcome measure of manual wheelchair ability. It has provided interesting insights about wheelchair function and a solid foundation for future studies.
Acknowledgements
The authors thank the occupational therapists of the Queen Elizabeth II Health Sciences Centre who participated in this study.
References
1. Principles of wheelchair design and prescription. In: Lazar RB, editor. Principles of neurologic rehabilitation. New York: McGraw-Hill; 1997; p. 465–481
2. People with disabilities in basic life activities in the U.S. Disabilities Statistics Program. No. 3. San Francisco: Univ California, San Francisco; April 1992; Disability Statistics Abstract
3. People with mobility impairments in the United States today and in 2010. Assist Technol. 1996;8:43–53
4. A guide to wheelchair selection: how to use the ANSI/RESNA wheelchair standards to buy a wheelchair. Washington (DC): Paralyzed Veterans of America; 1994
5. Powered mobility device skills test. In: RESNA 2000, Technology for the New Millenium; 2000 June 28-July 2. Orlando (FL). Washington (DC): RESNA Pr; 2000; p. 450–452
6. Wheelchair selection and configuration. New York: Demos Medical; 1998
7. Status of functional outcomes for stroke survivors. Phys Med Rehabil Clin North Am. 1999;10:957–966
8. Assessment of global function: the Reintegration to Normal Living Index. Arch Phys Med Rehabil. 1988;69:583–590
9. Wheelchair pushrim kinetics: body weight and median nerve function. Arch Phys Med Rehabil. 1999;80:910–915
10. Motorized wheelchair driving by disabled children. Arch Phys Med Rehabil. 1984;65:95–97
11. Pediatric power wheelchair: evaluation of function in the home and school environment. Assist Technol. 1991;3:24–31
12. Cognitive predictors of successful powered wheelchair control in the very young child. In: Presperin JJ, editor. RESNA International 92, Technology for Consumers; 1992 April; Toronto (Ont). Washington (DC): RESNA Pr; 1992; p. 412–414
13. In: Jaffe KM, editor. RESNA First Northwest Regional Conference, Childhood powered mobility: developmental, technical and clinical perspective. Washington (DC): RESNA Pr; 1987 Mar 6; Seattle (WA) 1987
14. Powered mobility: a literature review illustrating the importance of a multifaceted approach. Assist Technol. 1999;11:20–33
15. Development of the power-mobility community driving assessment. Can J Rehabil. 1998;11:123–129
16. Wheelchair obstacle performance in right cerebral vascular accident victims. J Clin Exp Neuropsychol. 1988;11:295–310
17. Rightward orienting bias, wheelchair maneuvering, and fall risk. Arch Phys Med Rehabil. 1995;76:924–928
18. Reliability of a tool for assessing mobility in wheelchair-dependent paraplegics. Spinal Cord. 1998;26:427–430
19. Obstacle course performance and risk of falling in community-dwelling elderly persons. Arch Phys Med Rehabil. 1998;79:1570–1576
20. Spotter strap for the prevention of wheelchair tipping. Arch Phys Med Rehabil. 1999;80:1354–1356
21. A comparison of Likert and visual analogue scales for measuring change in function. J Chron Dis. 1987;40:1129–1133
22. Clinical measurement of the static rear stability of occupied wheelchairs. Arch Phys Med Rehabil. 1999;80:199–205
23. Effects of side slope on wheelchair performance. J Rehabil Res Dev. 1986;23:55–57
24. Wheelchair propulsion: descriptive comparison of hemiplegic and two-hand patterns during selected activities. Am J Phys Med Rehabil. 1999;78:131–135
25. Providing accessibility and useability for physically handicapped people. ANSI A117.1-1986. New York: American National Standards Institute; 1986
26. Accessibility guidelines for buildings and facilities (ADAAG). 56 Federal Register 35635. 1991
27. Americans with Disabilities Act of 1990. Part 1192. Accessibility guidelines for transportation vehicles. Pub L No. 101-336, 104 Stat 370, 42 USC §12204 (1991).
28. Ramp length/grade prescriptions for wheelchair dependent individuals. Paraplegia. 1991;29:479–485
29. Prediction of ramp traversability for wheelchair dependent individuals. Paraplegia. 1991;29:470–478
30. Barriers to mobility: physically-disabled and frail elderly people in their local outdoor environment. Int J Rehabil Res. 1991;14:303–312
31. Influence of floor surface on the energy cost of wheelchair propulsion. Phys Ther. 1977;57:1022–1027
32. The architectural accessibility of urban facilities to the disabled: a summary of descriptive survey results. Paraplegia. 1989;27:370–371
33. Personal care/home: accessibility/architectural adaptation/design. In: Enders A, Hall M, editors. Assistive technology sourcebook. Washington (DC): RESNA Pr; 1990; p. 156–159
34. Equal access to public accommodations. In: West J, editor. The Americans with Disabilities Act: from policy to practice. New York: Milbank Memorial Fund; 1991; p. 183–213
35. Ramps not steps: a study of accessibility preferences. J Rehabil. 1992;58:65–69
36. Barriers experienced by nondisabled wheelchair users: a university-based occupational therapy program educational exercise. Assist Technol. 1999;11:54–58
37. Promoting wheelchair accessibility of private business settings: an analysis of the effects of information prompts, feedback, and incentives. Environ Behav. 1986;18:132–145
38. Physical accessibility guidelines of consumer product controls. Assist Technol. 1997;9:3–14
39. Barriers to access: frustrations of people who use a wheelchair for full-time mobility. Rehabil Nurs. 1998;23:120–125
40. Wheelchair accessibility—living the experience: function in the community. Occup Ther J Res. 1998;18:25–43
41. An analysis of the effects of ramp slope on people with mobility impairments. Assist Technol. 1997;9:22–33
42. ABILHAND: a Rasch-built measure of manual ability. Arch Phys Med Rehabil. 1998;79:1038–1042
43. Short form of the dizziness handicap inventory. Am J Phys Med Rehabil. 1999;78:233–241
44. ADL structure for nondisabled Japanese children based on the Functional Independence Measure for children (WeeFIM). Am J Phys Med Rehabil. 1999;78:208–212
45. A brief outpatient functional assessment measure: validity using Rasch measures. Am J Phys Med Rehabil. 1997;76:8–13
46. New wheelie aid for wheelchairs: controlled trial of safety and efficacy. Arch Phys Med Rehabil. 2001;82:380–390
☆ Supported in part by the Canadian Institutes for Health Research and the Queen Elizabeth II Health Sciences Centre Research Fund.
☆☆ No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the author(s) or upon any organization with which the author(s) is/are associated.
★ Reprint requests to R. Lee Kirby, MD, Queen Elizabeth II Health Sciences Centre, Rehabilitation Centre, 1341 Summer St, Halifax, NS B3H 4K4, Canada, e-mail: kirby@is.dal.ca.
★★ Supplier
♢ a. Clinical Locomotor Function Laboratory, Queen Elizabeth II Health Sciences Centre, Rehabilitation Centre Site, 1341 Summer St, Halifax, NS B3H 4K4, Canada.
PII: S0003-9993(02)89995-6
doi:10.1053/apmr.2002.26823
© 2002 American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved.
