The Development and Validity of the Salford Gait Tool: An Observation-Based Clinical Gait Assessment Tool

Abstract

Toro B, Nester CJ, Farren PC. The development and validity of the Salford Gait Tool: an observation-based clinical gait assessment tool.

Objectives: To develop the construct, content, and criterion validity of the Salford Gait Tool (SF-GT) and to evaluate agreement between gait observations made with the SF-GT and kinematic gait data.

Design: Tool development and comparative evaluation.

Setting: University in the United Kingdom.

Participants: For developing construct and content validity, convenience samples of 10 children with hemiplegic, diplegic, and quadriplegic cerebral palsy (CP), 152 physical therapy students, and 4 physical therapists were recruited. For developing criterion validity, kinematic gait data of 13 gait clusters containing 56 children with hemiplegic, diplegic, and quadriplegic CP and 11 neurologically intact children were used. For clinical evaluation, a convenience sample of 23 pediatric physical therapists participated.

Interventions: We developed a sagittal plane observational gait assessment tool through a series of design, test, and redesign iterations. The tool’s grading system was calibrated using kinematic gait data of 13 gait clusters and was evaluated by comparing gait observations made with the SF-GT with kinematic gait data.

Main Outcome Measures: Criterion standard kinematic gait data.

Results: There was 58% mean agreement based on grading categories and 80% mean agreement based on degree estimations evaluated with the least significant difference method.

Conclusions: The new SF-GT has good concurrent criterion validity.
INSTRUMENTED GAIT ANALYSIS remains the criterion standard assessment tool for the management of gait abnormalities,1, 2 although the interpretation of gait analysis data for clinical decision making varies.3 In routine clinical practice, however, access to an instrumented gait laboratory is relatively rare,4 and this has led to the development of a variety of observational gait assessment (OGA) tools as an alternative. The process by which these tools have been developed, however, is often unclear, and there is scarce evidence to support their validity.5 In the context of gait analysis, validity refers to the degree to which the assessment measures the actual events of gait. Face validity is the lowest level of validity and is based on the observer’s personal opinion. Construct validity is determined by theoretical reasoning that a gait assessment tool adequately measures selected gait variables. When a gait assessment tool is believed to include the domains required to adequately assess gait, its content is considered valid (content validity). Criterion validity, the highest level of validity of a gait assessment tool, is evaluated by comparing the results obtained with the tool to the criterion standard measurement of gait, which is instrumented analysis of gait kinematics, kinetics, and muscle activity. The use of tools that have criterion validity is justified because we can have confidence in their ability to accurately reflect actual gait events. The development of most existing OGA tools appears to have been based on their clinical construct validity rather than on a demonstration of criterion validity through comparison with quantitative kinematic gait data.
Those tools that have been compared with the criterion standard have only poor to moderate validity.6, 7, 8, 9, 10 The Hugh Williamson Gait Laboratory Scale,7 a modified version of the Physician Rating Scale (PRS),6 had poor criterion validity when compared with quantitative assessment of sagittal plane foot and knee joint kinematics in 25 children with cerebral palsy (CP), using 4 experienced raters (κ range, .46–.61). The Observational Gait Scale (OGS), another variant of the PRS, also had modest validity when compared with quantitative kinematic data (κ=.69; range, .38–.94) in the assessment of 20 children with spastic diplegia by 2 experienced assessors.8 That investigation was limited to the scale’s first 4 sections (knee position at mid stance, initial foot contact, foot position at mid stance, timing of heel rise). There was poor agreement between 3-dimensional gait data and 2 experienced observers using the Visual Gait Assessment Scale (VGAS), the most recent variant of the PRS, in the assessment of the gait of 31 children with hemiplegic CP.9 The mean κ scores ranged from −.05 to .51 (mean, .22) for 4 parameters (hip position in terminal stance, hip position in mid swing, knee peak extension in terminal stance, knee peak flexion in swing). The validity of the Edinburgh Visual Gait Score (EVGS) was measured by reporting agreement between scores and quantitative kinematics for each of the 10 numeric gait items that measure movement at the ankle, knee, hip, and pelvis.10 Percentage agreement between kinematic data and 5 experienced observers who assessed sagittal gait images of 4 children with CP and 1 neurologically intact child ranged from 47% for maximum knee extension in stance to 83% for maximum ankle dorsiflexion in swing (mean agreement, 64%).
The lack of objective information about existing observation-based gait assessment tools,5 coupled with evidence of the unmet needs of clinicians,4 led us to develop the Salford Gait Tool (SF-GT), a new clinically orientated OGA tool for therapists who manage the gait problems of children with CP. This article describes the development of the tool’s construct, content, and criterion validity, using a combination of clinical experience and quantitative kinematic data, and then reports the levels of agreement between gait assessments made with the SF-GT and quantitative kinematic data.

Methods

Initial Design of the SF-GT

The initial structure of the SF-GT was based on our previous experience,11 a review of existing tools,5 and reviews of video images of 10 children (6 boys, 4 girls; mean age, 7y; age range, 5–10y) with hemiplegic (n=3), diplegic (n=3), and quadriplegic (n=4) CP gait. All 10 gave written consent to participate in the research, and ethics approval was granted. The initial tool was designed to assess sagittal plane hip, knee, and ankle angular positions at 6 specific events during the gait cycle (initial contact, end double support, mid stance, start double support, toe-off, mid swing). We selected 6 events as a compromise between a large number of events, which would enable a comprehensive assessment but would be time consuming, and a small number of events, which would be quicker to complete but might provide too little detail. Moreover, the gait events needed to be precisely and repeatedly identifiable by eye. We devised a 5-point category scoring system (2, 1, 0, −1, −2) to describe the positions of the hip, knee, and ankle at the 6 gait events (3 joints × 6 gait events = 18 assessments). Each scoring category would correspond to a specific range of angular positions.
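The scoring structure described above (3 joints, each scored at 6 events on a 5-point scale) can be represented as a small grid. The following Python sketch is illustrative only and is not part of the published tool: the class name `SFGTSheet` and its methods are hypothetical, and the sketch simply enforces the 18-cell layout and the 2 to −2 scale.

```python
from dataclasses import dataclass, field

JOINTS = ("hip", "knee", "ankle")
EVENTS = ("initial contact", "end double support", "mid stance",
          "start double support", "toe-off", "mid swing")
SCORES = (2, 1, 0, -1, -2)  # 5-point category scale


@dataclass
class SFGTSheet:
    """Hypothetical record of the category scores for one leg and one gait cycle."""
    scores: dict = field(default_factory=dict)  # (joint, event) -> score

    def record(self, joint: str, event: str, score: int) -> None:
        if joint not in JOINTS or event not in EVENTS:
            raise ValueError(f"unknown joint/event: {joint!r}, {event!r}")
        if score not in SCORES:
            raise ValueError(f"score must be one of {SCORES}")
        self.scores[(joint, event)] = score

    def is_complete(self) -> bool:
        # 3 joints x 6 events = 18 assessments per gait cycle
        return len(self.scores) == len(JOINTS) * len(EVENTS)

    def missing(self) -> list:
        """Return the (joint, event) cells not yet scored, in form order."""
        return [(j, e) for j in JOINTS for e in EVENTS if (j, e) not in self.scores]
```

A fresh sheet reports 18 missing cells; once every joint has a score at every event, `is_complete()` returns `True`.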
Importantly, we felt that the range of angular positions defining each category must be small enough to be sensitive to differences between gait styles and to changes resulting from clinical interventions, but large enough to be identified by the naked eye when observing gait on a video screen. The sum of the 6 category scores for each joint would then represent that joint’s function over the entire gait cycle and provide a qualitative description of the joint pathology. We intended that the final boundaries between one scoring category and the next, as well as the boundaries between the summed category scores, would be based on quantitative kinematic gait data describing different types of gait pathology. To obtain early user feedback on the tool’s overall design, however, the boundaries between categories were provisionally set on the basis of the gait literature12, 13, 14 and clinical observation.

Development of Construct and Content Validity

This initial version of the tool was then evaluated for its user friendliness, physical layout, wording, and construct and content validity by 9 successive focus groups over a 1-year period. Eight groups had an average of 19 physical therapy students per group (total N=152), and 1 group included 4 physical therapists who were specialists in gait assessment, pediatrics, neurology, and musculoskeletal physical therapy. The tool went through several test, redesign, and retest iterations, with changes made at each stage.

Development of Criterion Validity

In this development stage, we used quantitative kinematic gait data to adjust the upper and lower boundaries of the scoring categories for each of the 3 joints at the 6 gait events of the SF-GT. Such boundaries in existing tools appear to have been defined on the basis of clinical experience6, 7, 8, 9, 10 rather than on formal evaluation of the gait patterns the tools would subsequently be used to assess.
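The role of the category boundaries and of the per-joint sum can be illustrated with a short sketch. It maps an angle to a category given 4 ascending boundaries, sums a joint’s 6 event scores (a value between −12 and 12), and, as a simplified stand-in for the calibration stopping rule described in the next paragraph (each gait type’s scored description should differ from the others), lists clusters whose category profiles coincide. All function names, boundary values, and cluster means are hypothetical; the real SF-GT boundaries were derived from cluster means and SDs, and the stopping decision also involved qualitative judgment.

```python
from bisect import bisect_left


def categorize(angle_deg, bounds):
    """Map an angle (deg) to a category from -2 to 2.

    bounds holds 4 ascending upper limits for categories -2, -1, 0, and 1;
    angles above the last limit fall in category 2.
    """
    return bisect_left(bounds, angle_deg) - 2


def joint_sum(scores):
    """Sum one joint's 6 event scores (result lies between -12 and 12)."""
    if len(scores) != 6:
        raise ValueError("expected one score per gait event (6 in total)")
    return sum(scores)


def colliding_clusters(cluster_means, bounds_per_event):
    """Return (first, second) pairs of clusters with identical category profiles.

    cluster_means maps a cluster id to its mean angle at each of the 6 events;
    bounds_per_event holds one boundary tuple per event. An empty result means
    every cluster receives a distinct profile under these boundaries.
    """
    first_seen = {}
    collisions = []
    for cluster_id, means in cluster_means.items():
        profile = tuple(categorize(a, b) for a, b in zip(means, bounds_per_event))
        if profile in first_seen:
            collisions.append((first_seen[profile], cluster_id))
        else:
            first_seen[profile] = cluster_id
    return collisions


if __name__ == "__main__":
    # Invented boundaries (same at every event for brevity) and invented cluster means.
    bounds = [(-5, 5, 15, 45)] * 6
    clusters = {
        "A": [20, 10, 0, -10, 0, 30],
        "B": [50, 40, 20, 0, 10, 60],
        "C": [21, 11, 1, -9, 1, 31],
    }
    print(colliding_clusters(clusters, bounds))  # [('A', 'C')]: boundaries need adjusting
    print(joint_sum([categorize(a, b) for a, b in zip(clusters["B"], bounds)]))  # 5
```

Clusters A and C differ by only 1° at each event, so they receive the same profile; under this simplified rule, calibration would continue until no such collisions remain.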
We assumed that careful adjustment of the category boundaries would improve the SF-GT’s validity, sensitivity, and specificity. In previous work,15 we defined 13 gait styles by cluster analysis of kinematic gait data from 56 children with a mixture of CP types and 11 neurologically intact children. Details of the kinematic data collection protocol and statistical analysis have been reported previously.15 We adjusted the SF-GT so that the upper and lower boundaries of each scoring category, and the boundaries of the summed category scores, reflected the mean joint positions and standard deviations (SDs) of the kinematic data of the children with each of the 13 distinct gait types. The calibration was completed for the hip, knee, and ankle at all 6 gait events. It was stopped when scoring the kinematic data of each gait type produced an adequate numeric and qualitative description of the entire joint pathology that distinguished it from the other gait types.

Clinical Evaluation

To evaluate the clinical validity of gait assessments made with the SF-GT, the gait of 13 children was assessed by 23 pediatric physical therapists, and the visual assessments were then compared with the quantitative assessment of the children’s gait. The observers were recruited from 9 National Health Service Trusts in the Greater Manchester area of the United Kingdom, and their clinical grades ranged from junior (n=1), senior II (n=2), and senior I (n=18) to superintendent (n=2). All observers gave written consent before participating. Each of the 13 children represented 1 of the 13 gait styles previously defined by cluster analysis (9 boys, 4 girls; age range, 6–16y; mean age, 9.5y).15 Eleven children had hemiplegic (n=4), diplegic (n=6), or quadriplegic (n=1) CP, and 2 children were neurologically intact. The video data were collected during the same laboratory visit at which the kinematic gait data were collected.
The kinematic data, which were part of a larger clinical gait analysis database, were those used to define the 13 gait styles.15 Parents and children had previously consented to the use of the images for research purposes, and the research was approved by the appropriate ethics committees. Before making the gait assessments, the 23 observers received general training on gait, video assessment of gait, and use of the SF-GT. Each observer was allocated a workstation with a DVD player and television screen and completed a trial gait assessment. The observers then worked individually, without discussion, to assess 1 gait cycle of 1 leg of each of the 13 children using the SF-GT. They could work at their own pace and review the gait cycles as often as required, with no time limit. The assessments took 3 to 5 hours to complete.

Data Analysis

We analyzed the level of agreement between clinicians’ gait assessments using the SF-GT and the criterion standard kinematic gait assessments in 2 ways. First, we compared the frequency of agreement between the scoring categories estimated by each observer and the scoring categories assigned by the kinematic data (eg, if 5° of knee flexion at initial contact in the kinematic data fell in category 1, there was agreement if the observer also rated the knee as category 1). We also analyzed the extent of agreement by counting how often observer category scores deviated by 1, 2, or more categories from the categories assigned by the kinematic data. This analysis allows direct comparison with the results of other OGA tools. A disagreement between observer category scores and the categories assigned by the kinematic data, however, may not always reflect the true closeness (or difference) between the observer assessment and the kinematic data.
For example, if an observer estimated hip flexion to be 46° (category 2) and the actual kinematic value was 45° (category 1), this would be a disagreement by 1 category, even though the assessments differed by only 1°. Conversely, if an observer estimated hip flexion to be 45° (category 1) and the actual kinematic value was 16° (also category 1), this would be perfect agreement, even though the estimates differed by 29°. Therefore, in the second analysis, the observer-derived angles in degrees (recorded to 1°) were compared with the joint positions derived from the kinematic gait data (recorded to 0.1°). For this analysis, we used the least significant difference (LSD) method.10, 16, 17 The LSD defines a “proximity” range above and below the target kinematic value. The upper and lower boundaries of this range define what is considered an acceptable deviation from a perfect match to the kinematic data: we assumed that if an observer’s estimate in degrees fell within this range, it was close enough to the actual kinematic data to be valid. This approach has been used in a similar way to assess the intraobserver repeatability of the EVGS.10 The LSD is the smallest difference between observer and kinematic data that would be statistically significant, and it arises from interclinician variation. It takes into account the distance between all the observations and the target kinematic data and is derived by multiplying the SD between the 2 assessments (observer and kinematic data) by the relevant value of the t distribution (Student t test with 1 degree of freedom, 2-tailed, .05 level of significance). If the observer’s estimate lies outside the LSD range, the difference between the observer and kinematic degrees is statistically significant and too large to be considered acceptable. We used Microsoft Excel for the statistical analysis.
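The category-based analysis can be sketched as follows. `category_agreement` returns the percentage of exact agreement between paired observer and kinematic category scores, together with a tally of disagreements by size (1 to 4 categories). `category_vs_degrees` repeats the hip-flexion example above, showing why the degree-based comparison was added. Both function names and the boundary values are hypothetical; the boundaries were chosen only so that 16° and 45° fall in category 1 and 46° in category 2, as in that example.

```python
from bisect import bisect_left
from collections import Counter


def category_agreement(observer, kinematic):
    """Return (percent exact agreement, Counter of disagreement sizes in categories)."""
    if len(observer) != len(kinematic) or not observer:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [abs(o - k) for o, k in zip(observer, kinematic)]
    percent = 100.0 * diffs.count(0) / len(diffs)
    return percent, Counter(d for d in diffs if d > 0)


# Hypothetical hip-flexion upper limits (deg) for categories -2, -1, 0, and 1.
HIP_BOUNDS = (-5, 5, 15, 45)


def category_vs_degrees(observed_deg, kinematic_deg, bounds=HIP_BOUNDS):
    """Return (same category?, absolute difference in degrees)."""
    same = bisect_left(bounds, observed_deg) == bisect_left(bounds, kinematic_deg)
    return same, abs(observed_deg - kinematic_deg)


if __name__ == "__main__":
    pct, sizes = category_agreement([1, 1, 0, -1, 2], [1, 0, 0, 1, 2])
    print(pct, dict(sizes))  # 60.0 {1: 1, 2: 1}
    print(category_vs_degrees(46, 45))  # (False, 1): 1 deg apart, different categories
    print(category_vs_degrees(45, 16))  # (True, 29): 29 deg apart, same category
```

The two example calls make the limitation concrete: category agreement penalizes a 1° difference that straddles a boundary but rewards a 29° difference that stays within one category.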
Results

Clinical Evaluation

Agreement between category scores

Observers’ category scores agreed with the kinematic gait data for 2900 (58%; range, 28%–75%) of a possible 5004 category scores (table 2). Of the 2104 category scores that disagreed, 95% (1991) disagreed by 1 category, 4% (93) by 2 categories, and 1% (19) by 3 categories; 1 observer’s category score differed by 4 categories from the kinematic category score. The best agreement (83%) was for the hip joint in cluster 1 (mild crouch gait), and the lowest (22%) was for the ankle joint in cluster 3 (moderate crouch gait). The knee joints were assessed more accurately than the hip and ankle joints (mean agreement: hip, 58%; knee, 61%; ankle, 56%). Cluster 10 (weak plantarflexion gait) was the gait style with the best agreement (75%), and cluster 3 (moderate crouch gait) had the least agreement (28%).

Agreement between degrees

We computed the LSD for a total of 4446 pairs of observer and kinematic data covering all gait events (n=6), joints (n=3), and clusters (n=13). The degree data from 2 observers were not recorded on the SF-GT and were therefore not available for analysis. The mean LSD was 16.25° (range, 2.31°–63.20°); that is, on average, the LSD range extended 16.25° above and below the kinematic value. An average of 80% (range, 68%–91%) of gait observations lay within the LSD range of the kinematic data at the 6 gait events for all joints and clusters (table 3). The best agreement (91%) was for the hip joint in cluster 4 (severe crouch), and the lowest (68%) was for the hip joint in cluster 2 (mobile crouch). Mean agreement was similar across joints (hip, 80%; knee, 80%; ankle, 81%).
The clusters with the best agreement (both 84%) were cluster 6 (moderate equinus/knee extension) and cluster 13 (normal gait). The cluster with the least agreement was cluster 2 (mobile crouch). Regarding the magnitude of agreement between the observers and the kinematic data, 86% of observations were within 25° of the kinematic data, 79% within 20°, 62% within 15°, 24% within 10°, and 2.6% within 5°. Table 4 shows the data from 1 observer for mild equinus gait (cluster 5); this gait cluster was representative of the mean agreement of 80% across all observers.

Table 4. Gait Cluster 5: Mild Equinus (data from 1 observer)

| Joint | Gait event | Kinematic data (°) | Observer data (°) | SD (°) | t=39n | LSD (°) | Mean LSD (°) |
|---|---|---|---|---|---|---|---|
| Hip | Initial contact | 19 | 20 | 0.71 | 2.02 | 1.43 | 31.85 |
| | End double support | 18 | 30 | 8.49 | 2.02 | 17.14 | 19.78 |
| | Mid stance | −2 | 10 | 8.49 | 2.02 | 17.14 | 17.14 |
| | Start double support | −19 | −10 | 6.36 | 2.02 | 12.86 | 13.57 |
| | Toe-off | −3 | 0 | 2.12 | 2.02 | 4.29 | 11.28 |
| | Mid swing | 27 | 45 | 12.73 | 2.02 | 25.71 | 22.21 |
| Knee | Initial contact | 20 | 30 | 7.07 | 2.02 | 14.28 | 16.07 |
| | End double support | 22 | 20 | 1.41 | 2.02 | 2.86 | 14.21 |
| | Mid stance | 12 | 10 | 1.41 | 2.02 | 2.86 | 9.57 |
| | Start double support | 14 | 15 | 0.71 | 2.02 | 1.43 | 11.28 |
| | Toe-off | 61 | 50 | 7.78 | 2.02 | 15.71 | 13.43 |
| | Mid swing | 71 | 90 | 13.44 | 2.02 | 27.14 | 22.07 |
| Ankle | Initial contact | −3 | −5 | 1.41 | 2.02 | 2.86 | 16.00 |
| | End double support | −1 | −10 | 6.36 | 2.02 | 12.86 | 5.50 |
| | Mid stance | 1 | −10 | 7.78 | 2.02 | 15.71 | 8.28 |
| | Start double support | 2 | −5 | 4.95 | 2.02 | 10.00 | 9.57 |
| | Toe-off | −33 | −30 | 2.12 | 2.02 | 4.29 | 17.85 |
| | Mid swing | −15 | −10 | 3.54 | 2.02 | 7.14 | 15.35 |
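The SD and LSD columns of Table 4 can be reproduced from each observer/kinematic pair. For 2 values, the sample SD equals their absolute difference divided by √2, and the LSD is that SD multiplied by the t value. The Methods text specifies 1 degree of freedom, whereas Table 4 applies t = 2.02 in every row, so this sketch defaults to the tabulated value and leaves t as a parameter. The function name `pair_lsd` is hypothetical, and the sketch does not attempt the Mean LSD column, which cannot be derived from this single observer’s data.

```python
from statistics import stdev

TABLE4_T = 2.02  # t value applied in every row of Table 4


def pair_lsd(observed_deg, kinematic_deg, t=TABLE4_T):
    """Return (SD, LSD) in degrees for one observer/kinematic pair.

    The sample SD of two values equals |difference| / sqrt(2).
    """
    sd = stdev([observed_deg, kinematic_deg])
    return sd, t * sd


if __name__ == "__main__":
    # Three rows of Table 4 (cluster 5, mild equinus): (kinematic, observer)
    rows = {"hip, initial contact": (19, 20),
            "hip, end double support": (18, 30),
            "knee, mid swing": (71, 90)}
    for name, (kin, obs) in rows.items():
        sd, lsd = pair_lsd(obs, kin)
        print(f"{name}: SD {sd:.2f}, LSD {lsd:.2f}")
    # hip, initial contact: SD 0.71, LSD 1.43
    # hip, end double support: SD 8.49, LSD 17.14
    # knee, mid swing: SD 13.44, LSD 27.14
```

Because the SD of a pair depends only on their difference, each LSD in Table 4 is simply about 1.43 times the observer’s absolute error for that joint and event.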
Discussion

There was 58% mean agreement between clinicians and kinematic data when joints were rated on the 5-point scale. Using the LSD method on the observed joint angles in degrees, an average of 80% of clinicians’ gait observations agreed with the kinematic gait data. Our category-based result (mean agreement, 58%) is similar to that found for the EVGS (mean agreement, 64%).10 Both results were based on percentage agreement between kinematic and estimated category scores of sagittal gait. It is difficult, however, to compare our validity results with those of other tools that used different statistical approaches, such as the Cohen κ. Assessing validity by comparing the agreement between assigned categories can lead to overly optimistic results. This can be the case when an OGA tool uses categories that each correspond to a range of angular positions and κ scores are based on agreement of categories rather than on agreement of the observed degrees of joint position. There is, therefore, no precise information on how close the observations were to the kinematic data, other than that they lay within the same category range. These ranges vary between OGA tools, from 15° (OGS) to 20° (VGAS) for most categories, and open-ended categories (eg, >25°) are common to all tools.6, 7, 8, 9, 10 Consequently, under these tools’ validity measurements, an observer can deviate from the target kinematic data by up to 20° (or more in the case of open-ended categories) and the agreement is still considered “perfect.” By using the LSD method to measure the SF-GT’s validity, we compared the actual observed degrees of joint position with the degrees given by the kinematic gait data. This method allows a more precise evaluation of the validity of the SF-GT and of the deviation of the observations from the kinematic data.
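The point about category width can be made concrete: when agreement is scored by category, the largest observer error that still counts as agreement equals the width of the category, and it is unbounded for an open-ended category. This sketch uses the approximate widths quoted above (15° for the OGS, 20° for the VGAS) and an open-ended >25° category; the function name and the specific category limits are hypothetical.

```python
import math


def max_hidden_deviation(lower_deg, upper_deg):
    """Largest observer-kinematic difference (deg) still scored as category agreement.

    Pass None for an open-ended limit (eg, a '>25 deg' category has no upper limit).
    """
    if lower_deg is None or upper_deg is None:
        return math.inf
    if upper_deg < lower_deg:
        raise ValueError("upper limit must not be below lower limit")
    return upper_deg - lower_deg


if __name__ == "__main__":
    print(max_hidden_deviation(10, 25))    # 15: a 15-deg-wide category (OGS-like width)
    print(max_hidden_deviation(5, 25))     # 20: a 20-deg-wide category (VGAS-like width)
    print(max_hidden_deviation(25, None))  # inf: open-ended category
```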
Assuming that perfect agreement is not achievable with observational gait assessment, the question of how much deviation from the criterion standard measurement is acceptable remains unanswered. The answer is subjective and is likely to differ depending on the context in which the OGA is used. The LSD method we used is based on statistical significance, which relates only to our confidence that the differences we observed were real and not due to chance. Some LSDs were very large (eg, 40°) and certainly beyond any reasonable clinical limit of validity, but they were rare. Large LSDs also occurred when only 1 observer deviated markedly from the kinematic data; this reflects the wide-ranging ability of practitioners to perceive 3-dimensional motion from 2-dimensional images. Most joint position observations were between 5° and 20° of the kinematic data, which could be deemed acceptable for visually evaluating the large movements of the lower limb and for classifying the gross characteristics of pathologic gait. Whether the data could be used to detect changes resulting from interventions is less clear, although measures that are stable over time (repeatable) would imply some clinical value. Several points about our methodologic approach are notable. We used relatively inexperienced clinicians to assess the clinical validity of the SF-GT, whereas other tools have more commonly been assessed by expert clinicians. The SF-GT includes 18 individual assessments per gait cycle, a far more detailed scrutiny than most other OGA tools have attempted.6, 7, 8, 9, 10 We assume that a more detailed breakdown of gait and a more complex assessment make the task more challenging and errors therefore more likely. The quantitative kinematic data were the mean of 10 to 16 gait trials, whereas the observers analyzed just 1 gait cycle.
This could be seen as a limitation because it is not strictly a comparison of like with like. This work, however, was deliberately pragmatic, because this approach best reflects gait assessment in clinical practice. To ensure that the gait cycles scored with the SF-GT were as they would be in the clinical setting, the children had to walk without reflective markers on their limbs; consequently, the quantitative and observational gait data could not be recorded at the same time. Furthermore, the whole basis of clinical gait assessment is that observations of a small number of cycles can represent a subject’s gait pattern and pathology as they are outside the laboratory. Pragmatically, therefore, it should not matter that we compared data from different gait cycles, because they are all supposed to represent the same subject and his/her gait pattern. To create a more “like-for-like” comparison, the clinicians’ assessments could have been made over several gait cycles, which would have taken greater account of the known variation between cycles.18, 19 This would have been very time consuming, however, given that the observers took several hours to assess the 13 gait cycles. Also, because using only 1 gait cycle is most likely to reduce agreement, agreement between SF-GT scores and quantitative data might in fact be better than described here.

Conclusions

We have described the development of the SF-GT and its adjustment to facilitate identification of 13 different gait styles defined by prior statistical analysis of kinematic gait data from children with CP and normal gait. The development of existing gait assessment tools is not described in the literature, and the use of quantitative kinematic data in tool development is a more systematic approach than appears to have been adopted in the past.
Assessment of the tool’s use by clinicians revealed good agreement between SF-GT scores and quantitative kinematic data, although the question of what is acceptable disagreement between observations and quantitative kinematic data remains unanswered. Our further work will include evaluation of the SF-GT’s inter- and intrarater repeatability.

References

1. Gage JR, DeLuca PA, Renshaw TS. Gait analysis: principle and applications with emphasis on its use in cerebral palsy. Instr Course Lect. 1996;45:491–507.
2. Sutherland DH. The evolution of clinical gait analysis (Part II kinematics). Gait Posture. 2002;16:159–179.
3. Skaggs DL, Rethlefsen SA, Kay RM, Dennis SW, Reynolds RA, Tolo VT. Variability in gait analysis interpretation. J Pediatr Orthop. 2000;20:759–764.
4. Toro B, Nester CJ, Farren PC. The status of gait assessment among physical therapists in the United Kingdom. Arch Phys Med Rehabil. 2003;84:878–884.
5. Toro B, Nester CJ, Farren PC. A review of observational gait assessment in clinical practice. Physiother Theory Pract. 2003;19:137–149.
6. Koman LA, Mooney JF, Smith BP, Goodman A, Mulvaney T. Management of spasticity in cerebral palsy with botulinum-A toxin: report of a preliminary, randomized, double-blind trial. J Pediatr Orthop. 1994;14:299–303.
7. Pirpiris M, Ugoni A, Starr R, et al. The ‘physician rating scale’—validity and reliability. Gait Posture. 2001;13:293.
8. Mackey AH, Lobb GL, Walt SE, Stott NS. Reliability and validity of the Observational Gait Scale in children with spastic diplegia. Dev Med Child Neurol. 2003;45:4–11.
9. Dickens WE, Smith MF. Validation of a visual gait assessment scale for children with hemiplegic cerebral palsy. Gait Posture. 2006;23:78–82.
10. Read HS, Hazlewood ME, Hillman SJ, Prescott RJ, Robb JE. Edinburgh visual gait score for use in cerebral palsy. J Pediatr Orthop. 2003;23:296–301.
11. Hudson PC. An evaluation of the management of tendoachilles shortening in cerebral palsied children. Salford: Univ Salford; 2000.
12. Whittle M. Gait analysis: an introduction. Oxford: Butterworth-Heinemann; 1991.
13. Perry J. Gait analysis: normal and pathological function. Thorofare: Slack; 1992.
14. Sutherland DH, Olshen RA, Biden EN, Wyatt MP. The development of mature walking. Oxford: Mac Keith Pr; 1988.
15. Toro B, Nester CJ, Farren PC. Cluster analysis for the extraction of sagittal gait patterns in children with cerebral palsy. Gait Posture. 2007;25:157–165.
16. Christensen R. Analysis of variance, design and regression: applied statistical methods. Boca Raton: Chapman & Hall/CRC Pr; 1996.
17. Milliken GA, Johnson DE. Analysis of messy data: designed experiments. Boca Raton: Chapman & Hall/CRC Pr; 1992.
18. Kirkpatrick M, Wytch R, Cole G, Helms P. Is the objective assessment of cerebral palsy gait reproducible? J Pediatr Orthop. 1994;14:705–708.
19. Gorton GE, Stevens CM, Masso PD, William M. Repeatability of the walking patterns of normal children. Gait Posture. 1997;5:155.
a Directorate of Physiotherapy, University of Salford, Salford, England.
b Centre for Rehabilitation and Human Performance Research, University of Salford, Salford, England.
Reprint requests to Christopher J. Nester, PhD, Centre for Rehabilitation and Human Performance Research, University of Salford, Salford, M6 6PU, England.

No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the author(s) or upon any organization with which the author(s) is/are associated.

PII: S0003-9993(06)01587-5
doi:10.1016/j.apmr.2006.12.028
© 2007 American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved.