Volume 88, Issue 1 , Pages 94-98, January 2007
Assessments of Interrater Reliability and Internal Consistency of the Norwegian Version of the Berg Balance Scale
Abstract
Halsaa KE, Brovold T, Graver V, Sandvik L, Bergland A. Assessments of interrater reliability and internal consistency of the Norwegian version of the Berg Balance Scale.
Objective
To investigate the interrater reliability and the internal consistency of the Norwegian version of the Berg Balance Scale (BBS) when applied to patients in a geriatric department.
Design
Interrater reliability was measured using the κ statistic and intraclass correlation coefficients (ICCs).
Setting
Geriatric rehabilitation unit and geriatric day hospital in Norway.
Participants
Eighty-three patients were included; 25 were inpatients in a geriatric rehabilitation unit, whereas 58 were admitted to a geriatric day hospital.
Interventions
Not applicable.
Main Outcome Measure
The BBS.
Results
The κ values for the different BBS items varied from 0.83 to 1.00, and the ICC for the sum score of the BBS was .998 (95% confidence interval, .996−.999). The mean value of the BBS was 44.4. There was a significant negative relation between age and the sum score (r=−.36). The sum scores of the BBS ranged from 12 to 56. The patients were able to perform the BBS without a ceiling effect. The score values 3 and 4 were used more frequently than the score values 0, 1, and 2.
Conclusions
The Norwegian version of the BBS seems to have an excellent interrater reliability and high internal consistency when applied to patients in geriatric rehabilitation.
Key Words: Balance, Geriatrics, Outcome assessment (health care), Rehabilitation
BALANCE IS OFTEN IMPAIRED in the elderly, and improvement in balance is an important goal of rehabilitation. Systematic physiotherapeutic assessment of patients with balance problems is important in planning treatment and assessing changes in motor function over time. Measuring balance can assist the clinician in selecting appropriate therapy and can serve as an outcome measurement.1, 2 The Berg Balance Scale (BBS) is a brief and frequently used measure of balance for elderly people. The construct, concurrent, and predictive validity of the BBS have been found to be good.3 More than 100 articles have cited the BBS since 1992.4 The BBS can be used to assess the balancing ability of the frail elderly, to monitor changes in balance over time, to screen patients for rehabilitation therapy services, and to predict falls in both community-dwelling and institutionalized older adults.1, 4, 5, 6, 7, 8, 9, 10, 11 Several studies have shown high levels of inter- and intraobserver agreement for the test as a whole and for the individual items.5, 12, 13, 14, 15, 16, 17 During rehabilitation, more than 1 physiotherapist may assess an elderly patient, and high interrater reliability is therefore essential. Because measurement error can occur at each test session, high reliability is required when repeated measures are used to monitor the clinical status of patients or to evaluate the effectiveness of treatments. The BBS has been translated into Norwegian, but the reliability of the translated test has not been evaluated. One reason for translation into Norwegian is the possibility of participating in international clinical trials that use this instrument. Another is that, once the translated version has been shown to be reliable, findings from studies using the English-language version may be applicable to older adults in Norway.
The purposes of this study were to assess the interrater reliability of the Norwegian version of the BBS when applied to patients in geriatric rehabilitation departments, to assess the internal consistency, and to investigate how the different scoring levels of the 14 items were used.
Methods
Participants
The subjects were a total of 83 patients all admitted to Ullevaal University Hospital, Oslo, Norway; 25 were inpatients in a geriatric rehabilitation unit, and 58 were admitted to a geriatric day hospital (ie, an outpatient geriatric rehabilitation unit where the patients stay for 5 hours, 2 or 3 days a week, during a period of 3 weeks).
Criteria for exclusion were impairments causing difficulty in understanding verbal communication, such as cognitive deficit and aphasia (diagnosed by the physician responsible for inclusion), and insufficient command of the Norwegian language. Subjects with a recent fracture were excluded because we thought that pain would affect their performance. All patients who were able to walk with or without a walking aid and who had a recommendation for physiotherapy from a doctor were consecutively included. The mean age was 82 years (range, 69−95y); 58 were women and 25 were men. The primary reasons for admission were as follows: several falls (23 persons), cerebral stroke (11 persons), general poor health (9 persons), Parkinson’s disease (9 persons), low back pain (7 persons), pneumonia (6 persons), heart failure (6 persons), osteoarthritis of the hip (5 persons), rheumatoid arthritis (4 persons), and diabetes (2 persons). All the subjects were ambulatory. Twenty-eight people did not require walking aids, 17 used a cane, and 38 used walking frames. Data on demographic characteristics and comorbidity were collected from medical records.
Procedure
Two experienced physiotherapists who had used the BBS for several years were involved in the study. They were accustomed to using the standardized instructions for administering the test. Before commencing the study, they had 2 weeks of intensive practical training with the Norwegian version of the BBS, including discussing and comparing test results, to make sure they agreed on how details of the patients’ performances should be scored.
The patients were tested once only. This design was chosen because all the patients were undergoing rehabilitation, and their condition could have improved between sessions if they had been tested on 2 different days. They could also have performed better on a second test because they would have known the assessment and felt more secure. In addition, the scores could simply have differed on a second day because the patient was having a better or worse day.
Both physiotherapists scored all the patients simultaneously. They alternated between instructing and scoring and observing and scoring. They did not look at each other’s ratings and did not discuss their assessments. All the tests were performed in the same room.
The Regional Committee for Ethics in Medical Research approved the study.
Instrument
The BBS is a performance-based measure of balance consisting of 14 observable tasks frequently encountered in everyday life (table 1). Scoring is based on the patients’ ability to perform the 14 tasks or movements independently and meet certain time and distance requirements. The test is simple, easy to administer, and safe for the elderly to perform. The evaluators rate performance on a 5-level scale from 0 (cannot perform) to 4 (normal performance) for 14 different tasks involving functional balance control, including transfer, turning, and stepping.3, 5 The sum score ranges from 0 to 56.
Table 1. Distribution of Scores From 1 Evaluator Within Each of the 14 Items of the BBS (N=83)
| Item Number and Description | Score 0 | Score 1 | Score 2 | Score 3 | Score 4 | Mean |
|---|---|---|---|---|---|---|
| 1. Sitting to standing | 0 | 1 | 2 | 29 | 51 | 3.6 |
| 2. Standing unsupported | 1 | 1 | 2 | 1 | 78 | 3.9 |
| 3. Sitting unsupported | 0 | 0 | 0 | 0 | 83 | 4.0 |
| 4. Standing to sitting | 0 | 0 | 1 | 32 | 50 | 3.6 |
| 5. Transfers | 0 | 1 | 2 | 25 | 55 | 3.6 |
| 6. Standing with eyes closed | 1 | 2 | 2 | 2 | 76 | 3.8 |
| 7. Standing with feet together | 6 | 4 | 4 | 5 | 64 | 3.4 |
| 8. Reaching forward with outstretched arm | 9 | 4 | 5 | 30 | 35 | 2.9 |
| 9. Retrieving an object from floor | 5 | 1 | 0 | 3 | 74 | 3.7 |
| 10. Turning to look behind | 4 | 1 | 9 | 10 | 59 | 3.5 |
| 11. Turning 360° | 4 | 7 | 34 | 11 | 27 | 2.6 |
| 12. Placing alternate foot on stool | 17 | 6 | 5 | 14 | 41 | 2.7 |
| 13. Standing with 1 foot in front | 34 | 0 | 8 | 33 | 8 | 1.8 |
| 14. Standing on 1 foot | 11 | 54 | 5 | 6 | 7 | 1.4 |
| Total | 92 | 82 | 79 | 201 | 708 | |
Statistical Analysis
Data were analyzed by using the SPSS program.a Intraclass correlation (2-way mixed-model, single measure) was used to measure interrater reliability of the BBS’s sum score.18 An intraclass correlation coefficient (ICC) of .80 or higher reflects high reliability, .60 to .79 moderate reliability, and less than .60 indicates that reliability is poor.18, 19, 20
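As an illustration of the sum-score analysis, the sketch below computes a single-measure ICC from a 2-way analysis of variance in plain Python. It is not the SPSS routine used in the study. The consistency definition is assumed here, because the text does not state whether consistency or absolute agreement was requested, and the function name `icc_consistency_single` and the paired sum scores are hypothetical.

```python
from statistics import mean


def icc_consistency_single(ratings):
    """Single-measure ICC from a 2-way ANOVA (consistency definition, assumed).

    ratings: one row per patient, one sum score per rater in each row.
    """
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Partition the total sum of squares into patients, raters, and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical sum scores (rater 1, rater 2) for 5 patients; not study data.
example_scores = [(44, 44), (12, 13), (56, 56), (38, 37), (50, 50)]
example_icc = icc_consistency_single(example_scores)
print(round(example_icc, 3))
```

Under the consistency definition, a constant offset between raters does not lower the coefficient; an absolute-agreement ICC would penalize it.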
The interrater agreement of individual items of the BBS was analyzed by means of a κ score. A κ score indicates the agreement between raters, adjusted for the amount of agreement expected by chance and the magnitude of disagreements.20 To calculate κ and to construct categories (used by both evaluators), we condensed item-rating categories to eliminate the categories used by only 1 evaluator. A κ value of .75 or higher indicates excellent agreement, 0.4 to .74 indicates fair to good agreement, and less than 0.4 indicates poor agreement.18, 21
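The description above (agreement adjusted for chance and for the magnitude of disagreements) corresponds to a weighted κ. The following is a minimal sketch under stated assumptions: linear disagreement weights are assumed because the weighting scheme is not reported, and the function name `weighted_kappa` and the example ratings are hypothetical. The function returns `None` when κ is undefined, for instance when both raters give every patient the same single score.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with linear disagreement weights (weighting is an assumption).

    rater_a, rater_b: scores given by the two raters, in the same patient order.
    categories: ordered list of the score levels in use, e.g. [0, 1, 2, 3, 4].
    """
    m = len(categories)
    if m < 2:
        return None  # a single category leaves no room for disagreement
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions for every pair of score levels.
    observed = [[0.0] * m for _ in range(m)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(m)]
    col = [sum(observed[i][j] for i in range(m)) for j in range(m)]

    def weight(i, j):
        return abs(i - j) / (m - 1)  # 0 on the diagonal, 1 for the widest gap

    cells = [(i, j) for i in range(m) for j in range(m)]
    disagree_obs = sum(weight(i, j) * observed[i][j] for i, j in cells)
    disagree_exp = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    if disagree_exp == 0:
        return None  # both raters used one level only, so kappa is undefined
    return 1 - disagree_obs / disagree_exp


# Hypothetical ratings of one item for 6 patients; not study data.
a = [4, 4, 3, 3, 2, 4]
b = [4, 4, 3, 2, 2, 4]
print(weighted_kappa(a, b, [0, 1, 2, 3, 4]))
```

With only 2 categories the linear weights reduce to the unweighted κ, which is one way to check the sketch against a hand calculation.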
The floor and ceiling effects of the sum score reflect the extent that scores cluster at the bottom and top of the scale range. Floor and ceiling effects of more than 20% are considered to be significant.14 The magnitude of the floor and ceiling effects may be indicative of the sum score’s ability to discriminate between subjects.
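The 20% criterion can be checked with a few lines of Python. This is a sketch only: the function name `floor_ceiling_percent` is hypothetical, and because individual sum scores are not published, the list below uses a placeholder value for every patient except the 2 who, according to the Results, reached the maximum of 56.

```python
def floor_ceiling_percent(sum_scores, minimum=0, maximum=56):
    """Return the percentage of patients at the bottom and top of the scale."""
    n = len(sum_scores)
    floor = 100 * sum(score == minimum for score in sum_scores) / n
    ceiling = 100 * sum(score == maximum for score in sum_scores) / n
    return floor, ceiling


# Placeholder data: 2 patients at 56 (as reported), 81 at an arbitrary
# mid-range value standing in for the unpublished individual scores.
placeholder_scores = [56, 56] + [44] * 81
floor, ceiling = floor_ceiling_percent(placeholder_scores)
print(round(floor, 1), round(ceiling, 1))
```

Both percentages fall far below the 20% threshold, which is consistent with the absence of floor and ceiling effects reported for this sample.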
To test the construct validity and dimensionality of the BBS, factor analysis with varimax rotation was performed. Factors were extracted with an eigenvalue greater than 1. Internal consistency of the BBS was tested both by item-to-total correlation and by calculating the Cronbach α12, 22, 23 for each evaluator’s scorings. The Cronbach α is regarded as high if it is at least .80.22 An item-to-total correlation shows the degree of association between each item and the total score of the other items in the scale. An item-to-total correlation is considered adequate if it is above 0.4.20 The Spearman rank correlation coefficient was used to investigate the relation between variables.
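The two internal-consistency measures can be sketched as follows. This is an illustration under assumptions, not the SPSS procedure used in the study: the function names and example ratings are hypothetical, and a Pearson coefficient is used for the item-to-total correlation because the text does not state which coefficient was applied there.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach alpha; rows hold one patient each, columns hold the items."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation; None when either variable does not vary."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def item_rest_correlations(rows):
    """Correlate each item with the total of the other items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        result.append(pearson(item, rest))
    return result


# Hypothetical ratings of 4 items for 5 patients; not study data.
example_rows = [
    [4, 4, 3, 2],
    [3, 4, 3, 1],
    [2, 3, 1, 1],
    [4, 4, 4, 3],
    [1, 2, 1, 0],
]
print(round(cronbach_alpha(example_rows), 2))
print(item_rest_correlations(example_rows))
```

Returning `None` for an item without variance mirrors the situation in which a coefficient cannot be computed because every patient received the same score.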
Cross-Cultural Translation
The procedure used to produce the Norwegian version of the BBS was the forward-backward translation method,24 involving the following steps.
Step 1. Translation of the original version of the BBS into Norwegian. English-Norwegian translators, native Norwegian speakers with more than 15 years of education, were involved. Each translator independently translated the BBS and then compared and discussed the result with that of the other until a common version was reached.
Step 2. Back-translation of the Norwegian version of the BBS into English. The preliminary version was given to 2 native English speakers who were experienced translators, each of whom produced a translation into English. These translators were unaware of both the methodology and the aims of the study.
Results
A total of 83 patients (25 men, 58 women; mean age ± standard deviation, 82±5.5y; range, 69–95y) were included. The mean values of the BBS scored by the 2 evaluators were 44.4±8.6 and 44.3±8.6, respectively. The items are presented in table 1. For both evaluators, there was a significant negative relation between age and the BBS sum score (r=−.36) and between age and the items sitting to standing (r=−.24), standing with feet together (r=−.24), reaching forward with outstretched arm (r=−.24), turning to look behind (r=−.27), turning 360° (r=−.41), placing alternate foot on stool (r=−.31), and standing on 1 foot (r=−.28). The sum score of the BBS was similar for men and women.
Distribution
The sum scores ranged from 12 to 56 for both evaluators. Two persons got the top sum score (56) on the 14 items, and nobody got a sum score of 0. Table 1 displays the frequency distribution for the scores of the 14 items. Some rating categories were not used at all, and others were used very sparingly. In total, each evaluator assigned 1162 scores (14 items × 83 patients; see table 1). The score values 0, 1, 2, 3, and 4 were used in 7.9%, 7.1%, 6.8%, 17.3%, and 60.9% of the ratings, respectively. The items standing with 1 foot in front and standing on 1 foot had the lowest mean scores, indicating a greater degree of difficulty.
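The percentages above follow directly from the column totals in table 1. The short Python check below reproduces them; the variable names are illustrative only.

```python
# Column totals from table 1: how often each score value was assigned.
score_totals = {0: 92, 1: 82, 2: 79, 3: 201, 4: 708}

n_scores = sum(score_totals.values())  # 14 items x 83 patients
shares = {
    score: round(100 * count / n_scores, 1)
    for score, count in score_totals.items()
}
high_share = round(100 * (score_totals[3] + score_totals[4]) / n_scores, 1)

print(n_scores)
print(shares)
print(high_share)  # combined share of score values 3 and 4
```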
Reliability and Construct
The extent of agreement (κ) between the scores given by the 2 evaluators for each of the 14 items was excellent (table 2). The κ value ranged from 0.83 to 1.00, and the mean κ was .94. The evaluators scored differently on only 17 occasions out of the total 1162 scores (1.5%). The largest score difference was 2, which was related to turning to look behind. The ICC between the 2 raters for the BBS’s sum score was .998 (95% confidence interval, .996−.999).
Table 2. Reliability Coefficient (κ) for Each Item of the BBS (N=83)
| Item | κ |
|---|---|
| 1. Sitting to standing | 0.95 |
| 2. Standing unsupported | 1.00 |
| 3. Sitting unsupported | ⁎ |
| 4. Standing to sitting | 0.85 |
| 5. Transfers | 0.97 |
| 6. Standing with eyes closed | 1.00 |
| 7. Standing with feet together | 0.94 |
| 8. Reaching forward with outstretched arm | 1.00 |
| 9. Retrieving object from floor | 0.94 |
| 10. Turning to look behind | 0.83† |
| 11. Turning 360° | 0.97 |
| 12. Placing alternate foot on stool | 0.98 |
| 13. Standing with 1 foot in front | 0.96† |
| 14. Standing on 1 foot | 0.88 |
⁎Everyone scored 4.
†Rating categories 0 and 1 are merged because the 2 score levels were not used by both evaluators.
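The summary figures quoted in the text can be reproduced from table 2. The snippet below is a simple arithmetic check in Python; item 3 is left out because no κ could be calculated for it, and the variable names are illustrative.

```python
from statistics import mean

# Kappa values from table 2, keyed by item number (item 3 has no value).
kappa_by_item = {
    1: 0.95, 2: 1.00, 4: 0.85, 5: 0.97, 6: 1.00, 7: 0.94, 8: 1.00,
    9: 0.94, 10: 0.83, 11: 0.97, 12: 0.98, 13: 0.96, 14: 0.88,
}

mean_kappa = mean(kappa_by_item.values())
lowest, highest = min(kappa_by_item.values()), max(kappa_by_item.values())

# Share of the 1162 paired scores on which the 2 evaluators differed.
disagreement_pct = 100 * 17 / 1162

print(round(mean_kappa, 2), lowest, highest, round(disagreement_pct, 1))
```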
Factor analysis on the 14 items of the BBS gave 3 factors with eigenvalues greater than 1. Together the 3 factors accounted for 65% of the matrix variance (30%, 26%, and 9%, respectively). The first factor, which we decided to call changing position, consisted of the items sitting to standing, standing to sitting, transfers, turning 360°, and placing alternate foot on stool (table 3). The second factor, which we called maintaining the position, contained the items standing unsupported, standing with eyes closed, standing with feet together, reaching forward with outstretched arm, retrieving an object from floor, and turning to look behind. The third factor, which we called from broad to narrow base of support, covered the items standing with 1 foot in front and standing on 1 foot.
Table 3. Results From Factor Analyses of the BBS
| Item | Changing Position | Maintaining Position | From Broad to Narrow Base of Support |
|---|---|---|---|
| 1. Sitting to standing | .80 | † | † |
| 2. Standing unsupported | † | .78 | † |
| 3. Sitting unsupported⁎ | † | † | † |
| 4. Standing to sitting | .77 | † | † |
| 5. Transfers | .86 | † | † |
| 6. Standing with eyes closed | † | .78 | † |
| 7. Standing with feet together | † | .58 | † |
| 8. Reaching forward with outstretched arm | † | .61 | † |
| 9. Retrieving an object from floor | † | .70 | † |
| 10. Turning to look behind | † | .59 | † |
| 11. Turning 360° | .65 | † | † |
| 12. Placing alternate foot on stool | .86 | † | † |
| 13. Standing with 1 foot in front | † | † | .85 |
| 14. Standing on 1 foot | † | † | .43 |
⁎Everyone scored 4 (not entered in the factor analysis).
†Item load scored below 0.4 on the factors.
The Cronbach α coefficient of the BBS’s sum score was .87. The correlation matrix, calculated for the 14 items and item-to-total correlation, is presented in table 4. A correlation coefficient could not be computed for item 3 (sitting unsupported) because the scores did not vary. The significant item-to-item correlations range from r equal to .15 to r equal to .87. Except for 2 items, the item-to-total correlations for all items were higher than 0.4.
Table 4. Correlations Between Items and Between the BBS Sum Score and Items
| Item | 1 | 2 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | | | | | | | | | | | | |
| 2 | .26⁎ | 1 | | | | | | | | | | | |
| 4 | .67† | .16 | 1 | | | | | | | | | | |
| 5 | .87† | .31† | .68† | 1 | | | | | | | | | |
| 6 | .44† | .31† | .33† | .35† | 1 | | | | | | | | |
| 7 | .55† | .41† | .42† | .56† | .51† | 1 | | | | | | | |
| 8 | .55† | .28⁎ | .34† | .46† | .37† | .48† | 1 | | | | | | |
| 9 | .37† | .40† | .27⁎ | .42† | .43† | .40† | .37† | 1 | | | | | |
| 10 | .51† | .26⁎ | .50† | .49† | .41† | .45† | .37† | .47† | 1 | | | | |
| 11 | .60† | .32† | .44† | .58† | .41† | .58† | .58† | .44† | .42† | 1 | | | |
| 12 | .68† | .25⁎ | .59† | .74† | .31† | .59† | .53† | .41† | .54† | .70† | 1 | | |
| 13 | .17 | .23⁎ | .23⁎ | .24⁎ | .15† | .14 | .09 | .04 | .21 | .10 | .12 | 1 | |
| 14 | .35† | .33† | .37† | .36† | .31† | .42† | .38† | .42† | .41† | .43† | .51† | .09 | 1 |
| Sum | .76† | .33† | .65† | .74† | .41† | .64† | .69† | .47† | .56† | .80† | .81† | .37† | .60† |
⁎P<.05.
†P<.01.
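The adequacy criterion for item-to-total correlations (above 0.4) can be applied to the Sum row of table 4 with a short Python check. The variable names are illustrative; the values are copied from the table.

```python
# Item-to-total correlations from the Sum row of table 4, keyed by item number.
item_total = {
    1: 0.76, 2: 0.33, 4: 0.65, 5: 0.74, 6: 0.41, 7: 0.64, 8: 0.69,
    9: 0.47, 10: 0.56, 11: 0.80, 12: 0.81, 13: 0.37, 14: 0.60,
}

THRESHOLD = 0.4  # a correlation is considered adequate if it is above this value
below_threshold = sorted(
    item for item, r in item_total.items() if r <= THRESHOLD
)
print(below_threshold)
```

The 2 items flagged are standing unsupported (item 2) and standing with 1 foot in front (item 13).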
Discussion
All the κ values in the present study were above .82. Considering that κ values greater than .75 signify excellent agreement,18, 21, 25 our study shows excellent interrater reliability when the BBS is used to assess the balance of patients in geriatric rehabilitation. These findings fit well with the results of other studies.5, 12, 13, 14, 15, 16, 17 The generalizability of the results is strengthened by the varied clinical characteristics of the subjects and the lack of control of the test conditions (see participant description in the Methods section).
The raters had used the BBS for several years, and the test cannot be assumed to be as reliable with less experienced health care professionals. The Cronbach α was high (.87), indicating strong internal consistency. This finding suggests that the BBS items describe a homogeneous variable, in line with results from the original version of the BBS.25
The primary advantage of having multiple homogeneous items in the BBS is that they provide a basis for a more consistent estimate of the ability of subjects to balance. Most of the item-to-total correlation coefficients are above the critical value 0.4 (see table 4). Although some pairs of items showed fairly high correlations (see table 4), none of the coefficients exceeded .90, so no items were so highly related as to be redundant.5
The BBS assesses both static and dynamic aspects of balance,26 as shown in table 3. Our factor analysis indicates that 3 factors have emerged (see table 3). The first factor (changing position) addresses the ability to maintain balance when changing position(s). The second factor (maintaining position) relates to maintaining the same position with a broad base of support. The third factor (from broad to narrow base) is related to maintaining balance with a narrow base of support when starting in a position with a broad base of support. Factor analyses in other studies have shown that only 1 or 2 factors have emerged,16 and a possible reason for this discrepancy could be that our study population was less heterogeneous than the study population of Ottonello.16
In our study, the mean BBS value of 44.4 was higher than that reported by Ottonello.16 This difference is probably associated with a lower level of function and more impairments in the Ottonello study. Floor and ceiling effects have been shown in other studies.14, 15 However, in our study, no significant ceiling or floor effects were seen.
A distinct feature of the BBS in our study was that some ratings were not used at all or were underused (see table 1). We found no variability between patients in the item sitting unsupported, which corresponds with the experience of Ottonello16 and Berg12 and colleagues, who reported that more than 90% had a top score on this item, indicating a very low degree of difficulty. By condensing item-rating categories, we could eliminate underused categories and construct categories that better separated people of differing abilities. The score values 3 and 4 were used far more frequently than the score values 0, 1, and 2, indicating that 3 levels might be better than 5 levels in our population. This is supported by the results of Kornetti4 and Wang27 and colleagues.
The BBS is frequently used,4 also in Norway. Our study indicates that the Norwegian version of the BBS has excellent interrater reliability and internal consistency. Thus, Norwegian researchers may participate in multicenter international clinical trials that use this instrument. In addition, studies performed with the Norwegian version of the BBS can be included in review articles and meta-analyses.
Conclusions
The Norwegian version of the BBS appears to have excellent interrater reliability and high internal consistency when used by experienced physiotherapists on patients in geriatric rehabilitation.
Supplier
a. Version 13.00; SPSS Inc, 233 S Wacker Dr, 11th Fl, Chicago, IL 60606.
References
1. Use of clinical and impairment-based tests to predict falls by community-dwelling older adults. Phys Ther. 2003;83:328–339.
2. Measurement in neurological rehabilitation. Oxford: Oxford Univ Pr; 1992.
3. Measuring balance in the elderly: validation of an instrument. Can J Public Health. 1992;83(Suppl 2):S7–S11.
4. Rating scale analysis of the Berg Balance Scale. Arch Phys Med Rehabil. 2004;85:1128–1135.
5. Measuring balance in the elderly: preliminary development of an instrument. Physiother Can. 1989;41:304–311.
6. Predicting the probability for falls in community-dwelling older adults. Phys Ther. 1997;77:812–818.
7. Use of the Berg Balance Test to predict falls in elderly persons. Phys Ther. 1996;76:576–585.
8. Standing balance and function over the course of acute rehabilitation. Arch Phys Med Rehabil. 1995;76:994–999.
9. A review of clinical balance tools for use with elderly populations. Crit Rev Phys Rehabil Med. 2003;15:167–205.
10. A review of balance instruments for older adults. Am J Occup Ther. 1998;52:666–671.
11. Relationship of balance and mobility to fall incidence in people with chronic stroke. Phys Ther. 2005;85:150–158.
12. The balance scale: reliability assessment with elderly residents and patients with acute stroke. Scand J Rehabil Med. 1995;27:27–36.
13. Reliability and validity of functional balance tests post stroke. Clin Rehabil. 2004;18:916–923.
14. Analysis and comparison of the psychometric properties of three balance measures for stroke patients. Stroke. 2002;33:1022–1027.
15. Balance assessment in patients with peripheral arthritis: applicability and reliability of some clinical assessments. Physiother Res Int. 2001;6:193–204.
16. Psychometric evaluation of the Italian version of the Berg balance scale in rehabilitation inpatients. Eur Med Phys. 2003;39:181–189.
17. Reliability and validity of measures obtained from stroke patients using the Balance Master. Arch Phys Med Rehabil. 1996;77:425–430.
18. In: The design and analysis of clinical experiments. New York: John Wiley & Sons; 1986. p. 1–32.
19. Research methodology and applied statistics. Physiother Can. 1980;32:253–257.
20. Practical statistics for medical research. London: Chapman & Hall; 1991.
21. Interobserver variation in the reporting of cervical colposcopic biopsy specimens: comparison of grading systems. J Clin Pathol. 1996;49:833–835.
22. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297.
23. Health measurement scales. New York: Oxford Univ Pr; 1989.
24. Cross-cultural research and back-translation. Sport J. 2005;8:1–10.
25. Foundations of clinical research. Upper Saddle River: Prentice Hall Health; 2000.
26. Clinical and laboratory measures of postural balance in an elderly population. Arch Phys Med Rehabil. 1992;73:1073–1080.
27. Psychometric properties of 2 simplified 3-level balance scales used for patients with stroke. Phys Ther. 2004;84:430–438.
Supported by the Norwegian Fund for Postgraduate Training in Physiotherapy. No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the author(s) or upon any organization with which the author(s) is/are associated.
PII: S0003-9993(06)01424-9
doi:10.1016/j.apmr.2006.10.016
© 2007 American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved.
