Interrater Reliability of Functional Status Scores for Patients Transferred From One Rehabilitation Setting to Another

      Abstract

      Kohler F, Redmond H, Dickson H, Connolly C, Estell J. Interrater reliability of functional status scores for patients transferred from one rehabilitation setting to another.

      Objective

      To report the interrater reliability of FIM total score, FIM motor subscore, and FIM cognitive subscore from scoring that occurred in routine clinical practice in 2 closely linked inpatient rehabilitation services in Sydney, Australia.

      Design

      A natural-experiment blind clinical interrater reliability cohort study of the FIM across 2 rehabilitation units.

      Setting

      This study is set in 2 inpatient rehabilitation units immediately adjacent to each other in southwestern Sydney, New South Wales, Australia.

      Participants

      All patients (N=143) who were transferred between the 2 rehabilitation units between August 2006 and October 2007 were included in the study.

      Intervention

      Discharge FIMs were scored by the first unit and an admission FIM was scored independently by the second unit within a few days. The FIM scores were analyzed for agreement and systematic bias.

      Main Outcome Measure

      Intraclass correlation coefficients, kappa statistic, weighted kappa statistic, and Bland-Altman plots were used.

      Results

      There were 143 sets of scores identified. The range of differences between the 2 FIM totals was −32 to 50, between the FIM motor subscores was −22 to 43, and between the FIM cognitive subscores was −14 to 21. Bland-Altman plots demonstrated poor agreement. Few FIM totals were perfectly matched. The intraclass correlation coefficients ranged from .872 for the FIM total to .830 for the cognitive subscales. Values for kappa ranged from −.007 (FIM motor subscore) to .123 (FIM cognitive subscore). Values for weighted kappa ranged from .465 (FIM cognitive subscore) to .521 (FIM total).

      Conclusions

      There was no systematic scoring bias evident. Intraclass correlation coefficients were high, but tests of agreement demonstrated poor agreement. These findings have implications for the use of the FIM and any patient classification or funding system based on the FIM, especially if poor levels of agreement were found in the presence of all staff being FIM credentialed and standardization of methods of assessment. This study indicates that further investigation of agreement of both FIM totals and FIM item scores in the clinical setting is warranted.

      List of Abbreviations:

      AN-SNAP v2 (The Australian National Sub-Acute and Non-Acute Patient Classification), CI (confidence interval), ICC (intraclass correlation coefficient)
      Increasing emphasis on patient classification and funding systems using activities of daily living scales in rehabilitation medicine mandates a good understanding of the underlying reliability of the scale used in the classification. Activities of daily living scales such as the FIM, which can be used as proxy measures of outcome, have been used in classification and funding systems for rehabilitation patients for about 15 years.
      • Stineman M.G.
      • Escarce J.J.
      • Gion J.E.
      • Hamilton B.B.
      • Granger C.V.
      • Williams S.V.
      A case-mix classification system for medical rehabilitation.
      Fundamental requirements of a classification system include accuracy and easy reproducibility of allocation into the classes. Accuracy of class allocation in rehabilitation is in turn dependent on the underlying measures used in the classification. Clinicians should be aware of the strengths and limitations of the underlying measurements, which form the basis of any classification, because this influences or determines the accuracy and reliability of the classification system.
      The FIM, which is commonly used in the inpatient setting to measure the functional level of patients, assesses performance during tasks that can be broadly categorized as activities of daily living, mobility, and cognition.
      • Ravaud J.F.
      • Delcey M.
      • Yelnik A.
      Construct validity of the functional independence measure (FIM): questioning the unidimensionality of the scale and the “value” of FIM scores.
      The FIM has a total of 18 items, for which a score ranging from 1 to 7 is given, with 7 signifying complete independence or normative function and 1 signifying complete dependence or requiring total assistance. The total maximum score of the FIM is 126, which implies total independence; the minimum score is 18, which implies full assistance is required for all 18 items.
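      The scoring rules above can be illustrated with a minimal sketch. The split into 13 motor and 5 cognitive items is an assumption based on the standard FIM layout (it is consistent with the subscore ranges reported later in this article), and the function name `fim_scores` is hypothetical.

```python
# Assumption: the standard FIM layout of 13 motor and 5 cognitive items.
MOTOR_ITEMS = 13
COGNITIVE_ITEMS = 5


def fim_scores(motor, cognitive):
    """Return motor, cognitive, and total FIM scores from item ratings (1 to 7)."""
    if len(motor) != MOTOR_ITEMS or len(cognitive) != COGNITIVE_ITEMS:
        raise ValueError("expected 13 motor and 5 cognitive item scores")
    for score in list(motor) + list(cognitive):
        if score not in range(1, 8):
            raise ValueError("each FIM item is scored from 1 to 7")
    motor_sum = sum(motor)
    cognitive_sum = sum(cognitive)
    return {"motor": motor_sum, "cognitive": cognitive_sum,
            "total": motor_sum + cognitive_sum}


# The two extremes described in the text.
fully_independent = fim_scores([7] * 13, [7] * 5)
fully_dependent = fim_scores([1] * 13, [1] * 5)
print(fully_independent["total"], fully_dependent["total"])
```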
      The prospective payment system in the United States uses the admission FIM motor score to allocate patients into a case mix group that ultimately determines funding to the inpatient rehabilitation facility.
      • Zorowitz R.D.
      Inpatient rehabilitation facilities under the prospective payment system: lessons learned.
      In Australia, the Australian National Subacute and Non Acute Patient Classification version 2 (AN-SNAP v2) has been promoted for use in funding.
      • Green J.G.
      • Poulos C.
      • Broadbent A.
      Report on the development of Version 2 of the AN-SNAP Classification.
      This classification system uses the total FIM score as well as the motor or the cognitive subscores in different parts of its classification. A detailed description of AN-SNAP v2 or other classifications based on the FIM is beyond the scope of this article, but for ease of understanding, a copy of the AN-SNAP v2 classification is included as appendix 1. Of particular relevance is the smallest range of FIM points for allocation into specific classes. For FIM totals, the smallest range defining a class is 24 points; for FIM motor, the smallest range is 10 points; and for FIM cognitive, the smallest range is 4 points. Good interrater agreement of FIM ratings is essential to ensure reliability of the classification.
      Reliability refers to the degree that a scale is free from random error. It is the stability or consistency of measurement. Two components of reliability that are commonly examined are test-retest reliability or reproducibility, and interrater reliability.
      • Hammersley M.
      Some notes on the terms “validity” and “reliability”.
      Test-retest reliability is a measure of the consistency of scores on repeated testing, and interrater reliability refers to the consistency of scores when 2 different raters score the patient. In clinical practice, reliability is generally not routinely measured. However, clinical reliability is important in the context of using outcome measures for benchmarking or classification and funding purposes. Differences in assessments or poor clinical reliability of measurements, when used as a surrogate measure of unit performance or to determine unit funding, may result in a skewed perception of a unit's performance.
      A review of international literature on the interrater reliability of instruments for measuring functional dependence discusses some of the shortfalls of the interrater reliability studies on the FIM.
      • Kliebisch U.
      • Brenner H.
      Inter-Rater_reliabilitaet von Instrumenten zur Beurteilung der Pflegebeduerftigkeit: Ein Review der internationalen Literatur.
      The review suggests that a more intensive study of reliability and validity of these instruments is required, and in particular that interrater studies with multiple raters need to be carried out. Some published studies of FIM reliability have been carried out in controlled settings using standardized patients and patient descriptions or videos of patients.
      • Fricke J.
      • Unsworth C.
      • Worrell D.
      Reliability of the functional independence measure with occupational therapists.
      A review of published studies of reliability found acceptable reliability of the FIM across various settings.
      • Ottenbacher K.J.
      • Hsu Y.
      • Granger C.V.
      • Fiedler R.C.
      The reliability of the functional independence measure: a quantitative review.
      In a study of 20 community patients that demonstrated good interrater reliability as measured by ICCs, with values ranging from .90 to .99, the mean FIM cognitive score difference was 6, the mean FIM motor score difference was 17, and the mean FIM total score difference was 23.
      • Ottenbacher K.J.
      • Mann W.C.
      • Granger C.V.
      • Tomita M.
      • Hurren D.
      • Charvat B.
      Inter-rater agreement and stability of functional assessment in the community-based elderly.
      This suggests that there is poor underlying agreement between the raters, although this was not reported in the study. One study has reviewed interinstitutional agreement in patients transferred from the acute setting to the rehabilitation setting. It reported on both the reliability of individual FIM items and the reliability coefficient for the total FIM scores. It reported that the reliability coefficient varied from .49 to .87 depending on the subgroup; however, the subgroups were quite small. Although some of the limitations of correlation coefficients were acknowledged and agreement was measured for individual FIM items, results on agreement of the total FIM scores were not included in the published article.
      • Segal M.E.
      • Ditunno J.F.
      • Staas W.E.
      Interinstitutional agreement of individual Functional Independence Measure (FIM) items measured at two sites on one sample of SCI patients.
      ICCs are most commonly used to report reliability of FIM totals.
      • Ottenbacher K.J.
      • Hsu Y.
      • Granger C.V.
      • Fiedler R.C.
      The reliability of the functional independence measure: a quantitative review.
      Pearson product moment correlation and kappa have also been used.
      • Ottenbacher K.J.
      • Hsu Y.
      • Granger C.V.
      • Fiedler R.C.
      The reliability of the functional independence measure: a quantitative review.
      Correlation requires paired values to be linked by a linear relationship, or if one defines correlation more broadly, the paired values need to be linked by some mathematic function.
      • Robinson W.S.
      The statistical measure of agreement.
      In patients where disability is measured by the FIM, the mathematic relationship is that as one is more independent, one is scored higher and therefore has a higher total FIM score. However, even if there is a good correlation between 2 measurements, the actual values (in this case, the FIM scores) may not be in agreement with each other.
      • Bartko J.J.
      On various intraclass correlation reliability coefficients.
      High correlation coefficients are associated with small within-subjects variance and with the range of the true quantity in the sample.
      • Bartko J.J.
      On various intraclass correlation reliability coefficients.
      If the range of values in the sample is wide, the correlation will be greater than if it is narrow.
      • Donner A.
      • Wells G.
      A comparison of confidence interval methods for the intraclass correlation coefficient.
      Because the total FIM spans a range of 108 points (18 to 126), even a subject variance of 5 FIM points is a relatively small proportion. This would explain a high correlation between the scores even if there was relatively low absolute agreement. Agreement between raters ultimately becomes a question of what difference in score is clinically relevant.
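      The distinction between correlation and agreement can be seen in a small sketch. The paired FIM totals below are hypothetical values chosen for illustration, not study data; the point is only that a widely spread sample can correlate strongly even when no pair of scores matches.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical FIM totals from two raters (illustrative only).
rater_a = [20, 45, 70, 95, 120]
rater_b = [30, 38, 80, 88, 126]

r = pearson_r(rater_a, rater_b)
exact_matches = sum(a == b for a, b in zip(rater_a, rater_b))
largest_gap = max(abs(a - b) for a, b in zip(rater_a, rater_b))
print(round(r, 3), exact_matches, largest_gap)
```

      In this sketch the correlation is above .95 although the raters never give the same total and differ by up to 10 points.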
      The aim of this article is to report the FIM interrater reliability for FIM total score, FIM motor subscore, and FIM cognitive subscore from assessments that occurred in routine clinical practice in 2 closely linked inpatient rehabilitation services in Sydney, Australia.

      Methods

       Settings

      This study was set in 2 rehabilitation units immediately adjacent to each other in southwestern Sydney, New South Wales, Australia. The subacute unit is a 20-bed combined geriatric/rehabilitation ward within a 200-bed acute hospital. Patients admitted to this unit are generally not sufficiently medically stable to allow admission directly to the rehabilitation unit. The rehabilitation unit is a 36-bed mixed general rehabilitation unit in an immediately adjacent subacute hospital.

       Data Collection Process

      All patients who are admitted to either unit have their functional levels measured using the FIM within 72 hours of admission and in the 72 hours prior to discharge as part of routine patient care. For most patients, the admission FIM is assessed within 24 hours of admission to the unit. In the case of FIM mobility items, this is done by the physiotherapist, usually within 2 or 3 hours of admission to the rehabilitation unit as part of a detailed physical assessment. FIM scoring and data collection are performed independently by various therapists from different disciplines, each concentrating on their area of expertise. The physiotherapists rate the mobility items; the occupational therapists rate the self-care items; the nursing staff rate bowel, bladder, and cognition; and the speech therapists rate the language items. In the subacute unit, the team consists of 1 physiotherapist, 1 occupational therapist, and a complement of nurses, of whom 5 are regularly involved in measuring the patients' activity for FIM scores. Up to 7 people were involved in the data collection in the subacute unit. In the rehabilitation unit, there are 3 physiotherapists, 3 occupational therapists, and 8 nurses who are involved in collecting the FIM data. In both units, some of the raters are FIM credentialed and some are not. The identity and number of individual staff involved in collecting the FIM data for any particular patient are not recorded. This process for scoring FIM items occurs in many rehabilitation units.
      • Segal M.E.
      • Ditunno J.F.
      • Staas W.E.
      Interinstitutional agreement of individual Functional Independence Measure (FIM) items measured at two sites on one sample of SCI patients.
      The individual ratings are collected on the FIM data sheet and are subsequently entered into a database, usually by a member of the clerical staff.

       Patients

      All patients included in this study commenced their rehabilitation in the subacute unit and completed it in the rehabilitation unit. These patients have a discharge FIM scored by the subacute unit staff within 3 days of discharge but most frequently in the last 2 days before discharge. The patients then have an independent admission FIM scored by the rehabilitation unit always within 3 days of admission, but most frequently in the first 24 hours. All patients therefore have independent assessments within a few days of each other, with a maximum of 6 days between assessments. Because the patients who are transferred to the rehabilitation unit are medically stable and are usually part way into their rehabilitation program, it would be expected that there would be minimal to no difference between the 2 scores, because it is unlikely, although certainly not impossible, for any significant functional change to occur in the interim.
      All patients who were transferred between the 2 units between August 2006 and October 2007 were included in this study.

       Relationship of Ratings

      Copies of functional and mobility assessments, but not the actual FIM scores, accompanied the patients on transfer between the 2 hospitals. The general practice is for a complete assessment to be carried out on all patients who are admitted to either of the units. The FIM scores are collected and processed independently in the 2 units and are not available for the clinical staff to peruse.
      The clinicians working on the 2 units were unaware of the study and were thus blind, continuing their usual practice of FIM scoring. The transfer of patients from one unit to the other therefore constituted a natural experiment allowing the interrater properties of the FIM to be studied.
      Approval for the study was gained from the respective human research ethics committees of the hospitals. The FIM data were routinely collected for the purpose of demonstrating patient improvement as well as service quality and funding purposes. Patient identifiers were not required for the purpose of the analysis, and no staff involved in the data collection could be identified from the data. All staff involved in FIM scoring and data collection were in agreement with the study when they were informed. The need to seek individual consent for analysis of the data and publication of the study was waived by the human research ethics committees because loss of data by refusal to participate would jeopardize the integrity of the study. The study presented no risk to participants.
      Information on patient demographics, diagnostic groupings, FIM totals, and FIM motor and FIM cognitive subscores was recorded. The differences between the totals were calculated by subtracting the admission FIM score or subscore from the relevant discharge FIM score or subscore. The minimum and maximum values from this calculation were taken as the extremes of the range.
      In view of the ongoing debate regarding appropriate measures of agreement and their use, we analyzed the data in a number of ways including calculation of the 1-way random model ICC, kappa statistic, weighted kappa statistic, and Bland-Altman plots. The Bland-Altman plot is a graphic presentation of the 2 sets of scores in which the differences between the 2 scores are plotted against the averages of the 2 scores. Horizontal lines are drawn at the mean difference and at the limits of agreement, which are defined as the mean difference plus or minus 1.96 times the SD of the differences.
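      As a minimal sketch of the Bland-Altman calculation, assuming the conventional 1.96 multiplier and the sample SD of the differences, the limits of agreement can be computed as follows. The paired scores are hypothetical, and `bland_altman` is an illustrative name rather than a function from SPSS or MedCalc.

```python
import statistics


def bland_altman(discharge, admission, z=1.96):
    """Return (bias, lower limit, upper limit, plot points) for paired scores."""
    diffs = [d - a for d, a in zip(discharge, admission)]
    means = [(d + a) / 2 for d, a in zip(discharge, admission)]
    bias = statistics.mean(diffs)        # mean difference
    sd = statistics.stdev(diffs)         # sample SD of the differences
    points = list(zip(means, diffs))     # x = average, y = difference
    return bias, bias - z * sd, bias + z * sd, points


# Hypothetical discharge and admission FIM totals (illustrative only).
discharge = [82, 90, 60, 110, 45, 75]
admission = [80, 95, 50, 108, 55, 70]

bias, lower, upper, points = bland_altman(discharge, admission)
print(round(bias, 2), round(lower, 1), round(upper, 1))
```

      Each element of `points` gives the coordinates of one patient on the plot; the horizontal lines are drawn at `bias`, `lower`, and `upper`.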
      Power calculations were completed and included in the discussion. The analysis and graphing were performed using SPSS Statistics 17.0 for Windows (SPSS Inc, 233 S Wacker Dr, 11th Fl, Chicago, IL 60606) and MedCalc statistical software (MedCalc Statistical Software, Broekstraat 52, 9030 Mariakerke, Belgium).

      Results

      The review included 143 paired FIM scores.
      The average age of the patients was 76 years, and the median age was 79 years. Most of the patients, 63%, had an orthopedic condition, and 13% of patients had a stroke, with the rest coming from diverse groups.
      The results for FIM totals and FIM motor and FIM cognitive subscores are summarized in table 1, and a summary of the distribution of differences is outlined in table 2.
      Table 1. Summary of Results for FIM Total Scores and Subscores for Discharge, Admission, and Differences
      Item Description                 Discharge FIM   Admission FIM   Difference Between Discharge and Admission FIM
      No. of patients                  143             143             NA
      Mean FIM total score             82.8            82.3            0.50
      Median FIM total score           86              85              1
      Range of FIM total scores        18 to 122       18 to 125       −32 to 50
      Mean FIM motor score             53.5            54.2            0.7
      Median FIM motor score           55              56              1
      Range of FIM motor scores        13 to 88        13 to 90        −22 to 43
      Mean FIM cognitive score         29.3            28.1            −1.2
      Median FIM cognitive score       31              30              −1
      Range of FIM cognitive scores    5 to 35         5 to 35         −14 to 21
      Abbreviation: NA, not applicable.
      Table 2. Summary of Distribution of Differences Between Discharge and Admission FIM Scores
      Category Description               Total FIM Score   FIM Motor Score   FIM Cognitive Score
      No difference                      4                 2                 35
      4 or more FIM point difference     110               99                59
      10 or more FIM point difference    66                53                16
      20 or more FIM point difference    18                4                 1
      There was considerable difference between the 2 FIM total scores, with a range of −32 to 50. The results are shown graphically in figure 1. The Bland-Altman plot of FIM total scores is shown in figure 2. On statistical analysis of the FIM total scores, the ICC was .872 (CI, .822–.908), the kappa was .011, and the weighted kappa was .521.
      Fig 1. Graph showing scatter plot of discharge FIM totals and admission FIM totals.
      Fig 2. Bland-Altman plot showing distribution of the difference between discharge and admission FIM total score against the mean of the discharge and admission FIM total score. Abbreviations: FimDisTot, FIM discharge total; FimAdmTot, FIM admission total.
      There was considerable difference between the 2 FIM motor subscores with a range of −22 to 43. The results are shown graphically in a Bland-Altman plot in figure 3. On statistical analysis of the FIM motor subscores, the ICC was .854 (CI, .797–.895), the kappa was −.007, and the weighted kappa was .493.
      Fig 3. Bland-Altman plot showing distribution of the difference between discharge and admission FIM motor score against the mean of the discharge and admission FIM motor score. Abbreviations: FimDisMot, FIM discharge motor; FimAdmMot, FIM admission motor.
      There was considerable difference between the 2 FIM cognitive subscores with a range of –14 to 21. Of the 35 scores in perfect agreement, most had agreement because of the ceiling effect of the measure; 28 of these patients had the highest possible score of 35. The results are shown graphically in a Bland-Altman plot in figure 4. On statistical analysis of the FIM cognitive scores, the ICC was .830 (CI, .764–.878), the kappa was .123, and the weighted kappa was .465.
      Fig 4. Bland-Altman plot showing distribution of the difference between discharge and admission FIM cognitive score against the mean of the discharge and admission FIM cognitive score. Abbreviations: FimDisCog, FIM discharge cognition; FimAdmCog, FIM admission cognition.

      Discussion

      All 143 patients who were transferred between the 2 units over the 15-month period were included in the study. No patients needed to be excluded from the review for incomplete data because we place significant emphasis on having completed FIM scores for our patients.
      In our setting, only the FIM total and the motor and cognitive subscores are clinically or administratively relevant. We considered it appropriate to concentrate on these 3 elements in our study. There was no systematic bias evident between the scoring practices in the 2 units, as demonstrated by the small median differences between the FIM total as well as the FIM subscores. We are confident that the scoring reflects our normal practice, because the staff were not informed that the data for these patients would be analyzed in this manner prior to the commencement of the study and were thus blind during the study. It has been suggested that scoring of functional status could be open to manipulation to maximize apparent improvement; however, these results show no evidence of manipulation.
      • Segal M.E.
      • Ditunno J.F.
      • Staas W.E.
      Interinstitutional agreement of individual Functional Independence Measure (FIM) items measured at two sites on one sample of SCI patients.
      From the clinical point of view, a difference of 4 points in the FIM cognition score, 10 points in the FIM motor score, or 20 points in the FIM total score between test and retest may be acceptable. However, it could also mean the difference between a patient being independent or not independent, determined by the actual underlying difference in the FIM item scores. In view of the narrow range in the AN-SNAP v2 classification, such a difference would ensure that the patient falls into a different class. If the patient is allocated into a different class, this might alter the predicted length of stay and associated cost weights for funding. In this case, such a difference (FIM cognitive, motor, or total) is highly relevant.
      The method described by Weir uses the SEM and the ICC to construct CIs and to determine the minimal difference required in order to be confident that there has been a real change in the performance.
      • Weir J.P.
      Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM.
      When this method is applied to the total FIM scores, a minimum difference of 20 FIM points in a repeated total FIM score is required before the difference could be considered to be a real change. Twenty FIM points in either direction signifies considerable clinical variability within the spectrum of nonsignificant change. However, even when such a wide range of statistical nonsignificance is applied, there are still 18 of the 143 patients who have a real change in total FIM scores. The broad statistical range suggests that great care needs to be taken when measuring and interpreting FIM improvement and efficiency.
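      A minimal sketch of this calculation follows, assuming the commonly quoted form of Weir's formulas: SEM = SD × √(1 − ICC) and minimal difference = SEM × 1.96 × √2. The ICC of .872 is the value reported for the FIM total; the between-subject SD of 20 FIM points is a hypothetical placeholder, because the SD used in the calculation is not reported here.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM estimated from the between-subject SD and the ICC."""
    return sd * math.sqrt(1 - icc)


def minimal_difference(sd, icc, z=1.96):
    """Smallest change between two scores treated as real at the 95% level."""
    return standard_error_of_measurement(sd, icc) * z * math.sqrt(2)


icc_total = 0.872      # reported ICC for the FIM total
assumed_sd = 20.0      # hypothetical SD, for illustration only

md = minimal_difference(assumed_sd, icc_total)
print(round(md, 1))
```

      Under this assumed SD the minimal difference comes out close to 20 FIM points, which is the order of magnitude quoted above; a different SD would scale the result proportionally.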
      There has been considerable discussion in the literature regarding the best tools and methods for analyzing categorical data for reliability and agreement both with respect to the FIM and from a general statistical point of view.
      • Sheikh K.
      Disability scales: assessment of reliability.
      • Posner K.L.
      • Sampson P.D.
      • Caplan R.A.
      • Ward R.
      • Cheney F.W.
      Measuring interrater reliability among multiple raters: an example of methods for nominal data.
      • Banerjee M.
      • Capozzoli M.
      • McSweeney L.
      • Sinha D.
      Beyond kappa: a review of interrater agreement measures.
      • Bland J.M.
      • Altman D.G.
      Statistical methods for assessing agreement between two methods of clinical measurement.
      A detailed discussion would go well beyond the limits of this work, and interested readers are referred to the literature. No clear best measure of agreement is evident, and ultimately the issue of agreement is a matter of the clinical importance of differences in repeated measures.
      The ICC is a measure of correlation or association rather than absolute agreement. The values we observed, .872 for the FIM total score, .854 for the FIM motor subscore, and .830 for the FIM cognitive subscore, fit into a category of high correlation (with 1.0 signifying complete correlation) between the paired measurements. This would be expected, because patients would be grouped into similar bands of functional independence but not necessarily scored exactly the same. The correlation coefficient for total FIM in this study lies at the lower end of the range (.83–.99) published in the literature in a review of 11 studies of validity of the FIM.
      • Ottenbacher K.J.
      • Hsu Y.
      • Granger C.V.
      • Fiedler R.C.
      The reliability of the functional independence measure: a quantitative review.
      However, the correlation coefficient was slightly higher than .83 published in the article on interinstitutional agreement for the FIM.
      • Segal M.E.
      • Ditunno J.F.
      • Staas W.E.
      Interinstitutional agreement of individual Functional Independence Measure (FIM) items measured at two sites on one sample of SCI patients.
      It is noteworthy that the correlation coefficients decrease (although they remain high) as the number of possible scores attainable decreases, in line with the properties of correlation coefficients as outlined in the introductory section.
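      For readers who want to see how a 1-way random model ICC behaves, the following is a minimal sketch for 2 ratings per patient, using the usual analysis-of-variance form (MSB − MSW) / (MSB + (k − 1) × MSW). The paired scores are hypothetical, and the function name `icc_oneway` is illustrative; it is not the SPSS routine used in the study.

```python
def icc_oneway(pairs):
    """One-way random, single-measure ICC for (rating 1, rating 2) pairs."""
    n = len(pairs)
    k = 2  # ratings per patient
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    subject_means = [(a + b) / 2 for a, b in pairs]
    # Between-subjects and within-subjects sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(pairs, subject_means))
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical paired FIM totals: no exact matches, differences up to 10 points.
pairs = [(20, 30), (45, 38), (70, 80), (95, 88), (120, 126)]
icc = icc_oneway(pairs)
print(round(icc, 3))
```

      Because the hypothetical patients are spread across almost the whole FIM range, the ICC is high even though no pair of ratings is identical.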
      Weighted kappa statistics approximate ICCs and therefore are also more of a measure of association than agreement.
      • Fleiss J.L.
      • Cohen J.
      The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability.
      In this study, the weighted kappa values fall into the fair to moderate agreement range. As regards the methodology of weighting, it is not clinically sensible for FIM scores, which vary by considerable amounts, such as greater than 5 or 6 FIM points, to contribute to improved agreement. However, this is exactly the case in calculating weighted kappa.
      Unweighted kappa values are a better reflection of pure agreement, but there are well published concerns about their limitations.
      • Cohen J.
      Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit.
      In this study, the unweighted kappa results demonstrated very poor to negative agreement, probably because of asymmetry of the matrix of scores.
      • Feinstein A.R.
      • Cicchetti D.V.
      High agreement but low kappa, I: the problems of two paradoxes.
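      The contrast between unweighted and weighted kappa can be sketched as follows. The weighting scheme used in this study is not stated, so quadratic weights are an assumption here, and the item ratings are hypothetical. In this example the raters never agree exactly but always differ by only 1 point.

```python
def cohen_kappa(r1, r2, categories, weight=None):
    """Cohen's kappa; weight may be None, "linear", or "quadratic"."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weight is None:
            return 0.0 if i == j else 1.0
        if weight == "linear":
            return abs(i - j) / (k - 1)
        if weight == "quadratic":
            return ((i - j) / (k - 1)) ** 2
        raise ValueError("unknown weight scheme")

    d_obs = sum(disagreement(i, j) * observed[i][j]
                for i in range(k) for j in range(k))
    d_exp = sum(disagreement(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp


# Hypothetical ratings on a 1 to 7 scale (illustrative only).
rater_1 = [1, 2, 3, 4, 5, 6, 7, 4]
rater_2 = [2, 3, 4, 5, 6, 7, 6, 3]
levels = [1, 2, 3, 4, 5, 6, 7]

kappa_unweighted = cohen_kappa(rater_1, rater_2, levels)
kappa_quadratic = cohen_kappa(rater_1, rater_2, levels, weight="quadratic")
print(round(kappa_unweighted, 3), round(kappa_quadratic, 3))
```

      The unweighted kappa is negative because there is no exact agreement, whereas the quadratic-weighted kappa is high because near misses receive almost full credit.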
      Bland and Altman suggested that a high correlation for any 2 methods designed to measure the same property is in itself just a sign that one has chosen a widespread sample.
      • Bland J.M.
      • Altman D.G.
      Statistical methods for assessing agreement between two methods of clinical measurement.
      • Altman D.G.
      • Bland J.M.
      Measurement in medicine: the analysis of method comparison studies.
      A high correlation does not automatically imply that there is good agreement between the 2 methods. They described a method of data plotting used in analyzing the agreement between 2 different measurements. The Bland-Altman plot shows the difference between the variables against the average of the variables. Limits of agreement, usually set at the mean difference ±1.96 SD of the differences, measure the agreement between the variables. Large limits of agreement show poor agreement between the variables. In this study, the Bland-Altman plots show that there is no obvious agreement between the FIM differences and the means of the FIM scores for the total FIM score as well as the motor and cognitive FIM subscores. For the FIM total, the limits of agreement are 26.2 points on either side of the mean. Therefore, based on the values in this study, even a difference of 52 FIM total points would fall within the limits of agreement. Fifty-two FIM total points translate to an average difference of as much as 3 FIM points per FIM item. For the FIM motor subscore, the limits of agreement are 20.7 points on either side of the mean, indicating a difference as much as 4 FIM points per FIM motor item, and for the FIM cognitive subscore, the limits of agreement are 10.9 points on either side of the mean, or a difference as much as 6 FIM points per FIM cognitive item. Although there is fair to good correlation between the measures, there is very poor agreement.

       Study Limitations

      Possible contributing factors to the poor agreement in our study include the degree of attention or rigor given to the FIM scoring by the staff, the staff level of training, and staff experience both with rehabilitation patients and using a functional outcome measure.
      All patients in this study had total FIM assessments completed, suggesting that staff were aware of the importance of scoring.
      The need for staff training or FIM credentialing/certification for accurate scoring has been highlighted in previous studies and by the owners of the FIM copyright.
      Uniform Data System for Medical Rehabilitation FIM credentialing examination Version 4.
      Not all of the staff were FIM-certified at the time of this study, and this is a potential weakness of our clinical practice. However, in many units, in Australia at least, with regular staff turnover and staff leave, at any one time there might be some staff who have not been FIM-credentialed but who perform patient assessments. If all staff were FIM-credentialed, the results might potentially be different. This issue needs further investigation. However, the reality of clinical practice is reflected in our results and indicates a challenge faced by all rehabilitation providers.
      Another possible limitation in the study could be the variable period between the 2 measurements. Generally, in these units, admission FIMs are scored within 24 hours of admission as part of a holistic comprehensive patient assessment.
      It is possible that patients might be scored on a combination of performance (at the actual time of measurement) and capacity (based on previous performance in sessions with the therapists), because the therapists are better acquainted with the patients at the time of discharge. Capacity would not be known at the time of admission, and therefore the score would be based solely on performance in the environment at admission. There is also a possibility of bias being introduced when the FIM is scored by therapists who are treating the patient rather than by more objective evaluators (Dodds et al, 1993).
      Performance might be influenced by differences in physical setting (Alexander et al, 2000) between the 2 units. There are no published studies on the effects of performance and capacity on FIM scores.
      There might have been a change in the patients' functional status in the time between the measurements. Based on the natural history of the diseases and progression of function in rehabilitation units, one would expect that this change would be toward improvement. However, uniform improvement was not evident in the total FIM scores of the group as a whole or in individual patients.
      The scoring of the FIM in this study, as in other studies in clinical settings, by a number of members of a team rather than a single rater might also be a source of bias, but this has not been evaluated in the literature (Dodds et al, 1993).
      A further potential contribution to the variability could be the different number of raters in the 2 institutions. There were twice as many raters involved in the rehabilitation unit. This has not been evaluated in the literature.
      Individual FIM item scores might be scored differently because it might be difficult to distinguish between standby assistance and independence, or there might be variability of performance between assessments. Subtle differences might always be a problem in functional assessments, but if the measure is sufficiently stable, then one would hope that this would balance out over the 18 items and would not be unidirectional with any subject. A separate analysis of agreement of individual item scores of the FIM (Kohler et al, 2009) showed poor agreement among the individual FIM item scores as well.
      From previous studies (Ottenbacher et al, 1996), the expected ICC should be .8 or higher. With conventional power parameters (alpha equal to .05 and beta equal to .2) and 2 trials, the number of subjects required is approximately 46 (Walter et al, 1998), so our sample of 143 is sufficiently large to draw conclusions regarding the findings.

      Conclusions

      There was no systematic scoring bias evident. ICCs were high, but tests of agreement demonstrated poor agreement, and the limits of agreement were wide. These findings may have implications for the use of the FIM, especially if poor agreement were found to persist even when all staff are FIM-credentialed and methods of assessment are standardized. Patient classification and funding systems based on FIM scores would carry any inherent difficulties and inaccuracies in those scores into the classification.
      This study indicates that further investigation of agreement of both FIM totals and FIM item scores in the clinical setting is warranted.
      Suppliers
      a. SPSS Inc, 233 S Wacker Dr, 11th Fl, Chicago, IL 60606.
      b. MedCalc Statistical Software, Broekstraat 52, 9030 Mariakerke, Belgium.

      Appendix 1. Detailed Description of AN-SNAP v2 Classification

      Class No.    Description
      S2-201    Admit for assessment only
      S2-202    Brain, Neuro, Spine & Major Multiple Trauma, FIM 13
      S2-203    All other impairments, FIM 13
      S2-204    Stroke, FIM motor 63-91, FIM cognition 20-35
      S2-205    Stroke, FIM motor 63-91, FIM cognition 5-19
      S2-206    Stroke, FIM motor 47-62, FIM cognition 16-35
      S2-207    Stroke, FIM motor 47-62, FIM cognition 5-15
      S2-208    Stroke, FIM motor 14-46, Age ≥75
      S2-209    Stroke, FIM motor 14-46, Age ≤74
      S2-210    Brain Dysfunction, FIM motor 56-91, FIM cognition 32-35
      S2-211    Brain Dysfunction, FIM motor 56-91, FIM cognition 24-31
      S2-212    Brain Dysfunction, FIM motor 56-91, FIM cognition 20-23
      S2-213    Brain Dysfunction, FIM motor 56-91, FIM cognition 5-19
      S2-214    Brain Dysfunction, FIM motor 24-55
      S2-215    Brain Dysfunction, FIM motor 14-23
      S2-216    Neurological, FIM motor 63-91
      S2-217    Neurological, FIM motor 49-62
      S2-218    Neurological, FIM motor 18-48
      S2-219    Neurological, FIM motor 14-17
      S2-220    Spinal Cord Dysfunction, FIM motor 81-91
      S2-221    Spinal Cord Dysfunction, FIM motor 47-80
      S2-222    Spinal Cord Dysfunction, FIM motor 14-46, Age ≥33
      S2-223    Spinal Cord Dysfunction, FIM motor 14-46, Age ≤32
      S2-224    Amputation of limb, FIM motor 72-91
      S2-225    Amputation of limb, FIM motor 14-71
      S2-226    Pain Syndromes
      S2-227    Orthopaedic Conditions, Fractures, FIM motor 58-91
      S2-228    Orthopaedic Conditions, Fractures, FIM motor 48-57
      S2-229    Orthopaedic Conditions, Fractures, FIM motor 14-47, FIM cognition 19-35
      S2-230    Orthopaedic Conditions, Fractures, FIM motor 14-47, FIM cognition 5-18
      S2-231    Orthopaedic Conditions, Replacement, FIM motor 72-91
      S2-232    Orthopaedic Conditions, Replacement, FIM motor 49-71
      S2-233    Orthopaedic Conditions, Replacement, FIM motor 14-48
      S2-234    Orthopaedic Conditions, Other, FIM motor 68-91
      S2-235    Orthopaedic Conditions, Other, FIM motor 53-67
      S2-236    Orthopaedic Conditions, Other, FIM motor 14-52
      S2-237    Cardiac
      S2-238    Major Multiple Trauma, FIM total 101-126
      S2-239    Major Multiple Trauma, FIM total 74-100
      S2-240    Major Multiple Trauma, FIM total 44-73
      S2-241    Major Multiple Trauma, FIM total 19-43
      S2-242    Other Impairments, FIM motor 67-91
      S2-243    Other Impairments, FIM motor 53-66
      S2-244    Other Impairments, FIM motor 25-52
      S2-245    Other Impairments, FIM motor 14-24
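
      To illustrate how a scoring difference between raters could move a patient into a different class, the six stroke rows of the table above (S2-204 to S2-209) can be written as a small lookup. This is a sketch under stated assumptions, not part of AN-SNAP or of the study: the function name `stroke_class` and the example scores are hypothetical, and only the stroke rows with FIM motor 14-91 are covered.

```python
def stroke_class(motor, cognition, age):
    """Return the stroke class (S2-204 to S2-209) from the table above.

    motor:     FIM motor subscore (only 14-91 is covered by the stroke rows)
    cognition: FIM cognitive subscore (5-35)
    age:       age in years, used only in the lowest motor band
    """
    if not 14 <= motor <= 91:
        raise ValueError("FIM motor subscore outside the 14-91 stroke bands")
    if not 5 <= cognition <= 35:
        raise ValueError("FIM cognitive subscore must be between 5 and 35")
    if motor >= 63:  # FIM motor 63-91
        return "S2-204" if cognition >= 20 else "S2-205"
    if motor >= 47:  # FIM motor 47-62
        return "S2-206" if cognition >= 16 else "S2-207"
    # FIM motor 14-46: the split is by age rather than cognition
    return "S2-208" if age >= 75 else "S2-209"


# Hypothetical patient scored by two units; only the motor subscore differs.
first_unit = stroke_class(motor=60, cognition=18, age=70)
second_unit = stroke_class(motor=65, cognition=18, age=70)
print(first_unit, second_unit)
```

      In this hypothetical case a 5-point difference in the motor subscore, well inside the limits of agreement reported in this study, places the same patient in S2-206 for one unit and S2-205 for the other.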

      References

        • Stineman M.G.
        • Escarce J.J.
        • Gion J.E.
        • Hamilton B.B.
        • Granger C.V.
        • Williams S.V.
        A case-mix classification system for medical rehabilitation.
        Med Care. 1994; 32: 366-379
        • Ravaud J.F.
        • Delcey M.
        • Yelnik A.
        Construct validity of the functional independence measure (FIM): questioning the unidimensionality of the scale and the “value” of FIM scores.
        Scand J Rehabil Med. 1999; 31: 31-41
        • Zorowitz R.D.
        Inpatient rehabilitation facilities under the prospective payment system: lessons learned.
        Eur J Phys Rehabil Med. 2009; 45: 259-263
        • Green J.G.
        • Poulos C.
        • Broadbent A.
        Report on the development of Version 2 of the AN-SNAP Classification.
        Centre for Health Service Development, Univ Wollongong, Wollongong 2006
        • Hammersley M.
        Some notes on the terms “validity” and “reliability”.
        Br Educ Res J. 1987; 13: 73-81
        • Kliebisch U.
        • Brenner H.
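        Inter-Rater-Reliabilitaet von Instrumenten zur Beurteilung der Pflegebeduerftigkeit: Ein Review der internationalen Literatur [Interrater reliability of instruments for assessing the need for care: a review of the international literature].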
        Soz Praventivmed. 1996; 41: 303-314
        • Fricke J.
        • Unsworth C.
        • Worrell D.
        Reliability of the functional independence measure with occupational therapists.
        Aust Occup Ther J. 1993; 40: 7-15
        • Ottenbacher K.J.
        • Hsu Y.
        • Granger C.V.
        • Fiedler R.C.
        The reliability of the functional independence measure: a quantitative review.
        Arch Phys Med Rehabil. 1996; 77: 1226-1232
        • Ottenbacher K.J.
        • Mann W.C.
        • Granger C.V.
        • Tomita M.
        • Hurren D.
        • Charvat B.
        Inter-rater agreement and stability of functional assessment in the community-based elderly.
        Arch Phys Med Rehabil. 1994; 75: 1297-1301
        • Segal M.E.
        • Ditunno J.F.
        • Staas W.E.
        Interinstitutional agreement of individual Functional Independence Measure (FIM) items measured at two sites on one sample of SCI patients.
        Paraplegia. 1993; 31: 622-631
        • Robinson W.S.
        The statistical measure of agreement.
        Am Sociol Rev. 1957; 22: 17-25
        • Bartko J.J.
        On various intraclass correlation reliability coefficients.
        Psychol Bull. 1976; 83: 762-765
        • Donner A.
        • Wells G.
        A comparison of confidence interval methods for the intraclass correlation coefficient.
        Biometrics. 1986; 42: 401-412
        • Weir J.P.
        Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM.
        J Strength Cond Res. 2005; 19: 231-240
        • Sheikh K.
        Disability scales: assessment of reliability.
        Arch Phys Med Rehabil. 1986; 67: 245-249
        • Posner K.L.
        • Sampson P.D.
        • Caplan R.A.
        • Ward R.
        • Cheney F.W.
        Measuring interrater reliability among multiple raters: an example of methods for nominal data.
        Stat Med. 1990; 9: 1103-1115
        • Banerjee M.
        • Capozzoli M.
        • McSweeney L.
        • Sinha D.
        Beyond kappa: a review of interrater agreement measures.
        Can J Stat. 1999; 27: 3-23
        • Bland J.M.
        • Altman D.G.
        Statistical methods for assessing agreement between two methods of clinical measurement.
        Lancet. 1986; 1: 307-310
        • Fleiss J.L.
        • Cohen J.
        The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability.
        Educ Psychol Meas. 1973; 33: 613-619
        • Cohen J.
        Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit.
        Psychol Bull. 1968; 70: 213-220
        • Feinstein A.R.
        • Cicchetti D.V.
        High agreement but low kappa, I: the problems of two paradoxes.
        J Clin Epidemiol. 1990; 43: 543-549
        • Altman D.G.
        • Bland J.M.
        Measurement in medicine: the analysis of method comparison studies.
        Statistician. 1983; 32: 307-317
        Uniform Data System for Medical Rehabilitation FIM credentialing examination.
        State University of New York at Buffalo, Buffalo 1994
        • Dodds T.A.
        • Martin D.P.
        • Stolov W.C.
        • Deyo R.A.
        A validation of the Functional Independence Measure and its performance among rehabilitation inpatients.
        Arch Phys Med Rehabil. 1993; 74: 531-536
        • Alexander N.B.
        • Galecki A.T.
        • Nyquist L.V.
        • et al.
        Chair and bed rise performance in ADL-impaired congregate housing residents.
        J Am Geriatr Soc. 2000; 48: 526-533
        • Kohler F.
        • Dickson H.
        • Redmond H.
        • Estell J.
        • Connolly C.
        Agreement of functional independence measure item scores in patients transferred from one rehabilitation setting to another.
        Eur J Phys Rehabil Med. 2009; 45: 487-492
        • Walter S.D.
        • Eliasziw M.
        • Donner A.
        Sample size and optimal designs for reliability studies.
        Stat Med. 1998; 17: 101-110