Volume 90, Issue 1, Pages 87-94, January 2009
Reliability of Rehabilitative Ultrasound Imaging of the Transversus Abdominis and Lumbar Multifidus Muscles
Abstract
Koppenhaver SL, Hebert JJ, Fritz JM, Parent EC, Teyhen DS, Magel JS. Reliability of rehabilitative ultrasound imaging of the transversus abdominis and lumbar multifidus muscles.
Objectives
To evaluate the intraexaminer and interexaminer reliability of rehabilitative ultrasound imaging (RUSI) in obtaining thickness measurements of the transversus abdominis (TrA) and lumbar multifidus muscles at rest and during contractions.
Design
Single-group repeated-measures reliability study.
Setting
University and orthopedic physical therapy clinic.
Participants
A volunteer sample of adults (N=30) with current nonspecific low back pain (LBP) was examined by 2 clinicians with minimal RUSI experience.
Interventions
Not applicable.
Main Outcome Measures
Thickness measurements of the TrA and lumbar multifidus muscles at rest and during contractions were obtained by using RUSI during 2 sessions 1 to 3 days apart. Percent thickness change was calculated as $(\text{thickness}_{\text{contracted}} - \text{thickness}_{\text{rest}})/\text{thickness}_{\text{rest}}$. Intraclass correlation coefficients (ICC) were used to estimate reliability.
Results
By using the mean of 2 measures, intraexaminer reliability point estimates (ICC3,2) ranged from 0.96 to 0.99 for same-day comparisons and from 0.87 to 0.98 for between-day comparisons. Interexaminer reliability estimates (ICC2,2) ranged from 0.88 to 0.94 for within-day comparisons and from 0.80 to 0.92 for between-day comparisons. Reliability estimates comparing measurements by the 2 examiners of the same image (ICC2,2) ranged from 0.96 to 0.98. Reliability estimates were lower for percent thickness change measures than the corresponding single thickness measures for all conditions.
Conclusions
RUSI thickness measurements of the TrA and lumbar multifidus muscles in patients with LBP, when based on the mean of 2 measures, are highly reliable when taken by a single examiner and adequately reliable when taken by different examiners.
Key Words: Abdominal muscles, Low back pain, Rehabilitation, Reproducibility of results, Ultrasonography
List of Abbreviations: ADIM, abdominal drawing-in maneuver; ASLR, active straight leg raise; CI, confidence interval; ICC, intraclass correlation coefficient; LBP, low back pain; LOA, limits of agreement; MDC, minimal detectable change; ODI, modified Oswestry Disability Index; RUSI, rehabilitative ultrasound imaging; TrA, transversus abdominis
THE TRANSVERSUS ABDOMINIS and lumbar multifidus muscles have been proposed to play an important role in spinal stability1, 2, 3 and have been shown to have functional deficits in individuals with LBP.1, 2, 3, 4, 5, 6, 7, 8, 9 RUSI has been advocated as a noninvasive method to quantify muscle morphology and behavior and has been increasingly used both in research and as a clinical tool throughout the rehabilitative process.10, 11 RUSI has been validated as a measure of TrA and lumbar multifidus muscle morphology through comparisons with magnetic resonance imaging measurements12, 13 and as an indicator of muscle activation with indwelling electromyography.4, 14, 15, 16, 17
For RUSI to be useful as a research and rehabilitative tool, the reliability of its measurements must be determined as it is used clinically. Although several researchers have investigated the reliability of RUSI measures of the TrA16, 18, 19, 20, 21, 22, 23, 24, 25, 26 and lumbar multifidus6, 13, 15, 27, 28, 29, 30, 31 muscles, all have done so in small (n<10) and/or asymptomatic samples. Most of these studies have shown very high reliability (ICC>0.90) and good precision (TrA standard error of measurement<1.2mm and lumbar multifidus standard error of measurement<3.7mm); however, estimates obtained in asymptomatic samples cannot be generalized to individuals with LBP, and estimates obtained with small samples are often associated with wide confidence intervals. Furthermore, most researchers have investigated reliability in limited conditions, most commonly only during resting states repeated during a single testing session. Because RUSI is used primarily in symptomatic patients, of muscles during both resting and contracted states, and across different days, the reliability of such measures still needs to be established.
The primary purpose of this study was to evaluate the intraexaminer and interexaminer reliability in obtaining RUSI thickness measurements of the TrA and lumbar multifidus muscles at rest and during contractions both during a single session (within day) and between 2 sessions (between day) in patients with LBP. We hypothesized that RUSI measurements are adequately reliable (ICC>0.75) for research and clinical use in patients with LBP.
Methods
Participants
Thirty volunteers aged 18 to 60 with current nonspecific LBP were recruited for this study by either responding to fliers posted around the University of Utah campus or by referral from a local orthopedic physical therapy clinic. LBP was defined as current symptoms of pain and/or numbness between the twelfth rib and buttocks with or without symptoms into 1 or both legs that limits function. Participants were excluded for prior lumbar surgery; the inability to lie both prone and supine for a minimum of 20 minutes each; or the presence of medical red flags of potentially serious conditions including cauda equina syndrome, major or rapidly progressing neurologic deficit, fracture, cancer, infection, or systemic disease. Participants signed consent forms approved by the institutional review boards of recruiting institutions.
Examiners
One physical therapist (S.K.) and 1 chiropractor (J.H.) participated as examiners for the reliability analysis. Although both examiners had been practicing clinically for more than 8 years, neither had previously used RUSI in their clinical practice. Before testing, both examiners underwent 16 hours of hands-on training with a coinvestigator (D.T.) experienced with the specific RUSI protocol used in the study. Additionally, the physical therapist completed 70 hours of didactic training including a course and certification by the Burwin Institute in musculoskeletal ultrasound.
Procedures
This single-group repeated-measures design involved a baseline measurement session and a follow-up session 1 to 3 days later. After providing consent, participants completed self-report measures including demographic/historic information and questionnaires on pain and disability. An 11-point numeric rating scale, ranging from 0 to 10, was used to estimate the mean of current pain intensity and the best and worst pain intensity in the past 24 hours.32, 33, 34 The ODI questionnaire was used to quantify self-reported disability. ODI scores range from 0 to 100, with higher scores representing more disability.35, 36 During the physical examination, the examiner determined the participant's symptomatic side, which was then used for all subsequent RUSI images. If pain was evenly distributed, the side of measurement was determined randomly. After the initial session, participants were asked to avoid any exercises or treatments for LBP between sessions.
Images of the TrA and lumbar multifidus muscles were acquired in B-mode with a Sonosite Titan ultrasound machine and a 60-mm 2- to 5-MHz curvilinear array. Image acquisition for each condition was performed 3 times by each of the 2 examiners. To maximize time efficiency, 1 examiner positioned the transducer and optimized the quality of the image (imaging examiner), whereas the other examiner captured and saved the image. To help avoid an order effect associated with potential learning or fatigue, the order in which each examiner obtained the images and the order in which the muscles (TrA and lumbar multifidus) were imaged were counterbalanced. A total of 108 images were taken of each participant (72 during session 1 and 36 during session 2) to be able to calculate a mean from 2 or 3 measures and to calculate all within-days and between-days intraexaminer and interexaminer comparisons for all muscle conditions.
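The image count above can be checked with a short sketch. The six conditions are those named in the Data Analysis section. The split into two acquisition rounds in session 1 and one round in session 2 is an assumption inferred from 72 = 2 × 36 and is not stated explicitly in the text. All names are hypothetical.

```python
# Conditions as listed in the Data Analysis section (4 TrA, 2 lumbar multifidus).
CONDITIONS = [
    "supine rest", "ASLR", "hook-lying rest", "ADIM",
    "prone rest", "contralateral arm lift",
]
REPETITIONS = 3  # each condition imaged 3 times
EXAMINERS = 2    # by each of the 2 examiners


def images_per_round() -> int:
    """Images from one complete pass by both examiners over all conditions."""
    return len(CONDITIONS) * REPETITIONS * EXAMINERS


def total_images(rounds_session_1: int = 2, rounds_session_2: int = 1) -> int:
    """Total images per participant.

    ASSUMPTION: session 1 held two complete rounds and session 2 held one;
    this is inferred from the reported 72 and 36 images, not stated outright.
    """
    return images_per_round() * (rounds_session_1 + rounds_session_2)


if __name__ == "__main__":
    print(images_per_round(), total_images())
```

Under these assumptions, one round yields 36 images and the three rounds together yield the 108 images reported.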
Transversus abdominis
Images of the TrA muscle were acquired during the ASLR maneuver37, 38 and during the ADIM.25, 26 Ultrasound images of the TrA muscle were obtained with the transducer positioned just superior to the iliac crest along the midaxillary line and followed the techniques outlined by Teyhen et al26 in which the middle of the muscle belly was centered within the field of view. All images were collected at the end of normal exhalation to control for the influence of respiration.
The ASLR maneuver37, 38 was used in this study to assess automatic changes in the TrA muscle thickness without the subject being asked to volitionally activate the muscle. Participants were positioned supine with hips and knees extended at rest and were instructed to “raise your leg off of the table approximately 8 inches (20cm) without bending your knee.” All participants were given a single practice of the ASLR maneuver before image acquisition.
The ADIM is a fundamental motor control exercise used to train the TrA muscle and has been found to preferentially contract the TrA muscle relative to the more superficial lateral abdominal muscles.25, 26 The ADIM was used in this study to assess changes in muscle thickness associated with a volitional activation of the TrA muscle. The resting position involved the participants lying supine in a hook-lying position.25, 26 To perform the ADIM, participants were instructed to “take a relaxed breath in and out, hold the breath out, and then draw-in your lower abdomen without moving your spine.” Alternate cues of “cut off the flow of urine” or “close your rear passage” were sometimes given in an attempt to maximize a preferential TrA contraction. The cue resulting in the largest preferential TrA contraction was practiced approximately 5 times until a ceiling effect occurred in performance of the ADIM as visualized by changes in muscle thickness on the ultrasound image.
Lumbar multifidus
Images of the lumbar multifidus muscle at rest and during a submaximal contraction were obtained following techniques outlined by Kiesel et al.15 To assess automatic changes in the lumbar multifidus muscle thickness during a task, a contralateral arm lift maneuver was performed prone with the elbows flexed 90°, shoulders abducted 120°, and holding a hand weight based on the participant's body mass.15 Participants were instructed to “lift your arm approximately 2 inches (5cm) off the table” and were given 1 practice contralateral arm lift trial before image acquisition.
Measurements
All images were measured offline by using Image J software (V1.38t) on a different date from the one on which the images were obtained. TrA thickness measurements were made between the superficial and deep borders of the muscle, as visualized by the hyperechoic fascial lines (fig 1). Lumbar multifidus thickness measurements were made between the posterior-most portion of the L4/5 zygapophyseal joint and the plane between the muscle and subcutaneous tissue (fig 2). Each examiner measured all of the images they generated, allowing for the analysis of intraexaminer and interexaminer reliability. The physical therapist also measured all of the images obtained by the chiropractor to assess the reliability of 2 examiners measuring the same image. By using Image J's automatic measurement function (control M) and concealing the measurement output on the computer screen, examiners were blinded during measurement to the thickness values. Additionally, examiners were blinded to each other's measurements and to their own previous measurements.

Fig 1.
Ultrasound images of the TrA, internal oblique (IO), and external oblique (EO) muscles (A) during rest and (B) during an ADIM. Thickness measurements were made between the superficial and deep borders of the TrA muscle.

Fig 2.
Ultrasound images of the lumbar multifidus (LM) muscle (A) during rest and (B) during a contralateral arm raise. Thickness measurements were made between the posterior-most portion of the L4/5 facet joint and the plane between the muscle and subcutaneous tissue.
Data Analysis
Data management and statistical analyses were performed by using the Statistical Package for the Social Sciences version 16.0 software. TrA data from 30 participants across 2 days and 4 different measurement conditions (supine rest, ASLR, hook-lying rest, and ADIM) and lumbar multifidus data from 29 participants across 2 days and 2 different measurement conditions (prone rest and contralateral arm lift) were included for analysis. The dependent measures for the TrA and lumbar multifidus muscles were resting thickness, contracted thickness, and percent thickness change. Percent thickness change was calculated for the TrA and lumbar multifidus muscles by using the following equation: $(\text{thickness}_{\text{contracted}} - \text{thickness}_{\text{rest}})/\text{thickness}_{\text{rest}}$.
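As a minimal sketch of the percent thickness change calculation, the function below applies the equation to one resting and one contracted thickness. The function name is hypothetical, and multiplying by 100 is an assumption made so that the output is on the same percent scale as the values reported in the tables.

```python
def percent_thickness_change(rest_mm: float, contracted_mm: float) -> float:
    """Percent thickness change relative to resting thickness.

    Implements (contracted - rest) / rest. The factor of 100 is an
    ASSUMPTION so the result reads as a percentage.
    """
    if rest_mm <= 0:
        raise ValueError("resting thickness must be positive")
    return (contracted_mm - rest_mm) / rest_mm * 100.0


if __name__ == "__main__":
    # Illustrative values only (not study data): 3.0 mm at rest, 4.5 mm contracted.
    print(percent_thickness_change(3.0, 4.5))
```

Because the resting thickness appears in both the numerator and the denominator, an error in the resting measurement affects the result twice, which is relevant to the lower reliability of this measure discussed later.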
ICCs with 95% CIs were calculated to assess intraexaminer (model 3,k) and interexaminer (model 2,k) reliability both within and between days.39 As recommended by Bland and Altman,40 biases with 95% CIs were estimated by calculating the mean difference between measures, and LOAs were calculated as the mean difference ± 2 × SD. To assess measurement precision, the standard error of measurement (SEM) was calculated as $\text{SD} \times \sqrt{1 - \text{ICC}}$.41, 42 MDCs were calculated as $1.96 \times \text{SEM} \times \sqrt{2}$ and represent the minimal change in thickness that must occur to be 95% confident that a true change occurred.43, 44 To investigate the effect of using the mean of multiple thickness measurements on reliability and measurement precision, ICCs and standard errors of measurement using the mean of the first 2 and 3 measures were compared with those using single measures.
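The statistics described above can be sketched in plain Python. This is not the authors' code (the study used SPSS). The ICC functions use the standard two-way ANOVA mean-square forms for the averaged-measures models (3,k) and (2,k), and all function names are hypothetical. Confidence intervals are omitted.

```python
import math
from statistics import mean, stdev


def mean_squares(ratings):
    """Two-way ANOVA mean squares for a subjects-by-raters table.

    ratings: list of rows, one per subject, each holding k ratings.
    Returns (between-subjects MS, between-raters MS, residual MS, n subjects).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    return ss_rows / (n - 1), ss_cols / (k - 1), ss_err / ((n - 1) * (k - 1)), n


def icc_3k(ratings):
    """ICC model (3,k): raters fixed, mean of k ratings."""
    bms, _, ems, _ = mean_squares(ratings)
    return (bms - ems) / bms


def icc_2k(ratings):
    """ICC model (2,k): raters random, mean of k ratings."""
    bms, jms, ems, n = mean_squares(ratings)
    return (bms - ems) / (bms + (jms - ems) / n)


def sem(sd, icc):
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem_value):
    """Minimal detectable change at 95% confidence: 1.96 * SEM * sqrt(2)."""
    return 1.96 * sem_value * math.sqrt(2.0)


def bland_altman(a, b):
    """Bias (mean difference) and limits of agreement (bias +/- 2 SD)."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 2.0 * stdev(diffs)
    return bias, bias - spread, bias + spread
```

Model (3,k) ignores systematic differences between raters, whereas model (2,k) counts them against reliability, which is one reason the latter suits comparisons between different examiners.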
Results
Demographic and baseline characteristics of the patient sample are provided in table 1. Images from 1 participant for the lumbar multifidus muscle were excluded because examiners were unable to identify muscle boundaries. Although specific pain level was not solicited during imaging, all participants satisfactorily completed all TrA and lumbar multifidus muscle-contraction tasks without verbal complaints of pain.
Table 1. Demographic and Baseline Characteristics of Participants (N=30)
| Characteristic | |
|---|---|
| Age (y) | 42.4±11.4 |
| Sex | 43% women |
| BMI (kg/m²) | 26.6±4.8 |
| Oswestry Disability Score (%) | 20.4±14.1 |
| Numeric pain rating scale⁎ | 2.9±2.0 |
| 77 | |
| 10 | |
| 13 | |
| Duration of symptoms (d) | 75 (17, 847)† |
| Prior history of LBP (%) | 80 |
⁎Reports the average of the worst, best, and current scores for pain over the last 24 hours.
†Median (interquartile range).
The standard error of measurement was calculated to determine whether a single measure or a mean of 2 or 3 measures resulted in the greatest precision (table 2). Overall, using the mean of 2 measurements decreased the standard error of measurement by a mean of 32.4% across conditions, whereas using the mean of 3 measurements decreased it by a mean of 36.4%. Weighing the 4.0% (95% CI, 2.9%–5.2%) mean improvement in precision against the additional time required to acquire and measure an additional set of images, we analyzed the remaining reliability data by using the mean of the first 2 measurements of each condition. Reliability coefficients with corresponding 95% CIs, standard errors of measurement, MDCs, bias, and LOAs are presented in table 3 for intraexaminer estimates and table 4 for interexaminer estimates. Means and SDs are also presented in tables 3 and 4 and represent pooled values from all measures in the corresponding condition.
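For context, the gain expected from averaging can be sketched under a simplifying assumption that the study does not state: if the errors of repeated measures are independent with equal variance, the error of a mean of k measures shrinks by a factor of 1/√k. The function name is hypothetical.

```python
import math


def expected_sem_reduction(k: int) -> float:
    """Theoretical percent reduction in SEM from averaging k measures.

    ASSUMPTION: independent, equal-variance random errors, so the error
    of the mean scales as 1 / sqrt(k).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return (1.0 - 1.0 / math.sqrt(k)) * 100.0


if __name__ == "__main__":
    for k in (2, 3):
        print(k, round(expected_sem_reduction(k), 1))
```

Under this assumption the expected reductions are about 29.3% for 2 measures and 42.3% for 3, compared with the observed 32.4% and 36.4%. Both show the same pattern of a diminishing return from the third measure.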
Table 2. Difference in Standard Error of Measurement⁎ Using the Mean of 2 and 3 Measures Compared to a Single Measure
| Muscle/State | Intraexaminer: Single Measure | Intraexaminer: Mean of 2 Measures | Intraexaminer: Mean of 3 Measures | Interexaminer: Single Measure | Interexaminer: Mean of 2 Measures | Interexaminer: Mean of 3 Measures |
|---|---|---|---|---|---|---|
| Within day | | | | | | |
| TrA, supine rest | 0.2 | 0.1 | 0.1 | 0.4 | 0.3 | 0.3 |
| TrA, ASLR | 0.6 | 0.3 | 0.2 | 0.6 | 0.4 | 0.4 |
| TrA, hook-lying rest | 0.4 | 0.2 | 0.2 | 0.4 | 0.2 | 0.2 |
| TrA, ADIM | 0.4 | 0.3 | 0.2 | 0.6 | 0.5 | 0.5 |
| Lumbar multifidus, prone rest | 1.5 | 1.0 | 1.0 | 2.9 | 2.1 | 2.1 |
| Lumbar multifidus, contralateral arm lift | 0.9 | 0.6 | 0.5 | 2.5 | 1.7 | 1.5 |
| Between days | | | | | | |
| TrA, supine rest | 0.3 | 0.2 | 0.2 | 0.4 | 0.3 | 0.2 |
| TrA, ASLR | 0.6 | 0.4 | 0.3 | 0.8 | 0.6 | 0.5 |
| TrA, hook-lying rest | 0.3 | 0.2 | 0.2 | 0.4 | 0.3 | 0.2 |
| TrA, ADIM | 0.7 | 0.5 | 0.5 | 0.6 | 0.4 | 0.4 |
| Lumbar multifidus, prone rest | 1.3 | 0.9 | 0.9 | 2.9 | 2.1 | 2.1 |
| Lumbar multifidus, contralateral arm lift | 1.8 | 1.1 | 1.1 | 2.7 | 1.8 | 1.8 |

⁎Values in millimeters except % change.
Table 3. Intraexaminer Reliability of Examiner 1 Using a Mean of 2 Measures for Each Rating
| Muscle/State | Mean ± SD (mm⁎)† | ICC3,2 | SEM (mm⁎) | MDC (mm⁎) | Bias (mm⁎)‡ |
|---|---|---|---|---|---|
| Within day | | | | | |
| TrA, supine rest | 3.2±0.8 | 0.98 | 0.1 | 0.4 | 0.1 |
| TrA, ASLR | 3.7±1.4 | 0.96 | 0.3 | 0.8 | 0.0 |
| TrA, % change (ASLR) | 15.7±32.8 | 0.92 | 9.2 | 25.4 | −2.4 |
| TrA, hook-lying rest | 3.3±1.0 | 0.96 | 0.2 | 0.5 | 0.0 |
| TrA, ADIM | 5.8±1.5 | 0.97 | 0.3 | 0.7 | 0.1 |
| TrA, % change (ADIM) | 80.8±39.0 | 0.94 | 9.8 | 27.1 | 5.0 |
| Lumbar multifidus, prone rest | 34.6±6.2 | 0.97 | 1.0 | 2.8 | −0.5 |
| Lumbar multifidus, contralateral arm lift | 37.9±6.5 | 0.99 | 0.6 | 1.6 | 0.0 |
| Lumbar multifidus, % change | 9.8±8.3 | 0.78 | 4.0 | 11.0 | 1.6 |
| Between days | | | | | |
| TrA, supine rest | 3.1±0.8 | 0.94 | 0.2 | 0.6 | 0.1 |
| TrA, ASLR | 3.7±1.5 | 0.93 | 0.4 | 1.1 | −0.1 |
| TrA, % change (ASLR) | 18.3±36.7 | 0.89 | 12.3 | 34.1 | −7.7 |
| TrA, hook-lying rest | 3.2±0.9 | 0.93 | 0.2 | 0.7 | 0.2 |
| TrA, ADIM | 5.7±1.4 | 0.87 | 0.5 | 1.3 | 0.2 |
| TrA, % change (ADIM) | 83.9±37.1 | 0.73 | 19.2 | 53.3 | −1.2 |
| Lumbar multifidus, prone rest | 34.4±6.2 | 0.98 | 0.9 | 2.5 | −0.1 |
| Lumbar multifidus, contralateral arm lift | 38.2±6.6 | 0.97 | 1.1 | 3.1 | −0.6 |
| Lumbar multifidus, % change | 11.2±8.7 | 0.79 | 4.0 | 11.0 | −1.1 |

⁎Values in millimeters except % change.
†Pooled from all measures in condition.
‡Mean difference; the 95% CIs and 95% LOAs (mean difference ± 2 SDs) are not shown.
Table 4. Interexaminer Reliability Using a Mean of 2 Measures for Each Rating
| Muscle/State | Mean ± SD (mm⁎)† | ICC2,2 | SEM (mm⁎) | MDC (mm⁎) | Bias (mm⁎)‡ |
|---|---|---|---|---|---|
| Within day | | | | | |
| TrA, supine rest | 3.1±0.9 | 0.89 | 0.3 | 0.8 | −0.2 |
| TrA, ASLR | 3.5±1.3 | 0.91 | 0.4 | 1.1 | −0.3 |
| TrA, % change (ASLR) | 13.1±29.0 | 0.91 | 8.7 | 24.2 | −2.8 |
| TrA, hook-lying rest | 3.1±1.0 | 0.94 | 0.2 | 0.7 | −0.3 |
| TrA, ADIM | 5.6±1.5 | 0.89 | 0.5 | 1.4 | −0.3 |
| TrA, % change (ADIM) | 85.2±36.3 | 0.73 | 19.0 | 52.7 | 3.8 |
| Lumbar multifidus, prone rest | 33.2±6.0 | 0.88 | 2.1 | 5.8 | −2.3 |
| Lumbar multifidus, contralateral arm lift | 37.5±6.4 | 0.93 | 1.7 | 4.7 | −0.8 |
| Lumbar multifidus, % change | 13.4±11.0 | 0.45 | 8.1 | 22.6 | 5.5 |
| Between days | | | | | |
| TrA, supine rest | 3.1±0.9 | 0.91 | 0.3 | 0.7 | −0.1 |
| TrA, ASLR | 3.5±1.4 | 0.80 | 0.6 | 1.7 | −0.4 |
| TrA, % change (ASLR) | 16.9±32.9 | 0.78 | 15.6 | 43.2 | −10.5 |
| TrA, hook-lying rest | 3.1±0.9 | 0.92 | 0.3 | 0.7 | −0.1 |
| TrA, ADIM | 5.5±1.3 | 0.90 | 0.4 | 1.2 | −0.1 |
| TrA, % change (ADIM) | 85.8±34.0 | 0.55 | 22.8 | 63.3 | 2.6 |
| Lumbar multifidus, prone rest | 33.3±6.0 | 0.88 | 2.1 | 5.8 | −2.4 |
| Lumbar multifidus, contralateral arm lift | 37.7±6.6 | 0.92 | 1.8 | 5.1 | −1.4 |
| Lumbar multifidus, % change | 13.9±10.7 | 0.73 | 5.5 | 15.3 | 4.4 |

⁎Values in millimeters except % change.
†Pooled from all measures in condition.
‡Mean difference; the 95% CIs and 95% LOAs (mean difference ± 2 SDs) are not shown.
§Statistically significant bias (different from zero).
Depending on the muscle (TrA vs lumbar multifidus) and muscle condition (rest vs contraction), intraexaminer reliability point estimates (ICC3,2) of thickness measurements ranged from 0.96 to 0.99 for same-day comparisons and from 0.87 to 0.98 for between-day comparisons (see table 3). Estimates from the 2 examiners were not statistically different from one another (ie, 95% CIs overlapped); therefore, intraexaminer data are presented only for examiner 1 (J.H.). Depending on the muscle and muscle condition, interexaminer reliability estimates (ICC2,2) of thickness measurements ranged from 0.88 to 0.94 for same-day comparisons and from 0.80 to 0.92 for between-day comparisons (see table 4). Reliability estimates comparing thickness measurements by the 2 examiners of the same image (ICC3,2) ranged from 0.96 to 0.98. Reliability estimates were lower for percent thickness change measures than the corresponding single thickness measures for both muscles in all conditions (see tables 3 and 4).
Bias estimates were small and statistically not significantly different from 0 in all intraexaminer comparisons (see table 3). However, statistically significant interexaminer bias was found in approximately 50% of comparisons with estimates ranging between 0.3 and 0.4 mm for the TrA measurements and 1.4 and 2.4mm for the lumbar multifidus measurements (see table 4). Statistically significant bias was also found in 5 of 6 comparisons performed between examiners measuring the same image. Estimates ranged from 0.2mm (TrA) to 0.7mm (lumbar multifidus) with examiner 1 (chiropractor) consistently measuring a larger value than examiner 2 (physical therapist).
Discussion
This study evaluated the intraexaminer and interexaminer reliability in obtaining RUSI thickness measurements of the TrA and lumbar multifidus muscles at rest and during submaximal contractions both during a single session and between days in patients with LBP. Intraexaminer comparisons of thickness measures generally showed excellent reliability with only the ICC point estimate of between-day ADIM reliability below 0.90. Although generally lower than intraexaminer estimates, all interexaminer ICC estimates remained above 0.80, indicating good interexaminer reliability. These findings are consistent with previous studies that investigated both symptomatic20, 26, 45 and asymptomatic15, 16, 18, 19, 21, 22, 24, 25, 29, 31, 46 individuals and support our primary hypothesis that RUSI measurements are adequately reliable for research and clinical use in patients with LBP.
The comparison that resulted in the lowest intraexaminer reliability estimate for single thicknesses was that of between-day ADIM measures (ICC=0.87). The ADIM requires examiners to teach participants to volitionally contract the TrA to a specific degree that results in maximal thickness change of the TrA with minimal to no thickening of the more superficial abdominal muscles.26 Factors such as the instructions from the examiner, participant motivation, and participant's skill at motor control may all affect interrepetition performance during an ADIM and could explain the decreased reliability of these measures. In contrast to intraexaminer comparisons, it was the between-day ASLR measures that showed the poorest interexaminer reliability (ICC=0.80). The ASLR was included in this study in an attempt to avoid the additional performance variability that comes with volitional muscle contractions. It was the experimenters' observation in this study that TrA thickness change during the ASLR was highly variable in many participants between repetitions. Although not instructed to do so, it is possible that some participants purposefully altered their abdominal contraction during the ASLR. It is also likely that very small variations between repetitions (eg, 0.2–0.5mm) had moderate adverse effects on reliability because mean TrA thickness only increased approximately 0.5mm during the ASLR.
Percent thickness change measures may be more useful clinically than single thickness measures but incorporate the measurement error from both resting and contracted measurements: $(\text{thickness}_{\text{contracted}} - \text{thickness}_{\text{rest}})/\text{thickness}_{\text{rest}}$. Therefore, it is not surprising that estimates of the reliability of percent thickness change were consistently lower than those for single thickness measurements; this is likely attributable to the fact that change scores are based on 2 imperfect measurements (rather than 1). To our knowledge, only 1 other study46 investigated the reliability of percent thickness change using RUSI. Although that study found high reliability in patients with LBP for both single thickness measures and percent thickness change of the TrA and lumbar multifidus muscles, it investigated only intraexaminer reliability during a single session. A potential problem with the lower reliability of percent thickness change measures is that they result in relatively large standard errors of measurement and MDCs. For example, when using between-day intraexaminer MDCs, if a patient with LBP initially showed a lumbar multifidus thickness change of 10%, after rehabilitation they would have to increase it to at least 21% for the examiner to be 95% confident that a true change occurred. An even larger change would be necessary in the TrA during an ADIM. A patient initially showing an 80% thickness change would have to increase to at least 133% for the examiner to be 95% confident that a true change occurred. In some cases, these minimal detectable postrehabilitation values are larger than the percent thickness changes found in asymptomatic individuals46 and may not be realistically attainable.
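The two worked examples in this paragraph follow from adding the between-day intraexaminer MDC in table 3 to the baseline percent thickness change. A minimal sketch, with a hypothetical function name:

```python
def minimal_detectable_value(baseline_pct: float, mdc_pct: float) -> float:
    """Smallest follow-up value that exceeds baseline by one MDC."""
    return baseline_pct + mdc_pct


if __name__ == "__main__":
    # Between-day intraexaminer MDCs for percent thickness change (table 3).
    lumbar_multifidus = minimal_detectable_value(10.0, 11.0)  # 10% baseline
    tra_adim = minimal_detectable_value(80.0, 53.3)           # 80% baseline
    print(round(lumbar_multifidus), round(tra_adim))
```

Rounded to whole percentages, these give the 21% and 133% thresholds quoted in the text.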
To better identify the sources of variability during RUSI, the reliability of 2 examiners measuring the same image was calculated. Reliability was excellent, with all resting and contracted thickness measure point estimates above 0.96. This finding is consistent with previous work26 and suggests that the great majority of interexaminer measurement “error” is introduced during image acquisition as opposed to during measurement of muscle thickness on a previously obtained image. However, a statistically significant bias was found in all except 1 intraimage comparison with examiner 1 (chiropractor) consistently measuring a larger value than examiner 2 (physical therapist). During image measurement, standardization for the lateral cursor placement consisted of examiners agreeing to measure the horizontal “visual center of the muscle” for the TrA (see fig 1) and at the most posterior portion of the L4/5 facet for the lumbar multifidus (see fig 2). A systematic difference in how each examiner interpreted the “visual center of the muscle” or in the choice of landmark used to represent the muscle-fascial boundary or facet joint may have existed. Regardless of where the specific bias occurred, the actual differences between examiners were very small and did not result in poor reliability.
Throughout this discussion, we have mostly interpreted reliability estimates as suggested by Portney and Watkins41 who advocate that coefficients below 0.50 represent poor reliability, those between 0.50 and 0.75 represent moderate reliability, and coefficients above 0.75 represent good reliability. Other authors propose differing cutoff standards,47, 48, 49 some with different minimal reliability criteria for group comparisons (0.70) and individual comparisons (0.90–0.95).48 In fact, there seems to be a growing consensus that any such standards in interpreting reliability coefficients should also consider both the precision of the measured variable and how the measures will be ultimately used.41, 42 RUSI measurements are most likely used clinically to make patient-management decisions regarding lumbar stabilization exercise. Because the cost of an “incorrect” decision in the clinic would likely be relatively benign (eg, having a patient without TrA deficits perform abdominal motor control exercises), a lower level of reliability of RUSI measures may be acceptable. Using RUSI as an outcome measure during research may similarly allow a lower level of reliability because measures are usually averaged across multiple individuals, thereby decreasing measurement error.
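The Portney and Watkins cutoffs described above can be written as a small lookup. The function name is hypothetical, and the treatment of a coefficient of exactly 0.75 as "moderate" is an assumption, because the text does not say which category the boundary value belongs to.

```python
def classify_icc(icc: float) -> str:
    """Label a reliability coefficient using the cutoffs quoted in the text.

    ASSUMPTION: exactly 0.50 and exactly 0.75 are both treated as
    "moderate"; the source does not assign the boundary values.
    """
    if icc < 0.50:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    return "good"


if __name__ == "__main__":
    # Point estimates taken from tables 3 and 4.
    for value in (0.45, 0.73, 0.87, 0.98):
        print(value, classify_icc(value))
```

As the paragraph notes, such labels are only a starting point: the precision of the measure and its intended use should also inform whether a given coefficient is acceptable.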
Study Limitations
Several limitations exist within this study. Both examiners were clinicians with minimal RUSI experience other than 16 hours of training on the specific ultrasound machine and imaging protocol. Because some evidence suggests that the reliability of RUSI measurement may differ depending on user experience,50 it is unknown whether practitioners with either more or less experience will show a different level of reliability than did the examiners in this study. Additionally, pain was assessed during the initial evaluation only. Although all participants were able to complete the muscle contraction tasks without verbal complaints, the level of pain during contractions was not solicited and may have adversely affected reliability. Moreover, abdominal muscle thickness has been found to vary depending on the exact location of measurement, with more superior portions of the muscle being thicker than inferior locations.24 Although the current study used a standardized transducer placement protocol, specific transducer placement was not marked between image acquisitions and likely varied to some small degree. Finally, some ICC point estimates were associated with wide 95% CIs in which the upper-bound and lower-bound estimates represent very different degrees of reliability. Although the current study is the largest study to date to investigate RUSI reliability in patients with LBP, our results should not be considered definitive. Further studies should continue to investigate the reliability of RUSI measures, especially of percent thickness change in symptomatic samples. Future studies should additionally attempt to better identify the sources of error involved with RUSI image acquisition and the measurement of muscle thickness. Lastly, attempts should be made to identify more reliable contraction strategies for the TrA and methods to reduce error during such measurements.
Conclusions
RUSI thickness measurements of the TrA and lumbar multifidus muscles in patients with LBP, when based on the mean of 2 measures, are highly reliable when taken by a single examiner and adequately reliable when taken by different examiners. Using the mean of 2 measures substantially increased the reliability and precision of all measurements and is recommended. Percent thickness change measures may be adequately reliable because clinical use of RUSI usually involves benign patient-management decisions regarding lumbar stabilization exercise, and measures are typically averaged across multiple individuals in research.
Acknowledgments
We would like to thank Aaron Swalberg and Steven Moffit of Intermountain Health Care, Salt Lake City, UT, for their help with participant recruitment.
Supported in part by Sonosite Inc, Bothell, WA, by providing the ultrasound machine used in this study at no charge to the Division of Physical Therapy, University of Utah.
No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit on the authors or on any organization with which the authors are associated.
Reprints are not available from the author.
PII: S0003-9993(08)01497-4
doi:10.1016/j.apmr.2008.06.022
© 2009 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
