Assets and Liabilities of the Burn Model System Data Model: A Comparison With the National Burn Registry

Abstract

Lezotte DC, Hills RA, Heltshe SL, Holavanahalli RK, Fauerbach JA, Blakeney P, Klein MB, Engrav LH. Assets and liabilities of the Burn Model System data model: a comparison with the National Burn Registry.

Objectives: To determine whether the Burn Model System (BMS) population is representative of the larger burn population and to investigate threats to internal and external validity in a multicenter longitudinal database of severe burns.

Design: Cohort data for the BMS project have been collected since 1994. Follow-up data have been collected at 6, 12, and 24 months postburn. The demographic and burn characteristics of the BMS population were compared with those of patients in the National Burn Registry (NBR).

Setting: The BMS, which collected data for these analyses from 5 regional burn centers in the United States, and the NBR dataset, which is a registry of information collected through the Trauma Registry of the American College of Surgeons and includes data from 70 hospitals in the United States and Canada.

Participants: BMS study participants were severely burned patients treated at 1 of the 5 participating burn centers. We compared the BMS population with that of the NBR both in total and filtered to include only patients with comparable injuries.

Interventions: Not applicable.

Main Outcome Measures: Comparable demographic and burn characteristics contained in both the NBR and the 5-center BMS longitudinal database and baseline and follow-up distributions of demographic variables and burn characteristics in the BMS database.

Results: Although minor deviations in demographic distributions were found between the BMS and NBR and between discharge and follow-up populations, our results show that the BMS population sample is internally and externally valid and is adequate for answering research questions.
Conclusions: Cohort studies examining long-term outcomes have the potential flaw of using a nonrepresentative study population. The BMS population was found to be sufficiently representative, but future analyses will require cautious and purposeful application of statistical adjustment strategies.

Research on burn injuries and other disabilities has been limited by the lack of longitudinal data on large samples of patients. Fortunately, the number of large databases is growing. Analyzing data from large samples requires that certain steps be taken to ensure that analyses lead to valid conclusions. An example of a large patient data set that is amenable to such statistical scrutiny is that of the Burn Model Systems (BMS), established by the National Institute on Disability and Rehabilitation Research (NIDRR). NIDRR began funding the BMS Data Coordination Center (BMS/DCC) in 1994. At that time, the BMS/DCC developed comprehensive guidelines to address the operational processes and timing for collecting and transferring data between each BMS clinical site and the DCC. The DCC processes, corrects, and combines study data; it then publishes the combined database for the BMS investigators on its Web page (http://bms-dcc.uchsc.edu) for sharing and scientific analyses.

NIDRR’s model system programs were originally developed to show the value of a comprehensive integrated continuum of care for people with spinal cord injury (SCI), traumatic brain injury (TBI), or burn injury.1 The 3 model system programs include 28 centers (with only 4 burn centers at present) that directly conduct or indirectly sponsor research activities designed to improve interventions that optimize levels of community participation, employment, and quality of life for people with SCI, TBI, and burns. The utility of NIDRR-funded SCI, TBI, and burn research largely depends on how well practitioners can relate relevant findings to the needs of their particular patients.
Practitioners can further benefit from practical information about the scientific utility of data from large samples; if the underlying science behind epidemiologic studies is not sound, they must be informed.

The burn data center’s goal is to ensure high-quality research. This starts with defining and implementing useful and rigorous data models for eventual statistical analyses. Although we seek the highest quality informatics solutions for database design for these projects, the practice of research may constrain the use of ideal database design. That is, clinical research goals and local clinical constraints often determine how studies, especially longitudinal outcomes studies, are conducted and how data are collected and processed. Consequently, outcomes studies that collect cohort data are more common than the more rigorous randomized controlled trial (RCT) study design. Although such loss of rigor is unfortunate, cohort studies are not without merit. With statistical adjustment to account for confounding variables and effect modification, they can provide valuable information.

Frequently in rehabilitation outcomes research, quasi-experimental designs are invoked, often because the criterion standard RCT is not feasible, ethical, or cost effective. Examples of the most widely used quasi-experimental designs include prospective cohort, retrospective case-control, nested case-control, and pre- and postintervention study designs.2, 3 Such nonrandom study designs are widely used in population-based epidemiologic and outcomes studies.

The data management strategy adopted by the model systems projects is to produce useful, accurate, and comprehensive data regarding burn care and rehabilitation. Our role in collecting and disseminating data is to identify potential problems, if any, in BMS study methodologies, data representation, and study generalizability so that researchers can appropriately process the information we provide.
For a summary of the BMS data management strategies and accomplishments, refer to the recent article by Klein et al,4 which describes our processes and provides summary statistics of major critical variables.

Most studies contributing data to the BMS use nonrandomized study designs, which raises questions about the usefulness of our data. Use of our data for scientific publications and for promoting appropriate burn care policies requires our data to be scientifically sound and representative. One purpose of this study was to determine whether the BMS population is representative of the larger burn population. To quantify the level of generalizability of the BMS study population, we compared the demographic characteristics of the BMS population with those of the population included in the National Burn Registry (NBR), a well-established national, voluntary registry started by the American Burn Association (ABA). Levels of variation that exist between distributions of BMS data and the ABA national data repository (the NBR) are described. Also, we describe differences within BMS demographic characteristics that arise because of attrition in serial assessments in our longitudinal research projects.

A secondary goal of this article was to identify potential shortcomings of quasi-experimental designs, both in general and specifically in BMS studies, and to suggest compensatory methods of adjusting for possible study design weaknesses. We identify precautions that can be taken in either the data collection or the data analysis phase to produce sound results. We provide comments and general strategies for analyzing these data; most of these are multivariate adjustment techniques for cohort data.

Methods

Data Sources and Study Populations

The NIDRR BMS study is a long-term prospective longitudinal study of the rehabilitation of severely burned patients. Participants in the BMS project were severely burned patients from 5 regional burn centers.
The project aims to identify and develop interventions that lead to better short- and long-term outcomes, especially the reintegration of injured people into their communities. Five sites have collected data for the project since it began in 1994. Subjects are followed up for 2 years postburn; after discharge, data are collected at 3 time points: 6 months, 1 year, and 2 years postburn. Criteria for severe burns include burns that meet any of the following conditions: (1) deep second- and third-degree burns greater than 10% of total body surface area (TBSA) in patients under 10 or over 50 years old; (2) deep second- and third-degree burns greater than 20% of TBSA in other age groups; (3) deep second- and third-degree burns with serious threat of functional or cosmetic impairment that involve the face, hands, feet, genitalia, perineum, or major joints; (4) third-degree burns greater than 5% of TBSA in any age group; (5) deep electrical burns including lightning injury; (6) inhalation injury with burn injury; or (7) circumferential burns of the extremities and chest. Principal investigators developed the BMS study criteria after consideration of NIDRR’s priorities and limitations of study resources. 
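The seven severity criteria above amount to a rule-based screen, and a short sketch can make the logic explicit. The Python below is a minimal illustration, not the BMS enrollment software: the `BurnCase` record and all of its field names are hypothetical, criteria 3 and 7 are reduced to clinician-supplied flags, and every record is assumed to describe a burn injury (so criterion 6 only checks for inhalation injury).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BurnCase:
    """Hypothetical record; field names are illustrative, not BMS data elements."""
    age_years: float
    deep_tbsa_pct: float                  # deep second- plus third-degree burn, % TBSA
    third_degree_tbsa_pct: float          # third-degree burn only, % TBSA
    critical_area_threat: bool = False    # criterion 3, judged clinically
    deep_electrical: bool = False         # criterion 5, includes lightning injury
    inhalation_injury: bool = False       # criterion 6, record assumed to be a burn
    circumferential: bool = False         # criterion 7, extremities and chest


def severe_burn_criteria_met(case: BurnCase) -> List[int]:
    """Return the numbers (1-7) of the severity criteria that the case satisfies."""
    met: List[int] = []
    young_or_old = case.age_years < 10 or case.age_years > 50
    if young_or_old and case.deep_tbsa_pct > 10:
        met.append(1)
    if not young_or_old and case.deep_tbsa_pct > 20:
        met.append(2)
    if case.critical_area_threat:
        met.append(3)
    if case.third_degree_tbsa_pct > 5:
        met.append(4)
    if case.deep_electrical:
        met.append(5)
    if case.inhalation_injury:
        met.append(6)
    if case.circumferential:
        met.append(7)
    return met


def is_severe_burn(case: BurnCase) -> bool:
    """A burn is severe when any one of the seven criteria is met."""
    return bool(severe_burn_criteria_met(case))
```

Returning the list of satisfied criteria, rather than a single yes/no value, keeps a record of why a case qualified, which is useful when the same rules must later be approximated in a registry with fewer variables.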
Burn patients qualify as NIDRR subjects if they (1) meet the clinical criteria for a severe burn injury as defined above; (2) receive treatment in the BMS from the time of burn (outpatient or inpatient) for primary burn wound closure; (3) are of any age (a case is considered pediatric if age is <16y at time of burn and adult if age is ≥16y at time of burn); (4) will be provided some rehabilitation services at the BMS, including psychiatric, physical, occupational, recreational, psychologic, vocational, or other traditional rehabilitation therapies throughout the 2 years after the burn injury and agree to follow-up assessments at 6 months postburn, 1 year postburn, and 2 years postburn; (5) understand and sign an institutional informed consent to participate, or if unable, a family member or legal guardian understands and signs an informed consent for the patient; and (6) agree to release data collected at discharge and at the subsequent follow-ups to the data center so that all data can be combined and used for research purposes. People are not in the BMS database if (1) they die in the hospital before being discharged from the initial acute injury admission, (2) they will not or cannot provide follow-up responses, or (3) they do not sign the informed consent or data release forms.

We compared distributions of demographic characteristics in ABA/NBR data with corresponding distributions found in the NIDRR BMS database.5 The NBR database contains records on burn patients collected from 70 burn centers in the United States and Canada. For the purposes of our analysis, data from acute burn admissions (ie, NBR classifications of admission defined as acute admission, burn injury related; self/EMS [emergency medical services] admit; hospital referral; emergency room referral; or burn center referral) to the burn facility occurring between 1994 and 2004 were selected for evaluation.
Year of admission was used for this criterion in lieu of year of burn because it was a more reliable variable in the NBR dataset. Records were excluded from analysis if age or year of burn fields were missing or invalid. To compare with our longitudinal rehabilitation study wherein subjects must be alive at discharge, patients in the NBR who died in the hospital were excluded from the comparison dataset, giving a final comparable dataset, namely, “NBR/all.” Adult and pediatric (age at burn <16y) cases are analyzed separately, primarily because BMS inclusion dates of adults (1994–present) and children (1998–present) differ. The BMS project did not include pediatric cases until the second funding cycle (October 1997), at which time 1 regional site was replaced with a new international pediatric burn program. The concern is that this new clinical site is a dedicated pediatric burn center that contributes small numbers of adults and many children with very large burns. Moreover, a large proportion of these complicated cases come from outside the United States. The BMS study criteria (described earlier) for inclusion in the BMS study population were applied to the NBR/all sample to derive a second comparison dataset, namely, “NBR/criteria”; this reduced the adult population from 73,407 to 14,790 and the pediatric population from 24,736 to 3616 (fig 1). Because the application of the BMS selection criteria to the NBR was based on available data elements in the NBR dataset, minor losses of sensitivity and specificity in the classification algorithm exist. That is, applying our criteria to the NBR sample was inexact because of the limited variables in the NBR dataset and slightly different definitions of common concepts in the BMS and NBR data values. For example, a criterion required identification of deep, second-degree burns, but only full-thickness and partial-thickness estimates are included in the NBR. 
Consequently, it was not possible to determine deep second-degree burns; we therefore used only full-thickness surface area in our criteria. Also, it was impossible to identify in the NBR data either burns that posed a threat of functional or cosmetic impairment in the face, hands, feet, genitalia, perineum, or major joints or circumferential burns. We therefore included any patient with a full-thickness burn on the head, neck, face, hands, feet, or genitalia.

Data Analysis

Frequency distributions for demographic and burn-related characteristics were calculated for the NBR/all sample, the smaller NBR/criteria sample, and the BMS sample. Chi-square goodness-of-fit tests were used to compare the various distributions of the BMS variables with the smaller NBR sample only.6 Comparisons with the larger NBR populations are not formally reported here because these results are identical, in all cases, to those found in the analyses performed for the smaller NBR/criteria population. The goodness-of-fit method was used because subjects overlap between the 2 (BMS, NBR) populations: most members of the BMS project also contribute data annually to the NBR. NBR distributions are used as the true or reference values. In this case, the BMS distributions are compared (using a 1-sample rather than a 2-sample test) with the assumed fixed and known distributions determined in the NBR database.

The large sample size of the current BMS database (N=3800) renders most statistical comparisons with the NBR population distributions statistically significant. The major concern to researchers using BMS data, however, is how well these data represent the general burn population nationally. Consequently, our analyses and discussions focus on the magnitude of differences among the various distributions rather than statistical significance.
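As a rough illustration of the 1-sample goodness-of-fit comparison described above, the following Python sketch treats the NBR/criteria proportions as fixed reference values and tests the BMS counts against them. The analyses in this article were run in SAS; this code is only an assumed equivalent, the function names are hypothetical, and the P value helper covers only the 2-category case (1 degree of freedom). The counts are the adult sex counts reported in table 1.

```python
import math
from typing import Sequence, Tuple


def goodness_of_fit(observed: Sequence[float],
                    reference_counts: Sequence[float]) -> Tuple[float, int]:
    """Chi-square goodness-of-fit statistic and degrees of freedom.

    The reference counts are converted to proportions and treated as the
    fixed, known distribution; only the observed sample is random.
    """
    n_observed = sum(observed)
    n_reference = sum(reference_counts)
    statistic = 0.0
    for obs, ref in zip(observed, reference_counts):
        expected = n_observed * ref / n_reference
        statistic += (obs - expected) ** 2 / expected
    return statistic, len(observed) - 1


def p_value_one_df(statistic: float) -> float:
    """Upper-tail chi-square probability, valid only for 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Adult sex distribution from table 1 (male, female).
bms_adults = [1689, 499]
nbr_criteria_adults = [10972, 3754]

statistic, df = goodness_of_fit(bms_adults, nbr_criteria_adults)
p_value = p_value_one_df(statistic)
print(f"chi-square = {statistic:.2f}, df = {df}, P = {p_value:.4f}")
```

Under these assumptions the statistic is a little over 8 with a P value just under .004, in line with the table 1 entry for adult sex. For variables with more than 2 categories, a general chi-square tail function from a statistics library would be needed in place of `p_value_one_df`.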
In addition, we compared the discharge demographic characteristics of BMS patients with the demographic characteristics of those who returned for the 6-month assessment, which is a subset of our total study population. Similarly, we performed the same comparison for those with 12-month assessments and those with 24-month assessments. In these comparisons, the BMS/discharge population was used as the reference population. Typical chi-square tests of association were inappropriate because of nonindependent samples: samples at 6, 12, and 24 months were subsamples of the discharge population. We therefore made these comparisons using goodness-of-fit methods. These analyses were performed separately for the adult and pediatric populations. SAS was used for all statistical analyses.6,a

Results

Threats to External Validity

Clinicians and researchers should always question the external validity of any published study, where external validity refers to the investigator’s ability to generalize results beyond the studied sample. The BMS project is often criticized because of the seemingly restrictive criteria applied to its BMS patients. To assess threats to external validity we compared our data with the NBR datasets. Table 1 identifies distributional differences between NBR/criteria and BMS subjects.7 Separate frequency distributions for the BMS, NBR/all, and NBR/criteria groups for many of their common variables are provided. The statistical comparisons contained in this table relate to the BMS and NBR/criteria populations wherein all BMS entry criteria have been applied to the NBR population.

| Variable | Adults (≥16y): NBR/All | Adults (≥16y): NBR/Criteria | Adults (≥16y): BMS | Adults (≥16y): P⁎ | Children (<16y): NBR/All | Children (<16y): NBR/Criteria | Children (<16y): BMS | Children (<16y): P⁎ |
|---|---|---|---|---|---|---|---|---|
| Total | 73,407 | 14,790 | 2188 | <.001 | 24,736 | 3616 | 1067 | <.001 |
| Age group (y) | N=73,407 | N=14,790 | N=2188 | | N=24,736 | N=3616 | N=1067 | |
| 0−2 | ND | ND | ND | | 12,853 (51.96) | 1434 (39.66) | 301 (28.21) | |
| 3−5 | ND | ND | ND | | 3941 (15.93) | 718 (19.86) | 276 (25.87) | |
| 6−10 | ND | ND | ND | | 3879 (15.68) | 782 (21.63) | 242 (22.68) | |
| 11−15 | ND | ND | ND | | 4063 (16.43) | 682 (18.86) | 248 (23.24) | |
| 16−20 | 7309 (9.96) | 1209 (8.17) | 273 (12.48) | | ND | ND | ND | |
| 21−30 | 15,439 (21.03) | 2844 (19.23) | 421 (19.24) | | ND | ND | ND | |
| 31−40 | 15,968 (21.75) | 3380 (22.85) | 551 (25.18) | | ND | ND | ND | |
| 41−50 | 14,190 (19.33) | 3034 (20.51) | 449 (20.52) | | ND | ND | ND | |
| 51−60 | 8655 (11.79) | 1942 (13.13) | 251 (11.47) | | ND | ND | ND | |
| 61−70 | 4968 (6.77) | 1127 (7.62) | 125 (5.71) | | ND | ND | ND | |
| 71−80 | 3955 (5.39) | 813 (5.50) | 80 (3.66) | | ND | ND | ND | |
| 81+ | 2923 (3.98) | 441 (2.98) | 38 (1.74) | | ND | ND | ND | |
| Sex | N=73,059 | N=14,726 | N=2188 | <.004 | N=24,471 | N=3585 | N=1067 | <.169 |
| Male | 52,705 (72.14) | 10,972 (74.51) | 1689 (77.19) | | 15,811 (64.61) | 2343 (65.36) | 676 (63.36) | |
| Female | 20,354 (27.86) | 3754 (25.49) | 499 (22.81) | | 8660 (35.39) | 1242 (34.64) | 391 (36.64) | |
| Ethnicity | N=70,755 | N=14,228 | N=2169 | <.001 | N=23,474 | N=3438 | N=1046 | <.001 |
| White | 48,662 (68.78) | 9347 (65.69) | 1494 (68.88) | | 12,584 (53.61) | 1659 (48.25) | 321 (30.69) | |
| Black | 11,400 (16.11) | 2435 (17.11) | 307 (14.15) | | 4938 (21.04) | 676 (19.66) | 165 (15.77) | |
| Hispanic | 8013 (11.32) | 1801 (12.66) | 266 (12.26) | | 4291 (18.28) | 904 (26.29) | 532 (50.86) | |
| Asian | 1328 (1.88) | 315 (2.21) | 45 (2.07) | | 881 (3.75) | 88 (2.56) | 7 (0.67) | |
| Native American | 382 (0.54) | 98 (0.69) | 36 (1.66) | | 214 (0.91) | 31 (0.90) | 10 (0.96) | |
| Other | 970 (1.37) | 232 (1.63) | 21 (0.97) | | 566 (2.41) | 80 (2.33) | 11 (1.05) | |
| TBSA (%) | N=73,407 | N=14,790 | N=2140 | <.001 | N=24,736 | N=3616 | N=1049 | <.001 |
| 0−15 | 70,820 (96.48) | 12,203 (82.51) | 1145 (53.50) | | 23,921 (96.71) | 2801 (77.46) | 346 (32.98) | |
| 16−30 | 1692 (2.30) | 1692 (11.44) | 591 (27.62) | | 455 (1.84) | 455 (12.58) | 263 (25.07) | |
| 31−50 | 614 (0.84) | 614 (4.15) | 269 (12.57) | | 223 (0.90) | 223 (6.17) | 249 (23.74) | |
| 51−100 | 281 (0.38) | 281 (1.90) | 135 (6.31) | | 137 (0.55) | 137 (3.79) | 191 (18.21) | |
| Work-related | N=44,855 | N=13,213 | N=2060 | <.748 | NA | NA | NA | |
| Yes | 12,235 (27.28) | 3611 (27.33) | 530 (25.73) | | NA | NA | NA | |
| No | 32,620 (72.72) | 9602 (72.67) | 1530 (74.27) | | NA | NA | NA | |
| Assault | N=73,401 | N=14,790 | N=2147 | <.001 | N=24,731 | N=3616 | N=1037 | <.001 |
| Yes | 1028 (1.40) | 265 (1.79) | 71 (3.31) | | 855 (3.46) | 236 (6.53) | 47 (4.53) | |
| No | 72,373 (98.60) | 14,525 (98.21) | 2076 (96.69) | | 23,876 (96.54) | 3380 (93.47) | 990 (95.47) | |
| Etiology | N=50,988 | N=14,564 | N=2180 | <.001 | N=21,049 | N=3565 | N=1054 | <.001 |
| Fire/flame | 25,094 (49.22) | 8453 (58.04) | 1337 (61.33) | | 6124 (29.09) | 1599 (44.85) | 591 (56.07) | |
| Scald | 15,094 (29.60) | 3406 (23.39) | 187 (8.58) | | 11,166 (53.05) | 1239 (34.75) | 310 (29.41) | |
| Other | 10,800 (21.18) | 2705 (18.75) | 656 (30.09) | | 3759 (17.86) | 727 (20.39) | 153 (14.52) | |
| Inhalation injury | N=70,376 | N=14,297 | N=2079 | <.001 | N=23,790 | N=3503 | N=1026 | <.718 |
| Yes | 4581 (6.51) | 3753 (26.25) | 231 (10.73) | | 779 (3.27) | 656 (18.73) | 191 (18.62) | |
| No | 65,795 (93.49) | 10,544 (73.75) | 1921 (89.27) | | 23,011 (96.73) | 2847 (81.27) | 853 (81.38) | |
| Length of stay | N=73,407 | N=14,790 | N=2169 | <.001 | N=24,736 | N=3616 | N=1041 | <.001 |
| 0−7d | 44,280 (60.32) | 4197 (28.38) | 308 (14.20) | | 18,244 (73.75) | 1210 (33.46) | 289 (27.76) | |
| 8−30d | 23,587 (32.13) | 7280 (49.22) | 1290 (59.47) | | 5574 (22.53) | 1732 (47.90) | 477 (45.82) | |
| 31−183d | 5484 (7.47) | 3273 (22.13) | 560 (25.82) | | 904 (3.65) | 662 (18.31) | 271 (26.03) | |
| 6mo to 1y | 50 (0.07) | 38 (0.26) | 11 (0.51) | | 10 (0.04) | 9 (0.25) | 4 (0.38) | |
| 1y+ | 6 (0.01) | 2 (0.01) | 0 (0.00) | | 4 (0.02) | 3 (0.08) | 0 (0.00) | |
| Days on ventilator (d) | N=59,286 | N=11,954 | N=2146 | <.001 | N=19,681 | N=2913 | N=965 | <.001 |
| 0−7 | 55,121 (92.97) | 9591 (80.23) | 1919 (89.42) | | 18,987 (96.47) | 2429 (83.38) | 847 (87.77) | |
| 8−30 | 2997 (5.06) | 1520 (12.72) | 160 (7.46) | | 559 (2.84) | 370 (12.70) | 96 (9.95) | |
| 31+ | 1168 (1.97) | 843 (7.05) | 67 (3.12) | | 135 (0.69) | 114 (3.91) | 22 (2.28) | |

⁎ Chi-square goodness-of-fit test between frequency distribution of the NBR/criteria population and the NIDRR database population.
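Because the comparisons emphasize the magnitude of distributional differences rather than P values, it can help to express each comparison as percentage-point gaps between the BMS and reference distributions. The Python sketch below does this for the adult TBSA counts in table 1. The function names are hypothetical, and the single-number summary (half the sum of absolute gaps, often called a dissimilarity index) is an added illustration, not a statistic reported in this article.

```python
from typing import List, Sequence


def percent_distribution(counts: Sequence[float]) -> List[float]:
    """Convert category counts to percentages of their own total."""
    total = sum(counts)
    return [100.0 * count / total for count in counts]


def percentage_point_gaps(sample_counts: Sequence[float],
                          reference_counts: Sequence[float]) -> List[float]:
    """Sample percentage minus reference percentage for each category."""
    sample_pct = percent_distribution(sample_counts)
    reference_pct = percent_distribution(reference_counts)
    return [s - r for s, r in zip(sample_pct, reference_pct)]


def dissimilarity_index(gaps: Sequence[float]) -> float:
    """Half the sum of absolute gaps: the share of the sample that would have
    to change category to match the reference distribution (illustrative)."""
    return 0.5 * sum(abs(gap) for gap in gaps)


# Adult TBSA categories from table 1: 0-15, 16-30, 31-50, 51-100 (% TBSA).
tbsa_labels = ["0-15", "16-30", "31-50", "51-100"]
bms_adults = [1145, 591, 269, 135]
nbr_criteria_adults = [12203, 1692, 614, 281]

gaps = percentage_point_gaps(bms_adults, nbr_criteria_adults)
for label, gap in zip(tbsa_labels, gaps):
    print(f"TBSA {label}%: {gap:+.1f} percentage points")
print(f"dissimilarity index: {dissimilarity_index(gaps):.1f}")
```

For these counts the smallest TBSA category is about 29 percentage points less common in the BMS sample than in NBR/criteria, which conveys the practical size of the difference more directly than the P value of <.001.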
In table 1, the statistical significance of most of the tests performed is notable. This is not surprising because of the large sample sizes of the 3 (NBR/all, NBR/criteria, BMS) populations. Consequently, the absolute differences, or clinical significance, become more relevant in consideration of internal and external validity.

Selection bias (table 2) is the biggest threat to external validity. Other pitfalls (eg, synergism, confounding, effect modification) usually associated with cohort studies can be controlled or adjusted for during the analysis phase.8 If there are similar distributions and all values of the variable exist in the data, the use of multivariate statistical adjustments will compensate for distributional differences found in the sample populations. Appropriate adjustments can identify real risk and causal factors that apply to the target populations of interest.9, 10, 11 (Criteria for determining whether a risk factor is causal are provided in appendix 1.)

| Affected | Threat due to | Concept |
|---|---|---|
| External validity | Selection bias | Deviation or distortion, consistently in 1 direction, that may occur because of unequal allocation or assembly of subjects into study groups. |
| | Censoring/lost to follow-up | Inability to collect data and to determine outcomes on 1 or more subjects in a longitudinal study design after some time period within the designated follow-up period. Censoring can be systematic or at random. Systematic censoring is most detrimental to external validity. |
| | Ascertainment bias | Deviation or distortion, consistently in 1 direction, that may occur due to incomplete identification or recognition of the target population that the cohort is supposed to represent. |
| Internal validity | Response bias | Events occurring concurrently with interventions could cause the observed effect. |
| | Misclassification bias | Deviation or distortion, consistently in 1 direction, that may occur because of inconsistently identifying the appropriate outcome. Systematic differences or conditions in respondent characteristics that could also cause the observed effect. |
| | Measurement bias | Inaccurate measurement device or measurement strategy, or the nature of a measurement may change over time and/or change experimental conditions. |
| | Recall bias | Deviation or distortion, consistently in 1 direction, that may occur because of inconsistently identifying the appropriate exposure levels or inaccurately measuring exposure levels during the observational period. |
| | Interviewer bias | Deviation or distortion, consistently in 1 direction, that may occur because of a researcher’s ability to influence a research subject’s response either directly or indirectly by leading questions. |
| | Contamination | Deviation or distortion, consistently in 1 direction, that may occur because of combining multiple levels of exposure. |
| | Intent-to-treat analysis | Analysis strategy that assigns people to 1 exposure level or 1 intervention level, even though they may have changed levels during the study period. |
| | Crossover effects | Deviation or distortion, consistently in 1 direction, that may occur because of study individuals crossing over or changing exposure levels during the observational period. |
| | Compliance | Deviation or distortion, consistently in 1 direction, that may occur because of people inappropriately following the prescribed protocol or stopping the intervention too soon. |
| | Testing/observation bias | Deviation or distortion, consistently in 1 direction, that may occur because of being observed, evaluated, or exposed to an evaluation instrument (Hawthorne effect). |
| | Maturation | Naturally occurring changes over time could be confused with a treatment effect. Inappropriate concurrent control group. |
| | Interactive effects | The impact of an intervention may depend on the level of another intervention or risk factor (effect modification). |
| | Regression | When subjects exhibit extreme scores initially, they will often have less extreme subsequent scores, an occurrence that can be confused with an intervention effect. |
| | Missing data values | Missing data differ from censoring in that individual data points are lost at various data collection times during the entire follow-up period. Biases due to nonrandom missing data affect internal validity the most. |
Table 1 provides insight into possible selection bias in the BMS population. The adult BMS study population is slightly younger and has larger burns, often caused by fire or flame. Compared with the NBR/all distributions, the BMS population has more burns involving inhalation injury, with a larger proportion using ventilators for long periods. When compared with the NBR/criteria group, the opposite is true. This NBR/criteria group includes more than twice the proportion of inhalation injuries as the BMS group, with a higher percentage on ventilators for more than 7 days. Consequently, the adult BMS study population has longer hospital stays than both NBR populations (85% with 8−183d vs 40% and 72% in the NBR/all and NBR/criteria groups, respectively).

The BMS pediatric study population is older (72% ≥3 years old vs 49% and 60% in the NBR/all and NBR/criteria groups, respectively). Again, the BMS group consists of more severe burns, predominantly due to fire or flame. The BMS and NBR/criteria groups have similar length of stay patterns (28% and 33% for 0−7d, the lowest category, respectively). However, in the NBR/all group, approximately 74% of patients have hospital stays of 0 to 7 days. The fact that the BMS pediatric population has more severe burns with possibly more complicated acute care needs is not surprising given the make-up of the BMS clinical sites. Of the 4 sites now contributing to the BMS database, one is dedicated to treating very serious, complicated burns in the pediatric population at little or no cost to the patient’s family. The remaining 3 general burn centers contribute pediatric cases, but the majority of pediatric cases come from the dedicated pediatric BMS site.

Threats to Internal Validity

Threats to internal validity in longitudinal studies emerge from 2 equally unfavorable conditions associated with protocol design and data collection.
First, internal validity is compromised if the research protocol, data definitions, or analytic strategies are inappropriately applied during the study and/or are not consistent with the predefined research objectives. Moreover, in longitudinal studies, an additional threat to internal validity emerges when samples at subsequent assessment periods are inconsistent or substantially different from the initial targeted sample at baseline. To assess threats to internal validity in our longitudinal evaluations, we made cross-sectional comparisons of characteristics among the baseline (discharge) distributions in the BMS dataset to subsets of compliant subjects at each of the 3 follow-up periods and to BMS subjects who complied completely by undergoing all 4 assessments (discharge, 6, 12, and 24mo).6 From table 3 we can observe that BMS follow-up data are not systematically missing. They are therefore as generalizable as they would be if a random sample had been used at each time point. Again, despite statistically significant differences in distributions for a few characteristics, in general, these differences are not large and should have little effect on statistical inference. The fact that all statistical tests comparing body surface area burned at discharge among the various subpopulations were nonsignificant provides positive evidence that missing observations are not due to severity of burn complications. Although we can make generalizations based on table 3, reevaluation of the various sources of missing observations for each analytic assessment we performed is a necessary precaution for researchers. That is, certain scientific questions that use a specific assessment instrument may result in slightly different patterns of missing responses than would others. If patterns are suggestive of nonrandom missing data, even usually correct and appropriate analyses can lead to faulty inferences and inappropriate generalizations. 
Discussion

Generalizability

The perceived inability to generalize results to the larger burn population is not a real threat for the BMS project. Because BMS sites contribute to both the NBR and BMS databases, our BMS population will always be a subset of the larger burn population. In addition, from the analyses provided here, the BMS population, in fact, is not very different from the general burn population and these minor differences should not preclude generalizing to the larger group. In fact, the larger differences in processes of care among all major burn centers, the various distinctions in the target populations around the United States, and the variations in rehabilitation strategies used at the various burn centers far outweigh the slight differences that we have shown between our study’s population and the larger burn population.

Stratified analysis is a very simple and direct analytic strategy for handling situations in which generalizations may be a concern. Although stratified analysis can lead to many and varied conclusions, especially when the number of strata is large, this analytic method allows consumers to separately adopt some conclusions and reject others that are deemed to be less generalizable and/or reliable. For example, one should always consider stratified analyses between adult and pediatric burns when assessing outcome data among these 2 populations. This recommendation is not new; most burn literature presents stratified analyses and findings because of the unique etiology, natural course, and rehabilitation strategies for pediatric cases that set them apart from the adult burn population. Analyses that combine these groups can have confounding bias that obscures any reasonable interpretations. In addition, in the presence of major confounding (ie, effect modification), the recommended analytic strategy is to conduct stratified analyses.
Compensatory Analytic Methods

In longitudinal studies, significant resources are consumed trying to ensure subject compliance. Ensuring that subjects remain on their designated interventions, that they return for their required assessments, or both is costly in follow-up studies. The BMS project has been struggling to improve follow-up assessment rates, especially in some subgroups.12 Nevertheless, the 2 main concerns in longitudinal analyses presented by BMS researchers are nonrandom missing data and large attrition rates. Random losses of subjects’ information at random time points specified in the BMS follow-up protocols are of less concern than subject attrition. At present, we have more, and more powerful, analytic methods to compensate for losses in the precision of estimates caused by random missing data than we have for handling problems arising from shrinking sample sizes due to attrition.

Statistical packages like SAS, SPSS, GLIM, R, and S-Plus now offer more powerful multivariate modeling and analysis procedures that provide efficient and effective data analyses. These procedures allow analysts to easily declare random effects and to prespecify complex covariance structures that account for unbalanced (missing data) study designs. For example, unstructured, compound-symmetric, and autoregressive correlation structure assumptions allow us to estimate critical variance structures implied in the data from series of observations with missing data at various time points. Repeated-measures analyses that use SAS’s Proc Mixed or GLIMMIX are much more efficient, accurate, and precise than the traditional split-plot analysis of variance (ANOVA) or multivariate ANOVA methods previously used. Because of the availability of these sophisticated analytic procedures to adjust for random missing data, analysis in the presence of missing information is now a manageable analytic problem. Concerns about nonrandom missing data, however, still remain a major issue.
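To make the correlation structure assumptions named above concrete, the following Python sketch builds compound-symmetric and first-order autoregressive correlation matrices for the 4 BMS assessments (discharge, 6, 12, and 24 months). It illustrates the definitions only and does not fit a model; the function names and the value of `rho` are hypothetical. The autoregressive form shown indexes visits by order and therefore treats them as equally spaced, which the BMS schedule is not.

```python
from typing import List

Matrix = List[List[float]]


def compound_symmetric(n_visits: int, rho: float) -> Matrix:
    """Every pair of visits shares the same correlation rho."""
    return [[1.0 if i == j else rho for j in range(n_visits)]
            for i in range(n_visits)]


def autoregressive_ar1(n_visits: int, rho: float) -> Matrix:
    """Correlation decays as rho ** (number of visits apart)."""
    return [[rho ** abs(i - j) for j in range(n_visits)]
            for i in range(n_visits)]


def unstructured_parameter_count(n_visits: int) -> int:
    """An unstructured covariance matrix estimates every variance and
    covariance separately: n(n + 1) / 2 parameters."""
    return n_visits * (n_visits + 1) // 2


visits = ["discharge", "6mo", "12mo", "24mo"]
rho = 0.5  # illustrative value only, not estimated from BMS data

for name, matrix in [("compound symmetric", compound_symmetric(len(visits), rho)),
                     ("autoregressive AR(1)", autoregressive_ar1(len(visits), rho))]:
    print(name)
    for visit, row in zip(visits, matrix):
        print(f"  {visit:>9}: " + "  ".join(f"{value:.3f}" for value in row))
print("unstructured parameters:", unstructured_parameter_count(len(visits)))
```

The trade-off is between flexibility and the number of parameters: with 4 visits the unstructured form needs 10 covariance parameters, whereas the compound-symmetric and autoregressive forms each need only a variance and one correlation, which matters when attrition leaves few subjects at later visits.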
Statisticians are providing guidance, analytic tests, and adjustment procedures for studies at risk of nonrandom missing data.13

Unfortunately, variations exist in data processing strategies both within and between the different BMS centers. Although each clinical site manages its own data collection processes, all sites are required to submit commonly collected data elements to our centralized data center. Between model systems, variation in data collection strategies, especially with respect to patient tracking for follow-up assessments, can have a significant influence on the study’s validity. Censored or missing data generated by subjects “lost to follow-up” can sometimes produce very different follow-up rates across model systems, systematically dissimilar study groups across the combined study population, and sometimes a group very different from the original target population of interest, thus affecting our study’s internal and external validity. Missing data at designated follow-up times do not affect the makeup of the study population across time. The loss of subjects, however, will compromise internal validity and lead to incorrect inferences or models if there are selective losses during the evaluation period. For example, less severe burns do not require constant or long-term medical attention but do require vigilant psychosocial counseling. Patients with more severe burns are therefore more likely to return to the medical facility for follow-up, which alters the makeup of the studied population. Because these undesirable influences are not completely avoidable, we always use analytic methods (like those mentioned above) that best compensate for these problems and allow for accurate inferences.

Epidemiologists and outcomes researchers are often interested in factors that are not only associated with outcomes but that actually cause particular diseases or outcomes.
Causation or causal factors in epidemiologic and outcomes studies typically refer to risk factors, exhibiting the property that the expected outcome (disease burden) vanishes (diminishes) as the putative factors are removed (reduced). Showing causation in observational studies is difficult and requires the researcher to discover necessary and sufficient associations between the factors and outcome, biologic plausibility for the association, and no alternative explanations. Because causation cannot be determined directly in observational studies, appendix 1 proposes a set of conditions that imply causation in quasi-experimental studies and outlines a strategy for establishing causality in cohort studies. Although the criteria are clear, the process of applying these criteria is rarely straightforward. None of the criteria is either necessary or sufficient for making a causal inference. In fact, strict adherence to any 1 criterion without consideration of bias, random error, synergism, confounding, and effect modification could result in incorrect assignment of causality.14, 15 These common statistical concepts (eg, bias, random error, synergism, confounding, effect modification) are well defined in numerous textbooks on biostatistics and epidemiology.7 The general recommendation, however, is to always test for these conditions and, if they are present, to adjust for their effect.7 Other sources of misinterpretation, excluding bias, are easily overcome by appropriate multivariate statistical modeling strategies, which often amount to no more than simple statistical adjustments.
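One example of such a simple statistical adjustment is stratification on a confounder. The Python sketch below compares a crude odds ratio with a Mantel-Haenszel summary odds ratio computed across strata. The counts are invented for illustration (they are not BMS data), and the function names are hypothetical; each stratum is given as (exposed cases, exposed noncases, unexposed cases, unexposed noncases).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2x2 table.

    a: exposed cases, b: exposed noncases,
    c: unexposed cases, d: unexposed noncases.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across 2x2 tables (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Invented counts: within each severity stratum the exposure is unrelated
# to the outcome, but exposure is more common in the high-severity stratum.
strata = [
    (5, 95, 10, 190),   # lower-severity stratum
    (40, 60, 20, 30),   # higher-severity stratum
]

# Collapse the strata into one table to obtain the crude (unadjusted) estimate.
crude_table = [sum(column) for column in zip(*strata)]
crude = odds_ratio(*crude_table)
adjusted = mantel_haenszel_or(strata)

print(f"crude OR    = {crude:.2f}")
print(f"adjusted OR = {adjusted:.2f}")
```

In this constructed example the crude odds ratio exceeds 2 even though the odds ratio within each stratum is 1, so the apparent association is produced entirely by the stratifying variable. The stratified estimate removes it.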
Well-known analytic methodologies for incorporating complex statistical adjustments during the analysis phase of an investigation include logistic or Poisson regression for discrete or count data16, 17 and mixed models, generalized linear models, and nonlinear multiple regression modeling for continuous outcomes measures.18, 19, 20, 21 The remaining liability of using cohort study designs, once appropriate statistical adjustments are made, is related to the degree to which sources of bias (see table 2) are accounted for during the phases of study design and data collection.

Issues of Generalizability of BMS Subjects

Because observational studies are an important tool in outcomes research, the analysis, interpretation, and generalization of results to broader populations must proceed with analytic diligence. The potential lack of generalizability in the BMS project is related to selection and response biases. Selection bias is cumulative and results from several procedural steps that researchers must apply during the recruitment and follow-up assessment periods.
Sources of selection bias emerge in our BMS project for one or more of the following reasons: (1) all BMS subjects represent those types of patients who would normally be admitted to a regional burn center (this is probably not a serious limitation); (2) application of our entry criteria identifies study subjects who meet criteria for serious burns and consequently selects a more severely injured study population; (3) because of the longitudinal nature of our study and regard for improvement of rehabilitation outcomes, very serious cases that result in a hospital death are not included; (4) because of consent issues, our study selects only certain types of personalities, namely subjects who consent to participate in a longitudinal study and provide medical data to multiple centers and researchers; and finally, (5) our study is more likely to select subjects who will remain compliant with the designated research protocol and return to the BMS for follow-up assessments.

Most analyses of BMS data, in general, avoid comparing sample characteristics of BMS subjects directly with those of other populations. We generally focus on differences due to alternative rehabilitation regimes within and across the different model systems. BMS researchers are mostly concerned with how many people return to an acceptable state of community reintegration, how quickly they do so, and what barriers stand in the way of early and effective reintegration. For assessing changes and improvement rates, we necessarily use internal controls that occur naturally in the BMS study population through variation in, for example, burn severity, level of disability, or burden of burn injury. These results are generalizable after appropriate adjustments so that other non-BMS burn centers can apply our estimates of relative improvement or degradation to their specific reference populations and make reasonable estimates, inferences, and associations for their own patient populations.
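One way to see how stratum-specific estimates can be carried over to a different reference population is direct standardization. The Python sketch below is a hedged illustration only: the strata, counts, outcome, and reference weights are invented, the function names are hypothetical, and it does not reproduce any BMS analysis.

```python
def crude_rate(strata):
    """Overall event rate in the study sample, ignoring stratum makeup."""
    events = sum(s["events"] for s in strata)
    subjects = sum(s["n"] for s in strata)
    return events / subjects


def directly_standardized_rate(strata, reference_weights):
    """Weight each stratum-specific rate by the reference population's share.

    reference_weights maps stratum label -> proportion; must sum to 1.
    """
    if abs(sum(reference_weights.values()) - 1.0) > 1e-9:
        raise ValueError("reference weights must sum to 1")
    return sum(reference_weights[s["label"]] * s["events"] / s["n"]
               for s in strata)


# Invented study sample, weighted toward larger burns by its entry criteria.
study = [
    {"label": "small burn",  "n": 100, "events": 20},
    {"label": "medium burn", "n": 200, "events": 80},
    {"label": "large burn",  "n": 200, "events": 120},
]

# Invented reference population in which small burns predominate.
reference = {"small burn": 0.7, "medium burn": 0.2, "large burn": 0.1}

print(f"crude rate in study sample      = {crude_rate(study):.2f}")
print(f"rate standardized to reference  = "
      f"{directly_standardized_rate(study, reference):.2f}")
```

With these invented numbers the crude rate in the selected sample is 0.44, whereas the same stratum-specific rates applied to the reference mix give 0.28. The stratum-specific rates transfer; the crude summary does not.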
Researchers should be cautious about blindly computing simple descriptive population estimates from our BMS data despite the reasonable similarity between the BMS and the NBR datasets described in this article. Whenever criteria are applied to study populations, simple incidence and prevalence measures can become distorted, thus providing an inaccurate representation of more general population characteristics.

Conclusions

We have identified the inherent liabilities of the BMS multicenter data collection project that produced a longitudinal, observational outcomes database. Although most of the common liabilities of quasi-experimental studies are present in this information resource, the BMS project has produced important information with notable assets. First, data are produced from multiple large burn centers with a wide range of burns, burn care, and burn rehabilitation programs. Second, the diversity of the BMS population is derived from the very different characteristics of the subjects obtained from center to center. Third, the participating burn centers represent a wide range of burn injuries with a broad spectrum of burn care and rehabilitation programs used to achieve optimal outcomes. Finally, because of the diversity of injury severity and subsequent rehabilitation needs seen in these burn centers, our participating centers have research staff who are well trained and experienced in conducting complicated research protocols. Consequently, this database is distinguished from other outcomes databases by being operational for over 12 years and having the largest collection of sequential outcomes measures on burn victims currently available for monitoring trends in physical, psychologic, and social reintegration. Numerous scientific publications (see http://bms-dcc.uchsc.edu) and presentations have been generated from these data and from related, site-specific substudies of this long-term funded project.
By controlling selection bias and adjusting for variation in the distributions of subject characteristics (eg, age, race, TBSA burned) with analyses that account for confounding and effect modification, we can improve external validity. However, it is not possible to improve internal validity prospectively with complex analysis strategies. Internal validity is affected by other sources of bias, and generally these biases come from inaccurate data. Avoiding the many sources of bias requires diligence in precise conceptualization, detailed operational definitions, and well-described processes for accruing data that are internally valid. Although the BMS project team believes that considerable effort has been spent on acquiring quality data, subjects’ lack of compliance with the follow-up protocol remains the biggest threat to internal validity and is difficult to control with limited resources.

Although there is no strict policy against sharing BMS data, the reason BMS data are not provided openly is concern that results could be misrepresented because of the internal and external validity threats described in this study. Use of these data must be accompanied by assurance that appropriate analytic methods will be used and that important validation steps will be taken. Internally, our policy for producing scientific publications and presentations that use BMS data is to share final drafts with investigators at the other clinical centers for critical review and comments to ensure accuracy and generality. Our goal is to ensure that results and inferences based on the BMS database are appropriate and that they generalize to the larger population of burn victims.

Appendix 1. Epidemiologic Criteria for Cause and Effect Relationships

Causation and Disease

Associations are easily found in observational studies when many risk factors are evaluated. Finding associations is not the problem; understanding and interpreting them is.
What types of association are possible? Statisticians and epidemiologists are concerned about 4 types of discovered associations. These are summarized as follows:

• Spurious associations: caused by chance alone.
• Artifactual associations: bias (for example, misclassification or interviewer bias).
• Indirect associations: confounding (for example, the association between 2 factors may actually be due to a third factor).
• Causal associations: most difficult to determine.

Consequently, over the years a number of researchers have derived postulates for causality. There have been numerous attempts to unify the concept of causal factors under the many different study designs and for many different scientific questions relying on the scientific method of proof. Criteria have been proposed that, if met, increase the probability that a risk factor is causal.22 Most of the proposed criteria were derived from the original Henle and Koch postulates or can be traced back to the works of John Stuart Mill (1865), which are referred to as Mill’s canons.23, 24 The original postulates were proposed many years ago to address necessary and sufficient conditions for a causal relationship between parasites and diseases. This original set is summarized as follows:

1. The parasite occurs in every case of the disease in question and under circumstances that can account for the pathologic changes and clinical course of the disease.
2. It occurs in no other disease as a fortuitous and nonpathogenic parasite.
3. After being fully isolated from the body and repeatedly grown in pure culture, it can induce the disease anew.

A unifying set of conditions has been proposed by Alfred Evans24 in a review article that summarizes many different sets of postulates that address defining causal relationships in various levels of scientific investigations. His set of unifying concepts is summarized in the table below.
Although the criteria listed here may help us determine whether an exposure or characteristic is a causal risk factor for a disease, their application to a given hypothesis is never an uncomplicated or straightforward affair. None of the criteria is either necessary or sufficient for making a causal interpretation. In fact, strict adherence to any one of them without other considerations could result in incorrect conclusions.14, 15

Why is it still so difficult to establish causality? Many reasons exist; some include:

1. Multifactorial etiology.
2. Multiplicity of effects.
3. These criteria depend on change (what about genes?) or on how high is high if the cause is a continuous variable (high blood pressure?).
4. Imperfect knowledge.
5. Need to show no alternative explanations for the increased incidence in the outcome in the subpopulation with the risk factor.

Criteria for Showing That a Risk Factor is a Causal Factor

| 1. Prevalence of the disease should be significantly higher in those exposed to the putative cause than in those not so exposed. |
| 2. Exposure to the putative cause should be present more commonly in those with the disease than in controls without the disease when all risk factors are held constant. |
| 3. Incidence of the disease should be significantly higher in those exposed to the putative cause than in those not so exposed as shown in prospective studies. |
| 4. Temporality: the disease should follow exposure to the putative agent with a distribution of incubation periods on a bell-shaped curve. |
| 5. A spectrum of host responses should follow exposure to the putative agent along a logical biologic gradient from mild to severe. |
| 6. A measurable host response after exposure to the putative cause should regularly appear in those lacking this before exposure (ie, antibody, cancer cells) or should increase in magnitude if present before exposure; this pattern should not occur in people not so exposed. |
| 7. Experimental reproduction of the disease should occur in higher incidence in animals or humans appropriately exposed to the putative cause than in those not so exposed; this exposure may be deliberate in volunteers, experimentally induced in the laboratory, or demonstrated in a controlled regulation of natural exposure. |
| 8. Elimination or modification of the putative cause or of the vector carrying it should decrease the incidence of the disease (control of polluted water or smoke or removal of the specific agent). |
| 9. Prevention or modification of the host’s response on exposure to the putative cause should decrease or eliminate the disease (immunization, drug to lower cholesterol, specific lymphocyte transfer factor in cancer). |
| 10. The whole thing should make biologic and epidemiologic sense. |
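Criteria 1 and 3 in the table rest on comparing disease frequency between exposed and unexposed groups. As a minimal sketch of how that comparison is commonly quantified, the Python example below computes a risk ratio with an approximate 95% confidence interval on the log scale (a standard large-sample Wald interval). The counts and the function name are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def risk_ratio(exposed_cases, exposed_total,
               unexposed_cases, unexposed_total, z=1.96):
    """Risk ratio with a large-sample (Wald) interval on the log scale.

    Returns (risk_ratio, lower_limit, upper_limit).
    Assumes all counts are positive and samples are large enough for
    the normal approximation to be reasonable.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(1 / exposed_cases - 1 / exposed_total
                          + 1 / unexposed_cases - 1 / unexposed_total)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
rr, lower, upper = risk_ratio(30, 100, 10, 100)
print(f"risk ratio = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 indicates that incidence differs between the groups beyond what chance alone would readily explain. It says nothing about bias or confounding, which is why the remaining criteria are still needed.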
Below are some additional and overlapping concepts that are used to talk about causal associations in observational studies. The more fully scientific investigations address the postulates just listed or the concepts discussed next, and explain their critical rationale, the stronger the arguments for causal relationships in the absence of higher levels of scientific investigation.

Important concepts for assessing causal associations:

1. Strength of the association. The stronger the observed association, the less likely it is that the association is entirely due to various sources of error that might distort the results. Thus, in general, weaker associations do not lend as much support to a causal interpretation.
2. Dose-response effect. The observation that frequency of disease increases with the dose or level of exposure usually lends support to a causal interpretation. In the absence of such an effect, the investigator may not be able to rule out certain alternative explanations, such as a threshold effect or a saturation effect. However, an observed dose-response effect may be due entirely to a graduated distortion of bias.
3. Lack of temporal ambiguity. It is very important for the researcher to establish that the hypothesized cause preceded the occurrence of the disease. In general, this task is more difficult when investigating diseases with long latent periods and study factors that change over time.

The above criteria can be applied to the findings of a single study, and thus, they may be regarded as internal validity issues. However, any of them may be satisfied in some studies and not in others that deal with the same hypothesis. The following criteria are not necessarily study-specific and depend, to a certain extent, on a priori knowledge.

4. Consistency of the findings. If all studies dealing with a given relationship produce similar results, a causal interpretation is enhanced.
5. Biologic plausibility of the hypothesis. If the hypothesized effect makes sense in the context of current biologic knowledge, we are more likely to accept a causal interpretation. However, biologic plausibility cannot be demanded of a hypothesis, because the current state of knowledge may be inadequate to explain our observations.
6. Coherence of the evidence. If the findings do not seriously conflict with our understanding of the natural history of the disease or with other accepted facts about disease occurrence (eg, secular trends), a causal interpretation is strengthened. In essence, this criterion combines aspects of consistency and biologic plausibility and is therefore delineated similarly to points 4 and 5.
7. Specificity of the association. If the study factor is found to be associated with only 1 disease or if the disease is found to be associated with only 1 factor (after testing many possible associations), a causal interpretation is suggested. However, this criterion cannot be used to reject a causal hypothesis, because many factors have multiple effects and all (or most) diseases have multiple causes.

References

1. Burn Model System announced priority. 62 Federal Register 9886 (Mar 4, 1997).
2. Cook TD, Campbell DT. Quasi-experimentation: design and analysis issues for field settings. Chicago: Rand McNally; 1997.
3. Harris AD, McGregor JC, Perencevich EN, et al. The use and interpretation of quasi-experimental studies in medical informatics. J Am Med Inform Assoc. 2006;13:16–23.
4. Klein M, Lezotte D, Fauerbach J, et al. The National Institute on Disability and Rehabilitation Research burn model system database: a tool for the multicenter study of the outcome of burn injury. J Burn Care Res. 2007;28:84–96.
5. Miller SF, Bessey PQ, Schurr MJ, et al. National Burn Repository 2005: a ten-year review. J Burn Care Rehabil. 2006;27:411–436.
6. SAS Institute Inc. SAS/STAT user’s guide, version 8. Cary: SAS Institute Inc; 1999.
7. Hulley SB, Cummings SR, Browner WS, Grady D, Hearst N, Newman TB. Designing clinical research. 2nd ed. Philadelphia: Lippincott Williams & Wilkins; 2001.
8. Jekel JF, Elmore JG, Katz DL. Epidemiology, biostatistics and preventive medicine. Philadelphia: WB Saunders; 1996.
9. Rothman KJ, Greenland S. Modern epidemiology. Philadelphia: Lippincott-Raven; 1998.
10. Kelsey JL, Thompson WD, Evans AS. Methods in observational epidemiology. New York: Oxford Univ Pr; 1986.
11. Shadish M, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin; 2002.
12. Holavanahalli R, Lezotte D, Hayes M, et al. Profile of patients lost to follow-up in the Burn Injury Rehabilitation Model Systems’ longitudinal database. J Burn Care Res. 2006;27:703–712.
13. Fairclough DL. Design and analysis of quality of life studies in clinical trials. New York: Chapman & Hall/CRC Pr; 2002.
14. Hill AB. The environment and disease: association or causation? Proc R Soc Med. 1965;58:295–300.
15. Hill AB. A short textbook of medical statistics. London: Hodder & Stoughton; 1977.
16. Hosmer DW, Lemeshow S. Applied logistic regression. 2nd ed. New York: John Wiley & Sons; 2000.
17. Allison P. Logistic regression using the SAS system: theory and applications. Cary: SAS Institute Inc; 1999.
18. Cnaan A, Laird NM, Slasor P. Tutorial in biostatistics: using the generalized linear mixed model to analyze unbalanced repeated measures and longitudinal data. Stat Med. 1987;16:2349–2380.
19. Cole JWL, Grizzle JE. Applications of multivariate analysis of variance to repeated measurements experiments. Biometrics. 1966;22:810–828.
20. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
21. Littell RC, Stroup WW, Freund RJ. SAS system for mixed models, version 4. Cary: SAS Institute Inc; 2002.
22. Susser M. Causal thinking in the health sciences. New York: Oxford Univ Pr; 1973.
23. Mill JS. A system of Logic (1856). In: Last JM, editor. A dictionary of epidemiology. 2nd ed. New York: Oxford Univ Pr; 1988.
24. Evans AS. Causation and disease: the Henle-Koch postulates revisited. Yale J Biol Med. 1976;149:175–195.

a Department of Preventive Medicine and Biometrics, University of Colorado and Health Sciences Center, Denver, CO
b Department of Physical Medicine and Rehabilitation, University of Texas Southwestern Medical Center, Dallas, TX
c Johns Hopkins University School of Medicine, Baltimore, MD
d Department of Psychiatry and Behavioral Science, University of Texas Medical Branch, Galveston, TX
e University of Washington Burn Center and Division of Plastic Surgery, Harborview Medical Center, Seattle, WA

Reprint requests to Dennis C. Lezotte, PhD, Dept of Preventive Medicine and Biostatistics, University of Colorado Health Sciences Center School of Medicine, 4200 E 9th Ave, Campus Box B-119, Denver, CO 80262
Supported by the National Institute on Disability and Rehabilitation Research, Office of Special Education and Rehabilitative Service, U.S. Department of Education (grant no. H133A020402).

No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the author(s) or upon any organization with which the author(s) is/are associated.

PII: S0003-9993(07)01562-6
doi:10.1016/j.apmr.2007.09.011
© 2007 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.