
Open and Abundant Data is the Future of Rehabilitation and Research

  • Duncan R. Babbage
    Correspondence
    Corresponding author Duncan R. Babbage, PhD, Centre for Person Centred Research, Auckland University of Technology, Private Bag 92006, Auckland 1142, New Zealand.
    Affiliations
    Centre for Person Centred Research, Auckland University of Technology, Auckland, New Zealand
Published: December 30, 2013
DOI: https://doi.org/10.1016/j.apmr.2013.12.014

      Abstract

      Development of our current research practices has been driven by a number of assumptions and by the need to operate within practical constraints. Technological change is beginning to remove many of these limits, although our research and practice have so far only gradually and partially evolved in response. The U.S. federal government is now mandating open data repositories for research that it funds. Policy changes regarding open data repositories and an increasing abundance of data arising from both research and practice provide the opportunity to revisit some assumptions. With abundant sources of data that may increasingly be collected automatically during rehabilitation, it seems fundamentally flawed that the resolution of the primary quantitative analysis approaches widely understood in our field is so limited by the need to contain the risk of false positives. Identification of more sophisticated approaches to our data, which may well already exist in the statistical literature, is a high priority.

      Keywords

      Open Data

      In May 2013, the U.S. federal government mandated that both publications and datasets resulting from all research it funds (through agencies with grant pools of ≥$100 million) must be made openly accessible in machine-readable formats.

      The White House/President Barack Obama. Executive Order – Making open and machine readable the new default for government information. 2013. Available at: http://www.whitehouse.gov/the-press-office/2013/05/09/executive-order-making-open-and-machine-readable-new-default-government-. Accessed August 17, 2013.

      Funding agencies have been required to rapidly prepare implementation plans.

      Burwell SM, VanRoekel S, Park T, Mancini DJ. Open data policy—managing information as an asset. Memorandum for the heads of executive departments and agencies. Washington (DC). 2013. Available at: http://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf. Accessed December 13, 2013.

      This policy reflects wider calls for researchers to provide open online access to their data,
      • Strasser C.
      Closed data… excuses, excuses.
      • Eichler H.G.
      • Abadie E.
      • Breckenridge A.
      • Leufkens H.
      • Rasi G.
      Open clinical trial data for all? A view from regulators.
      including a 2004 declaration signed by 33 countries regarding access to research data from public funding.
      Organisation for Economic Co-Operation and Development
      It is a practice that has already become standard in some fields (eg, economics) and is a publication requirement of some journals. Many open research data repositories already exist: one website indexes 602 such online repositories.

      Databib. Available at: www.databib.org. Accessed December 13, 2013.

      Vanishingly few studies in rehabilitation have used such services, though given the lead from the U.S. federal government, which may well be followed elsewhere, wider use is coming.

      Abundant Data

      It is not only open data repositories that mean researchers will need to become familiar with working with large datasets—there is reason to believe that data will become increasingly abundant. Twenty-five years ago, conducting a literature review or looking for subsequent citations to an article included laborious manual review of printed indexes,
      • Adams J.A.
      • Bonk S.C.
      Electronic information technologies and resources: use by university faculty and faculty preferences for related library services.
      locating in a library the paper copy of the issue of the journal in question, and spending many hours over a photocopier to duplicate articles of interest. Source articles were hard wrung from our academic archives. Today, electronic databases and full-text articles provide improved access to the latest knowledge in the field.
      • Tenopir C.
      • King D.W.
      • Edwards S.
      • Wu L.
      Electronic journals and changes in scholarly article seeking and reading patterns.
      • Chan L.
      • Costa S.
      Participation in the global knowledge commons: challenges and opportunities for research dissemination in developing countries.
      However, high-quality data remains a scarce resource. Recruitment of participants appears to be a constant challenge for nearly all research studies, particularly in rehabilitation,
      • Bell K.R.
      • Hammond F.
      • Hart T.
      • Bickett A.K.
      • Temkin N.R.
      • Dikmen S.
      Participant recruitment and retention in rehabilitation research.
      and the burden for both participants and researchers to collect quality data is high. Current technological developments may be the initial tremor that heralds a tsunami of data heading to our shores. The multiple real-time data streams that can be collected by smartphones, wearable devices, and ambient sensors embedded in the environment
      • Dobkin B.H.
      • Dorsch A.
      The promise of mHealth: daily activity monitoring and outcome assessments by wearable sensors.
      may soon provide us with access to more data on rehabilitation practice than we are equipped to process, let alone interpret. Furthermore, not only may such data be collected for formal research studies, but in the future it could be collected as part of the standard clinical record of every person receiving rehabilitation—ready and waiting for intelligent analysis. Deep, complete, and arguably invasive clinical records would need careful consideration of issues of privacy, consent, and data security. This data would, however, also provide the opportunity for insight into issues previously opaque, such as enabling clearer connections to be drawn between the actual treatment dose that has been received and outcome—through, for example, knowing precisely how much a compensatory device was actually used in the community. In order to draw meaningful conclusions from this abundance of data, our analysis methods will need to evolve alongside our data collection.

      New Analysis Approaches

       Topography rather than biopsy

      Null hypothesis testing has been a major quantitative research tool. Statistical techniques (eg, t tests, F tests) have provided answers regarding effective interventions. However, such techniques suffer from major drawbacks, most notably that the entire premise is the ability to be 95% confident about a single specific analysis. This limitation has long been recognized, as have the associated inherent risks in undertaking multiple analyses: with a 5% chance of a spurious result from any one analysis, the investigations conducted with a dataset must, of necessity, be selected in advance and be carefully limited. A reasonable analogy is taking biopsies. In the absence of any other tool to determine the nature of a condition, a biopsy is a necessary but intrusive and potentially risky operation; therefore, it is always important to take as few biopsies as possible. The use of statistical significance testing seems analogous to taking a biopsy of a dataset; it is a penetrating analysis but needs to be used as sparingly as possible to contain the risk of false positives. Beginning to address this, we have moved our quantitative analyses beyond simple statistical significance testing to the reporting of associated effect sizes. We have a range of more sophisticated multivariate analyses, such as analyses using the generalized linear model and structural equation modeling. The complexity of the relationships between the variables that we examine in rehabilitation seems analogous to mapping unfamiliar terrain. In this metaphor of a geography containing a number of cities, statistical difference tests allowed us to determine if the altitude of two cities differed, provided that we did not make this comparison between too many cities in one region. Our more sophisticated tools allowed us to describe a route from one city to another, but the more roads we described between two places, the less confidence we could have that we knew any reliable routes.
      It seems fundamentally flawed that the resolution of the primary quantitative analysis approaches widely understood in rehabilitation is so limited by the need to contain the risk of false positives. Measuring just one or two points on a landscape can give us almost no confidence that we truly understand the geography in which we are operating. In topographic mapping, the more data points you have, the better you can understand the terrain.
      Quantitative modeling and analysis tools that bear more resemblance to topographic mapping would enable us to examine and learn from our data in great depth using techniques where closer investigation strengthens rather than weakens the interpretations we can draw from our data. There will no doubt be researchers who have been working in these areas for years, and it is time for their voices to be widely heard. Identification of approaches along these lines that already exist in the statistical literature and dissemination to make them accessible to rehabilitation researchers is a high priority.
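      The risk of false positives from multiple analyses discussed in this section can be made concrete with a short calculation. The following is a minimal sketch in Python, assuming the analyses are statistically independent (an assumption that real analyses of one dataset rarely meet exactly). The function names are illustrative and do not come from any particular statistics package. The Bonferroni correction shown is one well-known, conservative way of containing the risk; it is not the kind of richer approach this article calls for, which is the point of the comparison.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one spurious result across n_tests analyses.

    Assumes each analysis is independent and has a false-positive
    probability of alpha when no true effect exists.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-analysis significance threshold under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Show how quickly the overall risk grows as more analyses are run,
    # and how strict each individual test must become to compensate.
    for k in (1, 5, 10, 20, 50):
        risk = familywise_error_rate(k)
        threshold = bonferroni_threshold(k)
        print(f"{k:>2} analyses: overall risk {risk:.1%}, "
              f"corrected per-test threshold {threshold:.4f}")
```

      Under these assumptions, ten analyses at the conventional 5% level already carry roughly a 40% chance of at least one spurious result. Holding the overall risk at 5% requires each test to meet a threshold ten times stricter, which is why examining a dataset more closely weakens, rather than strengthens, what each individual test can show.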

       Closing the feedback loop

      Control of even highly overlearned behavior like speech relies fundamentally on continuous feedback loops. Disruption of such feedback with interference (eg, presenting the average speaker with their own voice at a delay of one fifth of a second) results in immediate and marked deterioration in speech performance.
      • Yates A.J.
      Delayed auditory feedback.
      It is reasonable to consider the importance of feedback loops in other, more complex behaviors, something that is, for instance, already having a positive effect in mental health research.
      • Lambert M.J.
      Emerging methods for providing clinicians with timely feedback on treatment effectiveness: an introduction.
      In rehabilitation practice, the primary goal is the long-term adjustment of the person receiving rehabilitation: their level of community integration, having meaningful activity to engage in, and having close and warm personal relationships; in short, a positive adjustment to life. In most rehabilitation services, however, we only have limited feedback on individual outcomes at 6 months post-discharge and almost no feedback loop at all regarding longer-term outcomes for the people receiving our services.
      Why do we have such limited long-term outcome data? First, as we increasingly deploy Rasch and item response theory methods, we are finally learning to properly measure the constructs we are interested in,
      • Tesio L.
      Measuring behaviours and perceptions: Rasch analysis as a tool for rehabilitation research.
      • Jette A.M.
      • Haley S.M.
      Contemporary measurement techniques for rehabilitation outcomes assessment.
      but we still have an ongoing lack of certainty about how to properly define a good long-term outcome.
      • McPherson K.M.
      • Taylor W.J.
      • Leplege A.
      Rehabilitation outcomes: values, methodologies and applications.
      Second, routinely collecting long-term outcome data has been logistically and financially impractical to undertake to date. While the definition of a good outcome may remain open for debate, the lack of data might, in the future, no longer be a constraint. As data can increasingly be automatically collected through ambient sensors
      • Dobkin B.H.
      • Dorsch A.
      The promise of mHealth: daily activity monitoring and outcome assessments by wearable sensors.
      and (further in the future) sifted and refined with the assistance of artificial intelligence systems, longitudinal outcome data may become routinely available. This may not even be restricted to quantitative data. Longitudinal qualitative data
      • Conneeley A.L.
      Quality of life and traumatic brain injury: a one-year longitudinal qualitative study.
      • Eilertsen G.
      • Kirkevold M.
      • Bjørk I.T.
      Recovering from a stroke: a longitudinal, qualitative study of older Norwegian women.
      are currently rare
      • Levack W.M.
      • Kayes N.M.
      • Fadyl J.K.
      Experience of recovery and outcome following traumatic brain injury: a metasynthesis of qualitative research.
      but can be illuminatory.
      As the sheer breadth and depth of quantitative data increases by orders of magnitude in the future, and as researchers begin to have tools that enable them to collect genuinely qualitative data at scale, we may even find our methods start to meet in the middle. Well before that, it is worth considering that it may be time to leave methodological turf wars between quantitative and qualitative research behind. Increasing numbers of researchers are already reaping the benefits of mixed-methods research that combines quantitative and qualitative approaches.
      • Kersten P.
      • Ellis-Hill C.
      • McPherson K.M.
      • Harrington R.
      Beyond the RCT - understanding the relationship between interventions, individuals and outcome - the example of neurological rehabilitation.
      We need the landscape paintings that qualitative research provides, which uniquely convey a rich depth of the rehabilitation experience not otherwise accessible to us. We also need the analysis equivalent of both photographic techniques and the precise accuracy of topographic maps: all three provide unique value through unique perspectives on the same rehabilitation landscape.

       Computer-aided analysis

      It is not unreasonable to expect that our analyses of rehabilitation data will be supported in the future by tools that take far more initiative. Instead of software packages that blindly run analyses we select on the tabulated data we provide, we will have tools that actively seek out clarification regarding the nature of our data, use that context to understand the data in ways we have not, and present to us views and analyses of our datasets that we have not anticipated. Sufficiently large datasets, collected automatically from clinical contexts, may be made possible by automated mechanisms for recording, coding, and filtering fine-grained data about rehabilitation processes. Tools that then connect this data to descriptions we provide of the constructs and real-world relationships would enable us to draw together a breadth of data that we might struggle to comprehend with our current approaches. It may be that apparently disparate parts of an interdisciplinary rehabilitation process interact in ways we have not understood. For example, we may discover that one therapeutic intervention provides a necessary, but previously unrecognized, foundation for the success of another intervention that we had believed to be unrelated, explaining some past differences in outcome between apparently similar people receiving rehabilitation. Most rehabilitation researchers and practitioners would not be able to build such tools themselves, but (aspirational as they might seem) we can call for the technology and software research and development necessary to deliver this desired future to us. Alongside this, we can take concrete steps to pursue consensus on data format and metadata standards and begin to routinely submit datasets to appropriately accessible online archives, so that we will have laid the foundation for such analyses when the tools become available.

       Secondary analysis

      Given past data scarcity, as researchers we have generally treated our data as proprietary. With a few notable exceptions (eg, model systems in the United States

      Model Systems Knowledge Translation Center. Available at: www.msktc.org. Accessed December 13, 2013.

      The University of Alabama at Birmingham. NSCISC National Spinal Cord Injury Statistical Center. Available at: www.nscisc.uab.edu. Accessed December 13, 2013.

      National Data and Statistical Center. Traumatic Brain Injury Model Systems. Available at: www.tbindsc.org. Accessed December 13, 2013.

      National Data and Statistical Center. For the Burn Model Systems. Available at: burndata.washington.edu. Accessed December 13, 2013.

      ), as research teams we have complied with ethical requirements to enable reanalysis of our data but have set fairly high barriers to other researchers accessing our datasets. Ethical and practical constraints have contributed to this, including concerns about participant privacy and the simple infeasibility of widely sharing datasets in a pre-Internet era.
      Given the new requirements of U.S. federal funding agencies, rehabilitation researchers are going to be actively grappling with how to resolve these issues in the imminent future. Initially, this will simply be required to meet funding agency policy. It will certainly allow other researchers and the public to more easily confirm published research findings. However, such archiving of data online could also lead to novel research and dissemination opportunities. Routine online data archiving may lead to innovative publication formats. Rather than presenting a static analysis of a dataset, electronic versions of publications could be presented as dynamic views on published datasets. The reader could be presented with the authors' analysis and interpretation of the data, but when provided with appropriate tools, they could be free to drill down into the detail underlying analyses in the text, tables, and figures. Access could likewise be provided to the full datasets underlying qualitative analyses. Tools to verify, modify, and rerun analyses could enable readers to examine questions of interest that were not covered in the original articles. Provision could be made for such secondary analyses to be accompanied with explanatory discussion and submitted for peer review and subsequent publication alongside the original article as a citable commentary or extension. Authors may, reasonably, want to have embargo periods where they have the sole right to prepare publications from the datasets they have collected, and there would be cases where some parts of a dataset would need to be withheld to protect participant privacy. In a context where funding is increasingly constrained, thus providing a competitive impetus, it may be a difficult time for researchers to contemplate even partially surrendering control over (their?) data. 
      However, academic research appears to be moving in the direction of data openness as the default starting point, and despite competitive pressures this may catalyze innovation and progress.

      Right Questions

      We cannot be naïve. Progress will not necessarily be smooth, and technology and other innovations are not going to rapidly deliver dramatic gains in rehabilitation outcomes. There is the risk we could always be looking for a solution that is just over the horizon, which could be a barrier to implementation of our current knowledge into practice today. Although developing increasingly sophisticated research tools and methodologies has considerable value, our primary focus should be to continue to ask the right questions. Such questions transcend any particular dataset, methodology, or technology of analysis or dissemination. Some of the questions that we must continue to strive to answer are the following: (1) What theories may explain the patterns that we draw out of our data? Similarly, how may our observations guide further development of those theories? (2) How will we ensure that we actually translate our research findings into practice, rather than merely outlining how they might relate to practice? (3) What matters to us, individually, as communities, and as a society? Does our rehabilitation assist the people we are working with to achieve this desired state as much as possible?

      Action Points

      First, open access to full machine-readable datasets is required for all future studies funded by significant U.S. federal government research agencies. Preparing datasets to meet data repository requirements from the outset of a study (eg, following a recognized metadata standard) will greatly simplify compliance.
      Second, researchers should consider voluntary deposit of their data into a relevant repository, even where they are not required to do so by funding agencies. Even a lengthy embargo period (eg, 10 years) would be preferable to data lost to reanalysis.
      Third, datasets should be managed mindful of the possibility that the researchers' preferred publication venue might require datasets to be deposited by the time they are ready to submit for peer review, in the same way that many journals now require clinical trials to be registered prior to data collection. (This is a general observation regarding changes in academic publishing practices. It should not be read as foreshadowing future Archives editorial policy.)
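      As one illustration of preparing a dataset for deposit from the outset of a study, the sketch below assembles a small machine-readable metadata record and computes the date an embargo would lift. It is a minimal example only: the field names loosely follow Dublin Core element names (title, creator, date, description, format, rights, available), the list of required fields is an assumption rather than the requirement of any specific repository, and the function names and values are hypothetical placeholders.

```python
import json
from datetime import date

# Assumed minimum set of fields; a real repository will publish its own list.
REQUIRED_FIELDS = ("title", "creator", "date", "description", "format", "rights")


def embargo_end(deposited_on: date, embargo_years: int) -> date:
    """Date on which the dataset becomes openly available."""
    target_year = deposited_on.year + embargo_years
    try:
        return deposited_on.replace(year=target_year)
    except ValueError:
        # 29 February deposited, but the target year is not a leap year.
        return deposited_on.replace(year=target_year, day=28)


def build_metadata(title, creator, deposited_on, description,
                   file_format, rights, embargo_years=0):
    """Build a metadata record and refuse to continue if a field is empty."""
    record = {
        "title": title,
        "creator": creator,
        "date": deposited_on.isoformat(),
        "description": description,
        "format": file_format,
        "rights": rights,
    }
    missing = [name for name in REQUIRED_FIELDS if not record[name]]
    if missing:
        raise ValueError(f"Missing metadata fields: {', '.join(missing)}")
    record["available"] = embargo_end(deposited_on, embargo_years).isoformat()
    return record


if __name__ == "__main__":
    # All values below are placeholders, not a real dataset.
    record = build_metadata(
        title="Placeholder study dataset",
        creator="Placeholder research team",
        deposited_on=date(2013, 12, 13),
        description="De-identified outcome measures (placeholder text).",
        file_format="text/csv",
        rights="To be set by the depositing team",
        embargo_years=10,
    )
    print(json.dumps(record, indent=2))
```

      Recording this information when the data are first collected, rather than reconstructing it at submission, is what makes later deposit straightforward, whatever standard the chosen repository actually requires.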

      Acknowledgment

      I thank Kath McPherson, PhD for her feedback on an initial version of this article.

      References

      1. The White House/President Barack Obama. Executive Order – Making open and machine readable the new default for government information. 2013. Available at: http://www.whitehouse.gov/the-press-office/2013/05/09/executive-order-making-open-and-machine-readable-new-default-government-. Accessed August 17, 2013.
      2. Burwell SM, VanRoekel S, Park T, Mancini DJ. Open data policy—managing information as an asset. Memorandum for the heads of executive departments and agencies. Washington (DC). 2013. Available at: http://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf. Accessed December 13, 2013.
      3. Strasser C. Closed data… excuses, excuses. Data Pub: California Digital Library Conversations about data Web site. 2013. Accessed December 13, 2013.
      4. Eichler HG, Abadie E, Breckenridge A, Leufkens H, Rasi G. Open clinical trial data for all? A view from regulators. PLoS Med. 2012; 9: e1001202.
      5. Organisation for Economic Co-Operation and Development. OECD principles and guidelines for access to research data from public funding. 2007. Accessed December 13, 2013.
      6. Databib. Available at: www.databib.org. Accessed December 13, 2013.
      7. Adams JA, Bonk SC. Electronic information technologies and resources: use by university faculty and faculty preferences for related library services. College and Research Libraries. 1995; 56: 119-131.
      8. Tenopir C, King DW, Edwards S, Wu L. Electronic journals and changes in scholarly article seeking and reading patterns. New Information Perspectives. 2009; 61: 5-32.
      9. Chan L, Costa S. Participation in the global knowledge commons: challenges and opportunities for research dissemination in developing countries. New Library World. 2005; 106: 141-163.
      10. Bell KR, Hammond F, Hart T, Bickett AK, Temkin NR, Dikmen S. Participant recruitment and retention in rehabilitation research. Am J Phys Med Rehabil. 2008; 87: 330-338.
      11. Dobkin BH, Dorsch A. The promise of mHealth: daily activity monitoring and outcome assessments by wearable sensors. Neurorehabil Neural Repair. 2011; 25: 788-798.
      12. Yates AJ. Delayed auditory feedback. Psychol Bull. 1963; 60: 213-232.
      13. Lambert MJ. Emerging methods for providing clinicians with timely feedback on treatment effectiveness: an introduction. J Clin Psychol. 2005; 61: 141-144.
      14. Tesio L. Measuring behaviours and perceptions: Rasch analysis as a tool for rehabilitation research. J Rehabil Med. 2003; 35: 105-115.
      15. Jette AM, Haley SM. Contemporary measurement techniques for rehabilitation outcomes assessment. J Rehabil Med. 2005; 37: 339-345.
      16. McPherson KM, Taylor WJ, Leplege A. Rehabilitation outcomes: values, methodologies and applications. Disabil Rehabil. 2010; 32: 961-964.
      17. Conneeley AL. Quality of life and traumatic brain injury: a one-year longitudinal qualitative study. Br J Occup Ther. 2003; 66: 440-446.
      18. Eilertsen G, Kirkevold M, Bjørk IT. Recovering from a stroke: a longitudinal, qualitative study of older Norwegian women. J Clin Nurs. 2010; 19: 2004-2013.
      19. Levack WM, Kayes NM, Fadyl JK. Experience of recovery and outcome following traumatic brain injury: a metasynthesis of qualitative research. Disabil Rehabil. 2010; 32: 986-999.
      20. Kersten P, Ellis-Hill C, McPherson KM, Harrington R. Beyond the RCT - understanding the relationship between interventions, individuals and outcome - the example of neurological rehabilitation. Disabil Rehabil. 2010; 32: 1028-1034.
      21. Model Systems Knowledge Translation Center. Available at: www.msktc.org. Accessed December 13, 2013.
      22. The University of Alabama at Birmingham. NSCISC National Spinal Cord Injury Statistical Center. Available at: www.nscisc.uab.edu. Accessed December 13, 2013.
      23. National Data and Statistical Center. Traumatic Brain Injury Model Systems. Available at: www.tbindsc.org. Accessed December 13, 2013.
      24. National Data and Statistical Center. For the Burn Model Systems. Available at: burndata.washington.edu. Accessed December 13, 2013.