Design and Implementation of Clinical Trials in Rehabilitation Research

      Abstract

      Hart T, Bagiella E. Design and implementation of clinical trials in rehabilitation research.
      The growth of evidence-based medicine means that both researchers and clinicians must grasp the complex issues involved in implementing clinical trials, which are especially challenging for the behavioral (experience-based) treatments that predominate in rehabilitation. In this article we discuss selected issues germane to the design, implementation, and analysis of group-level clinical trials in rehabilitation. We review strengths, weaknesses, and best applications of 1-sample, between-subjects, and within-subjects study designs, including newer models such as practical clinical trials and point-of-care trials. We also discuss the selection of appropriate control conditions against which to test rehabilitation treatments, as well as issues related to trial blinding. In a section on treatment definition, we discuss the challenges of specifying the active ingredients in the complex interventions that are widely used in rehabilitation, and present an illustration of 1 approach to defining treatments via the learning mechanisms that underlie them. Issues related to treatment implementation are also discussed, including therapist allocation and training, and assessment of treatment fidelity. Finally we consider 2 statistical topics of particular importance to many rehabilitation trials: the use of multiple or composite outcomes, and factors that must be weighed in estimating sample size for clinical trials.

      List of Abbreviations:

      CONSORT (Consolidated Standards of Reporting Trials), FDR (false discovery rate), PCT (practical clinical trial), RCT (randomized controlled trial), UC (usual care)
      AS EXPECTATIONS GROW for practitioners in rehabilitation to both develop and use evidence-based treatment methods, rehabilitation professionals must grasp the complicated issues involved in implementing clinical trials including selecting a design; defining, standardizing, and ensuring the faithful administration of treatments; and testing the effects of treatment using appropriate statistical methods. Researchers must understand the methods needed for proper implementation of treatment studies, and clinicians must comprehend the issues involved so that they may interpret and evaluate the evidence for implementation in practice. In this article we first review basic experimental designs and associated control conditions, with a focus on their application to rehabilitation research. We then turn to a discussion of several issues germane to research on experience-based (behavioral) treatments: treatment definition and treatment fidelity. While passive treatments such as medication and surgery have a definite place in rehabilitation, our main focus is on the distinct challenges posed by testing the complex behavioral treatments that predominate in rehabilitation. This discussion is followed by a brief section on multicenter trials and efficiency. Finally, we discuss the use of multiple/composite outcomes in rehabilitation trials, and issues surrounding sample size estimation.
      As no one article can address all of the methodologic issues in clinical trials, this article is necessarily selective. In addition to the focus on behavioral research noted above, we are concerned here with experimental treatment trials at the group level. The reader is referred to other works for discussion of single-subject designs and their variants,
      • Tate R.L.
      • McDonald S.
      • Perdices M.
      • Togher L.
      • Schultz R.
      • Savage S.
      Rating the methodological quality of single-subject designs and n-of-1 trials: introducing the Single-Case Experimental Design (SCED) Scale.
      • Backman C.L.
      • Harris S.R.
      • Chisholm J.A.
      • Monette A.D.
      Single-subject research in rehabilitation: a review of studies using AB, withdrawal, multiple baseline, and alternating treatments designs.
      observational treatment research,
      • Horn S.D.
      • Gassaway J.
      Practice based evidence: incorporating clinical heterogeneity and patient-reported outcomes for comparative effectiveness research.
      and quasi-experimental designs.
      • Shadish W.R.
      • Cook T.D.
      • Campbell D.T.
      Experimental and quasi-experimental designs for generalized causal inference.

      Experimental Designs for Rehabilitation Trials

      Clinical trials seek to prove the efficacy of treatments in the clinical laboratory, and/or to show their effectiveness in clinical practice. Treatments are compared with one another, or with control conditions, in between-subjects designs (each subject receives only 1 condition) or within-subjects designs (each subject receives all conditions). Both classic and more recent texts summarize these designs and their strengths and limitations, primarily with regard to trade-offs between internal and external validity.
      • Shadish W.R.
      • Cook T.D.
      • Campbell D.T.
      Experimental and quasi-experimental designs for generalized causal inference.
      • Campbell D.T.
      • Stanley J.C.
      Experimental and quasi-experimental designs for research.
      • Portney L.G.
      • Watkins M.P.
      Foundations of clinical research: applications to practice.
      Some of the designs that apply to rehabilitation research, including those using 1 treatment condition, multiple conditions compared between subjects, and multiple conditions compared within subjects, are summarized in table 1.
      Table 1. Strengths, Limitations, and Applications of Experimental Designs

      1-Sample Designs

      Pre-post: 1 sample receives treatment, with testing before and after
      Strengths:
      • Simple
      • Quick and inexpensive
      Limitations:
      • Weak internal validity: threatened by maturation, expectancy, etc.
      Applications:
      • Early protocol development
      • Proof of concept
      • Safety/feasibility assessment
      • Estimation of direction and magnitude of effects
      • Best for chronic (stable) samples

      Historical control: sample receiving current treatment is compared with a prior sample who did not
      Strengths:
      • Faster/less expensive/smaller sample than concurrent control
      • Facilitates recruitment
      Limitations:
      • New and old samples may not be comparable
      • Old sample may not be well characterized
      • Internal validity threatened by historical shifts affecting cohorts
      Applications:
      • Concurrent control is infeasible for ethical or administrative reasons
      • Best when high-quality data on historic controls are available

      Futility study: treatment effects in 1 sample are compared with preset criteria
      Strengths:
      • Efficient way to screen worth of newer treatments
      Limitations:
      • Internal validity not protected; may lead to false positives
      • Efficacy not tested
      Applications:
      • Early stage of treatment development

      Between-Subjects Designs

      RCT, including variants, eg, dose control, additive, and dismantling designs: comparable participants are randomly allocated to 2 or more conditions
      Strengths:
      • Criterion standard for rigorous test of internal validity especially where efficacy is the primary question
      • Multiarm designs can gain efficiency and economy by sharing controls
      Limitations:
      • Weak external validity if sample highly selected
      • Costly and difficult to manage
      • Difficulty constructing appropriate control groups for complex treatments
      • Some control conditions may deter participants
      Applications:
      • Can answer questions about mechanisms of treatment effects
      • Testing of efficacy of treatments and/or their components compared with:
      • • no treatment or sham treatment (if no standard of care)
      • • UC or established treatment, if it exists
      • • different doses or variations in treatments known to be efficacious
      • Best for focused/circumscribed treatments and for those with preliminary evidence of efficacy

      Cluster randomized design: unit of randomization is group or facility rather than individual
      Strengths:
      • Homogeneity of treatment delivery within clusters
      • Minimal cross-contamination of different treatments
      Limitations:
      • May be inefficient, especially if treatment response varies among clusters
      • May be subject to recruitment bias if recruiter is aware of cluster allocation to treatment
      Applications:
      • Treatment naturally delivered in groups or across a facility, program, or unit

      PCT: treatments compared under real-world conditions
      Strengths:
      • High external validity
      • Can examine broad range of outcomes
      Limitations:
      • Internal validity may be low due to sample variability, lack of rigor, uncontrolled attrition
      • Resource-intensive due to large sample, long follow-up
      Applications:
      • Treatment effectiveness and policy questions (risks, costs, benefits)
      • Comparisons of proven treatments to one another

      Within-Subjects Designs

      AB design: participants exposed to 2 conditions in uniform sequence
      Strengths:
      • Simple and low-cost
      • Can provide high power
      Limitations:
      • Internal validity threatened by time-related factors (may be offset by using different lengths of A phase, randomly assigned)
      Applications:
      • Treatments that are expected to lead to permanent change

      Cross-over design: participants exposed to 2 conditions, randomized to a sequence (AB, BA)
      Strengths:
      • High power/high efficiency
      Limitations:
      • Internal validity threatened by carry-over effects
      Applications:
      • Treatments that are expected to be at least partly reversible

      A main consideration in selecting a design, or evaluating the appropriateness of a design in published research, is the match to the phase of development of the treatment in question.
      • Whyte J.
      • Gordon W.
      • Gonzalez Rothi L.J.
      A phased developmental approach to rehabilitation research: the science of knowledge building.
      • Whyte J.
      • Barrett A.
      Advancing the evidence base of rehabilitation treatments: a developmental approach.
      • Gonzalez-Rothi L.J.
      Cognitive rehabilitation: the role of theoretical rationales and respect for the maturational process needed for our evidence.
      In early stages of development, a 1-sample design such as a pre- and posttest study may be warranted for proof-of-concept, to establish feasibility, and to ensure that effects of a novel treatment are in the desired direction.
      • Hart T.
      • Vaccaro M.
      • Hays C.
      • Maiuro R.
      Anger self-management training for people with traumatic brain injury: a preliminary investigation.
      Another type of early-stage design is afforded by a futility study,
      • Levin B.
      The utility of futility.
      which is designed to determine efficiently whether a new intervention has potential to be carried to further research. Futility designs may involve more than 1 treatment, but the simplest type is a 1-sample design that compares outcomes of treated participants with previously known outcomes, such as the proportion of people with a given problem who achieve a favorable outcome without treatment. The research question focuses on lack of efficacy, reversing the logical status of the null and alternative hypotheses as they are typically formulated in studies seeking to show treatment efficacy. The risk of type I errors (falsely rejecting a promising treatment) is minimized, and type II errors (accepting an ineffective treatment for more rigorous study) are considered to be of less concern. Futility designs have been used little in rehabilitation research, but may provide an efficient way of selecting treatments for studies with more rigorous (and expensive) designs. For example, Palesch et al
      • Palesch Y.Y.
      • Tilley B.C.
      • Sackett D.L.
      • Johnston K.C.
      • Woolson R.
      Applying a phase II futility study design to therapeutic stroke trials.
      used data from completed phase III studies of ischemic stroke treatments to simulate the results that would have been obtained had the trials been designed as futility studies. They found that trials of all 3 treatments that were found to be ineffective would have ended sooner using a futility design.
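      The reversed hypotheses of a 1-sample futility design can be made concrete with a short sketch. The code below is illustrative only and is not taken from the studies cited above: the function names, the exact binomial test, the 1-sided alpha of .10, and the example numbers are all our assumptions. The null hypothesis states that the treatment improves on the known untreated success rate by at least a minimally worthwhile margin; the treatment is declared futile only if the observed number of successes is improbably low under that hypothesis.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def futility_test(successes, n, historical_rate, min_gain, alpha=0.10):
    """One-sample futility test (hypothetical helper, not a published method).

    H0: true success rate >= historical_rate + min_gain  (treatment is promising)
    H1: true success rate <  historical_rate + min_gain  (treatment is futile)

    Rejecting H0 means the treatment is dropped, so a type I error here is
    the false rejection of a promising treatment.
    """
    target = historical_rate + min_gain
    p_value = binom_cdf(successes, n, target)  # lower-tail probability under H0
    return p_value, p_value <= alpha


# Assumed example: 25% of untreated people do well, and a 10-point gain
# is the smallest improvement worth pursuing in a larger trial.
p_value, futile = futility_test(successes=8, n=40, historical_rate=0.25, min_gain=0.10)
print(f"p = {p_value:.3f}; declare futile: {futile}")
```

      Because only 1 treated sample is needed and the comparison rate is fixed in advance, the design is economical, but its conclusion is only as good as the assumed historical rate.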
      Another 1-sample design compares outcomes of a current treatment sample with those of a past (hence historical) reference group who have been treated differently, or not at all. This design may be used in situations where a concurrent control group is thought to be unfeasible; for example, Malec et al
      • Malec J.
      • Buffington A.
      • Moessner A.
      • Degiorgio L.
      A medical/vocational case coordination system for persons with brain injury: an evaluation of employment outcomes.
      • Malec J.
      • High W.
      • Sander A.
      • Struchen M.
      • Hart K.
      Vocational rehabilitation.
      used historical return to work rates after traumatic brain injury and benchmark rates from treatment outcomes reported in previous studies to argue that an early intervention program designed to prevent vocational failure was more efficacious than no treatment. While these designs may provide useful information, they are subject to serious threats to internal validity due to the likelihood that the current and historical participants, and the experiences to which they are exposed, differ in important ways that are not adequately measured. These may include changes in diagnostic criteria due to improved technology, changes in funding which alter the available pool of participants, or changes in facility management that result in treatment delivery that is more or less effective, or more or less comprehensive, compared with the current scenario.
      In rehabilitation research generally, the chief threats to internal validity include maturation effects, which are changes that occur over time simultaneous with a treatment, for example, spontaneous recovery or healing, and testing effects, including practice effects in which performance on a test improves due to repeated testing.
      • Park N.W.
      • Ingles J.L.
      Effectiveness of attention rehabilitation after an acquired brain injury: a meta-analysis.
      These threats are best handled by using a control or comparison group that is of comparable acuity and receives the identical testing schedule. The researcher must consider carefully whether the treated group would be expected to change without treatment; unless one is attempting to change the rate of recovery, it is often best to seek participants with stable levels of the target problem. Instrumentation effects should be considered in any protracted treatment trial, because they include the subtle effects that may accrue when trial staff becomes more expert in evaluating the outcomes under study, or when staff turnover leads to variability in such expertise. Even more importantly for an intervention study, internal validity may also be threatened by changes in expertise among treatment staff over the duration of the trial. These effects may be minimized by criterion-based staff training and uniform supervision. Regression to the mean is a notoriously troublesome problem for trials in which participants are selected for extreme scores on some measures of a particular problem. On average the subsequent scores will be less extreme, which can mimic or exaggerate treatment effects. Like many other threats to internal validity, this can be handled by using a between-subjects design with an appropriate control group (see table 1).
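      Regression to the mean is easy to demonstrate by simulation. The following sketch is ours, with invented score distributions and an arbitrary enrollment cutoff. Each simulated person has a stable true score, both test occasions add independent measurement error, no treatment is given, and only people with extreme baseline scores are "enrolled."

```python
import random
from statistics import mean


def simulate_regression_to_mean(n=20000, cutoff=65.0, seed=0):
    """Return (mean baseline, mean retest) for people selected on extreme baseline scores.

    All distribution parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    baseline_selected, retest_selected = [], []
    for _ in range(n):
        true_score = rng.gauss(50, 8)            # stable underlying severity
        baseline = true_score + rng.gauss(0, 6)  # observed score = true score + error
        retest = true_score + rng.gauss(0, 6)    # no treatment between the 2 tests
        if baseline >= cutoff:                   # enroll only extreme scorers
            baseline_selected.append(baseline)
            retest_selected.append(retest)
    return mean(baseline_selected), mean(retest_selected)


before, after = simulate_regression_to_mean()
print(f"selected group: baseline mean {before:.1f}, retest mean {after:.1f}")
```

      The retest mean of the selected group falls back toward the population mean even though nothing was done, which is the pattern that a 1-sample pre-post design could mistake for a treatment effect. A randomized control group selected by the same cutoff would show the same drop, allowing it to be separated from any true effect.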
      The randomized controlled trial (RCT) is widely considered the criterion standard for establishing treatment efficacy.
      • Friedman L.M.
      • Furberg C.D.
      • DeMets D.L.
      Fundamentals of clinical trials.
      The most important characteristic of an RCT is that all eligible study participants have the same chance of receiving each treatment, which reduces biases related to selection factors such as disease severity or history of previous treatment. As the sample size increases, randomization tends to produce groups that are comparable on both measured and unknown factors that may affect prognosis or treatment response. However, pretreatment group equivalence must still be confirmed prior to analyzing the results.
      As randomization is the cornerstone of the RCT and responsible for its advantages, a brief discussion of the schemes by which it is effected is in order. There are 2 main types of randomization, fixed and adaptive. With fixed randomization a list of allocation to treatment groups is created at the beginning of the study, and the probability of being assigned to each group remains the same throughout. With adaptive randomization, a fixed list is not created ahead of time; instead, an algorithm changes the allocation probabilities as the study progresses, to favor the more beneficial treatment or to balance the experimental groups. Another common randomization technique is blocked randomization. With this technique, treatment allocation blocks are created in such a way that the same number of study participants is assigned to each treatment within a block. For example, in a randomization list for a study with 2 treatments and blocks of 4, every 4 patients will include 2 assigned to treatment A and 2 to treatment B. To make it harder for study personnel to guess upcoming assignments, larger blocks (6–8) are generally used; sometimes, different block sizes are combined in the allocation list in random order.
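      Blocked randomization as described above can be sketched in a few lines. This is a minimal illustration rather than a validated trial tool; the function name, the default block sizes, and the seed are our assumptions. Each block contains every arm equally often, the order within a block is shuffled, and block sizes are drawn at random so that the end of a block is harder to anticipate.

```python
import random


def blocked_allocation(n_participants, arms=("A", "B"), block_sizes=(4, 6, 8), seed=None):
    """Build an allocation list from permuted blocks of randomly chosen size."""
    rng = random.Random(seed)
    # Each block size must be a multiple of the number of arms so that
    # every arm appears equally often within a block.
    for size in block_sizes:
        if size % len(arms) != 0:
            raise ValueError("block size must be a multiple of the number of arms")
    allocation = []
    while len(allocation) < n_participants:
        size = rng.choice(block_sizes)
        block = list(arms) * (size // len(arms))
        rng.shuffle(block)  # random order within the block
        allocation.extend(block)
    # The last block may be cut short, so balance is exact only at block boundaries.
    return allocation[:n_participants]


# Blocks of 4 with 2 treatments: every 4 consecutive patients include 2 A and 2 B.
print(blocked_allocation(12, block_sizes=(4,), seed=1))
```

      In practice the list would be generated and held by someone independent of recruitment, so that upcoming assignments stay concealed from study personnel.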
      Randomization may also be stratified by important covariates (eg, severity of disability or clinical site) to ensure that an equal number of patients is assigned to each of the treatment arms in all strata. A different randomization list is then created for each stratum. Stratified randomization is especially useful when some of the strata contain a small number of patients. In this case, simple randomization may lead to imbalances in treatment group allocation, increasing the chances that treatment effects will be confounded with baseline factors that affect outcomes. In the absence of strata, post hoc analysis may help to remove the influence of confounding factors, but this is adequate only if such factors are known and measured well.
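      Stratified randomization amounts to keeping a separate allocation list for each stratum. The sketch below is a hypothetical illustration (the function names, the stratum labels, and the choice of blocks of 4 are ours) that keeps a separate sequence of permuted blocks per stratum, so that even a stratum with few patients stays close to balanced.

```python
import random
from collections import defaultdict


def stratified_assigner(arms=("A", "B"), block_size=4, seed=None):
    """Return a function that assigns each new participant within his or her stratum."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    pending = defaultdict(list)  # stratum -> unused assignments in its current block

    def assign(stratum):
        if not pending[stratum]:
            # Start a new permuted block for this stratum only.
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            pending[stratum] = block
        return pending[stratum].pop()

    return assign


assign = stratified_assigner(seed=3)
# Strata here combine clinical site and severity of disability (labels are made up).
for stratum in [("site 1", "severe"), ("site 1", "mild"), ("site 1", "severe"), ("site 2", "mild")]:
    print(stratum, assign(stratum))
```

      The number of strata should stay small relative to the sample size; with many sparsely filled strata, most blocks are never completed and the intended balance is lost.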
      The rigor that accompanies RCTs also brings disadvantages, including complexity and expense. Trials tightly focused on efficacy tend to be weak in external validity: the extent to which the active ingredients of the treatment remain active under the diverse conditions encountered in clinical reality. Unfortunately, current trends emphasize evidence from RCTs to the exclusion of other study designs (eg, in systematic reviews), and investigators in rehabilitation research feel increasing pressure to force their studies into an RCT framework in order to obtain funding. While the strengths of the RCT are undeniable, these trends run the risk of rejecting other designs that could provide evidence bearing on different phases of research,
      • Whyte J.
      • Gordon W.
      • Gonzalez Rothi L.J.
      A phased developmental approach to rehabilitation research: the science of knowledge building.
      • Whyte J.
      • Barrett A.
      Advancing the evidence base of rehabilitation treatments: a developmental approach.
      or that are more advantageous for testing clinical effectiveness.
      A cluster randomized design, also called a group randomized design, is a special case of an RCT in which the unit of randomization is not the individual patient, but a group of patients who all receive the same treatment. This literally may be an intervention administered in a group, or the cluster may refer to an entire unit or facility that has been randomized to a treatment arm. This design is appropriate for treatments that must be administered to groups, and treatments that could not realistically be coadministered in the same facility or by the same team. For example, Merom et al
      • Merom D.
      • Phongsavan P.
      • Wagner R.
      • et al.
      Promoting walking as an adjunct intervention to group cognitive behavioural therapy for anxiety disorders–a pilot group randomized trial.
      randomized 11 anxiety treatment groups to standard cognitive behavior intervention plus training to promote home-based brisk walking, or standard treatment plus education about healthy eating. (While both conditions were associated with clinical improvement, as expected, the additional exercise intervention led to higher levels of improvement for certain types of patients.) It should be noted that cluster randomized trials may be prone to biases that do not affect well-implemented RCTs that randomize at the level of individual patients. In a cluster randomized trial, the allocation of the patient is often known ahead of time, because all patients in a group or facility are randomized to the same condition. This advance knowledge may sway recruiters' decisions about which patients to invite into the trial.
      • Brierley G.
      • Brabyn S.
      • Torgerson D.
      • Watson J.
      Bias in recruitment to cluster randomized trials: a review of recent publications.
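      The potential inefficiency of cluster randomization can be quantified with the standard design effect, which is not discussed in the text above and is introduced here only as an illustration. Assuming equal cluster sizes, the variance of the treatment estimate is inflated by a factor of 1 + (m - 1) × ICC, where m is the number of participants per cluster and ICC is the intraclass correlation of the outcome within clusters. The function names and example values below are our assumptions.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters instead of individuals.

    Assumes equal cluster sizes; icc is the intraclass correlation (0 to 1).
    """
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of individually randomized participants carrying the same information."""
    return n_total / design_effect(cluster_size, icc)


# Assumed example: 11 therapy groups of 10 participants each, ICC of 0.05.
print(round(design_effect(10, 0.05), 2))
print(round(effective_sample_size(110, 10, 0.05), 1))
```

      The more similar the responses within a cluster (the higher the ICC), the less independent information each added member of that cluster contributes.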
      Several designs are available that strike a balance between the rigor of an RCT and the need for demonstrating real-world effectiveness. In practical clinical trials (PCTs), the hypothesis and study design are developed specifically to answer the questions faced by decision makers, such as questions about the relative risks, costs, and benefits in practice of various treatments.
      • Tunis S.R.
      • Stryer D.B.
      • Clancy C.M.
      Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy.
      PCTs compare clinically relevant alternative interventions that may be widespread in practice, using more diverse samples than an RCT and more distal measures of outcome (eg, satisfaction with life). PCTs may also be less focused on treatment adherence than an RCT, in consideration of the variability on this factor found in the real world of clinical care. For example, Cicerone et al
      • Cicerone K.D.
      • Mott T.
      • Azulay J.
      • et al.
      A randomized controlled trial of holistic neuropsychologic rehabilitation after traumatic brain injury.
      compared the effects on community integration of comprehensive, holistic rehabilitation to standard multidisciplinary day treatment in chronic traumatic brain injury. Despite considerable variability in the sample, which was drawn from multiple community sources, modest superiority was shown for the holistic intervention, which included interventions targeting self-regulation of emotion and cognition in addition to more standard day-treatment goals. The psychiatric rehabilitation literature is replete with examples of PCTs that, taken together, provide convincing evidence of the superiority of supported employment over a wide variety of traditional employment readiness services for people with serious mental illness.
      • Bond G.R.
      Supported employment: evidence for an evidence-based practice.
      Although PCTs are less tightly controlled than RCTs, they can be conducted with a high degree of rigor while also providing very strong external validity.
      A novel research design that seeks to integrate research with clinical realities is called the point-of-care clinical trial. In this design, a comparison of 2 reasonable approaches to a target problem is embedded in a clinical setting that can readily deliver either approach. Providers refer patients to be approached for consent, after which they are randomized to 1 or another approach; outcomes are tracked using routine clinical measures. The clinical system is expected to adopt the approach, if any, which over time proves to be superior. Fiore et al
      • Fiore L.D.
      • Brophy M.
      • Ferguson R.E.
      • et al.
      A point-of-care clinical trial comparing insulin administered using a sliding scale versus a weight-based regimen.
      outline the best applications of this design, which include trials using objective outcomes (because there is no attempt to mask patients or treaters to group assignment) and those in which the treatments may be delivered as part of standard care. In these situations, point-of-care trials may prove to be more efficient than traditional RCTs because the treatments and outcome measurements are part of routine care and the translation of research findings into clinical care is built in.
      As shown in table 1, within-subjects designs, in which each participant is exposed sequentially to every condition, can be very powerful and efficient because each participant serves as his or her own control. In particular, cross-over designs in which study participants are randomized to different sequences of treatment phase (eg, AB, BA) can be very efficient if properly conducted. The effects of time (eg, maturation or testing) as well as treatment sequence must always be considered as potential confounds in these trials. When the experimenter can rule out carry-over effects (influence of first-phase treatments on second-phase outcomes), the analysis may proceed by pooling data collected during the same treatment conditions and comparing them using a test for paired samples. A good example of a rehabilitation trial using a crossover design is provided by the study testing the NeuroPage, an electronic reminding device for people with acquired brain injury.
      • Wilson B.A.
      • Emslie H.C.
      • Quirk K.
      • Evans J.J.
      Reducing everyday memory and planning problems by means of a paging system: a randomised control crossover study.
      Participants were randomized to a phase in which they used the pager to accomplish intended tasks, followed by a phase in which task accomplishment was measured without the pager, or to the reverse order of phases. Carry-over effects were present for some participants who received the pager before the no-pager condition, but the overall comparison of conditions remained statistically significant and convincing. In fact, memory for intended tasks was so superior with versus without the device that it was accepted for funding by the National Health Service in the United Kingdom. Notably, this study also resembled a PCT in the deliberate inclusion of very diverse participants, including some who had been treatment failures in clinical settings.
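      When carry-over effects can be ruled out, the pooled paired comparison for a cross-over trial is straightforward to compute. The sketch below uses invented scores and hypothetical function names; it is not the analysis used in the study described above. Scores are first regrouped by condition regardless of sequence, and the paired-samples t statistic is then computed from the within-person differences.

```python
from math import sqrt
from statistics import mean, stdev


def pool_by_condition(records):
    """records: (sequence, period_1_score, period_2_score), with sequence 'AB' or 'BA'."""
    a_scores, b_scores = [], []
    for sequence, first, second in records:
        if sequence == "AB":
            a_scores.append(first)
            b_scores.append(second)
        elif sequence == "BA":
            a_scores.append(second)
            b_scores.append(first)
        else:
            raise ValueError(f"unknown sequence: {sequence}")
    return a_scores, b_scores


def paired_t(a_scores, b_scores):
    """Paired-samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(a_scores, b_scores)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Invented data: tasks completed per week under condition A and condition B.
records = [("AB", 8, 5), ("BA", 4, 9), ("AB", 7, 6), ("BA", 5, 8), ("AB", 9, 5), ("BA", 6, 8)]
a_scores, b_scores = pool_by_condition(records)
t, df = paired_t(a_scores, b_scores)
print(f"t({df}) = {t:.2f}")
```

      This pooled analysis ignores period and sequence. If carry-over is plausible, the 2 sequences should be examined separately before the periods are pooled.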
      It is important to note that this discussion of trial designs is not exhaustive. Many variations are possible and designs may also be combined, as resources permit, to take advantage of their differing strengths. For example, an RCT may be followed by a cross-over phase so that every participant has the chance to receive every treatment. This can be especially helpful in making the trial more palatable if it contains a control condition that consists of no treatment or a weak treatment (see the section Control and Comparison Conditions for Rehabilitation Trials below). As another example, factorial designs provide a cost-effective way of conducting 2 or more trials, each with its own primary outcome, in 1 group of participants. In a full factorial design, patients are randomized to arms that represent all possible combinations of 2 or more experimental treatments of interest. This design affords information on the average effects of all conditions, and allows inferences about conditions interacting with one another as well as in additive combinations. When efficiency is the primary reason for conducting a factorial study, it is assumed that the various treatments do not interact. This assumption may be tenuous for rehabilitation trials, inasmuch as multiple treatments might be selected for their effects on the same outcomes. Factorial designs are generally more useful for efficiency in prevention trials, in which the outcomes being studied are different for the different treatments.
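      The structure of a full factorial design, and the no-interaction assumption on which its efficiency rests, can be illustrated with a 2 × 2 example. The treatment names and cell means below are invented, and the helper functions are hypothetical. The first function lists every combination of treatments as an arm. The second computes each treatment's average (main) effect across the levels of the other treatment, along with the interaction, which should be near 0 if the treatments act additively.

```python
from itertools import product


def factorial_arms(factors):
    """List every combination of factor levels as one trial arm."""
    names = list(factors)
    levels = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*levels)]


def two_by_two_effects(cell_means):
    """cell_means maps (a, b), each 0 or 1, to the mean outcome in that arm."""
    effect_a_without_b = cell_means[(1, 0)] - cell_means[(0, 0)]
    effect_a_with_b = cell_means[(1, 1)] - cell_means[(0, 1)]
    effect_b_without_a = cell_means[(0, 1)] - cell_means[(0, 0)]
    effect_b_with_a = cell_means[(1, 1)] - cell_means[(1, 0)]
    main_a = (effect_a_without_b + effect_a_with_b) / 2
    main_b = (effect_b_without_a + effect_b_with_a) / 2
    interaction = effect_a_with_b - effect_a_without_b  # 0 if purely additive
    return main_a, main_b, interaction


arms = factorial_arms({"treatment_a": (0, 1), "treatment_b": (0, 1)})
print(arms)  # 4 arms: neither, A only, B only, both

# Invented cell means in which the 2 treatments combine additively.
print(two_by_two_effects({(0, 0): 10, (1, 0): 14, (0, 1): 12, (1, 1): 16}))
```

      When the interaction is not 0, the main effects no longer describe what either treatment does on its own, which is why the efficiency argument for factorial designs depends on the treatments not interacting.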

      Control and Comparison Conditions for Rehabilitation Trials

      In certain within-subjects trials and in most RCTs, investigators use a concurrent control condition against which to evaluate the experimental treatment. The complex issues surrounding the selection or design of control conditions appropriate to behaviorally based treatments have been discussed in articles dedicated to the subject
      • Hart T.
      • Fann J.
      • Novack T.
      The dilemma of the control condition in experience-based cognitive and behavioral treatment research.
      • Mohr D.C.
      • Spring B.
      • Freedland K.E.
      • et al.
      The selection and design of control conditions for randomized controlled trials of psychological interventions.
      • Barkauskas V.H.
      • Lusk S.L.
      • Eakin B.L.
      Selecting control interventions for clinical outcome studies.
      • Whitehead W.E.
      Control groups appropriate for behavioral interventions.
      • Saks E.
      • Jeste D.V.
      • Granholm E.
      • Palmer B.W.
      • Schneiderman L.
      Ethical issues in psychosocial interventions research involving controls.
      and will be discussed only briefly here. Selecting a control constitutes a vexing problem for many rehabilitation trials because of the near impossibility of creating a true placebo, as is readily found in medication trials. A placebo by definition is outwardly identical to the experimental (active) substance, which allows for double-blinding, yet it is inert (contains no active ingredients), and thereby does neither good nor harm. A moment's reflection will confirm that these conditions are impossible to satisfy for the majority of rehabilitation trials. Rehabilitation generally consists of volitional learning-based experiences, for example, training and practice episodes or the induction of new knowledge, rather than passive manipulations or substances ingested into the body. With few exceptions, neither participants nor treaters may be blinded to the contents of those experiences. It is even more difficult to conceive of experience-based treatments that are truly inert, cause no harm, and deliver none of the active ingredients of an experimental treatment, while controlling for the many confounding variables that threaten internal validity. Paterson and Dieppe
      • Paterson C.
      • Dieppe P.
      Characteristic and incidental (placebo) effects in complex interventions such as acupuncture.
      describe the difficulty well: in order to create a placebo, one must divide a treatment into the effects that are specific or integral to that treatment, and effects that are incidental. But effects that are incidental to 1 treatment may be integral to another, depending on the theory underlying the treatment. When this important principle is forgotten, a common mistake is to assume that all effects of interpersonal interaction (eg, therapist warmth, attention, ability to engage the patient) are incidental effects that must be controlled in placebo fashion, despite the impossibility of doing so.
      It should be noted that, despite the concerns about double-blinding as regards the participant and treater, the person who assesses the outcomes of a trial can and should be masked to the participants' group assignment. This type of blinding is not difficult to attain in rehabilitation trials, using basic precautions to keep evaluators and treaters from revealing information to one another. Specific instructions should be given to participants to prevent them from revealing treatment allocation information to evaluators. Outcome assessments that depend on observer ratings may be amenable to additional blinding procedures, such as audio- or videotaping the outcome sessions and keeping the raters blind to whether the sessions were conducted before or after treatment. It is critically important to maintain outcome assessment blinding in trials that do not use objective measures, which includes the majority of rehabilitation trials. For example, Wood et al
      • Wood L.
      • Egger M.
      • Gluud L.L.
      • et al.
      Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study.
      showed that incomplete blinding was much more strongly associated with biases in favor of exaggerated treatment effects in health care studies that used patient-reported outcomes or other subjective measures compared with those using completely objective outcomes, such as mortality or blood levels of particular substances.
      An increasingly common control condition used in rehabilitation trials is the deferred treatment or waitlist control. Theoretically, this should control for the effects of patients' and treaters' expectancy that the patient will improve. However, research has shown that patients in waitlist groups may do worse than those who neither receive nor anticipate treatment.
      • Mohr D.C.
      • Spring B.
      • Freedland K.E.
      • et al.
      The selection and design of control conditions for randomized controlled trials of psychological interventions.
      Another option is to create a sham treatment (also referred to as attention control) that superficially resembles the condition thought to be active and includes potentially beneficial ingredients such as therapist interest and social contact. For example, in a trial comparing the effects of different doses of mental rehearsal on motor function in stroke, Page et al
      • Page S.J.
      • Dunning K.
      • Hermann V.
      • Leonard A.
      • Levine P.
      Longer versus shorter mental practice sessions for affected upper extremity movement after stroke: a randomized controlled trial.
      created a sham treatment that involved many components of the active treatment (listening to tapes, thinking about stroke and its effects) without providing rehearsal of movement. For both ethical and practical reasons, sham treatments are best used for brief interventions and those whose mechanisms of action are not thought to be highly dependent on the therapist-patient relationship.
      A final comparison condition of particular relevance to rehabilitation trials is comparison of an experimental treatment to usual care (UC) (or treatment-as-usual). The main limitations here are that (1) there may not exist any UC for the target problem, and (2) UC may be too variable to characterize adequately in a study. When clinical variation exists, experimenters may create a standardized version of typical care for the problem in an approach that has been termed devised UC.
      • Barkauskas V.H.
      • Lusk S.L.
      • Eakin B.L.
      Selecting control interventions for clinical outcome studies.
      This helps to reduce noise in the control condition but raises the question as to whether the comparison sheds any light on the comparative efficacy of real treatments. In yet another type of UC comparison model, an experimental treatment is added to UC for some participants, but not others, in order to determine if the new treatment has any value over standard of care. Again, this model is most appropriate and meaningful for problems for which there is a well-defined standard of care.
      Regardless of the control condition selected, the treatment actually received by all participants should be carefully measured during the trial. Participants assigned to receive less or delayed treatment may seek alternative care outside the trial, and may even receive the experimental treatment elsewhere. Another important consideration is that all treatment arms should be presented as equally credible, from the recruitment and consent documents to the therapist training and supervision process, in an effort to offset unequal participant or treater enthusiasm for a particular condition. It has been shown that simply knowing whether one has been assigned to an experimental or a control treatment alters self-reported outcomes.
      • Whitehead W.E.
      Control groups appropriate for behavioral interventions.

      Fregni F, Imamura M, Chien HF, et al. Challenges and recommendations for placebo controls in randomized trials in physical and rehabilitation medicine: a report of the international placebo symposium working group. Am J Phys Med Rehabil;89:160-72.

      Therapists may also introduce biases if they perceive that 1 treatment arm is less well developed, standardized, or checked than another. In an ideal case, neither therapist nor participant would know which of the treatments was experimental and which was the control.
      • Mohr D.C.
      • Spring B.
      • Freedland K.E.
      • et al.
      The selection and design of control conditions for randomized controlled trials of psychological interventions.
      In any case, both participants and treaters should be asked to rate the credibility of the arm they experienced and the degree to which they felt the assigned treatment was relevant to the patient's concerns.
      • Whitehead W.E.
      Control groups appropriate for behavioral interventions.

      Treatment Definition

      Treatment definition is a crucial first step in developing a rehabilitation trial. Defining a treatment means describing its known or hypothesized active ingredients and the specific ways that these ingredients are conveyed from treater to patient.
      • Hart T.
      Treatment definition in complex rehabilitation interventions.
      This is challenging because many treatments used in rehabilitation are multifaceted, intending to have an impact on multiple targets or goals. Even interventions with circumscribed targets often have the added complexity of requiring active engagement and effort on the part of the patient, in contrast to passive treatments such as medication or surgery. The majority of rehabilitation treatments also depend in part on the interpersonal relationship between patient and therapist. To add yet another layer of complexity, the collective behavior of the rehabilitation team affects outcomes in ways that are just beginning to be understood.
      • Strasser D.C.
      • Falconer J.A.
      • Herrin J.S.
      • Bowen S.E.
      • Stevens A.B.
      • Uomoto J.
      Team functioning and patient outcomes in stroke rehabilitation.
      • Strasser D.C.
      • Falconer J.A.
      • Stevens A.B.
      • et al.
      Team training and stroke rehabilitation outcomes: a cluster randomized trial.
      Rehabilitation lacks both a common language by which to express these complex ingredients, and a unifying theory to help identify which are most important.
      • Whyte J.
      A grand unified theory of rehabilitation (we wish!) The 57th John Stanley Coulter Memorial Lecture.
      Moreover, rehabilitation has generally espoused individualized treatments, in which the goals of each patient are paramount.
      These complexities make it difficult for rehabilitation researchers to specify active ingredients in the treatment to be tested, which would ideally come about by linking the treatment to 1 or more theories explaining the expected change. A narrowly focused treatment may be defined according to hypotheses about mechanisms of action known to individual disciplines, for example, theories of motor control that guide physical therapy. More complex or interdisciplinary experience-based treatments may need to borrow from or combine theories that cut across disciplines to account for changes in behavior, knowledge or skill via learning mechanisms,
      • Hart T.
      • Powell J.M.
      Principles of learning in TBI rehabilitation.
      or changes in motivation and effort.
      • Hart T.
      • Evans J.
      Self-regulation and goal theories in brain injury rehabilitation.
      However, this is a challenging task due both to the multiplicity of theories relevant to change in behavior, knowledge, attitudes, and habits, and to the potential difficulty in applying these theories to varied patient populations, some of which have primary deficits in the functions that underlie learning.
      Table 2 illustrates 1 approach to specifying active ingredients in rehabilitation treatments according to selected constructs in learning theory. To show the variety of concepts relevant to learning, table 2 displays 2 types of learning (skill-based and knowledge-based) that require volition and active engagement on the part of the participant, and another 2 (habituation and classical conditioning) that do not. For each of these, examples of rehabilitation treatments are presented whose effects could be explained at least in part by specific learning mechanisms. Finally, therapeutic operations are specified that would be expected to enact the theoretical mechanism in clinical practice. Note that rather than specifying all of the details of treatment, theoretical mechanisms of action specify the most important factors to consider in treatment design. For example, for treatments based on habituation, a passive or automatic form of learning that is crucial to the ability to filter environmental information, 2 important factors to specify would be stimulus duration and interstimulus interval. In contrast, interventions based on skill training and practice comprise a very broad class of volitional treatments encompassing a nearly infinite variety of intervention targets (gait, cooking, mood management, vocational performance, etc). The important treatment characteristics for this group are more diverse and include such factors as instructional/motivational set, cueing and coaching parameters, feedback and reinforcement, and methods of gradually increasing task demands. While theories of human performance tell us that all of these factors are potentially important in skill learning, we will also need subtheories to help us narrow the list depending on the type of performance (skill) being trained, and the characteristics of the learner. Yet as table 2 illustrates, even the use of a theoretical framework as broad as the one represented by skill training helps to delimit the factors that need to be considered in treatment design, and the behavioral operations that will put these active ingredients into practice.
      Table 2. Examples of Using Learning Theory Constructs to Specify Rehabilitation Treatments

      Columns: Type of Learning; Definition; Rehabilitation Treatment Example; Therapeutic Operations Specified by Theory

      Passive (nonvolitional) learning

      Type of Learning: Habituation
      Definition: Repeated exposure to a neutral stimulus attenuates an automatic response
      Rehabilitation Treatment Example: Reduce startle response or distraction caused by a particular stimulus applied during treatment
      Therapeutic Operations Specified by Theory:
      • Modify interstimulus interval (short intervals facilitate habituation)
      • Modify stimulus duration (longer durations promote habituation)

      Type of Learning: Classical conditioning
      Definition: Linking a new (conditioned) stimulus to an automatic (unconditioned) stimulus changes the conditions under which a response is elicited
      Rehabilitation Treatment Example: Reduce responses reflecting fear or aversion to treatment settings/team members
      Therapeutic Operations Specified by Theory:
      • Modify the number of stimuli conditioned to the automatic response (eg, confine unavoidably painful or unpleasant treatments to a single treater, using an area reserved for that treatment)
      • Avoid unnecessary procedures that provoke anxiety responses, eg, quizzing a patient with impaired memory

      Active (volitional) learning

      Type of Learning: Skill-based learning
      Definition: Training and experience (practice) leads to learned capacities to carry out activities at a predetermined level of proficiency, often with the minimum outlay of time, energy, or both
      Rehabilitation Treatment Example: Improve performance on any activity learned via how-to guidance and practice, eg, use of assistive devices; routines for activities of daily living; internal and external mnemonics
      Therapeutic Operations Specified by Theory: Depending on task and patient characteristics, select techniques to enhance skill learning, for example:
      • instructional techniques
      • motivation/engagement methods
      • error handling/feedback methods
      • methods of progressing challenge level of task
      • variety and types of settings/contexts used in practice
      • emphasis on explicit vs implicit learning
      • schedules of practice

      Type of Learning: Knowledge-based learning
      Definition: Acquisition of new, verbalizable information through semantic memory system and/or representation of experiences
      Rehabilitation Treatment Example: Enhance patient's knowledge and/or modify attitudes about any aspect of function or disability, either by providing information or facilitating self-discovery of information
      Therapeutic Operations Specified by Theory: Depending on content and patient characteristics, select techniques to maximize knowledge acquisition, for example:
      • structuring and chunking material
      • repetition, rephrasing, and rehearsal
      • facilitating connection between new and existing knowledge schemata
      • assigning or facilitating behavioral experiments to generate self-knowledge

      Treatment Implementation

      In intervention trials, defining a treatment according to hypothesized active ingredients is the first step of a process that, ideally, results in 1 or more manuals consisting of therapist instructions and/or patient materials for the experimental and/or control interventions. It is quite challenging to determine the optimal level of detail for a treatment manual, and not surprisingly, the balance of specificity versus flexibility depends on the type of treatment, the problem that is treated, the time frame of the treatment, and the population under study.
      • Hart T.
      Treatment definition in complex rehabilitation interventions.
      In creating a manual, the investigator must translate the hypothesized active ingredients of the treatment into behavioral operations: things that the therapist should do (and not do) to deliver the ingredients, and, sometimes, things that the patient should do to indicate that they have been received. This level of prescription is important in a complex intervention, because while the knowledge, attitudes, and philosophy of the treater may be important, these factors do not in themselves specify the treatment in sufficient detail for replication.
      Standardizing a treatment for a clinical trial also necessitates key decisions about who will deliver the intervention, how they will be allocated to treatment conditions, and how they will be trained and supervised. The first decision may entail specifying the appropriate background and experience level of study therapists. On the face of it, it would seem ideal to employ treaters with experience in the treatment model(s) under study. In psychotherapy research, however, there is some evidence that beyond a requisite level of competence, too much experience in a given modality may undermine the implementation of manualized treatment. This is presumably because seasoned therapists trust their own judgments over those dictated by the manual.
      • Henry W.P.
      • Strupp H.H.
      • Butler S.F.
      • Schacht T.E.
      • Binder J.L.
      Effects of training in time-limited dynamic psychotherapy: changes in therapist behavior.
      Aside from the specific techniques used in the manual, it is helpful to ensure that therapists have basic experience with the target population of the trial, especially if key characteristics of the population might necessitate adjustments to the protocol (eg, cognitive or linguistic impairments).
      The allocation of treaters to conditions raises some thorny problems. In an RCT, for example, one could start with a pool of available therapists and randomly assign each one to a treatment condition. However, this runs the risk of unequal therapist commitment to the treatments, if some therapists get assigned to conditions that they do not fully endorse.
      • Schnurr P.P.
      The rocks and hard places in psychotherapy outcome research.
      An alternative is to allow therapists to select the conditions that they prefer. This is feasible for trials comparing 2 active but philosophically different treatment models, but not for trials with an obvious sham control condition that no one would select. Another alternative is to have all therapists administer both (or all) treatment conditions to different patients. The obvious drawback here is the likelihood of cross-contamination between treatment models; the more explicitly scripted are the manuals guiding each arm of the trial, the more feasible is this approach. Its main advantage is that it controls well for effects associated with different skill levels or personal therapist qualities, which remain confounded with treatment conditions in the other approaches to allocation.
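      The first allocation option described above, randomly assigning each available therapist to one condition, can be sketched in a few lines. This is only an illustration under stated assumptions: the therapist labels, arm names, and seed are hypothetical, and a real trial would typically use a concealed, independently generated allocation list.

```python
import random

def allocate_therapists(therapists, arms, seed=None):
    """Randomly assign each therapist to one arm, keeping arm sizes as equal as possible."""
    rng = random.Random(seed)          # seeded generator so the list can be reproduced
    shuffled = list(therapists)
    rng.shuffle(shuffled)              # random order of the therapist pool
    allocation = {arm: [] for arm in arms}
    for position, therapist in enumerate(shuffled):
        # Deal therapists out to the arms in turn, like dealing cards.
        allocation[arms[position % len(arms)]].append(therapist)
    return allocation

# Hypothetical pool of six therapists and two arms.
pool = ["T1", "T2", "T3", "T4", "T5", "T6"]
arm_names = ["experimental", "control"]
allocation = allocate_therapists(pool, arm_names, seed=2024)
print(allocation)
```

      Random allocation of this kind balances therapist numbers across arms but, as noted above, does nothing to ensure that therapists endorse the condition they are assigned.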
      Regardless of how treaters are allocated, careful attention must be paid to training them to follow the manualized protocol(s), which can help to equate the skills of therapists with more or less prior experience in the treatment model.
      • Vakoch D.A.
      • Strupp H.H.
      The evolution of psychotherapy training: reflections on manual-based learning and future alternatives.
      In psychotherapy research, variations in training have accounted for a surprising amount of variance in patient outcomes, with more intensive training generally leading to better results.
      • Miller S.J.
      • Binder J.L.
      The effects of manual-based training on treatment fidelity and outcome: a review of the literature on adult individual psychotherapy.
      It is advisable to develop a criterion for knowledge and skills, especially those involving the delivery of hypothesized active ingredients, that each therapist must meet before participating in the trial.
      Most researchers involved in clinical trials are familiar with the Consolidated Standards of Reporting Trials (CONSORT) statement published in 1996 and revised in 2001.
      • Altman D.G.
      • Schulz K.F.
      • Moher D.
      • et al.
      The revised CONSORT statement for reporting randomized trials: explanation and elaboration.
      More recently, the CONSORT statement has been extended to randomized trials that involve nonpharmacologic interventions, including rehabilitation treatments.
      • Boutron I.
      • Moher D.
      • Altman D.G.
      • Schulz K.F.
      • Ravaud P.
      CONSORT Group
      Methods and processes of the CONSORT Group: example of an extension for trials assessing nonpharmacologic treatments.
      • Boutron I.
      • Moher D.
      • Altman D.G.
      • Schulz K.F.
      • Ravaud P.
      CONSORT Group
      Extending the CONSORT statement to randomized trials of nonpharmacologic treatment: explanation and elaboration.
      This version acknowledges the special problems in masking, standardizing treatments, and training/allocating treaters that affect many rehabilitation trials, and also includes reporting on treatment adherence or fidelity methods (discussed in the Treatment Fidelity section below). There are further extensions of the CONSORT statement to pragmatic trials
      • Zwarenstein M.
      • Treweek S.
      • Gagnier J.J.
      • et al.
      Improving the reporting of pragmatic trials: an extension of the CONSORT statement.
      and cluster randomized trials.
      • Campbell M.K.
      • Elbourne D.R.
      • Altman D.G.
      CONSORT group
      CONSORT statement: extension to cluster randomised trials.
      These documents are valuable resources for any investigator planning or reporting a randomized trial in a rehabilitation context.

      Treatment Fidelity

      Specifying therapy at the level of active ingredients and associated behaviors is, ideally, a seamless precursor to the assessment of treatment fidelity. Fidelity may be defined as the extent to which the core components of treatment have been delivered as intended.
      • Bellg A.J.
      • Borrelli B.
      • Resnick B.
      • et al.
      Enhancing treatment fidelity in health behavior change studies: best practices and recommendations from the NIH Behavior Change Consortium.
      Fidelity assessment may be done by personnel external to the study who listen to or watch taped versions of treatment sessions and complete checklists documenting that prescribed behaviors occurred and proscribed behaviors did not. This assessment is most valuable when used as part of a continuous feedback loop during the course of a treatment study, to ensure therapist skill and avoid drift from the protocol. Note that the treatment manual, training materials, and fidelity assessments may be built around the same core of active ingredients that are hypothesized to account for observed treatment effects. Not surprisingly, fidelity assessment is most challenging when the treatments are complex or hard to differentiate from one another.
      • Gearing R.E.
      • El-Bassel N.
      • Ghesquiere A.
      • Baldwin S.
      • Gillies J.
      • Ngeow E.
      Major ingredients of fidelity: a review and scientific guide to improving quality of intervention research implementation.
      Fidelity is often considered as a concept related mainly to the behavior of treatment delivery personnel. However, Lichstein et al
      • Lichstein K.L.
      • Riedel B.W.
      • Grieve R.
      Fair test of clinical trials: a treatment implementation model.
      suggested a broader definition that includes the concepts of treatment receipt and treatment enactment. Treatment receipt refers to the extent to which the patient understands the strategies or techniques taught, and demonstrates the capacity to use them. For this purpose, one could administer pre- and posttreatment tests of knowledge related to treatment.
      • Gearing R.E.
      • El-Bassel N.
      • Ghesquiere A.
      • Baldwin S.
      • Gillies J.
      • Ngeow E.
      Major ingredients of fidelity: a review and scientific guide to improving quality of intervention research implementation.
      Another approach is to interview participants to determine their understanding of treatment concepts, the extent to which they met their own goals for participating in the trial, and whether they adhered to the learned or recommended practices.
      • Spillane V.
      • Byrne M.C.
      • Byrne M.
      • Leathem C.S.
      • O'Malley M.
      • Cupples M.E.
      Monitoring treatment fidelity in a randomized controlled trial of a complex intervention.
      Therapist notes on what seemed to hit home during the trial and what factors impeded full use of treatment could also contribute to this assessment.
      • Hawe P.
      • Shiell A.
      • Riley T.
      • Gold L.
      Methods for exploring implementation variation and local context within a cluster randomised community intervention trial.
      Treatment enactment, which has to do with whether the participant actually uses the learned strategies in day-to-day life, is more challenging to measure but could be ascertained using self-report and proxy report instruments given at some point after the trial.
      • Bellg A.J.
      • Borrelli B.
      • Resnick B.
      • et al.
      Enhancing treatment fidelity in health behavior change studies: best practices and recommendations from the NIH Behavior Change Consortium.
      Behavior logs and homework assignments completed outside of treatment sessions may also contribute data on treatment receipt and enactment.
      • Gearing R.E.
      • El-Bassel N.
      • Ghesquiere A.
      • Baldwin S.
      • Gillies J.
      • Ngeow E.
      Major ingredients of fidelity: a review and scientific guide to improving quality of intervention research implementation.

      Multicenter Trials

      When different centers collaborate to study the effects of an intervention, they accrue more participants in a shorter span of time, and thus are able to accommodate longer follow-up intervals within a funding period. Multicenter studies are especially appropriate when a condition is relatively uncommon, the effect size of the treatment may be small (necessitating a large sample), or a longer follow-up is required to obtain meaningful results.
      • Friedman L.M.
      • Furberg C.D.
      • DeMets D.L.
      Fundamentals of clinical trials.
      More reliable conclusions can be reached at a faster rate, and findings are also more generalizable due to diversity in the sample. Moreover, multicenter trials tend to be scientifically superior as they usually benefit from collaboration of different investigators interested in the same problems. Multicenter trials can also reduce competition for participants and overlap of similar (smaller) studies among clinical sites.
      These advantages come at a cost. Multicenter trials are much more complex than single center trials and require a larger organization and additional personnel. Beyond the logistic and administrative complications, considerable resources must be allocated for the extra effort involved in standardizing screening, recruitment, treatment methods, assessment, and quality control procedures across participating sites. All of the issues outlined in the previous sections—manualization, therapist training, and fidelity assessment—require added layers of oversight and attendant costs. Lack of homogeneity on any of these dimensions may lead to a center-by-treatment interaction effect, the major threat to the validity of multicenter trials. This occurs when the effect of the treatment varies from center to center; in extreme cases there may be a reversal of the direction of effect in some centers. This may constitute a fatal flaw in the interpretation of treatment effects separate from center effects. Preventing it justifies additional funds as well as a higher standard of design, conduct, and organization, and personnel with specialized expertise.
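      A center-by-treatment interaction of the kind described above can be inspected descriptively by computing the treatment effect separately for each center and checking whether any center shows an effect in the opposite direction. The sketch below is a descriptive check only, not a formal interaction test; the center labels and outcome scores are made up for illustration.

```python
from statistics import fmean

def center_effects(outcomes):
    """Return the treatment-minus-control difference in mean outcome for each center."""
    return {
        center: fmean(arms["treatment"]) - fmean(arms["control"])
        for center, arms in outcomes.items()
    }

def reversed_centers(effects):
    """List centers whose effect has the opposite sign to the unweighted average effect."""
    overall = fmean(list(effects.values()))
    return [center for center, effect in effects.items() if effect * overall < 0]

# Hypothetical outcome scores (higher is better) from three centers.
outcomes = {
    "A": {"treatment": [12, 14, 13], "control": [10, 9, 11]},
    "B": {"treatment": [11, 12, 13], "control": [9, 10, 11]},
    "C": {"treatment": [8, 9, 10], "control": [10, 11, 12]},
}
effects = center_effects(outcomes)
print(effects)                    # per-center treatment effects
print(reversed_centers(effects))  # centers where the direction of effect is reversed
```

      In an actual analysis the interaction would be tested formally, for example with a treatment-by-center term in the statistical model; the sketch only shows what a reversal of effect looks like in the data.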

      Efficiency in Clinical Trials

      It sometimes happens that investigators and institutions engage in multiple concurrent or overlapping single-center or multicenter trials, often involving the same patient population, with similar interventions, outcomes, and assessment times. Having several different trials at the same institution has some clear advantages, besides the obvious benefit of the multiple sources of funding. From a logistic and organizational point of view, running several concurrent trials is efficient inasmuch as the studies can share the same personnel and infrastructure. Cost can be minimized through overlapping training and materials, and all trials can benefit from a team of investigators and coordinators who are knowledgeable about each trial. Also, from a study participant's point of view, the availability of different trials offers more opportunities to choose the best setting, treatment, and follow-up schedule.
      The main disadvantage of having multiple concurrent trials in a given institution is the competition that arises among them, especially when the study populations overlap and the pool of available subjects is small. Many trials do not allow study participants to enroll concurrently in other trials because coenrollment in 2 treatments may contaminate 1 or both outcomes. When coenrollment is allowed, another potential disadvantage is that subjects enrolled in multiple trials may be more likely to drop out because the time commitment or the assessment schedule becomes too burdensome. As with multicenter trials, the solution here is careful planning, good communication, and the inclusion of staff who have a big-picture view of the research landscape at an institution to help avoid problems.

      Statistical Issues for Rehabilitation Clinical Trials

      It is beyond the scope of this article to discuss all statistical considerations for clinical trials in rehabilitation, but 2 that frequently arise are evaluation of multiple outcomes and determination of sample size.

       Multiple or Composite Outcomes

      Choosing the correct primary outcome is a crucial step in the design of a clinical trial. When the effect of a new intervention is tested using an inappropriate outcome measure, for example, one that is not sufficiently sensitive to treatment effects, the chance of rejecting a possibly effective intervention (type II error) increases. Thus, from a methodologic point of view, the choice of the wrong outcome is equivalent to the assumption of a too-large effect size or achieving a too-small sample size. In all cases, the study lacks sufficient information to reject the null hypothesis of no treatment effect, which may lead to a potentially effective intervention being abandoned.
      Mechanistic or proof-of-concept trials in the early phases of research may help to identify the outcomes that are more likely to respond to the treatment. Often, however, rehabilitation treatments legitimately have multiple outcomes, for example, outcomes at the level of both function and activity/participation and possibly quality of life, which are all important to capture and test in a large-scale study of the intervention. From a methodologic and a statistical point of view, using several outcomes in a treatment trial inevitably raises the issue of multiple comparisons. While it would be appealing to determine the effect of the treatment on each outcome measure using its own statistical test, this would require controlling the experiment-wise error rate to avoid an increase of false positive findings. Current procedures used to maintain the experiment-wise error rate at the nominal level (typically .05) may become excessively conservative and inefficient in the context of multiple outcome endpoints.
      It is important to recognize that procedures such as the Bonferroni or the Holm correction make no use of the correlation among the multiple tests: they guard against the worst case and are least wasteful when the tests are approximately independent. This is often not the case with multiple outcome measures, which may be at least moderately correlated and expected to move in the same direction with treatment. In this situation, as the number of tests increases, the correction becomes overly conservative and it becomes unlikely that the null hypothesis is rejected for any of the measures, with a potential type II error forthcoming. This can occur even in a situation in which all measures show a moderate effect of an intervention but none show a very strong effect. A more efficient approach is the use of a global test statistic for the simultaneous test of all null hypotheses regarding the study outcomes.
      • Pocock S.
      • Geller N.
      • Tsiatis A.
      The analysis of multiple endpoints in clinical trials.
      • Bagiella E.
      • Novack T.A.
      • Ansel B.
      • et al.
      Measuring outcome in traumatic brain injury treatment trials: recommendations from the traumatic brain injury clinical trials network.
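      The conservatism of the Bonferroni and Holm corrections when every outcome shows a moderate effect can be seen in a small numerical sketch. The P values below are hypothetical, and the functions are minimal illustrations of the two standard procedures rather than a validated implementation.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject a hypothesis only if its P value is at most alpha divided by the number of tests."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Holm step-down procedure: test P values from smallest to largest, stopping at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):   # threshold relaxes after each rejection
            reject[i] = True
        else:
            break                            # all larger P values are retained as well
    return reject

# Hypothetical trial with 4 outcomes, each nominally significant at .05.
moderate = [0.02, 0.03, 0.04, 0.045]
print(bonferroni(moderate))  # no outcome survives the correction
print(holm(moderate))        # Holm also rejects nothing here

# When one outcome shows a very strong effect, Holm can reject more than Bonferroni.
mixed = [0.001, 0.02, 0.04]
print(bonferroni(mixed))
print(holm(mixed))
```

      In the first example every outcome would be declared significant if tested alone, yet neither correction rejects any null hypothesis.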
      The approach of combining several outcomes in a global test is not new and has been proposed as a more powerful and efficient alternative to the usual multiple comparison correction or a multivariate approach such as the Hotelling T² test. This is especially true when the outcome measures are correlated and are expected to behave qualitatively similarly. A closed testing procedure
      • Lehmacher W.
      • Wassmer G.
      • Reitmeir P.
      Procedures for two sample comparisons with multiple endpoints controlling for the experimentwise error rate.
      is recommended to determine which outcomes are relevant to treatment efficacy, on rejection of the global null hypothesis. This procedure controls the experiment-wise type I error (α) in a strong sense. That is, the probability that 1 or more of the single hypotheses are rejected, given that they are in fact true, is no greater than α regardless of which hypotheses are true. The procedure is based on a stepwise analysis in which, after rejection of the global null hypothesis, all possible subsets of hypotheses can be tested in a hierarchical fashion.
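      The logic of testing all subsets of hypotheses after a global test can be sketched as follows. Several assumptions are made purely for illustration: each outcome is summarized by a z score oriented so that positive values favor treatment, the correlation matrix among the z scores is treated as known, tests are one-sided, and the global statistic for a subset is a simple sum of its z scores divided by the standard deviation of that sum. This statistic is an illustrative choice and is not claimed to be the exact one used in the cited papers.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

def subset_p(z, corr, subset):
    """One-sided P value of a sum-of-z global test restricted to the outcomes in `subset`."""
    total = sum(z[i] for i in subset)
    variance = sum(corr[i][j] for i in subset for j in subset)  # variance of the sum
    return 1.0 - NormalDist().cdf(total / sqrt(variance))

def closed_test(z, corr, alpha=0.025):
    """Reject the hypothesis for outcome i only if every subset containing i is rejected at alpha."""
    m = len(z)
    subsets = [s for size in range(1, m + 1) for s in combinations(range(m), size)]
    rejected = {s for s in subsets if subset_p(z, corr, s) <= alpha}
    return [all(s in rejected for s in subsets if i in s) for i in range(m)]

# Hypothetical z scores for 3 outcomes with pairwise correlation 0.5.
z_scores = [1.8, 1.9, 2.0]
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
print(subset_p(z_scores, corr, (0, 1, 2)))  # global test over all outcomes
print(closed_test(z_scores, corr))          # per-outcome decisions
```

      With these numbers the global null hypothesis and all pairwise subsets are rejected, and the third outcome is declared significant, whereas a Bonferroni threshold of .025/3 applied to the individual one-sided P values would reject none of the three.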
      Another approach to the testing of multiple outcomes is to control the false discovery rate (FDR) (Benjamini and Hochberg, 1995), which is the expected proportion of significant findings that are false positives. The test procedure based on the FDR is less conservative than those based on control of the experiment-wise error, and although it increases the probability of a type I error, it also lowers the rate of type II errors when a very large number of outcomes is tested. Instead of a P value, the FDR procedure is based on a q value, which expresses the minimum FDR at which an individual test can be regarded as significant (Benjamini and Hochberg, 1995). The FDR is best applied to studies in which the number of outcomes is extremely large, for example, neuroimaging studies in which thousands of voxels are examined simultaneously. In such cases it is too conservative to control for experiment-wise error using conventional means, and the FDR is more appropriate.
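      The step-up rule of Benjamini and Hochberg can be sketched in a few lines. The function name and the P values below are hypothetical; the sketch assumes the standard form of the procedure, in which the ordered P values are compared with (rank/m) × q and every hypothesis up to the largest rank that passes is rejected.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (True = rejected) in the original order,
    controlling the false discovery rate at level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[idx] = True
    return reject


# Hypothetical P values for 10 outcomes.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, q=0.05)
print(sum(rejected), "of", len(p_values), "hypotheses rejected")
```

      With these values the FDR procedure rejects 2 hypotheses, whereas a Bonferroni threshold of 0.05/10 = 0.005 would reject only the first.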

       Sample Size Issues

      Sample size calculation is an essential part of clinical trial planning. Trials should have adequate power to allow investigators to detect clinically meaningful differences between the experimental and the control intervention or among the alternative treatments studied. Studies with insufficient sample size or power are at higher risk of type II error, that is, of discarding a potentially beneficial intervention that may then never be tried again. A first consideration in sample size calculation is that the quantities used in the calculation are often estimates of the true quantities, and therefore the resulting sample size is not precise. Small pilot studies often provide overly optimistic estimates of variability and effect sizes because they are typically conducted at single centers, where variability is lower than in a multicenter trial whose participants are more representative of the ultimate target population. As a countermeasure, we should be as conservative as possible in generating realistic expectations of the unknown quantities, while balancing the need to generate a sample size that is feasible.
      The general form of any sample size formula includes 4 elements: the critical value of type I error (α), the critical value for type II error (β), the variability of the outcome (σ²), and the expected effect size (Δ):
      $$n = \frac{(z_{\alpha} + z_{\beta})^{2}\,\sigma^{2}}{\Delta^{2}}$$


      Specific formulae depend on the nature of the outcome measure (ie, continuous, categorical, time to event). The above formula is presented for didactic purposes; for meeting the needs of a specific trial, see exhaustive references such as Friedman et al (2010).
      Components of the formula must be balanced in careful planning. Ideally, we would like to minimize both type I and type II errors in selecting a sample size. The larger the critical values of each type of error on a normal distribution (z values in the formula above), the smaller the probability of error. It is therefore clear that if we aim at minimizing these 2 errors, a larger sample size is needed. In particular, because study power is the complement of the type II error rate (1−β), maximizing the power of the study also requires a larger sample size. The variability of the outcome is directly related to the sample size, such that larger variability calls for a larger sample size. Trial variability can be controlled by design, either by using a matched or stratified design or by implementing strict inclusion and exclusion criteria. Finally, the expected effect size, that is, the difference between the new and the standard treatment, is inversely related to the sample size: the smaller the effect size we aim to detect, the larger the sample required to detect it. In determining the effect size, however, much emphasis should be given to ensuring that the expected effects of the trial will be clinically significant.
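      The trade-offs just described can be made concrete with a short calculation. The sketch below assumes the usual normal-approximation form, in which n is proportional to (z_α + z_β)²σ²/Δ². The function name and the `design_factor` argument are hypothetical; a factor of 2 is the conventional multiplier for comparing the means of 2 equal-sized groups. The numeric inputs are illustrative only, and a real trial should rely on the specialized formulae in the references.

```python
from math import ceil
from statistics import NormalDist


def sample_size(alpha, power, sigma, delta, two_sided=True, design_factor=1.0):
    """Normal-approximation sample size.

    alpha:  type I error rate
    power:  1 - beta, the desired probability of detecting delta
    sigma:  standard deviation of the outcome
    delta:  smallest difference worth detecting
    design_factor: multiplier for the design (assumed 2 for a
                   2-group comparison of means, giving n per group)
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    n = design_factor * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)


# Illustrative inputs: SD of 10 points, 5-point difference, 2 groups.
print(sample_size(0.05, 0.80, sigma=10, delta=5, design_factor=2))    # 63
print(sample_size(0.05, 0.90, sigma=10, delta=5, design_factor=2))    # 85
print(sample_size(0.05, 0.80, sigma=10, delta=2.5, design_factor=2))  # 252
```

      Raising power from 80% to 90% increases the per-group requirement from 63 to 85, and halving the detectable difference roughly quadruples it.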
      Besides the standard elements used in the sample size formulas, sample size considerations should also include the pool of potentially eligible subjects, the study inclusion and exclusion criteria, and a realistic estimation of how many patients will consent to participate and how many will complete the study.
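      A simple way to carry these practical considerations into planning is to work backward from the number of completers required. The sketch below is a hypothetical helper, with illustrative rates rather than values from any study: it inflates the required number of completers by the expected completion rate to obtain an enrollment target, and then by the expected consent rate to obtain the number of eligible patients who must be approached.

```python
from math import ceil


def recruitment_targets(n_completers, completion_rate, consent_rate):
    """Work backward from required completers to recruitment targets.

    Returns (number to enroll, number of eligible patients to approach).
    """
    if not (0 < completion_rate <= 1 and 0 < consent_rate <= 1):
        raise ValueError("rates must be in the interval (0, 1]")
    n_enroll = ceil(n_completers / completion_rate)
    n_approach = ceil(n_enroll / consent_rate)
    return n_enroll, n_approach


# Illustrative: 126 completers needed, 80% completion, 50% consent.
n_enroll, n_approach = recruitment_targets(126, 0.80, 0.50)
print(n_enroll, n_approach)  # 158 316
```

      If the pool of eligible patients at the participating sites falls short of the second figure over the planned recruitment period, the design, the eligibility criteria, or the number of sites may need to be reconsidered.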

      Conclusions

      As the expectation for evidence-based treatment in rehabilitation grows, so does the need for appropriate clinical trial design and methodology. The choice of design for a treatment study depends on the nature of the target outcomes, the phase of intervention development addressed in the trial, and the resources available. Investigators should be familiar with the different study designs, and the advantages and disadvantages of each, so as to select the design that best suits their needs. Understanding important methodologic issues, such as sample size calculation, defining and standardizing treatments, the proper choice or design of control condition, and the analysis of multiple outcome measures, is also essential in the design phase of a clinical trial. Clinicians also need enough of a working knowledge of these concerns to be able to evaluate the quality of intervention studies and the strength of the evidence at hand.
      A properly planned clinical trial represents the culmination of many decisions and is the first step toward the successful evaluation of efficacy or effectiveness of a new or refined treatment, 1 of many such steps necessary to advance the science and practice of rehabilitation.

      Acknowledgment

      We thank Megan Bartlett, MA, for assistance with manuscript preparation and literature retrieval.

      References

        • Tate R.L.
        • McDonald S.
        • Perdices M.
        • Togher L.
        • Schultz R.
        • Savage S.
        Rating the methodological quality of single-subject designs and n-of-1 trials: introducing the Single-Case Experimental Design (SCED) Scale.
        Neuropsychol Rehabil. 2008; 18: 385-401
        • Backman C.L.
        • Harris S.R.
        • Chisholm J.A.
        • Monette A.D.
        Single-subject research in rehabilitation: a review of studies using AB, withdrawal, multiple baseline, and alternating treatments designs.
        Arch Phys Med Rehabil. 1997; 78: 1145-1153
        • Horn S.D.
        • Gassaway J.
        Practice based evidence: incorporating clinical heterogeneity and patient-reported outcomes for comparative effectiveness research.
        Med Care. 2010; 48: S17-S22
        • Shadish W.R.
        • Cook T.D.
        • Campbell D.T.
        Experimental and quasi-experimental designs for generalized causal inference.
        Houghton Mifflin, Boston, 2002
        • Campbell D.T.
        • Stanley J.C.
        Experimental and quasi-experimental designs for research.
        Rand McNally, Chicago, 1963
        • Portney L.G.
        • Watkins M.P.
        Foundations of clinical research: applications to practice.
        3rd ed. Pearson Education, Upper Saddle River, 2009
        • Whyte J.
        • Gordon W.
        • Gonzalez Rothi L.J.
        A phased developmental approach to rehabilitation research: the science of knowledge building.
        Arch Phys Med Rehabil. 2009; 90: S3-S10
        • Whyte J.
        • Barrett A.
        Advancing the evidence base of rehabilitation treatments: a developmental approach.
        Arch Phys Med Rehabil. 2012; 93: S101-S110
        • Gonzalez-Rothi L.J.
        Cognitive rehabilitation: the role of theoretical rationales and respect for the maturational process needed for our evidence.
        J Head Trauma Rehabil. 2006; 21: 194-197
        • Hart T.
        • Vaccaro M.
        • Hays C.
        • Maiuro R.
        Anger self-management training for people with traumatic brain injury: a preliminary investigation.
        J Head Trauma Rehabil. 2011 Mar 14; ([Epub ahead of print])
        • Levin B.
        The utility of futility.
        Stroke. 2005; 36: 2331-2332
        • Palesch Y.Y.
        • Tilley B.C.
        • Sackett D.L.
        • Johnston K.C.
        • Woolson R.
        Applying a phase II futility study design to therapeutic stroke trials.
        Stroke. 2005; 36: 2410-2414
        • Malec J.
        • Buffington A.
        • Moessner A.
        • Degiorgio L.
        A medical/vocational case coordination system for persons with brain injury: an evaluation of employment outcomes.
        Arch Phys Med Rehabil. 2000; 81: 1007-1015
        • Malec J.
        • High W.
        • Sander A.
        • Struchen M.
        • Hart K.
        Vocational rehabilitation.
        in: High W.M. Sander A.M. Struchen M.A. Hart K.A. Rehabilitation for traumatic brain injury. Oxford Univ Pr, New York, 2005: 176-201
        • Park N.W.
        • Ingles J.L.
        Effectiveness of attention rehabilitation after an acquired brain injury: a meta-analysis.
        Neuropsychology. 2001; 15: 199-210
        • Friedman L.M.
        • Furberg C.D.
        • DeMets D.L.
        Fundamentals of clinical trials.
        Springer, New York, 2010
        • Merom D.
        • Phongsavan P.
        • Wagner R.
        • et al.
        Promoting walking as an adjunct intervention to group cognitive behavioural therapy for anxiety disorders–a pilot group randomized trial.
        J Anxiety Disord. 2008; 22: 959-968
        • Brierley G.
        • Brabyn S.
        • Torgerson D.
        • Watson J.
        Bias in recruitment to cluster randomized trials: a review of recent publications.
        J Eval Clin Pract. 2011 Jun 20; ([Epub ahead of print])
        • Tunis S.R.
        • Stryer D.B.
        • Clancy C.M.
        Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy.
        JAMA. 2003; 290: 1624-1632
        • Cicerone K.D.
        • Mott T.
        • Azulay J.
        • et al.
        A randomized controlled trial of holistic neuropsychologic rehabilitation after traumatic brain injury.
        Arch Phys Med Rehabil. 2008; 89: 2239-2249
        • Bond G.R.
        Supported employment: evidence for an evidence-based practice.
        Psychiatr Rehabil J. 2004; 27: 345-359
        • Fiore L.D.
        • Brophy M.
        • Ferguson R.E.
        • et al.
        A point-of-care clinical trial comparing insulin administered using a sliding scale versus a weight-based regimen.
        Clin Trials. 2011; 8: 183-195
        • Wilson B.A.
        • Emslie H.C.
        • Quirk K.
        • Evans J.J.
        Reducing everyday memory and planning problems by means of a paging system: a randomised control crossover study.
        J Neurol Neurosurg Psychiatry. 2001; 70: 477-482
        • Hart T.
        • Fann J.
        • Novack T.
        The dilemma of the control condition in experience-based cognitive and behavioral treatment research.
        Neuropsychol Rehabil. 2008; 18: 1-21
        • Mohr D.C.
        • Spring B.
        • Freedland K.E.
        • et al.
        The selection and design of control conditions for randomized controlled trials of psychological interventions.
        Psychother Psychosom. 2009; 78: 275-284
        • Barkauskas V.H.
        • Lusk S.L.
        • Eakin B.L.
        Selecting control interventions for clinical outcome studies.
        West J Nurs Res. 2005; 27: 346-363
        • Whitehead W.E.
        Control groups appropriate for behavioral interventions.
        Gastroenterology. 2004; 126: S159-S163
        • Saks E.
        • Jeste D.V.
        • Granholm E.
        • Palmer B.W.
        • Schneiderman L.
        Ethical issues in psychosocial interventions research involving controls.
        Ethics Behav. 2002; 12: 87-101
        • Paterson C.
        • Dieppe P.
        Characteristic and incidental (placebo) effects in complex interventions such as acupuncture.
        Br Med J. 2005; 330: 1202-1205
        • Wood L.
        • Egger M.
        • Gluud L.L.
        • et al.
        Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study.
        BMJ. 2008; 336: 601-605
        • Page S.J.
        • Dunning K.
        • Hermann V.
        • Leonard A.
        • Levine P.
        Longer versus shorter mental practice sessions for affected upper extremity movement after stroke: a randomized controlled trial.
        Clin Rehabil. 2011; 25: 627-637
        • Fregni F.
        • Imamura M.
        • Chien H.F.
        • et al.
        Challenges and recommendations for placebo controls in randomized trials in physical and rehabilitation medicine: a report of the international placebo symposium working group.
        Am J Phys Med Rehabil. 2010; 89: 160-172
        • Hart T.
        Treatment definition in complex rehabilitation interventions.
        Neuropsychol Rehabil. 2009; 19: 824-840
        • Strasser D.C.
        • Falconer J.A.
        • Herrin J.S.
        • Bowen S.E.
        • Stevens A.B.
        • Uomoto J.
        Team functioning and patient outcomes in stroke rehabilitation.
        Arch Phys Med Rehabil. 2005; 86: 403-409
        • Strasser D.C.
        • Falconer J.A.
        • Stevens A.B.
        • et al.
        Team training and stroke rehabilitation outcomes: a cluster randomized trial.
        Arch Phys Med Rehabil. 2008; 89: 10-15
        • Whyte J.
        A grand unified theory of rehabilitation (we wish!).
        Arch Phys Med Rehabil. 2008; 89: 203-209
        • Hart T.
        • Powell J.M.
        Principles of learning in TBI rehabilitation.
        J Head Trauma Rehabil. 2011; 26: 179-181
        • Hart T.
        • Evans J.
        Self-regulation and goal theories in brain injury rehabilitation.
        J Head Trauma Rehabil. 2006; 21: 142-155
        • Henry W.P.
        • Strupp H.H.
        • Butler S.F.
        • Schacht T.E.
        • Binder J.L.
        Effects of training in time-limited dynamic psychotherapy: changes in therapist behavior.
        J Consult Clin Psychol. 1993; 61: 434-440
        • Schnurr P.P.
        The rocks and hard places in psychotherapy outcome research.
        J Trauma Stress. 2007; 20: 779-792
        • Vakoch D.A.
        • Strupp H.H.
        The evolution of psychotherapy training: reflections on manual-based learning and future alternatives.
        J Clin Psychol. 2000; 56: 309-318
        • Miller S.J.
        • Binder J.L.
        The effects of manual-based training on treatment fidelity and outcome: a review of the literature on adult individual psychotherapy.
        Psychotherapy. 2002; 39: 184-198
        • Altman D.G.
        • Schulz K.F.
        • Moher D.
        • et al.
        The revised CONSORT statement for reporting randomized trials: explanation and elaboration.
        Ann Intern Med. 2001; 134: 663-694
        • Boutron I.
        • Moher D.
        • Altman D.G.
        • Schulz K.F.
        • Ravaud P.
        • CONSORT Group
        Methods and processes of the CONSORT Group: example of an extension for trials assessing nonpharmacologic treatments.
        Ann Intern Med. 2008; 148: W60-W66
        • Boutron I.
        • Moher D.
        • Altman D.G.
        • Schulz K.F.
        • Ravaud P.
        • CONSORT Group
        Extending the CONSORT statement to randomized trials of nonpharmacologic treatment: explanation and elaboration.
        Ann Intern Med. 2008; 148: 295-309
        • Zwarenstein M.
        • Treweek S.
        • Gagnier J.J.
        • et al.
        Improving the reporting of pragmatic trials: an extension of the CONSORT statement.
        Br Med J. 2008; 337: a2390
        • Campbell M.K.
        • Elbourne D.R.
        • Altman D.G.
        • CONSORT group
        CONSORT statement: extension to cluster randomised trials.
        Br Med J. 2004; 328: 702-708
        • Bellg A.J.
        • Borrelli B.
        • Resnick B.
        • et al.
        Enhancing treatment fidelity in health behavior change studies: best practices and recommendations from the NIH Behavior Change Consortium.
        Health Psychol. 2004; 23: 443-451
        • Gearing R.E.
        • El-Bassel N.
        • Ghesquiere A.
        • Baldwin S.
        • Gillies J.
        • Ngeow E.
        Major ingredients of fidelity: a review and scientific guide to improving quality of intervention research implementation.
        Clin Psychol Rev. 2011; 31: 79-88
        • Lichstein K.L.
        • Riedel B.W.
        • Grieve R.
        Fair test of clinical trials: a treatment implementation model.
        Advances in Behavior Research and Therapy. 1994; 16: 1-29
        • Spillane V.
        • Byrne M.C.
        • Byrne M.
        • Leathem C.S.
        • O'Malley M.
        • Cupples M.E.
        Monitoring treatment fidelity in a randomized controlled trial of a complex intervention.
        J Adv Nurs. 2007; 60: 343-352
        • Hawe P.
        • Shiell A.
        • Riley T.
        • Gold L.
        Methods for exploring implementation variation and local context within a cluster randomised community intervention trial.
        J Epidemiol Community Health. 2004; 58: 788-793
        • Pocock S.
        • Geller N.
        • Tsiatis A.
        The analysis of multiple endpoints in clinical trials.
        Biometrics. 1987; 43: 487-498
        • Bagiella E.
        • Novack T.A.
        • Ansel B.
        • et al.
        Measuring outcome in traumatic brain injury treatment trials: recommendations from the traumatic brain injury clinical trials network.
        J Head Trauma Rehabil. 2010; 25: 375-382
        • Lehmacher W.
        • Wassmer G.
        • Reitmeir P.
        Procedures for two sample comparisons with multiple endpoints controlling for the experimentwise error rate.
        Biometrics. 1991; 47: 511-521
        • Benjamini Y.
        • Hochberg Y.
        Controlling the false discovery rate: a practical and powerful approach to multiple testing.
        J R Stat Soc Series B Stat Methodol. 1995; 57: 289-300