Volume 90, Issue 9, Pages 1478–1488, September 2009
Physical and Cognitive Functioning After 3 Years Can Be Predicted Using Information From the Diagnostic Process in Recently Diagnosed Multiple Sclerosis
Abstract
de Groot V, Beckerman H, Uitdehaag BM, Hintzen RQ, Minneboo A, Heymans MW, Lankhorst GJ, Polman CH, Bouter LM, on behalf of the Functional Prognostication and Disability (FuPro) Study Group. Physical and cognitive functioning after 3 years can be predicted using information from the diagnostic process in recently diagnosed multiple sclerosis.
Objective
To predict functioning after 3 years in patients with recently diagnosed multiple sclerosis (MS).
Design
Inception cohort with 3 years of follow-up. At baseline, predictors were obtained from medical history taking, neurologic examination, and magnetic resonance imaging (MRI).
Setting
Neurology outpatient clinic.
Participants
Patients with MS (N=156); 146 with complete follow-up.
Interventions
Not applicable.
Main Outcome Measures
Inability to walk at least 500m, impaired dexterity, cognitive impairments, incontinence, inability to drive a car or use public transportation, social dysfunction, and reliance on a disability pension.
Results
Clinical prediction rules were constructed for the models that were well calibrated (sufficient agreement between predicted and observed outcomes, based on visual inspection of calibration curves) and that showed sufficient discrimination (area under the receiver operating characteristic curve >.70) after internal bootstrap validation. The models for the inability to walk at least 500m, impaired dexterity, and cognitive impairments were well calibrated. Discrimination was sufficient for all 7 models, except the one predicting social dysfunction (.67). The inability to walk at least 500m was predicted by the perceived ability to walk, impairment of the cerebellar tract, and the number of MRI lesions in the spinal cord. Impaired dexterity was predicted by the perceived ability to use the hands, impairments of the pyramidal, cerebellar, and sensory tracts, and the T2-weighted infratentorial lesion load. Cognitive impairment was predicted by age, gender, the perceived ability to concentrate, and the T2-weighted supratentorial lesion load.
Conclusions
Inability to walk at least 500m, impaired dexterity, and cognitive impairments can be predicted with predictors that are derived from medical history taking, neurologic examination, and MRI shortly after a definite diagnosis of MS has been made.
Key Words: Cohort studies, Disability evaluation, Multiple sclerosis, Prognosis, Rehabilitation
List of Abbreviations: AUC, area under the receiver operating characteristic curve; EDSS, Expanded Disability Status Scale; MRI, magnetic resonance imaging; MS, multiple sclerosis
MULTIPLE SCLEROSIS is characterized by variable neurologic symptomatology that differs not only between patients but also within patients over time. This variability makes predicting the clinical course of the disease difficult, posing a significant challenge for physicians treating patients with MS and causing patients to feel uncertain about their future. This uncertainty negatively influences their quality of life.1, 2 Well-validated prognostic models can aid physicians in making decisions about certain (preventive) treatments for patients with MS or can improve the information given to these patients about their future prognosis.
Thus far, the prediction models published in the literature on MS have had a strong focus on the strength and the relevance of the predictors themselves,3, 4, 5, 6, 7, 8, 9, 10 hoping that this would provide clues to a better understanding of the etiology or the course of the disease. Research that aims to investigate the strength of the relationship of a determinant with a particular outcome should focus on one determinant and correct for confounding variables in order to assess the real relationship between this determinant and the outcome. Reviews of the studies that have investigated determinants of the course of MS have shown that a progressive onset, being older at the time of diagnosis, an interval of less than 1 year between relapses, and impairments of pyramidal or cerebellar tracts are associated with a progressive disease course, whereas an exacerbation as a first sign of MS, a high recovery rate after the first exacerbation, and afferent or monoregional symptoms are associated with a more favorable disease course.3, 4, 5, 6, 7, 8, 9, 10 This research has provided very useful information on the strength of the determinants themselves and has improved our understanding of the disease, but whether these determinants can be used to improve prognostication in individual patients has not been investigated.
In contrast with the literature on cardiac disorders,11 intensive care units,12 traumatic brain injury,13 and Guillain-Barré syndrome,14 the literature on MS has not yet assessed the usefulness of the complete prognostic models to predict future events accurately. The construction of a complete prognostic model differs fundamentally from research that investigates the strength of a determinant.15 All phases of the development of a prognostic model are directed towards obtaining a model that maintains its prognostic ability in different clinical samples of patients. This means that determinants that are easily obtainable in clinical practice are preferred above highly specialized measurements that are not routinely collected, and that predictors that are already known from the medical literature and (expert) clinicians will be used for the model construction. Furthermore, during the construction of the regression models, less emphasis is placed on the significance level of the determinants, which often means that a (very) liberal P value is used. With this strategy, the risk of overfitting the regression models is minimized, and the chance of obtaining an externally valid model is increased. Finally, in the presentation of the results, the accuracy of the predictions of the whole model is emphasized. For MS, one prognostic study16 to assess the risk of reaching secondary progression has been published, but this study used a different approach, namely a Bayesian analysis, to assess the risk. In a large sample, the risk of several determinants, which were selected on the basis of a previous study, was calculated. The specificity of the model was very good, while the sensitivity was poor.
With respect to future outcomes, most studies have focused on neurologic and locomotor function, using the score of the EDSS as the outcome and the neurologic deficits or MRI parameters as candidate predictors. However, other areas of functioning are relevant for patients, such as wheelchair dependence, impaired dexterity, cognitive impairments, incontinence, inability to use a car or public transportation, social dysfunction, and reliance on a disability pension. Studying these outcomes also means that the predictors should not be limited to neurologic or MRI parameters, but that psychosocial predictors should also be assessed.
The aim of our study was to construct and assess the usefulness of prediction models to predict functioning in the areas of mobility, dexterity, cognition, voiding, transportation, social activities, and work.
Methods
Patients and Design
All consecutive, potentially eligible patients visiting the participating outpatient clinics of 5 neurology departments were invited to participate. A cohort of 156 patients, aged 16 to 55 years, with recently (<6mo previously) diagnosed MS was recruited from 1998 to 2000 and prospectively monitored for 3 years. Diagnosis was determined according to the Poser criteria for definite MS.17 Treatments were not standardized. Patients with other neurologic disorders, systemic diseases, or malignant neoplastic diseases were excluded. This study was performed as part of a longitudinal study collecting extensive data on many potentially relevant predictors and outcomes at baseline and at 6 months, 1, 2, and 3 years later.18, 19 For the present analyses we used the baseline information for the predictors, and the 3-year data for the relevant outcomes. The patients were visited at home to minimize dropout, and 4 well-trained raters were responsible for the scoring. The ethics committee of the VU University Medical Center approved this study.
Construction of Prediction Models
As has been outlined in the introduction, the construction of a prediction model requires a specific methodological approach.15 The prediction models were constructed with the intention to use them in clinical practice. Therefore, we involved representatives of potential users of these models in the construction phase. Before actual data analysis, the aims of our study were discussed during 2 informal semistructured workshops with neurologists and researchers specializing in MS, and with rehabilitation physicians and physical and occupational therapists. In these workshops, we discussed which outcomes would be relevant to predict, and which candidate predictors should be investigated to predict these outcomes.
Outcomes
Inability to walk at least 500m was defined as an EDSS score of 4 or higher.20 Impaired dexterity was defined as an abnormal score (mean – 1.96 SD, healthy Dutch reference population) for the 9-Hole Peg Test.21 Cognitive impairments were defined as a score of mean – SD for 1 or more subtests of a cognitive screening test that was specifically developed for MS, which includes the subscales Consistent Long Term Retrieval and Long Term Storage of the Selective Reminding Test measuring verbal learning and memory, the 10/36 Spatial Recall Test measuring visuospatial learning and delayed recall, the Symbol Digit Modalities Test measuring sustained attention and concentration, the Paced Auditory Serial Addition Test measuring sustained attention and information processing speed, and the Word List Generation measuring verbal fluency.22, 23, 24 Incontinence was defined as a score of 5 or lower for the continence item of the FIM.25 Inability to drive a car or use public transportation was defined as needing help or being unable on the ability to travel item of the Rehabilitation Activities Profile.26 Social dysfunction was defined as an abnormal score (mean – 1.96 SD, healthy Dutch reference population) for 1 or more of the 3 social subscales (role physical, role emotional, social functioning) of the Medical Outcomes Study 36-Item Short-Form Health Survey.27 The patients were asked in a direct question about complete or partial reliance on a disability pension.
Candidate Predictors
Participants in the workshops were encouraged to name predictors that are relatively easy to obtain in clinical practice. First, the most relevant predictors for which information could be gathered during medical history taking were identified. Next, the most relevant predictors for which a physical examination is required were identified, and finally, the most relevant predictors obtained through complex diagnostic tests were identified. Using the information obtained from the discussions and from the literature, as described in the introduction, we selected candidate predictors from the baseline data of the extensive data set.19 Table 1 shows the selected outcomes and the predictors that were used to construct the models. Data on the selected outcomes obtained at baseline were not used as predictors. For the predictors that are based on medical history taking we used items of the Disability and Impact Profile.28, 29 This written questionnaire contains patient-rated numerical rating scales, which range from 0 to 10, on 40 different abilities. Each ability is assessed with 2 questions: (1) a question to assess the perceived disability for that item, and (2) a question to assess the extent to which the perceived disability poses a problem for the patient. We used the first question of the abilities that we were interested in. For the predictors that are based on physical examination, we used the EDSS Functional Systems scores.20 MRI was used to obtain the predictor variables T2-weighted (supra- and infratentorial) lesion loads in cm3, and the number of lesions in the spinal cord.30, 31 In total, 4 MRI predictor variables were used: T2 supratentorial, T2 infratentorial, T2 total, and spinal cord.
Table 1. Candidate Predictors Measured at Baseline for Each Outcome of Interest
| Predictor per Outcome of Interest | Range | Description |
|---|---|---|
| Inability to walk at least 500m | | |
| | 0–10 | Not at all to very well |
| | 0–10 | Very easily to not at all |
| | 0–6 | No signs to quadriplegia |
| | 0–5 | No signs to severe ataxia |
| | n | No. of lesions counted |
| Impaired dexterity | | |
| | 0–10 | Not at all to very well |
| | 0–6 | No signs to sensation lost below head |
| | 0–6 | No signs to quadriplegia |
| | 0–5 | No signs to severe ataxia |
| | cm³ | |
| Cognitive impairments | | |
| | y | |
| | 0–1 | Woman to man |
| | 0–10 | Bad to good |
| | 0–10 | Not at all to very well |
| | cm³ | |
| Incontinence | | |
| | 0–10 | Not at all to easily |
| | 0–6 | No signs to quadriplegia |
| | n | No. of lesions counted |
| Inability to use a car or public transportation | | |
| | 0–10 | Bad to good |
| | 0–10 | Not at all to very well |
| | 0–6 | No signs to quadriplegia |
| | 0–5 | No signs to severe ataxia |
| | cm³ | |
| | n | No. of lesions counted |
| Social dysfunction | | |
| | 0–10 | Bad to excellent |
| | 0–10 | Gloomy to happy |
| | 0–10 | Very easily to not at all |
| | 0–6 | No signs to quadriplegia |
| | 0–5 | No signs to severe ataxia |
| | cm³ | |
| Reliance on a disability pension | | |
| | 0–10 | Gloomy to happy |
| | 0–10 | Bad to good |
| | 0–10 | Not at all to very well |
| | 0–10 | Very easily to not at all |
| | 0–6 | No signs to quadriplegia |
| | 0–5 | No signs to severe ataxia |
| | cm³ | |

⁎Item of the Disability and Impact Profile.
†Item of the Functional Systems of the EDSS.
‡Values derived from MRI of the brain and spinal cord.
Analysis
Only patients with complete outcome data at 3 years were analyzed. To improve data quality and reduce the risk of bias, missing data on predictors were imputed twice32, 33 by using the data augmentation procedure in NORM software,34 yielding 2 imputed data sets. Descriptive statistics were used to describe the study population. For each outcome the number and percentage of patients with an unfavorable outcome were calculated.
Because predictive modeling in small data sets is susceptible to bias, we made use of the approach described by Steyerberg et al,15, 32, 33 which we described in the introduction. We used a limited set of candidate predictors that were selected on the basis of information from the literature and on clinical grounds. Subsequently, logistic regression models were constructed in each imputed data set, using a backwards stepwise selection procedure with a liberal P value of 0.5. When predictors in these models showed a counterintuitive relationship with the outcome, which means that the sign of the regression coefficient is opposite to what we expected, this predictor was deleted from the model, and the backwards selection procedure was repeated. Because the selected predictors were the same in both imputed data sets, internal validation was performed on one of the sets.
Bootstrapping techniques were used to study the internal validity of the final models (ie, to adjust the estimated regression coefficients for overfitting and the model performance for overoptimism).33, 35 Random bootstrap samples were drawn with replacement (250 replications) from the full data set. The shrinkage factor, a result of the bootstrap analyses, is a measure of overfitting. Regression coefficients can be corrected for overfitting by multiplying them by this shrinkage factor. Bootstrapping was performed in S-plus 6.1.a
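The optimism correction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the S-plus procedure used in the study: the "model" is a hypothetical single cut point on one predictor, and accuracy stands in for the calibration slope and AUC that were actually bootstrapped. The function names (`fit_threshold`, `accuracy`, `optimism_corrected`) are invented for this sketch.

```python
import random
from statistics import mean


def accuracy(cut, xs, ys):
    """Share of cases where the rule 'x >= cut' matches the binary outcome."""
    return mean([1.0 if (x >= cut) == bool(y) else 0.0 for x, y in zip(xs, ys)])


def fit_threshold(xs, ys):
    """Toy stand-in for model fitting: pick the cut with the best training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(xs)):
        acc = accuracy(cut, xs, ys)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


def optimism_corrected(xs, ys, n_boot=250, seed=0):
    """Return (apparent, optimism-corrected) performance.

    For each bootstrap sample the model is refitted; its performance in the
    bootstrap sample minus its performance in the original data estimates
    the optimism, which is then subtracted from the apparent performance.
    """
    rng = random.Random(seed)
    n = len(xs)
    apparent = accuracy(fit_threshold(xs, ys), xs, ys)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        cut = fit_threshold(bx, by)
        optimism.append(accuracy(cut, bx, by) - accuracy(cut, xs, ys))
    return apparent, apparent - mean(optimism)
```

The default of 250 replications mirrors the number reported in the text; the seed only makes the sketch reproducible.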
Model Performance
The model performance, expressed as calibration and discrimination, after bootstrapping can be considered as the performance that can be expected from similar future patients. Calibration refers to whether the predicted outcomes agree with the observed outcomes. A frequently occurring problem with prediction models is that the predictions for new patients are too extreme (too high for high-risk patients and too low for low-risk patients). Well-calibrated models have a slope of 1, while models providing predictions that are too extreme have a slope of less than 1.
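To make the meaning of the slope concrete, one common way to estimate a calibration slope is to regress the observed outcome on the logit of the predicted risk. The sketch below does this with a small Newton-Raphson logistic fit. It is an assumption-laden illustration: the study derived its slope (shrinkage factor) from the bootstrap, and the function name `calibration_slope` is hypothetical.

```python
import math


def calibration_slope(pred_probs, outcomes, n_iter=25):
    """Slope b from the logistic model: logit P(y=1) = a + b * logit(predicted risk).

    A slope near 1 suggests good calibration; a slope below 1 suggests
    predictions that are too extreme. Predicted risks must lie strictly
    between 0 and 1.
    """
    lp = [math.log(p / (1.0 - p)) for p in pred_probs]
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(lp, outcomes):
            mu = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = mu * (1.0 - mu)
            g_a += y - mu            # gradient with respect to the intercept
            g_b += (y - mu) * x      # gradient with respect to the slope
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: solve the 2x2 system H * delta = gradient
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return b
```

If every predicted logit is doubled (that is, the predictions are made more extreme), the estimated slope halves, which is the pattern the paragraph above describes.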
The discriminative ability of the model (ie, how accurately can high-risk patients be distinguished from low-risk patients) was assessed using the AUC (95% confidence interval). An AUC of 0.5 indicates no discrimination above chance, whereas an AUC of 1.0 indicates perfect discrimination. A rough guide for classifying the discriminative ability of a diagnostic test is the traditional academic points system: excellent (>.90), good (>.80), fair (>.70), poor (>.60), or fail (>.50).36
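As an illustration of what the AUC measures, the sketch below computes it as the proportion of (unfavorable, favorable) patient pairs in which the patient with the unfavorable outcome received the higher predicted risk, counting ties as one half. The grading function applies the rough guide quoted above; treating each threshold as inclusive is an assumption made here so that an AUC of exactly .80 is graded "good". Both function names are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case with and one without the outcome")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def grade_auc(value):
    """Rough academic-points grading of discriminative ability."""
    if value >= 0.90:
        return "excellent"
    if value >= 0.80:
        return "good"
    if value >= 0.70:
        return "fair"
    if value >= 0.60:
        return "poor"
    return "fail"
```

The pairwise form is quadratic in the number of patients, which is unproblematic for a cohort of this size.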
Clinical Prediction Rules
To facilitate the calculation of an individual patient's risk, we developed score charts for the prediction models that were internally valid. We divided the regression coefficients of the multivariate models by the lowest regression coefficient and rounded them to the nearest integer to form scores for the predictors. The sum of the scores corresponds to the risk of a poor outcome. We created 3 risk categories: high (probability of adverse outcome >75%), moderate (probability of adverse outcome 25%–75%), and low (probability of adverse outcome <25%).
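The conversion from regression coefficients to integer scores can be sketched as follows. Two assumptions are made here and should be read as such: "lowest" is taken to mean the coefficient with the smallest absolute value, with its sign retained, and the predictor names are placeholders. The example coefficients are the shrunk coefficients reported for the impaired dexterity model in table 4; with this reading, the sketch yields factors of 1, –2, –3, –2, and –6 for them. Factors for other models may differ slightly from the published ones, presumably because the published coefficients are rounded.

```python
def score_factors(coefs):
    """Divide each coefficient by the one smallest in magnitude and round."""
    divisor = min(coefs.values(), key=abs)
    return {name: round(beta / divisor) for name, beta in coefs.items()}


def risk_category(probability):
    """Map a predicted probability of an adverse outcome to a risk category."""
    if probability > 0.75:
        return "high"
    if probability >= 0.25:
        return "moderate"
    return "low"


# Shrunk coefficients of the impaired dexterity model (placeholder names).
dexterity = {
    "predictor_1": -0.16,
    "predictor_2": 0.25,
    "predictor_3": 0.46,
    "predictor_4": 0.27,
    "predictor_5": 0.97,
}
dexterity_factors = score_factors(dexterity)
```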
Results
Patients
Data on the outcomes at the 3-year follow-up were missing for 10 of the 156 patients. These 10 patients did not differ significantly from the rest of the cohort with regard to gender, age, T2-weighted lesion load at baseline, or number of lesions in the spinal cord at baseline. However, they had a trend towards higher baseline EDSS scores and, in contrast to the results for the EDSS, fewer lesions on the baseline MRI. For 13 of the 146 patients with a complete follow-up, baseline MRI data on the brain and spinal cord were missing. MRI data on the spinal cord were also missing for 2 patients. These data were imputed. Data on all other candidate predictors were complete. Table 2 shows the baseline characteristics of the patients, most of which are consistent with the expected pattern: more women than men, and approximately 80% with a relapse onset.
Table 2. Baseline Characteristics (n=146)
| Patient characteristics | |
| 93 | |
| 37.4±9.7 | |
| Disease characteristics | |
| 82% | |
| 2.5 | |
| Candidate predictors | |
| 9 | |
| 9 | |
| 9 | |
| 10 | |
| 8 | |
| 8 | |
| 8 | |
| 7 | |
| 1 | |
| 1 | |
| 1 | |
| 3.4 | |
| 0.2 | |
| 3.6 | |
| 2 |
Table 3 shows the number of patients with an unfavorable outcome at baseline and at the 3-year follow-up. For most patients, functioning does not change over the 3-year period. Most changes are in the direction of unfavorable outcomes. Exceptions are the outcomes of cognitive impairment (29 patients showed remarkable improvement) and social functioning (important changes in both directions).
Table 3. Frequencies of Unfavorable Outcomes at Baseline and After 3 Years (n=146)
| Outcome | Baseline | Changes: Improved | Changes: Deteriorated | 3y |
|---|---|---|---|---|
| Inability to walk at least 500m | 16 | 5 | 26 | 37 |
| Impaired dexterity | 36 | 4 | 14 | 46 |
| Cognitive impairments | 60 | 29 | 13 | 44 |
| Incontinence | 9 | 6 | 21 | 24 |
| Inability to use a car or public transportation | 9 | 6 | 11 | 14 |
| Social dysfunction | 58 | 20 | 22 | 60 |
| Reliance on a disability pension | 26 | 3 | 54 | 77 |
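The counts in table 3 are internally consistent: for each outcome, the number of patients with an unfavorable outcome at 3 years equals the baseline number minus those who improved plus those who deteriorated. The short check below, written as an illustration with a hypothetical helper name, verifies this for every row.

```python
# (baseline, improved, deteriorated, 3y) per outcome, copied from table 3
TABLE_3 = {
    "Inability to walk at least 500m": (16, 5, 26, 37),
    "Impaired dexterity": (36, 4, 14, 46),
    "Cognitive impairments": (60, 29, 13, 44),
    "Incontinence": (9, 6, 21, 24),
    "Inability to use a car or public transportation": (9, 6, 11, 14),
    "Social dysfunction": (58, 20, 22, 60),
    "Reliance on a disability pension": (26, 3, 54, 77),
}


def three_year_count(baseline, improved, deteriorated):
    """Unfavorable outcomes at 3 years implied by baseline and the changes."""
    return baseline - improved + deteriorated


mismatches = [
    outcome
    for outcome, (base, imp, det, final) in TABLE_3.items()
    if three_year_count(base, imp, det) != final
]
```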
The final regression models, obtained after a backwards stepwise procedure with a liberal P value of 0.5 and after elimination of predictors with a counterintuitive relationship with the outcome, are shown in table 4. The presented models are corrected for overoptimism by bootstrapping. Figure 1 shows the discrimination and calibration curves. The outcomes for inability to walk at least 500m, impaired dexterity, and cognitive impairments show good calibration (calibration curves follow approximately the 45° diagonal, and the shrinkage factors [slope] approach 1). The calibration curves for the other outcomes show important miscalibration. Discriminative ability is good for the models predicting inability to walk at least 500m (AUC=.89 [.83–.95]) and incontinence (AUC=.80 [.71–.90]); fair for the models predicting impaired dexterity (AUC=.77 [.69–.86]), cognitive impairments (AUC=.74 [.65–.83]), inability to use a car or public transportation (AUC=.76 [.65–.87]), and reliance on a disability pension (AUC=.72 [.64–.80]); and poor for the model predicting social dysfunction (AUC=.67 [.58–.76]).
Table 4. Final Regression Models and Their Predictive Ability
| Models and Predictors (Score Range) | Predictive Value: βshrunk | Predictive Value: Factor | Predictive Value: P | Model Performance: Slope | Model Performance: AUC (95% CI) |
|---|---|---|---|---|---|
| Inability to walk at least 500m | | | | | |
| | –.57 | 3 | .00 | .93 | .89 |
| | .77 | –5 | .00 | | |
| | .16 | –1 | .05 | | |
| Impaired dexterity | | | | | |
| | –.16 | 1 | .16 | .85 | .77 |
| | .25 | –2 | .31 | | |
| | .46 | –3 | .03 | | |
| | .27 | –2 | .17 | | |
| | .97 | –6 | .00 | | |
| Cognitive impairments | | | | | |
| | .03 | 1 | .12 | .88 | .74 |
| | .88 | 29 | .02 | | |
| | –.17 | –5 | .07 | | |
| | .06 | 2 | .00 | | |
| Incontinence | | | | | |
| | –.44 | | .00 | .97 | .80 |
| | .10 | | .25 | | |
| Inability to use a car or public transportation | | | | | |
| | –.19 | | .06 | .71 | .76 |
| | .38 | | .20 | | |
| | .29 | | .25 | | |
| | .12 | | .09 | | |
| Social dysfunction | | | | | |
| | –.20 | | .06 | .87 | .67 |
| | .16 | | .01 | | |
| | .01 | | .27 | | |
| Reliance on a disability pension | | | | | |
| | –.23 | | .01 | .84 | .72 |
| | .17 | | .44 | | |
| | .19 | | .29 | | |
| | .03 | | .08 | | |


Fig 1.
Discrimination (left column) and calibration (right column) curves for all outcomes. The ideal line represents perfect calibration, the apparent line represents our original data, and the bias-corrected line represents the bootstrap-corrected calibration of the model. (A) Inability to walk at least 500 meters, (B) impaired dexterity, (C) cognitive impairments, (D) incontinence, (E) inability to use a car or public transportation, (F) social dysfunction, (G) reliance on a disability pension.
Table 4 also shows that information obtained from medical history taking and MRI is included in every regression model, and that information obtained from the physical examination is not included in the models that predict incontinence and social dysfunction.
Twelve of the 37 potential predictors did not predict the outcome they were supposed to predict. Seven omitted predictors were from the category medical history taking (“Are you easily tired?” [2x], “How good is your memory?” [2x], “How well can you concentrate?”, “How do you feel?” [2x]); 4 from the category physical examination (impairment pyramidal [3x] and cerebellar tracts); and 1 was an MRI parameter (T2-weighted lesion load). However, of these, only “How do you feel?” did not predict any outcome it was supposed to predict.
Clinical Prediction Rules
Clinical prediction rules were constructed for the models predicting inability to walk at least 500m, impaired dexterity, and cognitive impairments (appendix 1). They are fully based on the results of the final regression models. The “factors” from table 4 are used in the calculations of the clinical prediction rules.
Discussion
We have shown that it is feasible to make internally valid predictions for patients with recently diagnosed MS with regard to outcomes on physical and cognitive functioning. The inability to walk at least 500m was predicted by the perceived ability to walk, impairment of the cerebellar tract, and the number of MRI lesions in the spinal cord. Impaired dexterity was predicted by the perceived ability to use the hands, impairments of the pyramidal, cerebellar, and sensory tracts, and the T2-weighted infratentorial lesion load. Cognitive impairments were predicted by age, gender, the perceived ability to concentrate, and the T2-weighted supratentorial lesion load.
In general, our results show that it makes sense to select potential predictors by following the diagnostic process of the physician (ie, first, medical history taking; then physical examination; finally MRI), because all prediction models contain information from medical history taking and MRI, and only 2 of the 7 prediction models unexpectedly did not contain predictors from the physical examination. Similarly, Bergamaschi et al16 suggested incorporating additional clinical information, such as information on fatigue, cognitive impairments, and neuroradiologic information, into their prediction model in order to improve the sensitivity. They also suggested incorporating genetic, neuroimmunologic, and neurophysiologic information. Although impairments of the pyramidal tract are frequently accompanied by bladder problems, apparently they do not contribute to the prediction of incontinence. Also, we wrongly expected impairments of the pyramidal and cerebellar tracts to predict social functioning. Nevertheless, we think that our results show that useful prognostic information can be obtained from the standard routine of information gathering in clinical practice.
It is very tempting to (causally) interpret the strength of the associations between the predictors in the final models and the predicted outcomes. However, as outlined in the introduction, we have used a specific method to construct the regression models. The aim of this method is to predict future events as accurately as possible and not to assess the strength of an association. Most importantly, this method does not investigate confounding, which means that an assessment of the unconfounded association is not possible, and thus interpreting results in this way should not be done. In contrast to the method that we describe in this article, we have published an article19 in which we used a completely different method of analyzing our longitudinal data with the intention to identify the most powerful determinants of social functioning. It is also very tempting to add other clinical, or new, potentially stronger determinants to these models. An example may be brain atrophy measurements in the model for cognitive functioning. Although brain atrophy has been suggested to be causally related to cognitive functioning, adding this information to a prediction model does not necessarily mean that predictions improve. In prediction modeling, the added value of a determinant should be investigated by assessing the change in discriminative ability (AUC) and model fit, and not by looking at the strength of the association.
An important strength of our study is that the analysis was designed to optimize the internal validity.32, 33 Several attempts were made to minimize bias. First, missing baseline data were imputed to optimize the quality of the data. Second, we used a limited set of clinically relevant candidate predictors that were only excluded when the P value was greater than .50, or when the sign of the coefficient was opposite to what we expected. Finally, bootstrapping was used to correct for overoptimism of the regression coefficients and the model parameters (calibration: shrinkage factor, and discrimination: AUC).
Study Limitations
A possible weakness of the study was the assessment of cognitive dysfunction. Twenty-nine patients showed cognitive improvements in the first 3 years, substantially more than the number of patients who improved on the other outcomes. In accordance with the design of our study,18 cognitive data were collected annually, but it is possible that an interval of 1 year is not sufficiently long to rule out a practice effect. Another explanation might be that the definition of cognitive impairment that we applied does not correctly diagnose cognitive impairment in patients. The cognitive screening test is based on 5 cognitive tests that each assess a different aspect of cognitive functioning, but in the literature there is no consensus on which cutoff point to use.23, 24, 37, 38, 39 We used a sensitive cutoff point that classified patients as cognitively impaired if 1 or more of their test scores were lower than the mean – SD, compared with a Dutch reference population. Our strategy might therefore lead to a greater number of patients classified as cognitively impaired, whereas they actually perform within the norm (ie, patients are classified as false-positives). Therefore, the observed improvements in cognitive functioning might just be changes that occur within normal ranges. Alternative cutoff points, such as 2 or more test scores lower than the mean – SD, or 1 or more test scores lower than the mean – 2SD, have also been applied in the literature. However, applying these criteria to our data still showed cognitive improvements for a substantial number of patients (data not shown). Therefore, the observed improvements in cognitive functioning are either caused by a practice effect or they are real improvements.
At baseline (ie, a maximum of 6 months after a definite diagnosis of MS was made) 9 (6%) patients were receiving disease-modifying treatment. At the 3-year follow-up, this rose to 44 (30%) with a mean treatment duration of 25 months. We did not include disease-modifying treatment at baseline in our models because we assumed that confounding by indication could influence our findings. Patients with a more severe disease course are more likely to receive this treatment. The omission of disease-modifying treatment in the prediction models means that our models can be used independent of disease-modifying treatment. With regard to external validity, this means that our results can be generalized to populations in which approximately the same percentage of patients are receiving disease-modifying treatment.
Although our results look promising, application in clinical practice is not justified until they have been validated externally.40, 41, 42, 43 The analyses that we have presented should be repeated in a new cohort, which should be recruited in a different geographic area, at a different point in time, or, as is current in MS, assessed with different diagnostic criteria.44 The regression coefficients and model parameters in these cohorts should be used to assess the applicability of these models in clinical practice. When external validation has shown that the models perform well, and when the clinical usefulness of the clinical prediction rules has been established, they can be used with confidence in clinical practice to aid clinicians in making a prognosis. However, because the application of research findings in clinical practice is not self-evident, the clinical prediction rules should be actively implemented.45, 46
Our results indicate that predictions of the outcomes that are based on performance measures (ie, measures that require patients to actually perform a physical or cognitive test) are better than the predictions of outcomes based on self-reported health status. This implies that the more objective outcomes can be correctly predicted, but that self-reported outcomes are more difficult to predict. The reason for this might be that personal or social factors, which are not easy to measure as predictors, also have an effect on self-reported outcomes. In clinical practice, the clinical prediction rules could be used not only to improve treatment decisions regarding the initiation of disease-modifying treatment, but also to improve the timing of the (components of) rehabilitation treatment. Of equal importance is the possibility to improve the counseling of a patient. In conversations with the patient, the physician should become familiar with the patient's personal and social situation. When this information is combined with the information obtained from the clinical prediction rules, a patient-specific prognosis can be formulated, which the physician can then discuss with the patient. The results of this discussion can be used to adjust the counseling of the patient, or can lead to the initiation of preventive measures or (rehabilitation) treatment.
Conclusions
In conclusion, during the first 3 years of MS, it is possible to accurately predict the inability to walk at least 500m, impaired dexterity, and cognitive impairments from predictors derived from medical history taking, physical examination, and MRI shortly after a definite diagnosis of MS has been made. The ability to predict physical and cognitive functioning might facilitate the counseling of patients and the planning of (rehabilitation) treatment. First, however, the models must be validated externally to confirm that they perform adequately in a new cohort.
Supplier
Acknowledgments
We thank the neurologists in the participating hospitals (VU University Medical Center, Academic Medical Center Amsterdam, Sint Lucas Andreas Hospital Amsterdam, OLVG Hospital Amsterdam, Erasmus Medical Center Rotterdam) for recruiting the patients, and M. Jacobs-Van der Bruggen, PT, M. Schothorst, PT, and T. Wedding, PT, for performing the measurements.
Appendix 1
CLINICAL PREDICTION RULES
References
- Expectations of wheelchair-dependency in recently diagnosed patients with multiple sclerosis and their partners. Eur J Neurol. 2003;10:287–293
- Perception of prognostic risk in patients with multiple sclerosis: the relationship with anxiety, depression, and disease-related distress. J Clin Epidemiol. 2004;57:180–186
- Prediction of outcome in multiple sclerosis based on multivariate models. J Neurol. 1994;241:597–604
- Prognostic factors in a multiple sclerosis incidence cohort with twenty-five years of follow-up. Brain. 1993;116:117–134
- The natural history of multiple sclerosis: a geographically based study (2. Predictive value of the early clinical course). Brain. 1989;112:1419–1428
- The natural history of multiple sclerosis: a geographically based study (I. Clinical course and disability). Brain. 1989;112:133–146
- The natural history of multiple sclerosis: a geographically based study (3. Multivariate analysis of predictive factors and models of outcome). Brain. 1991;114:1045–1056
- A prospective study on the prognosis of multiple sclerosis. Neurol Sci. 2000;21:S831–S838
- Early clinical predictors and progression of irreversible disability in multiple sclerosis: an amnesic process. Brain. 2003;126:770–782
- Change in MS-related disability in a population-based cohort: a 10-year follow-up study. Neurology. 2004;62:51–59
- Discrimination and calibration of mortality risk prediction models in interventional cardiology. J Biomed Inform. 2005;38:367–375
- Recalibration of risk prediction models in a large multicenter cohort of admissions to adult, general critical care units in the United Kingdom. Crit Care Med. 2006;34:1378–1388
- Predicting outcome after traumatic brain injury: practical prognostic models based on large cohort of international patients. BMJ. 2008;336:425–429
- A clinical prognostic scoring system for Guillain-Barre syndrome. Lancet Neurol. 2007;6:589–594
- Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med. 2000;19:1059–1079
- Early prediction of the long term evolution of multiple sclerosis: the Bayesian Risk Estimate for Multiple Sclerosis (BREMS) score. J Neurol Neurosurg Psychiatry. 2007;78:757–759
- New diagnostic criteria for multiple sclerosis: guidelines for research protocols. Ann Neurol. 1983;13:227–231
- The initial course of daily functioning in multiple sclerosis: a three-year follow-up study. Mult Scler. 2005;11:713–718
- Vitality, perceived social support and disease activity determine the performance of social roles in recently diagnosed multiple sclerosis: a longitudinal analysis. J Rehabil Med. 2008;40:151–157
- Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS). Neurology. 1983;33:1444–1452
- Multiple sclerosis functional composite: impact of reference population and interpretation of changes. Mult Scler. 2002;8:366–371
- Cognitive dysfunction in multiple sclerosis (I. Frequency, patterns, and prediction). Neurology. 1991;41:685–691
- The brief repeatable battery of neuropsychological tests: normative values allow application in multiple sclerosis clinical practice. Mult Scler. 2001;7:263–267
- Cognitive impairment in probable multiple sclerosis. J Neurol Neurosurg Psychiatry. 2003;74:443–446
- Functional assessment scales: a study of persons with multiple sclerosis. Arch Phys Med Rehabil. 1990;71:870–875
- The rehabilitation activities profile: a validation study of its use as a disability index with stroke patients. Arch Phys Med Rehabil. 1995;76:501–507
- Translation, validation, and norming of the Dutch language version of the SF-36 Health Survey in community and chronic disease populations. J Clin Epidemiol. 1998;51:1055–1068
- Subjective weighting of disability: an approach to quality of life assessment in rehabilitation. Disabil Rehabil. 1994;16:198–204
- Quality of life in multiple sclerosis: the Disability and Impact Profile (DIP). J Neurol. 1996;243:469–474
- Concurrent validity of the MS Functional Composite using MRI as a biological disease marker. Neurology. 2001;56:215–219
- Spinal cord abnormalities in recently diagnosed MS patients: added value of spinal MRI examination. Neurology. 2004;62:226–233
- Stepwise selection in small data sets: a simulation study of bias in logistic regression analysis. J Clin Epidemiol. 1999;52:935–942
- Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001;54:774–781
- NORM: Multiple imputation of incomplete multivariate data under a normal model, version 2, 1999. Software for Windows 95/98/NT. Available at: http://www.stat.psu.edu/~jls/misoftwa.html. Accessed August 16, 2009.
- Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15:361–387
- Measuring the accuracy of diagnostic systems. Science. 1988;240:1285–1293
- Comparison of two brief neuropsychological batteries in people with multiple sclerosis. Mult Scler. 2002;8:169–176
- Three screening batteries to detect cognitive impairment in multiple sclerosis. Mult Scler. 2002;8:382–389
- Screening for memory problems in multiple sclerosis. Br J Clin Psychol. 2000;39:311–315
- Some prognostic models for traumatic brain injury were not valid. J Clin Epidemiol. 2006;59:132–143
- External validation is necessary in prediction research: a clinical example. J Clin Epidemiol. 2003;56:826–832
- What do we mean by validating a prognostic model? Stat Med. 2000;19:453–473
- External validation and comparison of recently described prediction rules for suspected pulmonary embolism. Curr Opin Pulm Med. 2004;10:345–349
- Assessing the generalizability of prognostic information. Ann Intern Med. 1999;130:515–524
- From best evidence to best practice: effective implementation of change in patients' care. Lancet. 2003;362:1225–1230
- Closing the gap between research and practice: an overview of systematic reviews of interventions to promote the implementation of research findings (The Cochrane Effective Practice and Organization of Care Review Group). BMJ. 1998;317:465–468
a. Insightful Corp, 1700 Westlake Ave N, Ste 500, Seattle, WA 98109-3044.
Supported by The Netherlands Organization for Scientific Research (grant no. NWO 940-33-009).
No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the authors or upon any organization with which the authors are associated.
The Functional Prognostication and Disability (FuPro) Study Group includes the following investigators: G.J. Lankhorst, J. Dekker, A.J. Dallmeijer, M.J. IJzerman, H. Beckerman, V. de Groot: VU University Medical Center Amsterdam (project coordination); A.J.H. Prevo, E. Lindeman, V.P.M. Schepers: University Medical Center, Utrecht; H.J. Stam, E. Odding, B. van Baalen: Erasmus Medical Center, Rotterdam; A. Beelen, I.J.M. de Groot: Academic Medical Center, Amsterdam.
PII: S0003-9993(09)00397-9
doi:10.1016/j.apmr.2009.03.018
© 2009 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.