

Brain Advance Access originally published online on September 6, 2006
Brain 2006 129(10):2648-2659; doi:10.1093/brain/awl223
© The Author (2006). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

The usefulness of evaluative outcome measures in patients with multiple sclerosis

V. de Groot1,4, H. Beckerman1,4, B. M. J. Uitdehaag2,3, H. C. W. de Vet4, G. J. Lankhorst1,4, C. H. Polman2 and L. M. Bouter4

1 Departments of Rehabilitation Medicine, VU University Medical Center Amsterdam, The Netherlands 2 Departments of Neurology, VU University Medical Center Amsterdam, The Netherlands 3 Departments of Clinical Epidemiology and Biostatistics, VU University Medical Center Amsterdam, The Netherlands 4 EMGO Institute, VU University Medical Center Amsterdam, The Netherlands

Correspondence to: Vincent de Groot, Department of Rehabilitation Medicine, VU University Medical Center, P.O. Box 7057, 1007 MB Amsterdam, The Netherlands E-mail: v.degroot{at}vumc.nl


    Summary
To select the most useful evaluative outcome measures for early multiple sclerosis, we included 156 recently diagnosed patients in a 3-year follow-up study, and assessed them on 23 outcome measures in the domains of disease-specific outcomes, physical functioning, mental health, social functioning and general health. A global rating scale (GRS) and the Expanded Disability Status Scale (EDSS) were used as external criteria to determine the minimally important change (MIC) for each outcome measure. Subsequently, we determined whether the outcome measures could detect their MIC reliably. From these, per domain the outcome measure that was found to be most sensitive to changes (responsive) was identified. At group level, 11 outcomes of the domains of physical functioning, mental health, social functioning and general health could reliably detect the MIC. Of these 11, the most responsive measures per domain were the Medical Outcome Study 36 Short Form sub-scale physical functioning (SF36pf), the Disability and Impact Profile (DIP) sub-scale psychological, the Rehabilitation Activities Profile sub-scale occupation (RAPocc) and the SF36 sub-scale health, respectively. Overall, the most responsive measures were the SF36pf and the RAPocc. In individual patients, none of the measures could reliably detect the MIC. In sum, in the early stages of multiple sclerosis the most useful evaluative outcome measures for research are the SF36pf (physical functioning) and the RAPocc (social functioning).

Key Words: multiple sclerosis; evaluative outcome measures; responsiveness; minimally important change; smallest real change

Abbreviations: DIP, Disability and Impact Profile; EDSS, Expanded Disability Status Scale; GAS, Graphic Assessment Scale; MIC, minimally important change; MSFC, Multiple Sclerosis Functional Composite Measure; NHPT, nine-hole peg test; RAPocc, Rehabilitation Activities Profile sub-scale occupation; SaGAS, Short and Graphic Assessment Scale; SF36pf, Medical Outcome Study 36 Short Form sub-scale physical functioning; TWT, timed-walk test

Received February 10, 2006. Revised July 21, 2006. Accepted July 25, 2006.


    Introduction
The Expanded Disability Status Scale (EDSS) is a frequently used and well-known outcome measure for multiple sclerosis. However, it has been criticized for its unsatisfactory validity and poor reliability (Noseworthy, 1994; Sharrack and Hughes, 1999; Hobart et al., 2000). In response to this situation, the National Multiple Sclerosis Society Clinical Outcomes Assessment Task Force reviewed a large number of data sets to determine which outcome measures adequately reflect the consequences of the disease and are capable of assessing these consequences reliably (Cutter et al., 1999; Fischer et al., 1999). This led to the development of the Multiple Sclerosis Functional Composite Measure (MSFC), which consists of the 25-foot timed-walk test (TWT), the nine-hole peg test (NHPT) and the paced auditory serial addition test (PASAT). Originally, the Task Force intended to include a measure of visual acuity, but no reliable measure could be found. The MSFC is intended to replace the EDSS as outcome measure in current and future trials (Cutter et al., 1999; Miller et al., 2000; Cohen et al., 2001). The interpretation of the scores of the individual components of the MSFC is straightforward. However, the total score, which results from a relatively complex formula to combine the component scores, is more difficult to interpret. An adaptation of the MSFC, the short and graphic assessment scale (SaGAS) (Vaney et al., 2004), uses only the TWT and the NHPT. Through a specific transformation, a score is obtained that should be easier to interpret. Other newly developed disease-specific outcomes are the multiple sclerosis impact scale (Hobart et al., 2001a) and the Guy's Neurological Disability Scale (Sharrack and Hughes, 1999).
In addition to these new, disease-specific measures, several other disability and quality of life measures have been used in research into this illness (Granger et al., 1990; Kidd et al., 1995; Jonsson et al., 1996; Lankhorst et al., 1996; Ottenbacher et al., 1996; Cohen et al., 1999; Pfennings et al., 1999a; Van der Putten et al., 1999; Freeman et al., 2000; Hobart et al., 2001b).

Responsiveness is an important clinimetric property. It represents the ability to measure change, and is particularly relevant when outcome measures are to be used in longitudinal studies, such as clinical trials (De Vet et al., 2001; Terwee et al., 2003). In connection with multiple sclerosis, however, it has been studied much less extensively than validity and reliability (Koziol et al., 1999; Sharrack and Hughes, 1999; Schwid et al., 2000; Hoogervorst et al., 2001a; Patzold et al., 2002; Uitdehaag et al., 2002; Riazi et al., 2003; Hobart et al., 2004; McGuigan and Hutchinson, 2004). Moreover, in the literature there is no consensus about the exact definition of responsiveness (Terwee et al., 2003). Consequently, many methods have been developed to assess responsiveness (Husted et al., 2000; Crosby et al., 2003; Terwee et al., 2003). It has been shown that applying different methods leads to different conclusions about the absolute responsiveness of an outcome measure (Terwee et al., 2003). However, conclusions about the relative responsiveness, i.e. how different measures perform in relation to each other, are less dependent on the method used (Terwee et al., 2003). To assess the relative responsiveness, several outcome measures of interest should be included, and parallel assessments should be made at the same points in time.

The methods that can be used to assess whether scores have changed can be sub-divided into distribution-based and anchor-based methods (Lydick and Epstein, 1993; Cella et al., 2002a, b; Schmitt and Di Fabio, 2004). Distribution-based methods, using standardized metrics, focus on the ability of an outcome measure to reliably determine change, and aim to quantify the noise, i.e. the variability of the score changes in the absence of a relevant change. Anchor-based methods focus on the correspondence of the change on the outcome measure of interest with the change on an external criterion (Cella et al., 2002a; Schunemann et al., 2003) and aim to quantify the signal, i.e. the size of the score change when there is a relevant change. The results of anchor-based methods depend on the external criterion and the cut-off point chosen (Cella et al., 2002a). The usefulness of an evaluative outcome measure depends on whether score changes associated with a relevant change can reliably be distinguished from the variability of score changes in the absence of a relevant change (Guyatt et al., 1987).

In this study, 23 (sub-scales of) outcome measures were compared. The aim was to select the most useful evaluative outcome measures for the early stages of multiple sclerosis.


    Material and methods
Patients
All consecutive potentially eligible patients visiting the participating neurology outpatient clinics were invited to participate. A cohort of 156 recently (<6 months previously) diagnosed patients, aged 16–55 years, was recruited and followed prospectively for 3 years. Diagnosis was based on the Poser criteria for definite multiple sclerosis (Poser et al., 1983). Patients with other neurological disorders, or systemic or malignant neoplastic diseases, were excluded. The measurements took place at baseline, after 6 months, and after 1, 2 and 3 years. In the case of a relapse, the measurements were postponed for a few weeks until the relapse had subsided. The patients were visited at home in order to minimize drop-out. Four well-trained raters were responsible for the scoring.

Outcome measures
We studied the (sub-)scales of the EDSS (Kurtzke, 1983; Whitaker et al., 1995; Rudick et al., 1996), the MSFC (Cutter et al., 1999; Fischer et al., 1999; Cohen et al., 2000; Kalkers et al., 2000, 2001; Miller et al., 2000; Hoogervorst et al., 2001b), the SaGAS (Vaney et al., 2004), the Action Research Arm Test (ARAT) (Lyle, 1981; Van der Lee et al., 2001), the Disability and Impact Profile (DIP) (Laman and Lankhorst, 1994; Jonsson et al., 1996; Lankhorst et al., 1996; Cohen et al., 1999; Pfennings et al., 1999a), the Functional Independence Measure (FIM) (Granger et al., 1990; Kidd et al., 1995; Marolf et al., 1996), the Rehabilitation Activities Profile (RAP) (Van Bennekom et al., 1995, 1996), the Rivermead Mobility Index (RMI) (Collen et al., 1991; Forlander and Bohannon, 1999; Hsieh et al., 2000; Antonucci et al., 2002) and the Medical Outcome Study Short Form 36 (SF36) (Vickrey et al., 1995; Brunet et al., 1996; Freeman et al., 2000; Hobart et al., 2001b). The 23 (sub-)scales covered 5 domains: 3 disease-specific measures, 10 physical functioning measures (5 mobility measures, 3 self-care measures and 2 upper limb function measures), 4 mental health measures (2 cognitive function measures and 2 emotional well-being measures), 5 social functioning measures and 1 general health measure. Of these, 11 outcome measures were questionnaires, 7 were (parts of) measures that required physical examination or testing procedures and 5 outcome measures were based on semi-structured interviews. When possible, outcome measures were transformed into a scale ranging from 100 (best) to 0 (worst). Scores on the NHPT, the 10-m TWT, the MSFC, and the SaGAS could not be transformed in this way, because these continuous scales do not have defined end-points for best or worst scores. Table 1 presents an overview of the outcome measures and the baseline scores (standard deviation).


Table 1 Outcome measures studied and baseline scores of 156 multiple sclerosis patients

 
Analysis of responsiveness
To determine whether a patient's score had changed, we applied two external criteria: (i) a 7-point Likert-type patient-rated global rating scale (GRS) of change, using the situation at diagnosis as reference point and emphasizing the perspective of the patient (Jaeschke et al., 1989; Juniper et al., 1994; Liang, 1995; Stucki et al., 1995; Bessette et al., 1998; Cella et al., 2002b; Guyatt et al., 2002), and (ii) a change on the EDSS, representing the perspective of the clinician. The GRS question asked was: ‘How would you rate your current health when compared with your health at the time of diagnosis?’ The answering categories were: very much improved, much improved, slightly improved, stable, slightly deteriorated, much deteriorated, and very much deteriorated. The EDSS is a single-scale measure that ranges from 0 = a normal neurological examination, to 10 = death due to multiple sclerosis.

To assess the relative responsiveness, which is relatively independent of the method used to assess responsiveness (Terwee et al., 2003), we calculated the area under the receiver operating characteristic (ROC) curve with its 95% confidence interval (AUC, 95% CI) for every outcome measure, using score changes from baseline to 3 years (Beurskens et al., 1996; Van der Windt et al., 1998; De Vet et al., 2001; Mancuso and Peterson, 2004). To compute the AUC we used a non-parametric method, which makes no assumptions about the distributions. Figure 1 shows an example of two ROC curves. The relative responsiveness was assessed separately for deterioration and improvement. For both external criteria the scores were dichotomized, using the category stable (no change) as reference category.
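As an illustration of a non-parametric AUC, the sketch below uses the Mann-Whitney formulation: the AUC equals the probability that a randomly chosen patient from the changed group shows a larger score change than a randomly chosen patient from the stable group, with ties counted as one half. This is only one possible non-parametric estimator and is not necessarily the one implemented in the software used for this study. The function name and the example values are hypothetical, and the confidence interval is not computed.

```python
def auc_mann_whitney(changed, stable):
    """Non-parametric AUC for two groups of score changes.

    `changed` and `stable` are lists of score changes, oriented so that a
    larger value means more change in the direction of interest (for
    deterioration on a 0-100 scale: baseline score minus follow-up score).
    """
    if not changed or not stable:
        raise ValueError("both groups must contain at least one patient")
    wins = 0.0
    for c in changed:
        for s in stable:
            if c > s:
                wins += 1.0      # changed patient shows the larger change
            elif c == s:
                wins += 0.5      # ties count as half
    return wins / (len(changed) * len(stable))


# Hypothetical declines (baseline minus 3-year score), not study data.
deteriorated = [12, 8, 5, 0, 15]
stable = [0, 3, -2, 5, 1]
auc = auc_mann_whitney(deteriorated, stable)
```

An AUC of 0.5 corresponds to the diagonal line in a ROC plot: the outcome measure then separates changed from stable patients no better than chance.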


Fig. 1 ROC curves. In a ROC curve the sensitivity is plotted against 1–specificity. The AUC is a measure of the responsiveness of the outcome measure. An AUC ≤0.5 (diagonal line) indicates that the outcome measure is not responsive. The more the ROC curve approaches the upper left corner the more responsive the outcome measure is.

 
The minimally important change score of an outcome measure (MIC) is calculated as the mean change score in patients who showed a minimally important change according to an external criterion (Wyrwich et al., 1999). For the GRS (the patient's perspective) we used the categories slightly improved and slightly deteriorated to identify the patients who showed a minimally important change. Figure 2 illustrates graphically where the MIC is located on the spectrum of change-scores. The next possible categories, namely much improved or much deteriorated, were not used, because they indicate substantial improvement or deterioration. For the EDSS (the clinician's perspective) we used an improvement or deterioration of one point since baseline, because a change of one EDSS point is frequently used in trials and is the lowest EDSS change that can reliably be detected in the lower EDSS ranges (Noseworthy et al., 1990; Goodkin et al., 1992). The MIC was calculated from the patient's perspective (MIC-Pimprovement and MIC-Pdeterioration), and the clinician's perspective (MIC-Cimprovement and MIC-Cdeterioration). Because the longitudinal study design had five repeated measurements, we used generalized estimating equations (GEE) to estimate the MIC. This regression analysis technique for longitudinal data makes optimal use of the available data and reduces the standard error of the estimates, while at the same time correcting for the dependence between subsequent measurements (Zeger and Liang, 1986). The correlation structure was chosen on the basis of the correlation matrix of the outcome measures, and set at exchangeable (i.e. correlation coefficients between the first and successive measurements are approximately equal) for all outcomes except the cognitive sub-scale of the FIM, which was set at 4-dependence (i.e. correlation coefficients between the first and successive measurements are progressively smaller).
Scores on the outcome measures were used as dependent variable [Y(t)], and time (t, in years) and four dummy variables based on the external criteria (deteriorated, slightly deteriorated, slightly improved, improved) were used as independent variables. The stable group was used as reference. Because the GRS used the time of diagnosis as reference point, we used an autoregression formula that also includes the score for the outcome measure at baseline [Y(t0)] as independent variable. In the formula:

$$Y(t) = \beta_0 + \beta_1 Y(t_0) + \beta_2 t + \beta_3\,\mathrm{deteriorated} + \beta_4\,\mathrm{slightly\ deteriorated} + \beta_5\,\mathrm{slightly\ improved} + \beta_6\,\mathrm{improved}$$

β4 is interpreted as the mean score change on the outcome measure for patients who were slightly deteriorated, and provides an estimate for the MICdeterioration. β5 is interpreted as the mean score change on the outcome measure for patients who were slightly improved, and provides an estimate for the MICimprovement.
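The basic definition of the MIC given above (the mean change score in patients who changed minimally according to the external criterion) can be sketched for a single follow-up measurement as shown below. This is a deliberately simplified stand-in and not the GEE model used in the study: it ignores the repeated measurements, the time term, the adjustment for the baseline score and the comparison with the stable reference group. The record layout, the category labels and the values are hypothetical.

```python
from statistics import mean


def mic_for_category(records, category):
    """Mean change score (follow-up minus baseline) among patients whose
    external criterion falls in `category`.

    Each record is a dict with the hypothetical keys 'baseline',
    'follow_up' and 'anchor'.
    """
    changes = [r["follow_up"] - r["baseline"]
               for r in records if r["anchor"] == category]
    if not changes:
        raise ValueError(f"no patients in category {category!r}")
    return mean(changes)


# Hypothetical scores on a 0-100 scale (100 = best), not study data.
records = [
    {"baseline": 90, "follow_up": 80, "anchor": "slightly deteriorated"},
    {"baseline": 85, "follow_up": 79, "anchor": "slightly deteriorated"},
    {"baseline": 70, "follow_up": 62, "anchor": "slightly deteriorated"},
    {"baseline": 88, "follow_up": 89, "anchor": "stable"},
    {"baseline": 75, "follow_up": 74, "anchor": "stable"},
]
mic_deterioration = mic_for_category(records, "slightly deteriorated")
```

With these hypothetical values the three slightly deteriorated patients change by -10, -6 and -8 points, so the crude MIC for deterioration is -8.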


Fig. 2 Relationship between SRC and MIC. (A) Shows the distribution of change scores for the categories (stable, slightly deteriorated and deteriorated) of the external criterion. There is minimal overlap between scores and the MIC is much larger than the SRC. This outcome measure is useful. (B) Shows again the distribution of change-scores for each category of the external criterion. There is much overlap between the scores and the MIC is smaller than the SRC. This outcome measure is not useful.

 
To assess the reliability of two scores on each outcome measure, we used the smallest real change (SRC) (Pfennings et al., 1999b; Beckerman et al., 2001; De Vet et al., 2001). The SRC is more often referred to as the smallest real difference, but since our main focus is on intra-individual changes, we prefer to use the term smallest real change. For each external criterion the SRC was calculated in the sub-group of patients who did not change according to the external criterion during the first 6 months after inclusion. The SRC takes two sources of variability into account: (i) the reliability of the outcome measure, and (ii) the naturally occurring variability in stable patients. The SRC offers the opportunity to calculate a measure for comparisons at group level (SRCgroup) and at individual level (SRCindividual) (Pfennings et al., 1999b). The SRCindividual was calculated as 1.96 × SD of the score changes in stable patients. Figure 2 shows graphically where the SRC is located on the spectrum of change-scores. The SRCgroup was calculated as SRCindividual/√n.
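The two SRC definitions can be written out as a short sketch. Two points are assumptions here because the text does not specify them: the SD is taken as the sample standard deviation, and n is taken as the number of stable patients contributing change scores. The function names and example values are hypothetical.

```python
from math import sqrt
from statistics import stdev


def src_individual(stable_changes):
    """SRC at individual level: 1.96 x SD of the score changes in stable
    patients (sample SD assumed)."""
    return 1.96 * stdev(stable_changes)


def src_group(stable_changes):
    """SRC at group level: SRCindividual divided by the square root of n
    (n assumed to be the number of stable patients)."""
    return src_individual(stable_changes) / sqrt(len(stable_changes))


def mic_exceeds_src(mic, src):
    """True when the minimally important change is larger in magnitude
    than the smallest real change."""
    return abs(mic) > abs(src)


# Hypothetical 6-month score changes in stable patients, not study data.
stable_changes = [2, -2, 4, -4, 0, 0, 2, -2, 0]
src_ind = src_individual(stable_changes)
src_grp = src_group(stable_changes)
```

For these values the SD is √6 ≈ 2.45, so SRCindividual ≈ 4.80 and SRCgroup ≈ 1.60. A hypothetical MIC of -3 would then exceed the SRCgroup but not the SRCindividual, which is the pattern separating usefulness at group level from usefulness in individual patients.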

The selection of the most useful evaluative outcome measure was based on the relative responsiveness (highest AUC), on whether the MIC exceeded the SRCindividual or the SRCgroup (see Fig. 2), and on whether the results were comparable for both external criteria. For each outcome measure we calculated the sample sizes (patients per group) needed to show differences between independent samples in future studies. We used the formula $2 \times \{[(Z_{\alpha} + Z_{\beta}) \times (\mathrm{SRC}_{\mathrm{group}}/1.96)]/\mathrm{MIC}\}^{2}$ (Guyatt et al., 1987), where α is set at 0.05 (Zα = 1.96) and β is set at 0.20 (Zβ = 0.84), in order to achieve a power of 0.80.
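The sample size formula can be evaluated as in the sketch below, which transcribes the expression as printed. The function and parameter names are hypothetical, rounding up to a whole number of patients is an added assumption, and the example input values are illustrative rather than taken from the tables.

```python
from math import ceil


def patients_per_group(mic, src, z_alpha=1.96, z_beta=0.84):
    """Patients per group from 2 x {[(Za + Zb) x (SRC/1.96)] / MIC}^2.

    `src` is the SRC value entered in the formula; magnitudes are used so
    that negative (deterioration) values give the same result. The result
    is rounded up (an assumption, not stated in the text).
    """
    if mic == 0:
        raise ValueError("MIC must differ from zero")
    ratio = (z_alpha + z_beta) * (abs(src) / 1.96) / abs(mic)
    return ceil(2 * ratio ** 2)


# Hypothetical values: SRC of 10 points, MIC of 2 points.
n_required = patients_per_group(mic=2, src=10)
```

With an SRC of 10 and an MIC of 2 the expression evaluates to about 102.04, i.e. 103 patients per group after rounding up. Because the SRC-to-MIC ratio is squared, the required sample size grows rapidly when the SRC is large relative to the MIC.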

The statistical analyses were performed with SPSS version 11.5 for Windows. GEE analyses were performed with the Statistical Package for Interactive Data Analysis (SPIDA) version 6.05 from the Statistical Computing Laboratory.


    Results
A total of 156 patients were included in the cohort between January 1998 and January 2001. Table 2 shows the baseline characteristics of these patients. Most characteristics comply with the expected pattern: more females than males in the relapsing–remitting group, more males than females in the primary progressive group, and more severe neurological deficits in the primary progressive group. Seven patients were lost to follow-up (three after 1 year, one after 2 years and three after 3 years), and 15 measurements were missing. The baseline scores on the outcome measures are presented in Table 1.


Table 2 Baseline characteristics of patients with multiple sclerosis

 
Table 3 shows the distribution of GRS and EDSS scores for each measurement. The distributions are remarkably different. The GRS scores are more equally spread across the categories, and according to the GRS fewer patients were stable, and more patients had improved. Over time there is a tendency for both external criteria to change towards deterioration. The percentage of patients that deteriorated (taking the categories deteriorated and slightly deteriorated together) according to the patient's and clinician's perspective, respectively, is 36 and 22% at 6 months, 46 and 33% at 1 year, 50 and 46% at 2 years, and 60 and 44% at 3 years. The agreement between the patient's and clinician's perspective in classifying patients as deteriorated, stable or improved is 35% (κ = 0.10) at 6 months, 42% (κ = 0.14) at 1 year, 40% (κ = 0.07) at 2 years, and 45% (κ = 0.13) at 3 years.
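Percentage agreement and κ for a three-category cross-classification (deteriorated, stable, improved) can be computed as sketched below. The text does not state which κ statistic was used; unweighted Cohen's κ is assumed here. The function name and the counts are hypothetical and are not taken from Table 3.

```python
def agreement_and_kappa(table):
    """Observed agreement and unweighted Cohen's kappa for a square
    contingency table (rows: patient's perspective, columns: clinician's
    perspective)."""
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table is empty")
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts; order: deteriorated, stable, improved.
table = [
    [20, 10, 5],
    [10, 20, 5],
    [5, 5, 20],
]
observed, kappa = agreement_and_kappa(table)
```

In this hypothetical table the observed agreement is 60% and the chance-expected agreement is 33.5%, giving κ ≈ 0.40. The same calculation shows why agreement percentages of 35 to 45% can correspond to κ values close to zero: most of that agreement is already expected by chance.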


Table 3 Distribution (n, %) of the GRS (patient's perspective) and EDSS (clinician's perspective) based external criteria for each measurement

 
Tables 4 and 5 show that the AUCs range from 0.50 to 0.75 and have wide CIs. For five (patient's perspective) and seven (clinician's perspective) outcome measures the AUC does not significantly differ from 0.50. For a substantial number of outcome measures the MIC does not significantly differ from zero, which means that the MIC cannot be detected beyond chance for these outcome measures in this population. It also means that these outcome measures are not suitable to evaluate change in this population. Furthermore, none of the outcome measures has an MIC > SRCindividual, which makes the outcome measures unsuitable to detect a minimally important change in an individual patient. However, several measures have an MIC > SRCgroup, which makes them suitable for research purposes. The final columns in the tables show a large variation in required sample sizes. The unrealistically high estimates of the sample sizes are caused by large estimates of the SRCindividual relative to the estimate of the MIC.


Table 4 AUC, MIC-P and SRC for deterioration using the patient's perspective as external criterion

 


Table 5 AUC, MIC-C, and SRC for deterioration using the clinician's perspective as external criterion

 
The results for deterioration from the patient's perspective can be found in Table 4. Of the disease-specific outcome measures, the EDSS has the highest AUC [0.70 (95% CI 0.62–0.79)]. For all three disease-specific outcome measures the MIC-Pdeterioration is small, and does not significantly differ from zero. Of the outcome measures related to physical functioning, the SF36pf has the highest AUC [0.75 (95% CI 0.67–0.84)] and an MIC-Pdeterioration (–8.58) that exceeds the SRCgroup (–4.38). Of the outcome measures related to mental health, the FIM sub-scale cognitive function (FIMcf) and the DIP sub-scale psychological (DIPpsy) have approximately the same AUCs [0.65 (95% CI 0.55–0.74) and 0.64 (95% CI 0.55–0.73), respectively]. For the DIPpsy the MIC-Pdeterioration (–2.88) exceeds the SRCgroup (–2.80), but for the FIMcf the MIC-Pdeterioration (–1.47) is smaller than the SRCgroup (–1.66). Of the outcome measures related to social functioning, the RAP sub-scale occupation (RAPocc) has the highest AUC [0.73 (95% CI 0.64–0.81)] and an MIC-Pdeterioration (–7.74) exceeding the SRCgroup (–4.24).

Table 5 shows the results for deterioration from the clinician's perspective. Because information from the EDSS is used to obtain the external criterion, results for the EDSS cannot be calculated. The two disease-specific outcome measures have a very similar AUC [0.72 (95% CI 0.63–0.81) for the SaGAS and 0.71 (95% CI 0.62–0.80) for the MSFC], and for both the MIC-Cdeterioration was small and did not significantly differ from zero. Of the outcome measures related to physical functioning, SF36pf has the highest AUC [0.72 (95% CI 0.63–0.80)] and an MIC-Cdeterioration (–8.52) that amply exceeds the SRCgroup (–2.81). Of the outcome measures related to mental health, the DIPpsy and the PASAT3 (test 3-second version) have an AUC of 0.60 (95% CI = 0.50–0.70 and 0.50–0.69, respectively). For both outcome measures the MIC-Cdeterioration is small and does not significantly differ from zero. Of the outcome measures related to social functioning, the RAPocc has the highest AUC [0.69 (95% CI 0.61–0.78)] and an MIC-Cdeterioration (–8.40) that amply exceeds the SRCgroup (–2.69).

Regardless of the domain of the outcome measures, the five most responsive (AUC) outcome measures to detect deterioration from the patient's perspective are the SF36pf [0.75 (0.67–0.84)], the DIP sub-scale mobility [DIPmob; 0.73 (0.65–0.82)], the RAPocc [0.73 (0.64–0.81)], the DIP sub-scale self-care [DIPself; 0.70 (0.62–0.79)] and the EDSS [0.70 (0.62–0.79)]. Of these, only the EDSS does not fulfil the criterion MIC-Pdeterioration > SRCgroup. The five most responsive outcome measures to detect deterioration (AUC) from the clinician's perspective are the SaGAS [0.72 (0.63–0.81)], the SF36pf [0.72 (0.63–0.80)], the MSFC [0.71 (0.62–0.80)], the RAPocc [0.69 (0.61–0.78)] and the TWT [0.69 (0.59–0.78)]. Of these, only the SF36pf and the RAPocc have an MIC-Cdeterioration > SRCgroup.

The results for improvement are less clear, because of the small percentage of patients in the slightly improved groups (data not shown). The MIC was either very small or did not significantly differ from zero. Therefore, it was not possible to compare the results with the SRC. Consequently, we can only look at the relative responsiveness by comparing the AUCs. From the patient's perspective, the highest AUCs were found for the EDSS [0.78 (95% CI 0.70–0.87)], the DIPmob [0.73 (95% CI 0.64–0.85)], the FIM sub-scale motor function [FIMmf; 0.71 (0.63–0.80)], the SF36pf [0.71 (95% CI 0.62–0.80)] and the RAPocc [0.71 (95% CI 0.62–0.82)]. From the clinician's perspective, the highest AUCs were found for the RAPocc [0.79 (95% CI 0.63–0.95)], the SF36pf [0.77 (95% CI 0.64–0.90)], the FIMmf [0.74 (95% CI 0.62–0.86)], the FIMcf [0.74 (95% CI 0.59–0.90)] and the RAPmob [0.72 (95% CI 0.58–0.87)]. Irrespective of the external criterion that is applied, the most responsive outcome measures to detect improvement are the FIMmf, the SF36pf, the RAPocc and the EDSS. However, the criterion MIC > SRC could not be evaluated for any of the measures.


    Discussion
In the early stages of multiple sclerosis, the two most useful evaluative outcome measures to detect deterioration, and that perform well irrespective of the external criterion that is applied, are the SF36pf for the physical functioning domain (mobility), and the RAPocc for the social functioning domain. Both measures have an MIC > SRCgroup, which makes them suitable for application in clinical research. However, none of the outcome measures that we studied had an MIC > SRCindividual, which means that the reliability demands that warrant application at individual patient level are not met.

The selection of an outcome measure is not only guided by its responsiveness. It is also important to select an outcome measure that really measures the phenomena of interest. Therefore, we categorized the outcome measures that we have studied into five domains and five sub-domains, which should guide their selection. Before the final selection of an outcome measure, one should study the content of an outcome measure to make sure it measures the variable one is interested in. The measures that perform best in the other domains are the DIPpsy (mental health domain, emotional well-being) and the SF36gh (general health domain), but none of the disease-specific outcome measures fulfilled our selection criteria.

We were looking for an outcome measure with a performance that did not depend on the chosen perspective. Finding such an outcome measure would increase our confidence in this measure, because it would imply that the results obtained with this measure have the same meaning for both the clinician and the patient. However, it might be very legitimate to emphasize one or both perspectives depending on the research aim. For more basic research purposes reliance on examiner-driven outcomes might be fully acceptable. But for more clinically oriented research questions, i.e. studies that are interested in the effects on patients, such as clinical trials, reliance on examiner-driven assessments only is not sufficient. In these studies one should also include patient-driven outcome measures, because that is the only way to show benefit for patients. For the evaluation of this kind of clinically oriented research it would be very valuable to have a (primary) outcome measure available whose evaluative ability is independent of the chosen perspective (patient versus examiner), because only then is the MIC the same for the patient and the examiner, which facilitates the interpretation of this research.

An important strength of this study is the simultaneous evaluation of several outcome measures that are frequently used in multiple sclerosis research. Scores were collected for 23 (sub-scales of) outcome measures in the same patients and in the same way. This enables a direct comparison of the outcome measures, and facilitates interpretation of the results. Information about the responsiveness of outcome measures is often derived from several studies with different designs, different populations, different anchors, and different outcome measures. This hampers the selection of the most responsive outcome measure, because no direct comparison can be made.

The relative responsiveness is quite independent of the particular approach to the evaluation of responsiveness (Terwee et al., 2003). We chose the approach presented in this article for two reasons. First of all, we aimed to identify the most responsive outcome measures by comparing the outcome measures on the basis of the AUC (relative responsiveness). Second, we tried to obtain data that would facilitate the interpretation of score changes in future studies. The interpretation depends on two aspects of the score change: (i) what is a minimally important change, and (ii) is the instrument capable of measuring this change? We have used the MIC as a measure of minimally important change, and the SRC to estimate the ability of a measure to detect this change. From our results we conclude that our strategy worked well for the analysis of changes in the direction of deterioration, because we were able to clearly show the relative responsiveness, and provide clear data that facilitate the interpretation of score changes. However, the results with regard to changes in the direction of improvement are inconclusive, due to the small number of patients in this category.

Another aspect of this study that deserves some attention is the analysis of repeated measures. We made optimal use of the longitudinal data by applying longitudinal data-analysis techniques, which reduces the standard error of our estimates. Moreover, we constructed a regression model that enabled us to estimate the MIC for deterioration and improvement in one model. The ability of this study to show improvement is limited by its design, because recruiting recently diagnosed patients, who are only mildly disabled, leaves little room for improvement. Therefore, our results for improvement are not as clear as those for deterioration. However, despite this limitation, the study does provide some preliminary evidence that the MICdeterioration and the MICimprovement are not necessarily equal (Cella et al., 2002b).

A well-known problem in studies of anchor-based responsiveness is the choice of the external criterion to define change (Cella et al., 2002a). Norman et al. (1997) compared two methods to assess responsiveness: (i) an effective therapy as construct for change, and (ii) a retrospective method to assess change using a GRS. In this direct comparison the GRS performed worse than the effective therapy as external criterion. The problem with the generalization of these results is that an effective therapy is often not available. Particularly in longitudinal cohort studies, such as ours, we cannot rely on an effective therapy. In multiple sclerosis, an effective therapy can be used as construct for change by applying outcome measures in patients who are treated with corticosteroids for a relapse. A major problem in these studies is that one is looking at improvements. It is by no means certain that these results can subsequently be used in studies that look at deterioration.

Because a gold standard for change is lacking, we had to rely on other methods to define change. We decided not to rely on one method, because the chosen method to define change influences the results of the analyses. Furthermore, we carefully sought sensible external criteria. Roughly speaking, there are three constructs for the evaluation of change in multiple sclerosis: data obtained from repeated MRI studies, the EDSS as the most frequently used clinical outcome measure, and a GRS which emphasizes the perspective of the patient. Our main focus in this study was on disability and quality of life. Therefore, using MRI data as a construct for change is not appealing, since it only offers information at the level of pathological changes, which are only remotely related to disability and even less related to quality of life. The EDSS has limitations with regard to its validity and reliability, which might make it relatively unsuitable as an external criterion for change. However, despite this criticism, it is a scale that is very well known among clinicians. It is, in fact, so well known that a description of a study population is not complete without EDSS data. Therefore, we used the EDSS to determine important change from a clinician's point of view. Because the first question of a clinician during a visit is often a global rating, ‘How are you doing since the last visit?’, and because a stronger external criterion is lacking, we used a GRS to emphasize the perspective of the patient. Because all outcomes were compared with these two sensible external criteria, we were able to show what the effect of the choice of external criterion is.

A global rating requires that patients are able to mentally subtract a previous situation from the present situation (Liang, 1995; Stratford et al., 1996). Criticism of the use of a GRS concerns the fact that this rating has often been found to show stronger associations with the present situation than with the previous situation (Guyatt et al., 2002). In an attempt to overcome this problem, we coupled the previous situation to an important life-event for the patient. In this way, we tried to facilitate the mental subtraction, and hoped for more equal associations of the GRS with the previous and the present situation. We considered the time of diagnosis as an important life-event. Because in our study patients were not diagnosed until some time after their exacerbation and because the mean time between diagnosis and first measurement is relatively short (3.5 months), we decided that it was valid to use it as reference point. Our strategy was partly successful. The mean correlation coefficient between the GRS at 3 years and the outcome measures at baseline was 0.26 (range 0.15–0.43), at 6 months it was 0.30 (range 0.14–0.44), at 1 year it was 0.33 (range 0.14–0.49), at 2 years it was 0.37 (range 0.09–0.56), and at 3 years it was 0.40 (range 0.14–0.59).

Another point of discussion about the use of the GRS as external criterion is the choice of the cut-off point used for the calculation of the MIC. We decided to use the category ‘slightly deteriorated’ or ‘slightly improved’ as indicator of minimally important change. In our opinion, the next category (‘much deteriorated’ or ‘much improved’) is, at least semantically, not equivalent to minimally important change. Others have argued that using ‘much deteriorated’ or ‘much improved’ is more appropriate than ‘slightly deteriorated’ or ‘slightly improved’, because the latter two categories are often used by patients who are reluctant to classify themselves as stable, while their situation would justify this classification (Ostelo and De Vet, 2005). We performed a sensitivity analysis (data not shown), with the category ‘much deteriorated’ as cut-off, and compared the MIC-P and the MIC-P estimates obtained in this sensitivity analysis (MIC-Psens) with the MIC-C. For 17 outcome measures the MIC-P was closer to the MIC-C than the MIC-Psens, indicating that there is a greater correspondence between the MIC-P and the MIC-C than between the MIC-Psens and the MIC-C, which supports the use of the category ‘slightly deteriorated’ as cut-off in this sample. In future studies it might be useful to add extra categories to the GRS between ‘slightly’ and ‘much’, for example by using ‘deteriorated’ and ‘improved’ on their own, and to use these categories to determine the MIC. This might lessen the (semantic) gap between ‘slightly’ and ‘much’, and might aid patients who are reluctant to use the category ‘stable’, without influencing the estimation of the MIC.
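The comparison made in the sensitivity analysis can be sketched as follows. This is a minimal Python illustration that assumes 'closer' means a smaller absolute difference from the MIC-C; the function name, the measure labels and the numbers are hypothetical, since the actual estimates are not shown in the article.

```python
def count_closer(mic_p, mic_p_sens, mic_c):
    """Count outcome measures whose MIC-P lies closer to the MIC-C
    than the sensitivity-analysis estimate (MIC-Psens) does."""
    closer = 0
    for measure, reference in mic_c.items():
        distance_main = abs(mic_p[measure] - reference)
        distance_sens = abs(mic_p_sens[measure] - reference)
        if distance_main < distance_sens:
            closer += 1
    return closer


# Hypothetical MIC estimates for three unnamed outcome measures.
mic_p = {"A": 2.0, "B": 5.0, "C": 1.0}        # cut-off 'slightly deteriorated'
mic_p_sens = {"A": 4.0, "B": 5.5, "C": 1.2}   # cut-off 'much deteriorated'
mic_c = {"A": 2.5, "B": 6.0, "C": 1.0}        # clinician-based (EDSS) estimate

print(count_closer(mic_p, mic_p_sens, mic_c))
```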

Recently, Solari et al. (2005) studied the practice effects of the MSFC and suggested that, to improve efficiency, one prebaseline administration of TWT, three of PASAT and four of NHPT are needed. Their study consisted of repeated administrations of the tests in 1 day. What their results mean for repeated MSFC measurements with intervals of 6 months or longer, such as in our study, is not immediately clear. Will you never lose your ability to perform the PASAT or NHPT once you have mastered it, or do you again need some prebaseline administrations after you have not been performing the PASAT or NHPT for some time? For the components of the MSFC and the SaGAS we used the same test protocol at each measurement. The NHPT and the TWT were conducted twice. For the TWT this is sufficient; for the NHPT two additional administrations would have been better. The PASAT was always administered once, but in any case after at least one practice trial, as described in the MSFC manual. Although the interval between subsequent measurements was at least 6 months, we cannot rule out a practice effect. Ignoring a possibly present practice effect will lead to deflated measures of responsiveness in the direction of deterioration for the NHPT and PASAT, because the measured change in cognitive or upper limb function is smaller than the real change. The opposite would occur for the measures of responsiveness in the direction of improvement, because the measured improvement in cognitive function is larger than the real improvement.
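The direction of this bias can be made explicit with a small arithmetic sketch in Python. It assumes a simple additive model on a scale where higher scores mean better performance: the measured change is the real change plus a practice gain. The model, the function name and the numbers are illustrative assumptions, not estimates from this study.

```python
def measured_change(real_change, practice_gain):
    """Additive model: a practice gain shifts every measured change upward."""
    return real_change + practice_gain


practice_gain = 2.0        # hypothetical gain from repeated testing
real_deterioration = -5.0  # hypothetical real decline in score
real_improvement = 5.0     # hypothetical real gain in score

measured_det = measured_change(real_deterioration, practice_gain)
measured_imp = measured_change(real_improvement, practice_gain)

# Measured deterioration is smaller in size than the real deterioration,
# whereas measured improvement is larger than the real improvement.
print(abs(measured_det) < abs(real_deterioration))
print(measured_imp > real_improvement)
```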

Although we were able to identify the most responsive outcome measures and to show, for several of these outcome measures, that the signal (MIC) exceeds the noise (SRCgroup), it should be noted that our results are not automatically applicable to all patients with multiple sclerosis. In general, our population was only mildly disabled, had a disease duration of just over 3 years at the end of the study, and was treated with disease modifying treatment if indicated (44 patients were on disease modifying treatment at the end of the study). Because this treatment will influence the outcomes and the external criteria in the same direction, it will probably not significantly alter our results. The results of this study can therefore be used in early intervention studies. With the positive effects of disease modifying treatments, patients will be mildly disabled for a longer period. Future trials will have to compare newly developed treatments with the current disease modifying treatments. Showing differences in effectiveness in these studies will increasingly suffer from power problems. In comparative studies an outcome measure should be able to show differences between longitudinal changes of two (or more) groups (arms of a trial), which is probably more difficult than showing changes within one group only. In our opinion this is a requirement that can only be fulfilled when an outcome measure is already capable of detecting longitudinal changes. Our results clearly show that some of the outcome measures that we have studied, and that are not regularly used in trials, are more suitable to evaluate changes than others. In the early stages of multiple sclerosis a reduction of the walking distance is more often a problem than a reduction in walking speed. The SF36pf probably performs well because it also contains items about walking distance, whereas the regularly used TWT only measures walking speed. The RAPocc and, to a lesser extent, the DIPsoc, probably perform well because they measure social functioning. Although social functioning is seriously affected in the early stage of multiple sclerosis, it is not part of the measures that are regularly used in trials. Future responsiveness studies should focus on more severely disabled populations and populations with a longer duration of the disease.

None of the outcome measures used in this study could detect important change in individual patients. Outcome measures that might be useful should have a relatively low SRCindividual. This point has already been acknowledged in relation to the MSFC. Several authors have stated that a change of 20% for the components of the MSFC is required to exceed measurement error (Kaufman et al., 2000; Schwid et al., 2002) and that changes for the MSFC and SaGAS should be >0.5 (Hoogervorst et al., 2004; Vaney et al., 2004). Depending on the external criterion used, we found that in our sample a change of 2.6–3.0 s (40% of baseline) for the TWT and 2.8–5.3 s (13% of baseline) for the NHPT is required to exceed measurement error. In our sample, changes in MSFC and SaGAS should exceed 0.54–0.72 and 0.25–0.44, respectively, in order to indicate significant change. However, MSFC scores should be interpreted with caution, because it is not evident from the total score which component contributes most to the total score. The differences between results reported in the literature (Kaufman et al., 2000; Schwid et al., 2002; Hoogervorst et al., 2004; Vaney et al., 2004) and our results might be explained by our study design. We recruited recently diagnosed patients, whereas in the other studies the patients had the disease for various lengths of time. Furthermore, we used a fixed interval of 6 months between visits to identify the stable patients, whereas the other studies used a 5-day or a variable interval. The design of the present study matches usual patient care, which increases the validity of our results, but, unfortunately, leads to the conclusion that the outcome measures in this study are not suitable for detecting change within a few years in individual, recently diagnosed, patients.
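How the choice of threshold affects the classification of an individual patient can be sketched in Python. The example contrasts the relative criterion from the literature (20% of baseline) with the lower bound of the absolute TWT threshold reported above (2.6 s). The function names and the patient's scores are hypothetical; only the two thresholds come from the text.

```python
def exceeds_relative(baseline, follow_up, fraction):
    """True when the change is larger than a given fraction of the baseline score."""
    return abs(follow_up - baseline) > fraction * abs(baseline)


def exceeds_absolute(baseline, follow_up, threshold):
    """True when the change is larger than an absolute threshold (same unit as the score)."""
    return abs(follow_up - baseline) > threshold


# Hypothetical patient: TWT of 6.0 s at baseline and 8.0 s at follow-up.
twt_baseline, twt_follow_up = 6.0, 8.0

print(exceeds_relative(twt_baseline, twt_follow_up, 0.20))  # 20% criterion
print(exceeds_absolute(twt_baseline, twt_follow_up, 2.6))   # 2.6 s criterion
```

Under these assumed scores the same 2.0 s slowing would count as real change by the 20% criterion but not by the 2.6 s criterion, which is the kind of discrepancy between studies discussed above.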


    Contribution of authors
 
Concept and design: V.deG., H.B., B.M.J.U., H.C.W.deV., G.J.L., C.H.P., L.M.B.

Acquisition of data: V.deG., H.B., B.M.J.U., C.H.P.

Analysis and interpretation of the data: V.deG., H.B., B.M.J.U., H.C.W.deV., G.J.L., C.H.P., L.M.B.

Drafting of the manuscript: V.deG., H.B.

Critical revision of the manuscript for important intellectual content: V.deG., H.B., B.M.J.U., H.C.W.deV., G.J.L., C.H.P., L.M.B.


    Conflict of interest
 
There are no conflicts of interest. The corresponding author (V.deG.) had full access to all the data used in the study, and had the final responsibility for the decision to submit the manuscript for publication.


    Acknowledgements
 
The Netherlands Organization for Scientific Research (NWO 940-33-009) supported this study. It has been performed on behalf of the Functional Prognostication and Disability (FuPro) Study Group: G.J.L., J. Dekker, A. J. Dallmeijer, M. J. IJzerman, H.B., V.deG.: VU University Medical Center Amsterdam (project co-ordination); A. J. H. Prevo, E. Lindeman, V. P. M. Schepers: University Medical Center, Utrecht; H. J. Stam, E. Odding, B. van Baalen: Erasmus Medical Center, Rotterdam; A. Beelen, I. J. M. de Groot: Academic Medical Center, Amsterdam. We would like to thank the neurologists in the participating hospitals (VU University Medical Center, Academic Medical Center Amsterdam, Sint Lucas Andreas Hospital Amsterdam, OLVG Hospital Amsterdam, Erasmus Medical Center Rotterdam) for recruiting the patients, and M. van der Bruggen, M. Schothorst, and T. Wedding for performing the measurements.


    References
 
Antonucci G, Aprile T, Paolucci S. (2002) Rasch analysis of the Rivermead mobility index: a study using mobility measures of first-stroke inpatients. Arch Phys Med Rehabil 83:1442–9.

Beckerman H, Roebroeck ME, Lankhorst GJ, Becher JG, Bezemer PD, Verbeek ALM. (2001) Smallest real difference, a link between reproducibility and responsiveness. Qual Life Res 10:571–8.

Bessette L, Sangha O, Kuntz KM, Keller RB, Lew RA, Fossel AH, et al. (1998) Comparative responsiveness of generic versus disease-specific and weighted versus unweighted health status measures in carpal tunnel syndrome. Med Care 36:491–502.

Beurskens AJHM, De Vet HCW, Köke AJA. (1996) Responsiveness of functional status in low back pain: a comparison of different instruments. Pain 65:71–6.

Brunet DG, Hopman WM, Singer MA, Edgar CM, MacKenzie TA. (1996) Measurement of health-related quality of life in multiple sclerosis patients. Can J Neurol Sci 23:99–103.

Cella D, Eton DT, Lai JS, Peterman AH, Merkel DE. (2002a) Combining anchor and distribution-based methods to derive minimal clinically important differences on the functional assessment of cancer therapy (FACT) anemia and fatigue scales. J Pain Symptom Manage 24:547–61.

Cella D, Hahn EA, Dineen K. (2002b) Meaningful change in cancer-specific quality of life scores: differences between improvement and worsening. Qual Life Res 11:207–21.

Cohen JA, Cutter GR, Fischer JS, Goodman AD, Fedor RH, Jak AJ, et al. (2001) Use of the multiple sclerosis functional composite as an outcome measure in a phase 3 clinical trial. Arch Neurol 58:961–7.

Cohen JA, Fischer JS, Bolibrush DM, Jak AJ, Kniker JE, Mertz LA, et al. (2000) Intrarater and interrater reliability of the multiple sclerosis functional composite outcome measure. Neurology 54:802–6.

Cohen L, Pouwer F, Pfennings LE, Lankhorst GJ, Van der Ploeg HM, Polman CH, et al. (1999) Factor structure of the disability and impact profile in patients with multiple sclerosis. Qual Life Res 8:141–50.

Collen FM, Wade DT, Robb GF, Bradshaw CM. (1991) The Rivermead Mobility Index: a further development of the Rivermead motor assessment. Int Disabil Stud 13:50–4.

Crosby RD, Kolotkin RL, Williams GR. (2003) Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol 56:395–407.

Cutter GR, Baier ML, Rudick RA, Cookfair DL, Fischer JS, Petkau J, et al. (1999) Development of a multiple sclerosis functional composite as a clinical trial outcome measure. Brain 122:871–82.

De Vet HCW, Bouter LM, Bezemer PD, Beurskens AJ. (2001) Reproducibility and responsiveness of evaluative outcome measures. Theoretical considerations illustrated by an empirical example. Int J Technol Assess Health Care 17:479–87.

Fischer JS, Rudick RA, Cutter GR, Reingold SC. (1999) The multiple sclerosis functional composite measure (MSFC): an integrated approach to multiple sclerosis clinical outcome assessment. National Multiple Sclerosis Society Clinical Outcomes Assessment Task Force. Mult Scler 5:244–50.

Forlander DA and Bohannon RW. (1999) Rivermead mobility index: a brief review of research to date. Clin Rehabil 13:97–100.

Freeman JA, Hobart JC, Langdon DW, Thompson AJ. (2000) Clinical appropriateness: a key factor in outcome measure selection: the 36 item short form health survey in multiple sclerosis. J Neurol Neurosurg Psychiatry 68:150–6.

Goodkin DE, Cookfair D, Wende K, Bourdette D, Pullicino P, Scherokman B, et al. (1992) Inter- and intrarater scoring agreement using grades 1.0 to 3.5 of the Kurtzke expanded disability status scale (EDSS). Multiple Sclerosis Collaborative Research Group. Neurology 42:859–63.

Granger CV, Cotter AC, Hamilton BB, Fiedler RC, Hens MM. (1990) Functional assessment scales: a study of persons with multiple sclerosis. Arch Phys Med Rehabil 71:870–5.

Guyatt G, Walter S, Norman G. (1987) Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis 40:171–8.

Guyatt GH, Norman GR, Juniper EF, Griffith LE. (2002) A critical look at transition ratings. J Clin Epidemiol 55:900–8.

Hobart J, Freeman J, Thompson A. (2000) Kurtzke scales revisited: the application of psychometric methods to clinical intuition. Brain 123:1027–40.

Hobart JC, Lamping DL, Fitzpatrick R, Riazi A, Thompson A. (2001a) The multiple sclerosis impact scale (MSIS-29): a new patient-based outcome measure. Brain 124:962–73.

Hobart JC, Lamping DL, Freeman JA, Langdon DW, McLellan DL, Greenwood RJ, et al. (2001b) Evidence-based measurement: which disability scale for neurologic rehabilitation? Neurology 57:639–44.

Hobart JC, Riazi A, Lamping DL, Fitzpatrick R, Thompson AJ. (2004) Improving the evaluation of therapeutic interventions in multiple sclerosis: development of a patient-based measure of outcome. Health Technol Assess 8:1–60.

Hoogervorst EL, Kalkers NF, Van Winsen LML, Uitdehaag BMJ, Polman CH. (2001a) Differential treatment effect on measures of neurologic exam, functional impairment and patient self-report in multiple sclerosis. Mult Scler 7:335–9.

Hoogervorst EL, Van Winsen LM, Eikelenboom MJ, Kalkers NF, Uitdehaag BM, Polman CH. (2001b) Comparisons of patient self-report, neurologic examination, and functional impairment in multiple sclerosis. Neurology 56:934–7.

Hoogervorst EL, Zwemmer JN, Jelles B, Polman CH, Uitdehaag BMJ. (2004) Multiple sclerosis impact scale (MSIS-29): relation to established measures of impairment and disability. Mult Scler 10:569–74.

Hsieh CL, Hsueh IP, Mao HF. (2000) Validity and responsiveness of the rivermead mobility index in stroke patients. Scand J Rehabil Med 32:140–2.

Husted JA, Cook RJ, Farewell VT, Gladman DD. (2000) Methods for assessing responsiveness: a critical review and recommendations. J Clin Epidemiol 53:459–68.

Jaeschke R, Singer J, Guyatt GH. (1989) Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials 10:407–15.

Jonsson A, Dock J, Ravnborg MH. (1996) Quality of life as a measure of rehabilitation outcome in patients with multiple sclerosis. Acta Neurologica Scandinavica 93:229–35.

Juniper EF, Guyatt GH, Willan A, Griffith LE. (1994) Determining a minimal important change in a disease-specific quality of life questionnaire. J Clin Epidemiol 47:81–7.

Kalkers NF, De Groot V, Lazeron RH, Killestein J, Ader HJ, Barkhof F, et al. (2000) Multiple sclerosis functional composite: relation to disease phenotype and disability strata. Neurology 54:1233–9.

Kalkers NF, Bergers E, Castelijns JA, Van Walderveen MA, Bot JC, Ader HJ, et al. (2001) Optimizing the association between disability and biological markers in multiple sclerosis. Neurology 57:1253–8.

Kaufman M, Moyer D, Norton J. (2000) The significant change for the timed 25-foot walk in the multiple sclerosis functional composite. Mult Scler 6:286–90.

Kidd D, Howard RS, Losseff NA, Thompson AJ. (1995) The benefit of inpatient neurorehabilitation in multiple sclerosis. Clin Rehabil 9:198–203.

Koziol JA, Lucero A, Sipe JC, Romine JS, Beutler E. (1999) Responsiveness of the Scripps neurologic rating scale during a multiple sclerosis clinical trial. Can J Neurol Sci 26:283–9.

Kurtzke JF. (1983) Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS). Neurology 33:1444–52.

Laman H and Lankhorst GJ. (1994) Subjective weighting of disability: an approach to quality of life assessment in rehabilitation. Disabil Rehabil 16:198–204.

Lankhorst GJ, Jelles F, Smits RCF, Polman CH, Kuik DJ, Pfennings LE, et al. (1996) Quality of life in multiple sclerosis: the disability and impact profile (DIP). J Neurol 243:469–74.

Liang MH. (1995) Evaluating measurement responsiveness. J Rheumatol 22:1191–2.

Lydick E and Epstein RS. (1993) Interpretation of quality of life changes. Qual Life Res 2:221–6.

Lyle RC. (1981) A performance test for assessment of upper limb function in physical rehabilitation treatment and research. Int J Rehabil Res 4:483–92.

Mancuso CA and Peterson MG. (2004) Different methods to assess quality of life from multiple follow-ups in a longitudinal asthma study. J Clin Epidemiol 57:45–54.

Marolf MV, Vaney C, Konig N, Schenk T, Prosiegel M. (1996) Evaluation of disability in multiple sclerosis patients: a comparative study of the functional independence measure, the extended Barthel index and the expanded disability status scale. Clin Rehabil 10:309–13.

McGuigan C and Hutchinson M. (2004) The multiple sclerosis impact scale (MSIS-29) is a reliable and sensitive measure. J Neurol Neurosurg Psychiatry 75:266–9.

Miller DM, Rudick RA, Cutter G, Baier M, Fischer JS. (2000) Clinical significance of the multiple sclerosis functional composite: relationship to patient-reported quality of life. Arch Neurol 57:1319–24.

Norman GR, Stratford P, Regehr G. (1997) Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol 50:869–79.

Noseworthy JH. (1994) Clinical scoring methods for multiple sclerosis. Ann Neurol 36:S80–5.

Noseworthy JH, Vandervoort MK, Wong CJ, Ebers GC. (1990) Interrater variability with the expanded disability status scale (EDSS) and functional systems (FS) in a multiple sclerosis clinical trial. The Canadian Cooperation Multiple Sclerosis Study Group. Neurology 40:971–5.

Ostelo RW and De Vet HC. (2005) Clinically important outcomes in low back pain. Best Pract Res Clin Rheumatol 19:593–607.

Ottenbacher KJ, Hsu Y, Granger CV, Fiedler RC. (1996) The reliability of the functional independence measure: a quantitative review. Arch Phys Med Rehabil 77:1226–32.

Patzold T, Schwengelbeck M, Ossege LM, Malin JP, Sindern E. (2002) Changes of the multiple sclerosis functional composite and EDSS during and after treatment of relapses with methylprednisolone in patients with multiple sclerosis. Acta Neurol Scand 105:164–8.

Pfennings LE, Van der Ploeg HM, Cohen L, et al. (1999a) A health-related quality of life questionnaire for multiple sclerosis patients. Acta Neurol Scand 100:148–55.

Pfennings LE, Van der Ploeg HM, Cohen L, Polman CH. (1999b) A comparison of responsiveness indices in multiple sclerosis patients. Qual Life Res 8:481–9.

Poser CM, Paty DW, Scheinberg L, et al. (1983) New diagnostic criteria for multiple sclerosis: guidelines for research protocols. Ann Neurol 13:227–31.

Riazi A, Hobart JC, Lamping DL, Fitzpatrick R, Thompson AJ. (2003) Evidence-based measurement in multiple sclerosis: the psychometric properties of the physical and psychological dimensions of three quality of life rating scales. Mult Scler 9:411–9.

Rudick R, Antel J, Confavreux C, Cutter G, Ellison G, Fischer J, et al. (1996) Clinical outcomes assessment in multiple sclerosis. Ann Neurol 40:469–79.

Schmitt JS and Di Fabio RP. (2004) Reliable change and minimum important difference (MID) proportions facilitated group responsiveness comparisons using individual threshold criteria. J Clin Epidemiol 57:1008–18.

Schunemann HJ, Griffith L, Jaeschke R, Goldstein R, Stubbing D, Guyatt GH. (2003) Evaluation of the minimal important difference for the feeling thermometer and the St. George's respiratory questionnaire in patients with chronic airflow obstruction. J Clin Epidemiol 56:1170–6.

Schwid SR, Goodman AD, Apatoff BR, Coyle PK, Jacobs LD, Krupp LB, et al. (2000) Are quantitative functional measures more sensitive to worsening multiple sclerosis than traditional measures? Neurology 55:1901–3.

Schwid SR, Goodman AD, McDermott MP, Bever CF, Cook SD. (2002) Quantitative functional measures in multiple sclerosis: what is a reliable change? Neurology 58:1294–6.

Sharrack B and Hughes RA. (1999) The Guy's neurological disability scale (GNDS): a new disability measure for multiple sclerosis. Mult Scler 5:223–33.

Solari A, Radice D, Manneschi L, Motti L, Montanari E. (2005) The multiple sclerosis functional composite: different practice effects in the three test components. J Neurol Sci 228:71–4.

Stratford PW, Binkley FM, Riddle DL. (1996) Health status measures: strategies and analytic methods for assessing change scores. Phys Ther 76:1109–23.

Stucki G, Liang MH, Fossel AH, Katz JN. (1995) Relative responsiveness of condition-specific and generic health status measures in degenerative lumbar spinal stenosis. J Clin Epidemiol 48:1369–78.

Terwee CB, Dekker FW, Wiersinga WM, Prummel MF, Bossuyt PM. (2003) On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res 12:349–62.

Uitdehaag BMJ, Ader HJ, Roosma TJ, De Groot V, Kalkers NF, Polman CH. (2002) Multiple sclerosis functional composite: impact of reference population and interpretation of changes. Mult Scler 8:366–71.

Van Bennekom CAM, Jelles F, Lankhorst GJ, Bouter LM. (1995) The rehabilitation activities profile: a validation study of its use as a disability index with stroke patients. Arch Phys Med Rehabil 76:501–7.

Van Bennekom CAM, Jelles F, Lankhorst GJ, Bouter LM. (1996) Responsiveness of the rehabilitation activities profile and the Barthel Index. J Clin Epidemiol 49:39–44.

Van der Lee JH, Beckerman H, Lankhorst GJ, Bouter LM. (2001) The responsiveness of the action research arm test and the Fugl-Meyer assessment scale in chronic stroke patients. J Rehabil Med 33:110–3.

Van der Putten JJ, Hobart JC, Freeman JA, Thompson AJ. (1999) Measuring change in disability after inpatient rehabilitation: comparison of the responsiveness of the Barthel index and the functional independence measure. J Neurol Neurosurg Psychiatry 66:480–4.

Van der Windt DA, Van der Heijden GJ, De Winter AF, Koes BW, Devillé W, Bouter LM. (1998) The responsiveness of the shoulder disability questionnaire. Ann Rheum Dis 57:82–7.

Vaney C, Vaney S, Wade DT. (2004) SaGAS, the short and graphic ability score: an alternative scoring method for the motor components of the multiple sclerosis functional composite. Mult Scler 10:231–42.

Vickrey BG, Hays RD, Harooni R, Myers LW, Ellison GW. (1995) A health-related quality of life measure for multiple sclerosis. Qual Life Res 4:187–206.

Whitaker JN, McFarland HF, Rudge P, Reingold SC. (1995) Outcomes assessment in multiple sclerosis clinical trials: a critical analysis. Mult Scler 1:37–47.

Wyrwich KW, Tierney WM, Wolinsky FD. (1999) Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol 52:861–73.

Zeger SL and Liang KY. (1986) Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42:121–30.

