Brain, Vol. 122, No. 1, 141-159,
January 1999
© 1999 Oxford University Press
The psychometric properties of clinical rating scales used in multiple sclerosis
1 Department of Neurology, UMDS, Guy's Hospital, London and 2 Department of Biomedical Statistics, School of Epidemiology and Health Sciences, University of Manchester, Manchester, UK
Correspondence to:
Professor Richard A. C. Hughes, Department of Neurology, Medical School Building, Guy's Hospital, London Bridge, London SE1 9RT, UK
E-mail: r.hughes@umds.ac.uk
Abstract
Many clinical rating scales have been proposed to assess the impact of multiple sclerosis on patients, but only few have been evaluated formally for reliability, validity and responsiveness. We assessed the psychometric properties of five commonly used scales in multiple sclerosis, the Expanded Disability Status Scale (EDSS), the Scripps Neurological Rating Scale (SNRS), the Functional Independence Measure (FIM), the Ambulation Index (AI) and the Cambridge Multiple Sclerosis Basic Score (CAMBS). The score frequency distributions of all five scales were either bimodal (EDSS and AI) or severely skewed (SNRS, FIM and CAMBS). The reliability of each scale depended on the definition of `agreement'. Inter- and intra-rater reliabilities were high when `agreement' was considered to exist despite a difference of up to 1.0 EDSS point (two 0.5 steps), 13 SNRS points, 9 FIM points, 1 AI point and 1 point on the various CAMBS domains. The FIM, AI, and the relapse and progression domains of the CAMBS were sensitive to clinical change, but the EDSS and the SNRS were unresponsive. The validity of these scales as impairment (SNRS and EDSS) and disability (EDSS, FIM, AI and the disability domain of the CAMBS) measures was established. All scales correlated closely with other measures of handicap and quality of life. None of these scales satisfied the psychometric requirements of outcome measures completely, but each had some desirable properties. The SNRS and the EDSS were reliable and valid measures of impairment and disability, but they were unresponsive. The FIM was a reliable, valid and responsive measure of disability, but it is cumbersome to administer and has a limited content validity. The AI was a reliable and valid ambulation-related disability scale, but it was weakly responsive. The CAMBS was a reliable (all four domains) and responsive (relapse and progression domains) outcome measure, but had a limited validity (handicap domain). 
These psychometric properties should be considered when designing further clinical trials in multiple sclerosis.
Keywords: multiple sclerosis; clinical rating scales; reliability; validity; responsiveness
Abbreviations: AI = Ambulation Index; CAMBS = Cambridge Multiple Sclerosis Basic Score; EDSS = Expanded Disability Status Scale; FIM = Functional Independence Measure; FS = Functional Systems; SNRS = Scripps Neurological Rating Scale
Introduction
Clinical rating scales allow the physician to classify patients according to their degree of impairment, disability, handicap or quality of life, assist in predicting the course of the illness, and provide tools to monitor the response to experimental treatments. The value of any scale depends on its clinical usefulness and scientific integrity. A clinically useful scale should be acceptable to patients and health care professionals, practical to administer and cost efficient. Such an instrument should also be scientifically sound in terms of three basic psychometric properties: reliability, validity and responsiveness (Streiner and Norman, 1995).
Over the last 40 years, more than 15 different clinical rating scales have been devised and used in multiple sclerosis research. The scales used most commonly are Kurtzke's Expanded Disability Status Scale (EDSS) and its related Functional Systems (FS) (Kurtzke, 1983), the Scripps Neurological Rating Scale (SNRS) (Sipe et al., 1984) and the Ambulation Index (AI) (Hauser et al., 1983). The generic Functional Independence Measure (FIM) (Hamilton et al., 1987; Keith et al., 1987) and the Cambridge Multiple Sclerosis Basic Score (CAMBS) (Mumford and Compston, 1993) have also been proposed as potentially useful clinical scales (Noseworthy, 1994).
The EDSS is the best known and the most widely used scale. It combines impairment and disability in a 20-step ordinal scale, which ranges between 0 (normal status) and 10 (death due to multiple sclerosis), by incorporating the scores of eight FS scales with the patient's ability to ambulate, use their upper limbs, communicate and swallow. The EDSS has a bimodal distribution in cross-sectional studies of large series of patients (Willoughby and Paty, 1988; Goodkin et al., 1989; Rodriguez et al., 1994), a fair to substantial inter-rater reliability (kappa coefficient 0.32–0.76) (Amato et al., 1988; Noseworthy et al., 1990; Francis et al., 1991) and moderate to almost perfect intra-rater reliability of its lower (1.0–3.5) grades (frequency of perfect agreement 50–60%, intra-class correlation coefficient 0.88–0.96) (Goodkin et al., 1992). The SNRS is a 22-item ordinal impairment scale which converts the neurological examination into a numerical score using a three-level scoring system, with scores ranging between 100 (normal status) and 0 (worst possible score). This scale is reported to have a near normal distribution, high inter- and intra-rater reliabilities (weighted kappa coefficients of 0.83 and 0.98, respectively) and a modest degree of sensitivity to clinical change (Koziol et al., 1996). The AI is a semi-quantitative scale which converts ambulation-related disability into an ordinal scale. It specifies 10 grades ranging between 0 (normal status) and 9 (wheelchair-bound and unable to transfer independently), has a bimodal distribution (Swingler and Compston, 1992) and a moderate to substantial inter-rater reliability (kappa coefficient 0.5–0.7) (Francis et al., 1991). The FIM is an 18-item ordinal scale which rates the level of assistance required to perform various activities of daily living using a seven-level scoring system, with scores ranging between 126 (normal status) and 18 (totally dependent). It has a high internal consistency (Cronbach's alpha 0.94–0.95), high sum score inter-rater reliability (intra-class correlation coefficient of 0.83–0.96), high motor and cognitive domain intra-rater reliability (intra-class correlation coefficient of 0.95–0.84), variable item score inter- and intra-rater reliabilities (kappa coefficient of 0.14–0.98), weak sensitivity to clinical change and high correlation with the EDSS and the burden of care (Granger et al., 1990; Hall et al., 1993; Brosseau and Wolfson, 1994; Hobart et al., 1996a, c). The CAMBS is an ordinal scale which rates the individual contributions of disability, relapse, disease progression and handicap by using four separate sub-scales with a five-level scoring system. This scale is reported to have a near normal distribution (disability and handicap domains), moderate inter-rater reliability (combined score: kappa coefficient <0.66) and high correlation between its disability domain and the EDSS, and between its handicap domain and the Nottingham Health Profile (Hunt et al., 1981; Mumford and Compston, 1993).
It is surprising that despite the wide use of these scales in clinical research, data related to their psychometric properties remain incomplete. Such data are of paramount importance for assessing the results of previous clinical trials and for the design of future trials. This study was designed to assess the reliability, construct validity and responsiveness of the five commonly used scales in multiple sclerosis research. Face and content validity of these scales have already been reviewed in a previous publication (Sharrack and Hughes, 1996).
Patients and methods
This study was performed in the multiple sclerosis research clinic at Guy's Hospital, London, and was approved by the ethics committee of the local health authority. All subjects consented to take part in the study, and were only recruited if they had clinically or laboratory definite relapsing–remitting or secondary progressive multiple sclerosis.
Inter-rater reliability study
Sixty-four adult patients were recruited for this study. These patients consisted of a cohort of 50 patients attending a multiple sclerosis research clinic and 14 patients in long-term residential care. Patients were assessed by three raters, two neurologists and a neurology research nurse, who were familiar with the clinical scoring scales used in multiple sclerosis from experience in previous clinical trials and teaching workshops. To standardize the methods of applying the various scales by the three raters, training sessions were conducted prior to the beginning of this study, during which 10 subjects were examined and scored jointly. Each patient in this study was assessed in the same session independently by the three raters. All patients were allocated scores on the EDSS, FS, SNRS and the AI by the two assessing neurologists, and scores on the FIM and the CAMBS (50 patients only) by one neurologist and the neurology research nurse.
Intra-rater reliability and responsiveness study
The same three raters followed a cohort of 50 multiple sclerosis patients attending the Guy's Hospital multiple sclerosis research clinic for 9 months, with assessments every 3 months. During each visit, patients were asked to compare their clinical condition with how they felt on the previous occasion, and indicate whether their condition had since worsened, remained stable or improved. At the same time, one of the neurologists, who had assessed the patients on the previous occasions, subjectively designated their clinical status as worse, stable or better. Patients' overall status was later classified as stable, improved or worsened, if both the patients' and the neurologist's assessments were identical, indicating no change, improvement or worsening, respectively. Patients' overall status was otherwise designated as `uncertain', and all related data were excluded from the final analysis. Patients were also asked to complete the EuroQol health-related quality of life questionnaire (EuroQol Group, 1990), and were assigned scores on the EDSS, SNRS and AI by one neurologist and scores on the FIM, CAMBS and the Barthel Index (Mahoney and Barthel, 1965) by the other neurologist. In the absence of a gold standard for assessing clinical `stability' and `change', and in accordance with the methodology of previous studies, intra-rater reliability was tested on the pairs of assessments between which patients' overall status was judged to have remained stable, whereas responsiveness was tested on the pairs of assessments between which it had changed (improved or worsened) (Deyo et al., 1991; Ellison et al., 1993).
Validity study
The validity of the five scales was assessed in the same cohort of 50 patients who took part in the intra-rater and responsiveness study described above. During their third visit, all patients were asked to complete the London Handicap Scale (Harwood et al., 1994) and the Short Form 36 health survey questionnaire (SF-36) (Garratt et al., 1993), and were ranked by one neurologist according to their ability to work, do their housework and look after themselves. They were also ranked independently by two raters (a neurologist and a research nurse) according to their subjectively perceived degree of disability. Convergent and discriminant construct validity were tested by assessing the degree to which each scale in this study correlated with the other four scales, and with other measures of disability (Barthel Index), handicap (the London Handicap Scale) and health-related quality of life (SF-36). Group differences construct validity was assessed by testing the extent to which the scores of these scales correlated with the severity of disability as judged by the two raters. Hypothesis testing construct validity was assessed by testing the hypothesis that scores on any impairment, disability or handicap scale should be more abnormal in patients who were unable to work or do their housework because of multiple sclerosis, and in patients who were dependent on others for some or all of their activities of daily living.
Blinding
The majority (78%) of the patients in the inter-rater reliability study and all the patients in the intra-rater reliability, responsiveness and validity studies were taking part in a double-blind therapeutic trial in which the two neurologists were the `examining' and the `treating' physicians, and the research nurse was the `trial coordinator'. In this trial, the `examining' neurologist was responsible for assessing the patients' relapse status and assigning scores on the various clinical scales, the `treating' neurologist was responsible for the overall medical management of the patients, and the research nurse was responsible for the administrative aspects of the study. To comply with the required blinding for both the ongoing therapeutic trial and the current study, the raters refrained from discussing the patients' clinical conditions amongst themselves, and none of them had access to their own or the other raters' previous scores, which were kept separate from the patients' clinical records. To reduce the effect of patients' bias on the inter-rater reliability which may result from practice effect or fatigue, no fixed order for the examination of the patients by each rater was observed. Data for the intra-rater reliability study were collected at 3 monthly intervals to reduce assessors' and patients' bias, which may result from recall of previous assessments.
Statistical analysis
Data were tabulated and analysed using SPSS 7.5 for Windows. Two-tailed tests were used for all statistical analyses. The EDSS, SNRS, FIM, AI, CAMBS, Barthel Index and the disability ranks were treated as ordinal scales, whereas the London Handicap Scale and the SF-36 were treated as interval scales. Reciprocal, logarithmic and square root transformations of the ordinal and skewed data were performed and found to be unhelpful. Descriptive statistics were used to describe the study population in terms of demography, disease characteristics and score distributions. Inter- and intra-rater reliabilities were assessed with the kappa coefficient (chance-corrected perfect agreement reliability) (Cohen, 1960) and the intra-class correlation coefficient (partial agreement corrected reliability) (Shrout and Fleiss, 1979). The scores of these two coefficients were interpreted conventionally as: <0 poor agreement, 0–0.20 slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement and 0.81–1 almost perfect agreement (Landis and Koch, 1977). As reliability estimates are population dependent, 95% confidence intervals were constructed for both kappa [1.96 times the standard error of kappa (Norman and Streiner, 1993)] and intra-class correlation coefficients [using the method developed by Fleiss and Shrout (1978)]. The reliability was also expressed, following the work of Bland and Altman (1986), as the mean and 95% confidence intervals of inter- and intra-rater score differences to estimate rater bias and the repeatability coefficient (1.96 times the standard deviation of the score difference). The latter coefficient, which has been used by the British Standard Institute (1979) as a measure of the reliability of scientific measurements, is an indication of the maximum score difference required to achieve 95% rater agreement. The use of this method is appropriate for ordinal data since score differences are likely to have a normal distribution (Bland and Altman, 1986). Internal consistency of the two multidimensional scales (SNRS and FIM) was assessed using Cronbach's coefficient alpha (Cronbach, 1951). Scale items were considered to be `homogeneous' if Cronbach's alpha was above 0.70 but not higher than 0.90 (Streiner and Norman, 1995). The individual contribution of the various scale items to the sum score of the two multidimensional scales was assessed using factor analysis (Norman and Streiner, 1993). Responsiveness was assessed using the Wilcoxon signed ranks test and effect size (Kazis et al., 1989), calculated by dividing the difference between the scores of the first and the second assessments by the standard deviation of the first assessment scores. Effect size values of 0.2–0.49 were defined conventionally as small, those of 0.5–0.79 as moderate and those of 0.8 and greater as large (Cohen, 1977). Construct validity was assessed using Pearson's and Spearman rank correlation coefficients for interval and ordinal scales, respectively. Correlation coefficients of 0.35–0.49 were interpreted empirically as weak, those of 0.5–0.79 as moderate and those of 0.8 or greater as strong.
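The two reliability summaries described above, chance-corrected agreement and the Bland and Altman repeatability coefficient, can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study: the function names are hypothetical, the kappa shown is the unweighted form, the standard deviation is assumed to be the sample (n − 1) estimate, and the scores in the usage note are invented for illustration only.

```python
from collections import Counter
from statistics import mean, stdev


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(rater_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def landis_koch(value):
    """Verbal label for a coefficient, using the bands quoted in the text."""
    if value < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if value <= upper:
            return label
    return "almost perfect"


def repeatability(rater_a, rater_b):
    """Return (mean score difference, 1.96 x SD of the score differences)."""
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    # Assumption: sample SD (n - 1 denominator) of the paired differences.
    return mean(diffs), 1.96 * stdev(diffs)
```

For example, with the invented scores `[0, 0, 1, 1]` and `[0, 0, 1, 0]`, observed agreement is 0.75 and chance agreement is 0.5, so `cohen_kappa` returns 0.5, which `landis_koch` labels as moderate.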
Results
Reliability
Inter-rater reliability
Sixty-four patients with a wide spectrum of disabilities, ranging from being asymptomatic to being bedridden and completely dependent, were recruited for this study. The group consisted of 42 women and 22 men with a median age of 40 years (range 22–74 years) and median disease duration of 13 years (range 2–35 years). Inter-rater reliability of the CAMBS was assessed on a sub-group of 50 patients (31 women and 19 men) with a median age of 36 years (range 24–51 years), median EDSS score of 4.5 (range 0–7.5) and a median disease duration of 12 years (range 2–17 years).
EDSS.
The median (range) scores of the two raters were identical at 5.5 (0–9.5). The frequency distribution of the two score sets was bimodal, with fewer patients scoring at EDSS 4.0 and 7.0 (Fig. 1A).
Inter-rater agreement on the different FS scores was variable, with kappa coefficients ranging between 0.41 and 0.67 (moderate to substantial), intra-class correlation coefficients ranging between 0.81 and 0.95 (almost perfect), and repeatability coefficients ranging between 1.2 and 1.6 points (Table 1) [...] ≤0.5 point (one 0.5 EDSS step), ≤1.0 point (two 0.5 EDSS steps) and ≤1.5 points (three 0.5 EDSS steps), respectively (Fig. 2A) [...]
SNRS.
The median (range) scores of the two raters were similar at 69.5 (0–100) and 67 (0–100). The frequency distribution of the two score sets was positively skewed to the `normal' end of the scale, with a smaller cluster at the `severely impaired' end of the scale (Fig. 1B) [...] ≤5 points, ≤10 points, ≤15 points and ≤19 points, respectively (Fig. 2B) [...]
FIM.
The median (range) scores of the two raters were almost identical at 119 (18–126) and 119 (27–126). The frequency distribution of the two score sets was positively skewed to the `normal' end of the scale, with a smaller cluster at the `severely disabled' end of the scale (Fig. 1C) [...] ≤5 points, ≤9 points and ≤13 points, respectively (Fig. 2C) [...]
AI.
The median (range) scores of the two raters were similar at 2 (0–9) and 3 (0–9). The frequency distribution of the two score sets was bimodal, with more patients scoring at the `normal' end of the scale and fewer patients scoring between 7 and 8 (Fig. 1D) [...] ≤1 point (Fig. 2D) [...]
CAMBS.
The median (range) scores of the two assessments for the scale's four domains were similar at: disability 2 (1–4) and 3 (1–4), relapse 1 (1–3) and 1 (1–4), progression 1 (1–3) and handicap 2 (1–4). The frequency distribution of the two relapse and progression domain score sets was negatively skewed to the `normal' end of the scales (Fig. 1F and G) [...]
Raters' bias.
With the exception of the cranial nerves item, bladder, bowel, and sexual item, and some of the motor and the cerebellar items of the SNRS, the mean score differences between the two raters were generally small, with narrow 95% confidence intervals which included the `0' value, indicating the absence of raters' bias.
Intra-rater reliability study
Thirty-five patients had remained stable between two visits on at least one occasion during the 9-month follow-up period. To avoid introducing any statistical bias, only one pair of assessments (the first) per patient was included in the final analysis. This cohort consisted of 20 women and 15 men with a median age of 38 years (range 24–51 years) and median disease duration of 11 years (range 2–17 years). To compensate for the design of the relapse and the progression domains of the CAMBS (which have been devised to assess disease stability over a time longer than 3 months), intra-rater reliability of these two domains was assessed in a sub-group of 23 patients after excluding all patients who had any relapses during the 9 months before the first assessment (nine patients) or between the first and the second assessments (three patients; all had mild relapses from which they recovered completely).
EDSS.
The median (range) scores of the two assessments were identical at 4.5 (0–7.5). Intra-rater agreement on the different FS scores was variable, with kappa coefficients ranging between 0.42 and 0.66 (fair to substantial), intra-class correlation coefficients ranging between 0.67 and 0.92 (substantial to almost perfect) and repeatability coefficients ranging between 1.3 and 1.8 points (Table 1). The largest score differences between the two assessments were 2 points for the pyramidal, sensory, bladder and bowel, and mental FS, and 3 points for the cerebellar, brainstem and visual FS. Intra-rater agreement on the EDSS scores was 63, 89 and 100% when agreement was defined as no difference, a difference of ≤0.5 point (one EDSS step) and ≤1.0 point (two 0.5 EDSS steps), respectively (Fig. 3A), with a repeatability coefficient of 0.8 point, a kappa coefficient of 0.7 (substantial) and an intra-class correlation coefficient of 0.99 (almost perfect) (Table 1).
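The percentage agreement figures quoted for successive definitions of `agreement' can be reproduced in principle with a tolerance-based count. The following Python sketch is an illustration under stated assumptions: `agreement_within` is a hypothetical helper, not part of the study's analysis, and the EDSS values in the usage note are invented.

```python
def agreement_within(first, second, tolerance):
    """Percentage of paired scores that differ by no more than `tolerance`."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty score lists of equal length")
    # A pair counts as agreement when the absolute difference is within tolerance;
    # tolerance 0 corresponds to perfect agreement.
    hits = sum(abs(a - b) <= tolerance for a, b in zip(first, second))
    return 100.0 * hits / len(first)
```

With the invented paired scores `[4.5, 6.0, 2.0, 7.0]` and `[4.5, 6.5, 3.0, 7.0]`, agreement is 50% at a tolerance of 0, 75% at 0.5 point and 100% at 1.0 point, which shows how the reported reliability depends on the definition of agreement chosen.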
SNRS.
The median (range) scores for the two assessments were very similar at 73 (33–98) and 71 (34–98). Intra-rater agreement on the different scale items was variable, with kappa coefficients ranging between 0.33 and 0.75 (fair to substantial), intra-class correlation coefficients ranging between 0.52 and 0.92 (moderate to almost perfect) and repeatability coefficients ranging between 0.8 and 4.7 points (Table 2) [...] ≤5 points, ≤10 points and ≤14 points, respectively (Fig. 3B) [...]
FIM.
The median (range) scores of the two assessments were identical at 123 (90–126). Intra-rater agreement on the different scale items was variable, with kappa coefficients ranging between 0.55 and 1 (moderate to perfect), intra-class correlation coefficients ranging between 0.60 and 1 (substantial to perfect) and repeatability coefficients ranging between 0 and 2.2 points (Table 3). Intra-rater agreement on the sum scores was 37, 92 and 100% when agreement was defined as no difference, a difference of ≤5 points and ≤9 points, respectively (Fig. 3C), with a repeatability coefficient of 6.1 points and an intra-class correlation coefficient of 0.94 (almost perfect) (Table 3).
AI.
The median (range) scores of the two assessments were identical at 2 (0–8). Intra-rater agreement was 66, 94, 97 and 100% when agreement was defined as no difference, a difference of ≤1 point, ≤2 points and ≤3 points, respectively (Fig. 3D), with a repeatability coefficient of 1.5 points, a kappa coefficient of 0.59 (moderate) and an intra-class correlation coefficient of 0.93 (almost perfect) (Table 4).
CAMBS.
The median (range) scores of the two assessments for the scale's four domains were identical at: disability 2 (1–4), relapse 1 (1–3), progression 1 (1–3) and handicap 2 (1–4). Intra-rater agreement on the different domains was very high, with kappa coefficients ranging between 0.58 and 0.80 (moderate to substantial), intra-class correlation coefficients ranging between 0.71 and 0.85 (substantial to almost perfect) and repeatability coefficients ranging between 0.8 and 1.4 points (Table 4). The largest score differences between the two assessments were 1 point for the disability domain, 2 points for the relapse domain, 1 point for the progression domain and 2 points for the handicap domain (Fig. 3E–H).
Raters' bias.
With the exception of the lower cranial nerves item of the SNRS, the mean score differences between the two raters were generally small, with narrow 95% confidence intervals which included the `0' value, indicating the absence of raters' bias.
Internal consistency and factor analysis
Internal consistency and factor analysis of the two multidimensional scales assessed in this study (SNRS and FIM) were evaluated using the inter-rater reliability study data. Internal consistency was very high, with Cronbach's alpha of 0.92 for the SNRS and 0.98 for the FIM. Factor analysis of the SNRS suggested a five factor solution which accounted for 79.3% of the total variance (cumulative percentage of 52.8, 63.0, 69.7, 74.8 and 79.3%; eigenvalues of 11.6, 2.2, 1.5, 1.1 and 1, respectively). The first factor of the rotated matrix (cerebellar factor) correlated with the `upper and lower limb cerebellar' (the latter also correlated with the fourth factor), `eye movements', `lower cranial nerves' and `nystagmus' items. The second factor (cerebral/visual/upper limb motor factor) correlated with the `mentation and mood', `visual acuity', `fields/discs/pupils', `upper limb motor' and `reflexes' items. The third factor (sensory factor) correlated with the `upper and lower limb sensory' (the latter also correlated with the fourth factor) items. The fourth factor (lower limb/spinal factor) correlated with the `lower limb motor', `lower limb cerebellar', `gait' and `bladder, bowel, and sexual function' items. The fifth factor (Babinski factor) correlated with the `Babinski reflex' item only. Factor analysis of the FIM suggested a two factor solution which accounted for 89.4% of the total variance (cumulative percentage of 83 and 89.4%; eigenvalues of 14.9 and 1.2, respectively). The first factor of the rotated matrix (motor factor) correlated with the `motor' items of the scale (items A–M), and the second factor (cognitive factor) correlated with the `communication' and `social cognition' items (items N–R).
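For readers unfamiliar with the internal consistency statistic reported above, the standard formula for Cronbach's alpha, k/(k − 1) × (1 − sum of item variances / variance of the sum score), can be sketched in Python as follows. The function names and the input layout (one list of item scores per patient) are assumptions made for illustration; the study itself used SPSS.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; `item_scores` holds one list of k item scores per patient."""
    if len(item_scores) < 2 or len(item_scores[0]) < 2:
        raise ValueError("need at least two patients and two items")
    k = len(item_scores[0])
    # Transpose so that each tuple contains all patients' scores on one item.
    columns = list(zip(*item_scores))
    item_variance_sum = sum(variance(column) for column in columns)
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when sum scores do not vary")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def is_homogeneous(alpha):
    """Criterion quoted in the methods: above 0.70 but not higher than 0.90."""
    return 0.70 < alpha <= 0.90
```

As a check on the arithmetic, three hypothetical patients scored `[1, 2]`, `[2, 1]` and `[3, 3]` on two items give item variances of 1 and 1 and a sum-score variance of 3, so alpha is 2 × (1 − 2/3) = 0.67.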
Responsiveness
Of the 50 patients assessed, 25 were found to have changed on at least one occasion during the 9-month follow-up period. This group consisted of 20 women and five men with a median age of 36 years (range 24–51 years), median EDSS of 5.5 (range 0–7.5) and a median disease duration of 10 years (range 2–22 years). To avoid introducing any statistical bias, only one pair of assessments (the first) per patient was included in the final analysis. The order of assessment in each pair (15 patients worsened, and 10 improved) was later re-arranged so as to make all the changes of one direction (stable or improved to worsened). Patients' subjective assessments of their own health status using the EuroQol visual analogue scale were moderately sensitive to clinical change (effect size 0.55, P < 0.001). Similar subjective assessment by the assessing neurologist using the EuroQol visual analogue scale was weakly sensitive to clinical change (effect size 0.36, P < 0.001).
EDSS.
The EDSS was not sensitive to clinical change (effect size 0.11, P = 0.051). Most of the FS were also unresponsive except for the mental FS which was weakly responsive (effect size 0.38, P = 0.012) mainly on account of mood changes (Table 5).
SNRS.
The SNRS sum score was unresponsive (effect size 0.17, P = 0.253). The individual scale items were also unresponsive except for the mentation and mood item which was weakly sensitive to clinical change (effect size 0.36, P = 0.043) (Table 6).
FIM.
The FIM sum score was weakly sensitive to clinical change (effect size 0.46, P < 0.001). Many `motor' items (eating, grooming, sphincter control, bed and toilet transfer, and locomotion) were also weakly to moderately responsive (effect size 0.25–0.67, P = 0.044–0.039), but none of the cognitive items were responsive (Table 7).
AI.
This scale was weakly sensitive to clinical change (effect size 0.20, P = 0.039) (Table 8).
CAMBS.
The relapse and progression domains were moderately sensitive to clinical change (effect size 0.67, P = 0.001 and 0.78, P < 0.001, respectively), whereas the disability domain was only weakly responsive (effect size 0.39, P = 0.008) and the handicap domain was unresponsive (effect size 0.14, P = 0.206) (Table 8).
To assess the responsiveness of the five scales at different levels of disease severity, the patients were categorized into one of three levels of disease severity according to their baseline EDSS scores: mild (EDSS 0.0–4.5), moderate (EDSS 5.0–6.0) and severe (EDSS 6.5–7.5). Sub-group analysis showed the responsiveness of these scales within each band of disease severity to be similar to the results of the whole group.
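The effect sizes reported in this section follow the definition given in the statistical methods: mean change between the two assessments divided by the standard deviation of the first assessment scores. A minimal Python sketch of that calculation is given below. The function names are hypothetical, the sample standard deviation is an assumption, and the label `below small` for values under 0.2 is introduced here for illustration because the methods do not name that band.

```python
from statistics import mean, stdev


def effect_size(first, second):
    """Mean change (second minus first) divided by the SD of the first scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length, at least two pairs")
    baseline_sd = stdev(first)  # assumption: sample SD (n - 1 denominator)
    if baseline_sd == 0:
        raise ValueError("effect size is undefined when baseline scores do not vary")
    return (mean(second) - mean(first)) / baseline_sd


def effect_size_band(value):
    """Bands quoted in the methods; the sign of the change is ignored."""
    magnitude = abs(value)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "below small"  # hypothetical label; not named in the methods
```

For instance, invented first scores of `[2, 4, 6]` (standard deviation 2) and second scores of `[3, 5, 7]` give a mean change of 1 and an effect size of 0.5, which falls in the moderate band.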
Validity
Convergent validity was assessed on the same cohort of 50 patients who took part in the intra-rater reliability and responsiveness study. The group consisted of 31 women and 19 men, with a median age of 36 years (range 24–51 years), median EDSS score of 4.5 (range 0–7.5) and a median disease duration of 12 years (range 2–17 years).
Convergent and discriminant validity
The Barthel Index correlated highly with the FIM (r = 0.88), and moderately with the EDSS (r = 0.74), SNRS (r = 0.69), AI (r = 0.72), and the disability and the handicap domains of the CAMBS (r = 0.69 and 0.61, respectively) (Table 9). In comparison, the London Handicap Scale correlated moderately with the EDSS (r = -0.69), SNRS (r = 0.71), AI (r = 0.72), and the disability and the handicap domains of the CAMBS (r = 0.59 and 0.65, respectively), and weakly with the FIM (r = 0.43) (Table 9). The physical functioning item of the SF-36 correlated highly with the EDSS (r = 0.82), SNRS (r = 0.82), FIM (r = 0.88) and AI (r = 0.87), and moderately with the disability and the handicap domains of the CAMBS (r = 0.71 and 0.65, respectively) (Table 9). Slightly weaker, but statistically significant, correlations were also found between the SF-36 physical role limitation item and the EDSS (r = 0.50), SNRS (r = 0.46), FIM (r = 0.36), AI (r = 0.52) and the handicap domain of the CAMBS (r = 0.54); the SF-36 general health perception item and the EDSS (r = 0.47), SNRS (r = 0.44), FIM (r = 0.41), AI (r = 0.38) and the handicap domain of the CAMBS (r = 0.39); the SF-36 social functioning item and the EDSS (r = 0.47), SNRS (r = 0.37), FIM (r = 0.43), AI (r = 0.42), and the disability and the handicap domains of the CAMBS (r = 0.33 and 0.53, respectively); the SF-36 vitality item and the EDSS (r = 0.41), SNRS (r = 0.36), FIM (r = 0.38), AI (r = 0.39), and the disability and the handicap domains of the CAMBS (r = 0.45 and 0.48, respectively); and the SF-36 bodily pain item and the FIM (r = 0.34) (Table 9).
The correlation between the five scales assessed in the study is reported in Table 10.
Group differences and hypothesis testing
The two disability rank lists, which were compiled by one of the neurologists and the research nurse, were almost identical (r = 0.99). All five scales, particularly the SNRS, correlated highly with the mean ranks of disability (Table 11).
Discussion
Multiple sclerosis is a multifaceted disease characterized by a wide variability of clinical manifestations and natural history. Clinical rating scales used in this illness require relevant scale items, need to be able to embrace the whole range of affected domains and should have high levels of reliability, validity and responsiveness. Face and content validity of the currently existing scales have already been addressed by many researchers and reviewed by us (Sharrack and Hughes, 1996).
EDSS
As reported by other researchers (Willoughby and Paty, 1988; Goodkin et al., 1989; Koziol et al., 1996), we found the frequency distribution of the EDSS scores to be bimodal, with relative paucity of the middle scores. This bimodality is unlikely to have been artefactual, despite the relatively small number of patients in this study, given the concordance between our findings and those obtained in cross-sectional studies of large population-based incident cohorts (Rodriguez et al., 1994; Midgard et al., 1996). Inter- and intra-rater reliabilities of the FS scores were comparably high. Similarly to the previously reported studies, a difference of 2 points on the various FS scales achieved 97–100% rater agreement (Amato et al., 1988; Noseworthy et al., 1990; Goodkin et al., 1992). Inter- and intra-rater reliabilities of the EDSS scores were equally high. Complete intra-rater and 96% inter-rater agreements were obtained by allowing a difference of 1.0 point (two 0.5 EDSS steps). These inter-rater reliability results are generally in accordance with the previously reported studies (Amato et al., 1988; Noseworthy et al., 1990; Goodkin et al., 1992), although Francis et al. (1991) reported the score difference between the two raters to vary between 2.0 and 4.0 points (four to eight 0.5 EDSS steps) in 10% of cases. Compared with our results, Goodkin et al. (1992) reported high intra-rater reliability in a group of 10 patients with EDSS scores of 1–3.5, with complete intra-rater agreement obtained by allowing a difference of 0.5 points (one EDSS step). The discrepancy between these results and ours is likely to be due to the differences in the level of disease severity between the two cohorts and to the time between the first and the second assessments. Assessments in the Goodkin study were done on the same day, and a practice effect cannot, therefore, be totally excluded, whereas our assessments were separated by 3 months. Although part of the difference between the first and the second assessments' scores in our study might have been due to a real but unreported change in the patients' clinical status, patients' variability between the two assessments was greatly minimized by including only those patients in whom clinical `stability' was reported by both the patient and the rater.
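The dependence of rater agreement on the tolerance chosen can be illustrated with a minimal sketch. The function name `percent_agreement` and the score lists are hypothetical and serve only as an illustration; they are not data from this study, and the sketch is not the analysis code used here.

```python
def percent_agreement(scores_a, scores_b, tolerance):
    """Percentage of paired scores that differ by no more than `tolerance`."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    within = sum(1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= tolerance)
    return 100.0 * within / len(scores_a)


# Hypothetical EDSS scores from two raters (illustrative values only).
rater_1 = [2.0, 3.5, 6.0, 6.5, 7.0, 8.0]
rater_2 = [2.5, 3.5, 6.5, 5.0, 7.0, 8.0]

# Agreement rises as the permitted difference widens.
for tol in (0.0, 0.5, 1.0, 1.5):
    print(f"tolerance {tol}: {percent_agreement(rater_1, rater_2, tol):.1f}%")
```

With a tolerance of zero only identical scores count as agreement; widening the tolerance to 1.0 EDSS point corresponds to the definition of agreement adopted above.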
The EDSS and its associated FS, with the exception of the mental FS, were insensitive to clinical change. The responsiveness of the EDSS has not been assessed previously in a manner exactly comparable with that used in our study. However, Ellison et al. (1993) found the Disability Status Scale (the previous version of the EDSS) to be insensitive to worsening of patients' clinical status as judged by the treating neurologist, and Hobart et al. (1996b) found the EDSS to be unresponsive in a group of 64 patients with moderate to severe disability (EDSS 5.0–9.0). The face validity of the EDSS as a measure of combined impairment and disability was confirmed by its high correlation with the SNRS (particularly at the lower EDSS grades), the FIM, patients' disability ranks, patients' self-assessment of disability using the physical functioning domain of the SF-36, and its moderate correlation with the Barthel Index. Similar high correlation with the physical functioning domain of the SF-36 has been reported recently by Rothwell et al. (1997), but sub-group analysis of the patients with EDSS scores of 5.0–7.5 in our cohort failed to replicate the high correlation between the EDSS and the Barthel Index reported by Hobart et al. (1996b) in 66 patients with moderate to severe disability (EDSS 5.0–9.0). As expected for any impairment/disability scale, the EDSS correlated moderately with measures of handicap and quality of life, and with patients' ability to work and do their housework.
SNRS
Contrary to the report of Koziol et al. (1996) that the SNRS has a near normal frequency distribution, we found the SNRS scores to be skewed to the `normal' end of the scale, with an additional cluster at the `severely impaired' end of the scale, suggesting both `floor' and `ceiling' effects. This discrepancy might, at least partially, be due to the differences in the range of disease severity, as assessed by the EDSS, between the two studies (2.0–8.0 in the Koziol study and 0–9.5 in our study). The internal consistency of the scale items was found to be surprisingly high given the multidimensional nature of the scale, suggesting a degree of item redundancy (Streiner and Norman, 1995). Factor analysis showed the scale items, with the exception of the lower limb sensory and cerebellar items, to have segregated into five relatively `meaningful' factors which explained most of the variance. Inter- and intra-rater reliabilities of the different SNRS items were variable, ranging between poor and substantial, depending on the definition of agreement. Although complete inter- and intra-rater agreements were only obtained by allowing a difference of 19 and 14 points, respectively, partial agreement corrected reliability of the sum scores was high. A difference of 10 points only achieved 76% intra-rater and 85% inter-rater agreement, but a difference of 13 points achieved >95% inter- and intra-rater agreement. These reliability results are similar to those published from the Scripps clinic (Koziol et al., 1996).
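Internal consistency of this kind is commonly summarized with Cronbach's coefficient alpha (Cronbach, 1951). The following is a minimal sketch of that coefficient, assuming one row of item scores per patient; the function name `cronbach_alpha` and the example data are hypothetical and are not taken from this study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-patient rows of item scores."""
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two patients and two items")
    k = len(rows[0])
    # Sample variance of each item across patients.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the sum score across patients.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("sum scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical example: three items that always receive identical scores,
# i.e. complete redundancy between items.
redundant = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
print(cronbach_alpha(redundant))
```

When every item carries the same information, as in this contrived example, alpha approaches its maximum of 1, which is why a very high value in a multidimensional scale can point to item redundancy.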
Despite this high reliability, neither the SNRS sum score nor any of its items were sensitive to clinical change. The only exception was the mentation and mood item, which was responsive mainly on account of mood changes. The responsiveness of the SNRS was assessed in one previous study (Koziol et al., 1996), in which score changes on this scale were found to be more gradual in comparison with those on the EDSS. Although direct comparison between this study and ours is not possible because of the methodological differences, we found both these scales to be unresponsive. The face validity of the SNRS as an impairment measure was supported by its high correlation with the EDSS. Surprisingly, the SNRS correlated highly with patients' disability ranks, the FIM, the disability domain of the CAMBS and the physical functioning domain of the SF-36, but, as expected for an impairment scale, only moderately with the Barthel Index, patients' ability to work and do their housework, and other measures of handicap and quality of life.
FIM
The FIM scores were severely skewed to the `normal' end of the scale, with a smaller cluster at the `severely disabled' end of the scale, suggesting both `ceiling' and `floor' effects. To our knowledge, the frequency distribution of the FIM scores has never been assessed comprehensively in a multiple sclerosis population before. As reported by Brosseau and Wolfson (1994), we found the internal consistency of the scale items to be surprisingly high given the multidimensional nature of this scale, suggesting a degree of item redundancy (Streiner and Norman, 1995). Factor analysis suggested two factors which segregated the `mobility' and the `cognitive' items of the scale and accounted for most of the variance (although the latter accounted for only 6.4% of the total variance). Inter- and intra-rater reliabilities of the mobility items were generally comparably high. In comparison, the cognitive items were more reliable when applied by the same rater, reflecting their ambiguity and the lack of precision in differentiating between their different grades. Although complete inter- and intra-rater agreements were only obtained by allowing a difference of 13 and 9 points, respectively, partial agreement corrected reliability of the FIM sum scores was high, and a difference of 9 points achieved >95% inter-rater agreement. These findings are consistent with other published reliability results in which the inter-rater reliability for the sum scores was found to be comparably high, and the inter-rater reliability of the mobility items to be higher than that of the cognitive items (Hall et al., 1993; Brosseau and Wolfson, 1994). There are no published data addressing the intra-rater reliability of this scale. As reported by others (Hall et al., 1993), we found the responsiveness of the FIM sum score and many of its mobility items to be high, thereby supporting the usefulness of this scale in clinical trials of multiple sclerosis. The face validity of the FIM as a disability scale was supported by the high correlation between this scale and other disability scales, particularly the EDSS, the Barthel Index (which is not surprising given the generic similarities between the two scales), the disability domain of the CAMBS, patients' self-assessment of disability using the physical functioning domain of the SF-36 and patients' disability ranks, and its moderate correlation with the AI. As expected for any disability scale, the FIM correlated moderately with other handicap and quality of life scales, but surprisingly highly with the SNRS.
The usefulness of the FIM in clinical trials of multiple sclerosis should be considered in the light of its limited content validity (Sharrack and Hughes, 1996). This scale is not comprehensive for the potential disabilities which could occur in this illness, as it does not rate visual, speech, swallowing, affective or sexual disabilities. It is our impression that the cognitive items of the FIM are the least useful part of this scale in the context of multiple sclerosis as they have a poor inter-rater reliability, are unresponsive and explain only 6.4% of the total variance (factor analysis: mental factor). The FIM is also a somewhat cumbersome scale which requires reference to a 48-page instruction book and training for its application.
AI
As reported by other workers (Goodkin et al., 1989; Swingler and Compston, 1992), the frequency distribution of the AI scores was found to be bimodal, with relative paucity of scores 7 and 8. Rater reliability of this scale was very high. Complete inter-rater and 94.3% intra-rater agreement was obtained by allowing a difference of a single point. These results are similar to our previously reported inter-rater reliability on a group of 20 patients which suggested that 95% of the raters would score within 1 point of the `correct' score (Francis et al., 1991). No other rater reliability studies on this scale have been reported in the literature. Despite its high reliability, the AI was found to be weakly sensitive to clinical change. This is not surprising since this scale addresses only one dimension of the potential disabilities which could occur in this illness. The face validity of the AI as a disability scale was supported by its high correlation with patients' disability ranks, and patients' self-assessment of disability using the physical functioning domain of the SF-36. The AI correlated moderately with the EDSS, the FIM and the Barthel Index, reflecting its mono-dimensional nature in relation to the other disability scales. As expected for a disability scale, the AI correlated moderately with impairment, handicap and quality of life scales.
CAMBS
As reported in its original publication (Mumford and Compston, 1993), the frequency distribution of the relapse and progression domains of this scale was negatively skewed to the `normal' end of the scales, suggesting a `floor' effect, which reflected the natural history of the illness and the patient population used in the study. In comparison, the frequency distribution of the disability domain was positively skewed to the `severely disabled' end of the scale, suggesting a `ceiling' effect, whereas the handicap domain was evenly distributed. Rater reliability of this scale's four domains was reasonably high. Complete rater agreement was obtained by allowing a difference of 1–2 points on the various domains. With the exception of the handicap domain, a difference of 1 point achieved >95% rater agreement. The only published reliability data on this scale are those of Mumford and Compston (1993) who suggested a moderate reliability of the combined scores (calculated as kappa coefficient <0.66) which is somewhat higher than we found (kappa coefficient of 0.41). Nevertheless, these figures are of doubtful significance since a sum score is not used in the scale. The scale's relapse and progression domains were moderately sensitive to clinical change, reflecting their simple definition and the design of our responsiveness study. The disability domain was weakly sensitive to clinical change, whereas the handicap domain was unresponsive.
The face validity of the disability domain of the CAMBS was supported by its high correlation with patients' disability ranks, the FIM, the EDSS and the patients' self-assessment of disability using the physical functioning domain of the SF-36, and its moderate correlation with the Barthel Index. This domain also correlated moderately with other impairment, handicap and quality of life scales. Surprisingly, the correlation between the handicap domain of the CAMBS and the London Handicap Scale, patients' independence and patients' ability to work and do their housework was only moderate or weak, whereas its correlation with the EuroQol VAS was high, thereby throwing doubt on the validity of this domain as a handicap scale. The previously reported high correlation between this domain and the Nottingham Health Profile was based on a study of only 10 patients (Mumford and Compston, 1993).
Raters' and patients' bias
This study was designed to minimize the effect of raters' and patients' bias on the assessment of reliability and responsiveness. All the raters were blinded to their own and other raters' previous scores, and avoided open discussion of patients' clinical conditions amongst themselves. In the inter-rater reliability study, patients were assessed independently by the three raters, and no fixed order for the examination was observed. The latter was designed to reduce the effect of patients' bias due to fatigue or recall of answers to specific questions required to obtain some of the clinical scores. The effect of this potential source of bias is unlikely to have been significant since the inter-rater reliability figures were not consistently higher than the intra-rater reliability figures, which were based on assessments separated by 3-month periods. For the same reason, it is also unlikely that the patients' familiarity to the assessors increased the intra-rater reliability on account of raters' recall of their previous scores, since such figures were often lower than the inter-rater reliability figures obtained by comparing the scores of two different raters.
Although inter-rater reliability was assessed on scores obtained by either two neurologists (EDSS, SNRS and AI) or a neurologist and a research nurse (FIM and CAMBS), it is unlikely that this has affected the validity of cross-scale reliability comparison, since the application of the FIM and the CAMBS was based on patient interview rather than neurological examination, and since previous studies have found these two scales to be equally reliable when applied by a neurologist and a nurse (Mumford and Compston, 1993), two therapists (Brosseau and Wolfson, 1994) or a neurologist and a multidisciplinary team comprising a doctor, an occupational therapist, a physiotherapist, a speech therapist and a nurse (Kidd et al., 1995). Furthermore, the reliability of these scales was comparable when they were applied by the same neurologist twice (in the intra-rater reliability study), or by a neurologist and a research nurse (in the inter-rater reliability study).
A degree of rater's bias was observed in the application of the SNRS, reflecting the lack of clear guidelines for assessing the severity of impairment in this scale (Sharrack and Hughes, 1996). The reliability confidence intervals, particularly intra-rater, were relatively wide, reflecting the small number of patients recruited for this study, as standard errors of measurements used to construct the 95% confidence intervals are inversely related to the sample size (Norman and Streiner, 1993).
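The effect of sample size on interval width can be illustrated with a generic sketch. It uses the normal-approximation 95% confidence interval for a mean, which is a simplifying assumption chosen for illustration and not the interval estimation applied to the reliability coefficients in this study; the function name `ci_half_width` and the numbers are hypothetical.

```python
from math import sqrt


def ci_half_width(sd, n, z=1.96):
    """Half-width of a normal-approximation confidence interval for a mean."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    # The standard error shrinks with the square root of the sample size.
    return z * sd / sqrt(n)


# Hypothetical standard deviation of 10 score points.
for n in (25, 100, 400):
    print(n, round(ci_half_width(10.0, n), 2))
```

Under this approximation, quadrupling the number of patients halves the width of the interval, which is why small cohorts yield wide confidence intervals.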
Conclusions
This study has assessed comprehensively the psychometric properties of five commonly used clinical scales in multiple sclerosis research. None of these scales completely satisfied the requirements of an ideal outcome measure, although many were found to have some desirable properties. The EDSS was reliable within 1.0 point (two 0.5 steps), valid as an impairment and disability scale, but not responsive. The SNRS was internally consistent, reliable within 13 points, valid as an impairment scale, but not responsive. The FIM was internally consistent, reliable within 9 points, valid as a disability scale, sensitive to clinical change, but had a limited content validity. The AI was reliable within 1 point, valid as an ambulation-related disability scale, but weakly sensitive to clinical change. The CAMBS was generally reliable within 1 point in each of its domains, and had valid disability and responsive relapse and progression domains. These results should inform the choice of outcome measures in future multiple sclerosis treatment trials.
| Acknowledgments |
|---|
We wish to thank all the patients who participated in this study for their kind co-operation.
| References |
|---|
Amato MP, Fratiglioni L, Groppi C, Siracusa G, Amaducci L. Interrater reliability in assessing functional systems and disability on the Kurtzke scale in multiple sclerosis. Arch Neurol 1988; 45: 746–8.
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; 1: 307–10.
British Standards Institution. Precision of test methods I: guide for the determination and reproducibility of standard test methods. BS 5497 Part I. London: British Standards Institution; 1979.
Brosseau L, Wolfson C. The inter-rater reliability and construct validity of the Functional Independence Measure for multiple sclerosis subjects. Clin Rehabil 1994; 8: 107–15.
Cohen J. A coefficient of agreement for nominal scales. Educational and psychological measurement 1960; 20: 37–46.
Cohen J. Statistical power analysis for the behavioral sciences. New York: Academic Press; 1977.
Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951; 16: 297–334.
Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures: statistics and strategies for evaluation. Control Clin Trials 1991; 12 (4 Suppl): 142S–58S.
Ellison GW, Myers LW, Leake BD. Responsiveness of the Disability Status Scale (DSS) [abstract]. Neurology 1993; 43 (2 Suppl 1): A204.
EuroQol Group. EuroQol: a new facility for measurement of health related quality of life. Health Policy 1990; 16: 199–208.
Fleiss JL, Shrout PE. Approximate interval estimation for a certain intraclass correlation coefficient. Psychometrika 1978; 43: 259–62.
Francis DA, Bain P, Swan AV, Hughes RA. An assessment of disability rating scales used in multiple sclerosis. Arch Neurol 1991; 48: 299–301.
Garratt AM, Ruta DA, Abdalla MI, Buckingham JK, Russell IT. The SF36 health survey questionnaire: an outcome measure suitable for routine use within the NHS? Br Med J 1993; 306: 1440–4.
Goodkin DE, Hertsgaard D, Rudick RA. Exacerbation rates and adherence to disease type in a prospectively followed-up population with multiple sclerosis: implications for clinical trials. Arch Neurol 1989; 46: 1107–12.
Goodkin DE, Cookfair D, Wende K, Bourdette D, Pullicino P, Scherokman B, et al. Inter- and intrarater scoring agreement using grades 1.0 to 3.5 of the Kurtzke Expanded Disability Status Scale (EDSS). Neurology 1992; 42: 859–63.
Granger CV, Cotter AC, Hamilton BB, Fiedler RC, Hens MM. Functional assessment scales: a study of persons with multiple sclerosis. Arch Phys Med Rehabil 1990; 71: 870–5.
Hall KM, Hamilton BB, Gordon WA, Zasler ND. Characteristics and comparisons of functional assessment indices: Disability Rating Scale, Functional Independence Measure, and Functional Assessment Measure. J Head Trauma Rehabil 1993; 8: 60–74.
Hamilton BB, Granger CV, Sherwin FS, Zielezny M, Teshman JS. A uniform national data system for medical rehabilitation. In: Fuhrer MJ, editor. Rehabilitation outcomes: analysis and measurements. Baltimore: Brookes; 1987. p. 137–47.
Harwood RH, Gompertz P, Ebrahim S. Handicap one year after a stroke: validity of a new scale. J Neurol Neurosurg Psychiatry 1994; 57: 825–9.
Hauser SL, Dawson DM, Lehrich JR, Beal MF, Kevy SV, Propper RD, et al. Intensive immunosuppression in progressive multiple sclerosis: a randomized, three-arm study of high-dose intravenous cyclophosphamide, plasma exchange, and ACTH. N Engl J Med 1983; 308: 173–80.
Hobart J, Lamping D, Freeman J, Thompson A. Measuring disability in multiple sclerosis: reliability of the functional independence measure [abstract]. Eur J Neurol 1996a; 3 Suppl 2: 123.
Hobart JC, Lamping DL, Freeman JA, Thompson AJ. Reliability, validity and responsiveness of the Kurtzke expanded disability status scale (EDSS) in multiple sclerosis (MS) patients [abstract]. Eur J Neurol 1996b; 3 Suppl 4: 13.
Hobart JC, Lamping DL, Freeman JA, Thompson AJ. The responsiveness of disability measures in multiple sclerosis (MS) [abstract]. Eur J Neurol 1996c; 3 Suppl 4: 13.
Hobart JC, Lamping DL, Thompson AJ. Evaluating neurological outcome measures: the bare essentials. J Neurol Neurosurg Psychiatry 1996d; 60: 127–30.
Hunt SM, McKenna SP, McEwen J, Williams J, Papp E. The Nottingham Health Profile: subjective health status and medical consultations. Soc Sci Med [A] 1981; 15: 221–9.
Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989; 27 (3 Suppl): S178–89.
Keith RA, Granger CV, Hamilton BB, Sherwin FS. The functional independence measure: a new tool for rehabilitation. In: Eisenberg MG, Grzesiak RC, editors. Advances in clinical rehabilitation, Vol. 1. New York: Springer-Verlag; 1987. p. 6–18.
Kidd D, Stewart G, Baldry J, Johnson J, Rossiter D, Petruckevitch A, et al. The Functional Independence Measure: a comparative validity and reliability study [see comments]. Disabil Rehabil 1995; 17: 10–4. Comment in: Disabil Rehabil 1995; 17: 456.
Koziol JA, Frutos A, Sipe JC, Romine JS, Beutler E. A comparison of two neurologic scoring instruments for multiple sclerosis. J Neurol 1996; 243: 209–13.
Kurtzke JF. Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS). Neurology 1983; 33: 1444–52.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159–74.
Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Md State Med J 1965; 14: 61–5.
Midgard R, Riise T, Nyland H. Impairment, disability, and handicap in multiple sclerosis: a cross-sectional study in an incident cohort in More and Romsdal County, Norway. J Neurol 1996; 243: 337–44.
Mumford CJ, Compston A. Problems with rating scales for multiple sclerosis: a novel approach – the CAMBS score. J Neurol 1993; 240: 209–15.
Norman GR, Streiner DL. Principal components and factor analysis. In: Norman GR, Streiner DL. Biostatistics. St. Louis (MO): Mosby; 1993. p. 129–42.
Noseworthy JH. Clinical scoring methods for multiple sclerosis. Ann Neurol 1994; 36 Suppl: S80–5.
Noseworthy JH, Vandervoort MK, Wong CJ, Ebers GC. Interrater variability with the Expanded Disability Status Scale (EDSS) and Functional Systems (FS) in a multiple sclerosis clinical trial. Neurology 1990; 40: 971–5.
Rodriguez M, Siva A, Ward J, Stolp-Smith K, O'Brien P, Kurland L. Impairment, disability, and handicap in multiple sclerosis: a population-based study in Olmsted County, Minnesota. Neurology 1994; 44: 28–33.
Rothwell PM, McDowell Z, Wong CK, Dorman PJ. Doctors and patients don't agree: cross sectional study of patients' and doctors' perceptions and assessments of disability in multiple sclerosis. Br Med J 1997; 314: 1580–3.
Sharrack B, Hughes RA. Clinical scales for multiple sclerosis. J Neurol Sci 1996; 135: 1–9.
Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979; 86: 420–8.
Sipe JC, Knobler RL, Braheny SL, Rice GP, Panitch HS, Oldstone MB. A neurologic rating scale (NRS) for use in multiple sclerosis. Neurology 1984; 34: 1368–72.
Streiner DL, Norman GR. Health measurement scales. 2nd edn. Oxford: Oxford University Press; 1995.
Swingler RJ, Compston DAS. The morbidity of multiple sclerosis. Q J Med 1992; 83: 325–37.
Willoughby EW, Paty DW. Scales for rating impairment in multiple sclerosis: a critique. Neurology 1988; 38: 1793–8.
Received August 19, 1998. Accepted August 25, 1998.