Brain Advance Access originally published online on November 9, 2005
Brain 2006 129(1):224-234; doi:10.1093/brain/awh675

© The Author (2005). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Getting the measure of spasticity in multiple sclerosis: the Multiple Sclerosis Spasticity Scale (MSSS-88)

J. C. Hobart1,2,3, A. Riazi2, A. J. Thompson2, I. M. Styles3, W. Ingram1, P. J. Vickery1, M. Warner1, P. J. Fox1 and J. P. Zajicek1

1 Peninsula Medical School, Plymouth, 2 Neurological Outcome Measures Unit, Institute of Neurology, London, UK and 3 School of Education, Murdoch University, Perth, Western Australia

Correspondence to: Dr Jeremy Hobart, Senior Lecturer and Honorary Consultant Neurologist, Department of Clinical Neuroscience, Peninsula Medical School, Room N16 ITTC Building, Tamar Science Park, Davy Road, Plymouth, Devon PL6 8BX, UK E-mail: Jeremy.Hobart{at}pms.ac.uk


    Summary
Spasticity is most commonly defined as an inappropriate, velocity-dependent increase in muscle tonic stretch reflexes, due to the amplified reactivity of motor segments to sensory input. It forms one component of the upper motor neuron syndrome and often leads to muscle stiffness and disability. Spasticity can, therefore, be measured through electrophysiological, biomechanical and clinical evaluation, the last most commonly using the Ashworth scale. None of these techniques incorporates the patient experience of spasticity, nor how it affects people's daily lives. Consequently, we set out to construct a rating scale to quantify the impact of spasticity on people with multiple sclerosis from their own perspective. Qualitative methods (in-depth patient interviews and focus groups, expert opinion and literature review) were used to develop a conceptual framework of spasticity impact, and to generate a pool of items with the potential to convert this framework into a rating scale with multiple dimensions. This item pool was administered, in the form of a questionnaire, to a sample of people with multiple sclerosis and spasticity. Guided by Rasch analysis, we constructed and validated a rating scale for each component of the conceptual framework. Decisions regarding item selection were based on the integration and assimilation of seven specific analyses including clinical meaning, ordering of thresholds, fit statistics and differential item functioning. The qualitative phase (17 patient interviews, 3 focus groups) generated 144 potential scale items and a conceptual model with eight components addressing symptoms (muscle stiffness, pain and discomfort, and muscle spasms), physical impact (activities of daily living, walking and body movements) and psychosocial impact (emotional health, social functioning). The first postal survey was sent to 272 people with multiple sclerosis and had a response rate of 88%.
Findings supported the development of scales for each component but demonstrated that five response options per item were too many. The 144-item questionnaire, reformatted with four response options per item, was administered with four validating instruments to an independent sample of 259 people with multiple sclerosis (response rate 78%). From the responses, an 88-item instrument with eight subscales was developed that satisfied criteria for reliable and valid measurement. Correlations with other measures were consistent with predictions. The 88-item Multiple Sclerosis Spasticity Scale (MSSS-88) is a reliable and valid, patient-based, interval-level measure of the impact of spasticity in multiple sclerosis. It has the potential to advance outcomes measurement in clinical trials and clinical practice, and provides a new perspective in the clinical evaluation of spasticity.

Key Words: spasticity measurement; multiple sclerosis; Multiple Sclerosis Spasticity Scale (MSSS-88); quality of life measurement; Rasch analysis

Abbreviations: ADL = activities of daily living; FAMS = Functional Assessment of Multiple Sclerosis; MSSS-88 = Multiple Sclerosis Spasticity Scale; MSIS-29 = Multiple Sclerosis Impact Scale

Received March 30, 2005. Revised September 30, 2005. Accepted October 4, 2005.


    Introduction
Spasticity is common, clinically and pathophysiologically complex, and disabling. It affects at least 35% of people post-stroke (Watkins et al., 2002) and up to 90% of people with multiple sclerosis at some point (Paty and Ebers, 1998). A range of treatments is available including spasticity reduction strategies, specialist rehabilitation therapy, oral medications, intramuscular and intrathecal injections, intrathecal infusions and surgery. Problematic spasticity typically requires a combination of treatments (Crayton et al., 2004), and should involve a patient-focused, co-ordinated, multidisciplinary team approach (Thompson et al., 2005). These facts emphasize that scientifically sound and clinically meaningful spasticity measurement is indispensable to clinical practice and research in this area (Voerman et al., 2005).

Spasticity measurement, like spasticity management, is complicated. In broad terms, measurement instruments can be categorized into neurophysiological methods (Voerman et al., 2005), biomechanical techniques (Wood et al., 2005) and clinical scales (Platz et al., 2005). The clinical meaningfulness of neurophysiological and biomechanical approaches has been questioned, as they focus on highly specific examinations (e.g. H-reflex or single joint analysis), correlate poorly with clinical indicators of spasticity and have problems with reliability and sensitivity (Voerman et al., 2005; Wood et al., 2005). Clinical scales used in the measurement of spasticity have also been found wanting (Platz et al., 2005). Of the 24 scales recently reviewed (Platz et al., 2005), 18 were single-item measures and, as a consequence, have poor reliability (McHorney et al., 1992), validity (Manning et al., 1982; Hobart, 2003) and responsiveness (Sloan et al., 2002). Only three scales had more than three items: two of these assessed resistance to passive movement, and the third measured the extensor toe sign. No scale had been developed to address the broader consequences of spasticity for the patient.

If spasticity management is to be patient-focused, clinical trials and clinical practice need rigorous measurement methods that capture patients' experiences and perceptions of spasticity, and complement the existing range of measures. That challenge, which has not been met by existing scales (Platz et al., 2005), was the aim of this study.


    Methods
Overview
There were three stages. First, we used a range of qualitative studies to develop a conceptual framework of spasticity impact, and a pool of potential items hypothesized to convert this framework into a scale. Second, we administered the items, as a questionnaire, to a sample of people with multiple sclerosis and spasticity and, using Rasch analysis, undertook the preliminary steps of constructing a subscale for each component of the conceptual framework. Third, we undertook a second survey to finalize and validate the instrument. The research ethics committees of Derriford Hospital and the National Hospital for Neurology and Neurosurgery (NHNN) approved the study.

Stage 1: conceptual model formation and item generation
Four pieces of qualitative work were undertaken to develop a conceptual framework of spasticity impact and to generate a pool of items with the potential to convert (operationalize) this framework into a scale with multiple subscales. First, in-depth, semi-structured interviews were conducted with individual multiple sclerosis patients from NHNN, until no new themes emerged. Second, three in-depth semi-structured focus groups were conducted with multiple sclerosis patients from Derriford Hospital. Patients were chosen to ensure a wide variance of spasticity severity, age, sex, disease duration and disease type. Interviews and focus groups were tape-recorded, transcribed and content analysed (WINMAX; Kuckartz, 1996). Third, a comprehensive literature review was undertaken to identify relevant health areas and potential items. Lastly, expert opinion on the impact of spasticity was sought from neurologists, spasticity nurses, multiple sclerosis nurses and rehabilitation staff.

A preliminary questionnaire was formatted and pre-tested in a small group of patients with multiple sclerosis and variable degrees of spasticity.

Stage 2: first postal survey
The questionnaire was posted to a random half-sample of the 544 patients from the Cannabinoids in Multiple Sclerosis study (CAMS; Zajicek et al., 2003) who had commenced trial medication and were still under follow-up. To encourage high response rates we used personalized letters, standardized instructions and reminders for non-responders at 3 and 5 weeks.

Analysis plans
Scale development was guided by Rasch measurement principles (Rasch, 1960) and analyses (Andrich et al., 1997–2004). The key principle is that the mathematical (Rasch) model articulates a set of requirements that must be met for rating scale data to generate internally valid, equal-interval measurements that are stable (invariant) across items and people. In contrast, scales whose development is guided by traditional psychometric methods generate ordinal scores whose invariance is unknown (Wright and Linacre, 1989).

We constructed a scale for each area defined as important to patients by the qualitative studies. The aim was that each scale consisted of a set of clinically meaningful items that satisfied requirements for measurement. This goal was achieved by choosing a set of items hypothesized to constitute a scale for each area, analysing the observed data against measurement criteria and making decisions on item selection and deletion. Appraisals according to these criteria were not conducted singularly and sequentially, but simultaneously and interactively within the specific context of the item set being examined. The seven measurement criteria were:

Clinical meaning. We examined all items in each set to judge the extent to which they were clinically cohesive. Items deemed non-specific were considered for deletion.

Thresholds for item response options. For each item, the use of response categories scored with successive integer scores (1 = not at all to 5 = extremely) implies a continuum of increasing impact, from less (not at all) to more (extremely). This assumption was tested by examining the ordering of thresholds (or points of crossover between two adjacent response categories) ascertained by the Rasch analysis (Andrich, 1978). A threshold is the point on the measurement continuum defined by a scale (e.g. degree of muscle stiffness) at which the probability of responding to adjacent categories (e.g. ‘not at all’ and ‘a little’) is equal. Disordered thresholds imply scoring functions that are not working as intended (Andrich, 1978). Such items were considered for deletion.
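
The threshold definition above can be illustrated with a minimal sketch, assuming a polytomous Rasch model in which each threshold is expressed on the same continuum as the person location. The function names and the threshold values are hypothetical and chosen only for illustration; they are not estimates from this study, and the sketch does not reproduce the software used for the analyses.

```python
import math

def category_probabilities(theta, thresholds):
    """Category probabilities for one item under a polytomous Rasch model.

    theta      -- person location on the continuum (hypothetical value)
    thresholds -- threshold locations; thresholds[k] is the point at which
                  categories k and k+1 are equally likely
    """
    # Category 0 has an empty sum; category x accumulates (theta - threshold)
    # over the first x thresholds.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    peak = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def thresholds_ordered(thresholds):
    """True when each threshold lies above the previous one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

# Illustrative thresholds for a four-category item (assumed, not from the paper).
example_thresholds = [-1.0, 0.5, 2.0]
probs = category_probabilities(-1.0, example_thresholds)
print(probs[0], probs[1])                     # equal at the first threshold
print(thresholds_ordered(example_thresholds))
print(thresholds_ordered([-1.0, 1.2, 0.4]))   # a disordered example
```

At a person location equal to the first threshold, the first two categories are equally probable, which is the crossover point described in the text.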

Item fit statistics. Rasch analysis tests the extent to which the observed data (patients' responses to items) accord with (fit) the responses expected by a mathematical (Rasch) model. Misfit implies an item is not working as intended in a scale, and may be regarded as not measuring the construct under consideration. There are many methods of examining the fit of data to the model, and no method alone is sufficient to make a judgement about fit. We examined three indicators. First, log residuals that summarize the difference between observed and expected responses to an item across all people (item–person interaction). Second, chi-square values that summarize the difference between observed and expected responses to an item for class intervals of people who have relatively similar levels of disability (item–trait interaction). Third, item characteristic curves (ICC) that display graphically the expected responses across the continuum of person scores and the observed values for each class interval of person scores. There are no absolute criteria for interpreting fit statistics. It is more meaningful to interpret them together, and in light of the clinical usefulness of an item set.

Item locations. The items of a scale define the continuum on which people are measured. Rasch analysis locates items and people on this continuum. Ideally, and logically, items should be evenly spread over a reasonable range and targeted to the people they are measuring. Items with similar locations on the continuum indicate that one of them might be redundant.

Differential item functioning (DIF). Stable measurement rulers are required for people to be measured precisely and validly (Linacre et al., 1994). That is, the items of a scale are required to perform similarly across different groups of people. More specifically, for any given level of disability the expected value of an item is required to be the same irrespective of which group a person belongs to. We examined all items for the extent to which their functioning was differentially affected by gender, age, mobility level (unaided, with aid and wheelchair user) and degree of spasticity (self-reported as minimal, mild, moderate or severe). Items demonstrating DIF, determined by statistical (ANOVA) and visual (ICC) tests, were considered for deletion (Hagquist and Andrich, 2004).

Correlations between standardized residuals. A residual is the difference between the observed and expected response for a person to an item. A standardized residual is computed by squaring and summing all residuals for an item and dividing this value by its standard deviation. Correlations between residuals assess the extent to which the response to one item is biased by the response to another. The significance of these values depends on sample size and item number. Values of >0.30 imply dependency among items and were used to identify items for evaluation (Andrich, 1988).
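
As a rough sketch of how the >0.30 screening rule might be applied, the following computes Pearson correlations between item residuals and lists the pairs above the cut-off. The item names and residual values are invented for illustration, and the residuals are assumed to have been produced already by a Rasch analysis; only the cut-off of 0.30 comes from the text.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_dependent_items(residuals, cutoff=0.30):
    """Return item pairs whose residual correlation exceeds the cut-off.

    residuals -- dict mapping an item name to its residuals, one per person
    """
    flagged = []
    for item_a, item_b in combinations(residuals, 2):
        r = pearson(residuals[item_a], residuals[item_b])
        if r > cutoff:
            flagged.append((item_a, item_b, round(r, 2)))
    return flagged

# Hypothetical residuals for three items and five people.
example = {
    "item_a": [1.0, -0.5, 0.3, -1.2, 0.8],
    "item_b": [0.9, -0.4, 0.2, -1.0, 0.7],
    "item_c": [-0.2, 0.6, -0.9, 0.1, 0.4],
}
print(flag_dependent_items(example))
```

Flagged pairs are candidates for closer evaluation rather than automatic deletion, in keeping with the interactive use of the seven criteria.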

Person separation index (PSI). This reliability statistic, analogous to Cronbach's alpha (Andrich, 1982), quantifies the error associated with the measurements of people in this sample. Higher values indicate greater reliability. When items were deleted the impact on reliability was determined.
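
One common formulation of the PSI, sketched here as an assumption rather than as the exact computation used in this study, is the proportion of the variance in person estimates that is not attributable to measurement error. The person estimates and standard errors below are invented, and the choice of population variance is also an assumption.

```python
from statistics import pvariance

def person_separation_index(estimates, standard_errors):
    """PSI as (observed variance - mean error variance) / observed variance.

    estimates       -- person location estimates (logits)
    standard_errors -- standard error of each person estimate
    """
    observed_variance = pvariance(estimates)  # population variance (assumption)
    error_variance = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    return (observed_variance - error_variance) / observed_variance

# Hypothetical person estimates and standard errors.
estimates = [-2.0, -1.0, 0.0, 1.0, 2.0]
standard_errors = [0.4, 0.4, 0.4, 0.4, 0.4]
print(person_separation_index(estimates, standard_errors))
```

Under this formulation the index approaches 1 as the standard errors shrink relative to the spread of person estimates, which is why deleting items can lower reliability.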

Stage 3: second postal survey
Sample
The remaining random half sample from the CAMS study cohort was surveyed, excluding 13 local people participating in another study. This sample was divided into random half samples that received booklets containing the new spasticity scale, a self-report spasticity grading (0 = minimal; 1 = mild; 2 = moderate; 3 = severe), demographic questions, but different validating scales. Booklet 1 contained the Multiple Sclerosis Impact Scale (MSIS-29; Hobart et al., 2001b) and the physical functioning (SF36PF) and mental health (SF36MH) subscales from the Short Form Health Survey (SF-36; Ware et al., 1993). Booklet 2 contained the mobility (FAMSmob) and emotional well-being (FAMSewb) subscales of the Functional Assessment of Multiple Sclerosis (FAMS; Cella et al., 1996), postal Barthel Index (BI; Gompertz et al., 1994) and General Health Questionnaire (GHQ-12; Goldberg and Hillier, 1979). Standard survey methods were used.

Analysis plans
All analyses described above were repeated. In addition, internal construct validity was examined by computing intercorrelations among subscales of the new spasticity instrument and by determining the ability of the subscales to detect differences between groups defined by their self-report spasticity grading. Convergent and discriminant construct validity was examined by determining the extent to which correlations between the new spasticity instrument and validating variables were consistent with expectation. These methods are described elsewhere (Hobart et al., 1996; Hobart et al., 2001a; Scientific Advisory Committee of the Medical Outcomes Trust, 2002).


    Results
Stage 1: conceptual model formation and item generation
Seventeen interviews (75% female; mean age 47 years) were conducted until no new information was extracted. There were three focus groups (71% female, mean age 54 years) that included a total of 14 people. Expert opinion was canvassed from neurologists, multiple sclerosis nurses, spasticity nurses and rehabilitation therapists. Content analysis of the interview and focus group transcripts generated ~2000 statements concerning the impact of spasticity. These were extracted, grouped into main themes and examined for redundancy.

This qualitative work generated a preliminary conceptual model of spasticity impact and, on the basis of that model, a preliminary questionnaire with 144 items was developed. Three main domains (symptoms, physical functioning and psychosocial functioning) were identified, with a total of 8 subscales: muscle stiffness (19 items); pain and discomfort (10 items); muscle spasms (23 items); activities of daily living (ADL) (14 items); body movements (21 items); walking (15 items); emotional health (26 items) and social functioning (16 items). All items were given the same five-point response options: 1 = not at all bothered; 2 = a little bothered; 3 = moderately bothered; 4 = quite a bit bothered and 5 = extremely bothered.

Items were pre-tested in an independent sample of 17 out-patients and in-patients (NHNN) with varying levels of spasticity. Appropriate modifications were made and demographic questions were included in the booklet. At this early stage, all 144 items were retained and put into the most clinically appropriate grouping even though a number of items were considered non-specific indicators of that construct. For example, we put the item ‘bothered by heaviness anywhere in your limbs’ in the subscale concerning muscle stiffness, although we were unsure that it would be part of the final operationalization of that construct.

Stage 2: first postal survey
Sample
Questionnaire booklets were sent to 272 people, and 240 were returned completed (conservative response rate 88%). Table 1 shows the respondents' characteristics.


Table 1 Characteristics of survey samples

 
Rasch analysis
The main finding was that empirical analysis using the Rasch measurement model did not support the five-point item response option. Most items (132 of 144) had disordered thresholds implying the scoring function was not working as anticipated. The category probability curves (CPC), which plot subscale scores on the x-axis against the probability of endorsing each item response category on the y-axis, suggested the main reason for disordering was that patients could not discriminate reliably between the five response options. In particular, people appeared unable to reliably distinguish ‘a little’ from ‘moderately’, and ‘moderately’ from ‘quite a bit’.

Given this finding we undertook a post hoc analysis. First, we examined the effect of reducing the response options from five to four by combining ‘moderately’ with either ‘a little’ or ‘quite a bit’, as suggested by each item's CPC. This left seven items with disordered thresholds. With items re-scored in this manner, preliminary Rasch analyses of the hypothesized item groups were performed and supported the feasibility of constructing valid subscales for the eight components. However, post hoc analyses make assumptions about how people would have responded if a category had not been available. Therefore, in stage 3, we repeated the 144-item survey in an independent sample with a four-point item response option (1 = not at all; 2 = a little; 3 = moderately and 4 = extremely).
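
The post hoc re-scoring described above can be sketched as a simple mapping from five categories to four. This is a minimal illustration, assuming the original 1 to 5 integer coding; the function name and the way the merge direction is passed are hypothetical, and in the study the direction was chosen item by item from each item's CPC.

```python
def collapse_moderately(score, merge_with):
    """Re-score a 1-5 response as 1-4 by merging 'moderately' (3) with a neighbour.

    merge_with -- "a little" merges categories 2 and 3;
                  "quite a bit" merges categories 3 and 4
    """
    if merge_with == "a little":
        mapping = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4}
    elif merge_with == "quite a bit":
        mapping = {1: 1, 2: 2, 3: 3, 4: 3, 5: 4}
    else:
        raise ValueError("merge_with must be 'a little' or 'quite a bit'")
    return mapping[score]

# Re-score the full range of responses in both directions.
print([collapse_moderately(s, "a little") for s in range(1, 6)])
print([collapse_moderately(s, "quite a bit") for s in range(1, 6)])
```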

Stage 3: second postal survey
Sample characteristics
Questionnaire booklets were sent to 259 people. Random half samples received booklets 1 and 2. A total of 202 people returned completed questionnaire booklets (78% response rate). Table 1 shows their characteristics. In essence, this was an older sample of people with multiple sclerosis with moderate-long disease duration, half were wheelchair users, and most reported their spasticity to be moderate (32%) or severe (44%).

Scale development
The final decision as to which items should remain in each subscale was determined by assimilating the information from all seven criteria defined in the methods. A total of 56 items were deleted (mean per scale = 7; range 1–13). For example, 23 items were considered for inclusion in the muscle spasms subscale. Four of these items were eliminated because they were considered to be non-specific indicators of a continuum, from less to more, of the degree of muscle spasms. These items were: ‘juddering/jolting related to spasms’; ‘feet or legs bouncing up and down’; ‘spasms leading to difficulties greeting people with a handshake’ and ‘spasms leading to difficulty giving hugs’.

The remaining 19 items were entered into a Rasch analysis. Four of these items had reversed thresholds (‘spasms waking up your partner’; ‘spasms that are difficult to stop’; ‘spasms provoked by temperature change’ and ‘feeling that your knees are stuck together’) indicating that the four-point response option was not working as intended for these items. Although these items appear clinically important, they were removed because of the reversed thresholds and because other items in the set had similar locations and, therefore, they could be regarded as redundant in measurement terms. Another item, ‘spasms when transferring’, demonstrated DIF in different mobility groups. That is, this item had a different meaning for people with different levels of mobility (even though they had the same total score on the subscale) and was therefore unstable in measurement terms. This item was removed because of this problem, and also because it had a location similar to other items in the set, and could be regarded as redundant. The remaining 14 items appeared to constitute a clinically meaningful set, relating to muscle spasms, and satisfied the pre-determined criteria as a measurement instrument. Details of the complete instrument development process are available from the authors.

Scale validation
Tables 2–5 show for all subscales the item locations, standard errors and fit statistics (fit residuals and chi-square values), and the subscale reliabilities. For each subscale, the item locations spread across a reasonable range of their continua, the standard errors were small, almost all log residuals lay within the recommended range of –2.5 to +2.5 and chi-square statistics were small. All person separation indices were high (≥0.92). These findings support the reliability and validity of each MSSS-88 subscale. Table 8 shows the distributions of person measurements for each subscale. Scores spanned the full subscale range and floor and ceiling effects were less than the recommended maximum of 20% (McHorney and Tarlov, 1995). However, the three physical functioning subscales had larger floor effects than the other subscales (range 11–19.8%).
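
Floor and ceiling effects are conventionally the percentages of respondents scoring at the minimum and maximum of a scale. The sketch below assumes that definition and uses invented summed scores for a hypothetical 14-item subscale scored 1 to 4 (range 14 to 56); only the 20% criterion is taken from the text.

```python
def floor_ceiling_effects(scores, minimum, maximum):
    """Return (floor %, ceiling %) for a list of subscale scores."""
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == minimum) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == maximum) / n
    return floor_pct, ceiling_pct

# Hypothetical summed scores for ten respondents.
scores = [14, 20, 31, 56, 25, 40, 33, 18, 29, 47]
floor_pct, ceiling_pct = floor_ceiling_effects(scores, minimum=14, maximum=56)
print(floor_pct, ceiling_pct)
print(floor_pct < 20 and ceiling_pct < 20)  # within the recommended maximum
```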


Table 2 MSSS-88: muscle stiffness and pain and discomfort scales

 

Table 3 MSSS-88: muscle spasms and ADL scales

 

Table 4 MSSS-88: walking and body movements scales

 

Table 5 MSSS-88: emotional health and social functioning scales

 
Correlations among subscales, and with other measures and variables. Tables 6 and 7 show correlations among MSSS-88 subscales, and between MSSS-88 subscales, validating instruments and descriptive variables. The magnitude and pattern of these correlations were generally consistent with expectations based on the constructs perceived to be measured by the instruments. This provides further evidence, albeit circumstantial, for the validity of the MSSS-88. For example, correlations among MSSS-88 subscales range from 0.35 to 0.83 (12–69% shared variation), implying the eight subscales measured related but discrete constructs. Correlations between MSSS-88 subscales and the seven validating instruments collected at the same point in time were broadly consistent with expectation. For example, the MSSS-88 emotional health and social functioning subscales correlated most highly with the MSIS-29 psychological impact subscale, SF-36 MH subscale and GHQ-12.
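
The shared variation figures quoted above follow from squaring the correlation coefficient. A short check, with a hypothetical helper name:

```python
def shared_variance_percent(r):
    """Percentage of variance shared by two measures with correlation r."""
    return 100.0 * r ** 2

# The two ends of the reported range of intercorrelations.
print(round(shared_variance_percent(0.35)))
print(round(shared_variance_percent(0.83)))
```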


Table 6 Correlations among MSSS-88 scales, and between MSSS-88 scales and other variables

 

Table 7 MSSS-88: correlations with validating scales

 
Table 6 also shows the correlations between MSSS-88 subscales, self-report degree of spasticity, indoor mobility level, duration of multiple sclerosis, age and gender. The majority of correlations were consistent with expectation. For example, correlations with age and gender were low (range –0.15 to +0.09), whereas those with a four-point patient-reported grading of spasticity severity (minimal, mild, moderate and severe) were moderate (0.35–0.51).

Group differences validity. Table 8 reports the mean MSSS-88 locations for people who graded their spasticity as minimal/mild, moderate or severe. All subscales demonstrated a stepwise and statistically significant increase in mean value associated with increasing self-reported spasticity severity.


Table 8 MSSS-88: Group difference and relative validity

 

    Discussion
The aim of this study was to develop a scale for measuring patients' perceptions of the impact of spasticity in multiple sclerosis. The resulting instrument, the 88-item Multiple Sclerosis Spasticity Scale (MSSS-88; see Appendix in the supplementary material), attempts to quantify the impact of spasticity in eight clinically relevant areas: three spasticity-specific symptoms (muscle stiffness, pain and discomfort, and muscle spasms), three areas of physical functioning (ADL, walking, body movements), emotional health and social functioning. This patient-derived model of the impact of spasticity highlights the complexity of an apparently unidimensional clinical concept. It can also be viewed as a framework for evidence-based management in the development of integrated care pathways, guidelines for care (Multiple Sclerosis Council for Clinical Practice Guidelines, 2003) and service development.

Does the MSSS-88 have too many items to be clinically useful and are the specialized skills required for Rasch analysis justified? Three questions underpin these concerns. First, why are there so many items? Second, what evidence is available and what mechanisms are in place to ensure clinical usability? Third, do the clinical advantages of Rasch analysis outweigh the necessity for specialized knowledge and software?

First, the reason there are so many items is the need for breadth, range and precision of measurement. The qualitative phase of the study identified that the clinically appropriate breadth of measurement was eight subscales. For each of these eight subscales, adequate measurement range requires the two most extreme scale items to be well separated. Measurement precision is determined by the number of units into which the range is divided and is defined mainly by the number of items of the scale. In addition, the number of items in a scale determines the confidence intervals around an individual patient's estimate (Wright and Masters, 1982), and a reasonable number of items are required to ‘anchor’ the construct measured by any scale. As clinicians and clinical trials require scales that give precise estimates of people's locations on the continuum that are also able to detect clinically significant change, scales need to have a reasonable number of items located at regular intervals across a substantial range. Thus, at this stage of scale development, we were reluctant to reduce the number of items further.

Second, there is evidence from our work, and that of others, that 88 items is not too many for clinical usefulness. The two postal surveys in this study, our surveys in non-trial samples of multiple sclerosis and other neurological disorders, and the work of others in health measurement have consistently demonstrated high response rates and data completeness despite large numbers of items. Nevertheless, we are aware that the MSSS-88 may be used as one of many outcomes and we have, therefore, constructed it to be flexible and adaptable to meet different measurement needs. For example, a clinical trial may be interested primarily in the impact of a treatment on spasticity symptoms. In contrast, a clinical service may be more interested in functional outcomes. Here, we recommend the use of the most appropriate MSSS-88 subscale(s) to address the measurement question. This is possible because each subscale is a stand-alone measurement instrument.

In other situations, such as the evaluation of a busy multidisciplinary spasticity service, clinicians or researchers may wish to measure all eight areas but feel that 88 items is too many. Here, we recommend the use of a short-form version of the MSSS-88, that is, a selection of the most clinically appropriate items from each subscale. For example, users could choose items 1, 3, 5, 8, 10, 13 and 14 from the Muscle Spasms subscale to give a seven-item short form whose item locations are evenly spread across the continuum. This approach is possible because the item locations of each subscale are calibrated with respect to each other. Consequently, investigators can use any subset of items from any subscale and generate results that are referable to the long-form version of that scale (Choppin, 1968; Wright, 1977). It is, however, important to be aware that scales with few items have limited precision (unless their range is very restricted) and are less able to detect small but clinically meaningful change.

Third, Rasch analysis offers four clinically meaningful scientific advantages that we believe far outweigh concerns about the necessity for specialized knowledge and software: (i) it offers clinicians the ability to construct interval-level measurements from ordinal-level rating scale data, thereby addressing a major concern of using rating scales as outcome measures (Whitaker et al., 1995; Platz et al., 2005); (ii) it enables clinicians to obtain estimates suitable for individual person analyses rather than only for group comparison studies; (iii) it enables clinicians to use subsets of items from each subscale rather than all items from the scale (detailed above), yet still be able to compare scores using different sets of items; (iv) missing item data can be handled scientifically, rather than on the basis of assumption, because Rasch analysis computes an estimate from the available data rather than requiring missing data to be imputed.

Nevertheless, Rasch analysis appears complicated, is not widely used, and there are few clinicians and researchers trained in its use and interpretation. For this reason we offer three methods for computing MSSS-88 subscale scores. In the first method, item scores can be summed, without weighting or standardization, to generate ordinal-level total scores just as for any other Likert-type scale. Missing responses to items can be replaced with the mean score of the items completed (person-specific item mean score) provided that 50% or more of the items in a scale have been completed (Ware et al., 1993). In the second method of computing MSSS-88 subscale scores, the ordinal summed scores generated above can be transformed into interval-level measurements using conversion tables that can be made available with the scales. In the third method of computing MSSS-88 scores, investigators can Rasch analyse their own data. Furthermore, if these analyses use (anchor) the item and threshold locations from our dataset, available on request, people in the new sample will be measured on an identical interval-level metric to the one we have constructed.
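The first method can be sketched in a few lines. The function name `summed_subscale_score` and the use of `None` to mark a missing response are illustrative assumptions; the rule itself (replace missing items with the person-specific item mean, provided at least 50% of the items in the subscale were completed) is the one described above.

```python
from typing import Optional, Sequence


def summed_subscale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Ordinal summed score for one subscale.

    responses: integer item scores for one person, with None for missing items.
    Missing items are replaced by the mean of that person's completed items,
    but only if at least half of the items were completed; otherwise no score
    is returned.
    """
    answered = [r for r in responses if r is not None]
    # Require 50% or more of the items in the subscale to be completed.
    if not responses or 2 * len(answered) < len(responses):
        return None
    person_mean = sum(answered) / len(answered)
    n_missing = len(responses) - len(answered)
    # Unweighted sum of completed items plus the imputed values.
    return sum(answered) + n_missing * person_mean
```

For example, a person who answers 4 and 2 on a four-item subscale and skips the other two items has a person-specific item mean of 3 and a summed score of 12; a person who answers only one of the four items receives no score. The resulting total is still ordinal; converting it to an interval-level measurement requires the conversion tables or a Rasch analysis, as described for the second and third methods.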

Is the use of ordinal summed MSSS-88 scores justified and an advance over another ordinal scale? It is justified because the very use of integer and total scores depends on the data conforming to the Rasch model (Andrich, 1978). Whether or not they do is an empirical question that is answered by a Rasch analysis. In many situations scores on items are simply summed, but it is usually done by assumption without checking thoroughly that it is a valid procedure and whether it is justified to use integer scores for the successive item response categories. Such thorough checks cannot be achieved using traditional psychometric methods (Massof, 2002). This is one advance over other ordinal scales; another is the linearization of scores that follows directly from the model.

The evidence that Rasch analysis transforms ordinal summed scores into interval-level measurements lies in the properties of the mathematical model (Rasch, 1961; Wright and Stone, 1979; Andrich, 1988). Effectively, the difference between (comparison of) any two people, any two items, or any one person and one item, is defined by the logarithm of the relative probabilities. In essence, albeit an oversimplification, the observed scores in the data matrix are replaced by the expected probabilities of occurrence, and relative differences computed as ratios of the relative probabilities (as these are consistent indicators of relative differences). This ratio of the relative probabilities is then expressed on a linear scale in an additive form by taking logarithms. In addition, it can be proven mathematically that the summed score is the sufficient statistic for estimating the item and person locations, and the estimation of these locations is independent of each other. As such, the Rasch model is able to transform summed scores into linear measures of persons and items that are on the same scale with a common unit and freed up from the distributional properties of each other. Thus the Rasch model realizes, mathematically, the requirements for scientific measurement (Rasch, 1961; Massof, 2002; Andrich, 2004): invariant comparisons of people, and items, on the same linear scale.
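For readers who prefer to see the argument written out, the following shows it for the simplest, dichotomous form of the Rasch model. The notation is an assumption introduced here for illustration: \beta_n is the location of person n, \delta_i is the location of item i, and X_{ni} is the response of person n to item i. The MSSS-88 items have more than two response categories, for which the model additionally includes threshold parameters (Andrich, 1978), but the logic is the same.

```latex
\begin{align}
\Pr(X_{ni}=1) &= \frac{\exp(\beta_n-\delta_i)}{1+\exp(\beta_n-\delta_i)} \\
\ln\frac{\Pr(X_{ni}=1)}{\Pr(X_{ni}=0)} &= \beta_n-\delta_i \\
\ln\frac{\Pr(X_{ni}=1)/\Pr(X_{ni}=0)}{\Pr(X_{mi}=1)/\Pr(X_{mi}=0)} &= \beta_n-\beta_m
\end{align}
```

The second line expresses the comparison of one person and one item as a log-odds, which is additive in the two locations. The third line compares two persons, n and m, on the same item: the item location cancels, so the comparison does not depend on which item is used. This is the sense in which comparisons are invariant and lie on a common linear (logit) scale.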

Unfortunately, Rasch analysis is applicable only to multiple item scales, not single item scales such as the Expanded Disability Status Scale (EDSS), Rankin scale, Hauser Ambulation Index, EDMUS and Ashworth scale. It may be possible to construct interval measurements from the Kurtzke Functional Systems if data satisfy the requirements of the Rasch model, and it is considered clinically meaningful to combine scores across the eight systems. Of note, Kurtzke did not think this was clinically appropriate (Kurtzke, 1961).

It is important to clarify what is implied by scores and score differences. In estimating the linear person measurements, it is not implied that, say, pain at one part of the continuum is twice that at another part of the continuum. That would require a natural zero point. Second, it is entirely possible that at different parts of the same continuum, there may be some qualitatively different responses or reactions. By analogy, in heating water more and more quantitatively, there is a qualitative reaction as it starts to boil. The investigation of possible qualitative differences on the continuum is for further clinical study having constructed a quantitative scale.

In this study we took an approach to scale development that is recommended strongly (Andrich, 2002a, b) but differs somewhat from the approach adopted by others. Specifically, we first developed and used a conceptual model to define the areas for subscale development and then used an explicit mathematical model to guide subscale development in each of these areas. It is more typical for developers of health rating scales to use statistical techniques, such as factor analysis of an item pool, to define the areas for subscale development, and then traditional psychometric methods of testing reliability and validity to refine those subscales. Using factor analysis to group items can be misleading as it partitions items according to their intercorrelations and, thus, makes the assumption that correlations between items indicate the extent to which they measure the same thing. This is an oversimplification (Duncan, 1985). Furthermore, factor analysis is strongly influenced by sample sizes (Nunnally and Bernstein, 1994), the number and type of items analysed, and the targeting of items and persons. The advantage of using an explicit mathematical model to guide scale construction is that it enables sophisticated checks on the internal validity and consistency of the scores, as well as the construction of stable linear measurement systems (Wright, 1977; Linacre et al., 1994; Massof, 2002; Andrich, 2004).

The three physical subscales of the MSSS-88 have larger floor effects than the other subscales, implying that it may be beneficial to extend their range of measurement in the future. This can be achieved without affecting the subscales as they stand, because the item locations are calibrated relative to each other. Furthermore, future scale developments can be empirically driven. The distribution of the relative item locations (Tables 2–5) shows the ‘gaps’ in each subscale (notable distances between item locations), which developers may wish to fill, and the distribution of person measurements shows when it may be valuable to extend the continuum in either direction. Tables 2–5 indicate the nature of items either side of the gaps, and those that define upper and lower limits of measurement. Consequently, potential items that are appropriate to those locations can be generated and examined in small samples to get reasonably accurate preliminary calibrations before needing to undertake more definitive surveys.

This study has limitations. The use of a controlled clinical study population, perhaps more motivated than the general multiple sclerosis population, might give a false impression of the utility of an 88-item scale in everyday practice. We discussed earlier circumstantial evidence that this may not be the case. Other limitations are that this work contributes little to our understanding of the relationship between self-assessment and objective clinical findings in spasticity, and the potential implications of study medications on the findings. The CAMS study finished 12–18 months before MSSS-88 measurements were collected, so concurrent Ashworth measurements were not available and all patients were off study medication. Nevertheless, we predict low correlations between these two instruments (e.g. <0.50) because they quantify very different clinical manifestations of spasticity and capture different people's perspectives. The Ashworth scale is a clinician-based evaluation of a clinical sign (muscle tone) at rest, whereas the MSSS-88 is a subjective assessment of day-to-day spasticity symptoms and functional impact in eight clinically separate areas. Consequently, the two instruments should be viewed as complementary, not competing, outcome measures for clinical trials.

Measurement of different manifestations also underpins the finding of low correlations between neurophysiological, biomechanical and clinical indicators of spasticity. Some interpret these findings with surprise or disappointment (Wood et al., 2005), or question the validity of one or other or both measurement methods. However, low correlations between different indicators of spasticity are predictable and appropriate, and have a number of important implications. They highlight that the selection of outcome measures underpins the meaningful interpretation of studies, that a range of carefully selected and complementary outcomes may need to be measured, that measures of one entity are unlikely to be adequate surrogate markers of another (e.g. MRI and disability) and that relying on correlations to validate scales can be limited.

The MSSS-88 represents an attempt to conceptualize and measure the impact of a complex neurological problem from the patients' perspective. Qualitative research, clinical experience and sophisticated measurement methods have been integrated to develop a scale that complements existing methods of evaluating spasticity in multiple sclerosis. It has great potential to advance the measurement and, thereby, management of this disabling clinical problem. Further examination and responsiveness testing are now required to understand the clinical meaning of MSSS-88 scores and score changes, and evaluations of the instrument in people with non-multiple sclerosis spasticity will determine its applicability as a generic instrument.


    Supplementary material
 
For a copy of the MSSS-88 scale, see Supplementary material at Brain online.


    Acknowledgements
 
The authors wish to thank the patients who participated and Professor David Andrich for his input. Dr Hobart's recent sabbatical in Australia was supported by the Royal Society of Medicine (Ellison-Cliffe Travelling Fellowship), the Peninsula Medical School, the Multiple Sclerosis Society of Great Britain and Northern Ireland, and the NHS Health Technology Assessment Programme. However, the views and opinions expressed are not necessarily those of the NHS Executive. The Neurological Outcome Measures Unit is supported by the De Lazlo Foundation.


    References
 
Andrich D. A rating formulation for ordered response categories. Psychometrika 1978; 43: 561–73.

Andrich D. An index of person separation in latent trait theory, the traditional KR20 index, and the Guttman scale response pattern. Educ Psychol Res 1982; 9: 95–104.

Andrich D. Rasch models for measurement. Beverly Hills, CA: Sage Publications; 1988.

Andrich D. A framework relating outcomes based education and the taxonomy of educational objectives. Stud Educ Eval 2002a; 28: 35–59.

Andrich D. Implication and applications of modern test theory in the context of outcomes based research. Stud Educ Eval 2002b; 28: 103–21.

Andrich D. Controversy and the Rasch model: a characteristic of incompatible paradigms? Med Care 2004; 42: I7–I16.

Andrich D, Sheridan B, Luo G. RUMM 2020. Perth, WA: RUMM Laboratory Pty Ltd; 1997–2004.

Cella DF, Dineen K, Arnason B, Reder A, Webster KA, Karabatsos G, et al. Validation of the functional assessment of multiple sclerosis quality of life instrument. Neurology 1996; 47: 129–39.

Choppin B. An item bank using sample free calibration. Nature 1968; 219: 870–2.

Crayton H, Heyman R, Rossman H. A multimodal approach to managing the symptoms of multiple sclerosis. Neurology 2004; 63: S12–S18.

Duncan OD. Probability, disposition and the inconsistency of attitudes and behaviours. Synthese 1985; 42: 21–34.

Goldberg DP, Hillier VF. A scaled version of the General Health Questionnaire. Psychol Med 1979; 9: 139–45.

Gompertz P, Pound P, Ebrahim S. A postal version of the Barthel Index. Clin Rehabil 1994; 8: 233–9.

Hagquist C, Andrich D. Is the sense of coherence instrument applicable on adolescents? A latent trait analysis using Rasch modelling. Pers Individ Dif 2004; 36: 955–68.

Hobart JC. Rating scales for neurologists. J Neurol Neurosurg Psychiatr 2003; 74: iv22–6.

Hobart JC, Lamping DL, Thompson AJ. Evaluating neurological outcome measures: the bare essentials. J Neurol Neurosurg Psychiatr 1996; 60: 127–30.

Hobart JC, Freeman JA, Lamping DL, Fitzpatrick R, Thompson AJ. The SF-36 in multiple sclerosis: why basic assumptions must be tested. J Neurol Neurosurg Psychiatr 2001a; 71: 363–70.

Hobart JC, Lamping DL, Fitzpatrick R, Riazi A, Thompson AJ. The multiple sclerosis impact scale (MSIS-29): a new patient-based outcome measure. Brain 2001b; 124: 962–73.

Kurtzke JF. On the evaluation of disability in multiple sclerosis. Neurology 1961; 11: 686–94.

Kuckartz U. WINMAX pro '96 - scientific text analysis. Berlin: Scolari Sage Publications; 1996.

Linacre JM, Heinemann AW, Wright BD, Granger CV, Hamilton BB. The structure and stability of the functional independence measure. Arch Phys Med Rehabil 1994; 75: 127–32.

Manning W, Newhouse J, Ware JE Jr. The status of health in demand estimation: or, beyond excellent, good, fair, and poor. In: Fuchs V, editor. Economic aspects of health. Chicago: The University of Chicago Press; 1982. p. 143–84.

Massof R. The measurement of vision disability. Optom Vis Sci 2002; 79: 516–52.

McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res 1995; 4: 293–307.

McHorney CA, Ware JE Jr, Rogers W, Raczek AE, Lu JFR. The validity and relative precision of MOS short- and long-form health status scales and Dartmouth COOP charts. Med Care 1992; 30: MS253–MS265.

Multiple Sclerosis Council for Clinical Practice Guidelines. Spasticity management in multiple sclerosis. Consortium of multiple sclerosis Centres; 2003.

Nunnally JC, Bernstein IH. Psychometric theory. New York: McGraw-Hill; 1994.

Paty DW, Ebers GC. Clinical features. In: Paty DW, Ebers GC, editors. Multiple sclerosis. Philadelphia: F.A. Davis company; 1998.

Platz T, Eickhof C, Nuyens G, Vuadens P. Clinical scales for the assessment of spasticity, associated phenomena, and function: a systematic review of the literature. Disabil Rehabil 2005; 27: 7–18.

Rasch G. On general laws and the meaning of measurement in psychology. In: Neyman J, editor. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability IV. Berkeley CA: University of California Press; 1961. p. 321–34.

Rasch G. Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Education Research; 1960. Reprinted Chicago: University of Chicago Press; 1980.

Scientific Advisory Committee of the Medical Outcomes Trust. Assessing health status and quality of life instruments: attributes and review criteria. Qual Life Res 2002; 11: 193–205.

Sloan JA, Aaronson N, Cappelleri JC, Fairclough DL, Varricchio C, The Clinical Significance Consensus Meeting Group. Assessing the clinical significance of single items relative to summated scores. Mayo Clin Proc 2002; 77: 479–87.

Thompson AJ, Jarrett L, Lockley L, Marsden J, Stevenson V. Clinical management of spasticity. J Neurol Neurosurg Psychiatr 2005; 76: 459–63.

Voerman GE, Gregoric M, Hermens HJ. Neurophysiological methods for the assessment of spasticity: the Hoffmann reflex, the tendon reflex, and the stretch reflex. Disabil Rehabil 2005; 27: 33–68.

Ware JE Jr, Snow KK, Kosinski M, Gandek B. SF-36 Health Survey manual and interpretation guide. Boston, MA: Nimrod Press; 1993.

Watkins C, Leathley M, Gregson JM, Moore AP, Smith TL, Sharma A. Prevalence of spasticity post stroke. Clin Rehabil 2002; 16: 515–22.

Whitaker JN, McFarland HF, Rudge P, Reingold SC. Outcomes assessment in multiple sclerosis trials: a critical analysis. Mult Scler 1995; 1: 37–47.

Wood D, Burridge J, van Wijck F, McFadden C, Hitchcock RA, Pandyan AD, et al. Biomechanical approaches applied to the lower and upper limb for the measurement of spasticity: a systematic review of the literature. Disabil Rehabil 2005; 27: 19–32.

Wright BD. Solving measurement problems with the Rasch model. J Educ Meas 1977; 14: 97–116.

Wright BD, Linacre JM. Observations are always ordinal: measurements, however must be interval. Arch Phys Med Rehabil 1989; 70: 857–60.

Wright BD, Masters GN. Rating scale analysis. Chicago: MESA Press; 1982.

Wright BD, Stone MH. Best test design. Chicago: MESA Press; 1979.

Zajicek J, Fox P, Sanders H, Wright D, Vickery J, Nunn A, et al. Cannabinoids for treatment of spasticity and other symptoms related to multiple sclerosis (CAMS study): multi-centre randomised placebo-controlled trial. Lancet 2003; 362: 1517–26.

