
Modifying the Medical Research Council grading system through Rasch analyses

Els Karla Vanhoutte , Catharina Gerritdina Faber , Sonja Ingrid van Nes , Bart Casper Jacobs , Pieter Antoon van Doorn , Rinske van Koningsveld , David Reid Cornblath , Anneke Jelly van der Kooi , Elisabeth Aviva Cats , Leonard Hendrik van den Berg , Nicolette Claudia Notermans , Willem Lodewijk van der Pol , Mieke Catharina Elisabeth Hermans , Nadine Anna Maria Elisabeth van der Beek , Kenneth Craig Gorson , Marijke Eurelings , Jeroen Engelsman , Hendrik Boot , Ronaldus Jacobus Meijer , Giuseppe Lauria , Alan Tennant , Ingemar Sergio José Merkies
DOI: http://dx.doi.org/10.1093/brain/awr318. Pages 1639–1649. First published online: 21 December 2011

Summary

The Medical Research Council grading system has served for decades in the evaluation of muscle strength and has been recognized as a cardinal feature of daily neurological, rehabilitation and general medicine examination of patients, despite being respectfully criticized due to the unequal width of its response options. No study has systematically examined, using a modern psychometric approach, whether physicians are able to properly use the Medical Research Council grades. The objectives of this study were: (i) to investigate physicians’ ability to discriminate among the Medical Research Council categories in patients with different neuromuscular disorders and with various degrees of weakness through thresholds examination using Rasch analysis as a modern psychometric method; (ii) to examine possible factors influencing physicians’ ability to apply the Medical Research Council categories through differential item function analyses; and (iii) to examine whether the widely used Medical Research Council 12 muscles sum score in patients with Guillain–Barré syndrome and chronic inflammatory demyelinating polyradiculoneuropathy would meet Rasch model's expectations. A total of 1065 patients were included from nine cohorts with the following diseases: Guillain–Barré syndrome (n = 480); myotonic dystrophy type-1 (n = 169); chronic inflammatory demyelinating polyradiculoneuropathy (n = 139); limb-girdle muscular dystrophy (n = 105); multifocal motor neuropathy (n = 102); Pompe's disease (n = 62) and monoclonal gammopathy of undetermined significance related polyneuropathy (n = 8). Medical Research Council data of 72 muscles were collected. Rasch analyses were performed on Medical Research Council data for each cohort separately and after pooling data at the muscle level to increase category frequencies, and on the Medical Research Council sum score in patients with Guillain–Barré syndrome and chronic inflammatory demyelinating polyradiculoneuropathy.
Disordered thresholds were demonstrated in 74–79% of the muscles examined, indicating physicians’ inability to discriminate between most Medical Research Council categories. Factors such as physicians’ experience or illness type did not influence these findings. Thresholds were restored after rescoring the Medical Research Council grades from six to four options (0, paralysis; 1, severe weakness; 2, slight weakness; 3, normal strength). The Medical Research Council sum score acceptably fulfilled Rasch model expectations after rescoring the response options and creating subsets to resolve local dependency and item bias on diagnosis. In conclusion, a modified, Rasch-built four response category Medical Research Council grading system is proposed, resolving clinicians’ inability to differentiate among its original response categories and improving clinical applicability. A modified Medical Research Council sum score at the interval level is presented and is recommended for future studies in Guillain–Barré syndrome and chronic inflammatory demyelinating polyradiculoneuropathy.

  • MRC
  • manual muscle testing
  • Rasch
  • neuromuscular disorders

Introduction

In 2005, a historical essay tracing the history of scoring and summation of neuromuscular weakness as part of daily neurological practice was published by Dyck et al. (2005). Mitchell and Lewis (1886) initiated the practice of alphanumerical scoring of neurological signs in the 19th century. However, it was Lovett, an orthopaedic surgeon, who introduced an ordinal scoring of muscle weakness that formed the basis for the Mayo Clinics and Medical Research Council (MRC) manual muscle testing grading systems, of which the MRC system is most widely used (Medical Research Council, 1943; Dyck et al., 2005). Its worldwide recognition is most probably due to its simplicity, and drawings illustrating how limb muscles should be tested. Through the decades, various versions have been published that aimed to improve the methods for muscle examination. The 2010 edition of Aids to the Investigation of Peripheral Nerve Injuries. Medical Research Council: Nerve Injuries Research Committee was recently presented on behalf of the guarantors of Brain, embracing a historical review and appreciation for its nurtures through the decades (Compston, 2010). Despite being the most cardinal feature of daily neurological practice, the MRC scale has been respectfully criticized due to the unequal width of its categories, with Grades 1, 2 and 3 being too narrow, and 4 being too broad, often leading to attempts to modify the scale (Brandsma et al., 1995; Dyck et al., 2005; Cuthbert and Goodheart, 2007; MacAvoy and Green, 2007; Merlini, 2010).

One of the most common sources of improper use of any outcome measure concerns the inconsistent use of the response options that correspond to the scales’ items (Tennant and Conaghan, 2007). This results in what is known as ‘reversed or disordered thresholds’. The term threshold refers to the point between two adjacent response categories where either response is equally probable. In the case of the MRC scale, a threshold would be the point between two adjacent categories, such as between MRC Grades 2 and 3. Disordered thresholds occur when physicians have difficulty consistently discriminating between the MRC grades in patients with various degrees of muscle weakness. Surprisingly, no study has systematically examined the appropriateness of the MRC scale using modern psychometric techniques.

The objectives of this study were: (i) to examine the applicability and discriminative capacity of physicians using the MRC grades in patients with various neuromuscular illnesses with different degrees of muscle weakness. We questioned whether physicians could demonstrate a fairly uniform pattern of ordered thresholds for the MRC grades along the Rasch scale continuum, since previous reports suggested humans’ inability to differentiate between more than four response options (Andrich, 1996; Penta et al., 2001); (ii) to investigate the influence of factors possibly affecting the proper use of the MRC grades in clinical practice (such as physician's clinical experience). For these two objectives, the Rasch method as a modern psychometric vehicle was used, solely focusing on threshold and item bias examinations (Rasch, 1960; Tennant and Conaghan, 2007); and (iii) since Guillain–Barré syndrome and chronic inflammatory demyelinating polyradiculoneuropathy (CIDP) are potentially treatable illnesses and the MRC sum score has often been used as an outcome measure to determine efficacy in these illnesses, we have chosen to examine whether this multi-item scale would fulfil all Rasch model expectations in patients with Guillain–Barré syndrome and CIDP and, if not, to propose changes to improve its use (Kleyweg et al., 1991; van der Meche and Schmitz, 1992; Merkies, 2001; van Koningsveld et al., 2004; Hughes et al., 2008).

Patients and methods

Patients

The MRC grades of various muscles were collected from different neuromuscular seminal studies published in the last two decades. Most of these studies have guided the worldwide neurological community in understanding the clinical and therapeutic pattern of these illnesses. A total of 1065 patients (Guillain–Barré syndrome: n = 480; myotonic dystrophy type 1: n = 169; CIDP: n = 139; limb-girdle muscular dystrophy: n = 105; multifocal motor neuropathy: n = 102; Pompe's disease: n = 62; and monoclonal gammopathy related polyneuropathy of undetermined significance: n = 8) were included (Table 1 and Supplementary Table 1) (van der Meché and Schmitz, 1992; van der Kooi et al., 1996; de Die-Smulders et al., 1998; Van den Berg-Vos et al., 2002; van Koningsveld et al., 2004; Hagemans et al., 2005; Van Asseldonk et al., 2005; Hughes et al., 2008; Hermans et al., 2010). The initial MRC data of all patients from the above-mentioned cohorts were selected for the purposes of the current study. All patients met the international criteria for their illness (Asbury and Cornblath, 1990; AANA, 1991; Bushby and Beckmann, 1995; Hirschhorn and Reuser, 2001; EFNS/PNS, 2006; Prior, 2009). The diagnosis ‘monoclonal gammopathy related polyneuropathy of undetermined significance’ was established after excluding all possible causes for the gammopathy and polyneuropathy (Miescher and Steck, 1996). For all studies, consent was obtained according to the Declaration of Helsinki and approval was obtained from the Ethical Committee of the institution in which the original study was performed.

Table 1

Basic characteristics of patients with neuromuscular disorders

| Study/disorder | Patients examined (n) | Age, mean years (SD), range | Female, n (%) | Male, n (%) | Symptoms duration, mean (SD), range (years) |
| INCAT study | 113 | 54.3 (15.1), 14–84 | 54 (47.8) | 59 (52.2) | 6.9 (3.1), 0.5–28 |
| Dutch Guillain–Barré syndrome trial, 1992 | 147 | 47.5 (19.2), 5–81 | 71 (48.3) | 76 (51.7) | |
| Dutch Guillain–Barré syndrome trial, 2004 + Guillain–Barré syndrome pilot study, 1994 | 250 | 50.5 (20.1), 7–89 | 109 (43.6) | 141 (56.4) | |
| Myotonic dystrophy type-1 | 169 | 43.5 (11.5), 18–69 | 83 (49.1) | 86 (50.9) | 5.3 (6.9), 0–34 |
| Multifocal motor neuropathy | 102 | 54.3 (12.1), 26–79 | 76 (74.5) | 26 (25.5) | 11.8 (8.2), 0.2–43 |
| Pompe's disease | 62 | 48.1 (11.9), 25.1–71.7 | 29 (46.8) | 33 (53.2) | 7.9 (9.3), 0–30.5 |
| Limb-girdle muscular dystrophy | 105 | 37.8 (15.6), 3–70 | 64 (61.0) | 41 (39.0) | 21.0 (14.5), 0–58 |
| ICE CIDP | 117 | 51.6 (16.5), 18–83 | 40 (34.2) | 77 (65.8) | 5.3 (6.2), 0.2–34.3 |
  • In the INCAT studies, a total of 113 patients were examined (Guillain–Barré syndrome, n = 83; CIDP, n = 22 and monoclonal gammopathy-related polyneuropathy of undetermined significance, n = 8).

  • ICE CIDP = immune globulin intravenous for CIDP; INCAT = inflammatory neuropathy cause and treatment.

Assessment scale

The MRC grading system provides the following grades: 0, paralysis; 1, only a trace or flicker of muscle contraction is seen or felt; 2, muscle movement is possible with gravity eliminated; 3, muscle movement is possible against gravity; 4, muscle strength is reduced, but movement against resistance is possible and 5, normal strength.

The MRC grades of the following six muscle pairs comprise the MRC sum score for Guillain–Barré syndrome and CIDP: upper arm abductors, elbow flexors, wrist extensors, hip flexors, knee extensors and foot dorsal flexors (Kleyweg et al., 1991). In the remaining cohorts (monoclonal gammopathy of undetermined significance related polyneuropathy, multifocal motor neuropathy, myotonic dystrophy type-1, Pompe's disease and limb-girdle muscular dystrophy), the muscle groups evaluated represented the clinical picture of each illness (see Supplementary Table 1 for available muscles per cohort).
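As an illustration only, the sum score described above can be sketched in a few lines of Python. The function name, the data layout and the example patient are hypothetical and are not taken from the study; the sketch simply adds the 0–5 grades of the six muscle pairs, giving a total between 0 and 60.

```python
# Six muscle groups of the MRC sum score, each tested on both sides
# (Kleyweg et al., 1991). Names of variables and functions are illustrative.
MUSCLE_GROUPS = (
    "upper arm abductors",
    "elbow flexors",
    "wrist extensors",
    "hip flexors",
    "knee extensors",
    "foot dorsal flexors",
)
SIDES = ("left", "right")


def mrc_sum_score(grades):
    """Sum the original MRC grades (0-5) of the 12 muscles.

    `grades` maps (muscle_group, side) -> grade.
    """
    total = 0
    for group in MUSCLE_GROUPS:
        for side in SIDES:
            grade = grades[(group, side)]
            if grade not in range(6):
                raise ValueError(f"MRC grade out of range: {grade!r}")
            total += grade
    return total


# Hypothetical patient: normal strength except for the foot dorsal flexors.
example = {(g, s): 5 for g in MUSCLE_GROUPS for s in SIDES}
example[("foot dorsal flexors", "left")] = 3
example[("foot dorsal flexors", "right")] = 4
print(mrc_sum_score(example))  # 10 * 5 + 3 + 4 = 57
```

A raw total of this kind is an ordinal score; it is shown here only to make the composition of the 12 muscles sum score concrete.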

Rasch analysis

Rationale for using Rasch method

In health care, outcome measures often consist of ordinal multi-item questionnaires, based on the classical test theory (DeVellis, 2006). Concerns have been raised about inappropriate analysis of the generated summed scores that are erroneously assumed to be at the interval level (Wright, 1999; Svensson, 2001; DeVellis, 2006). The ability of a scale to provide fundamental measurements should be established before the more commonly reported psychometric attributes such as being simple, valid, reliable and responsive (Tennant et al., 2004a; Tennant and Conaghan, 2007). Modern scientific methods have been adopted to overcome the shortcomings of traditional measurements. One of the most widely used modern techniques is the Rasch method that transforms ordinal obtained scores into interval-level variables, and whose fit of data satisfies numerous checkpoints as part of model expectations (Rasch, 1960; Tennant et al., 2004a; Tennant and Conaghan, 2007).

In the current study setting, the Rasch model assumes that a patient with less weakness (thus more strength) will have a greater chance of receiving a higher MRC grade by the examining physician. A comprehensive description of the Rasch analysis specifically for neurologists is provided elsewhere (Pallant and Tennant, 2007; Tennant and Conaghan, 2007; van Nes et al., 2011). Briefly, the Rasch model shows what should be expected in response to ordinal items if interval scaling is to be achieved. For this, the following criteria should be fulfilled.

  1. Thresholds examination: when using items with more than two response categories, as for the MRC grades, proper ordering of the response options should be verified using category probability curves for each muscle group examined, since this will reflect the ability of physicians to use the MRC in a correct way (Shaw et al., 1992). Thresholds are ordered when the transitions (thresholds) between categories map on to the underlying construct in the expected manner. Thus the transitions between categories (e.g. 1–2 and 2–3) reflect increasing levels of muscle strength (Fig. 1, top). Disordered thresholds can occur when physicians use the response options inconsistently, and this inconsistency can be a source of misfit to model expectations. The difficulty in discriminating between response options may result from too many options or from confusing labelling of the options, both of which may lead to misinterpretation.

  2. Fit statistics: fit statistics give an indication of how well the items fit the expected ordering required by the model. This ordering is a probabilistic version of Guttman Scaling (Guttman, 1950). There are two general categories for detecting misfit: overall (summary) misfit, using the entire response matrix, and the individual fit (examining all items and all persons individually). At the summary level the overall mean residual values for both persons and items can be calculated. These values are expressed as a z-score with a mean of 0 and a SD of 1, values of which indicate perfect fit to model expectations (Tennant et al., 2004b; Pallant and Tennant, 2007). The summary item–trait interaction statistic reflects the fit of the observed data to the model's expectations and is represented by the chi-square. This statistic gives an indication of the invariance of the ordering of items across patients with different levels of muscle strength. A significant chi-square indicates a failure to retain this ordering. Besides the overall fit residuals, individual item–chi square and item and person residuals can be calculated (Tennant et al., 2004b; Pallant and Tennant, 2007; Vandervelde et al., 2007).

  3. Item bias: response to an item should not vary between groups (e.g. males versus females), given the same level of the underlying trait (e.g. muscle strength). We assessed item bias (differential item functioning) on the MRC data for various available person factors. A panel (I.S.J.M. and C.G.F.) studied the range of the factors age, disease duration and physician's experience in the available cohorts. Subsequently, these factors were categorized into subgroups for item bias analyses, aiming for an equivalent distribution of participants among the subgroups (25–33% per subgroup).

  4. Local dependency: local dependency arises when items are linked such that the response on one item is dependent upon the response to another. Item sets with correlations >0.3 are considered a source of misfit to the model (Tennant and Conaghan, 2007).

  5. Unidimensionality: the Rasch model assumes unidimensionality and consequently post hoc tests are included in the analysis to ensure that this assumption holds. These tests involve a comparison of person estimates (of muscle strength) based upon two sets of items identified from the first principal component analysis of the residuals. The estimates for every individual are compared by a t-test, and where <5% of these comparisons are significantly different, this is taken to support the assumption of unidimensionality (Smith, 2002).

Figure 1

MRC response categories related thresholds explained and coded as ‘normal’ (green) or ‘abnormal’ (red). The first row shows the ideal graph representation for proper thresholds for the MRC grades. The first threshold at the intersection between MRC response options 0 and 1 corresponds to a 50% chance of choosing between these two adjacent categories. The thresholds should be ordered to obtain an ideal graph: Threshold 1 < Threshold 2 < Threshold 3 < Threshold 4 < Threshold 5. The second and third rows give graphical examples of proper threshold ordering (coded as a green box) and disordered thresholds (coded as a red box), respectively. T1–T5 = Thresholds 1–5, respectively.
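The idea of a threshold as the point where two adjacent categories are equally probable can be made concrete with a minimal sketch of the category probabilities under a partial credit formulation. This is not the software used in the study; the function names and the threshold values below are hypothetical, chosen only to contrast an ordered with a disordered pattern.

```python
import math


def category_probabilities(theta, thresholds):
    """Probability of each category 0..M for a person located at `theta`
    (logits), given the item thresholds tau_1..tau_M (logits)."""
    # Category k has exponent sum_{j<=k} (theta - tau_j); category 0 has 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    peak = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(thresholds):
    """True when tau_1 < tau_2 < ... < tau_M (the 'normal' pattern)."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Hypothetical threshold locations for a six-category (0-5) item.
ordered = [-3.0, -1.5, 0.0, 1.5, 3.0]
disordered = [-3.0, 0.5, -0.5, 0.2, 3.0]

print(thresholds_ordered(ordered))     # True
print(thresholds_ordered(disordered))  # False

# At theta equal to Threshold 1, grades 0 and 1 are equally probable.
p = category_probabilities(ordered[0], ordered)
print(abs(p[0] - p[1]) < 1e-12)  # True
```

Plotting `category_probabilities` over a range of `theta` values would give category probability curves of the kind shown in Fig. 1.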

Test procedure

Figure 2 presents a systematic ordering of the analyses performed in the current study. In Analyses 1 and 2 (MRC Rasch analyses for each cohort separately and MRC Rasch analyses after pooling data) the following were examined:

  • Step 1: the presence of ordered thresholds, thus determining whether the MRC grades for each muscle were ordered reflecting physicians’ ability to use these grades properly;

  • Step 2: in case of disordered thresholds, to seek the optimal rescored MRC categories that could serve as a unified tool in manual muscle scoring for all muscle groups. In order to rescore the MRC categories, the frequency distribution among the categories and the category probability curves were taken into account;

  • Step 3: the presence of possible item bias was examined to determine whether factors such as physician's experience in the neuromuscular field (i.e. would a more experienced physician apply the MRC grades more appropriately than a less experienced physician?) or possible differences between community and university based neurologists might influence the applicability of the MRC grades.

Figure 2

Study algorithm showing a systematic ordering of the analyses performed in the current study. First analyses (Analysis 1): initial MRC Rasch analysis for each individual cohort separately (thus performing a total of eight individual model analyses). Second analyses (Analysis 2): MRC Rasch analyses after pooling data at the muscle level from available cohorts. Third analyses (Analysis 3): MRC sum score Rasch analysis in patients with Guillain–Barré syndrome and CIDP. DM1 = myotonic dystrophy type-1; ICE = immune globulin intravenous for CIDP; INCAT = inflammatory neuropathy cause and treatment; LGMD = limb-girdle muscular dystrophy; MMN = multifocal motor neuropathy.

Therefore, in Analyses 1 and 2, the Rasch method was applied only to examine the ability of physicians to use the MRC grading system in a proper way and to determine whether there were factors influencing its use. These analyses were not intended to create a formal Rasch-built MRC sum score for each cohort individually, since some of the cohort samples were relatively small, hence not fulfilling the basic requirements for proper Rasch modelling (Linacre, 1994).

For Analysis 2, MRC data were pooled at the muscle level from the various available cohorts and resubjected to Rasch analysis, thereby controlling for diagnosis as a possible confounder and strengthening the category frequencies for the various muscles (Linacre, 2002).

In Analysis 3 (MRC sum score Rasch analysis in Guillain–Barré syndrome/CIDP), the MRC 12 muscles sum score was analysed to determine whether Rasch model expectations would be met. The first two steps for Analyses 1 and 2 (see above) were also performed here. Subsequently, since there is no consensus regarding a fixed sequence of steps that must be followed when doing Rasch analyses, our rationale for the following steps was constantly driven by the biggest abnormality seen when subjecting all data to Rasch analysis, thereby focusing on all aspects that did not meet model expectations (misfit statistics, fit residuals disturbances, under-/overfitting, local dependency >0.3, and item bias). All steps needed were taken to create a unidimensional scale at the interval level.

Rasch general aspects, person factors and statistics

The MRC data of each muscle group were treated as an ‘item’ to be completed by the patient, with response options from 0 to 5 (in the current study setting, a physician completed the ‘item’), using the Rasch Unidimensional Measurement Model 2020 software (Andrich et al., 2003).

In Analysis 1 (MRC Rasch in each cohort separately), the following person factors were taken into account (Supplementary Table 2):

  1. age: 1, <40 years; 2, 40–59 years and 3, ≥60 years;

  2. gender: 0, female; 1, male;

  3. type of disease: (a) inflammatory neuropathy-cause-and-treatment cohort: 1, Guillain–Barré syndrome; 2, CIDP; 3, gammopathy related polyneuropathy; (b) myotonic dystrophy cohort: 1, mild; 2, adult; 3, child/congenital type; and (c) limb-girdle dystrophy cohort: 1, sarcoglycanopathy; 2, calpainopathy; 3, limb-girdle type 1B, 2B and 2I; 4, unclassified;

  4. duration of disease: (a) for all cohorts except limb-girdle patients: 1, <5 years; 2, 5–9 years; 3, 10–19 years; 4, ≥20 years; and (b) for limb-girdle cohort: 1, <10 years; 2, 10–19 years; 3, 20–29 years; 4, ≥30 years;

  5. physician's experience in the neuromuscular field: for the inflammatory–neuropathy cause and treatment studies: 1, <3 years experience; 2, 3–5 years experience; 3, ≥6 years experience; the latter group constituting senior neuromuscular experts;

  6. institution; for the Guillain–Barré syndrome trials: 0, community based; 1, university based hospital; and

  7. country; for the Guillain–Barré syndrome cohort 2004: 1, The Netherlands; 2, Germany; 3, Belgium.

For Analyses 2 and 3 (MRC Rasch after pooling data and MRC sum score in Guillain–Barré syndrome/CIDP), the factors studied included (i) age category: 1, <40 years; 2, 40–59 years; 3, ≥60 years; (ii) gender: 0, female; 1, male; and (iii) type of disease: depending on the number of illnesses being pooled together, each illness received a separate code.

For the MRC sum score analysis, the person separation index was also determined, which should be ≥0.7 for proper group comparison, and a minimum of 0.9 for clinical use (Bland and Altman, 1997). The unrestricted partial credit Rasch model was used. Further analyses were undertaken using Stata 11.0 statistical software for Windows XP.

Results

General aspects

A total of 1065 patients with various neuromuscular disorders were included from nine studies. Table 1 presents the patients’ characteristics. MRC data on 72 muscle groups were available (Supplementary Table 1, muscle groups assessed per cohort).

Analysis 1: initial MRC Rasch analyses for each cohort separately

Step 1: thresholds examination

The obtained data (ordered thresholds coded ‘green’; disordered coded ‘red’; see Fig. 1 explaining these codes) for each muscle group in each cohort were summed, thereby creating a total of 210 muscle groups examined. A total of 165 (78.6%) muscle groups had disordered thresholds versus 45 (21.4%) with ordered thresholds. The disordered thresholds were particularly seen in the mid-response MRC category area (options 2 to 4).

Step 2: rescoring MRC categories

A panel of neuromuscular and Rasch researchers studied the category probability curves and category frequencies of the MRC data for each muscle group. Subsequently, all muscle groups were systematically rescored in order to obtain the largest uniform number of response options, which turned out to be four categories (instead of six). Of the 210 muscle groups rescored, 182 (86.7%) had ordered thresholds and 28 (13.3%) were still disordered. Sixteen of these disordered muscle groups were distally located (finger spreaders, flexors and extensors, grip strength, wrist extensors and flexors, foot dorsal and plantar flexors). All disordered muscle groups except two were found in the two cohorts with the lowest number of patient records (multifocal motor neuropathy, n = 102 and Pompe's disease, n = 62).

Table 2

Results before and after rescoring the response options from six to four categories with corresponding threshold locations

  • A normal threshold ordering of the MRC grades is coded as ‘normal’; abnormal threshold is ‘abnormal’. See Fig. 1, for examples, explaining these codes. Threshold location = location of the thresholds of adjacent MRC response options located on the created ruler (and expressed in logits).

Step 3: item bias examination

Eight selected person factors were used to examine possible item bias on the available muscle groups (see Supplementary Table 2 for available factors per cohort). Before rescoring, a total of 806 muscle groups (96.9%) were free of item bias, thus not being influenced by person factors like physicians’ experience. Item bias was only found in 26 muscles (3.1%; on person factor gender: 11 muscle groups had uniform differential item functioning, on disease type: eight had uniform, on disease duration: two uniform and one non-uniform, on physician's experience: two uniform, on country: one uniform, and on age: one muscle group had uniform differential item functioning). Differential item functioning findings did not change after rescoring at the individual cohort level.

Analysis 2: MRC Rasch analyses after pooling data

Step 1: thresholds examination

Similar findings were seen in the pooled data analyses. Of the 72 muscles examined, a total of 53 muscle groups (73.6%) had disordered thresholds, particularly in the mid-categories (Table 2, ‘before rescoring’).

Step 2: rescoring MRC categories

Equivalent to the findings of Analysis 1 and based on the location of the disordered thresholds (mid-categories 2–4), all muscle groups were systematically rescored to a modified MRC with four categories. Table 2 provides the data for the rescored MRC categories (see last four columns). Ordered thresholds were restored for all muscles except the masseter muscle. A modified version of the MRC grading system was created for clinical use with the following grades: 0, paralysis; 1, severe weakness (defined as >50% loss of strength); 2, slight weakness (<50% loss of strength); and 3, normal strength. The 50% cut-off was based on the following: four modified response options have three thresholds (three theoretical intersections between adjacent response options: Thresholds 1, 2 and 3); half of the distance between Threshold 3 (representing the intersection between modified MRC Grades 2 and 3; location 4.3 logits) and Threshold 1 (intersection between modified Grades 0 and 1; location −2.98 logits) for all 72 muscle groups is located at 0.66 logits [−2.98 (location of Threshold 1) + 0.5 × 7.28 (0.5 × distance between Threshold 3 and Threshold 1)], which is close to the mean location of Threshold 2 (intersection between the modified Grades 1 and 2): 0.46 logits.
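The arithmetic behind the 50% cut-off can be retraced in a short sketch. The three threshold locations are those quoted in the text; the variable names are illustrative.

```python
# Threshold locations in logits, as reported in the text.
threshold_1 = -2.98       # intersection of modified grades 0 and 1
threshold_3 = 4.30        # intersection of modified grades 2 and 3
mean_threshold_2 = 0.46   # mean intersection of modified grades 1 and 2

# Halfway point between Threshold 1 and Threshold 3.
distance = threshold_3 - threshold_1
midpoint = threshold_1 + 0.5 * distance

print(round(distance, 2), round(midpoint, 2))  # 7.28 0.66
# Gap between the halfway point and the observed mean of Threshold 2.
print(round(midpoint - mean_threshold_2, 2))   # 0.2
```

The halfway point (0.66 logits) lies 0.20 logits from the mean location of Threshold 2, which is the closeness the authors rely on when labelling the middle boundary as a 50% loss of strength.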

Step 3: item bias examination

Differential item functioning was also performed on person factors age, gender and diagnosis (Supplementary Table 3). Item bias was hardly seen on age and gender. On diagnosis, 33 muscle groups (45.8%) demonstrated differential item functioning (Supplementary Table 3).

Analysis 3: MRC sum score Rasch analysis in patients with Guillain–Barré syndrome and chronic inflammatory demyelinating polyradiculoneuropathy

Step 0: general description of patients examined and initial findings

A total of 619 patients from several cohorts [Guillain–Barré syndrome, n = 480; CIDP, n = 139; n = 272 females (43.9%) and n = 347 males (56.1%)] were available for these analyses (van der Meche and Schmitz, 1992; The Dutch Guillain–Barré syndrome study group, 1994; Merkies, 2001; van Koningsveld et al., 2004; Hughes et al., 2008). The original MRC summed score failed to meet the model expectations. Misfit statistical findings for all three statistical parameters were initially seen (Table 3, ‘initial’ analysis).

Table 3

Summary Rasch analyses statistics for the modification of MRC sum score in patients with Guillain−Barré syndrome and CIDP

| Analysis | Item fit residuals, mean (SD) | Person fit residuals, mean (SD) | Item-trait chi-square interaction, DF | Item-trait chi-square interaction, P-value | PSI | Unidimensionality independent t-test (95% CI) |
| Initial | 0.147 (4.626) | −0.562 (1.749) | 108 | <0.00001 | 0.94 | 0.20 (0.183–0.218) |
| Final | 0.341 (1.100) | −0.316 (1.094) | 55 | 0.0891 | 0.91 | NA |
  • In the final analysis, item and person fit residuals are acceptable, whereas chi-square is non-significant, indicating invariance across the trait. A person separation index of 0.91 indicates a reliable internal consistency. NA = not available; after performing split analyses, Rasch Unidimensional Measurement Model does not provide the opportunity to perform unidimensionality testing.

  • DF = degrees of freedom; PSI = person separation index.

Steps 1 and 2: thresholds examination and rescoring

Findings here were similar to those of the above-mentioned analyses. Eight muscle groups had disordered thresholds. For uniformity, all 12 muscle groups were rescored to four response options, thereby restoring threshold ordering.

Step 3: local dependency and creating subsets

The next steps were driven by the strongest source of misfit to the Rasch model, which was the strong local dependency between equivalent (right and left) muscle pairs (e.g. shoulder abductors right and left side; Spearman's correlations: ρ = 0.676–0.831). Therefore, six subsets of items were created by combining the corresponding muscle pairs (left and right) with each other, which improved the statistical parameters and resolved local dependency.
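A minimal sketch of this step is given below, under two assumptions that are not stated in the text: that dependency is flagged from the correlation between the residuals of two items (using the 0.3 cut-off mentioned in the methods, with a Pearson coefficient standing in for whatever coefficient the software reports), and that a subset is scored as the sum of its left and right items. All data and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def combine_pair(left, right):
    """Merge two locally dependent items into one subset score per patient
    (assumed here to be the simple sum of both items)."""
    return [l + r for l, r in zip(left, right)]


# Hypothetical standardized residuals of one muscle, left and right side.
left_residuals = [0.8, -0.4, 1.1, -0.9, 0.2, -0.6]
right_residuals = [0.7, -0.2, 0.9, -1.0, 0.4, -0.5]
locally_dependent = pearson(left_residuals, right_residuals) > 0.3
print(locally_dependent)  # True

# Hypothetical rescored grades (0-3) for the same six patients.
left_grades = [3, 2, 1, 0, 3, 2]
right_grades = [3, 1, 1, 0, 2, 2]
subset = combine_pair(left_grades, right_grades)
print(subset)  # [6, 3, 2, 0, 5, 4]
```

With four response options per muscle, a subset built this way would range from 0 to 6; whether the original software scores subsets exactly like this is not confirmed by the text.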

Step 4: unidimensionality examination

Based on the first principal components analysis, two comparison groups of subsets were formed with three positively loaded (arm muscle subsets) versus three negatively loaded (leg muscle subsets). The independent t-tests between these two groups suggested acceptable unidimensionality [t-test (95% confidence interval): 0.065 (0.047–0.082)].
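The reported proportion with its confidence interval can be read as follows. The sketch assumes a normal-approximation (Wald) interval for the proportion of significant person-level t-tests and assumes that unidimensionality is accepted when the lower bound of that interval does not exceed 5%; neither detail is spelled out in the text, and the counts below are hypothetical values chosen to resemble the reported proportions.

```python
import math


def proportion_with_ci(n_significant, n_tests, z=1.96):
    """Proportion of significant t-tests with a Wald 95% interval."""
    p = n_significant / n_tests
    half_width = z * math.sqrt(p * (1 - p) / n_tests)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def supports_unidimensionality(n_significant, n_tests, cutoff=0.05):
    """Assumed decision rule: the lower bound must not exceed the cutoff."""
    _, lower, _ = proportion_with_ci(n_significant, n_tests)
    return lower <= cutoff


# Hypothetical counts: 40 of 615 comparisons significant (about 6.5%).
print(supports_unidimensionality(40, 615))   # True
# Hypothetical counts: 123 of 615 comparisons significant (20%).
print(supports_unidimensionality(123, 615))  # False
```

Under these assumptions, a proportion of 0.065 whose interval reaches below 0.05 is compatible with the 5% criterion, whereas a proportion of 0.20 is not.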

Step 5: item bias examination

Uniform differential item functioning was demonstrated on person factor ‘disease type’ for all created muscle subsets, except for the elbow flexors subset. Therefore, each subset of muscle pairs was split in order to obtain specific subsets for the patients with Guillain–Barré syndrome and CIDP, separately. After this, the model was free of any item bias and local dependency. All subsets of items, except the ‘foot dorsal flexors for patients with Guillain–Barré syndrome’, demonstrated fit statistics within required limits. The foot dorsal flexors in Guillain–Barré syndrome had a fit residual of +5.845 (P = 0.000021), which disturbed Rasch model fitting (Table 3, final analysis for complete model fit after removing this item). However, for practical reasons the structure of the MRC sum score (composed of 12 muscles) was maintained, despite the skewed foot dorsal flexors subset in Guillain–Barré syndrome. A high person separation index (0.91) was obtained for the final modified MRC sum score model.

Discussion

Manual muscle testing has been used for more than seven decades for monitoring disease progression and response to therapy in various neuromuscular disorders (van der Meche and Schmitz, 1992; van der Kooi et al., 1996; de Die-Smulders et al., 1998; Merkies, 2001; Van den Berg-Vos et al., 2002; van Koningsveld et al., 2004; Hagemans et al., 2005; Van Asseldonk et al., 2005; Hughes et al., 2008; Hermans et al., 2010) and the MRC grading system has been widely used for this purpose (Dyck et al., 2005; Compston, 2010). This study systematically examined the discriminatory capacity of the MRC grading system in a broad mixture of patients with neuromuscular illnesses, assessing a large number of muscles using the Rasch method. The original six response categories of the MRC grading system failed to differentiate among patients with various degrees of muscle weakness. Three-quarters of all muscles examined demonstrated disordered thresholds, especially in the mid-response categories (options 2–4). The inability of physicians to apply the apparently intuitive and easily applicable MRC grades in a proper way is consistent with reports criticizing the MRC system (Dyck et al., 2005; Schreuders et al., 2006; MacAvoy and Green, 2007; Merlini, 2010). The current paper also shows that the observed disordered thresholds were generally independent of factors such as physicians’ experience, duration of illness or type of practice (university- versus community-based). The original MRC grading system inconsistencies were also ‘cross-validated’ throughout the neuromuscular cohorts, as the findings between the individual disease cohorts were equivalent.

After systematically rescoring all MRC grades to a modified four-category response option, the accuracy of the MRC grading system increased, as the requirement of ordered thresholds was fulfilled. Although reducing the response options from six to four might intuitively seem to lower the ability to capture functional changes in a patient, the current evidence indicates that keeping the six responses gives a false sense of precision and potentially increases the error in assessment, which may suggest a clinically meaningful improvement where none exists.
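
Mechanically, rescoring amounts to collapsing adjacent categories. The sketch below shows the operation with a purely hypothetical collapse map: this section does not state which original grades were merged into which modified category, so the mapping here is an assumption for illustration and must not be read as the authors' rescoring. The four labels are those of the modified categories named in the conclusion.

```python
# ASSUMPTION: illustrative collapse of original MRC grades (0-5) into four
# categories. The actual rescoring is not specified in this section.
ASSUMED_COLLAPSE = {0: 0, 1: 1, 2: 1, 3: 1, 4: 2, 5: 3}

MODIFIED_LABELS = {
    0: "paralysis",
    1: "severe weakness",
    2: "slight weakness",
    3: "normal strength",
}


def rescore(grades, collapse=ASSUMED_COLLAPSE):
    """Map original six-category grades to modified four-category grades."""
    return [collapse[grade] for grade in grades]


def modified_raw_sum(grades, collapse=ASSUMED_COLLAPSE):
    """Raw (still ordinal) sum of the modified grades across muscles."""
    return sum(rescore(grades, collapse))


# Hypothetical grades for six muscles of one patient.
example_grades = [5, 4, 3, 2, 1, 0]
example_modified = rescore(example_grades)
example_labels = [MODIFIED_LABELS[g] for g in example_modified]
```

The raw sum returned here remains an ordinal score; converting it to an interval measure requires the Rasch transformation, which is not reproduced in this sketch.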

The current paper shows the difficulties with the use of summed scores derived from various muscles tested in patients with Guillain–Barré syndrome and CIDP. However, after Rasch modelling, we were able to present a transformed modified MRC 12 muscle groups summed score for use in future clinical studies in these disorders (Kleyweg et al., 1991). The analyses revealed severe misfit of the foot dorsal flexors. However, since Guillain–Barré syndrome and CIDP are length-dependent neuropathies, we decided to keep this muscle group in the final model. The presented Rasch-built modified interval MRC sum score is considered a substantial improvement over the evaluation of muscle strength using ordinal-based scores, which in essence are not suitable for performing adequate statistics. The modified interval MRC sum score for patients with CIDP should, however, be applied with some caution, because only 139 patients were assessed, which is fewer than the proposed sample size requirement for a stable model (Linacre, 1994). In addition, the responsiveness of the Rasch-built modified interval MRC summed score for patients with Guillain–Barré syndrome and CIDP needs to be demonstrated in longitudinal studies (Liang, 1995); this is currently being investigated. Nevertheless, its person separation index was high, indicating good ability of the modified scale to differentiate between groups of patients with various degrees of muscle weakness. Finally, since the differential item functioning findings on diagnosis (Supplementary Table 3) demonstrate that neuromuscular illnesses may behave differently, it is conceivable that Rasch-built MRC sum scores are needed for specific illnesses such as multifocal motor neuropathy and other neuromuscular diseases. These efforts should be the focus of future studies.

In conclusion, the original MRC manual muscle testing grading system failed to meet the Rasch model expectations in various neuromuscular disorders, despite being the standard metric in neurology worldwide. Modification of this grading system to four response categories (0, paralysis; 1, severe weakness; 2, slight weakness; and 3, normal strength) may significantly enhance the ability of clinicians to differentiate degrees of weakness with greater precision and accuracy. Based on this, we have developed a Rasch-built interval MRC summed score for use in future clinical studies evaluating patients with Guillain–Barré syndrome and CIDP. Future studies are warranted to improve the robustness of our neurological assessments.

Supplementary Material

Supplementary material is available at Brain online.

Acknowledgements

We thank Professor S. Waxman from Yale University, USA, who helped us to improve the transparency and readability of the manuscript. The members of the PeriNomS Study Group are as follows: A.A. Barreira, Brazil; D. Bennett, UK; P.Y.K. van den Bergh, Belgium; V. Bril, Canada; G. Devigili, Italy; R.D. Hadden, UK; A.F. Hahn, Canada; H.-P. Hartung, Germany; R.A.C. Hughes, UK; I. Illa, Spain; H. Katzberg, Canada; A.J. van der Kooi, The Netherlands; J.-M. Léger, France; R.A. Lewis, USA; M.P.T. Lunn, UK; O.J.M. Nascimento, Brazil; E. Nobile-Orazio, Italy; L. Padua, Italy; J. Pouget, France; M.M. Reilly, UK; I. van Schaik, The Netherlands; B. Smith, USA; M. de Visser, The Netherlands; D. Walk, USA.

Footnotes

  • *The members of PeriNomS Study Group are provided in the Acknowledgements section.

Abbreviations
CIDP = chronic inflammatory demyelinating polyradiculoneuropathy
MRC = Medical Research Council

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

References
