
Revisiting the dissociation between singing and speaking in expressive aphasia

Sylvie Hébert, Amélie Racette, Lise Gagnon, Isabelle Peretz
DOI: http://dx.doi.org/10.1093/brain/awg186, pp. 1838-1850. First published online: 23 June 2003


We investigated the production of sung and spoken utterances in a non‐fluent patient, C.C., who had a severe expressive aphasia following a right‐hemisphere stroke, but whose language comprehension and memory were relatively preserved. In experiment 1, C.C. repeated familiar song excerpts under four different conditions: spoken lyrics, sung lyrics on original melody, lyrics sung on new but familiar melody and melody sung to a neutral syllable ‘la’. In experiment 2, C.C. repeated novel song excerpts under three different conditions: spoken lyrics, sung lyrics and sung‐to‐la melody. The mean number of words produced under the spoken and sung conditions did not differ significantly in either experiment. The mean number of notes produced did not differ between the sung‐to‐la and sung conditions, but was higher than the number of words produced, hence showing a dissociation between C.C.’s musical and verbal productions. Therefore, our findings do not support the claim that singing helps word production in non‐fluent aphasic patients. Rather, they are consistent with the idea that verbal production, be it sung or spoken, results from the operation of the same mechanisms.

  • Keywords: aphasia; melodic intonation therapy; singing; songs; speaking
  • Abbreviation: MIT = melodic intonation therapy

Received September 16, 2002. Revised February 13, 2003. Second revision March 26, 2003. Accepted March 28, 2003.


Expressive language deficits occurring after brain damage, encompassed under the general heading of aphasia, have long been reported (e.g. Broca, 1861). One striking report from clinical settings concerns severely aphasic patients who, having recovered none or few of their speech abilities, are still able to sing previously learned songs with well‐articulated and linguistically intelligible words. Such patients, who became aphasic after the removal of the whole left hemisphere (e.g. Smith, 1966) or after stroke (Assal et al., 1977; Yamadori et al., 1977; Jacome, 1984), have a very restricted output with respect to spontaneous speech, but seem to be able to recover word articulation with the support of melody. All reports but one (Assal et al., 1977) concern patients with no particular musical training. Thus, this ability to sing with words seems to reflect a general trait of cerebral organization.

The classical interpretation of this long‐standing observation is that singing familiar songs would depend on right hemisphere functions, whereas propositional (generative) speech would depend on left hemisphere functions. Damage to the left hemisphere would therefore leave intact the patients’ ability to sing previously learned songs, whereas damage to the right hemisphere would impair ‘automatic’ speech and familiar song singing. Some case reports fit with this interpretation (e.g. Speedie et al., 1993), but some others do not (e.g. case 2 in Assal et al., 1977).

These reports remain descriptive in that they are not substantiated by quantitative behavioural data of patients’ production. The only study that is so substantiated does not support the idea that music facilitates word production: Cohen and Ford (1995) examined the production of 12 patients who became aphasic after a unilateral left hemisphere vascular accident. Patients chose three songs from a list of eight songs they had sung in therapy over the previous 3 months. Patients had to produce the words of these familiar songs under three experimental conditions: spoken naturally (without any support), spoken with a steady drumbeat accompaniment and sung accompanied with the melody played on a keyboard. Word intelligibility (i.e. the average number of intelligible words divided by the average duration of each condition) was greater when utterances were spoken, compared with when they were sung or spoken accompanied by a drumbeat. There are, however, a number of shortcomings in the Cohen and Ford (1995) study that prevent firm conclusions from being drawn. For example, the type and severity of aphasia of the patients were not specified. More importantly, only group data are reported, which may not be representative of how each patient performed in the different conditions. The averaging of performance may create effects that do not reflect any of the individual performance patterns, or may cancel out effects that would have been significant at the individual level (see Caramazza and McCloskey, 1988). Also, as was suggested by the authors themselves, the word intelligibility index could have been compromised in the rhythm and melody conditions because judges had to listen to recorded speech with musical instruments in the background. Masking effects could thus contribute to the lower intelligibility found in these conditions.
Another factor is that the word intelligibility index was only an approximation of the patients’ production: only a random sample taken from each patient in each condition was examined, which rendered the productions not necessarily comparable from one condition to another, or from one patient to another. Furthermore, patients’ performance on the melodic dimension of the songs was not assessed. However, depending on its complexity, music may perturb rather than facilitate word production. Music may increase word recall provided it is simple and repetitive (Rubin, 1977; Serafine et al., 1984; Wallace, 1994), but word recall is higher for spoken than for sung words when music is more difficult to learn (Wallace, 1994; Racette and Peretz, 2001). Finally, given that amusia (i.e. a deficit in musical abilities occurring after brain damage) is more often associated with aphasia than not (see Marin and Perry, 1999), the presence of amusia, in some or all of the patients, cannot be ruled out.

The weakness of the empirical support, or the lack thereof, demands a closer assessment of the claim that music helps word articulation. In particular, the comparison between sung and spoken productions needs more methodological rigor. The question is of theoretical and clinical importance. First, music and language are generally viewed as activities relying on largely separate neural and cognitive processes. Indeed, even though deficits in language and music functions may co‐occur after brain damage, it is likely that the association between amusia and aphasia is attributable to the proximity between brain regions responsible for those functions. A number of carefully detailed perceptual studies carried out with brain‐damaged patients have amply documented functional dissociations between language and music, i.e. musical abilities may be spared while language functions are impaired, and vice versa, even in songs. Specifically, amusic patients with no aphasia have been described who are still able to recognize and judge words of songs as familiar despite an inability to recognize the corresponding musical song part (Peretz et al., 1994, 1997; Peretz, 1996; Griffiths et al., 1997; Hébert and Peretz, 2001). From this perspective, a sparing of the ability to sing words while being unable to speak the very same words would be challenging to the current view that lyrics and melodies are separable entities, even in songs.

Secondly, the very observation of patients being able to sing despite not being able to speak is at the origin of the melodic intonation therapy (MIT), a technique that has been considered as the most promising avenue for aphasia rehabilitation by the Therapeutics and Technology Assessment Subcommittee of the American Academy of Neurology (1994). MIT does not use the singing of familiar songs per se as a form of therapy; rather, it uses intonation patterns that exaggerate the normal melodic content of phrases that gradually vary in complexity as the patient makes progress, with the underlying idea being that musical intonation ability, a form of singing, is a right hemisphere function. Successful recovery from aphasia with the MIT technique was interpreted as showing that it facilitated the use of language areas of the right hemisphere, after damage to the language areas in the left hemisphere (Albert et al., 1973), or that it increased the role of the right hemisphere in inter‐hemispheric control of language (Sparks et al., 1974). Recent evidence, however, does not support either one of these interpretations (Belin et al., 1996). Rather, it suggests that right hemisphere activation would signal the persistence of aphasia rather than its recovery, and that the latter is associated with a reactivation of language‐related structures in the left hemisphere.

Theoretical accounts other than those involving hemispheric specialization should thus be considered. In particular, there are alternative explanations regarding why singing would have the potential to facilitate word production. One possible explanation is that words are articulated at a slower rate in singing than in speaking. This speed reduction would enable word pronunciation that would otherwise be too rapid. Slowed speech is characteristic of non‐fluent aphasias (Geschwind, 1971). Singing generally enhances fluency and word intelligibility of patients with motor speech disorders, such as dysarthria or stuttering, presumably by speed reduction (Healey et al., 1976; Colcord and Adams, 1979; Cohen, 1988; Pilon et al., 1998). Furthermore, it has been shown that syllable lengthening, which is an acoustic correlate of speed reduction in singing, helps non‐fluent aphasic patients when they use MIT: the longer the syllables are, the more phrases are produced by patients (Laughlin et al., 1979). It is probable that syllable chunking and rhythmic anticipation also participate in this advantage of singing over speaking, although their contributions have never been formally assessed.

Another potential contributing factor to the facilitating effect of singing over speaking is that production of familiar songs imposes a reduced demand for language formulation. Familiar songs are overlearned and use non‐propositional language. They are encoded as ‘word strings’ that are recalled verbatim (Wallace, 1994; Peretz et al., 1995). Moreover, the tight bonding between words and music in songs makes them difficult to separate in memory in normal listeners (e.g. for familiar songs see Hébert and Peretz, 2001; for novel songs see Serafine et al., 1984). The musical part of familiar songs can help to provide access to verbal knowledge when direct access to lexicon is compromised by amnesia (Baur et al., 2000). Conversely, access to song representation in memory can be achieved through access to the speech lexicon when access to music is compromised by amusia (Steinke et al., 2001). Because they are strongly connected in memory, producing familiar melodies could help to retrieve the production of words associated with these melodies. Moreover, as familiar songs have been heard and produced repetitively, the mental representation of songs is not only tied to their content (words and music), but also to their motor program pattern. This could explain why automatized formulations such as familiar song words, prayers, and other similar materials such as months of the year and days of the week are less vulnerable than unfamiliar material to brain damage.

In summary, there is little empirical and theoretical ground to support the claim that singing words can be spared despite severely impaired speech abilities. In the present study, we carry out the first systematic evaluation of word production in a severely non‐fluent aphasic patient by comparing his sung and spoken production of the same utterances. Two experiments examined familiar and novel materials, respectively. Productions were analysed in terms of both words and notes produced, in order to establish whether or not music imposes a load on memory and production, as it can occur in text recall.

Case description

Neurological history

The patient, C.C., is a right‐handed retired policeman (with 12 years of school education) who was 60 years old when he suffered unilateral cerebral damage caused by a right sylvian thrombosis. On the morning of May 23, 1997, his wife found him lying on the bed, with left superior hemiplegia and aphasia. On admission, the neurological examination further revealed a left facial paresis as well as paresis of the left arm and leg and left hemianopsia. Head CT scan and more recent MRI images (Fig. 1) revealed a right temporo‐fronto‐parietal hypodensity involving cortical and subcortical regions extending to the internal capsule, destruction of the temporal pole, atrophy in the region of the sylvian fissure, and ventricular enlargement. There was no sign of intracranial blood or hypertension. Given the atypical occurrence of aphasia following right hemisphere damage in a right‐handed man, a control CT scan and an MRI examination were carried out and confirmed that there was no evidence of cerebral damage other than the one initially found on the right side.

Fig. 1 (A and B) MRI scan of C.C., taken 66 months post‐stroke, showing a right temporo‐fronto‐parietal lesion.

C.C. scored 100% right‐handed on the Edinburgh Handedness Inventory (Oldfield, 1971), including eye dominance: when he worked as a police officer, he used his right eye to properly align his weapon to the target.

C.C. was admitted to rehabilitation for 12 months, where he underwent physical, occupational and speech therapy. On the day of discharge, he had recovered much of his physical abilities (he was able to walk with a cane), but he was still severely aphasic.

Neuropsychological assessment

A summary of C.C.’s cognitive functioning is presented in Table 1. Some of the tests were administered twice, coinciding approximately (within a 2‐month period) with the times when the experimental testing was carried out. The first testing was carried out about 6 months post‐infarct, and the second about 3 years later. Mini‐Mental State Examination (Folstein et al., 1975), as well as verbal IQ and verbal memory assessment, were not possible due to his severe expressive aphasia. C.C.’s scores on non‐verbal IQ and MQ [Revised Wechsler Adult Intelligence Scale (WAIS‐R) and Revised Wechsler Memory Scale (WMS‐R)] were slightly below average. This was attributable to a deficit in attention and slowness in information processing, and is compatible with the severity of aphasia displayed by C.C. His performance was characterized by an impaired verbal (but not non‐verbal) working memory. Performance on long‐term and semantic memory tests was normal.

Table 1

C.C.’s performance on intelligence and memory assessments in session 1 (6 months post‐infarct) and in session 2 (33 months post‐infarct)

Test | Session 1 scores | Session 2 scores
Performance IQ | 75 | 80
Picture completion* | 7 | 8
Picture arrangement* | 5 | 6
Block design* | 8 | 10
Object assembly* | 8 | 8
Digit symbol* | 2 | 6
Non‐verbal MQ | 92 | 81
Working memory
Digit span (forward/backward) | 5/2 | 4/2
Visuo‐spatial span | 5 | 5
Word span | 3 | 3
Long term memory
BEM 144 (immediate and deferred recognition; Signoret, 1991) | 24/24 | 24/24
Facial recognition (Warrington, 1984) | 46/50 | 44/50
Rey figure immediate recall* | 9 | 12
Semantic memory
Pyramids and Palm Trees test (Howard and Patterson, 1992) | 46/50
BORB (Riddoch and Humphreys, 1993)
Picture–word association | 36/40
Real–unreal judgements: hard | 24/32
Real–unreal judgements: easy | 30/32
Item match | 30/32
Association match | 29/30

*Scores scaled according to age.

Language assessment

Spontaneous speech was severely impaired both quantitatively and qualitatively, and characterized by aborted sentences, filler words, neologisms and phonemic paraphasias. A summary of C.C.’s language functioning is given in Table 2. Language assessment was carried out with some of the subtests from a French adaptation of the Boston Diagnostic Aphasia Examination (Mazaux and Orgogozo, 1981). However, most of the language tests were drawn from the MT‐86β Aphasia Battery (Nespoulous et al., 1992), for which normative data in French are available (Béland and Lecours, 1990; Béland et al., 1993). Overall, performance showed a discrepancy between receptive and expressive language abilities. There was some improvement over time on both naming and verbal fluency tests, but C.C.’s performance remained very much below average. Testing was effortful, and error types were consistent with those found in spontaneous speech. Out of a total of 55 errors, most were omissions (34.5%), and the rest were distributed among paraphasias (20%, most often semantic), perseverations (16.4%), neologisms (9.1%), phonemic transformations (10.9%, 3.4% of which became lexical), and other incomplete responses (9.1%). In contrast, automatic speech was well preserved except for a few phonemic transformations, with normal performance on most tests involving series.
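As a rough consistency check on the error distribution reported above, the percentages can be converted back into whole-number counts out of 55. This is only a sketch: the per-category counts are not given in the text, so the values computed here are inferred by rounding and should be read as an assumption, and the function name is hypothetical.

```python
# Reported percentages of the 55 errors, by category (taken from the text).
TOTAL_ERRORS = 55
reported_percentages = {
    "omissions": 34.5,
    "paraphasias": 20.0,
    "perseverations": 16.4,
    "neologisms": 9.1,
    "phonemic transformations": 10.9,
    "other incomplete responses": 9.1,
}


def implied_counts(percentages, total):
    """Return the nearest whole-number count implied by each percentage.

    Assumption: each reported percentage is a rounded value of count/total.
    """
    return {name: round(total * pct / 100.0) for name, pct in percentages.items()}


counts = implied_counts(reported_percentages, TOTAL_ERRORS)
for name, count in counts.items():
    print(f"{name}: {count}")
print("sum of implied counts:", sum(counts.values()))
```

If the implied counts add up to the stated total, the reported percentages are mutually consistent.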

Table 2

C.C.’s performance on linguistic assessments in session 1 (6 months post‐infarct) and session 2 (33 months post‐infarct)

Test | Session 1 | Session 2
Boston Diagnostic Aphasia Examination
Naming | 5/60, severity 1 | 16/60, severity 2
Automatized speech
Digits 1 to 10 | n | n
Days of the week | n | n
Months of the year | n | n
Words of the familiar song ‘Au clair de la lune’ | n–
Melody of the familiar song ‘Au clair de la lune’ | n
MT‐86β aphasia battery
Verbal fluency | 2 | 9
High and low frequency words and nonwords | 24/25 | 25/25
Short sentences (4 words) | 1/1 | 1/1
Long sentences (8–10 words) | 0/2 | 0/2
Oral comprehension
Word and sentence picture matching | 32/47
Body‐part identification under oral instruction | 6/8
Body‐part identification under written instruction | 4/8
Object manipulation | 2/8
Word reading | 13/30
Token test | 17.5/36

n = normal performance; n– = production below normal, i.e. with phonemic transformations.

Comprehension was impaired, but less severely than expression, and related in great part to C.C.’s working memory problems. For instance, in the Token test, his performance was 14 out of 15 on the first 15 items (five words or less), and degraded promptly when the instructions were eight words long or more. Repetition involving words and short phrases was normal and dropped when sentences involved eight words or more. The diagnosis was crossed mixed aphasia, with a more severe deficit on the expressive side.

Musical assessment and automatized speech

C.C. was not a formally trained musician, but had been an amateur singer all his life (he still loves to sing), both solo and in choirs. C.C. provided us with tapes containing live performance of his singing before his accident, and therefore we are confident that he had excellent pre‐morbid singing abilities. A preliminary musical assessment on the Montreal Battery of Evaluation of Amusia (MBEA; Liégeois‐Chauvel et al., 1998) indicated that C.C. was within the normal range on all subtests, except in the scale discrimination condition and incidental memory test, where C.C. performed slightly below the controls, and within, or just above, 1 SD from the controls’ mean. The controls were nine elderly subjects with a mean age of 58.2 years (range 55–64) and on average 13.9 years (range 7–20) of education. We assessed C.C.’s memory for highly familiar songs by asking him to make a familiarity judgement for 20 tunes (without lyrics) presented in random order, half being familiar and the other half novel (see Table 3). C.C. could correctly classify 18 out of the 20 tunes.

Table 3

C.C.’s performance scores on various musical tests and automatized speech (other than the ones involved in the language assessment)

Test | C.C. | Controls [mean (range)]
Musical tests
Lexical | 19/20 | 19.7/20 (18–20)
Scale | 24/30 | 26.4/30 (25–29)
Contour | 22/30 | 26.1/30 (22–29)
Interval | 23/30 | 25/30 (21–28)
Rhythm | 26/30 | 28.8/30 (22–30)
Meter | 25/30 | 23.7/30 (21–27)
Incidental memory | 24/30 | 27.7/30 (26–30)
Familiarity judgement | 18/20
Continuation of familiar songs:
Automatized speech

C.C.’s ability to retrieve the lyrics of well‐known songs was assessed by examining whether or not he could continue songs when given the first part. He was given the first half‐phrase of song under three conditions: sung on a neutral syllable, with the lyrics spoken, or sung with lyrics and music. He was asked to carry on in the same manner (i.e. to continue either the tune only, the lyrics only or the song), and could sing the tunes on a neutral syllable without any difficulty (20 out of 20), but could not continue the lyrics of any of the songs (0 out of 20). When given the songs, he could accurately sing about half of them with words and music, either perfectly or with some errors (14 out of 20).

C.C.’s automatic speech was preserved for prayers, but not for proverbs, as he could recover the last part of only four out of 18 very popular sayings (yet in seven instances C.C. missed one or two words only).

In summary, despite very impaired spontaneous speech abilities and severe expressive aphasia, this initial assessment established that C.C. was not amusic. Cueing him with songs (words and music) seemed to help him continue the lyrics, but it was not clear whether the music would help him to retrieve the words of songs under more controlled conditions. Since his ability to repeat was relatively preserved, repetition was used in the following experiments.

Patients and methods



The patient, C.C., participated in experiments 1 and 2. The same materials (with some exceptions, as described below) served in two testing sessions at two different time points (session 1 at 6 months post‐infarct; session 2 at 33 months post‐infarct).


Control data for the two experiments were obtained once from four healthy retired policemen with no history of neurological or psychiatric diseases, at the time of session 2 for C.C. Their socio‐economic backgrounds, handedness and age closely matched those of C.C. (mean age 61.8 years, range 59–66). None of them had formal musical training, and they were all singers in a police amateur choir. All subjects (C.C. and controls) gave their informed consent to all tests administered. The study was approved by The Ethics Committee of the Research Center on Aging of The Sherbrooke Geriatric University Institute.

Procedure for C.C.

The excerpts were sung to the patient by the experimenter. The live procedure ensured good contact with C.C. and a dynamic environment, and also enabled the use of visual as well as auditory cues. This situation therefore placed C.C. in the best testing conditions. C.C. was instructed to repeat each excerpt immediately after hearing it. The whole testing session was recorded using a portable digital audio tape (DAT) recorder.

Procedure for controls

All the excerpts from testing session 2 (i.e. excerpts as sung by the experimenter to C.C.) were extracted from the DAT tape, and presented to controls. In other words, controls heard the excerpts as they were actually sung or spoken to C.C. They were placed in slightly more difficult testing conditions, however, since they were not presented with a live performance and hence could not use visual cues. The controls were tested individually, and their own testing session was also recorded.

Data scoring

All the productions were saved in individual computer sound files. Two musically trained judges made independent quantitative and qualitative scorings of both texts and melodies.

For text, the percentage of correctly repeated words was calculated. Percentage of words, rather than syllables, was considered as the dependent variable, since numbers of syllables sometimes differ between sung and spoken renditions. Elisions (equally present in both spoken and sung versions) were considered as part of the word to which they were attached.

Criteria for considering a word as ‘incorrect’ were as follows: any change from the originally presented words (see below the type of change), omissions or inversion of words. A point was withdrawn from the raw score for any addition of words or (unintelligible) word string at the beginning or within the utterance. Errors were classified according to six relevant linguistic categories: phonemic, lexical, omissions, inversions, additions and neologisms/unintelligible.

For melodies, the percentage of correctly repeated notes was calculated. Out‐of‐tune or missing notes were considered as mistakes. One point was withdrawn for each additional note, and for rhythmic mistakes.
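The scoring rules described above can be sketched as a small function. This is a minimal illustration, not the authors' scoring procedure: the function name and parameters are hypothetical, and the floor at zero is an assumption, since the text does not say what happens when penalties exceed the number of correct items.

```python
def percent_correct(n_target, n_correct, n_penalties=0):
    """Percentage of target items (words or notes) repeated correctly.

    n_target    -- number of words (or notes) in the presented excerpt
    n_correct   -- number of those items reproduced correctly
    n_penalties -- points withdrawn, e.g. one per added word or word string
                   (text), or one per additional note or rhythmic mistake
                   (melody)
    """
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    if not 0 <= n_correct <= n_target:
        raise ValueError("n_correct must lie between 0 and n_target")
    if n_penalties < 0:
        raise ValueError("n_penalties cannot be negative")
    # Assumption: the raw score cannot drop below zero.
    raw_score = max(n_correct - n_penalties, 0)
    return 100.0 * raw_score / n_target


# Hypothetical 6-word excerpt: 5 words correct, 1 word added by the speaker.
print(round(percent_correct(6, 5, 1), 1))
```

The same function serves for melodies by counting notes instead of words, with out‐of‐tune or missing notes excluded from `n_correct`.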

Experiment 1: familiar song production

In experiment 1, we investigated C.C.’s performance on familiar songs. In session 1, we were particularly interested in determining whether or not singing the original familiar song (original words with original music, i.e. matched songs) would be better reproduced than singing the familiar words to an equally familiar, but different, melody (mismatched songs). In other words, we were interested in assessing the effect of singing per se in comparison with singing the original songs. If singing per se were a facilitator for word production, either by virtue of speed reduction or some other means, then singing familiar words should yield comparable performance whether words were sung to the original or to a mismatched melody. In session 2, two conditions were added, that is, a spoken condition where C.C. had to say the words of the songs in a natural manner, and a condition where he had to sing the melodies of the songs on a neutral syllable ‘la’. If singing helps to produce words accurately, then the sung versions should yield higher performance than the spoken ones.


Sixteen pairs of highly familiar songs were selected from a repertoire of childhood and traditional songs (Peretz et al., 1995). Excerpts were 9.5 notes on average (range 7–16), and 6.7 words (range 4–11). Two long excerpts were shortened for session 2, reducing the average number of notes to 8.5 notes (range 7–11) and the number of words to 5.9 (range 4–7). Excerpts are presented in Appendix A (available as supplementary material at Brain Online). The song of each given pair was interchangeable in terms of text and melody with another song, thus generating two new, mismatched, songs with every pair of familiar songs (see Fig. 2).

Fig. 2 Example of how two mismatched songs were constructed from two matched songs in experiment 1.

There were two experimental conditions in session 1. In the first condition, the original songs (original text and melody) were sung (mean duration 4.47 s, SD 1.21, range 2.7–7.4 s). In the second condition, the mismatched songs (text and melody interchanged) were sung (mean duration 4.54 s, SD 1.35, range 2.9–7.7 s). There was no significant difference between the durations of these versions [t(14) = –0.22, P = 0.83]. In session 2, two ‘isolated’ conditions were added, that is, the spoken version (mean duration 2.52 s, SD 0.61, range 1.7–3.9 s) and the melody on the neutral syllable ‘la’ (mean duration 4.35 s, SD 1.14 s, range 2.9–6.9 s). As expected, the durations of the productions were significantly different [F(3,42) = 14.18, P < 0.001]. The spoken versions were produced at a faster rate than the other versions (P < 0.001), but the latter did not differ from each other (all P values >0.05). Thus, on average, the spoken versions were 1.67 times faster than the sung versions.


In session 1 (C.C. only), trials including the matched and mismatched melodies used with a given set of lyrics were presented in pairs. That is, the same lyrics were presented twice in a row, once with the familiar and once with the mismatched melody, in a counter‐balanced order.

In session 2 (C.C. and controls), the spoken versions were added. Each condition (matched, mismatched and spoken) was split into three blocks of five or six excerpts, and organized in such a way that order of presentation of these three conditions was counter‐balanced across excerpts. A short pause followed every block. The melodies on the syllable ‘la’ were presented in one single block at the end of the testing session.

Results and comments

Inter‐rater agreement

Inter‐rater agreements were calculated separately for language and music for C.C. and his controls (collapsed across sessions). For language (spoken and sung versions), the inter‐rater correlations were: r(76) = 0.98, P < 0.001 for C.C., and 1 for controls. For music (matched and mismatched versions), the inter‐rater correlations were r(77) = 0.98, P < 0.001 for C.C., and r(174) = 0.95, P < 0.001 for controls. Overall, there were very few words and notes for which no consensus could be reached among raters (between 0 and 3.4% of productions), and those were withdrawn from the analyses.
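For readers unfamiliar with the r(df) notation, the sketch below computes a correlation between two judges' scores together with its degrees of freedom (number of paired scores minus 2, so r(76) corresponds to 78 paired scores). It assumes the reported values are Pearson correlations, which the text does not state explicitly; the scores in the example are hypothetical and are not data from the study.

```python
import math


def pearson_r(scores_a, scores_b):
    """Return (r, df) for two equally long lists of scores.

    r  -- Pearson product-moment correlation
    df -- degrees of freedom, i.e. number of pairs minus 2
    """
    n = len(scores_a)
    if n != len(scores_b):
        raise ValueError("both judges must score the same productions")
    if n < 3:
        raise ValueError("at least three paired scores are needed")
    mean_a = sum(scores_a) / n
    mean_b = sum(scores_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    var_a = sum((a - mean_a) ** 2 for a in scores_a)
    var_b = sum((b - mean_b) ** 2 for b in scores_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when one judge's scores are constant")
    return cov / math.sqrt(var_a * var_b), n - 2


# Hypothetical percent-correct scores from two judges for five productions.
judge_1 = [100.0, 80.0, 60.0, 100.0, 50.0]
judge_2 = [100.0, 80.0, 70.0, 100.0, 50.0]
r, df = pearson_r(judge_1, judge_2)
print(f"r({df}) = {r:.2f}")
```

The guard against constant scores matters here: if every production receives the same score from a judge, as can happen when performance is at ceiling, the correlation cannot be computed.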

The percentage of correctly repeated words and notes for each excerpt in each condition served as dependent variables. Data are shown in Table 4. Owing to a technical error in session 2, one song was removed from the analyses for that session.

Table 4

Results (percentage correct) for familiar songs (experiment 1)

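Version | Music (%): C.C., session 1 | Music (%): C.C., session 2 | Music (%): controls, session 2* | Language (%): C.C., session 1 | Language (%): C.C., session 2 | Language (%): controls, session 2
Isolated | – | 89.8 | 94.9 (85.3–98.4) | – | 72.2 | 100
Matched | 99.0 | 85.6 | 96.2 (90.3–98.3) | 87.2 | 86.5 | 100
Mismatched | 95.0 | 97.8 | 90.5 (70.6–98.6) | 78.4 | 66.9 | 100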

*Ranges shown in parentheses.

C.C.’s performance

An ANOVA (analysis of variance) was conducted with excerpt as the random factor, and session (1 versus 2), condition (music versus language) and version (matched versus mismatched) as within‐item factors. This analysis revealed a significant main effect of condition [F(1,14) = 27.40, P < 0.001], with an overall better performance on music than on language (with 94.1 and 79.2%, respectively). More interestingly, the analysis also yielded a significant interaction between condition and version [F(1,14) = 10.33, P < 0.01]. Performance between matched and mismatched versions did not differ for music (with 91.9 and 96.3% for matched and mismatched, respectively; P > 0.05), but did on language (with 86.4 and 72.0% for matched and mismatched, respectively; P < 0.01). Thus, C.C. produced more words when music and lyrics were set in their original combination.

C.C. versus controls

The next analysis compared C.C.’s performance in session 2 with that of the controls. C.C.’s performance was well within the range of the controls in the music condition, with 91.7% (range for controls 70.6–98.6%), but not in the language condition, where C.C. performed at 75.2% and the controls reached perfect performance in the three versions. As normality of distributions could not be assumed, nonparametric tests were run to examine the performance of C.C. and of controls separately. Friedman’s tests revealed that C.C.’s performance in the three versions did not differ from each other in the music [χ2(2) = 1.62, P = 0.45] or in the language condition [χ2(2) = 2.47, P = 0.29]. This pattern of performance was also found for the controls, but with a trend for matched songs yielding better performance than mismatched songs in the music condition [χ2(2) = 5.087, P = 0.08]. The controls’ performance in the language condition was at ceiling in all three versions.
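The Friedman test used above can be sketched in plain Python. This is an illustrative implementation of the standard statistic with tie correction, not the authors' analysis script: each row stands for one excerpt and each column for one version, and the function names are hypothetical. With three versions the statistic has 2 degrees of freedom, for which the chi-square upper tail has the closed form exp(−χ²/2); applied to the χ² values quoted in the text, it gives values close to the reported P values.

```python
import math


def average_ranks(values):
    """Rank values from 1 to k; tied values share their average rank.

    Also returns the tie term sum(t**3 - t) over groups of t tied values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = shared_rank
        group_size = j - i + 1
        tie_term += group_size ** 3 - group_size
        i = j + 1
    return ranks, tie_term


def friedman_chi2(rows):
    """Friedman chi-square for n rows (excerpts) by k columns (versions)."""
    n = len(rows)
    if n == 0:
        raise ValueError("at least one row is needed")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every row needs the same number (>= 2) of scores")
    rank_sums = [0.0] * k
    ties = 0
    for row in rows:
        ranks, tie_term = average_ranks(row)
        ties += tie_term
        for col, rank in enumerate(ranks):
            rank_sums[col] += rank
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - ties / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("statistic undefined: all scores tied within every row")
    return stat / correction


def p_value_df2(chi2):
    """Upper-tail probability of a chi-square value with 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


# Chi-square values quoted in the text, with their implied P values.
for chi2 in (1.62, 2.47, 5.087):
    print(f"chi2(2) = {chi2}: P = {p_value_df2(chi2):.3f}")
```

The tie correction is included because percentage scores on short excerpts often coincide across versions, for example when several versions of the same excerpt are repeated perfectly.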

Error types for C.C. are shown in Table 5. Because of the small number of errors in each category, χ2 could not be computed. However, the error pattern looks similar across the three conditions (spoken, matched and mismatched), and is consistent with the language assessment tests. That is, errors are mostly characterized by omissions, and phonemic (e.g. mulε̃/windmill becomes numε̃) and lexical errors, with the exception of inversions, which occurred more often in the spoken version than in the sung versions.

Table 5

Error types for familiar (experiment 1) and unfamiliar (experiment 2) songs: percentage of errors made by C.C., averaged across sessions

Phonemic (%) | Lexical (%) | Omissions (%) | Additions (%) | Inversions (%) | Neologisms/unintelligible (%)

Overall, results of this experiment show a dissociation between C.C.’s musical and language abilities, and show that C.C.’s performance remained similar more than 2 years after his brain infarct. More importantly, these results do not support the claim that singing words yields better performance than speaking the same words, even though motor programming of sung versions enjoyed a privileged status in memory over spoken versions. C.C.’s performance in the language condition, however, was overall quite high, in that C.C. could repeat correctly on average between 65 and 86% of the words of the song excerpts, depending on the condition involved. Although this performance on language was still very much below the perfect performance of his controls, this was an outstanding achievement given his very impaired spontaneous speech abilities. This underlines the contrast between generative speech and rote memory: C.C.’s spontaneous speech is very poor, yet for a number of familiar song excerpts he could nevertheless produce 100% of the words accurately. This is a very striking contrast if considered in isolation, and supports the idea that songs are encoded as highly automatized word strings in memory.

What our results demonstrate is that across a pool of highly familiar songs, word production is similar across speaking and singing. Therefore, neither the act of singing nor the presence of the musical part of the songs helps C.C. to produce words more accurately, although C.C. was generally more fluent with this material than in everyday spontaneous speech. This was also supported by the finding that the three versions, that is, matched, mismatched and spoken versions, yielded comparable performance. Therefore, singing per se, i.e. singing words onto an equally familiar, but not the original, melody, was no better for word production than speaking. It should be noted that producing words and music was not detrimental to C.C.’s production of notes, since singing the tunes on a neutral syllable yielded the same performance, in terms of number of notes, as singing with lyrics. This was also true for the controls.

Experiment 2: unfamiliar song production

In experiment 2, C.C. was presented with novel songs. Thus, the sung versions had no particular advantage over the spoken versions, given that none had ever been previously heard or produced. As in the previous experiment, C.C. had to repeat each novel excerpt under the different experimental conditions (i.e. sung, spoken, sung‐to‐la melody) after having heard it. If singing improves word production, repetition of sung words should be better than repetition of spoken words. On the other hand, if sung and spoken production stem from the same processes, performance for sung words should not differ significantly from that for spoken words.


Sixteen unfamiliar songs were selected from a repertoire of childhood songs (les plus belles chansons, 1995). On average, excerpts were 9.4 notes (range 7–14) and 5.6 words (range 4–8) long. Three excerpts were shortened for testing session 2, reducing the average number of notes to 8.2 (range 6–13) and the number of words to 4.9 (range 4–7). Excerpts are presented in Appendix A (available as supplementary material at Brain Online). Each excerpt served in three different experimental conditions. In the first condition, the melodic part of the song was sung on the neutral syllable ‘la’ without accompaniment (mean duration 3.62 s, SD 0.86, range 2.55–5.42 s). In the second condition, the text of the songs was spoken in a natural manner (mean duration 2.22 s, SD 0.59, range 1.2–3.72 s). In the third condition, the song (text and melody) was sung without accompaniment (mean duration 3.54 s, SD 0.76, range 2.55–5.42 s). An ANOVA on durations taking sessions (1 versus 2) as the between‐items factor, and versions (spoken, sung, sung‐to‐la) as the within‐items factor yielded no significant effect of session (F < 1), but a significant main effect of version [F(2,60) = 185.29, P < 0.001]. The mean duration for the spoken version was shorter than for the two other versions (P < 0.01), and the latter did not differ from each other (P = 0.30, by post hoc comparisons). Again, on average, the spoken version was produced 1.6 times faster than the other conditions.
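The factor of 1.6 can be checked directly from the mean durations reported above. The short sketch below assumes that the ratio compares the spoken version with the unweighted average of the two sung versions; the variable and function names are illustrative only.

```python
# Mean durations (in seconds) of the unfamiliar excerpts, as reported in the text.
mean_duration = {"sung_to_la": 3.62, "spoken": 2.22, "sung": 3.54}


def speed_ratio(durations):
    """Mean duration of the two sung versions divided by the spoken duration."""
    sung_mean = (durations["sung_to_la"] + durations["sung"]) / 2
    return sung_mean / durations["spoken"]


ratio = speed_ratio(mean_duration)
print(f"{ratio:.2f}")  # prints 1.61
```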


In session 1 (C.C.), conditions 1 and 2 (i.e. isolated) were split into two blocks of eight excerpts presented in a random order. Half of the spoken excerpts were presented first, followed by half of the melodies sung on the syllable ‘la’, followed by a short pause. The other block contained the remainder of the excerpts. Finally, the songs were presented in one single block.

In session 2 (C.C. and controls), the three conditions were split into three blocks of five or six excerpts. These were presented in a counterbalanced order. A short pause followed every block. The melodies on the syllable ‘la’ were presented in one single block at the end of the testing session.

Results and comments

Inter‐rater agreement

Inter‐rater agreements were calculated separately for language and music for C.C. and controls (collapsed across sessions), and were again very high. For language (spoken and sung versions), the inter‐rater correlations were r(62) = 0.95, P < 0.001 for C.C., and r(126) = 0.99, P < 0.001 for controls. For music (sung‐to‐la and sung versions), the inter‐rater correlations were r(62) = 0.96, P < 0.001 for C.C., and r(126) = 0.95, P < 0.001 for controls. The very few words and notes for which no consensus could be reached among raters (between 0.8 and 8.6% of productions) were withdrawn from the analyses.
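As an illustration of how such an agreement index can be obtained, the sketch below computes a correlation between two raters' scores together with its degrees of freedom (number of paired scores minus 2). It assumes a Pearson coefficient, which the text does not state explicitly, and the rater scores are hypothetical. Under that assumption, the reported r(62) for C.C. would correspond to 64 paired scores, which matches 16 excerpts × 2 versions × 2 sessions.

```python
import math


def pearson_r(x, y):
    """Return (r, df) for two equally long lists of scores; df = n - 2."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two lists of equal length with at least 3 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one rater gives constant scores")
    return sxy / math.sqrt(sxx * syy), n - 2


# Hypothetical percentage-correct scores given by two raters to five excerpts.
rater_a = [100.0, 80.0, 62.5, 40.0, 87.5]
rater_b = [100.0, 75.0, 62.5, 50.0, 87.5]
r, df = pearson_r(rater_a, rater_b)

# Degrees of freedom expected for C.C. under the assumption stated above.
df_cc = 16 * 2 * 2 - 2  # 62
```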

The percentage of correctly repeated words and notes for each excerpt in each condition served as dependent variables. Results are shown in Table 6.

Table 6

Results (percentage correct) for unfamiliar songs (experiment 2)

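Version | Music: C.C. session 1 (%) | Music: C.C. session 2 (%) | Music: controls (%) | Language: C.C. session 1 (%) | Language: C.C. session 2 (%) | Language: controls (%)
Isolated | 82 | 90 | 89.1 (77.8–96.8) | 43.4 | 68.2 | 100
Sung | 79.5 | 84.2 | 92.2 (87.8–96.8) | 50.9 | 51.7 | 99.8 (99.1–100)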

*Ranges shown in parentheses.

C.C.’s performance

An ANOVA was conducted with excerpt as the random factor, and sessions (1 versus 2), condition (music versus language) and version (isolated—sung‐to‐la or spoken—versus sung) as within‐item factors. This analysis revealed a significant effect of condition [F(1,15) = 22.22, P < 0.001], the music condition yielding much better overall performance than the language condition (with 84 versus 53.9%, respectively). Although there was a trend for overall performance to improve from session 1 to session 2 (64.4 versus 73.4%, respectively), it was not significant [F(1,15) = 3.77, P < 0.08]. There was no other significant or near‐significant main effect or interactions.
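The overall means quoted above can be approximately recovered from C.C.'s cell values in Table 6. The sketch below takes unweighted averages of the eight cells. This is an assumption: the reported means were presumably computed over excerpts, so differences of a few tenths of a percentage point are to be expected. The dictionary layout and function name are illustrative.

```python
# C.C.'s cell means (% correct) from Table 6, keyed by (condition, version, session).
cc_scores = {
    ("music", "isolated", 1): 82.0,
    ("music", "isolated", 2): 90.0,
    ("music", "sung", 1): 79.5,
    ("music", "sung", 2): 84.2,
    ("language", "isolated", 1): 43.4,
    ("language", "isolated", 2): 68.2,
    ("language", "sung", 1): 50.9,
    ("language", "sung", 2): 51.7,
}


def marginal_mean(scores, position, level):
    """Unweighted mean of all cells whose key has `level` at index `position`."""
    values = [v for key, v in scores.items() if key[position] == level]
    return sum(values) / len(values)


music_mean = marginal_mean(cc_scores, 0, "music")        # close to the reported 84%
language_mean = marginal_mean(cc_scores, 0, "language")  # close to the reported 53.9%
session1_mean = marginal_mean(cc_scores, 2, 1)           # close to the reported 64.4%
session2_mean = marginal_mean(cc_scores, 2, 2)           # close to the reported 73.4%
```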

C.C. versus controls

A second analysis was run to compare C.C.’s performance in session 2 with that of the controls. Once again, C.C. performed in the range of the controls in the music condition, with 87.2% (range of controls 77.8–96.8%), but not in the language condition, with 60% (range of controls 99.1–100%). Nonparametric tests revealed that C.C.’s performance did not differ between the two versions in the music condition [χ2(1) = 0.00, P < 1.00], or in the language condition [χ2(1) = 1.33, P = 0.25]. The same pattern was found for controls, with χ2(1) = 1.60, P = 0.21 in the music condition, and χ2(1) = 1.00, P = 0.32 in the language condition. Thus, the performance of both C.C. and the controls did not differ from one version to another (i.e. isolated, whether spoken or sung‐to‐la, versus sung).
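The P values attached to these χ2 statistics can be checked with the closed-form upper-tail probability of the chi-square distribution, which is simple for 1 and 2 degrees of freedom. The sketch below handles only those two cases. The function name is illustrative, and because the reported χ2 values are themselves rounded, agreement is expected only to within rounding.

```python
import math


def chi2_upper_tail(chi2, df):
    """Upper-tail probability of a chi-square value, for df = 1 or df = 2 only."""
    if chi2 < 0:
        raise ValueError("chi-square values cannot be negative")
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# (chi2, df, reported P) for three of the tests described above.
reported = [(1.33, 1, 0.25), (1.60, 1, 0.21), (1.00, 1, 0.32)]
for chi2, df, p_reported in reported:
    p = chi2_upper_tail(chi2, df)
    print(f"chi2({df}) = {chi2:.2f}: computed P = {p:.3f}, reported P = {p_reported}")
```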

The scores for each error type were collapsed across conditions (spoken versus sung) and sessions (1 versus 2). The error types were similar across the sung and spoken conditions, and did not differ statistically [χ2 = 0.14, not significant] (see Table 5).

The results of this experiment show that C.C. did not perform better when singing than when speaking, despite intact musical abilities. When presented with novel phrases, C.C.’s performance on language was lower than that of the controls, irrespective of the version produced. The reduced speed, regularity or syllable chunking imposed by singing seemed insufficient to produce a better performance in sung versions than in spoken versions. Rather, there is a (non‐significant) trend for the music to impose an additional burden on C.C.’s ability to produce words, as his performance tended to be poorer in the sung condition than when there was no music associated.

Overall, however, C.C. was able to repeat about half of the words of the songs, either sung or spoken. Again, this is a remarkable achievement given his poor ability to generate spontaneous speech. Repetition was a task that assisted our non‐fluent patient in producing words that would have been otherwise difficult or impossible to articulate. Presumably, both the model of articulation provided by the experimenter during testing and the fact that there was no need to generate speech were beneficial to our patient. The fact that repetition would be a better means than simple production (without a model) to assess word production was suggested from the formal language assessment, although these repetition tasks were not sensitive enough to perfectly predict C.C.’s performance in our tasks. In the formal language assessment, C.C. performed quite well on word repetition, but did not do very well with long sentences because of his working memory problems.

The important aspect to bear in mind is that C.C.’s striking ability to produce song excerpts was not suspected from his spontaneous conversation. Thus, our findings do not contradict the clinical observation that patients who cannot sustain a spontaneous conversation can nevertheless sing. What our results demonstrate, however, is that patients who can sing can also articulate words of those songs, if put in the right conditions to do so.

However, having previous knowledge of the songs made it easier for C.C. to repeat the lyrics, whereas it did not change his performance on music. This was verified by an ANOVA that compared C.C.’s performance (in session 2) on familiar and unfamiliar materials, as a function of conditions (music versus language) and versions (isolated—spoken or sung‐to‐la—versus sung). As expected, the interaction between material and condition was significant [F(1,30) = 4.26, P < 0.05]: C.C.’s performance on music did not differ across material, with 87.7 and 87.2% on familiar and unfamiliar materials, respectively (P > 0.05), but differed on language, with 79.3 and 60% for familiar and unfamiliar materials, respectively (P < 0.01). This confirms the fact that C.C. is at ease with music, either familiar or unfamiliar, and that familiar song representations encoded in his long‐term memory helped him to produce the words originally associated with the music.


The main finding of the present study is that singing does not facilitate word articulation in the case of a non‐fluent aphasic patient. This applies to both pre‐learned and novel songs. Music did not play a facilitating role in word production, by virtue of either mechanical constraints, including speed reduction, or cognitive load, such as syllable chunking and rhythmic anticipation. Rather, word articulation seems to be governed by mechanisms that are insensitive to the mode of expression, be it sung or spoken.

C.C. represents a classical instance of aphasia without amusia: he performed normally when he had to produce the musical parts of songs, but at a much lower level when he had to repeat the words, either sung or spoken. Such a dissociation between performance on parts of the same stimuli (i.e. songs) consisting of both a verbal and a musical part, is not banal. Indeed, aphasia often occurs jointly with amusia, most likely because a natural lesion is likely to affect cognitive functions such as music and language that depend on systems lying in close proximity in the brain. C.C. is yet another case demonstrating a dissociation between music and language skills (e.g. Peretz et al., 1994, 1997; Hébert and Peretz, 2001; Steinke et al., 2001). The present case study of C.C. serves as the first demonstration that language and music can be dissociable at the level of production. To date, all previous reports have involved perception and memory tasks. Thus, C.C.’s results indicate that different networks subserve music and language, and that even in songs, the musical and the language parts are processed by independent mechanisms.

Another important contribution of this study is from a methodological perspective. We showed that when music and speech are compared under identical testing conditions they maintain their functional autonomy. This involved comparing production of the same utterances in both speech and singing. In most prior studies, spontaneous speech was simply contrasted with singing well‐known songs. From this perspective, C.C. is not unique; he could also reproduce the words of familiar songs with few errors and quite fluently. Thus, C.C.’s results are consistent with the classical claim of non‐fluent aphasic patients still being able to sing. The contrast between generative and rote memory, as exemplified by his spontaneous speech and his song production, respectively, is indeed remarkable. Despite the fact that the mean number of words correctly repeated was not significantly different when singing than when speaking, C.C.’s singing gave the raters a feeling of fluency that was particularly strong. This impression of fluency, presumably produced by legato (i.e. no pauses between words), is not captured in the overall scores presented in this study. Unfortunately, fluency is poorly defined, and its corresponding acoustical cues are not known (Gordon, 1998). Therefore, the impression of fluency in singing certainly contrasts with the limited and jerky spontaneous speech output of non‐fluent aphasic patients. However, a rigorous comparison between sung and spoken productions yields a quite different picture: when a non‐fluent aphasic patient is able to sing a familiar song with words, he is also able to produce the corresponding words in a spoken fashion. The same is true for novel materials for which there were no pre‐existing mental representations.

If singing does not facilitate word articulation, then the MIT should perhaps no longer be considered as key a tool in this endeavour. However, as mentioned previously, there are additional benefits that the MIT may provide. For instance, it has recently been suggested that a treatment emphasizing the rhythmic attributes of target utterances improved repetition to a greater degree than one emphasizing their melodic attributes (Boucher et al., 2001). Similarly, reduction of speech rate, improvement of vocabulary and maintenance of proper breathing may all contribute to the improvement of spontaneous speech. Extra‐linguistic aspects such as maintaining motivation and high spirits in patients after brain damage by feeling competent in singing should also be taken into consideration. There remains a great need for formal assessments of the MIT interventions, along with detailed information about patients to be included.

A further aspect of the study that is worthy of discussion is the fact that C.C. became aphasic as a consequence of a right hemispheric lesion. The question is to what extent a reversed brain organization for language (in a right‐hander) has implications for song performance. At the behavioural level, the type of aphasia displayed by C.C. is classic, in that C.C. displays a pattern of performance that is typical for a non‐fluent aphasic patient. In support of this claim, Coppens and colleagues made a thorough analysis of published crossed aphasia cases, and concluded that the symptomatology of aphasia displayed by these patients (be it categorized as mirror‐image or anomalous aphasia type) does not differ from the one displayed by left hemisphere damaged patients (Coppens et al., 2002). In addition, following the criteria defined by the Therapeutics and Technology Assessment Subcommittee of the American Academy of Neurology (1994) based on phenomenology rather than hemisphere of lesion, C.C. may have been a choice candidate for the MIT, especially since he could sing without difficulty. C.C.’s typical profile suggests that his performance in our experiments is representative of the performance of non‐fluent aphasic patients in general. Thus it is expected that our findings would be replicated in other patients with similar types of language impairments. At the very least, our study provides a robust way of testing this prediction in other patients.


We wish to thank C.C. for his generous and enthusiastic participation in this study. We also wish to thank Dr Howard Chertkow for scientific advice on anatomical data, Drs Denis Bergeron, Tamàs Fülöp and Julie Brazeau‐Lamontagne for their help in obtaining scans, Joël Macoir for his help in the language assessment, Nancy Blanchette for help in data scoring, and Renée Béland, who provided insightful discussions at several stages of the study. This research was supported by an FRSQ research fellowship and grant awarded to S.H., by a grant from the Medical Research Council of Canada to I.P. and a doctoral fellowship from the Medical Research Council of Canada to A.R.

