
Neural systems for speech and song in autism

Grace Lai, Spiro P. Pantazatos, Harry Schneider, Joy Hirsch
DOI: http://dx.doi.org/10.1093/brain/awr335 First published online: 1 February 2012

Summary

Despite language disabilities in autism, music abilities are frequently preserved. Paradoxically, brain regions associated with these functions typically overlap, enabling investigation of neural organization supporting speech and song in autism. Neural systems sensitive to speech and song were compared in low-functioning autistic and age-matched control children using passive auditory stimulation during functional magnetic resonance and diffusion tensor imaging. Activation in left inferior frontal gyrus was reduced in autistic children relative to controls during speech stimulation, but was greater than controls during song stimulation. Functional connectivity for song relative to speech was also increased between left inferior frontal gyrus and superior temporal gyrus in autism, and large-scale connectivity showed increased frontal–posterior connections. Although fractional anisotropy of the left arcuate fasciculus was decreased in autistic children relative to controls, structural terminations of the arcuate fasciculus in inferior frontal gyrus were indistinguishable between autistic and control groups. Fractional anisotropy correlated with activity in left inferior frontal gyrus for both speech and song conditions. Together, these findings indicate that in autism, functional systems that process speech and song were more effectively engaged for song than for speech and projections of structural pathways associated with these functions were not distinguishable from controls.

  • autism
  • functional MRI
  • DTI
  • language
  • music

Introduction

Autism is a complex developmental disorder currently estimated to affect ∼1:100 children (Centers for Disease Control, 2009; Kogan et al., 2009). It is defined by reduced social interaction, impaired communication and restricted interests and behaviour. One prevailing characteristic of autism is that it is a disorder associated with atypical brain connectivity affecting distributed neural systems (Belmonte et al., 2004; Just et al., 2004; Courchesne and Pierce, 2005; Minshew and Williams, 2007). Neuroimaging studies of language in high-functioning autistic subjects have reported decreased activation in Broca's area (left inferior frontal gyrus) as well as decreased functional and structural connectivity between frontal and posterior language processing regions (Just et al., 2004; Harris et al., 2006; Kana et al., 2006; Sahyoun et al., 2009; Fletcher et al., 2010). However, while these findings support disconnection models for autism that propose under-connectivity between long-range brain regions (Just et al., 2004; Minshew and Williams, 2007), disconnection models do not explain the frequently observed preservation of related functions such as music (Mottron et al., 2000; Heaton et al., 1999, 2007, 2008, 2009; Allen et al., 2009) where, in healthy adults, neural systems engaged during music and language functions tend to be highly coincident (Koelsch et al., 2002; Limb, 2006; Schon et al., 2010; Patel, 2011).

To investigate this paradox between impaired language and preserved music functions in autism, we combined functional MRI, functional connectivity and diffusion tensor imaging (DTI) to evaluate functional and structural systems sensitive to language and music in low-functioning autistic patients and typically developing age-matched controls. Functional MRI was used to identify relative activation of frontoparietal areas engaged during speech and song stimulation, and functional connectivity was used to determine the co-variation of functional MRI signals between cooperating regions. DTI tractography was used to compare the structural organization and integrity of fibre pathways connecting temporal and frontal auditory language and music areas between autistic and control groups. In particular, we examined a dorsally projecting pathway, known as the arcuate fasciculus, and a ventrally projecting pathway, comprising temporal–frontal connections via the extreme capsule, including the inferior fronto-occipital and the uncinate fasciculi (Schmahmann and Pandya, 2006; Anwander et al., 2007; Catani and Mesulam, 2008; Frey et al., 2008; Glasser and Rilling, 2008; Saur et al., 2008).

In typically developing subjects, frontal (i.e. inferior frontal gyrus, also known as Broca's area) and temporal–parietal regions (i.e. superior temporal, middle temporal and angular gyri, parts of which overlap with Wernicke's area) have been shown to be involved in both music and language processing (Koelsch et al., 2002; Brown et al., 2006; Limb, 2006; Schon et al., 2010; Patel, 2011). Disconnection models of autism (Just et al., 2004; Courchesne and Pierce, 2005; Sahyoun et al., 2009) propose that impaired functional and structural connectivity underlie decreased activation in left inferior frontal gyrus during language tasks. Alternatively, the reduced activity in left inferior frontal gyrus and the under-connectivity observed during language tasks have been proposed to reflect language-specific deficits that could result from upstream impairments in perceptual or linguistic processing of speech stimuli (Muller, 2007; Groen et al., 2008; Overy, 2009). In this study, we investigate whether language-specific deficits could be explained by decreased recruitment for speech relative to music, rather than by disconnection.

Materials and methods

Subjects

Thirty-nine patients with autism participated in this study, all recruited by physician referral. Twelve of these patients (mean age = 12.40 years, SD = 4.70, range = 7.01–22.47; males = 10; right-handed = 10) were imaged while alert. Images from an additional 27 patients (mean age = 8.62 years, SD = 3.14, range = 5.41–17.93; males = 22; right-handed = 15) who received MRI evaluations (structural, functional and DTI scans) for medical purposes under light propofol sedation were also included in this study, following parental consent. Twenty-one non-autistic controls (mean age = 10.72 years, SD = 4.42, range = 3.57–17.78; males = 14; right-handed = 19) were imaged alert and recruited via flyers distributed within the Columbia University Medical Centre and Columbia University campuses. Due to excessive head movement, several additionally recruited subjects (two control and four autistic subjects) were excluded from the final data set. All parents provided consent for their child to participate, or to include their clinical MRI examinations in this research study as approved by the Columbia University Medical Centre Institutional Review Board. When possible, assent was also obtained from the subjects. A subset of data from these same subjects was used in a previous study investigating the potential application of functional MRI for identification of autism (Lai et al., 2011).

Comparisons between autistic and control groups were based on subsets of age-matched subjects and functional MRI comparisons between autistic and control groups included only images acquired during alert conditions. Patients and controls were not matched on IQ since patients were low functioning. Within-group contrasts for the autism group included sedated and non-sedated subjects. Additionally, DTI comparisons included data from both non-sedated and sedated subjects for the benefit of an increased sample size. DTI images from 5/12 alert patients were excluded due to visible movement. Table 1 provides a summary of demographic information (age, gender and handedness) for subjects included in all DTI and functional MRI comparisons.

Table 1

Subject number and demographic information for DTI, functional MRI, functional connectivity and correlation analyses

Subjects                Mean age (SD)    Range          Female (n)   Left (n)   Amb (n)
DTI: autism versus control comparisons (autism non-sedated + sedated)
    Autism (n = 16)     11.02 (3.72)     5.81–17.93     2            2          1
    Controls (n = 18)   11.17 (4.39)     3.57–17.78     4            1          0
Functional MRI: autism versus control comparisons (all non-sedated)
    Autism (n = 12)     12.40 (4.69)     7.01–22.47     2            1          1
    Controls (n = 12)   12.06 (4.03)     4.19–17.78     3            1          0
Functional connectivity analyses, no between-group comparisons (autism non-sedated + sedated)
    Autism (n = 39)     9.61 (4.04)      5.41–22.47     4            7          6
    Controls (n = 21)   10.72 (4.42)     3.57–17.78     7            2          0
DTI and functional MRI correlations, no between-group comparisons (autism sedated)
    Autism (n = 27)     8.37 (3.05)      5.41–17.93     2            6          5
    Controls (n = 21)   10.72 (4.42)     3.57–17.78     7            2          0

Amb = ambidextrous.

Autistic children were eligible for the study if they met diagnostic criteria for autism on the Diagnostic and Statistical Manual of Mental Disorders-IV and the Autism Diagnostic Interview-Revised. Language impairment was measured using the Language and Communication subscale of the Autism Diagnostic Interview-Revised and clinical observation (Supplementary material). Control subjects were eligible to participate if they did not have a diagnosis of autism, a psychiatric disorder or siblings diagnosed with autism. Levels of normal social and academic functioning for controls were confirmed via scholastic performance at grade level and parent report. Both autistic and control children were without co-morbid neurological or developmental disorders, as determined by clinical evaluation performed by the referring physician for autistic subjects and parent report for control subjects.

Music affinity ratings

Due to the severity of impairment of autistic subjects in this study, a formal assessment of music function was not performed. Rather, parents were asked to rate how receptive their child was to the selected music on a scale from 0 to 10 defined as: 0, not at all, does not orient to music when playing, may as well be random noise; 5, moderately, will listen to and enjoy if playing, but will not request it; and 10, extremely, will request it to be played frequently and listen attentively for long periods of time.

Imaging procedures

Alert autistic and control subjects

We employed a ‘silent video’ technique to help minimize head movement and distractibility in young children. A familiar video was shown (on mute) throughout the scan duration. The silent video was presented via a rear-projection screen or MRI compatible goggles depending on the child's preference. Comparisons between auditory epochs and baseline revealed brain activity related to the auditory stimulus rather than the video that occurred continuously during both stimulus and baseline epochs.

Sedated autistic patients

Sedated patients were imaged under conventional clinical conditions, with propofol sedation, to rule out organic disease as part of a neurological assessment requested by the referring physician. Although sedation has been associated with reduced amplitude of the functional MRI signal during auditory stimulation (Heinke et al., 2004; Davis et al., 2007), it is indicated to map language systems in children under clinical conditions (Souweidane et al., 1999). See Supplementary material for description of anaesthesia management. Parents of eligible patients provided permission to include these medical scans.

Functional magnetic resonance imaging stimulation

Each functional MRI acquisition (run) was 2 min 24 s in duration, consisting of a 24-s period of background scanner noise, followed by four 15-s presentations of the auditory stimulus alternating with 15-s periods when the auditory stimulus was not presented. Two runs for each stimulus type (speech and song) were presented consecutively. The order of presentation was randomized across subjects. Auditory stimuli were pre-recorded by parents and presented passively to subjects via magnetic resonance-safe headphones. Although passive language stimulation primarily engages receptive processes, it is necessary for use with low-functioning children who cannot comply with task instructions during an imaging procedure. Activation in typical language areas has been previously reported during routine clinical assessments for alert (Hirsch et al., 2000) and sedated patients (Souweidane et al., 1999; Gemma et al., 2009) using similar stimulation techniques.
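The block timing described above can be laid out volume by volume. The following is a minimal sketch, assuming that each volume is labelled by its onset time, that the stimulus block comes first in each on/off pair, and that the 3-s repetition time reported for the functional sequence applies; the function and constant names are hypothetical and no haemodynamic convolution is applied.

```python
# Illustrative sketch only: names are hypothetical, timing values come from the run description.
TR_S = 3.0       # repetition time of the functional sequence (s)
LEAD_IN_S = 24   # initial period of background scanner noise (s)
BLOCK_S = 15     # length of each stimulus-on and stimulus-off block (s)
N_CYCLES = 4     # number of on/off pairs per run


def boxcar_per_volume(tr_s=TR_S, lead_in_s=LEAD_IN_S, block_s=BLOCK_S, n_cycles=N_CYCLES):
    """Return a list with one entry per volume: 1 = stimulus on, 0 = baseline."""
    run_s = lead_in_s + n_cycles * 2 * block_s
    n_volumes = int(run_s // tr_s)
    design = []
    for volume in range(n_volumes):
        onset = volume * tr_s
        if onset < lead_in_s:
            design.append(0)  # scanner-noise lead-in counts as baseline
        else:
            phase = (onset - lead_in_s) % (2 * block_s)
            design.append(1 if phase < block_s else 0)
    return design


design = boxcar_per_volume()
print("volumes per run:", len(design))
print("stimulus-on volumes:", sum(design))
```

A vector of this form is the kind of unconvolved on/off regressor that a general linear model package would then convolve with a haemodynamic response function.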

Speech stimuli were recordings of each child's own parents speaking in a natural and conversational manner to their child. All parents were instructed to talk about the same topics (i.e. being in the scanner, recent events, plans after the scan) although the text was not scripted in order to assure familiarity with each parent's conversational style. Song stimuli were selected as each child's favourite or preferred song containing vocals. Since autistic children often have fixed interests and are particularly receptive to familiar stimuli, it was necessary that speech and song stimuli were familiar and preferred for each subject. For the speech recordings, two independent raters judged whether the 15-s clips of voice recordings from autistic and control parents could be distinguished. Both raters judged the child's diagnosis with only 55% accuracy (11/20) with a 45% (9/20) correspondence. Close-to-chance levels of performance indicate that narratives from parents of autistic children did not differ perceptibly from controls. Audio stimuli were power normalized to ensure similar acoustic properties across subjects.

Magnetic resonance imaging acquisition

Alert autistic and control children were imaged using a research-dedicated 1.5 T GE Twin Speed magnetic resonance scanner located in the Functional MRI Research Centre at Columbia University Medical Centre. Clinical structural and functional images were acquired at the MR Imaging Centre of the Morgan Stanley Children's Hospital of New York-Presbyterian Hospital on a similar 1.5 T GE Twin Speed magnetic resonance scanner using identical sequences.

In both cases, functional MRI images were acquired using an echo planar T2*-weighted gradient echo sequence (echo time = 51 ms, repetition time = 3000 ms, flip angle = 83°). Twenty-seven contiguous axial slices covering the full brain were acquired along the anterior-posterior commissure plane, with a 192 × 192 mm field of view imaged on a 128 × 128 grid yielding an in-plane resolution of 1.5 × 1.5 mm and slice thickness of 4.5 mm. High-resolution structural images were acquired using a 3D spoiled gradient echo sequence (124 slices, 256 × 256, field of view = 220 mm), with a total scan time of 10 min 38 s. DTI images were acquired using an echo-planar sequence (repetition time = 8500 ms, echo time = 81.9 ms, 25 directions, b = 1000 s/mm²). Twenty-seven slices were acquired with a resolution of 1.02 × 1.02 mm and slice thickness of 5.00 mm on a 128 × 128 grid. The total scan time for the DTI acquisition was 3 min 58 s. Although the use of 25 diffusion directions constrains the ability to detect crossing fibres for tractography analyses, the shorter run time achieved by using fewer diffusion directions was necessary to minimize image acquisition time for children.

Image analysis

DTI and functional MRI images were processed using FSL 4.1 software (www.fmrib.ox.ac.uk/fsl/) and statistical functions in MATLAB v7.4.

Functional magnetic resonance imaging analyses

Functional image processing was carried out using FEAT (FMRI Expert Analysis Tool) Version 5.91, part of the FMRIB Software Library (Smith et al., 2004). Pre-processing consisted of brain extraction, motion correction, spatial smoothing (Gaussian kernel, full width at half maximum = 5 mm), high-pass filtering (cut-off = 60 s) and pre-whitening. General linear model analyses did not show statistically different patterns of activation with and without motion regressors. However, mean motion displacement (collapsed across runs) was the greatest in non-sedated autistic patients (mean = 1.29 mm), followed by non-sedated controls (mean = 0.17 mm), and the least in sedated patients (mean = 0.09 mm). Pre-processed images were normalized to standard MNI (Montreal Neurological Institute) co-ordinate space and entered into multiple linear regression analyses. We used standard normalization procedures that employ the adult MNI atlas. While there is some suggestion that age-based atlases may provide more detailed normalization (Aljabar et al., 2009), other studies report no age-associated bias (Ghosh et al., 2010). Future validation of these emerging methodologies will benefit subsequent studies.

First-level general linear model analyses for each individual were performed on the speech and song runs separately to identify the main effects of auditory stimulation for each condition. To model areas with increased activity during stimulus relative to baseline periods, one regressor was included in the general linear model where auditory-on periods were assigned a weight of +1 and baseline periods were assigned a weight of 0. Group (controls and autism) effects and between-group comparisons (control > autism and vice versa) were then computed using FLAME 1, a function in the FMRIB Software Library. Comparisons between autistic and control groups included non-sedated patients’ data only. An additional main-effect group analysis, which included a regressor for age, did not yield qualitative differences in the regions activated during stimulation under any condition for either the autistic or the control group.

Co-variation analyses indicated the extent to which the integrity of specific tracts was correlated with functional MRI activity in left inferior frontal gyrus across subjects. Mean fractional anisotropy values (refer to section on DTI analyses) for left dorsal and ventral tracts were demeaned and included as co-variate regressors in the group general linear model analyses for speech and song conditions. This analysis was restricted to sedated autistic children because of the high quality of DTI images acquired under sedation.

Results were thresholded at Z > 1.6 (P < 0.05) and cluster corrected at a threshold of P < 0.05. Functional MRI results were thresholded at the highest P-values to minimize the probability of false-negative results when demonstrating a lack or reduction of activity.
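As a worked check on how a Z threshold relates to a P-value, the upper-tail probability of a standard normal variate can be computed directly. This is a sketch that assumes a one-tailed test; the function name is hypothetical and the calculation is generic rather than a reproduction of the thresholding performed by the analysis software.

```python
import math


def one_tailed_p(z):
    """Upper-tail probability that a standard normal variate exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# 1.6 is the voxel threshold quoted in the text; 1.645 is the conventional
# cut-off for a one-tailed P of 0.05, shown for comparison.
for z in (1.6, 1.645):
    print(f"Z = {z}: one-tailed P = {one_tailed_p(z):.4f}")
```

Under this assumption the quoted Z of 1.6 sits just below the 1.645 cut-off, so the stated P < 0.05 is best read as an approximate equivalence.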

Functional connectivity

The psychophysiological interaction analysis was performed using established methods (Friston et al., 1997; Gitelman et al., 2003). Inferior frontal gyrus was defined for each subject by multiplying individual uncorrected functional MRI activation maps (thresholded at Z > 1.6) with an anatomical mask of right and left inferior frontal gyrus from the Harvard–Oxford atlas in the FMRIB Software Library package. If no significant clusters of activation were identified within the inferior frontal gyrus, a region of interest defined by the group activation in left inferior frontal gyrus of language-related activity from the control group was used for the speech runs and a region of interest of song activity from the autism group was used for the song runs. Group regions of interest for inferior frontal gyrus could not be made for autism and control groups separately because speech activation was not present in left inferior frontal gyrus in the autism group and song activation was more robust in the patient group. These regions of interest were not drawn from comparisons between groups and do not target voxels specifically greater for control versus autistic subjects. All control subjects showed significant activity in left inferior frontal gyrus during speech stimulation. Right hemisphere regions of interest were derived from the conjunction of right inferior frontal gyrus activation during song for both control and autistic groups.

A Z-score weighted time series was extracted from the peak voxel within the inferior frontal gyrus region of interest and deconvolved with the canonical haemodynamic response function. The deconvolved time series was demeaned, multiplied by the stimulus regressor and reconvolved with the haemodynamic response function (Gitelman et al., 2003). This product regressor modelled increases in functional connectivity during stimulus-on epochs relative to rest epochs, which represents the psychophysiological interaction (Friston et al., 1997). The product regressor along with the demeaned inferior frontal gyrus time series and stimulus regressor were entered into the design matrix of a general linear model analysis. The product regressor was orthogonalized with respect to the other two regressors. For the psychophysiological interaction regressor, direct song > speech and speech > song contrasts were performed for autistic and control groups separately to test whether functional connectivity during speech and song stimulation between language/music processing regions differed across conditions. Results were thresholded at Z > 1.6 (P < 0.05) and cluster corrected at P < 0.05 (Worsley, 2001).
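The construction of the three regressors can be sketched in simplified form. The sketch below assumes the seed time series is already expressed at the neural level, so the deconvolution and reconvolution steps described above are deliberately omitted; it also demeans the stimulus regressor before orthogonalization, which is an assumption of this example. All function names and the toy values are hypothetical.

```python
def demean(x):
    """Subtract the mean so the series is centred on zero."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def remove_component(x, basis):
    """Subtract from x its projection onto a single basis vector."""
    scale = dot(x, basis) / dot(basis, basis)
    return [u - scale * b for u, b in zip(x, basis)]


def ppi_design(seed, stimulus):
    """Return (interaction, demeaned seed, demeaned stimulus) regressors."""
    seed_dm = demean(seed)
    stim_dm = demean(stimulus)
    # Interaction term: demeaned seed signal multiplied by the on/off regressor.
    product = [s * p for s, p in zip(seed_dm, stimulus)]
    # Gram-Schmidt: make the stimulus orthogonal to the seed, then strip both
    # components from the product so it is orthogonal to the other two regressors.
    stim_orth = remove_component(stim_dm, seed_dm)
    ppi = remove_component(remove_component(product, seed_dm), stim_orth)
    return ppi, seed_dm, stim_dm


# Toy values invented for illustration; they are not data from the study.
seed = [1.0, 1.2, 0.9, 2.0, 2.5, 1.8, 1.1, 0.8, 1.0, 2.2, 1.5, 2.6]
stimulus = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
ppi, seed_dm, stim_dm = ppi_design(seed, stimulus)
print("residual overlap with seed:", abs(dot(ppi, seed_dm)) < 1e-9)
print("residual overlap with stimulus:", abs(dot(ppi, stim_dm)) < 1e-9)
```

Orthogonalizing the interaction term in this way means that any variance it explains cannot be attributed to the seed activity or to the stimulus alone.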

For large-scale functional connectivity analyses, images were pre-processed using SPM8 software (Supplementary material) (Wellcome Department of Imaging Neuroscience). Pair-wise functional connectivity (Fisher's R-to-Z transformed Pearson correlations) was computed between 298 total cortical, subcortical and cerebellar regions of interest for each subject and condition (Fig. 3B; see Supplementary material for discussion of region of interest definition). We performed an initial filter step to filter out noise (approximately 33 000 positive and negative correlations generated near zero) and increase the likelihood of only including real functional connections in comparisons between conditions and groups. Connections that were either positive or negative over all subjects and conditions were analysed separately (thresholded using a one-sample t-test, P < 0.001 uncorrected). There were 5879 positive connections and 4959 negative connections. A paired t-test was then applied to identify connections that were greater for song relative to speech (song > speech), and those greater for speech relative to song (speech > song) across a range of P-thresholds (P = 0.05 to P = 0.001, Fig. 3C) in order to ensure that results of subsequent comparisons were not specific to or dependent upon particular P-value thresholds. The mean length (Euclidean distance between the end points) of these identified connections was then compared between song > speech and speech > song using two-sample t-tests. Correlations that survived P < 0.05 uncorrected thresholds (for the paired comparisons) were used to define the connections between regions of interest displayed in Fig. 3D.
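The two quantities computed for each pair of regions, a Fisher-transformed correlation and a connection length, can be illustrated as follows. This is a sketch with hypothetical function names; the time series and coordinates are invented placeholders, not values from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher's R-to-Z transform, defined for -1 < r < 1."""
    return math.atanh(r)


def connection_length(centre_a, centre_b):
    """Euclidean distance between two region centres (same units as the inputs)."""
    return math.dist(centre_a, centre_b)


# Placeholder time series and region centres, invented for illustration.
roi_a = [1.0, 2.0, 3.0, 4.0]
roi_b = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(roi_a, roi_b)
print("r =", round(r, 3), "z =", round(fisher_z(r), 3))
print("length =", round(connection_length((-48, 20, 10), (-56, -40, 12)), 2))
```

The transform makes correlation values approximately normally distributed, which is what allows them to be entered into the t-tests described above.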

Diffusion tensor imaging analyses

All DTI processing used FMRIB Software Library program tools. Pre-processing consisted of correction for eddy current and head motion using affine registration, and brain extraction to exclude non-brain tissue. Diffusion tensors were then fitted using ‘dtifit’, which generates fractional anisotropy and mean diffusivity values for each voxel. Fractional anisotropy measures the degree of anisotropy of water diffusion within each voxel, which is higher within directional white matter tracts. Mean diffusivity is an indicator of the amplitude of water diffusion. Both fractional anisotropy and mean diffusivity are commonly used. However, since it has been shown that fractional anisotropy and mean diffusivity are not orthogonal, we also calculated tensor norm as a measure orthogonal to fractional anisotropy (Ennis and Kindlmann, 2006). Higher fractional anisotropy values have been associated with properties of local white matter microstructure such as increased myelination, number of axons and axon diameter. These parameters have also been used to infer the integrity of white matter tracts in the brain (Beaulieu, 2002; Le Bihan, 2003; Hagmann et al., 2006). Estimation of the distribution of diffusion parameters to model crossing fibres was run using ‘bedpostx’ (Behrens et al., 2007) for tractography analyses.
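For reference, the scalar measures named above can be written in terms of the three eigenvalues of the diffusion tensor. The sketch below uses the standard definitions of mean diffusivity and fractional anisotropy; the tensor norm is taken here to be the Frobenius norm of the tensor, which is an assumption of this example. The function name and eigenvalues are hypothetical, and the code is not the implementation used by ‘dtifit’.

```python
import math


def diffusion_scalars(l1, l2, l3):
    """Return (fractional anisotropy, mean diffusivity, tensor norm) from eigenvalues."""
    md = (l1 + l2 + l3) / 3.0
    norm = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)  # Frobenius norm (assumed definition)
    if norm == 0.0:
        return 0.0, md, 0.0
    spread = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    fa = math.sqrt(1.5 * spread) / norm
    return fa, md, norm


# Hypothetical eigenvalues in mm^2/s: one elongated tensor and one isotropic tensor.
for eigenvalues in [(1.7e-3, 0.3e-3, 0.3e-3), (0.8e-3, 0.8e-3, 0.8e-3)]:
    fa, md, norm = diffusion_scalars(*eigenvalues)
    print(f"FA = {fa:.3f}, MD = {md:.2e}, norm = {norm:.2e}")
```

Fractional anisotropy is 0 when all three eigenvalues are equal and approaches 1 as diffusion becomes confined to a single direction.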

DTI tractography was performed on each subject to isolate language-related white matter projections originating from primary auditory areas using ‘probtrackx’. Starting from voxels within a specified seed mask, pathways were tracked using modified Euler streamlining, which draws up to 5000 streamline samples through estimated probability distributions of diffusion direction from each voxel with a minimum fractional anisotropy value of 0.2. The output is an image where each voxel contains a value that represents the number of streamline samples from a seed voxel that passes through that particular voxel. We considered all voxels with a value >0 as part of the specified tract. Since the value for the number of streamlines is not well understood and values ranged widely across both autistic and control subjects, we did not set a specific threshold as an eligibility criterion. No threshold was set in order to include all tracts that satisfy the seed, waypoint and target inclusion criteria rather than to provide statistical comparison. Distance correction was used to account for the fact that connectivity distribution decreases with distance from the seed mask. Refer to Behrens et al. (2007) for detailed methods of tractography algorithms and the Supplementary material and Supplementary Fig. 1 for details regarding the determination of seed and waypoint masks for tractography of dorsal and ventral pathways.

Statistical comparisons of diffusion tensor imaging measures

DTI results were quantified to determine whether tracts projected to the same locations across autistic and control groups and whether the integrity of these tracts differed between groups. To identify the physical terminations of projections of each pathway, we multiplied the connectivity distribution output from the tractography analysis by the termination mask of the inferior frontal gyrus specified for tractography. Terminations within the inferior frontal gyrus were defined as those voxels that overlapped with the termination mask. To quantify whether the dorsal and ventral tracts terminated in the same regions for both control and autistic groups, we first summed the binarized inferior frontal gyrus termination masks across subjects for each group. Voxels in the summed image have values between 1 and the total number of subjects in each group that had a tract terminating in the same voxel. T-tests of the proportion of subjects with terminations in each voxel within the inferior frontal gyrus were performed to statistically quantify whether or not the spatial overlap differed between ventral and dorsal tracts for controls and autistic subjects. Ventral versus dorsal comparisons were performed for each group separately and control versus autism comparisons were then performed for each tract across groups.
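The masking, binarizing and summing steps can be sketched on flattened voxel lists. This is an illustration under simplifying assumptions: each image is represented as a plain list of voxel values already aligned across subjects, and the function names and toy counts are hypothetical.

```python
def terminations(connectivity, termination_mask):
    """Keep streamline counts only where the termination mask is non-zero."""
    return [c * m for c, m in zip(connectivity, termination_mask)]


def binarize(values):
    """1 where any streamline reached the voxel (value > 0), else 0."""
    return [1 if v > 0 else 0 for v in values]


def overlap_across_subjects(subject_maps, termination_mask):
    """Return per-voxel subject counts and the proportion of subjects per voxel."""
    masks = [binarize(terminations(m, termination_mask)) for m in subject_maps]
    summed = [sum(column) for column in zip(*masks)]
    proportions = [s / len(masks) for s in summed]
    return summed, proportions


# Toy streamline counts for three subjects over four voxels (invented values).
mask = [1, 1, 1, 0]
maps = [[12, 0, 3, 7], [0, 0, 5, 2], [4, 1, 9, 0]]
summed, proportions = overlap_across_subjects(maps, mask)
print("subjects per voxel:", summed)
print("proportion per voxel:", [round(p, 2) for p in proportions])
```

The per-voxel proportions are the quantities that would then be compared between tracts or between groups.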

To determine whether the integrity of tracts differed between autistic and control subjects, we compared the mean fractional anisotropy, mean diffusivity and orthogonal tensor norm values across all voxels within the dorsal and ventral tracts. The connectivity distribution output image generated during the tractography analysis was first binarized to form regions of interest of dorsal and ventral tracts. The mean of the fractional anisotropy, mean diffusivity and tensor norm values in each region of interest was computed for each subject and each tract. Control versus autism group comparisons for dorsal and ventral tracts in each hemisphere were performed using two-sample t-tests.
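The tract-wise averaging and group comparison can be sketched as follows. The example assumes a pooled-variance (Student) two-sample t-test, since the variance assumption is not stated in the text; function names and all numerical values are hypothetical.

```python
import math
import statistics


def mean_in_tract(voxel_values, streamline_counts):
    """Average a diffusion measure over voxels where the tract image is > 0."""
    inside = [v for v, c in zip(voxel_values, streamline_counts) if c > 0]
    return statistics.mean(inside)


def two_sample_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb)), na + nb - 2


# Invented per-subject tract means, for illustration only.
autism_fa = [0.38, 0.41, 0.36, 0.40]
control_fa = [0.43, 0.45, 0.42, 0.44]
t_value, dof = two_sample_t(autism_fa, control_fa)
print(f"t = {t_value:.2f}, df = {dof}")
```

Each subject contributes one mean value per tract and measure, so the test compares subject-level summaries rather than individual voxels.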

Results

Behavioural results

All autistic subjects scored in the high range of impairment on all three Autism Diagnostic Interview-Revised subsections (Reciprocal Social Interaction: mean = 21.18, SD = 1.66, range = 17–24; Language and Communication: mean = 18.87, SD = 2.62, range = 12–26; Restricted, Repetitive and Stereotyped Behaviour: mean = 6.00, SD = 1.15, range = 4–9). A diagnosis of autism is made when a child scores higher than a specified minimum on all three sections (Social: >10; Language: >8; Repetitive Behaviours: >3). In particular, scores on the language and communication domain for all patients in this study (range = 12–26) were well above the diagnostic minimum (>8) for autism.

Clinical observations of words uttered during a 30 min free-play session ranged from 0 to 250. Mean number of words uttered in response to a physician's prompt was 46.4 (SD = 76.16, median = 14), and the mean number of spontaneously produced words was 16.29 (SD = 42.70, median = 4). Breakdown of the percentage of children with zero to >50 words (Fig. 1A) documents the limited verbal output in a majority of our patients (>50% produced <5 spontaneous words during the session). Breakdown of verbal output by age (Fig. 1B) fails to suggest a relationship between number of words and age. Linguistic comprehension was limited to simple (subject, verb, object) grammatical relationships in all subjects except for one child who was able to comprehend more complex constructions, such as the use of the passive voice or hierarchical structures. Verbal output for controls could not be assessed in the same way as the autistic patients due to the absence of standardized instruments appropriate for both low-functioning language-impaired autistic children and typically developing controls. Behavioural milestones reported by the National Institute on Deafness and Other Communication Disorders (NIDCD, 2001; retrieved from http://www.nidcd.nih.gov/health/voice/speechandlanguage.aspx) for typically developing children include understanding of approximately 2000 words, production of more than 300 words, and use of grammatically correct compound and complex sentences by the age of five.

Figure 1

Verbal output of autistic subjects. (A) Percentage of patients with 0–50 spontaneous (filled bars) words or words uttered in response to a prompt (open bars). (B) Mean age of patients with 0–50 spontaneous or response words.

Despite language impairments, the autism group did not differ from the control group on ratings of music affinity. Parent ratings (on a scale of 0–10) of how receptive their child was to the familiar song showed no significant difference between autistic and control groups (autism mean = 8.20, SD = 2.16; control mean = 9.05, SD = 1.10; t = −1.56, P = 0.126).

Functional magnetic resonance imaging results

During passive speech stimulation, bilateral primary auditory cortices (A1) were activated for both groups. Control subjects showed additional activation in the left inferior frontal gyrus (Fig. 2A and Supplementary Table 1). For autistic subjects, activation in left inferior frontal gyrus did not reach statistical significance, even at the lowest threshold of P < 0.05 (Fig. 2A and Supplementary Table 1). Activation in posterior brain regions including left superior temporal, middle temporal and angular gyri was also reduced in autistic patients. Control > autism group comparisons confirm that both left inferior frontal gyrus and left posterior brain regions, in addition to the thalamus, medial prefrontal cortex and precuneus, were more active in controls relative to autistic subjects during speech stimulation (Fig. 2B and Supplementary Table 1). No regions in autistic children were more active for speech stimulation than in controls.

Figure 2

Functional MRI responses during speech and song stimulation. (A) Speech: control subjects (left) activated A1, temporal regions [superior temporal gyrus (STG); middle temporal gyrus (MTG) and angular gyrus (AG)] and left inferior frontal gyrus (IFG, circles), as well as supplementary motor area (SMA). Autistic subjects (right) activated A1 and superior temporal gyrus, but not left inferior frontal gyrus (empty circles), SMA or midline regions. (B) Speech: control > autism contrast confirmed greater activation in regions outside A1 in control subjects. (C) Song: both controls (left) and autistic (right) children activated A1 and right inferior frontal gyrus. Autistic subjects also activated left inferior frontal gyrus (circles). (D) Song: autism > control contrast showed greater activation in left inferior frontal gyrus (uncorrected). (E) Control: speech > song showed greater activity in left inferior frontal gyrus, left temporal regions, and midline structures. (F) Control: song > speech contrast showed increased activation in right temporal and parietal regions. Autism: song > speech showed greater activation in bilateral inferior frontal gyrus and temporal regions. L = left; LO = lateral occipital cortex; n.s. = not significant; R = right.

Functional MRI findings for song stimulation included activation in bilateral A1 and right inferior frontal gyrus for both control and autism groups (Fig. 2C and Supplementary Table 1). Interestingly, the autism group showed additional activity in left inferior frontal gyrus (Fig. 2C and Supplementary Table 1). The control > autism contrast showed greater activation in the precuneus and lingual gyrus during song stimulation in controls (Supplementary Table 1). The reverse, autism > control contrast, yielded significant clusters without cluster correction in left inferior frontal gyrus, left supramarginal gyrus and right middle temporal gyrus in autistic subjects relative to controls during song stimulation (Fig. 2D and Supplementary Table 1).

Within-group contrasts compared activations during speech and song stimulation in autistic and control subjects. No regions were more active for speech > song in the autism group, while this same contrast in the control group yielded greater speech-related activation in left inferior frontal gyrus and posterior regions, including superior temporal, middle temporal and angular gyri (Fig. 2E and Supplementary Table 1). However, song > speech comparisons in the autism group confirmed greater activation in both left inferior frontal gyrus and posterior brain regions during song relative to speech stimulation, including A1, planum temporale, right supramarginal gyrus and angular gyrus (Fig. 2F and Supplementary Table 1).

Images acquired for autistic patients under sedation (without the video) also demonstrated increased activation of the left inferior frontal gyrus during song relative to speech stimulation (Supplementary Fig. 2). Greater activation in right temporal regions was also observed in sedated autistic subjects for song relative to speech stimulation.

Functional connectivity

Differences in functional connectivity with left inferior frontal gyrus were assessed using a psychophysiological interaction analysis to identify regions in the brain that showed increased co-variation with inferior frontal gyrus during stimulation relative to baseline periods (Friston et al., 1997). Within-subject contrasts (song > speech) for psychophysiological interaction analysis in the autism group showed increased connectivity between left inferior frontal gyrus and left superior temporal gyrus/angular gyrus during song relative to speech stimulation (Fig. 3A and Supplementary Table 1). No areas were more highly connected with right inferior frontal gyrus for the song > speech contrast in the autism group. Sedated and non-sedated patients were combined for this within-subject contrast (see ‘Materials and methods’ section and Table 1) to benefit from the increased sample size. For control subjects, speech versus song contrasts did not yield significant connectivity differences between left inferior frontal gyrus and regions in language and music pathways.
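The core idea of a psychophysiological interaction can be illustrated with a minimal sketch: the coupling (regression slope) between a seed time series and a target time series is estimated separately for each condition, and the difference in slopes corresponds to the interaction coefficient in a regression with task coded 0/1. The function names, the synthetic time series and the simulated coupling strengths below are assumptions made for illustration only; haemodynamic convolution, baseline periods and nuisance regressors are omitted, and this is not the analysis pipeline used in the study.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x (intercept included)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ppi_effect(seed, target, condition):
    """Seed-target slope during song (condition 1) minus slope during speech (0).

    With the task coded 0/1 and a separate intercept per condition, this
    difference equals the coefficient of the seed x task interaction term.
    """
    song = [(s, t) for s, t, c in zip(seed, target, condition) if c == 1]
    speech = [(s, t) for s, t, c in zip(seed, target, condition) if c == 0]
    return slope(*zip(*song)) - slope(*zip(*speech))


# Synthetic example (hypothetical values): coupling is simulated as 0.2 during
# speech volumes and 0.8 during song volumes, plus Gaussian noise.
rng = random.Random(0)
condition = [0] * 100 + [1] * 100
seed = [rng.gauss(0, 1) for _ in condition]
target = [(0.2 + 0.6 * c) * s + rng.gauss(0, 0.5)
          for s, c in zip(seed, condition)]

effect = ppi_effect(seed, target, condition)
print(f"song - speech coupling difference: {effect:.2f}")
```

A positive value indicates stronger seed-target co-variation during song than during speech, which is the direction of the effect reported for the autism group.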

Figure 3

Functional connectivity analyses. (A) Psychophysiological interaction (PPI) analysis in autistic children revealed significant song > speech increases in functional connectivity between left inferior frontal gyrus (IFG) (seed) and clusters spanning superior temporal gyrus (STG) and angular gyrus. Results are shown at Z > 1.6 (P < 0.05) and cluster corrected at P < 0.05. (B) Regions of interest used to parcellate regions for large-scale whole-brain analysis. (C) Large-scale whole-brain analysis showed that in autism (top), connections that are stronger for song relative to speech are longer on average than connections stronger for speech relative to song (plotted with 95% CI). In controls (bottom), connections stronger for speech did not differ in length to connections stronger for song. (D) Anatomical representation of functional connections (lines) between regions (spheres) stronger in song relative to speech (red, P < 0.05 uncorrected) and speech relative to song (blue, P < 0.05 uncorrected) in autistic (top) and control (bottom) subjects (left hemisphere shown). Size of each sphere represents the sum of the lengths of all its connections. For display purposes, the thickness of connections was made proportional to their lengths.

Large-scale functional connectivity between song and speech stimulation was assessed using the mean lengths (Euclidean distance) of pair-wise functional connections, across regions of interest covering the whole brain (Fig. 3B), that differed between conditions. In the autism group, positive pair-wise correlations that were greater in song relative to speech (P < 0.05) had a greater mean Euclidean distance (length) than those greater for speech relative to song (song > speech, mean length = 63 mm; speech > song, mean length = 49 mm; P = 0.0009). This was significant over a range of thresholds used to define song > speech and speech > song connections (Fig. 3C). No differences were observed in controls (song > speech, mean length = 59 mm; speech > song, mean length = 60 mm; P = 0.79; Fig. 3C), consistent with the lack of differences in the psychophysiological interaction analysis for the control group. Anatomical display of these connections (defined at a P-value threshold of 0.05) illustrates greater frontoposterior connectivity during song relative to speech in autistic subjects (Fig. 3D) but not controls. Greater frontoposterior connectivity for song > speech relative to speech > song is consistent with the psychophysiological interaction results of greater functional connectivity between left inferior frontal gyrus and posterior brain regions during song relative to speech stimulation in autistic patients. Overall numbers of increased versus decreased connections were comparable between the two groups, but interestingly in autism there were slightly fewer connections during song relative to speech, and vice versa in controls (autism: song > speech = 127; speech > song = 186; total = 313; control: song > speech = 181; speech > song = 156; total = 337). This suggests fewer, but longer distance connections during song, and a greater number of short distance connections during speech in autistic subjects.

In contrast to positive functional connections, negative functional connections exhibited no significant differences in Euclidean length between pair-wise correlations that were greater in song relative to speech and those greater in speech relative to song, in either group (autism: song > speech, 141 connections, mean length = 79 mm; speech > song, 128 connections, mean length = 81 mm; P = 0.76; control: song > speech, 149 connections, mean length = 79 mm; speech > song, 151 connections, mean length = 86 mm; P = 0.10). Taken together, these results suggest that in autism, song induces increased long-range positive, but not negative, functional connectivity.
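The connection-length comparison described above can be sketched as follows: each functional connection is assigned the Euclidean distance between the centroids of its two regions of interest, and the mean lengths of the song > speech and speech > song sets are compared. The statistical test behind the reported P-values is not stated in this section, so the permutation test below is an assumption, as are the function names, the region labels and the coordinates.

```python
import math
import random


def mean_length(edges, coords):
    """Mean Euclidean distance (mm) over a set of region-to-region connections."""
    return sum(math.dist(coords[a], coords[b]) for a, b in edges) / len(edges)


def permutation_p(lengths_a, lengths_b, n_perm=2000, seed=0):
    """Two-sided permutation P-value for a difference in mean connection length."""
    rng = random.Random(seed)
    n_a = len(lengths_a)
    n_b = len(lengths_b)
    observed = abs(sum(lengths_a) / n_a - sum(lengths_b) / n_b)
    pooled = list(lengths_a) + list(lengths_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign lengths to the two sets
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical region centroids in mm (illustrative only, not study data).
coords = {
    "roi_1": (-48.0, 20.0, 10.0),
    "roi_2": (-58.0, -30.0, 8.0),
    "roi_3": (-46.0, -62.0, 34.0),
    "roi_4": (-44.0, 14.0, 2.0),
}
song_edges = [("roi_1", "roi_2"), ("roi_1", "roi_3")]    # longer, frontoposterior
speech_edges = [("roi_1", "roi_4"), ("roi_2", "roi_3")]  # shorter, more local

print(f"song > speech mean length: {mean_length(song_edges, coords):.1f} mm")
print(f"speech > song mean length: {mean_length(speech_edges, coords):.1f} mm")
```

In practice `permutation_p` would be applied to the full lists of connection lengths (several hundred per group in the results above); the four-region example is only large enough to show the distance calculation.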

Structural connectivity

Tractography in both control and autistic subjects revealed dorsal and ventral projections connecting A1 to the inferior frontal gyrus, corresponding to fibre tracts typically associated with language processing (Fig. 4). For both autistic and control groups, dorsal terminations in the left inferior frontal gyrus were posteriorly located, corresponding to Brodmann area (BA) 44 (Fig. 5A) and ventral terminations were located relatively anterior in the inferior frontal gyrus, within BA45 (Fig. 5B), although overlap between the dorsal and ventral terminations was also observed (Fig. 5C). Statistical comparisons between proportions of subjects with termination points from each tract confirmed spatial segregation between dorsal and ventral tracts for each group (P < 0.05; Fig. 5D). There were no observable differences between control and autistic subjects in the locations of target projections of either tract, even at the lowest uncorrected threshold (P < 0.05).

Figure 4

DTI tractography. Example of tractography results from Heschl's gyrus (A1) seed (right hemisphere, green; left hemisphere, blue) from an autistic subject (top) and a control subject (bottom).

Figure 5

DTI tractography of dorsal and ventral language pathways. (A–C) Termination points (with a minimum overlap of two subjects for visualization purposes) of left dorsal (A, red) and ventral (B, blue) pathways in control (left) and autistic (right) subjects as determined by DTI tractography from A1 to inferior frontal gyrus. (C) Green represents areas that overlapped for both dorsal and ventral terminations. (D) Within-group comparisons between proportions of subjects with dorsal and ventral terminations in inferior frontal gyrus. In both control and autistic subjects, ventral > dorsal terminations (blue) were more anterior and dorsal > ventral terminations (red) were more posterior (Z > 1.6, P < 0.05, uncorrected). Analysis failed to confirm differences in terminations of either tract for control versus autistic comparisons. R = right.

Average fractional anisotropy values of dorsal and ventral tracts in control and autism groups were compared along the entire pathway of each tract in each hemisphere. Mean fractional anisotropy values over a specific tract were computed only for subjects where tractography analyses were able to identify streamlines connecting specified seed and target regions. The percentage of subjects with successfully identified tracts was similar across patient (left dorsal = 91.2%, 31/34; left ventral = 85.3%, 29/34; right dorsal = 97.1%, 33/34; right ventral = 100%, 34/34) and control (left dorsal = 94.4%, 17/18; left ventral = 88.9%, 16/18; right dorsal = 88.9%, 16/18; right ventral = 94.4%, 17/18) groups. Comparisons of tracts seeded from A1 between controls and age-matched autistic subjects showed lower mean fractional anisotropy values over the left dorsal tract in the autism group (autism = 0.318, control = 0.349, P = 0.042, two-tailed). No significant group differences were observed for right dorsal or for either right or left ventral tracts. No significant differences were observed between groups for mean diffusivity or tensor norm in either right or left dorsal tracts. However, autistic subjects showed decreased tensor norm values over both left and right ventral tracts (left autism = 1.595 × 10−3, control = 1.663 × 10−3, P = 0.011; right autism = 1.611 × 10−3, control = 1.661 × 10−3, P = 0.046). Table 2 summarizes these results.
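As a worked check of the tract identification rates, the percentages can be recomputed from the subject counts given in the text. The dictionary layout and variable names are illustrative assumptions; the counts themselves are those reported above.

```python
# Subjects with a successfully identified tract, as (found, total) per group.
counts = {
    "autism": {
        "left dorsal": (31, 34),
        "left ventral": (29, 34),
        "right dorsal": (33, 34),
        "right ventral": (34, 34),
    },
    "control": {
        "left dorsal": (17, 18),
        "left ventral": (16, 18),
        "right dorsal": (16, 18),
        "right ventral": (17, 18),
    },
}

# Percentage of subjects per group and tract, rounded to one decimal place.
rates = {
    group: {tract: round(100 * found / total, 1)
            for tract, (found, total) in tracts.items()}
    for group, tracts in counts.items()
}

for group, tracts in rates.items():
    for tract, pct in tracts.items():
        print(f"{group:8s} {tract:14s} {pct:5.1f}%")
```

The recomputed values agree with the reported percentages, and they show that fractional anisotropy comparisons for each tract rest on slightly different subsets of subjects.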

Table 2

Group comparison of fractional anisotropy, mean diffusivity and tensor norm determined by two-sample t-tests (two-tailed)

                                 Autism            Control
                                 Mean (SE)         Mean (SE)         P-value
Left dorsal
    Fractional anisotropy        0.3182 (0.013)    0.344 (0.010)     0.042*
    Mean diffusivity (×10−3)     0.872 (0.127)     0.864 (0.109)     0.653
    Tensor norm (×10−3)          1.576 (0.022)     1.574 (0.019)     0.951
Right dorsal
    Fractional anisotropy        0.325 (0.011)     0.343 (0.012)     0.337
    Mean diffusivity (×10−3)     0.859 (0.205)     0.866 (0.141)     0.814
    Tensor norm (×10−3)          1.547 (0.032)     1.581 (0.028)     0.386
Left ventral
    Fractional anisotropy        0.309 (0.011)     0.322 (0.012)     0.614
    Mean diffusivity (×10−3)     0.883 (0.910)     0.916 (0.144)     0.065
    Tensor norm (×10−3)          1.596 (0.016)     1.663 (0.026)     0.011*
Right ventral
    Fractional anisotropy        0.309 (0.011)     0.307 (0.008)     0.462
    Mean diffusivity (×10−3)     0.894 (0.125)     0.924 (0.114)     0.087
    Tensor norm (×10−3)          1.611 (0.021)     1.661 (0.019)     0.046*

*P ≤ 0.05.

Relationship between functional and structural results

Registration of functional MRI activation maps with inferior frontal gyrus terminations from the dorsal and ventral projections connecting A1 to inferior frontal gyrus revealed that activation in left inferior frontal gyrus for both speech (control) and song (autism) conditions overlaps with dorsal terminations (Fig. 6A).

Figure 6

Relationship between function and structure. (A) Functional MRI (fMRI) activation for the speech condition in controls (left) and the song condition in autistic children (right), overlaid on dorsal (red) and ventral (blue) termination points, shows that activation in left inferior frontal gyrus overlaps almost entirely with dorsal terminations (yellow circle). (B) Correlations between functional MRI activation during song and speech within the left inferior frontal gyrus region of interest and fractional anisotropy (FA) values of left dorsal and ventral tracts. (B, top) Song: activity in left inferior frontal gyrus positively co-varied with left dorsal fractional anisotropy values (Z > 1.6, P < 0.05, uncorrected). (B, bottom) Speech: activity in left inferior frontal gyrus was correlated with fractional anisotropy of left dorsal and ventral tracts (Z > 1.6, P < 0.05, cluster corrected at P < 0.05).

We determined the co-variation between fractional anisotropy values of dorsal and ventral tracts and functional MRI activation within an inferior frontal gyrus region of interest. Because usable structural DTI images were obtained from only 7 out of 12 autistic subjects imaged alert due to excess head movement, images acquired from the 27 autistic children under sedation were employed for this comparison. During song stimulation in the autism group, mean fractional anisotropy values for the left dorsal tract co-varied with functional MRI activity in left inferior frontal gyrus (Fig. 6B). No significant correlations were observed between fractional anisotropy values for the left ventral tract and activation in inferior frontal gyrus. During speech stimulation in the autism group, fractional anisotropy of both dorsal and ventral pathways was correlated with activity in left inferior frontal gyrus (Fig. 6B). Fractional anisotropy values of right dorsal and ventral tracts did not correlate with right inferior frontal gyrus activation in either condition.
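The structure-function relationship described above amounts to asking whether subjects with higher tract fractional anisotropy also show stronger activation in the region of interest. A minimal sketch of that idea is a Pearson correlation across subjects between mean tract fractional anisotropy and a per-subject activation summary. The study reports voxel-wise co-variation within the region of interest, so collapsing activation to one value per subject is a simplifying assumption, and the function name and the example numbers are hypothetical rather than study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-subject values (illustrative only, not study data):
# mean fractional anisotropy of the left dorsal tract and mean activation
# (arbitrary units) in left inferior frontal gyrus during song stimulation.
fa_left_dorsal = [0.29, 0.31, 0.30, 0.33, 0.35, 0.32]
ifg_activation = [0.4, 0.9, 0.5, 1.1, 1.6, 1.0]

r = pearson_r(fa_left_dorsal, ifg_activation)
print(f"r = {r:.2f}")
```

A positive coefficient corresponds to the direction of the effect reported for the left dorsal tract during song stimulation in the autism group.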

Significant correlations were not observed between tract integrity and functional MRI activity in the control group. Additionally, correlations were not observed between behavioural measures and functional MRI and/or DTI results.

Discussion

Using functional MRI and DTI, we investigated the functional and structural organization of neural systems that typically overlap for language and music in autistic patients. Consistent with previous studies and models of neural disconnection (Just et al., 2004; Kana et al., 2006), we observed decreased functional responses to speech stimulation in autistic subjects in left inferior frontal gyrus and secondary auditory cortices in left temporal lobe. We also observed decreased fractional anisotropy of the left dorsal pathway, consistent with reports of reduced fractional anisotropy of the left arcuate fasciculus and temporal lobe regions in autistic subjects (Fletcher et al., 2010; Lange et al., 2010; Ingalhalikar et al., 2011). Additionally, tensor norm values for the ventral tract were significantly lower for the autistic subjects than for the control subjects. Increased fractional anisotropy and decreased tensor norms have been correlated with neurodevelopmental processes (Goodlett et al., 2009), and variation in these values may suggest changes in microstructure.

In contrast to language impairment in the autism group, we observed no differences in parent ratings of music affinity, consistent with a previous report that low functioning autistic patients do not differ from controls in their preference for pleasant and harmonious musical stimuli (Boso et al., 2009). Likewise at the neural level, in contrast to decreased speech related activity in autistic subjects, song stimulation resulted in increased left inferior frontal gyrus activation and increased frontal-posterior functional connectivity relative to speech stimulation. A recent imaging study reported similar activation in frontal (including left inferior frontal gyrus) and temporal regions in high-functioning autistic and control subjects during music stimulation (Caria et al., 2011). Our results in low-functioning subjects are consistent with these findings and with behavioural observations of preserved music functioning in autism, and confirm both increased and decreased activation in the same brain regions in response to song and speech stimulation, respectively. Right frontal and temporal activations observed in both autistic and control subjects during song also confirm previous functional MRI findings using both music and song stimuli (Limb, 2006; Schon et al., 2010).

Our structural DTI analyses failed to confirm a difference between control and autistic subjects with respect to dorsal and ventral projections between A1 and inferior frontal gyrus. Dorsal and ventral projections of both control and patient groups were consistent with findings from previous DTI studies in healthy adults (Anwander et al., 2007; Frey et al., 2008). In autistic subjects, greater fractional anisotropy of the dorsal pathway was also associated with greater functional MRI activation in left inferior frontal gyrus during song stimulation, and fractional anisotropy values of both pathways correlated with greater left inferior frontal gyrus activation during speech stimulation. In typical adults, the dorsal pathway has been more commonly associated with expressive and high-level language functions (Catani and Mesulam, 2008; Duffau, 2008), while the ventral pathway is commonly associated with semantic comprehension (Duffau, 2008; Saur et al., 2008) and simple grammatical structures (Friederici et al., 2006). The observation that fractional anisotropy of these pathways correlated with left inferior frontal gyrus activation during both song and speech stimulation suggests an involvement of these pathways in conveying speech, as well as song, information to frontal regions. The lack of significant correlations between functional and structural measures in controls may be due to ceiling effects of functional activation to the passive listening task. Decreased fractional anisotropy of the left dorsal pathway in autistic subjects is consistent with decreased engagement of this pathway for language functions in autism. On the other hand, decreased tensor norm values of the ventral pathways may suggest increased reliance on ventral pathways for these functions.

Together, our functional and structural MRI findings support the hypothesis that long-range disconnection may not be a sufficient account for language impairment in autism. Instead, reduced activation of left inferior frontal gyrus during speech may be associated with the failure to receive language-specific information from impaired lower-processing regions, rather than disconnection of the system as a whole. Indeed, during speech stimulation, autistic subjects showed reduced activation in temporal lobe (including superior temporal and middle temporal gyri) relative to controls, consistent with previous studies in autistic subjects that employed receptive language tasks (Muller et al., 1999; Boddaert et al., 2004; Gervais et al., 2004; Lai et al., 2011).

One possibility for discrepancies between music and language functions in autism and models that propose long-range disconnection may be a speech-specific (and in general, social-information-specific) attentional deficit (Muller, 2007; Groen et al., 2008). Whereas typically developing individuals prefer speech to non-speech stimuli and are automatically inclined to process higher-level linguistic and semantic information in speech stimuli, there is evidence that autistic subjects do not appear to exhibit the typical bias towards social stimuli (Klin, 1991; Dawson et al., 1998; Schultz et al., 2000; Ceponiene et al., 2003; Kuhl et al., 2005; Jarvinen-Pasley and Heaton, 2007).

In the auditory domain, engagement of systems typically involved in aspects of language and music processing for song but less so for speech may result from differences in low-level processing of language and music stimuli (Zatorre et al., 2002). Speech processing is thought to primarily involve discrimination of rapidly changing broadband sounds, while tonal changes in music are thought to occur on a much slower timescale. Behavioural evidence of impaired spatiotemporal processing in autism, favouring static or slowly presented stimuli, has been associated with decreased functional connectivity in functional MRI and decreased coherence in electrophysiological studies (Gepner and Feron, 2009) and interpreted as an impaired ability to process dynamic multi-sensory stimuli. There is also evidence that in autism, disturbances in processing temporal modulations of sound increase with the complexity of stimuli (Alcantara et al., 2004; Samson et al., 2006, 2011; Groen et al., 2009) and are located within secondary, rather than primary, auditory regions (Samson et al., 2011). Our finding of decreased secondary (superior temporal, middle temporal and angular gyri), but not primary auditory (A1), activation in autistic patients is consistent with these previous findings in high-functioning autistic individuals. Thus, speech comprehension and language acquisition may be impaired as a result of disrupted local connectivity and/or anatomical maturation in regions such as superior temporal gyrus, even if long-range connectivity between primary and secondary auditory regions and other parts of the brain is intact. Due to the variability of language disability along the autism spectrum, however, it is likely that impairment may occur along different levels of the language-processing stream for different individuals.

This study is one of the first to image a group of alert low-functioning autistic children using a functional task. Due to the difficulty of imaging this patient group, there are limitations in the sample size of subjects available and in the design of experimental stimuli and procedures to accommodate clinical needs. Because of the limited sample size and broad age range of alert patients, we were not able to calculate response profiles within specific age groups. Autistic subjects imaged alert also showed increased movement. However, regressing out age and movement parameters did not produce qualitative differences in our functional MRI results. The quality of DTI data for tractography analyses was also limited by the highly anisotropic voxels standard in clinical MRI image sequences. Finally, future studies would benefit from the inclusion of an IQ-matched control group, as we cannot rule out the possibility that IQ may have had some role in these findings.

Limitations related to experimental procedures included restriction to the use of familiar stimuli specific to individual subjects, and a possible interaction of the silent video with the auditory stimulus. The use of non-standardized stimuli consisting of subjects’ favourite songs and their own parents’ voices may also introduce possible effects relating to degree of familiarity between autistic and control groups; autistic children may be obsessively familiar with their particular song selections. In regard to the silent video, although it was effective for imaging of young and autistic children, we acknowledge possible concerns regarding stimulus × video × group interactions. However, the sedated condition did not employ the video. Findings from sedated patients, where this concern was not present, were consistent with those from patients who were imaged alert. Within-group song versus language comparisons across these conditions support an effect independent of either the video or the sedation. Future studies with larger sample sizes and comparisons between familiar and standardized stimuli controlled for varying musical and linguistic properties can provide more specific understanding of how music and language abilities correlate with neural activation to varying properties of music, song and speech.

Finally, the conclusion that disconnection may not be the cause of atypical neural function and structure of the language system suggests a positive indication for the rehabilitation of language in autism. Additionally, overlapping systems between language and music possibly support the use of music for rehabilitation of impaired language (Patel, 2011). Musical interaction techniques have been reported to be effective in improving social and communicative behaviour in autistic children (Whipple, 2004; Kaplan and Steele, 2005; Gold et al., 2006; Wigram and Gold, 2006; Wan et al., 2010). It has been proposed that music therapy for language and communication may be achieved through improving interpersonal responsiveness, increasing joint-attention and engaging mirror neuron responses in the inferior frontal gyrus during verbal communication coupled with musical tasks (Gold et al., 2006; Molnar-Szakacs and Overy, 2006; Wigram and Gold, 2006; Overy, 2009; Rizzolatti and Fabbri-Destro, 2010; Wan et al., 2010). Future therapeutic studies that systematically vary musical and linguistic properties would provide further evaluation of this hypothesis.

Funding

Studentship funded by the Gatsby Initiative in Brain Circuitry, GAT 2742 (to G.L.), NRSA predoctoral fellowship F31MH088104-02 (to S.P.P.).

Supplementary material

Supplementary material is available from Brain online.

Acknowledgements

Johanna Schwarzenberger MD, paediatric anaesthesiologist, provided medical and technical assistance with patients who received anaesthesia as part of their clinical MRI evaluations; Amy Newhouse, Hillary Hancock, Elliot Huang, Janill Briones and Emily Mandel assisted with various parts of data analysis and stimulus preparation; Stephen Dashnaw, Andrew Kogan and Boateng Ohene-Adjei provided technical assistance for image acquisition; Ted Yanagihara and Hal Hinkle consulted on DTI analysis methods and neural systems that underlie music perception, respectively; Darcy Kelley, Gerald Fischbach, and Vincent Ferrera provided comments and suggestions during the development of this study as part of the PhD thesis committee (G.L.) for which this work fulfils partial requirements (mentor: J.H.). The children and parents who participated in this study are especially appreciated.

Abbreviations
DTI = diffusion tensor imaging

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

References
