
Brain Research, 519 (1990) 158-167 Elsevier

BRES 15562

Tonotopic features of speech-evoked activity in primate auditory cortex

Mitchell Steinschneider, Joseph C. Arezzo and Herbert G. Vaughan Jr.

Departments of Neurology and Neuroscience and the Rose F. Kennedy Center for Research in Mental Health and Human Development, Albert Einstein College of Medicine, Bronx, NY 10461 (U.S.A.)

(Accepted 28 November 1989)

Key words: Auditory cortex; Thalamocortical radiation; Multiple unit activity; Syllable; Formant; Tonotopic organization

To further clarify the neural mechanisms underlying the cortical encoding of speech sounds, we have recorded multiple unit activity (MUA) in the primary auditory cortex (A1) and thalamocortical (TC) radiations of an awake monkey in response to 3 consonant-vowel syllables, /da/, /ba/ and /ta/, that vary in their consonant place of articulation and voice onset time (VOT). In addition, we have examined the responses to the syllables' isolated formants and formant pairs. Response features are related to the cortical tonotopic organization, as determined by examining the responses to selected pure tones. MUA patterns that differentially reflect the spectral characteristics of the steady-state formant frequencies and of the formant transition onset frequencies underlying consonant place of articulation occur at sites with similarly differentiated tone responses. Whereas the detailed spectral characteristics of the speech sounds are reflected in low frequency cortical regions, both low and high frequency areas generate responses that reflect the temporal characteristics of fundamental frequency and VOT. Formant interactions modulate the responses to the whole syllables. These interactions may sharpen response differences that reflect consonant place of articulation. Response features noted in A1 also occur in TC fibers. Thus, differences in the encoding of speech sounds between the thalamic and cortical levels may include further opportunities for formant interactions within auditory cortex. One effect could be to heighten response contrast between complex stimuli with subtle acoustical differences.

INTRODUCTION

Previous work has shown that several acoustic parameters of synthetic consonant-vowel (CV) syllables, namely fundamental frequency, voice onset time (VOT) and consonant place of articulation, are reflected in the temporal patterns of the multiple unit activity (MUA) in the auditory cortex of the awake monkey 21,22.
To clarify the neural basis of this activity, it is important to relate the syllable-evoked responses both to the cortical activity generated by the syllables' component formants and to the tonotopic organization of auditory cortex. Although the band-limited character of speech sounds predicts that cortical responses should manifest tonotopic features, a more complex interaction is suggested by the lack of a simple and consistent relationship between the responses of single primate auditory cortical neurons to species-specific vocalizations and their pure tone specificity 16,29. Further evidence for the complex factors that underlie the generation of response patterns to speech sounds at higher auditory centers comes from the finding that differentiated single unit responses to vowels in field L of the mynah bird are determined by excitatory and inhibitory interactions of the isolated formants 12. This finding emphasizes the need to include formant interactions in formulations of the auditory cortical mechanisms underlying speech processing. The lack of a consistent relationship between the cortical tonotopic organization and single unit responses to complex vocalizations, together with the marked lability of single unit responses to the same stimuli 8,14,30, has led several investigators to suggest that the activity of neuronal ensembles may more fully delineate the response characteristics to specific vocalizations 5,16,28. In this study, we recorded MUA, which reflects the activity within neuronal ensembles, in primary auditory cortex (A1) of an awake monkey to compare cortical responses generated by the first 3 spectrally discrete formants of stop CV syllables with the responses to the whole syllables. We examined the manner in which the responses to the individual formants modulate those elicited by the full speech sounds. We relate this activity to that elicited by pure tones to examine relationships between the speech-evoked response patterns and the tonotopic organization of A1.

Correspondence: M. Steinschneider, Department of Neurology, Kennedy Center, Rm. 322, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, U.S.A.

0006-8993/90/$03.50 © 1990 Elsevier Science Publishers B.V. (Biomedical Division)

MATERIALS AND METHODS

One adult male Macaca fascicularis was studied to examine the relationships among cortical responses to syllables, isolated formants, formant pairs, and tone bursts. Preliminary studies using 4 additional monkeys established the reliability of the syllable-evoked response patterns. We used Old World monkeys because of their similarities with man in the gross and cytoarchitectonic organization of auditory cortex 6,7,15,17,20 and in the phonetic discrimination of stop CV syllables 11. Surgical, recording and histological procedures have been described previously 22. Briefly, under general anesthesia (sodium pentobarbital) and using aseptic techniques, rectangular matrices of adjacently placed 18-gauge stainless steel tubing were positioned vertically with respect to stereotaxic coordinates to target auditory cortex and to serve as guides for the recording electrodes. After surgery and between recording sessions, the animal was housed in our AAALAC-accredited animal facility, where he was monitored daily by ourselves and the veterinary staff for assessment of his health and well-being. Recordings were obtained from multicontact electrodes containing 8 contacts evenly spaced 100-300 µm apart 3. Each channel had an impedance of 0.2-0.5 MΩ at 1 kHz. Brain potentials were amplified by differential amplifiers with a frequency response down 3 dB at 3 Hz and 3 kHz. The reference was an occipital bone electrode. Auditory evoked potentials (AEPs) were averaged on-line by computer. The neuroelectric signals were simultaneously recorded for subsequent off-line MUA analysis. For this analysis, the signals were high-pass filtered above 500 Hz, digitized, full-wave rectified and averaged by computer. The MUA represented the weighted sum of action potentials from neurons surrounding the recording contacts.
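The MUA derivation just described (high-pass filtering above 500 Hz, full-wave rectification, then averaging across presentations) can be sketched as follows. This is a minimal illustration under stated assumptions: the first-order RC high-pass filter, the function names and the sampling rate in the usage are hypothetical choices for the example, not the filter hardware or software used in the study.

```python
import math


def highpass_first_order(samples, fs_hz, cutoff_hz=500.0):
    """Discrete first-order (RC) high-pass filter.

    A simple stand-in for the >500 Hz high-pass step; the study's
    actual filter design is not specified here.
    """
    if not samples:
        return []
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def average_mua(trials, fs_hz, cutoff_hz=500.0):
    """High-pass filter, full-wave rectify, then average across trials.

    `trials` is a list of equal-length sample lists, one per stimulus
    presentation, each aligned to stimulus onset.
    """
    n_samples = len(trials[0])
    total = [0.0] * n_samples
    for trial in trials:
        filtered = highpass_first_order(trial, fs_hz, cutoff_hz)
        for i, value in enumerate(filtered):
            total[i] += abs(value)  # full-wave rectification
    return [t / len(trials) for t in total]
```

Rectifying before averaging is the key design choice: action potentials are not phase-locked to the stimulus at the level of individual samples, so their positive and negative deflections would largely cancel in a raw average, whereas their rectified magnitudes accumulate.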
Our experience has shown that, with the amplification and electrode impedances used in this study, MUA represented the neuronal activity within a radius of 75 µm from the recording contacts. Local net cellular excitation and inhibition were determined by comparing the stimulus-evoked MUA with prestimulus baseline levels of spontaneous firing for each stimulus condition. Three synthetic syllables, /da/, /ba/ and /ta/, were used. These stimuli, produced at the Haskins Laboratories, were the same as those previously employed 22. They were 166 ms in duration. The shape and phase of the sound pressure waves for /da/ and /ba/ were nearly identical, and their VOT was 0 ms. /Da/ and /ba/ differed in their second (F2) and third (F3) formant transitions, which lasted 40 ms. /Ta/ differed from /da/ by an increase in VOT to 80 ms and by an increase in the starting first formant (F1) frequency from 200 to 526 Hz, a modification that enhanced its unvoiced quality. The F1 transition lasted 30 ms. F2 and F3 starting frequencies were 1835 and 3439 Hz for /da/ and /ta/, and 621 and 2029 Hz for /ba/. Formant steady-state frequencies were 817 Hz (F1), 1181 Hz (F2) and 2632 Hz (F3). The syllables and a timing pulse were transferred from computer to a tape recorder for subsequent presentation. Syllable intensity was 80 dB sound pressure level (SPL). Isolated formants and formant pairs were produced by passing the syllables through two band-pass filters connected in series (each with a roll-off of 24 dB per octave). Equal amplitude sinusoids that approximated the formant center frequencies were passed through the filters and measured to determine the adequacy of filtering. Fourier analysis of the derived formants confirmed the adequacy of the separation. F2 of /ba/ could not be isolated because its transition overlapped in frequency with F1. Sound pressure waves of the formants are shown in Fig. 1.
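Deciding whether an evoked response represents net excitation or inhibition relative to the prestimulus baseline, as described above, might be sketched as below. The two-standard-deviation threshold, the analysis window and the function name are illustrative assumptions; the text states only that evoked MUA was compared with prestimulus spontaneous levels.

```python
from statistics import mean, stdev


def classify_response(mua, fs_hz, onset_s, window_s, k=2.0):
    """Label a post-stimulus window of averaged MUA relative to baseline.

    mua      -- averaged, rectified MUA samples, including a prestimulus period
    onset_s  -- stimulus onset time (s), measured from the first sample
    window_s -- (start, stop) of the analysis window, in s after onset
    k        -- hypothetical threshold, in baseline standard deviations
    """
    onset = int(round(onset_s * fs_hz))
    baseline = mua[:onset]
    if len(baseline) < 2:
        raise ValueError("need at least two prestimulus samples")
    mu, sd = mean(baseline), stdev(baseline)
    start = onset + int(round(window_s[0] * fs_hz))
    stop = onset + int(round(window_s[1] * fs_hz))
    evoked = mean(mua[start:stop])
    if evoked > mu + k * sd:
        return "excitation"
    if evoked < mu - k * sd:
        return "inhibition"
    return "no change"
```

Because the comparison is made separately for each stimulus condition, the same site can be labelled "excitation" for one stimulus and "inhibition" for another, which is the kind of contrast the Results draw between low- and high-frequency sites.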
The formants were presented at the intensities obtained after filtering the parent syllables, so as to maintain the formant intensity relationships inherent in the full syllables. These intensities were: F1, 76 dB SPL; /da/ F2, 69 dB SPL; /da/ and /ba/ F3, 56 dB SPL. Nine tones ranging from 600 to 8000 Hz were also presented. The lower frequencies were used to compare the tone response strengths at the starting and steady-state frequencies of the speech sounds. Tone bursts were 55 ms in duration with 5 ms rise-decay times. Tones below 2000 Hz were presented in phase via a tape recorder, whereas higher frequency tones were delivered at random phase by a waveform generator. All tones were presented at 65 dB SPL. Clicks, produced by 100 µs positive square-wave pulses and delivered at 80 dB SPL, were also presented. The timing and patterns of the intracortical laminar profiles of the click-evoked AEPs, MUA and one-dimensional current source density derived from the AEPs were analyzed and used to guide the positioning of the electrode contacts within auditory cortex, to determine their laminar locations, and to distinguish MUA derived from cortical cells from MUA derived from thalamocortical (TC) afferents. More complete descriptions of the laminar pattern of click-evoked activity in primate auditory cortex have been published 2,22,23. Recording sessions lasted about 2 h and were conducted in a sound-attenuated chamber. The subject maintained a relaxed but alert state while seated in a primate chair with head painlessly fixed and arms restrained. Prior to intracortical recordings, normal click-evoked brainstem AEPs were recorded epidurally at the vertex to ensure normal peripheral and brainstem auditory function. Positioning of the electrodes was accomplished with a microdrive and was guided by on-line observation of the laminar pattern of the click-evoked cortical AEP. Presentation of the test stimuli was initiated when the multicontact electrode array straddled the plane of inversion of the early cortical AEP components, indicating that the electrode was within auditory cortex. Comparisons among the responses to the syllables, formants and tones were performed only if reproducible click-evoked responses were obtained before and after presentation of the test stimuli. All stimuli were presented at an interstimulus interval of 650 ms to the ear contralateral to the recording sites via dynamic headphones placed snugly against the animal's ears. Two hundred presentations of the speech sounds and 125 presentations of the tones were used to generate the averages.
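The one-dimensional current source density (CSD) mentioned above is conventionally estimated as the negative second spatial derivative of the field potential across equally spaced contacts: CSD(z) ≈ -σ [φ(z+h) - 2φ(z) + φ(z-h)] / h². The sketch below assumes a uniform conductivity σ (set to 1, so results are in potential units per mm²) and a single contact spacing h; it is a generic illustration of the second-difference estimate, not the authors' code.

```python
def csd_1d(potentials, spacing_mm, conductivity=1.0):
    """Second-difference CSD estimate across a linear contact array.

    potentials -- list of per-contact time series, ordered along the
                  electrode (e.g. superficial to deep), equal lengths
    spacing_mm -- inter-contact spacing h in mm (contacts in this
                  study were 100-300 um apart)
    Returns one time series per interior contact; the two end contacts
    have no estimate. With this sign convention, negative values mark
    current sinks and positive values mark current sources.
    """
    if len(potentials) < 3:
        raise ValueError("need at least 3 contacts")
    h2 = spacing_mm ** 2
    csd = []
    for c in range(1, len(potentials) - 1):
        above, centre, below = potentials[c - 1], potentials[c], potentials[c + 1]
        csd.append([
            -conductivity * (a - 2.0 * m + b) / h2
            for a, m, b in zip(above, centre, below)
        ])
    return csd
```

An 8-contact array therefore yields 6 CSD traces. A linear potential gradient produces zero CSD, while a local potential minimum at one contact produces a negative value (a sink) there.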
Recording sessions were terminated after 4 months and 32 electrode penetrations to ensure histological identification of all electrode tracks. The monkey was then deeply anesthetized with sodium pentobarbital and perfused through the aortic arch with 10% buffered formalin. Tissue was blocked, sectioned and examined histologically to reconstruct the electrode tracks and to identify selected recording sites that had been marked by iron deposition. A1 was identified by previously published histological criteria 6,15,17.

RESULTS

Response features are related to the tonotopic organization of A1

The relationship between syllable- and tone-evoked activity was examined from MUA obtained at 71 sites during 13 electrode passes through A1. Excitatory speech-evoked MUA was generated at sites excited by tones within the spectral range of the syllables, as shown in Fig. 2. Tone responses at this site are largest to 600 and 800 Hz, smaller to 1200 Hz and minimal to higher frequencies. This pattern of tone responses predicts that the syllables should also evoke excitatory responses, as much of their spectral content is below 1200 Hz. Furthermore, activity to the isolated formants should be evoked primarily by the low-frequency F1 and F2, whose steady-state center frequencies lie near 800 and 1200 Hz, and not by the higher frequency F3. These predictions are confirmed by the speech-evoked response patterns. Note also that the interval between the first and third bursts in the response to /ta/ equals the VOT.
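The prediction logic of this paragraph, comparing each formant with the tone closest to it in frequency, can be made explicit. The sketch below uses the formant and tone frequencies given in Materials and Methods; matching by nearest tone on a logarithmic (octave) scale is an assumption made for illustration, and the `tone_response` amplitudes passed to the ranking function are hypothetical inputs to be filled with measured values.

```python
import math

# Tone frequencies (Hz) used in this study.
TONES_HZ = [600, 800, 1200, 1800, 2000, 3000, 4000, 6000, 8000]

# Steady-state formant frequencies (Hz) of the syllables.
STEADY_STATE_HZ = {"F1": 817, "F2": 1181, "F3": 2632}

# F2 transition onset frequencies (Hz).
F2_ONSET_HZ = {"/ba/": 621, "/da/": 1835}


def nearest_tone(freq_hz, tones_hz=TONES_HZ):
    """Return the tone closest to freq_hz in octaves (log2 distance)."""
    return min(tones_hz, key=lambda t: abs(math.log2(t / freq_hz)))


def rank_by_tone_response(freqs_hz, tone_response):
    """Order stimulus components by the response to their nearest tone.

    tone_response maps tone frequency (Hz) -> response amplitude at one
    recording site (hypothetical input).
    """
    tones = sorted(tone_response)
    score = {name: tone_response[nearest_tone(f, tones)]
             for name, f in freqs_hz.items()}
    return sorted(score, key=score.get, reverse=True)


if __name__ == "__main__":
    print({k: nearest_tone(v) for k, v in STEADY_STATE_HZ.items()})
    print({k: nearest_tone(v) for k, v in F2_ONSET_HZ.items()})
```

Run as a script, it prints {'F1': 800, 'F2': 1200, 'F3': 3000} and {'/ba/': 600, '/da/': 1800}, reproducing the tone-formant pairings the Results rely on: 800 and 1200 Hz for the steady-state F1 and F2, and 600 vs 1800 Hz for the /ba/ and /da/ F2 onsets.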


[Figure 1 panels: DA 1 FORMANT, DA 2 FORMANT, DA 3 FORMANT, DA 1 + 2 FORMANT, BA 1 + 2 FORMANT, BA 3 FORMANT.]

Fig. 1. Sound pressure waves of /da/, /ba/, and their isolated formants and formant pairs used in this study. The left-hand column depicts the entire sounds, while the right-hand column magnifies the first 60 ms of the stimuli. Voltage calibrations are shown at the far right.


[Figure 2 panels: DA, DA 1 FORMANT, DA 2 FORMANT, DA 3 FORMANT, DA 1 + 2 FORMANT, DA 2 + 3 FORMANT, BA, BA 1 + 2 FORMANT, BA 3 FORMANT, TA; tones of 600, 800, 1200, 1800, 2000, 3000, 4000, 6000 and 8000 Hz. Time axes in msec; calibration 4 µV.]

Fig. 2. Cortical MUA to the syllables /da/, /ba/ and /ta/, the first 3 formants of /da/ and /ba/, and tones ranging in frequency from 600 to 8000 Hz. Tone responses are largest to 600 and 800 Hz, smaller to 1200 Hz and minimal to higher frequencies. The syllables, whose spectral content lies predominantly within the low frequency range encompassed by the tones eliciting the largest MUA, generate excitatory responses. Similarly, the lower frequency F1s and F2s of the syllables also evoke prominent excitation, in contrast to the higher frequency F3s, whose spectral content lies above 2000 Hz. Consonant place of articulation is reflected by a larger initial response to /ba/ than to /da/ and /ta/. This difference is due to the greater response to the /ba/ F2, and correlates with the larger response to 600 Hz vs 1800 Hz, the starting F2 frequencies for /ba/ and /da/, respectively. Formant interactions are most evident in the effects the F3s have on the responses to the whole syllables. Furthermore, the F3s sharpen the response differentiation that reflects consonant place of articulation. See text for details.

In contrast to the excitatory speech-evoked activity and differentiated formant-evoked responses at sites responding best to frequencies within the spectral range of the syllables, loci that are activated by higher frequencies are often inhibited by the complex sounds. At the site shown in Fig. 3, 4000, 6000 and 8000 Hz tones evoke excitatory responses, whereas lower frequency tones that lie within the spectral range of the syllables produce MUA inhibition. Consistent with the tone responses, the syllables and formants primarily elicit undifferentiated, inhibitory responses. The initial excitatory response to the unvoiced segment of /ta/ is elicited by 4000 Hz energy at /ta/ onset. The 'off' responses to the speech sounds presumably reflect a post-inhibitory rebound.

[Figure 3 panels: same speech stimuli and tones as in Fig. 2. Time axes in msec; calibration 4 µV.]

Fig. 3. MUA elicited by the same stimuli as in Fig. 2 at another cortical site. MUA at this site is increased to 4, 6 and 8 kHz tones and diminished to lower frequency tones. The speech sounds generate undifferentiated and inhibitory MUA that conform to the tone response profile. Thus, in contrast to the differentiated responses to the syllables and formants at sites maximally and differentially activated by low frequency tones, sites excited by tones higher than the spectral content of the syllables fail to generate speech-evoked responses differentiated on the basis of frequency content.

[Figure 4 panels: same speech stimuli and tones as in Fig. 2. Time axes in msec; calibration 3 µV.]

Fig. 4. Tone-evoked MUA at this cortical site is maximal to 6000 Hz, whereas lower frequency tones evoke predominantly transient inhibition. Activity to the speech sounds is characterized by sustained inhibition with superimposed periodic peaks at the fundamental frequency. The interval between the inhibition at the onsets of the aperiodic and periodic portions of /ta/ equals the 80 ms VOT. Thus, loci activated by frequencies higher than those present in speech sounds may encode temporally significant parameters such as stimulus periodicity and VOT.
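Phase-locking of MUA to the fundamental frequency, as in Fig. 4, can be quantified by finding the lag at which the response's autocorrelation peaks. This is a generic sketch under stated assumptions (mean removal, a biased autocorrelation estimate and a caller-supplied lag search range); it is not the analysis used in the paper, and because the fundamental frequency of the stimuli is not stated in this section, any test signal must be synthetic.

```python
def dominant_period(signal, fs_hz, min_lag_s, max_lag_s):
    """Return the lag (s) with the largest autocorrelation in a search range.

    Uses the biased estimate (sum divided by N), which slightly favours
    shorter lags, so the fundamental period is preferred over its
    integer multiples.
    """
    n = len(signal)
    m = sum(signal) / n
    x = [v - m for v in signal]  # remove the mean (e.g. sustained inhibition level)
    lo = max(1, int(round(min_lag_s * fs_hz)))
    hi = min(n - 1, int(round(max_lag_s * fs_hz)))
    best_lag, best_r = None, float("-inf")
    for lag in range(lo, hi + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / n
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None:
        raise ValueError("empty lag range")
    return best_lag / fs_hz
```

The reciprocal of the returned period estimates the repetition rate of the MUA peaks, which can then be compared with the stimulus fundamental frequency.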

Differential responses that reflect consonant place of articulation are also related to tone responses, as shown in Fig. 2. The MUA at this site reflects consonant place of articulation by an initial 'on' response to /ba/ that is more than 4 times larger than the 'on' response to /da/. This difference is due to the greater response at this locus to the /ba/ F2 transition. Although testing with the isolated /ba/ F2 was not possible, the 'on' response to the F1-F2 pair of /ba/ is larger than the initial burst to the /da/ F1-F2 pair. Furthermore, no early 'on' responses occur to F3 of either /ba/ or /da/, and both syllables contain the same F1. The key comparison is between the MUA elicited by the 600 and 1800 Hz tones, which approximate the starting frequencies of the F2 transitions for /ba/ and /da/, respectively. The MUA to the 600 Hz tone is substantially greater than that to the 1800 Hz tone, in accordance with the response differences seen for the speech sounds.

Despite the conspicuous absence of predominantly excitatory responses to speech sounds at sites preferentially activated by high frequency tones, temporal characteristics of the speech sounds, such as stimulus periodicity and VOT, can be prominently reflected at these locations, as depicted in Fig. 4. This site responds most vigorously to the 6000 Hz tone, and mainly with transient inhibition to lower frequency tones. Nevertheless, the speech sounds elicit prominent periodic MUA excitation and inhibition at the fundamental frequency of the stimuli. The interval between the inhibition at the onsets of the unvoiced and voiced segments of /ta/ equals the 80 ms VOT.

[Figure 5 panels: DA; FIRST FORMANT; SECOND FORMANT; THIRD FORMANT. Time axis in msec; calibration 3 µV.]

Fig. 5. Complex summation of formant responses generates the activity to /da/ in this example. The cortical MUA to the whole syllable consists of phase-locked responses to the syllable fundamental frequency, with accentuation of the first 3 peaks, followed by an 'off' response. The first two periodic peaks are predominantly generated by F3 (solid arrows), whereas the 'off' response is evoked by F1 and F2 (open arrows). The remaining periodic peaks are generated by each of the isolated formants.

Formant interactions modulate responses to whole syllables in A1

The relationship between syllable- and formant-evoked MUA was examined at 64 sites during 17 electrode penetrations through A1. Usually, more than one isolated formant evokes a response. In these situations, responses to whole syllables represent complex, non-linear summations of the responses to the isolated formants. Often, different segments of the response to a syllable are derived from different combinations of isolated formant responses, as shown in Fig. 5. The MUA to /da/ is a composite of an early response comprising 3 large peaks during the formant transitions, followed by periodic responses to the syllable fundamental frequency and a small 'off' response. The initial peak to /da/ is predominantly elicited by F3 (solid arrows), whereas the 'off' response is absent from the F3 response but present in the responses to F1 and F2 (open arrows). The periodic activity during the vowel is elicited by all 3 isolated formants. These summations are all non-linear. The maximum amplitude of the early peak in the /da/-evoked response is only 77% of the size of the corresponding peak to F3, despite the absence of concurrent inhibition in the other formant responses. The 'off' response to /da/ is identical in amplitude to those elicited by F1 and F2, and the periodic responses to /da/ and its isolated formants during the steady-state vowel are about equal in amplitude, whereas linear summation would predict syllable responses exceeding those to any single formant.

Inhibitory interactions among individual formants modulate the activity evoked by the syllables. For example, inhibitory effects of F3 are seen in Fig. 2 when comparing the initial bursts to the F1-F2 pairs with those to the syllables, in other words, when F3 is added to the F1-F2 pairs. The peak of the initial burst at 17 ms to the /ba/ F1-F2 pair is 1.8 times larger than the same burst to /ba/. Furthermore, the second, smaller peak at 28 ms in the response to the /ba/ F1-F2 pair, coincident with the initial peak in the F1 response, is eliminated from the response to /ba/. Thus, F3 of /ba/ diminishes early excitatory activity despite the absence of MUA inhibition in the isolated /ba/ F3 response and the absence of inhibition to the 2000 Hz tone, which coincides with the onset frequency of the /ba/ F3 transition. Effects of the /da/ F3 transition upon the initial burst to /da/ are even more striking. The amplitude of the MUA at 17 ms to the /da/ F1-F2 pair is 10.0 times greater than the activity at 17 ms to /da/, and 3.3 times larger if one measures from the peak of the 'on' response to /da/, which occurs 5 ms earlier. As in the response to /ba/, the second peak in the response to the /da/ F1-F2 pair at 28 ms is absent from the response to /da/. Transient inhibition in the isolated /da/ F3 response is concurrent with the second peak in the F1-F2 pair response, and may therefore be involved in the disappearance of this peak in the response to /da/. This inhibition occurs after the initial burst, however, and therefore cannot explain the diminished amplitude of that component.

These inhibitory effects may sharpen response differences that reflect consonant place of articulation. The amplitude of the initial burst to the /ba/ F1-F2 pair is 2.3 times larger than the same burst in the /da/ F1-F2 response. The ratio increases to 4.3 with the full syllable responses (12.5 if only amplitudes at 17 ms are measured). Thus, the effect of the F3s in this case is to further increase the response differences between the syllables. In summary, MUA response patterns to syllables are significantly modified by formant interactions, which may involve formants that fail to elicit a response when presented in isolation. These interactions modulate response strengths initially determined by the spectral characteristics of the stimuli and the tonotopic characteristics of the cortical locus.

[Figure 6 panels: same speech stimuli and tones as in Fig. 2. Time axes in msec; calibration 4 µV.]

Fig. 6. Thalamocortical MUA to the same syllables, formants and tones. Tone responses at this subcortical site are largest to 600, 800 and 1200 Hz and minimal to higher frequencies. Syllable-evoked responses are excitatory and consistent with the tone response profile. The first two formants also elicit MUA increases, whereas the higher frequency F3s evoke minimal excitation, reflecting the lack of prominent higher frequency tone responses. The responses to /da/ and /ba/ differ in the presence of the first periodic peak to /ba/ (arrows). This differential activity reflecting consonant place of articulation is due to the greater response to the /ba/ F2. /Ba/ and its F1-F2 pair are the only speech stimuli that evoke this peak; F1 presented alone does not. Activity to 600 Hz, the starting frequency of the /ba/ F2, is greater than to 1800 Hz, the starting frequency of the /da/ F2. Note the poor reflection of the increased VOT of /ta/ at this low-frequency site.

Similar phenomena are observed in thalamocortical afferents

Examples of the phenomena evaluated in this report were all recorded from cortical MUA located in A1. Similar effects are seen in the activity of TC afferents, demonstrating that: (1) relationships between the tonotopic organization of the auditory system and speech-evoked activity are initiated subcortically; and (2) non-linear formant summations and complex formant interactions are not confined to auditory cortex but occur subcortically as well, as shown in Fig. 6. The MUA at this site is derived from activity in TC afferents and was recorded in the white matter beneath A1. Tonotopic specificity for steady-state formant responses is reflected in the prominent phase-locked responses to the isolated /da/ F1 and F2, whereas F3 elicits a weak response. In accordance with these findings, both the 800 and 1200 Hz tones, which approximate the center frequencies of F1 and F2, evoke robust responses, whereas higher frequency tones do not. Note that the increased VOT of /ta/ is poorly reflected in the response at this site activated by low frequency tones. Consonant place of articulation is reflected in the MUA by larger first and second peaks in the response to /ba/ compared with the response to /da/ (arrows in Fig. 6). These differences stem from a greater responsiveness of this site to the /ba/ F2 transition. The response differences are consistent with the tonotopic specificity at this site, as the 600 Hz tone elicits a much greater response than the 1800 Hz tone. As with cortical MUA, non-linear summation of the formant-evoked phase-locked responses is observed. In this example, formant interactions include a decrease in the amplitude of the first peak in the F1 response when the /da/ F2 (the /da/ F1-F2 pair) or the /da/ F2 and F3 (/da/) are added. Interestingly, there is a larger peak in the response to /ba/ than in the response to the /ba/ F1-F2 pair, which may indicate a facilitatory effect of the /ba/ F3 despite a lack of response at that time to the isolated /ba/ F3.

DISCUSSION

Sites within A1 usually respond to more than 1 formant of the CV syllables used in this study. These formant responses generally sum non-linearly and in varied ways to generate the responses to the syllables. Often, specific response components to a syllable are produced by different combinations of individual formant responses. Among the more intriguing and complex of the formant summations is the ability of a single formant to significantly modify the response to the whole syllable.
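One simple way to express how far a syllable response departs from linear summation of its formant responses is to divide the syllable's response at a given latency by the sum of the isolated formant responses at the same latency. The index and function names below are illustrative assumptions, not a measure defined in this paper; the inputs are assumed to be baseline-subtracted MUA sampled on a common time base.

```python
def linear_prediction(formant_responses):
    """Sample-by-sample sum of baseline-subtracted formant responses."""
    return [sum(values) for values in zip(*formant_responses)]


def summation_index(syllable_response, formant_responses, sample):
    """Observed / linearly predicted response at one sample index.

    1.0 means linear summation; below 1.0 is sublinear and above 1.0
    is supralinear.
    """
    predicted = linear_prediction(formant_responses)[sample]
    if predicted == 0:
        raise ValueError("linear prediction is zero at this sample")
    return syllable_response[sample] / predicted
```

For instance, if the other formants contributed nothing at the latency of an F3-driven peak, an index of 0.77 would correspond to the early /da/ peak described in the Results, which reached only 77% of the F3 peak.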
These formant interactions can sharpen response differences between two acoustically similar but phonetically distinct sounds such as /da/ and /ba/, and may represent an important mechanism that assists the auditory cortex in discriminating acoustically similar sounds with different signal value. Similar effects are seen in field L of the mynah bird, where response specificity to different vowels is determined by complex excitatory and inhibitory interactions of the isolated formants 12. To account for these interactions, Langner and associates 12 suggest that neural circuits consisting of multiple convergent inputs, each with its own pattern of excitatory and inhibitory regions within its frequency-response tuning curve, are responsible for the response specificity to different vowels. These effects could be seen even when the response to a tone within the inhibitory region fails to generate a response decrement, an effect also noted in the present data. Germane to the present findings are the appropriately complex tuning curves with multiple regions of excitation and inhibition demonstrated for medial geniculate nucleus (MGN) and A1 neurons in awake monkeys 1,18. Thus, in support of previous suggestions, we propose that simultaneous activation and inhibition of multiple convergent inputs from the thalamus to cortical loci are partially responsible for the complex non-linear summations seen for the speech sounds.

Although formant interactions increase the complexity of speech-evoked activity, responses to the syllables and their formants are still meaningfully related to the tonotopic organization of A1. MUA patterns that reflect the spectral characteristics of steady-state formant frequencies occur at sites with similarly differentiated tone responses. Thus, tone responses are generally largest to the frequencies contained within the formants that generate the largest MUA. Differential responses to /ba/ and /da/ that reflect consonant place of articulation are associated with tone responses that are larger to the onset frequency of the more active formant transition than to the onset frequency of the less active formant transition. In contrast, high frequency regions that do not exhibit differential responses to tones within the spectral range of the syllables generate nearly identical responses to the isolated formants, which therefore do not reflect place of articulation. These regions do, however, often exhibit inhibitory responses to the speech sounds, suggesting that these responses may delineate the upper boundary of the syllables' spectral content.
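The convergent-input account discussed above, together with the observation that a formant can suppress a syllable response without producing a decrement on its own, can be illustrated with a deliberately simplified unit: a weighted sum of formant inputs, with excitatory (positive) and inhibitory (negative) weights, passed through a firing-rate floor at zero. The weights, the names and the rectified-linear form are hypothetical choices for illustration, not a model fitted to these data or the circuit proposed by Langner and associates.

```python
def unit_response(formant_levels, weights, spontaneous=0.0):
    """Rectified weighted sum of formant inputs (toy model).

    formant_levels -- mapping formant name -> input level (0 if absent)
    weights        -- hypothetical tuning: positive = excitatory region,
                      negative = inhibitory region
    spontaneous    -- baseline drive; with 0, inhibition alone is invisible
    """
    drive = spontaneous + sum(
        weights.get(name, 0.0) * level for name, level in formant_levels.items()
    )
    return max(0.0, drive)  # firing rate cannot fall below zero


# Hypothetical tuning: F1 and F2 fall in excitatory regions, F3 in an inhibitory one.
WEIGHTS = {"F1": 0.6, "F2": 0.9, "F3": -0.5}

if __name__ == "__main__":
    for stim in ({"F3": 1.0}, {"F1": 1.0, "F2": 1.0}, {"F1": 1.0, "F2": 1.0, "F3": 1.0}):
        print(sorted(stim), unit_response(stim, WEIGHTS))
```

In this toy unit, F3 alone yields zero output, indistinguishable from silence when spontaneous drive is zero, yet adding F3 to the F1-F2 pair lowers the response. With nonzero spontaneous drive the same F3 input does produce a visible decrement, so whether an inhibitory input is detectable in isolation depends on the baseline firing level.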
Whereas the detailed spectral characteristics of the speech sounds are primarily reflected in low frequency cortical regions, both low and high frequency areas of A1 generate responses that reflect temporal characteristics such as waveform periodicity and VOT. In fact, phase-locked responses tend to be more distinct and better delineated within middle and high frequency cortical regions than within the lowest frequency regions. Similarly, responses reflecting VOT were better demonstrated in these regions. The relationships among the responses to the syllables, formants and tones within A1 are not fundamentally different from those in TC axons, and therefore from those in relay cells within the MGN. Differences in the neuronal processing of speech sounds between the MGN and A1 may include an additional opportunity for formant interactions within A1. We have shown that these interactions may sharpen differences in syllable-evoked responses that reflect consonant place of articulation. Therefore, one effect of cortical processing may be to heighten contrast between complex stimuli with subtle acoustical differences. Supporting evidence comes from the observation of increased specificity of cortical cell responses to species-specific vocalizations in primate auditory cortex compared with the MGN 27, and of increased response specificity to vowels in the upper and lower layers of field L in the mynah bird compared with the middle layer 12, which receives the main input from more peripheral auditory centers 19. Our finding that activity from a large region of A1 is modified by speech-sound stimulation in a manner that reflects important acoustic and phonetic parameters lends strong support to the spatiotemporal pattern theory for the perception of biologically significant sounds. This theory, which states that a complex acoustic signal is encoded by the overall response patterns of non-specialized cells in the highest auditory centers 26, contrasts with the more widely held detector theory, which argues that the excitation of specialized neurons that selectively respond to a specific type of signal is directly related to the perception of that signal. As described in the review by Suga 26, the essential component of the detector theory is a one-to-one correspondence between a categorical percept and the activity within a specialized group of neurons. The tonotopically organized reflection of formant frequencies and the widespread cortical activation reflecting the temporal characteristics of stimulus periodicity and VOT presented in this report argue against this notion, and are consistent with conclusions reached by others concerning the encoding of complex vocalizations 5,12. Our findings have relevance to the human perception of stop CV syllables.
Vowel perception depends upon the frequency composition of the component formants 13, and may therefore be reflected in A1 by different spatial patterns of activity, generated as the formant frequencies maximally excite the appropriate regions of the tonotopically organized auditory cortex. A similar argument can be made for the cortical encoding of stop consonants. Place of articulation is at least partially cued by the short-term spectral pattern at the onset of a stop consonant in syllable-initial position 4,9,10,24. Furthermore, stop consonants are detected in ongoing speech by the accompanying abrupt increase in spectral energy at their onset 10,25. Both of these acoustic attributes of stop consonants are reflected in the activity of A1. The abrupt onset of spectral energy is associated with the transient 'on' responses that are a major temporal response pattern for speech sounds 22. The spectral patterns signifying different stop consonants would be transformed into tonotopically organized spatial templates of activity across auditory cortex that vary with the consonant.
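The idea of tonotopically organized spatial templates can be illustrated with a deliberately simple linear model. In the sketch below, each hypothetical recording site has a best frequency (BF) and Gaussian tuning on an octave axis, and each syllable's onset spectrum is reduced to three formant onset frequencies with relative amplitudes. The BFs, bandwidth and formant values are illustrative assumptions, not the stimulus parameters or tuning measured in this study, and the model omits the formant interactions and inhibition discussed above.

```python
import math
from typing import List, Sequence, Tuple

Component = Tuple[float, float]  # (frequency in Hz, relative amplitude)


def site_response(best_freq_hz: float, components: Sequence[Component],
                  bandwidth_oct: float = 0.5) -> float:
    """Linear response of one site: Gaussian tuning in octaves around its BF."""
    total = 0.0
    for freq_hz, amplitude in components:
        distance_oct = math.log2(freq_hz / best_freq_hz)
        total += amplitude * math.exp(-0.5 * (distance_oct / bandwidth_oct) ** 2)
    return total


def spatial_template(best_freqs_hz: Sequence[float],
                     components: Sequence[Component],
                     bandwidth_oct: float = 0.5) -> List[float]:
    """Activity across sites ordered along the tonotopic axis."""
    return [site_response(bf, components, bandwidth_oct) for bf in best_freqs_hz]


def template_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two spatial templates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical sites and formant onset frequencies (illustrative only).
sites_hz = [250, 500, 1000, 2000, 4000, 8000]
ba_onset = [(300, 1.0), (1000, 0.8), (2000, 0.5)]
da_onset = [(300, 1.0), (1700, 0.8), (2800, 0.5)]

ba = spatial_template(sites_hz, ba_onset)
da = spatial_template(sites_hz, da_onset)
for bf, x, y in zip(sites_hz, ba, da):
    print(f"BF {bf:>5} Hz  /ba/ {x:.3f}  /da/ {y:.3f}")
print(f"template distance: {template_distance(ba, da):.3f}")
```

With these assumed values, the two templates differ mainly at sites tuned near the second and third formant onsets (1000 to 4000 Hz here) and are nearly identical at the lowest site, which sees only the shared first formant, and at the highest site, which lies above the syllables' spectral range.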


Acknowledgements. The authors gratefully acknowledge the technical assistance of J. Barna, N. Brennan and C. Freeman and the secretarial assistance of L. O'Donnell, and thank T. Halwes of the Haskins Laboratories for his aid in constructing the syllables. This research was supported in part by NIH Training Grant 5T32GM7288 from NIGMS, Grants HD01799 and MH06723 from the USPHS and NIH Contract HD-5-2910 to the Haskins Laboratories.

REFERENCES

1 Allon, N., Yeshurun, Y. and Wollberg, Z., Responses of single cells in the medial geniculate body of awake squirrel monkeys, Exp. Brain Res., 41 (1981) 222-232.
2 Arezzo, J.C., Vaughan Jr., H.G., Kraut, M., Steinschneider, M. and Legatt, A.D., Intracranial generators of event-related potentials in the monkey. In R.Q. Cracco and I. Bodis-Wollner (Eds.), Frontiers of Clinical Neuroscience, Vol. 3, Evoked Potentials, Alan Liss, New York, 1986, pp. 174-189.
3 Barna, J., Arezzo, J.C. and Vaughan Jr., H.G., A new multicontact array for the simultaneous recording of field potentials and unit activity, Electroenceph. Clin. Neurophysiol., 52 (1981) 494-496.
4 Cole, R. and Scott, B., The phantom in the phoneme: invariant cues for stop consonants, Percept. Psychophys., 15 (1974) 101-107.
5 Creutzfeldt, O., Hellweg, F.-C. and Schreiner, C., Thalamocortical transformation of responses to complex auditory stimuli, Exp. Brain Res., 39 (1980) 87-104.
6 Galaburda, A. and Pandya, D.N., The intrinsic architectonic and connectional organization of the superior temporal region of the rhesus monkey, J. Comp. Neurol., 221 (1983) 169-184.
7 Galaburda, A. and Sanides, F., Cytoarchitectonic organization of the human auditory cortex, J. Comp. Neurol., 190 (1980) 596-610.
8 Glass, I. and Wollberg, Z., Lability in the response of cells in the auditory cortex of squirrel monkeys to species-specific vocalizations, Exp. Brain Res., 34 (1979) 489-498.
9 Halle, M., Hughes, G. and Radley, J.-P., Acoustic properties of stop consonants, J. Acoust. Soc. Am., 29 (1957) 107-116.
10 Kewley-Port, D., Time-varying features as correlates of place of articulation in stop consonants, J. Acoust. Soc. Am., 73 (1983) 322-335.
11 Kuhl, P.K., Theoretical contributions of tests in animals to the special-mechanisms debate in speech, Exp. Biol., 45 (1986) 233-265.
12 Langner, G., Bonke, D. and Scheich, H., Neuronal discrimination of natural and synthetic vowels in field L of trained mynah birds, Exp. Brain Res., 43 (1981) 11-24.
13 Liberman, A., Cooper, F.S., Shankweiler, D.P. and Studdert-Kennedy, M., Perception of the speech code, Psychol. Rev., 74 (1967) 431-461.
14 Manley, J.A. and Mueller-Preuss, P., Response variability in the mammalian auditory cortex: an objection to feature detection?, Fed. Proc., 37 (1978) 2355-2359.
15 Merzenich, M. and Brugge, J., Representation of the cochlear partition on the superior temporal plane of the macaque monkey, Brain Research, 50 (1973) 275-296.
16 Newman, J. and Wollberg, Z., Multiple coding of species-specific vocalizations in the auditory cortex of squirrel monkeys, Brain Research, 54 (1973) 287-304.
17 Pandya, D.N. and Sanides, F., Architectonic parcellation of the temporal operculum in rhesus monkey and its projection pattern, Z. Anat. Entwicklungsgesch., 139 (1973) 127-161.
18 Pelleg-Toiba, R. and Wollberg, Z., Tuning properties of auditory cortex cells in the awake squirrel monkey, Exp. Brain Res., 74 (1989) 353-364.
19 Scheich, H., Bonke, D. and Langner, G., Tonotopy and analysis of wide-band calls in field L of the Guinea fowl, Exp. Brain Res., Suppl. II (1979) 94-109.
20 Seldon, H.L., Structure of human auditory cortex. I. Cytoarchitectonic and dendritic distributions, Brain Research, 229 (1981) 277-294.
21 Steinschneider, M., Arezzo, J.C. and Vaughan Jr., H.G., Phase-locked cortical responses to a human speech sound and low-frequency tones in the monkey, Brain Research, 198 (1980) 75-84.
22 Steinschneider, M., Arezzo, J.C. and Vaughan Jr., H.G., Speech evoked activity in the auditory radiations and cortex of the awake monkey, Brain Research, 252 (1982) 353-365.
23 Steinschneider, M., Speech evoked activity in the auditory cortex of the monkey, Doctoral dissertation, Albert Einstein College of Medicine, Yeshiva University, New York, 1984.
24 Stevens, K. and Blumstein, S., Invariant cues for place of articulation in stop consonants, J. Acoust. Soc. Am., 64 (1978) 1358-1368.
25 Stevens, K., Acoustic correlates of some phonetic categories, J. Acoust. Soc. Am., 68 (1980) 836-842.
26 Suga, N., Specialization of the auditory system for reception and processing of species-specific sounds, Fed. Proc., 37 (1978) 2342-2354.
27 Symmes, D., Alexander, G. and Newman, J., Neural processing of vocalizations and artificial stimuli in the medial geniculate body of squirrel monkey, Hearing Res., 3 (1980) 133-146.
28 Symmes, D., On the use of natural stimuli in neurophysiological studies of audition, Hearing Res., 4 (1981) 203-214.
29 Winter, P. and Funkenstein, H.H., The effect of species-specific vocalizations on the discharge of auditory cortical cells in the awake squirrel monkey (Saimiri sciureus), Exp. Brain Res., 18 (1973) 489-504.
30 Wollberg, Z. and Newman, J., Auditory cortex of squirrel monkey: response patterns of single cells to species-specific vocalizations, Science, 175 (1972) 212-214.
