Journal of Experimental Psychology: Human Perception and Performance 1978, Vol. 4, No. 4, 599-609

Contrast Effects on Stop Consonant Identification

Randy L. Diehl
University of Texas at Austin

Jeffrey L. Elman
Department of Linguistics
University of San Diego, San Diego

Susan Buchwald McCusker
University of Texas at Austin

Changes in the identification of speech sounds following selective adaptation are usually attributed to a reduction in sensitivity of auditory feature detectors. An alternative explanation of these effects is based on the notion of response contrast. In several experiments, subjects identified the initial segment of synthetic consonant-vowel syllables as either the voiced stop [b] or the voiceless stop [ph]. Each test syllable had a value of voice onset time (VOT) that placed it near the English voiced-voiceless boundary. When the test syllables were preceded by a single clear [b] (VOT = -100 msec), subjects tended to identify them as [ph], whereas when they were preceded by an unambiguous [ph] (VOT = 100 msec), the syllables were predominantly labeled [b]. This contrast effect occurred even when the contextual stimuli were velar and the test stimuli were bilabial, which suggests a featural rather than a phonemic basis for the effect. To discount the possibility that these might be instances of single-trial sensory adaptation, we conducted a similar experiment in which the contextual stimuli followed the test items. Reliable contrast effects were still obtained. In view of these results, it appears likely that response contrast accounts for at least some component of the adaptation effects reported in the literature.

This work was supported by National Institutes of Health Grant NS 13764-01 and by a Biomedical Sciences research grant from the University of Texas at Austin. We wish to thank Alvin M. Liberman for making the facilities of Haskins Laboratories available to us for the preparation of stimulus materials.

Requests for reprints should be sent to Randy L. Diehl, Department of Psychology, 330 Mezes, University of Texas, Austin, Texas 78712.

Copyright 1978 by the American Psychological Association, Inc. 0096-1523/78/0404-0599$00.75

Almost without exception, current theoretical approaches to speech perception assign a major information-processing role to "feature detectors," a notion referring to hypothetical neural assemblies in the auditory pathways which are selectively tuned to certain complex acoustic properties of the speech signal (Abbs & Sussman, 1971; Eimas & Corbit, 1973; Pisoni, 1975; Stevens, 1975). Behavioral evidence for such detectors has come principally from studies using the experimental paradigm of selective adaptation (for recent reviews, see Ades, 1976; Cooper, 1975; Eimas & Miller, in press). In the first of these studies, Eimas and Corbit (1973) obtained subjects' identification functions before and after adaptation for a series of synthetic test syllables varying in voice onset time (VOT). Changes in VOT, the interval between consonantal release and voicing onset, are sufficient to signal the contrast between voiced and voiceless initial stops, that is, [b] versus [ph], [d] versus [th], and [g] versus [kh] (Lisker & Abramson, 1970). Following repeated presentation of a voiced adapting stimulus, the subjects identified a greater number of the test items as voiceless. That is to say, the perceptual boundary separating the voiced and voiceless categories shifted toward the voiced end of the test series. On the other hand, a voiceless

adaptor shifted the category boundary toward the voiceless end of the series. To account for these results, Eimas and Corbit proposed that (a) there are two detectors with different but partially overlapping ranges of sensitivity along the VOT dimension, (b) the perceptual boundary between the voiced and voiceless categories lies at the VOT value to which the detectors are equally sensitive, (c) repeated presentation of a feature to which one of the detectors is highly sensitive will eventually fatigue that detector, rendering it less sensitive across its entire input range, and (d) the differential reduction in the sensitivity of one of the detectors will displace the point of equal sensitivity (i.e., the category boundary) in the direction of the adapted category. These assumptions are now widely accepted and have been generalized to apply to a variety of different feature dimensions (e.g., Cole & Cooper, 1975; Cooper, 1974a; Diehl, 1975, 1976).

An alternative explanation of the boundary shift is that the adapting stimulus serves as an anchor or reference that modifies the subject's phonetic decision criterion. What makes this especially plausible is that the adapting stimuli used in these experiments are typically selected from the extremes of the test series and are thus good exemplars of their respective categories. Nevertheless, this type of account, which we refer to as the contrast (as opposed to the sensory fatigue) hypothesis, has been rejected on the basis of a number of empirical findings. We now present and evaluate these findings to determine whether they, in fact, justify a rejection of the contrast hypothesis.

First, adaptation may produce not only a shift in the location of the category boundary but also a comparable shift in the location of peak discriminability along the stimulus dimension (Cooper, 1974a; Eimas & Corbit, 1973). If the ABX discrimination test used in these studies constitutes a relatively criterion-free measure of sensitivity, it seems reasonable to conclude, following Cooper (1974a), that the contrast hypothesis does not provide a satisfactory account of such shifts in peak discriminability. However, more recently Cooper (1975) pointed out that the ABX procedure may not be a criterion-free measure of sensitivity as earlier assumed. Rather, the peaks in the discrimination function may reflect covert identification performance that is subject to shifts in response criterion. By such an account, shifts in peak ABX discriminability are quite consistent with the contrast hypothesis.

To obtain a truly criterion-independent measure of sensitivity changes following adaptation, Cooper, Ebert, and Cole (1976) applied the techniques of signal detection theory. Instead of the usual phoneme identification, subjects were required to assign the numbers 1 through 7 to the seven test stimuli, before and after adaptation. (The stimuli, which varied in transition duration, ranged perceptually from [ba] to [wa].) Responses were compiled into a confusion matrix, and d' values for pairs of successive stimuli were estimated. After adaptation with each of the series end-point stimuli, the cumulative d' was reduced, which prompted the authors to conclude that "a primary alteration in listener's sensitivity accompanies selective adaptation to speech sounds" (pp. 101-102). There are, however, at least two reasons for questioning this conclusion. First, the changes in d' for individual stimulus pairs were highly variable, with 5 of 12 (6 pairs × 2 adaptation stimuli) d' values actually increasing after adaptation. Second, those decreases in d' that did occur were in regions of the stimulus continuum relatively far from the location of the adapting stimulus. After [ba] adaptation, a d' decrement was observed only for stimulus pairs near the [b]-[w] boundary or in the [wa] region of the stimulus series, and similarly, after [wa] adaptation, the primary reduction in d' occurred in the [ba] region of the series. It is not clear how any reasonable version of the sensory fatigue model could account for these rather paradoxical findings. In any case, the reliability of the observed changes in d' appears somewhat doubtful, given the high variability of the data.
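The four assumptions of the sensory fatigue account attributed above to Eimas and Corbit can be made concrete with a small numerical sketch. Everything specific in it is an assumption chosen for illustration and does not come from the studies discussed: the Gaussian tuning curves, the detector centers (0 and 50 msec VOT), the common width (25 msec), and the modeling of fatigue as a gain of 0.7 applied across the detector's whole range.

```python
import math

def detector_response(vot_ms, center_ms, gain=1.0, width_ms=25.0):
    """Response of one hypothetical detector: a Gaussian tuning curve scaled by its gain."""
    return gain * math.exp(-((vot_ms - center_ms) ** 2) / (2.0 * width_ms ** 2))

def category_boundary(voiced_gain=1.0, voiceless_gain=1.0,
                      voiced_center=0.0, voiceless_center=50.0, width_ms=25.0):
    """VOT (msec) at which the two detectors respond equally (assumption b).

    For two equal-width Gaussians the crossing point has a closed form:
    midpoint + width^2 * ln(voiced_gain / voiceless_gain) / (distance between centers).
    """
    midpoint = (voiced_center + voiceless_center) / 2.0
    distance = voiceless_center - voiced_center
    return midpoint + width_ms ** 2 * math.log(voiced_gain / voiceless_gain) / distance

# Unadapted state: both detectors at full gain.
baseline = category_boundary()
# Fatigue (assumption c) is modeled as a uniform gain reduction of the adapted detector.
after_voiced_adaptor = category_boundary(voiced_gain=0.7)
after_voiceless_adaptor = category_boundary(voiceless_gain=0.7)

print(f"baseline boundary:       {baseline:.1f} msec")
print(f"after voiced adaptor:    {after_voiced_adaptor:.1f} msec")
print(f"after voiceless adaptor: {after_voiceless_adaptor:.1f} msec")
```

Under these assumptions the point of equal sensitivity moves toward the adapted category (assumption d): lower VOT after a voiced adaptor, higher VOT after a voiceless one. A criterion shift of the kind proposed by the contrast hypothesis would produce a boundary movement in the same direction, which is why the direction of the shift alone cannot separate the two accounts.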


The occurrence of "cross-series" adaptation effects is another frequently cited reason for dismissing the contrast hypothesis. A cross-series effect refers to a boundary shift produced by an adaptor that shares a phonetic feature with some of the test stimuli but is phonemically distinct from them. For example, Eimas and Corbit (1973) found that a [da] adaptor reduced, and a [tha] adaptor increased, the number of voiced responses to a [b]-[ph] test series. Similar cross-series effects have been observed for the place-of-articulation dimension (Cooper, 1974a; Cooper & Blumstein, 1974; Diehl, 1975) and the stop-glide dimension (Cooper et al., 1976; Diehl, 1976). It has been argued (Cooper, 1974a; Cooper & Blumstein, 1974; Eimas, 1975) that a simple contrast interpretation would describe the effect of adaptation as a response tendency to assign test stimuli to a different phoneme category from that of the adaptor. This version of the contrast hypothesis fails to predict cross-series effects and therefore appears incorrect. It is not evident, however, that contrast effects must be phoneme based in the above sense. If there are feature-specific contrast effects (e.g., a greater tendency to label stimuli as voiced in the context of clear voiceless exemplars), then cross-series adaptation effects may well be due to contrast rather than sensory fatigue. Later we shall present experimental evidence for the occurrence of such feature-specific contrast effects.

Finally, results of several experiments by Sawusch and Pisoni (Note 1) and Sawusch, Pisoni, and Cutting (Note 2) have often been cited as evidence against a contrast account of adaptation effects. In the first of these studies, subjects identified two types of stimulus series, one consisting of seven synthetic speech sounds ranging from [ba] to [pha] in equal VOT increments and the other consisting of seven tones (to be labeled "loud" or "soft") varying in equal decibel increments of intensity. Three taped versions of each of these series were presented. In the control tape, each stimulus occurred equally often, whereas in the two anchor tapes, a series end-point stimulus occurred twice as often as any of the other stimuli. A formalization of the contrast hypothesis known as adaptation-level theory (Appley, 1971; Helson, 1964) predicts that the effect of the more frequently occurring anchor stimulus will be to shift the subject's category boundary toward that stimulus. If this were indeed the outcome for the speech stimuli, then one might reasonably argue that adaptation effects are simply contrast phenomena, with the adaptor serving as an anchor. However, no such boundary shift occurred for the speech stimuli, even though large shifts were observed for the tone series. Sawusch and his colleagues therefore concluded that adaptation effects in speech perception are probably not the result of contrast.

Several types of empirical results lead us to question this dismissal of the contrast hypothesis. First, in a recent study by Simon and Studdert-Kennedy (Note 3), which was similar to those of Sawusch and Pisoni (Note 1) and of Sawusch et al. (Note 2), significant anchor effects were obtained along a stop consonant dimension (although not when the conditions used by Sawusch et al. were precisely replicated). Second, Brady and Darwin (cited in Darwin, 1976) demonstrated that the location of the phoneme boundary along the VOT continuum depends on the range values of the stimuli presented. As the entire range is displaced to higher values of VOT, the boundary also shifts upward in VOT value. Such a result is obviously compatible with an adaptation-level or contrast account. Finally, experiments by Eimas (1963) showed that stop consonant identification is clearly influenced by surrounding stimulus items in the presentation schedule and that the nature of this influence is contrastive rather than assimilative.

The present series of experiments was designed to explore further the contrastive role of adjacent stimuli in stop consonant identification. We wished to determine whether the contrast hypothesis may conceivably account for a significant portion of speech adaptation effects.
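The adaptation-level prediction tested in the anchor studies described above can be illustrated with a short calculation. This is only a sketch under stated assumptions: the seven VOT values (0 to 60 msec in 10-msec steps) are hypothetical, since the actual values are not given here, and the adaptation level is taken as a plain frequency-weighted arithmetic mean of the stimuli presented, a simplification rather than Helson's own formulation.

```python
def adaptation_level(stimuli_ms, presentations):
    """Frequency-weighted mean of the stimulus values presented (simplified adaptation level)."""
    total = sum(presentations)
    return sum(s * n for s, n in zip(stimuli_ms, presentations)) / total

# Hypothetical seven-step VOT series in equal increments (values assumed for illustration).
series_ms = [0, 10, 20, 30, 40, 50, 60]

control_tape = [1, 1, 1, 1, 1, 1, 1]           # every stimulus equally often
voiced_anchor_tape = [2, 1, 1, 1, 1, 1, 1]     # voiced end point twice as often
voiceless_anchor_tape = [1, 1, 1, 1, 1, 1, 2]  # voiceless end point twice as often

for name, tape in [("control", control_tape),
                   ("voiced anchor", voiced_anchor_tape),
                   ("voiceless anchor", voiceless_anchor_tape)]:
    print(f"{name:17s} adaptation level = {adaptation_level(series_ms, tape):.2f} msec")
```

If the category boundary tracks the adaptation level, it moves toward the over-represented anchor, so more of the ambiguous items fall on the far side of the boundary and are labeled as the opposite category. That is the contrastive pattern the anchor studies were designed to detect.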

Experiment 1

In the study by Eimas (1963), context effects were measured for two series of stop consonants, one varying in place-of-articulation ([b]-[d]-[g]) and the other varying in voicing characteristic ([d]-[t]). Each stimulus was paired with another stimulus of either higher or lower value on the relevant acoustic dimension. As noted, a significant contrast effect was observed. For example, an item generally identified as [t] in the context of [d] might instead be labeled [d] when in [t] context. For the [d]-[t] stimuli in particular, the magnitude of this contrast effect was positively related to the degree of separation between the two items along the stimulus dimension. This finding is predicted by adaptation-level theory (Helson, 1964) and is similar to contrast results obtained with a variety of other stimulus sets (e.g., Helson, 1947). Experiment 1 replicated some of the general findings of Eimas (1963), using stimuli that varied in voicing characteristic.

Method

Subjects. Seventeen undergraduate introductory psychology students at the University of Texas at Austin received course credit for participating in this experiment. All were native English speakers and reported having normal hearing.

Stimuli. Two test tapes were prepared with the parallel-resonance speech synthesizer at Haskins Laboratories. The first, referred to as the strong-context tape, included five three-formant consonant-vowel (CV) syllables that varied in VOT value and were perceived as either the voiced stop [b] or the voiceless stop [ph], followed by [æ]. The five values of VOT (in milliseconds) were -100, 10, 26, 40, 100. (A negative VOT indicates that voicing precedes the release, whereas a positive VOT indicates just the reverse.) The central stimulus was chosen to coincide approximately with the English voiced-voiceless boundary (Lisker & Abramson, 1970) and was the only true test item. The two extreme stimuli, which were selected as clear exemplars of the voiced and voiceless categories, respectively, served as context items. The two remaining stimuli, with 10 and 40 msec VOT, were perceived as somewhat weaker versions of [bæ] and [phæ], respectively, and were used as filler items. The second, or weak-context, tape was identical to the first except that the context stimuli had VOT values (in milliseconds) of 10 and 40 (rather than -100 and 100). Thus the context items on this tape had the same VOT values as the filler items and were less distinctive than their counterparts on the first tape.

On both tapes the order of stimulus presentation was as follows: A context stimulus occurred first, followed by the test stimulus, which was in turn followed by two filler items. This four-item sequence was repeated 40 times per tape with the two context stimuli appearing equally often and otherwise randomly. (The two fillers also occurred equally often and randomly.) The interval between successive items on both tapes was always 1.5 sec (offset to onset). All stimuli were 460 msec in duration except for the one context item, which included an additional 100 msec of prevoicing. The formant transitions were approximately linear and lasted 40 msec. There was an amplitude rise during the first 20 msec of the transitions and a decay during the last 40 msec of the steady-state (vowel) portion of the stimulus, but fundamental frequency was held constant throughout. The starting frequencies of the second- and third-formant transitions were 770 Hz and 2020 Hz, respectively, and the steady-state formant frequencies were 740 Hz (first formant), 1620 Hz (second formant), and 2860 Hz (third formant). The starting frequency of the first-formant transition was 230 Hz for the prevoiced context stimulus but varied among the other stimuli as a function of VOT. Figure 1 displays spectrograms of the test stimulus and the two context items that appeared on the strong-context tape.

Procedure. During an experimental session subjects listened to both tapes and identified each stimulus (test, contexts, and fillers) as either b or p on an answer sheet provided. Nine of the subjects heard the weak-context tape first; the remaining eight heard the strong-context tape first. The tapes were played on a Teac A2300S tape recorder; the signal was amplified and presented binaurally over Koss K6 earphones at a comfortable level.
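The presentation schedule described under Stimuli can be sketched as a short script. This is a hypothetical reconstruction for illustration only, not the procedure actually used to assemble the tapes: the function name `build_tape`, the use of a seeded pseudorandom shuffle, and the decision to balance the fillers across the whole tape are all assumptions.

```python
import random

def build_tape(context_vots, test_vot=26, filler_vots=(10, 40),
               n_sequences=40, seed=0):
    """Return a list of (role, VOT in msec) items: context, test, filler, filler, repeated."""
    rng = random.Random(seed)

    # The two context stimuli appear equally often, in random order.
    contexts = [context_vots[0]] * (n_sequences // 2) + [context_vots[1]] * (n_sequences // 2)
    rng.shuffle(contexts)

    # Two filler slots per sequence; the two fillers appear equally often, in random order.
    n_filler_slots = 2 * n_sequences
    fillers = ([filler_vots[0]] * (n_filler_slots // 2)
               + [filler_vots[1]] * (n_filler_slots // 2))
    rng.shuffle(fillers)

    tape = []
    for i, context in enumerate(contexts):
        tape.append(("context", context))
        tape.append(("test", test_vot))
        tape.append(("filler", fillers[2 * i]))
        tape.append(("filler", fillers[2 * i + 1]))
    return tape

strong_tape = build_tape(context_vots=(-100, 100))
weak_tape = build_tape(context_vots=(10, 40))
print(len(strong_tape), strong_tape[:4])
```

Each tape built this way has 160 items, and every test item is immediately preceded by one of the two context stimuli, which is the property the context-effect analysis depends on.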

Figure 1. Spectrograms of the test and strong-context stimuli used in Experiment 1. (Panels: strong [bæ] context, test stimulus, strong [phæ] context; time scale bar, 200 msec.)

Results and Discussion

Table 1 shows the number of test items identified as [b] in each of the four context conditions: strong [ph] (100 msec VOT), strong [b] (-100 msec VOT), weak [ph] (40 msec VOT), and weak [b] (10 msec VOT). The effect of context for each experimental tape was obtained by subtracting the number of [b] responses that followed [b] context from the number that followed [ph] context. Since the maximum number of [b] responses in any of the four context conditions was 20, the measured context effect, D, was obviously restricted to the range of -20 to 20 for each subject. A zero number indicates no context effect, a positive number indicates a context effect in the form of contrast (a tendency to label the test item as different from the context stimulus), and a negative number indicates a context effect in the form of assimilation (a tendency to identify the test item the same as the context stimulus). For the strong-context tape, subjects yielded a large and highly significant (p < .001) contrast effect, with a mean D score of 8. However, the mean D score for the weak-context tape was a negligible .06, clearly nonsignificant. As might therefore be expected, there was a highly significant difference (p < .001) between the measured D scores for the strong- and weak-context tapes. Order of tape presentation had no reliable effect on the D scores in either the strong- or weak-context conditions.

Table 1
Number of Test Stimuli Identified as [b] in the Four Context Conditions of Experiment 1

                       Context stimuli
                 Strong                  Weak
Subject   [phæ]+100   [bæ]-100    [phæ]+40   [bæ]+10
TS           20          19          19         16
TB           20          12          18         17
MB           18           1          13         13
BS           11           6           4          3
JN           12           3           4          1
CW           13           1          13          8
RW           10           2           8          5
WC           20          10          14          9
BB           16          11           9         15
MR           14           9          10         13
KB           20          15          20         19
AR           20          12           8         11
KS           20          12          11         13
CB           17          10          17         11
CC           12          10           2         15
CJ           17           6          11         14
BL           20           5          11          8
M          16.47        8.47       11.29      11.23

Note. Subscripts refer to voice onset time in milliseconds.

The relative perceptual distinctiveness of the strong- and weak-context items may be assessed by comparing subjects' labeling consistency for the two types of stimuli. Whereas 98.7% of the occurrences of the strong-context stimuli were identified as expected, only 89.4% of the weak-context items were so identified. (Instances of "incorrect" labeling occurred about equally often for voiced and voiceless items.) That labeling performance for the weak-context stimuli was still quite accurate is important. It means that the absence of a contrast effect was not due merely to subjects' inability to distinguish between the two context items.

One further result should be noted. For most subjects the test stimulus was located slightly on the [b] side of the phoneme boundary when the effect of context was negligible. After both the weak [ph] and weak [b] context items, the test stimulus was identified as [b] 56% of the time. This imbalance was also reflected in the strong-context condition, in which the test


stimulus was identified as [b] 82% of the time in [ph] context and 42% of the time in [b] context. Had the test stimulus been located precisely at the phoneme boundary (disregarding context), we would expect that even larger contrast effects would have been observed in at least the strong-context condition.

Despite differences in stimuli and method, the overall results of the present experiment closely parallel those of Eimas (1963). The nature of the context effect observed is contrastive rather than assimilative, and, furthermore, the effect diminishes or disappears as the acoustic differences between context and test stimuli are reduced. These results also resemble those found in typical selective adaptation experiments. As noted earlier, the response shift obtained following adaptation is in the same direction as a contrast effect. Moreover, the magnitude of the adaptation response shift is smaller when the adapting stimulus is located near the phonetic boundary than when it is drawn from the extremes of the test stimulus dimension (Anderson, 1975; Miller, 1977; McNabb, Note 4). Thus, in both contrast and adaptation experiments, stimuli that are clear exemplars of their respective categories are most effective in inducing a change in labeling performance. Of course, the similar pattern of results found with the contrast and adaptation paradigms does not imply that the latter is in fact an instance of the former, but such is a possibility and is certainly worth exploring further.

Experiment 2

As we have seen, one of the principal arguments against viewing adaptation effects as a form of contrast is the occurrence of cross-series adaptation effects (Cooper, 1974a; Cooper & Blumstein, 1974; Eimas, 1975; Eimas & Miller, in press). Recall that, in a cross-series experiment, the adapting stimulus is phonemically different from any of the test items but shares a phonetic feature (e.g., voiceless) with a portion of those items. It is reasoned that contrast, if it has any role at all, would most likely operate at the level of the phoneme decision and that cross-series effects, which depend on quite subtle subphonemic feature relations, must therefore be due to sensory changes. The present experiment tested the validity of the main assumption on which the argument rests, namely, that there are no feature-specific contrast effects.

Method

Subjects. Fifteen subjects participated in the experiment. They were selected from the same population that was sampled in Experiment 1.

Stimuli. The experimental tape contained three-formant CV stimuli originally produced on the Haskins parallel-resonance speech synthesizer according to parameter values used by Lisker and Abramson (1970). Excluding periods of prevoicing, all stimuli had a duration of 460 msec. The actual test stimuli consisted of two bilabial stop-[a] syllables with VOT values of 20 msec and 30 msec, respectively. Both of these stimuli were sufficiently close to the voicing boundary (Lisker & Abramson, 1970) as to be somewhat ambiguous between [ba] and [pha]. Each test syllable occurred in four different contexts: after a strong [ba] (VOT = -100 msec), a strong [pha] (VOT = 100 msec), a strong [ga] (VOT = -100 msec), and a strong [kha] (VOT = 100 msec). (Recall that [b] and [g] are both voiced, whereas [ph] and [kh] are both voiceless.) After each test syllable, two filler items were presented (at random and with repetition allowed) from the following set of four: [ba] (VOT = 10 msec), [pha] (VOT = 40 msec), [ga] (VOT = 20 msec), and [kha] (VOT = 50 msec). The higher VOT values for the velar filler items reflect the higher value of the voicing boundary on the velar dimension (Lisker & Abramson, 1970). Each combination of context stimulus and test stimulus occurred 10 times, which, together with the filler items, amounted to a total of 320 items on the test tape. Apart from the constraint of equal frequency, the presentation of the eight combinations of context and test stimuli was random. The interval between each item was 2.2 sec (offset to onset). Figure 2 shows spectrograms of the four context stimuli used in the experiment.

Procedure. The same equipment described in Experiment 1 was used to present the experimental tape. Subjects were instructed to label each item as b, p, g, or k on an answer sheet provided.
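The results of this experiment are again summarized with the context-effect score D introduced in Experiment 1, here taken as the number of voiced responses in voiceless context minus the number in voiced context, pooled over the two test stimuli (maximum 20 per context). As a check on the arithmetic, the following sketch recomputes the mean D scores from the per-subject counts printed in Table 2. The function names are ours, and the lists simply transcribe the table columns in subject order.

```python
def d_score(voiced_after_voiceless_context, voiced_after_voiced_context):
    """Context effect for one subject: positive = contrast, negative = assimilation."""
    return voiced_after_voiceless_context - voiced_after_voiced_context

def mean_d(after_voiceless_context, after_voiced_context):
    """Mean D over subjects, given two per-subject count lists in the same subject order."""
    scores = [d_score(a, b) for a, b in zip(after_voiceless_context, after_voiced_context)]
    return sum(scores) / len(scores)

# Per-subject counts of voiced responses, transcribed from Table 2 (15 subjects).
bilabial_ph_context = [11, 10, 10, 9, 14, 20, 13, 10, 14, 14, 11, 10, 10, 15, 10]
bilabial_b_context = [10, 10, 11, 11, 13, 13, 11, 11, 9, 10, 10, 11, 6, 8, 11]
velar_kh_context = [15, 14, 11, 10, 14, 19, 15, 11, 14, 10, 10, 13, 11, 10, 12]
velar_g_context = [10, 9, 9, 9, 12, 14, 15, 9, 8, 9, 6, 8, 5, 8, 9]

bilabial_mean_d = mean_d(bilabial_ph_context, bilabial_b_context)
velar_mean_d = mean_d(velar_kh_context, velar_g_context)

print(f"bilabial contexts: mean D = {bilabial_mean_d:.2f}")
print(f"velar contexts:    mean D = {velar_mean_d:.2f}")
```

The bilabial value agrees with the reported mean of 1.73; the velar value comes to 49/15, which rounds to 3.27 and is reported as 3.26.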

Figure 2. Spectrograms of the four context stimuli used in Experiment 2. (Time scale bar, 200 msec.)

Results and Discussion

The number of test items identified as voiced ([b] or [g]) in each of the four contexts is displayed in Table 2. Consider first the bilabial contexts. When we pool the results for both test stimuli, the number of voiced responses is significantly greater in [ph] context than in [b] context (p < .05), although the contrast effect is quite small (the mean D score is 1.73 of a maximum of 20). It is somewhat surprising that even though the test stimuli were identified as bilabial over 92% of the time, the velar contexts produced a larger contrast effect (D = 3.26) than did the bilabial contexts. This effect was highly reliable (p < .001), but it was just short of being significantly greater than the bilabial context effect (.05 < p < .10).

Table 2
Pooled Number of Test Stimuli Identified as Voiced in the Four Context Conditions of Experiment 2

                       Context stimuli
                Bilabial                 Velar
Subject   [pha]+100   [ba]-100    [kha]+100   [ga]-100
IS           11          10          15          10
RG           10          10          14           9
JJ           10          11          11           9
KD            9          11          10           9
RE           14          13          14          12
KR           20          13          19          14
RZ           13          11          15          15
SF           10          11          11           9
RB           14           9          14           8
MA           14          10          10           9
SP           11          10          10           6
NS           10          11          13           8
DT           10           6          11           5
JA           15           8          10           8
LM           10          11          12           9
M          12.07       10.33       12.60        9.33

Note. Subscripts refer to voice onset time in milliseconds.

As noted, subjects failed to identify the test items as bilabial less than 8% of the time. In each such instance the sound substituted for the bilabial was [g]. A question arises as to whether this may account for part or all of the contrast effect produced by the velar contexts. If so, the effect would not be "cross-series" as assumed but rather a case of velar contexts affecting the perception of velar test stimuli. Fortunately, we can rule out this possibility because the number of [g] responses to test items was virtually the same in both the [g] and the [kh] contexts, 23 and 22, respectively. If the contrast effect produced by the velar contexts were dependent on the occurrence of [g] responses to the test stimuli, then there should have been a greater number of such responses in [kh] context than in [g] context. This demonstration of a cross-series (or feature-specific) contrast effect is important because it removes one of the principal objections to a contrast account of adaptation effects, namely, that cross-series effects must derive from sensory changes rather than changes in decision criteria.

In the case of the bilabial contexts, the small contrast effect that occurred is more difficult to interpret. The number of [g] responses in [ph] context was substantially larger than in [b] context (35 vs. 13). It is conceivable that the observed difference in voiced responses in these two contexts was not a contrast effect at all but rather a natural outcome of the greater number of velar responses in the voiceless ([ph]) context. Given that the VOT value of the voiced-voiceless boundary has a higher positive value for velar than for bilabial stimuli (Lisker & Abramson, 1970; Miller, in press), a sound that would be labeled voiceless if judged to be bilabial might well be identified as voiced if judged to be velar. Thus the greater tendency to identify the test items as velar in [ph] context may itself have been responsible for the greater number of voiced responses in that context.

The results of a separate experiment suggest that the above account is probably incorrect. Twenty-five new subjects listened to a tape that included only the bilabial stimuli used in Experiment 2 (contexts, test items, and fillers) separated by intervals of 2.4 sec. With labeling responses restricted to b and p, the subjects yielded

an average D score of 1.8 (of a maximum of 20), which is nearly identical to that obtained for the bilabial contexts in Experiment 2. The contrast effect observed in this subsidiary experiment, though small, was quite reliable (p < .001). It is likely, therefore, that the effect produced by the bilabial contexts in Experiment 2 was a genuine contrast effect.

A comparison of the results of Experiments 1 and 2 suggests that the magnitude of the contrast effect diminishes as the test item is selected from points on the VOT dimension progressively farther removed from the phonetic boundary, that is, as the test item becomes a clearer exemplar of its phonetic category. A similar trend was noted by Eimas (1963). This pattern of results closely parallels that found in typical selective adaptation experiments in which the change in labeling performance is greatest at or near the phonetic boundary.

Experiment 3

The results of Experiments 1 and 2 suggest to us the strong likelihood that many (if not all) adaptation effects are actually contrast effects. A potential objection to this interpretation is that we have not ruled out the converse possibility, namely, that contrast effects are instances of single-trial adaptation. It seems to us quite implausible to assume that feature detectors in the human speech processing system can be measurably fatigued after a single presentation of an adequate stimulus. But because there is some possibility of this, however small, it is important that we demonstrate a contrast effect that is not in principle attributable to any such fatigue process. This demonstration is provided in Experiment 3.

Method

Subjects. Twenty-one subjects who had not served in either of the previous experiments participated in Experiment 3. They were drawn from the same population sampled in Experiments 1 and 2.

Stimuli. Of the stimuli described in Experiment 2, only the bilabial items were used in the present experiment. Thus there were two test stimuli (VOT values = 20 msec and 30 msec), two context stimuli (VOT values = -100 msec and 100 msec), and two fillers (VOT values = 10 msec and 40 msec). Whereas in the previous experiments the test stimuli followed the context items, in this experiment they preceded them. A given test stimulus/context stimulus pair was separated by a 600-msec interval (offset to onset). These were followed by a pair of filler items (each selected at random and with repetition allowed) also having an interstimulus interval of 600 msec. The interval between successive pairs was 4 sec, and the test/context pairs always alternated with the filler pairs. Each of the two test stimuli occurred 10 times with each context stimulus, so, with fillers included, there was a total of 160 items on the experimental tape.

Procedure. The tape was presented with the same equipment described in Experiment 1. Subjects were instructed to listen to each pair of stimuli and to identify the items as either b or p. They were further told not to write their answers until they had heard both members of a given pair.

Results and Discussion Table 3 shows the number of test items identified as [b~\ in each of the two contexts. When the results for both test items are pooled, the average D score is 1.52 (of 20 maximum). This contrast effect is significant (p < .01) and only slightly smaller than that produced by the bilabial contexts in Experiment 2 in which the contexts preceded the test stimuli, (A direct comparison between the effects obtained in Experiments 2 and 3 is not very informative because the interstimulus interval, which may be an important variable, differed considerably in the two cases.) Given that the context stimuli followed the test items in the presentation schedule, their effect on labeling performance obviously cannot be due to any process of sensory fatigue. It is reasonable to assume, therefore, that the effects observed in Experiments 1 and 2 are genuine contrast effects and not instances of single-trial adaptation. General Discussion On the basis of these experimental results, we think it likely that at least a component of the effect observed in selective adaptation experiments is due not to the fatigue of feature detectors but rather to a type of response contrast. It is natural


Table 3
Pooled Number of Test Stimuli Identified as [b] in the Two Context Conditions of Experiment 3

                Context stimuli
Subject     [pha]+100     [ba]-100
MC             11            11
RM             13             9
RT             10            10
LW             12            13
TR             14            10
JS             10             9
TW             10             8
RO             11             9
GP             17            15
GL             10            10
VB             10            10
KH             10             4
DS             14            13
MR             10             8
MS             19            14
CB             11            10
CR             12            11
LC             11             8
KB              6             7
BB             10            10
SS              8             8
M              11.38          9.86

Note. Subscripts refer to voice onset time in milliseconds.
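The means in Table 3 and the average D score of 1.52 can be rechecked from the per-subject counts. The Python sketch below does so under two assumptions that this section does not state explicitly: the D score is taken to be each subject's number of [b] responses in the [pha] context minus the number in the [ba] context, and the counts are assigned to subjects in the order in which they appear in the table. Variable names are illustrative.

```python
# Recomputing the Table 3 summary values (hypothetical variable names).
# Counts are transcribed from Table 3 in subject order, MC through SS.

pha_context = [11, 13, 10, 12, 14, 10, 10, 11, 17, 10, 10,
               10, 14, 10, 19, 11, 12, 11, 6, 10, 8]
ba_context = [11, 9, 10, 13, 10, 9, 8, 9, 15, 10, 10,
              4, 13, 8, 14, 10, 11, 8, 7, 10, 8]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Assumed definition: D = [b] responses after [pha] minus [b] responses after [ba].
d_scores = [p - b for p, b in zip(pha_context, ba_context)]
mean_d = mean(d_scores)

if __name__ == "__main__":
    print(f"{mean(pha_context):.2f}")  # 11.38
    print(f"{mean(ba_context):.2f}")   # 9.86
    print(f"{mean_d:.2f}")             # 1.52
```

Under these assumptions the recomputed values agree with the column means and with the D score given in the Results.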

to consider whether this conclusion is unnecessarily weak. Are we perhaps justified in claiming that adaptation effects are entirely a matter of contrast? One fact that seems to argue against this strong claim is the rather small magnitude of the contrast effects observed (especially in Experiments 2 and 3). In defense of the strong claim, however, one may point out that adaptation effects on the VOT dimension are themselves typically quite small (Eimas & Corbit, 1973) and, furthermore, that a stimulus repeated many times over a period of a minute or longer might understandably yield a greater contrast effect (on following test items) than if it is presented only once.

There are a variety of findings in the adaptation literature that, at first glance, appear to pose difficulties for the contrast hypothesis. For example, it is possible to produce adaptation effects on phonetic stimulus dimensions with nonspeech adaptors (e.g., single formants, isolated formant transitions) when such adaptors have certain spectral similarities to the test items (Tartter & Eimas, 1975; Ades, Note 5). It is difficult to understand how adaptors that are perceptually so unlike the test stimuli might induce contrast effects. Nevertheless, if contrast can be based on subphonemic feature relations (see Experiment 2), conceivably it can also be based on even lower level acoustic similarities. We are now preparing to conduct a series of experiments, using the contrast paradigm, to test this possibility.

In general, for any given adaptation experiment, there corresponds an analogous contrast experiment in which the adaptor is presented singly as a context stimulus and the test series is reduced to one or two items selected from near the phonetic boundary. This is very convenient because it allows us to test the contrast hypothesis directly for those cases that seem to be most naturally explained in terms of sensory fatigue. If, in each instance, the pattern of contrast results is similar to that of adaptation results, there would clearly be little need to invoke sensory fatigue explanations.

In addition to the effects produced by nonspeech adaptors, the adaptation literature includes perceptuomotor effects (Cooper, 1974c; Cooper & Lauritsen, 1974), contingent effects (Cooper, 1974b; Ganong, Note 6), and source effects (Ades, 1976), all of which we intend to try to duplicate with a contrast paradigm.

In an interesting variation of the adaptation procedure, Miller (1977) combined a dichotic ear-monitoring task with binaural adaptation. On a given test trial, subjects were presented a voiced stop in one ear and one of a number of different voiceless stops in the other ear. They were told to report only the sound heard over one preselected ear.
After adaptation with a voiceless stop, subjects yielded a higher percentage of voiced responses per pair, even when the voiceless member of the pair had a large VOT value that placed it well within the voiceless category. Miller concluded that adaptation depressed the output function of the voiceless detector over its entire range and not simply near the voicing boundary.

It is possible, however, to offer an alternative contrast explanation of Miller's results. Stimuli that are unambiguous in isolation often become very difficult to identify when presented dichotically. This condition of uncertainty is similar to that which obtains when boundary-area stimuli are presented binaurally, and as we have seen, such stimuli are quite amenable to contrast effects. Therefore, one might reasonably suppose that the labeling shift observed by Miller resulted from response contrast, not sensory fatigue.

A recent study by Elman (1977) offers the best available evidence that adaptation effects may be entirely due to response contrast. He performed an analysis of adaptation data, using signal detection theory, and found that virtually all of the change in subjects' identification performance could be accounted for by a change in decision criterion (as against a change in d'). This was true for all three phonetic dimensions investigated. What makes Elman's findings perhaps more reliable than those of previous studies (e.g., Cooper et al., 1976) is the large number of observations per subject that he obtained (an average of 380 and 240 responses to each test item in the preadaptation and postadaptation conditions, respectively).

Even if it is conceded that there may be a purely sensory component in speech adaptation effects, the interpretation of any given adaptation effect remains problematic if one is unable to factor out that component which is due to response contrast. In short, adaptation results may no longer constitute unambiguous support for particular hypotheses concerning feature detectors. We believe that future work in the area might profitably concern itself directly with nonsensory factors influencing the phonemic decision rather than treating such factors as mere "nuisance" variables.

Reference Notes

1. Sawusch, J. R., & Pisoni, D. B. Category boundaries for speech and nonspeech sounds. Paper presented at the 86th meeting of the Acoustical Society of America, Los Angeles, November 1973.
2. Sawusch, J. R., Pisoni, D. B., & Cutting, J. E. Category boundaries for linguistic and nonlinguistic dimensions of the same stimuli. Paper presented at the 87th meeting of the Acoustical Society of America, New York, April 1974.
3. Simon, H. J., & Studdert-Kennedy, M. Anchoring effects on a synthetic consonant continuum. Paper presented at the 93rd meeting of the Acoustical Society of America, University Park, Pennsylvania, June 1977.
4. McNabb, S. D. Must the output of the phonetic detector be binary? (Research on Speech Perception, Progress Report No. 2) Bloomington: Indiana University, Department of Psychology, 1975.
5. Ades, A. E. Some effects of adaptation on speech perception. Cambridge, Mass.: Quarterly Progress Report of the Research Laboratory of Electronics, M.I.T., 1973, 211, 121-129.
6. Ganong, W. F. Amplitude contingent selective adaptation to speech. Paper presented at the 91st meeting of the Acoustical Society of America, Washington, D.C., April 1976.

References

Abbs, J. H., & Sussman, H. M. Neurophysiological feature detectors and speech perception: A discussion of theoretical implications. Journal of Speech and Hearing Research, 1971, 14, 23-36.
Ades, A. E. Adapting the property detectors for speech perception. In R. J. Wales & E. Walker (Eds.), New approaches to language mechanisms. Amsterdam: North-Holland, 1976.
Anderson, F. Some implications for the operation of feature detectors in speech perception: Use of identification response time as a converging operation. Unpublished doctoral dissertation, Brown University, 1975.
Appley, M. H. (Ed.). Adaptation-level theory. New York: Academic Press, 1971.
Cole, R. A., & Cooper, W. E. Perception of voicing in English affricates and fricatives. Journal of the Acoustical Society of America, 1975, 58, 1280-1287.
Cooper, W. E. Adaptation of phonetic feature analyzers for place of articulation. Journal of the Acoustical Society of America, 1974, 56, 617-627. (a)
Cooper, W. E. Contingent feature analysis in speech perception. Perception & Psychophysics, 1974, 16, 201-204. (b)
Cooper, W. E. Perceptuo-motor adaptation to a speech feature. Perception & Psychophysics, 1974, 16, 229-234. (c)
Cooper, W. E. Selective adaptation to speech. In F. Restle, R. M. Shiffrin, N. J. Castellan, H. R. Lindman, & D. B. Pisoni (Eds.), Cognitive theory (Vol. 1). Hillsdale, N.J.: Erlbaum, 1975.
Cooper, W. E., & Blumstein, S. A "labial" feature analyzer in speech perception. Perception & Psychophysics, 1974, 15, 591-600.
Cooper, W. E., Ebert, R. R., & Cole, R. A. Perceptual analysis of stop consonants and glides. Journal of Experimental Psychology: Human Perception and Performance, 1976, 2, 92-104.
Cooper, W. E., & Lauritsen, M. Feature processing in perception and production of speech. Nature, 1974, 252, 121-123.
Darwin, C. J. The perception of speech. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of perception (Vol. 7). New York: Academic Press, 1976.
Diehl, R. L. The effect of selective adaptation on the identification of speech sounds. Perception & Psychophysics, 1975, 17, 48-52.
Diehl, R. L. Feature analyzers for the phonetic dimension stop vs. continuant. Perception & Psychophysics, 1976, 19, 267-272.
Eimas, P. D. The relation between identification and discrimination along speech and nonspeech continua. Language and Speech, 1963, 6, 206-217.
Eimas, P. D. Speech perception in early infancy. In L. B. Cohen & P. Salapatek (Eds.), Infant perception: From sensation to cognition (Vol. 2). New York: Academic Press, 1975.
Eimas, P. D., & Corbit, J. D. Selective adaptation of linguistic feature detectors. Cognitive Psychology, 1973, 4, 99-109.
Eimas, P. D., & Miller, J. L. Effects of selective adaptation on the perception of speech and visual patterns: Evidence for feature detectors. In H. Pick & R. Walk (Eds.), Perception & Experience, in press.
Elman, J. L. Sensory and cognitive components of speech perception. Unpublished doctoral dissertation, University of Texas at Austin, 1977.
Helson, H. Adaptation-level as a frame of reference for prediction of psychophysical data. American Journal of Psychology, 1947, 60, 1-29.
Helson, H. Adaptation-level theory. New York: Harper & Row, 1964.
Lisker, L., & Abramson, A. S. The voicing dimension: Some experiments in comparative phonetics. In Proceedings of the Sixth International Congress of Phonetic Sciences, Prague 1967. Prague: Academia, 1970.
Miller, J. L. Properties of feature detectors for VOT: The voiceless channel of analysis. Journal of the Acoustical Society of America, 1977, 62, 641-648.
Miller, J. L. The perception of voicing and place of articulation in initial consonants: Evidence for the nonindependence of feature processing. Journal of Speech and Hearing Research, in press.
Pisoni, D. B. Dichotic listening and processing phonetic features. In F. Restle, R. M. Shiffrin, N. J. Castellan, H. R. Lindman, & D. B. Pisoni (Eds.), Cognitive theory (Vol. 1). Hillsdale, N.J.: Erlbaum, 1975.
Stevens, K. N. The potential role of property detectors in the perception of stop consonants. In G. Fant & M. Tatham (Eds.), Auditory analysis and the perception of speech. New York: Academic Press, 1975.
Tartter, V. C., & Eimas, P. D. The role of auditory and phonetic feature detectors in the perception of speech. Perception & Psychophysics, 1975, 18, 293-298.

Received October 31, 1977
