Original Paper Phonetica 2013;70:207–239 DOI: 10.1159/000356194

Received: December 21, 2012 Accepted after revision: October 6, 2013

Sonorant Onset Pitch as a Perceptual Cue of Lexical Tones in Mandarin

Tsung-Ying Chen, Benjamin V. Tucker

Department of Linguistics, University of Alberta, Edmonton, Alta., Canada

Abstract

Lexical tone identification requires a number of secondary cues when main tonal contours are unavailable. In this article, we examine Mandarin native speakers’ ability to identify lexical tones by extracting tonal information from sonorant onset pitch (onset contours) on syllable-initial nasals ranging from 50 to 70 ms in duration. In experiments I and II we test speakers’ ability to identify lexical tones in a second syllable with and without onset contours in isolation (experiment I) and in a sentential context (experiment II). The results indicate that speakers can identify lexical tones from short, distinctive onset contour patterns; they also indicate that misperception of tones 213 and 24 is common. Furthermore, in experiment III, we test whether onset contours in a following syllable can be utilized by listeners in tone identification. We find that onset contours in the following syllable also contribute to the identification of the target lexical tones. The conclusions are twofold: (1) Mandarin lexical tones can be identified with onset contours; (2) the tonal domain must be extended to include not just typical cues of tones but also coarticulated tonal patterns. © 2013 S. Karger AG, Basel

Introduction


Tsung-Ying Chen Assiniboia Hall 2–40 University of Alberta Edmonton, AB T6G 2C5 (Canada) E-Mail [email protected]


Lexical tone is often described in terms of tonal height targets, e.g. high fundamental frequency (F0) versus low F0, and tonal shape targets, e.g. flat F0, rising F0, falling F0, and so on [e.g. Chao, 1948; Howie, 1976; Shih, 1987]. These primary tonal targets have been argued to be the main perceptual cues for identifying lexical tone [e.g. Abramson, 1978; Gandour, 1978, 1981; Lin, 1988]. Much of the previous description of the production and perception of lexical tone has focused on tone in isolated contexts. The actual phonetic realization of lexical tone, however, greatly depends on the context in which it occurs, due to the influence of either adjacent or nonadjacent tonal targets. This variation has been referred to as tonal coarticulation [Shen, 1990; Shih, 1988; Wu, 1984; Xu, 1994], which mirrors segmental coarticulation [e.g. Gay, 1978; House and Stevens, 1963; Lindblom, 1963; Öhman, 1966; Strange et al., 1983]. The present article expands the investigation of the perception of lexical tone and the

[Figure 1: F0 (Hz) over an absolute time scale for tones 55, 24, 213 and 51.]

Fig. 1. The first author’s production of [ɕi55] ‘west’, [ɕi24] ‘attack’, [ɕi213] ‘wash’, and [ɕi51] ‘drama’ in Mandarin.

contribution of tonal coarticulation to tone perception. Specifically, do sonorant onset pitch contours act as perceptual cues for coarticulated lexical tones? There are four lexical tones in Mandarin, which are phonologically transcribed as 55 (high level), 24 (or 35, rising), 213 (or 214, convex), and 51 (falling) in Chao’s [1930] 5-digit system. These transcriptions generally match the F0 contours of isolated productions as found in figure 1, but coarticulation, as shown by Shen [1990] and Shih [1988], among many others, can change the shape of the lexical tone. For example, when a 51 tone is followed by another 51 tone, the first 51 is realized as 52 or 53; that is, the final pitch of the first 51 tone is assimilated to the high tonal onset of the following 51 tone. Shen and Lin [1991] and Shih [1987, 1988] have found that the production of tonal coarticulation in Mandarin is perseverative and assimilatory. For example, the height of a 55 tone is lowered when preceded by a 24 or 51 tone. The concave tone 213 undergoes ‘half-tone sandhi’, becoming a low-dipping 21 tone in nonfinal position [e.g. Chao, 1948, 1968; Chin, 1987; Zhang and Lai, 2010], lowering a following 55 to 44. Xu [1997] discovered an assimilatory carryover effect in disyllabic /mama/ words with different tonal sequences: at least the first 60% of the second tone is higher when preceded by a tone with a high tonal offset (i.e. 55 and 24). However, tonal coarticulation in Mandarin can also be anticipatory and dissimilatory. Xu [1994, 1997] investigated the production of tonal coarticulation in disyllabic and trisyllabic tonal sequences. Xu [1994, 1997] confirmed that when a 55 tone is followed by a 213 tone or its reduced variant, a 21 tone, the overall pitch height of the 55 tone is higher than a 55 tone in other contexts, indicating dissimilation from the incoming low tonal onset. Tonal coarticulation may completely change the contour shape of a lexical tone. 
For example, in the tone sequence 55-24-213, the tonal coarticulation process interpolates F0 between the offset of a 55 tone and the onset of a 213 tone, replacing the original shape of the intermediate 24 tone with a falling F0 contour [Shen, 1990; Xu, 1994]. In terms of perception, the assimilatory tonal coarticulation, reviewed above, has been found to be a perceptual cue for following tones. Peng [1997] found that in


Taiwanese (a Southern Min Chinese dialect), if the overall F0 of the first syllable tone is higher, then listeners predict that the following tone will start with a higher pitch as well. If F0 of the first syllable is lower, then listeners expect the following tone to have a lower tonal onset. Compensatory effects have also been found for coarticulated tones. For example, Xu [1994] and Chen and Xu [2006] found that since tonal coarticulation in Mandarin is largely assimilatory [Shen, 1990], Mandarin listeners tend to have a ‘dissimilatory perception bias’. That is, if a nonlexical level pitch (e.g. 33) is inserted between a 55 tone and a 21 tone (e.g. 55-33-21), the level pitch will be perceived as a 24 tone in this context. These researchers concluded that the level pitch is perceived as a rising tone by listeners as a consequence of their knowledge of assimilation; the onset of a 24 tone is raised with the preceding 55 tone, and the offset is lowered by the following 21 tone. Further, if the level pitch is preceded by a 21 tone and followed by a 55 tone (e.g. 21-33-55), the level pitch will be perceived as a falling tone. The research summarized thus far has focused on coarticulatory effects on the rhyme portion of a word. However, there are presumably other effects of tonal coarticulation which affect production and perception. For example, it is possible to see transitional coarticulatory effects on other parts of the syllable, like the onset of the second /ma/ in /mama/, where the transition falls on the nasal [e.g. Xu, 1997; see above]. These transitional coarticulatory effects on voiced syllable onsets, which we will refer to as sonorant onset tone contours or onset contours, may help understand the nature of the tonal domain. In previous literature, the nature of the tonal domain has undergone intense debate. 
Howie [1974] joined Cheng [1966] and Kratochvil [1968] in arguing that only F0 contours realized on the rhyme of the syllable can represent the typical contour shape of Mandarin lexical tones. Howie [1974], in a production study, found a uniform set of ‘basic curves’ across all syllable types. Howie [1974] thus concluded that ‘if these basic curves can be accepted as the characteristic shapes of the tonal contours of Mandarin, […] the domain of tone in Mandarin does not include initial voiced consonants or nonsyllabic vowels’. Lin [1995] conducted a perceptual experiment using gated stimuli and found that the first 80–100 ms of the F0 contour of Mandarin monosyllabic words were almost always identified as level tones, supporting the exclusion of voiced onsets and codas from the tonal domain. The results in Lin [1995], however, may be due to small changes in F0 in the stimuli (e.g. only 9 Hz in the first 80–100 ms for the 24 tone). Xu [1998, 2001] disagrees with the above research and asserts that the syllable is the domain of tone. For example, Xu [1998] finds that although the location of the tonal offset (either a high or low pitch) slightly varies depending on the lexical tone in Mandarin, it generally aligns with the end of the syllable, regardless of whether the syllable ends in a vowel or a nasal coda. We intend to investigate whether Xu’s [1998, 2001] conclusion is also borne out in terms of onset contours and in tonal perception. If the domain of tone is the syllable, we might expect perceptually to find that onset contour contributes to the perception of target tones but previous findings are inconclusive. For example, House [1990] investigated whether F0 movements in VNV sequences could be detected by listeners, and he found when the movements were located in the intervocalic nasal, listeners failed to perceive the movements. 
House [1990] interpreted this result as a masking effect in which F0 movements are masked by spectral changes in the intervocalic nasal, which implies that onset contours may not be perceived. However, the subjects in House’s [1990] experiment were not native speakers of any tone languages and thus they might be less sensitive to changes in F0.


Later, Lee [2000], while not concentrating on the role of the onset contour, found that lexical tones of monosyllabic words with a sonorant onset are better recognized. However, Lee [2000] did not provide an explanation relating this finding specifically to the perceptibility of onset contours. Moreover, since the stimuli were all created in the same context, with a preceding 51 tone, the onset contour patterns under investigation were limited. Lee et al. [2008] further followed Gottfried and Suiter [1997] in investigating whether Mandarin lexical tones in full-vowel, silent-center and onset-only syllables can be correctly identified. The onset-only syllables with a voiceless obstruent onset included the first six glottal pulses of the vowel. For syllables with a sonorant onset, the six glottal pulses may have included the sonorant and part of the vowel, and F0 on these pulses is likely to be affected by the preceding tones in the carrier sentences. Lee et al. [2008] compared the correct identification rate and response latency of the voiceless obstruent onset syllables to those of the sonorant onset syllables in a post hoc analysis. The results, however, were inconclusive. The correct identification rate was significantly higher for sonorant onsets, but the response latencies were not found to be significantly different. When the syllables were presented in their original context, the identification rates were similar for obstruent and sonorant onsets, but the response latencies were significantly shorter for sonorant onsets. Despite the inconsistent patterns, both results seem to suggest that listeners do benefit from the onset contour in the identification task. In the present study, we build on the findings in Lee et al. [2008] by investigating the contribution of onset contours to the perception of lexical tone and the contribution of the preceding tone contexts. 
We control the sonorant onset type and the rhyme type to obtain similar onset contours for each of the four Mandarin lexical tones. We expect that this more careful design will allow us to better understand the role of onset contours in the perception of lexical tone. In addition to Lee et al. [2008], there are further reasons to suspect that Mandarin speakers incorporate onset contours as a perceptual cue of lexical tone. The first reason is that onset contours are realized with a rising, falling or level pitch which are the most salient perceptual cues (i.e. pitch shape) for identifying Mandarin lexical tones [Chandrasekaran et al., 2010; Gandour, 1981, 1983; Guion and Pederson, 2007; Huang, 2001]. As indicated in figure 2 when a 51 tone is followed by different lexical tones, the pitch shapes of the onset contours during the nasal in the second syllable are contrastive. The approximated beginning and end of the onset contours are indicated by the 13th and 20th point. Upon visual inspection, we find three possible patterns: (1) when a 51 tone is followed by a low-initial tone (213 and 24), F0 continues to dip to a lower target, (2) when a 51 tone is followed by a 55 tone, the onset contour involves a mildly rising pitch movement but tends to remain stable, (3) when a 51 tone is repeated on the second syllable, F0 is raised dramatically approaching the higher tonal onset of the second 51 tone. We predict that Mandarin listeners are sensitive to these dynamic onset contours and that these three patterns will play a role in identification. The second reason is that there is considerable evidence from secondary cue adaptation that onset contours may be a cue in tone perception. When F0 is not available, as in whispered speech, amplitude curve and syllable duration can assist Mandarin speakers in perceiving lexical tones [Fu and Zeng, 2000; Liu and Samuel, 2004]. F1-F2 harmony can also be incorporated as an indicator of pitch height [Higashikawa and Minifie, 1999]. 
For Japanese (a mora-rhymed accentual language), F0 contours are judged to be more acceptable when F0 turning points align with mora boundaries

[Figure 2: F0 (Hz) over a normalized time scale for the tone sequences 51–213, 51–24, 51–55 and 51–51.]

Fig. 2. The F0 contours from the first vowel to the second onset of [ɕiυ51 mɑ55] ‘tree-mother’, [ɕi51 mɑ24] ‘tree-hemp’, [ɕi51 mɑ213] ‘tree-horse’, and [ɕi51 mɑ51] ‘tree-scold’ by the male native Mandarin speaker. The solid line represents the syllable boundary.

[Nagano-Madsen and Eriksson, 1989]. Experimental results in Zsiga and Nitisaroj [2007] support Morén and Zsiga’s [2006] claim that the mora is the tone-bearing unit in Thai, showing that the coincidence between F0 turning points and mora boundaries is crucial for correctly recognizing Thai tones. Onset contours are not likely to be the primary perceptual cue for the perception of Mandarin tones, but are potentially strong secondary cues when other more salient cues are not available. Finally, as reviewed earlier, Mandarin speakers are capable of extracting tonal information from short segments as well. In addition to Gottfried and Suiter [1997] and Lee et al. [2008], Whalen and Xu [1992] excised different parts of a tonal syllable, which could be as short as 40 ms, and found that Mandarin speakers can correctly identify the lexical tones with limited tone information. In the present set of experiments, the length of onset contours left in the auditory stimuli ranges from 50 to 70 ms, which is longer than the target segment duration in the studies by Gottfried and Suiter [1997], Lee [2009], Lee et al. [2008, 2009] and Whalen and Xu [1992]. While listeners can identify tones with short segments, we seek to establish the role of the sonorant onset as part of the tonal domain. Three experiments were designed to explore the main research question: Can native Mandarin speakers identify Mandarin lexical tones with only onset contours? In experiment I, participants identified the second word of disyllabic sequences to investigate whether onset contours and the preceding tone in an isolated context play a role in identification. The disyllabic sequences were then embedded in a carrier sentence in experiment II to investigate the role of onset contours in a larger contextual domain. While much of the work reviewed has focused on the effect of the immediate context in isolated word paradigms, very little of this work has investigated more natural sentential contexts. 
It is possible that a listener may use the larger sentential context to gauge the speaker’s pitch range and use this additional information in tone identification. We believe it is important to investigate these topics not only for isolated word sequences but in sentential environments as well. We hope that using sentential contexts is a small step toward understanding how tones are identified in actual connected speech. Experiment III examined whether a following onset contour (along with the preceding onset contour) facilitates identification of Mandarin lexical tones.

Experiment I

Experiment I investigates the effect of onset contours on the identification of Mandarin lexical tones using a four-alternative forced-choice task. In this experiment, participants listened to the first word of a disyllabic nonsense sequence with or without an onset contour for the second word, and identified the second word as carrying one of the four lexical tones.

Method


Stimuli

The disyllabic nonsense stimuli were created following four criteria. First, each disyllabic stimulus was created from two real monosyllabic words in Mandarin, so that Chinese characters could be used by participants to identify the words associated with the specific tones. It was hoped that participants would realize that the goal of the experiment was to judge the lexical tones of the second syllable, and therefore we did not present tonal labels (e.g. T1, T2, etc.) to them. It is possible that some of the participants did not understand the goal of the experiment, resulting in an increase in individual participants’ data variance, as discussed later, but it was expected that most would successfully infer the goal of the experiment without any direct instruction. Second, the two individual words, despite each being meaningful, should form a meaningless disyllabic sequence to avoid biasing the participants toward certain choices with any specific semantic context. Furthermore, the nonsense sequences should not have any meaningful homophone in Mainland Mandarin. Third, the rhyme of either the first or second syllable must remain as consistent as possible to avoid any influence of intrinsic F0 on different vowels [e.g. Connell, 2002; Whalen and Levitt, 1995]. Fourth, the onset of the second syllable must be a nasal [n] or [m]. Sonorants such as glides, liquids, and nasals are possible Mandarin onsets, on which tonal contours are not obscured. We opted for nasals because the segmental boundary is acoustically more distinct than for glides and liquids [see Xu and Liu, 2006, 2007 for an approach to measure the sonorant boundary by referring to F0 peaks]. Following our criteria, three groups of the selected second words used in this experiment are listed in (1), categorized according to lexical tone. Finally, the first syllable is either [ʂu55] ‘loss’ or ‘book’ or [ʂu51] ‘tree’ or ‘method’. These two words were selected because the tones end with either a high or low tonal offset, generating two onset contour groups when followed by the four Mandarin lexical tones, and allowing us to examine the perceptibility of different onset contour patterns. Combining the selected first and second words together, a list of 24 tone pairs was created (2 First Tones × 3 Syllable Types × 4 Target Tones, see ‘Appendix A’ for the full list).

(1) The second words of the disyllabic stimuli:

High level          Rising               Convex                    Falling
[mɑ55] ‘mother’     [mɑ24] ‘sesame’      [mɑ213] ‘horse’           [mɑ51] ‘scold’
[mɑu55] ‘cat’       [mɑu24] ‘feather’    [mɑu213] ‘5 to 7 p.m.’    [mɑu51] ‘hat’
[nɑu55] ‘bad’       [nɑu24] ‘hinder’     [nɑu213] ‘brain’          [nɑu51] ‘noisy’

The stimuli were recorded by a male native speaker of Beijing Mandarin born in Xi-An, China, who spoke English as a second language. The speaker was not aware of the purpose of the experiment. The recording was done in a sound-attenuated booth in the Alberta Phonetics Laboratory using an Alesis MultiMix® 8 USB FX mixer and a CountryMan EP6 head-mounted microphone and was recorded using Praat [Boersma and Weenink, 2012] at a sampling frequency of 44,100 Hz with 16-bit resolution. The speaker was instructed to read each disyllabic stimulus aloud 5 times at a normal speech rate and to pause between each stimulus, which created 120 (24 × 5) disyllabic stimuli. The disyllabic stimuli underwent a sequence of editing steps for the perceptual experiment.

(1) The F0 contour from the rhyme of the first syllable to the nasal onset of the second syllable (e.g. [ʂu mɑ]) of the five recordings of each disyllabic sequence type was extracted as a 20-point normalized F0 contour in Praat with the TimeNormalizeF0 script (an earlier version of Yi Xu’s [2013] ProsodyPro).

(2) The F0 contours from the utterances were averaged to generate a typical F0 contour of the disyllabic sequence type.

(3) Using Praat, the averaged F0 contour was resynthesized with each of the five recordings to replace the original F0 contour from the rhyme to the following nasal onset. The resynthesis process was not affected by variable lengths of different disyllabic stimuli because the 20-point averaged F0 contour was resynthesized with 20 corresponding identical intervals from the rhyme of the first syllable to the following nasal onset. Therefore, each time point of the F0 contour was anchored at the beginning point of the vowel and the end point of the nasal onset regardless of the length difference (fig. 4), essentially stretching or shrinking the F0 contour to the vowel-nasal sequence. As a result of this procedure, the F0 contour lined up with the vowel-nasal boundary at slightly different places across tokens.

(4) The resulting 120 stimuli were further manipulated, resulting in 2 sets or a total of 240 (120 × 2) stimuli. In the first set, only the vocalic portion of the second syllable was removed, leaving the onset contour. The trimming process relied on the offset boundary of the nasal, which can be determined by either a release burst or a change from weaker nasal formants to stronger oral formants. The mean length of the nasal onset is 70.28 ms (SD = 11.14). In the second set, the entire second syllable was removed from the disyllabic stimuli. That is, the signal after the onset boundary of the nasal in figure 3 was trimmed off.

With these two manipulations we expect to determine whether the Mandarin lexical tones can be perceived with and without onset contours.

[Figure 3: Spectrogram; frequency 0–5,000 Hz over time 0–0.6931 s.]

Fig. 3. Spectrogram of a production of [ɕi55 mɑ55] by a male native Mandarin speaker.
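The time normalization and averaging in editing steps (1) and (2) can be illustrated with a minimal sketch. This is not the TimeNormalizeF0 script itself: it assumes that each recording's F0 track is already available as a list of Hz values, and it uses plain linear interpolation over the whole vowel-nasal span. The function names and the sample values are hypothetical.

```python
from typing import List, Sequence


def time_normalize(f0: Sequence[float], n_points: int = 20) -> List[float]:
    """Resample an F0 track to n_points equally spaced points (linear interpolation)."""
    if len(f0) < 2 or n_points < 2:
        raise ValueError("need at least two F0 samples and two output points")
    last = len(f0) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)  # fractional index into the raw track
        lo = min(int(pos), last - 1)     # left neighbour, clamped at the end
        frac = pos - lo
        out.append(f0[lo] * (1 - frac) + f0[lo + 1] * frac)
    return out


def average_contours(contours: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point mean of equally long normalized contours."""
    n = len(contours[0])
    if any(len(c) != n for c in contours):
        raise ValueError("contours must have equal length")
    return [sum(c[i] for c in contours) / len(contours) for i in range(n)]


# Hypothetical raw F0 tracks (Hz) for two repetitions of one sequence type;
# the repetitions differ in length, as real tokens differ in duration.
rep_a = [180.0, 170.0, 150.0, 130.0, 120.0]
rep_b = [184.0, 176.0, 160.0, 140.0, 126.0, 120.0, 118.0]
typical = average_contours([time_normalize(r) for r in (rep_a, rep_b)])
```

Because every token is reduced to the same 20 points before averaging, tokens of different durations contribute equally, which is the property the resynthesis in step (3) relies on.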
The intensity of the disyllabic stimuli was normalized by setting the absolute amplitude peak to 0.99 using Praat. The F0 contours with a preceding 55 and 51 tone on the disyllabic stimuli are illustrated in figure 4. The F0 contours in figure 4a, b are normalized for duration and averaged over the disyllabic stimuli. The vertical line represents the approximated boundary between the rhyme of the first syllable and the nasal onset of the target syllable. In figure 4 the divergence point of the F0 contours seems to occur earlier than the boundary between the preceding vowel and the following nasal onset (i.e. the 13th normalization point). Xu and Liu [2006] suggest this ‘misalignment’ to be an indicator of articulatory coordination at the beginning of a ‘real’ syllable onset. That is, before the end of the current syllable, the articulators necessary for the following syllable onset have started approaching their individual target. The abrupt changes in the acoustics (e.g. from a vowel formant structure to a nasal) in fact demonstrate the complete movement of the articulators (e.g. a complete oral closure to create a nasal). The early F0 divergence point also aligns with the early articulatory onset of the following syllable. We left these early coarticulatory movements in the stimuli, to investigate the contribution of these tone movements as compared to the movements in the full nasal onset contour. There are three visible onset contour patterns in figures 4a, b, respectively. When the preceding tone is a 55 tone as in figure 4a, the onset contour to the following 51 tone is the highest level pitch because of the higher initial target of a 51 tone. The onset contour to the following 55 tone is slightly lower than that to the following 51 tone. Finally, the two onset contours to the following 24 and 213 tones appear highly similar to each other, probably due to the identical low tonal onset of the two

[Figure 4: F0 (Hz) over a normalized time scale. Panel a: tone sequences 55–55, 55–24, 55–213 and 55–51. Panel b: tone sequences 51–55, 51–24, 51–213 and 51–51.]

Fig. 4. F0 contours of (a) a 55 tone followed by onset contours on the nasal onset and (b) a 51 tone followed by onset contours on the nasal onset in experiment I; the vertical line represents the 13th normalized point and the approximated syllable boundary.

lexical tones. If the preceding tone is a 51 tone, there are also three onset contour patterns in figure 4b. The onset contour to the following 51 tone is a steep rise toward the highest tonal onset of a 51 tone. A less dramatic rise occurs in the onset contour to the following 55 tone, and again the onset contours for 24 and 213 tones appear to merge with each other. This visual inspection leads us to predict that Mandarin listeners will have at least a three-way distinction in either context.
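The intensity normalization applied to the stimuli (setting the absolute amplitude peak to 0.99) amounts to a single gain factor per file. The sketch below assumes that the waveform is a list of samples in the range [-1, 1]; the function name and the sample values are hypothetical, and the stimuli themselves were normalized in Praat, not with this code.

```python
from typing import List, Sequence


def normalize_peak(samples: Sequence[float], target_peak: float = 0.99) -> List[float]:
    """Scale a waveform so that its largest absolute sample equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silent signal: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


# Hypothetical waveform samples
scaled = normalize_peak([0.1, -0.5, 0.25])
```

Peak scaling equates the maximum amplitude across stimuli but not their perceived loudness, so it leaves relative amplitude differences within each stimulus untouched.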


Procedure

The experiment was conducted using E-Prime® [Schneider et al., 2002] in the Alberta Phonetics Laboratory. During the four-alternative forced-choice task, the participants sat in a sound-attenuated booth and wore headphones, and responses were recorded using a keyboard. Prior to the beginning of the experiment, participants were asked to read the 12 characters used as responses in the four-alternative forced-choice task to ensure that they could recognize each character and pronounce them in the same way. If they failed to recognize any of the words (usually the two words with a very low token frequency, 孬 and 卯, table 4), the investigator would introduce the pronunciation of the word to them without noting the meanings of the characters, to avoid distracting them from the auditory stimuli. The participants were then told that they would listen to the first word in a disyllabic nonsense sequence. They were informed that the second word had been trimmed, and that they should disregard the semantic content of the word and judge which word on the screen ‘sounds’ most similar to the trimmed second word. Before the formal session, there was a practice session composed of 8 stimuli, which were the first token of [ʂu55 mɑ55], [ʂu55 mɑ24], [ʂu55 mɑ213], and [ʂu55 mɑ51] with and without the nasal onset. In each practice trial, the character corresponding to the first word appeared on the computer screen. Below the first word, the four choices for the second word appeared on the screen as in (2) with simplified Chinese characters. English translations for the characters were not provided in the experiment but are provided here as annotations. In the experimental trials the first word was not presented graphically, to help ensure that the participants focused on the target. Responses were recorded by pressing 1–4 on the keyboard, with each key corresponding to one of the words on the screen. Participants could replay the auditory stimulus by pressing ‘r’ on the keyboard. Preceding each trial, a visual fixation ‘+’ appeared in the center of the screen for 500 ms. The experimental session, with the 240 stimuli, immediately followed the practice session. In both the practice and the experimental session, items were randomly presented. Experiments I, II and III were run in a single session; experiment presentation order was randomized for each participant. Participants were given an opportunity to take a break at the midpoint of each experiment. The experimental sequence lasted no more than 90 min and was usually completed in 40 min.

(2) An example of the trial presentation in experiment I (English added):

书__ (book_____)
1. 妈 (mother)
2. 麻 (hemp)
3. 马 (horse)
4. 骂 (scold)

Participants

Six male and 28 female participants, from 19 to 24 years old, were recruited at the University of Alberta. All participants are Standard Mandarin native speakers born in Mainland China (Beijing, Guangxi, Fujian, Hebei, Henan, Hunan, Jiangsu, Jilin, Liaoning, Shandong, Shangxi, Sichuan, Suzhou, Xian, Zhejiang). All participants were enrolled as undergraduate students in an introductory linguistics class and received course credit for participation. None reported any hearing or reading impairment. All participants speak English as their second language. Some of the participants may speak one or more Chinese languages in addition to Mandarin (Cantonese, Henan, Guizhou, Nanjing, Southern Min, Shanghai, Shanxi, Wenzhou, Wujiang, Wuxi, Xuzhou). All 34 participants participated in all three experiments.

Results

Data from 3 participants were excluded due to procedural errors. Data from 4 participants were excluded because they never selected one or more of the response choices in any experiment, regardless of the target tone, i.e. their judgments were extremely biased. Another participant was excluded because he reported that he has rarely spoken Mandarin since moving from China to Canada. Consequently, the results of 26 participants (22 females and 4 males) are analyzed in this study. As a first step we followed Peng [1997] in calculating I-values [Green, 1964] using the formula in (3), where %FA represents the proportion of false alarms and %CR represents the proportion of correct responses. The higher the I-value, the lower the probability of a biased response. There were four choices in this experiment; thus the chance level of an I-value is 0.25.

(3) $I = \frac{1 - \%FA + \%CR}{2}$
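As an illustration of how the formula in (3) can be applied to one response category, the following minimal sketch computes an I-value from a table of confusion counts. The definitions used here are assumptions, since the text does not spell them out: %CR is taken as the share of trials with the given target tone that received that tone as a response, and %FA as the share of trials with any other target tone that received it. The function name and the counts are hypothetical.

```python
from typing import Dict, Tuple

# (target tone, response tone) -> number of trials
Counts = Dict[Tuple[str, str], int]


def i_value(counts: Counts, tone: str) -> float:
    """I = (1 - %FA + %CR) / 2 for one response category."""
    target_trials = sum(n for (t, _), n in counts.items() if t == tone)
    other_trials = sum(n for (t, _), n in counts.items() if t != tone)
    hits = counts.get((tone, tone), 0)
    false_alarms = sum(n for (t, r), n in counts.items() if t != tone and r == tone)
    cr = hits / target_trials         # assumed definition of %CR
    fa = false_alarms / other_trials  # assumed definition of %FA
    return (1 - fa + cr) / 2


# Hypothetical confusion counts with only two tones, to keep the example short
counts = {("55", "55"): 6, ("55", "51"): 4, ("51", "51"): 8, ("51", "55"): 2}
i_55 = i_value(counts, "55")
```

In this toy table, tone 55 has %CR = 6/10 and %FA = 2/10, so its I-value is (1 − 0.2 + 0.6)/2 = 0.7.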


As illustrated in figure 5, I-values are higher than chance when the nasal onset contour is present in the ‘with onset contour’ (WOC) condition. One-sample two-tailed t tests show that the I-value is significantly greater than chance for all four tones: 55 [t(25) = 7.53, p < 0.001], 24 [t(25) = 3.77, p < 0.001], 213 [t(25) = 7.07, p < 0.001], 51 [t(25) = 4.79, p < 0.001]. In the ‘no onset contour’ (NOC) condition, only the I-value of a 51 tone is significantly greater than chance [t(25) = 2.81, p < 0.01]. Within-subject two-way ANOVAs were also conducted for each response category with Onset Contour (Yes/No) and Preceding Tone (51 and 55) as the independent variables. The WOC items

1.0

No onset contour With onset contour

Averaged I-value

0.8

0.6

0.4

0.2

0

Fig. 5. I-value of lexical tones

55

24

213

51

Lexical tones

with or without the onset contours in experiment I; the horizontal line represents the chance level of 0.25. Error bars stand for standard deviation.


were found to be significantly more likely for three out of four response categories: 55 [F(1, 94) = 12.34, p < 0.05], 24 [F(1, 94) = 3.8, p < 0.05], 213 [F(1, 94) = 10.45, p < 0.05], 51 [F(1, 94) = 1.7, n.s.]. There was no significant main effect of Preceding Tone, and no significant interaction between Onset Contour and Preceding Tone was found. Thus, the I-values in the WOC condition are in general significantly higher than those in the NOC condition.

The I-value analysis cannot examine the difference in the number of target responses in the WOC condition as opposed to the NOC condition, i.e. whether onset contours aid tone identification; lower I-values only represent higher numbers of false positives. Thus, the number of correct and incorrect responses for each target response category was analyzed using the linear mixed-effect regression package [Bates et al., 2013] in R [R Team, 2012]. We see from the error bars of the I-values in figure 5 that the variation is relatively large, which is probably due to large intersubject variance: some of the subjects were really good at the task, and some were not. Linear mixed-effect regression can statistically account for some of the individual variation. This analysis method also allows us to include gradient variables as model predictors, either as control variables or as variables of possible interest.

We converted the individual data points into correct and incorrect responses (incorrect responses were collapsed over the three possible incorrect responses; illustrated in fig. 6) and performed a logistic regression. Each target response (55, 51, 213, 24) was analyzed separately (i.e. one model per correct response) with the variables Onset Contour (Yes/No), Preceding Tone (55 or 51), Speaker of Other Chinese Languages (Dialects as the variable name, Yes/No), log frequency of the target characters (logFTC), and log frequency of the response characters (logFRC). Two crucial two-way interactions were included in the model (Onset Contour × Preceding Tone and Onset Contour × Dialects). The frequency of each character was taken from Da [2004] and log-transformed (‘Appendix B’). Subject was included in the model as a random effect.
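The data preparation described above (collapsing the three incorrect responses into one category, log-transforming the character frequencies, and splitting the data by target tone) might look roughly like the following Python sketch. It is only an illustration: the trial records, field names, and frequency counts are hypothetical, and the natural logarithm is an assumption, since the text says only that the frequencies were log-transformed. The model fitting itself, which was carried out in R, is not reproduced here.

```python
import math

TONES = ("55", "24", "213", "51")


def code_correct(target: str, response: str) -> int:
    """Return 1 for a correct response, 0 otherwise.

    The three possible incorrect responses are collapsed into 0.
    """
    if target not in TONES or response not in TONES:
        raise ValueError("unknown tone label")
    return int(target == response)


def log_frequency(count: int) -> float:
    """Log-transform a character frequency count.

    Assumption: natural log; the base is not stated in the source.
    """
    return math.log(count)


# Hypothetical trial records (not data from the study).
trials = [
    {"subject": "s01", "target": "55", "response": "55",
     "onset_contour": "Yes", "preceding_tone": "51", "dialects": "No",
     "target_char_count": 1200, "response_char_count": 1200},
    {"subject": "s01", "target": "55", "response": "24",
     "onset_contour": "No", "preceding_tone": "55", "dialects": "No",
     "target_char_count": 1200, "response_char_count": 300},
    {"subject": "s02", "target": "213", "response": "213",
     "onset_contour": "Yes", "preceding_tone": "51", "dialects": "Yes",
     "target_char_count": 80, "response_char_count": 80},
]

# Add the binary outcome and the two log-frequency predictors to each row.
rows = [
    {**t,
     "correct": code_correct(t["target"], t["response"]),
     "logFTC": log_frequency(t["target_char_count"]),
     "logFRC": log_frequency(t["response_char_count"])}
    for t in trials
]

# One data subset per target tone, mirroring "one model per correct response".
by_target = {tone: [r for r in rows if r["target"] == tone] for tone in TONES}
```

Each subset in `by_target` would then serve as the input to one logistic mixed-effect model, with `correct` as the binary outcome and subject as the random effect.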

Fig. 6. Correct response proportions of target tones (a) 55, (b) 24, (c) 21(3), and (d) 51 preceded by different tones in experiment I. Gray bars represent the proportions with no onset contour, and white bars represent those with onset contours. Horizontal lines represent the chance level of 25%. Each panel plots correct response (%) (0 to 100) against first tone (55 or 51).


The results of each model are given in table 1. We find interactions between Onset Contour and Dialect for both the 24 and 51 tones, but the effect of Dialect is inconsistent across these two tones; it may be that speaking more than one tonal dialect is an advantage in identifying 24 onset contours but a disadvantage in identifying 51 onset contours. When the correct target tone is a 55 tone, the predictors Onset Contour and First Tone interact. This interaction indicates that when the target 55 tone is preceded by another 55 tone, the difference between the WOC condition and the NOC condition is smaller than when the target is preceded by a 51 tone. No interactions were found when the target tone was 213. For target tones 55, 213 and 51, there was a significant main effect of Onset Contour, indicating a significant improvement in tone recognition when the onset contour was present. Further, these same three target tones showed a significant effect of First Tone. For the target tones 213 and 51, a preceding 51 tone increased the likelihood of correct identification, while for the target tone 55, a preceding 51 tone, as opposed to a preceding 55 tone, decreased identification accuracy.

One disadvantage of the preceding analysis of the proportion of correct responses is that it ignores the responses to other tones. It is also important to investigate the misperception patterns which may shed light on the mismatch between the target

Table 1. Statistical tests with linear mixed-effect models in experiment I

Target  Predictor                 Estimate    z value    Pr(>|z|)
55      (intercept)                1.90066      3.707
        OnsetC:Yes                 0.77618      3.010
        FirstT:51                 –1.90441     –9.140
        Dialect:Yes                0.37631      0.764
        logFAI                     0.40611     10.194
        logFRI                    –0.65376    –13.480
        OnsetC:Yes×FirstT:51       0.90876      3.354
        OnsetC:Yes×Dialect:Yes    –0.26796     –0.945
