Cognition 142 (2015) 1–11
Spoken word recognition in early childhood: Comparative effects of vowel, consonant and lexical tone variation

Leher Singh, Hwee Hwee Goh, Thilanga D. Wewalaarachchi

Dept. of Psychology, National University of Singapore, Singapore
Article info

Article history: Received 2 June 2014; Revised 12 May 2015; Accepted 13 May 2015; Available online 23 May 2015

Keywords: Language acquisition; Word recognition; Lexical tone
Abstract

The majority of the world’s languages exploit consonants, vowels and lexical tones to contrast the meanings of individual words. However, the majority of experimental research on early language development focuses on consonant–vowel languages. In the present study, the roles of consonants, vowels and lexical tones in emergent word knowledge are directly compared in toddlers (2.5–3.5 years) and preschoolers (4–5 years) who were bilingual native learners of a consonant–vowel–tone language (Mandarin Chinese). Using a preferential looking paradigm, participants were presented with correct pronunciations and consonantal, vowel, and tonal variations of known words. Responses to each type of variation were assessed via gaze fixations to a visual target. When their labels were correctly pronounced, visual targets were reliably identified in both age groups. However, in toddlers, there was a high degree of sensitivity to mispronunciations due to variation in lexical tones relative to those due to consonants and vowels. This pattern was reversed in preschoolers, who were more sensitive to consonant and vowel variation than to tone variation. Findings are discussed in terms of properties of tones, vowels and consonants and the respective role of each source of variation in tone languages. © 2015 Elsevier B.V. All rights reserved.
Corresponding author at: Dept. of Psychology, AS 4, 03-40, 9 Arts Link, Singapore 117570, Singapore.
http://dx.doi.org/10.1016/j.cognition.2015.05.010

1. Introduction

The ability to correctly identify the sounds used to contrast meaning in one’s native language is essential to language development. This set of sounds, the native phonological inventory, must be acquired and appropriately integrated into word representations in order to initiate and sustain the growth of a vocabulary. Thus far, experimental research on this process has focused almost exclusively on particular types of languages, such as English and French. These types of languages define changes in meaning by varying vowels and consonants. However, most children learn a native language where meaning is contrasted by three sources of phonemic
variation: vowels, consonants and tones (Fromkin, 1978). Approximately 70% of the world’s languages are tone languages (Yip, 2002). As such, much of what is known about early phonological and lexical development does not capture the linguistic environment in which the majority of learners are immersed. On account of a near-exclusive focus on vowel and consonant systems in experimental research, a major theoretical gap exists with regard to our collective understanding of the role of lexical tones in the emergent lexicon. Moreover, there have been no direct comparisons of children’s sensitivity to vowels, consonants and tones as determinants of word meaning. This scenario is potentially limiting, as theories of language development would optimally draw from the natural phonological diversity manifest in human language. An absence of comparative research from tone languages necessarily constrains our interpretation of both research and
theory in early language development, providing a strong impetus to investigate the acquisition of lexical tones alongside vowels and consonants. The purpose of the current study is to directly compare the development of consonant, vowel and tone sensitivity in one domain of language acquisition: spoken word recognition. One approach to researching children’s sensitivity to phonological variation in word recognition has been to investigate children’s abilities to recognize familiar objects when their labels are mispronounced (e.g. Dietrich, Swingley, & Werker, 2007; Havy & Nazzi, 2009; Mani & Plunkett, 2008, 2011; Nazzi, 2005; Nazzi & New, 2007; White & Morgan, 2008). Mispronunciation studies have been conducted using a variety of paradigms, such as habituation, name-based categorization and preferential looking. Preferential looking approaches typically involve presenting children with two visual objects: a target and a distractor. In a common instantiation of this paradigm, children view both objects for a short duration of time (the pre-naming phase). After this phase, a verbal label is presented for an equal duration of time (the post-naming phase). In some trials, the verbal label is correctly produced whereas in others, the label is mispronounced. Proportionate gaze fixations to the visual target are compared prior to and following the production of the verbal label to determine whether the target was identified as an appropriate referent in trials when labels were correctly pronounced versus when they were mispronounced. Comparisons of mispronunciation effects for vowels and consonants have yielded mixed results, which have been partially attributed to differences in linguistic functions served by vowels and consonants (Nespor, Peña, & Mehler, 2003). 
In infants and toddlers, some studies have suggested that consonants are more prominent than vowels in early lexical representations, demonstrating that consonant substitutions compromise target recognition to a greater extent than vowel substitutions (Havy & Nazzi, 2009; Nazzi, 2005; Nazzi, Floccia, Moquet, & Butler, 2009). Other studies have suggested that consonant and vowel substitutions influence word recognition in equal measure and that both types of segments are represented with equivalent strength in the developing lexicon (e.g. Floccia, Nazzi, Delle Luche, Poltrock, & Goslin, 2013; Mani & Plunkett, 2007). In the case of both vowels and consonants, sensitivity to mispronunciations appears to vary depending on the specific contrast used (Curtin, Fennell, & Escudero, 2009; Mani, Coleman, & Plunkett, 2008; Mani & Plunkett, 2008, 2011; Van der Feest, 2007; White & Morgan, 2008). Mispronunciation studies are not limited to infants and toddlers: a study assessing mispronunciations in preschoolers demonstrated a transient phase of comparable sensitivity to vowel and consonant changes (Havy, Bertoncini, & Nazzi, 2011). In contrast to vowel and consonant variation, there have been no studies thus far to investigate effects of tone variation on familiar word recognition in native tone language learners. However, several studies have investigated tone sensitivity in discrimination paradigms as well as the integration of tones into novel words. For example, infants raised in a tone language environment appear to orient toward native tone categories prior to attuning to native vowel and consonant contrasts (Yeung, Chen, & Werker,
2013). In auditory word segmentation, tone language learning infants appear to integrate tones into wordforms in a language-specific manner by 11 months (Singh & Foong, 2012). Later, by 18 months, toddlers learning a tone language integrate lexical tones into newly learned words (Singh, Tam, Chan, & Golinkoff, 2014). Finally, studies with preschool and school-aged children demonstrate that native tone language learners can discriminate familiar words based on lexical tone with a high level of accuracy in auditory discrimination experiments (Burnham et al., 2011; Ciocca & Lui, 2003; Wong, Ciocca, & Yung, 2009). It should be noted that while the focus of these studies was auditory discrimination of tones, Ciocca and Lui (2003) and Wong et al. (2009) measured discrimination via a picture-pointing paradigm. In these paradigms, target words represented minimal tone contrasts. These paradigms are therefore similar to preferential looking in task demands, although data are derived from participants’ explicit verbal responses. In combination, these studies have advanced our understanding of the development of lexical tones by demonstrating a sustained sensitivity to native tone contrasts from infancy to early childhood. However, the extent to which language learners are sensitive to tones in comparison to vowels and consonants remains unclear. When considering why vowels, consonants and tones may exert differential effects on word recognition, a potentially relevant factor is that each of these sources of lexical contrast is compositionally distinct. Lexical tone is defined by syllable-level shifts in fundamental frequency (voice-sourced pitch). Secondary determinants of tone include duration, amplitude and voice quality (Howie, 1976), although native tone identification rests primarily on measures of fundamental frequency (Gandour, 1978, 1983; Kuo, Rosen, & Faulkner, 2008).
In contrast to tones, vowels are primarily characterized by the height of the first three formants, and identified principally by the first and second formants (Reetz & Jongman, 2008). Like tones, vowels are defined by high concentrations of energy at lower frequencies and represent long-term, steady-state components of speech (Abramson, 1978). In further contrast to vowels and tones, consonants are defined by energy maxima at higher frequencies relative to tones and vowels and represent brief acoustic events, theoretically defined as formant transitions (Ladefoged, 2001). The structural distinctiveness of tones, vowels and consonants may lead us to venture that these three sources of phonological variation may impact upon lexical development in different ways. The goal of the present study was to investigate the relative impact of vowel, consonant and tone identity on emergent word knowledge in toddlers and preschool children. Previous studies investigating novel word learning have revealed that tone is recognized to be phonemic in toddlers: language-specific integration of lexical tone into word learning is evident at 2 years of age (Singh et al., 2014). As such, responses to mispronunciation effects were studied within a sample of toddlers (2.5–3.5 years), when vowels, consonants and tones are likely to be recognized as lexically relevant and substitutions in any of these phonemes treated as a mispronunciation. Our primary focus was on children’s relative sensitivity to tone, vowel and consonant substitutions when recognizing spoken words.
2. Experiment 1a

In Experiment 1a, sensitivity to vowel, consonant and tone mispronunciations as well as to correct pronunciations was investigated in a spoken word recognition task in toddlers and preschoolers. Relative sensitivity to each type of mispronunciation was investigated within each group. Due to early influences of tones in relation to vowels and consonants observed in infancy (Yeung et al., 2013), we hypothesized that younger children (toddlers) would be highly sensitive to tone changes due to the early availability of tones as a phonetic category. Our objective in investigating sensitivity to vowels, consonants and tones in an older sample of preschoolers was to seek evidence of continuity or change over the preschool years in light of transitions documented over this period for vowels and consonants (see Havy et al., 2011).

3. Methods

3.1. Participants

Forty-nine native language speakers of Mandarin Chinese participated in the current study. The participant sample comprised 24 toddlers (mean age: 36 months; range: 30–42 months, 12 boys) and 25 preschoolers (mean age: 50 months; range: 45–57 months, 13 boys). All participants were bilingual (Mandarin-English) and were learning Mandarin as a primary language as confirmed by parents. All attended kindergartens or preschools with Mandarin instruction. Prior to testing, the experimenter initiated a short conversation lasting approximately 5 min with each child to determine the child’s level of Mandarin proficiency. All participants demonstrated native proficiency in Mandarin, defined as the ability to respond in Mandarin to open-ended questions with correct grammar, appropriate lexis and native pronunciation. Individuals who did not meet these specific proficiency requirements were not tested. Testing instructions were provided only in Mandarin and the session was conducted entirely in Mandarin. Data from 3 participants were excluded based on failure or refusal to complete the testing session.
All participants were typically developing children and were performing at grade level.

3.2. Stimuli

Visual stimuli consisted of 21 targets (18 test targets and 3 practice trial targets) which were common objects judged to be familiar to young children. There were also 21 distractors (18 test trial distractors and 3 practice trial distractors). Distractors were novel objects. Auditory stimuli consisted of a set of sentences produced by a native Mandarin Chinese speaker in infant-directed speech and were recorded in a sound-attenuated booth.

3.3. Apparatus and procedure

The experiment was carried out on a Macintosh computer and began with a set of three task
familiarization trials. During this phase, participants saw a split-screen display of two familiar objects (e.g. a book and a ball) and were presented with the carrier sentence ‘‘(You) look! That’s the book!” spoken in Mandarin. Data from this phase were not analyzed: its primary purpose was to introduce participants to the paradigm. Following the task familiarization phase, the testing session began. There were 18 test trials in total. During nine of the test trials, participants saw a familiar target object accompanied by an unfamiliar distractor object and the verbal label was correctly pronounced (correct pronunciation trials). During the other nine test trials, participants saw a familiar target paired with an unfamiliar distractor, but the verbal label was mispronounced. Two mispronunciations were caused by a vowel substitution, three were caused by a consonant substitution and three were caused by a tone substitution (see footnote 1). Each vowel and consonant substitution consisted of a single-feature change from the correct pronunciation of the target within the Mandarin Chinese phonological inventory. Tone changes could not be easily classified as single-feature or multiple-feature changes, but each combination of tone changes using Tones 1, 2, and 4 was incorporated into the paradigm. Vowel mispronunciation trials consisted of a change in vowel roundedness or height. Consonant mispronunciation trials consisted of a change in place of articulation, manner of articulation or aspiration. Tone mispronunciation trials consisted of a change between Tones 1 and 2, 2 and 4, or 1 and 4. Tone 3 was not included as it is considered to be the least stable tone due to a context-dependent alternation to Tone 2 on account of the Tone 3 sandhi rule. For half of the participants within each age group, tone mispronunciations were reversed in direction (i.e. 
Tone 2 to Tone 1, Tone 4 to Tone 2 and Tone 4 to Tone 1) to mitigate possible effects of direction asymmetries in tone sensitivity previously reported in adults (Francis & Ciocca, 2003). As a result, there were two conditions (sequences A and B) and each participant went through either sequence A or sequence B (see a listing of stimuli for each sequence in Appendix A). Two sequences allowed for re-assignment of target-distractor pairings to mitigate effects of particular target-distractor pairings. For all vowel and consonant mispronunciations, tones remained constant. Overall, it should be noted that the number of correct pronunciation trials was higher than the number of each type of mispronunciation. This is not an uncommon feature of infant word recognition paradigms, as the presence of familiar, known words is thought to help to sustain attention to the task (e.g. Ballem & Plunkett, 2005; Mani & Plunkett, 2007; Quam & Swingley, 2010; White & Morgan, 2008). It should be noted that mispronunciation paradigms have often incorporated familiar distractors. In our design, distractors were unfamiliar objects as adopted by White and Morgan (2008). Our rationale for using unfamiliar distractors was similar to that put forth by White and Morgan
Footnote 1: In the original study, there was a vowel backness change trial from the apical vowels zhi3 to zi3 (from the back vowel /ɨ/ to the front vowel /ɯ/). However, this is primarily a consonant change in Mandarin. As such, the shift in this trial is not a true vowel mispronunciation trial and more likely construed as a consonant mispronunciation. Therefore, data for this trial were omitted.
(2008). Specifically, in a context where children see two known objects with different names, a mispronunciation of the target may lead to a target preference because the mispronunciation is more similar to the target label than to the distractor label. An unfamiliar object provides a possible alternative referent for the mispronounced label and its presence may lead to a distractor preference in mispronunciation trials based on children’s early potential for forming novel mappings via disambiguation (Bion, Borovsky, & Fernald, 2013; Halberda, 2003). As such, a distractor preference in mispronunciation trials can be construed as a qualitatively distinct response to an absence of preference for either target or distractor upon hearing a mispronounced form (Mani & Plunkett, 2011). That said, the issue of how distractor familiarity influences naming effects and sensitivity to mispronunciations remains an empirical question as there have been no systematic comparisons of naming effects using familiar versus unfamiliar distractors. In a follow-up experiment (Experiment 1b) we examined toddlers’ performance in an analogous task with familiar distractors to evaluate the impact of this methodological choice on our pattern of results. Test trials were presented in pseudo-randomized order. Four trial orders were created by listing trials in random order. Across the four lists, there were two different sets of target-distractor pairings. The auditory event for each test trial consisted of the carrier phrase ‘‘你看 那是 [target]” (English translation: ‘‘(You) look! That is a [target]”). Each trial was divided into two phases – pre-naming and post-naming. During each trial, participants were simultaneously presented with two horizontally aligned images on the screen at eye level – one being a familiar target stimulus, the other a novel distractor stimulus. 
The auditory and visual stimuli were synchronized such that the target word in each sentence appeared 2500 ms from the start of each trial. Visual stimuli were presented throughout the 5000 ms duration of each trial. Fixation to the target was compared during the pre-naming and post-naming phase to establish whether participants associated the auditory label with the visual target. Following each testing session, participants received a vocabulary test in Mandarin Chinese consisting of pairings of all of the stimulus items. When presented with a visual image of each pair, participants were asked to point to the target (correctly labeled) in Mandarin. The purpose of this was to determine whether participants demonstrated knowledge of names of the familiar objects. The toddler group obtained a mean score of 95% (range 89–100) and the preschooler group obtained a mean score of 96% (range 94–100).
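The trial timeline described above (5000 ms trials with target-word onset at 2500 ms) can be illustrated with a short sketch. It assumes gaze is sampled at a fixed frame interval, using the 33 ms coding resolution reported in the Results, and assigns each frame to the pre-naming or post-naming window. The constant and function names are hypothetical and are not taken from the study's materials.

```python
# Illustrative sketch only: names and the frame-based representation are assumptions.
TRIAL_DURATION_MS = 5000   # visual stimuli are shown for the whole trial
TARGET_ONSET_MS = 2500     # target word begins 2500 ms into the trial
FRAME_MS = 33              # assumed coding resolution (one video frame)


def phase_of(frame_index: int) -> str:
    """Return the phase a frame falls in, based on its start time."""
    start_ms = frame_index * FRAME_MS
    if start_ms < TARGET_ONSET_MS:
        return "pre-naming"
    if start_ms < TRIAL_DURATION_MS:
        return "post-naming"
    return "outside"


def count_frames_by_phase() -> dict:
    """Count how many coded frames fall in each phase of one trial."""
    counts = {"pre-naming": 0, "post-naming": 0}
    frame = 0
    while phase_of(frame) != "outside":
        counts[phase_of(frame)] += 1
        frame += 1
    return counts
```

Under these assumptions the two windows are of equal length, matching the description of a post-naming phase equal in duration to the pre-naming phase.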
4. Results

Eye movements were coded frame-by-frame (every 33 ms) for each test trial to generate values for proportion of total looking to target (PTL). PTL was defined as the looking time to the target (T) divided by the combined looking time to the target and distractor (D), expressed as T/(T + D). Data for 25% of the sample were coded by a
second coder and there was high agreement between coders (r = .95, p < .001). Evidence of target recognition is typically inferred from the presence of a naming effect (Bailey & Plunkett, 2002; Meints, Plunkett, & Harris, 1999; Schafer & Plunkett, 1998; Swingley & Aslin, 2000). Naming effects are computed by the following formula: Proportion of Total Looking to the Target during the post-naming phase minus Proportion of Total Looking to the Target during the pre-naming phase. The purpose of computing naming effects from both pre- and post-naming values is to mitigate effects of stimulus characteristics that may elicit preferential fixation independent of labeling. A significant positive naming effect created by an increase in fixation to target (PTL) between the pre- and post-naming phases is taken as evidence for word recognition for a particular trial type. In contrast, a naming effect that does not deviate significantly from zero (i.e. no significant difference in PTL values between the pre- and post-naming phases) is presumed to indicate uncertainty on the part of the child as to the referent for the verbal label. Finally, a significant negative naming effect (i.e. significant decrease in target fixation during post-naming) suggests a distractor preference following naming and is often taken as evidence for the formation of a novel mapping between the auditory label and the distractor object (Ballem & Plunkett, 2005; Mani & Plunkett, 2008). In addition to naming effects, a second measure derived from mispronunciation studies is the mispronunciation effect. Mispronunciation effects are computed by the following formula: Naming effects for a correct pronunciation minus naming effects for a mispronunciation. These values are computed to analyze the extent to which responses to a mispronunciation deviate from those to a correct pronunciation and from other types of mispronunciations.
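The three measures defined above (PTL, naming effect and mispronunciation effect) can be written out as a minimal sketch. The function names are hypothetical, and the looking times in the example are invented for illustration; they are not data from the study.

```python
# Illustrative sketch only: function names and example values are assumptions.

def ptl(target_ms: float, distractor_ms: float) -> float:
    """Proportion of total looking to target: T / (T + D)."""
    total = target_ms + distractor_ms
    if total == 0:
        raise ValueError("no looking time to either object")
    return target_ms / total


def naming_effect(pre_t: float, pre_d: float, post_t: float, post_d: float) -> float:
    """PTL in the post-naming phase minus PTL in the pre-naming phase."""
    return ptl(post_t, post_d) - ptl(pre_t, pre_d)


def mispronunciation_effect(correct_naming: float, mispron_naming: float) -> float:
    """Naming effect for a correct pronunciation minus that for a mispronunciation."""
    return correct_naming - mispron_naming


# Hypothetical looking times in ms (not taken from Table 1).
correct = naming_effect(pre_t=1000, pre_d=1000, post_t=1500, post_d=500)
tone_mp = naming_effect(pre_t=1000, pre_d=1000, post_t=800, post_d=1200)
effect = mispronunciation_effect(correct, tone_mp)
```

In this invented example the correctly pronounced label yields a positive naming effect, the mispronounced label a negative one, and the mispronunciation effect is the gap between them.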
PTL was calculated for each of the four pronunciation types (correct pronunciations; vowel mispronunciations; consonant mispronunciations; tone mispronunciations). Naming effects were calculated for each trial type. Trials were excluded where participants attended to the screen for less than 20% of the trial duration (3% for preschoolers, 4% for toddlers), or where participants did not fixate both the target and distractor during the pre-naming phase (7% for preschoolers, 8% for toddlers). Trials were also excluded for labels that participants did not correctly link to their meaning during the vocabulary test (3% for preschoolers and 5% for toddlers). On account of these exclusions, 13% of trials were excluded for preschoolers and 16% of trials were excluded for toddlers. The decision to exclude trials based on participants’ knowledge of words, an attentional criterion of 20% (1 s), and fixation to target and distractor during the pre-naming phase has been employed in previous mispronunciation studies (e.g. Altvater-Mackensen & Mani, 2013). Finally, while a backness change trial was included in the study (see footnote 1), data from this trial were not analyzed.

An initial analysis of pre-naming data was computed to ensure that fixation to objects during the pre-naming phase did not vary by pronunciation type or age. An age × pronunciation type repeated-measures Analysis of Variance on pre-naming values (PTL) revealed no significant effects of age, pronunciation type or interaction of these factors on pre-naming values (p < .4).

A 4 × 2 (trial type × age) repeated-measures ANOVA was computed with naming effects as the dependent variable. As there were no effects of the type of feature substitution involved in vowel mispronunciation (height, roundedness), consonant mispronunciation (place, manner, aspiration) or tone mispronunciation (1/2, 2/4, 1/4) on naming effects within either age group, and no effects of direction of tone change (p > .7), all analyses were collapsed across each set of consonant, vowel and tone mispronunciation trials and across the direction of tone change. Naming effects for each age group are depicted in Figs. 1 and 2. Results of the analysis of variance revealed a main effect of trial type (F(3, 141) = 3.8, p = .01, η² = .07) and an interaction of trial type and age group (F(3, 141) = 3.07, p = .03, η² = .06). Analyses were therefore conducted separately for toddlers and preschoolers. Mean looking times and standard deviations for pre- and post-naming phases are listed in Table 1.

Fig. 1. Naming effects (PTL post-naming minus PTL pre-naming) for toddlers for correct pronunciations and mispronunciations (consonants, vowels, tones) for Experiment 1a. Error bars reflect SEM.

Fig. 2. Naming effects (PTL post-naming minus PTL pre-naming) for preschoolers for correct pronunciations and mispronunciations (consonants, vowels, tones) for Experiment 1a. Error bars reflect SEM.
4.1. Toddlers

In an initial set of planned analyses to determine whether participants demonstrated naming effects for each pronunciation type, naming effects were compared to zero for each pronunciation type via one-sample t-tests. Naming effects were significant for correct pronunciations (t(23) = 2.97, p = .007, Cohen’s d: .84), but not for vowel (t(23) = .85, p = .41), consonant (t(23) = .96, p = .35) or tone changes (t(23) = 1.61, p = .12). Toddlers therefore admitted correct pronunciations as acceptable labels for visual targets, but treated vowel, consonant and tone substitutions as mispronunciations. A second set of analyses focused on mispronunciation effects, specifically on the difference between naming effects for mispronounced trials and correctly pronounced trials. Naming effects for toddlers were entered into a within-subjects repeated measures ANOVA with vowel mispronunciations, consonant mispronunciations, tone mispronunciations and correct pronunciations as levels of the factor, trial type. Results revealed a main effect of trial type (F(3, 69) = 2.7, p = .05, η² = .07). Post-hoc pairwise comparisons were computed using Tukey’s HSD test with a significance criterion of p < .05. Comparisons were drawn between each type of mispronunciation and correct pronunciations. Naming effects for vowels and consonants were also compared in light of recent debate on the relative influence of vowels and consonants on word recognition (Floccia et al., 2013; Havy & Nazzi, 2009; Mani & Plunkett, 2007; Nazzi, 2005; Nazzi et al., 2009). Naming effects for tone mispronunciations were significantly lower than those for correct pronunciations. However, naming effects for vowel mispronunciations and consonant mispronunciations did not differ from those obtained for correct pronunciations. Lastly, naming effects were not
Table 1
Mean looking times (ms) to target for vowel, consonant, and tone mispronunciations and correct pronunciations. Standard deviations are in parentheses.

                               Vowel                    Consonant                Tone                     Correct
                               Pre-naming  Post-naming  Pre-naming  Post-naming  Pre-naming  Post-naming  Pre-naming  Post-naming
Experiment 1a (Toddlers)       1086 (620)  1235 (796)   1084 (558)  1183 (631)   1448 (465)  1261 (728)   1312 (282)  1536 (337)
Experiment 1a (Preschoolers)   1236 (614)  1070 (624)   1435 (258)  1186 (553)   1343 (370)  1354 (673)   1251 (232)  1542 (355)
Experiment 1b (Toddlers)       1343 (296)  1533 (547)   1403 (377)  1493 (511)   1385 (519)  1189 (550)   1274 (259)  1514 (301)
significantly different for vowel and consonant mispronunciation trials.

4.2. Preschoolers

A parallel set of analyses was carried out for the older sample of preschoolers. First, a set of planned analyses was carried out to determine whether participants were sensitive to correct and incorrect pronunciations by comparing naming effects to zero for each pronunciation type. As with the younger sample of toddlers, in a series of planned comparisons, naming effects were significant for correct pronunciations (t(24) = 5.2, p < .0001, Cohen’s d: 1.41), but not for vowel (t(24) = 1.45, p = .35) or tone changes (t(24) = .07, p = .94). Naming effects for consonant changes were significantly lower than zero (t(24) = 2.21, p = .04). Mean looking times and standard deviations for pre- and post-naming phases are listed in Table 1. A second set of analyses focused on mispronunciation effects, specifically on the difference between naming effects for mispronounced trials and correctly pronounced trials. Results revealed a main effect of trial type (F(3, 72) = 4.2, p = .008, η² = .15). Post-hoc comparisons were computed using Tukey’s HSD test with a significance criterion of p < .05. Naming effects for tone mispronunciations were not significantly different from those obtained for correct pronunciations. However, naming effects for vowel mispronunciations and consonant mispronunciations were significantly lower than those obtained for correct pronunciations. As with the toddlers, naming effects were not significantly different for vowel and consonant mispronunciation trials.

In the present study, there were two methodological choices that may have influenced our pattern of results. First, children were presented with one familiar item and one unfamiliar item, a design adopted in some previous mispronunciation studies (Mani & Plunkett, 2011; White & Morgan, 2008). 
While this choice was made in the interests of limiting effects of the distractor label on participants’ visual choice, it is possible that unfamiliar distractors may have modified naming and/or mispronunciation effects. A systematic comparison of unfamiliar versus familiar distractors has not yet been conducted and it is therefore not clear whether our findings would generalize to more common instantiations where both objects are familiar (e.g. Mani & Plunkett, 2007; Mani et al., 2008). The second methodological departure from previous studies is the high item variability inherent in our task. An alternative would have been for each stimulus item to rotate across trial types, such that each item underwent a vowel, consonant, tone mispronunciation as well as a correct pronunciation. A follow-up experiment, Experiment 1b, was conducted to determine whether our pattern of results was modified by these methodological decisions. In Experiment 1b, we presented toddlers with two familiar items for each trial. Moreover, each object rotated across the four trial types across participants. We administered Experiment 1b with a sample of toddlers in order to determine whether the mispronunciation effects observed in Experiment 1a could be replicated in an alternative paradigm.
5. Experiment 1b

The goal of Experiment 1b was to investigate effects of tone, vowel, and consonant mispronunciations under more conventional experimental conditions. In Experiment 1b, a sample of toddlers was presented with a set of familiar words, each linked to a familiar target shown alongside a familiar distractor. Across participants, each word served as a target and distractor and each word underwent each type of change (vowel, consonant, tone mispronunciation, correct pronunciation).

5.1. Participants

Sixteen native language speakers of Mandarin Chinese participated in the current study (mean age: 33 months, range: 28 to 40 months). Seven participants were boys and 9 were girls. All participants were bilingual and were learning Mandarin as a primary language as confirmed by parents. Conditions on participation were identical to those in Experiment 1a. As before, participants’ knowledge of each test item was confirmed in a post-test. Words that participants did not recognize were excluded from analysis (6% of trials).

5.2. Stimuli

Sixteen monosyllabic words that could undergo a phonotactically legal single-feature tonal, vowel and consonant mispronunciation served as stimuli. All items were judged to be early-acquired, imageable, concrete nouns and to be familiar to toddlers. Four versions of the experiment were created such that each correctly pronounced label was subjected to different single-feature mispronunciations in each version (see Table 2 for a list of target words for Version 1). For example, the target word ‘door’ was realized as /mən2/ (correct pronunciation) in Version 1, /mən1/ (tonal mispronunciation) in Version 2, /man2/ (vowel mispronunciation) in Version 3 and /nən2/ (consonant mispronunciation) in Version 4. As a result, each word underwent all pronunciation types (vowel/tone/consonant mispronunciation and correct pronunciations). Each word served as a target and distractor across participants.
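The rotation of pronunciation types across the four versions can be sketched as a cyclic (Latin-square style) assignment. This is only one possible scheme: the text does not state how words were assigned to pronunciation types beyond the ‘door’ example, so the cyclic rule, the word index and all names below are assumptions made for illustration.

```python
# Illustrative sketch only: the cyclic rotation rule is an assumption,
# chosen so that the 'door' example from the text is reproduced.
PRONUNCIATION_TYPES = ["correct", "tone", "vowel", "consonant"]


def pronunciation_type(word_index: int, version: int) -> str:
    """Pronunciation type of a word in a given version (1-4) under cyclic rotation."""
    if version not in (1, 2, 3, 4):
        raise ValueError("version must be between 1 and 4")
    offset = (word_index + version - 1) % len(PRONUNCIATION_TYPES)
    return PRONUNCIATION_TYPES[offset]


# Realizations of 'door' as given in the text.
DOOR_FORMS = {"correct": "mən2", "tone": "mən1", "vowel": "man2", "consonant": "nən2"}

# 'door' is placed at a hypothetical index 0 in the word list.
door_by_version = {v: DOOR_FORMS[pronunciation_type(0, v)] for v in range(1, 5)}
```

Whatever the starting offset of a word, a cyclic rule of this kind guarantees that it appears once in each pronunciation type across the four versions, which is the property the design requires.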
Two additional correctly pronounced words were included in each session as filler trials. These items remained constant across versions and were paired with familiar distractors, which also did not rotate. The rationale for including two filler items was to increase the number of correct pronunciation trials in order to sustain interest in the task. These items did not rotate across trial types because it was not possible to devise a list of more than sixteen familiar items likely to be known by toddlers that could legally undergo different vowel, consonant and tone variations. Two practice trials containing correctly pronounced words preceded the test session. Data from practice trials and non-rotating (filler) trials were not included in our analyses. The presentation of trials was pseudo-randomized as in Experiment 1a. There was one necessary deviation from the design of Experiment 1a. In contrast to Experiment 1a, each mispronunciation trial type in Experiment 1b comprised only two types of single-feature mispronunciation instead of three. Vowel mispronunciation trials comprised single-feature changes in roundedness and height, consonant mispronunciation trials comprised single-feature changes in manner and place, and tone mispronunciation trials involved single tone substitutions (1 and 2, 2 and 4). The rationale for this was that there were limits on the number of items that could rotate across four changes while remaining phonotactically legal and familiar to toddlers. As such, the number of mispronunciation trials had to be limited to two for each type of mispronunciation. This decision was partially informed by a relevant finding from Experiment 1a, which was that mispronunciation effects generalized across single-feature changes within vowel, consonant and tone changes, suggesting that using a subset of two types of mispronunciations per segment may not elicit major differences across experiments. The carrier sentence was identical to that in Experiment 1a. All stimuli were recorded by a native speaker of Mandarin Chinese in infant-directed speech.

Table 2
A list of stimulus items for Experiment 1b (Version 1).

Target image | Distractor image | Verbal label | Trial type
Heart | Door | Xin1 | Correct pronunciation (rotating)
Mountain | Plate | Shan1 | Correct pronunciation (rotating)
Book | Pig | Shu1 | Correct pronunciation (non-rotating)
Trousers | Rabbit | Ku4 | Correct pronunciation (non-rotating)
Plate | Egg | Pan1 | Tone mispronunciation (2 to 1)
Door | Pear | Men4 | Tone mispronunciation (2 to 4)
Egg | Chicken | Den4 | Vowel mispronunciation (height)
Pear | Rice | Lü2 | Vowel mispronunciation (roundedness)
Rice | Heart | San4 | Consonant mispronunciation (place)
Chicken | Mountain | Xi1 | Consonant mispronunciation (manner)

5.2.1. Apparatus and procedure

The apparatus and procedure were identical to Experiment 1a.

6. Results

In an initial set of planned analyses to determine whether participants demonstrated naming effects for each trial type, naming effects were compared to zero via one-sample t-tests. Naming effects were significant for correct pronunciations (t(15) = 2.98, p = .03, Cohen’s d: .79), but not for vowel (t(15) = 1.23, p = .24), consonant (t(15) = .59, p = .35) or tone changes (t(15) = 1.12, p = .25). As before, trials were excluded if participants did not know the word (5% of trials) or did not fixate the target and distractor during the pre-naming phase (2% of trials), leading to an overall exclusion rate of 7%. Naming effects for each trial type are displayed in Fig. 3.
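As a rough illustration of the analysis just described, the following Python sketch computes a naming effect as the post-naming minus pre-naming PTL difference score and compares a set of naming effects to zero. The data values and function names are hypothetical, and the calculation is the textbook one-sample t statistic, with Cohen's d taken as the mean divided by the sample standard deviation. It is not the authors' analysis script.

```python
import math
import statistics
from typing import List, Tuple


def naming_effect(ptl_pre: float, ptl_post: float) -> float:
    """Naming effect: PTL after naming minus PTL before naming."""
    return ptl_post - ptl_pre


def one_sample_t(values: List[float], mu: float = 0.0) -> Tuple[float, int, float]:
    """Return (t statistic, degrees of freedom, Cohen's d) against mu."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD with n - 1 in the denominator
    t = (mean - mu) / (sd / math.sqrt(n))
    d = (mean - mu) / sd
    return t, n - 1, d


# Hypothetical per-participant PTL values for one trial type (not real data).
pre = [0.50, 0.48, 0.52, 0.45]
post = [0.60, 0.55, 0.58, 0.57]

effects = [naming_effect(a, b) for a, b in zip(pre, post)]
t, df, d = one_sample_t(effects)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

Only the standard library is used here, so no p-value is computed; a statistics package would be needed to obtain one from the t distribution.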
A second set of analyses was conducted to investigate mispronunciation effects. Naming effects were entered into a within-subjects repeated measures Analysis of Variance with vowel, consonant and tone mispronunciations and correct pronunciations entered as levels of the factor, trial type. Results revealed a main effect of trial type (F(3, 45) = 2.74, p = .05, η² = .15). Post-hoc pairwise comparisons were computed using Tukey’s HSD test with a significance criterion of p < .05. Naming effects for tone mispronunciations were significantly smaller than those for correct pronunciations. However, naming effects for vowel mispronunciations and consonant mispronunciations did not differ from those obtained for correct pronunciations. Lastly, naming effects for vowel and consonant mispronunciation trials did not differ significantly from each other.

Fig. 3. Naming effects (PTL post-naming minus PTL pre-naming) for toddlers for correct pronunciations and mispronunciations (consonants, vowels, tones) for Experiment 1b. Error bars reflect SEM.

The purpose of Experiment 1b was to determine whether patterns observed in Experiment 1a were dependent on a particular set of experimental conditions (unfamiliar distractors, high item variability across participants, no target/distractor alternation). Analyses point to a pattern of results highly similar to that of Experiment 1a: toddlers demonstrated naming effects only for correct pronunciations, showing relatively weak (although statistically comparable) mispronunciation effects for vowels and consonants in comparison to correct pronunciations. Relatively strong mispronunciation effects were observed for tones in comparison to correct pronunciations. We chose to administer Experiment 1b to toddlers only, as the purpose of the experiment was to determine whether mispronunciation effects would generalize to a more conventional paradigm. As the pattern of results was highly comparable across paradigms, we did not administer the experiment to a preschool sample.

7. Discussion

In the present study, toddlers and preschoolers, who were native speakers of Mandarin Chinese, were tested on their sensitivity to vowel, tone and consonant substitutions within a spoken word recognition task. Both toddlers and preschoolers demonstrated significant naming effects for correct pronunciations but not for vowel, consonant or tone mispronunciations. Three primary findings emerged from this study. First, in toddlers, tone mispronunciation effects were quite strong, departing significantly from correct pronunciations, yet consonant and vowel mispronunciation effects did not depart significantly from correct pronunciations, as shown in Experiments 1a and 1b. Second, in both age groups, effects of consonant and vowel variation were closely coupled whereas effects of tone variation were dissociated. Finally, in the preschool years, there was a reversal in the pattern of results for each source of phonemic variation: tone mispronunciation effects attenuated and consonant and vowel mispronunciation effects appeared stronger, departing significantly from naming effects observed in correct pronunciations. The impact of each of these findings will be discussed in sequence.

In the present study, in the younger sample of toddlers, vowel and consonant mispronunciation effects were relatively weak although comparable to one another.
These findings are consistent with naming effects observed in toddlers in previous studies using single-feature mispronunciations of vowels (Mani & Plunkett, 2011) and consonants (White & Morgan, 2008) and form part of a larger body of evidence suggesting that toddlers do not consistently draw lexical distinctions based on changes in a single phonological feature (see Havy et al., 2011; Nazzi, 2005; Pater, Stager, & Werker, 2004; Stager & Werker, 1997; Swingley & Aslin, 2007). By contrast, tone mispronunciation effects departed significantly from correct pronunciations and therefore appeared to be strong. It therefore appears that the representation of tone in toddlers is potentially privileged relative to vowels and consonants. We propose two possible related explanations for this finding. First, the timeline of phonetic category formation for vowels, tones and consonants may account for differences in lexical integration. An early tone advantage has been reported for infants in phoneme discrimination, where infants orient toward native phonetic categories for tone prior to consonants and vowels (Yeung et al., 2013). The basis for a tone advantage in early phonetic perception in infants may extend to older children and to the lexical level. However, it cannot be presumed that constraints on lexical development in the second and third years of life fully mirror constraints on phoneme attunement established in the first year of life, as such an account would predict earlier integration of vowels than consonants, which is not supported by the present study. However, it is possible
that tones are exceptional both in phonological and lexical development and exert a particularly early and potent influence in both domains in comparison to phonetic segments. Central to this argument is the question of why tones may be privileged in phonological and lexical development. It is possible that a tone advantage is rooted in the medium through which tone is conveyed. Tones are conveyed primarily through fundamental frequency variation, which is a salient and common feature of input to infants and is the primary source of suprasegmental changes in vocal prosody (Fernald, 1989). Prosody represents a suprasegmental dimension of speech to which infants are singularly attentive from birth and it is primarily in this domain that fetal and neonatal learning are observable (e.g. Christophe, Mehler, & Sebastián-Gallés, 2001; Kisilevsky et al., 2009; Sambeth, Ruohio, Alku, Fellman, & Huotilainen, 2008; Stefanics et al., 2009). As such, the attentional capture and precocious processing linked to vocal prosody may contribute to early extraction of linguistic information conveyed via suprasegmental or prosodic cues. It is therefore possible that tones exert an early influence on perception and on the developing phonological lexicon because they are embedded in a powerful medium for the allocation of attention and for early linguistic uptake, vocal prosody. A second major finding of the present study is that in both age groups, vowels and consonants are closely coupled and effects of tones are strikingly dissociated. Findings support the synchronous integration of vowels and consonants, both of which appear to be asynchronously integrated compared to tones. This finding complements several studies with adult tone language learners that have demonstrated a similar dissociation between tone and segmental (vowel/consonant) processing (e.g.
Cutler & Chen, 1997; Taft & Chen, 1992; Tong, Francis, & Gandour, 2008; Tsang & Hoosain, 1979; Ye & Connine, 1999; Yip, Leung, & Chen, 1998). In adult studies, various factors have been invoked to explain differences in tone processing relative to other segments, including the relatively low information value of tones compared to vowels and consonants (see Tong et al., 2008), a more protracted processing window for tones relative to phonetic segments (see Cutler & Chen, 1997) and the fact that native tone language speakers associate tone with high flexibility and ‘dispensability’ in lexical reconstruction tasks relative to vowels and consonants (Wiener & Turnbull, 2014). Finally, in Mandarin Chinese, the size of the tone inventory is considerably smaller than that of the vowel and consonant inventories. It is possible that inventory size affects sensitivity to phonological contrast, although a statistical account awaits further investigation across tone languages. Perhaps the most intriguing finding to emerge from the current study was the reversal of mispronunciation effects for all three types of phonemes across age groups. It is unsurprising that sensitivity to vowel and consonant mispronunciations would strengthen over the preschool years, an effect also observed in pre-schoolers learning non-tone languages and presumably intrinsic to maturation (e.g. Havy et al., 2011). As noted by Havy et al. (2011), the preschool years are a period of aggressive cognitive and linguistic growth in multiple domains, such as executive
functioning, memory, inhibitory control, as well as the growth in understanding and producing morphosyntactic relations, phonological development and lexical expansion. It is possible that these changes enhance sensitivity to segmental detail over the preschool years. However, an unexpected finding was that tone mispronunciation effects were relatively weak in older children (pre-schoolers) and relatively strong in younger children (toddlers). One possible account for age-based differences in tone sensitivity may derive from an emerging recognition on the part of preschool children that tones fulfil multiple functions. Specifically, all languages, tonal or not, modulate pitch to draw meaningful distinctions. For example, question/statement distinctions are largely driven by pitch variation (van Heuven & Haan, 2002) as are expressions of vocal emotion (Banse & Scherer, 1996) and changes in focus and stress (Fernald & Mazzie, 1991). Tone languages are no exception: intonational determinants of emotion, communicative intent and focus are conveyed largely through pitch variation in tone languages (e.g. Liu & Pell, 2012; Yuan, 2004, 2011). As such, a mature language learner must integrate lexical tones, but disregard intonational variation when identifying words. This process is complicated by the fact that while tones and intonation are separated via linguistic description, they are acoustically very similar (Beckman & Venditti, 2010; Ladd, 1996). The rather complex differentiation of pitch movements into tonal and intonational functions may pose a challenge for the young language learner. Previous research exemplifies the weight of these challenges, demonstrating that children learning non-tone languages start to attend to intonational cues in conjunction with lexical content quite late in development, between 4 and 5 years of age (Friend, 2003; Morton & Trehub, 2001; Quam & Swingley, 2012). 
Similarly, tone language learners only begin to reconcile question/statement intonation contours with lexical tone contours between 4 and 5 years of age (Singh & Chee, submitted for publication). Efforts to negotiate the complex and varying functions of pitch in preschool children could weaken
tone mispronunciation effects, even temporarily, until tone-intonation relations are reconciled. Further studies are underway to investigate the comparative effects of intonational versus lexical tone variation on word recognition in tone language learners to explore whether pitch movements are integrated into word representations as a function of their lexical and prosodic functions. To summarize, the objective of the present study was to investigate the relative effects of tone, vowel and consonant variation on early word recognition in toddlers and preschool children. In a mispronunciation paradigm, toddlers and pre-schoolers treated variation in vowels, tones and consonants as mispronunciations. However, the magnitude of tone mispronunciation effects was relatively large in toddlers as compared with vowels and consonants. This pattern of results was reversed in pre-schoolers, whereby tone mispronunciation effects were attenuated and consonant/vowel mispronunciation effects strengthened. Results point to commonalities in sensitivity to vowels and consonants over time and asynchronous sensitivity to lexical tone in both age groups. These results bear on the constitution of the developing lexicon and suggest that while vowels, tones and consonants are all recognized as phonemic after 2.5 years, relative sensitivity to each source of phonological variation changes markedly over the preschool years. A chief contribution of the current study is that extant knowledge on the place of vowels and consonants in early lexical representations appears not to generalize to lexical tones.
Acknowledgments This research was funded by a Faculty Start-up Grant and Thesis Support funds from the National University of Singapore to LS. We would like to thank Felicia Woo for assistance with stimulus recording, Darrell Loh Shiqi for assistance with coding, and Melissa Chee and Yi Fen Tay for assistance with recruitment of participants.
Appendix A Target and distractor images, verbal labels and feature changes (sequences A and B) for Experiment 1a. All transcriptions are in Hanyu Pinyin.
Sequence A Practice
Flower Door Tree
Paint Roller Printer Padlock
Hua1 Men2 Shu4
Practice Practice Practice
Correction Tape Shuttlecock Radio Iron Can Opener Watering Can
Xie2 Dan4 Chi3 Lu4 Yi1
Correct Correct Correct Correct Correct
Shoe Egg Ruler Deer T-shirt
pronunciation pronunciation pronunciation pronunciation pronunciation
Sequence B Practice
Watch Fork Dress
Accordion Big paper clip Oven
Biao3 Cha1 Qun2
Correct pronunciation Correct pronunciation Correct pronunciation
Cup Goat Clock Trousers Chicken Paper
French Horn Sticky Tape Stapler Calculator Calendar Bullhorn
Bei2 Yang4 Zhong4 Kou4 Ju1 Zi3
Pen Car Ball
Trolley Air Con Kettle
Di3 She1 Jiu2
Tone mispronunciation (1 to 2) Tone mispronunciation (2 to 4) Tone mispronunciation (1 to 4) Vowel mispronunciation (height) Vowel mispronunciation (roundedness) Vowel mispronunciation (backness) Data from this trial were not analyzed (please see Footnote 1) Consonant mispronunciation (place) Consonant mispronunciation (manner) Consonant mispronunciation (aspiration)
Oven Correction Tape Kettle
Zhu1 Xie2 Dan4 Chi3
Correct Correct Correct Correct
pronunciation pronunciation pronunciation pronunciation
Deer T-shirt Watch Fork Dress
Trolley Printer Paint Roller Air Conditioner Sticky Tape Bullhorn Can Opener Watering Can Calendar
Lu4 Yi1 Biao3 Cha1 Qun2
Correct Correct Correct Correct Correct
pronunciation pronunciation pronunciation pronunciation pronunciation
Cow Rice Noodle Trousers Chicken Accordion
Big Paper Clip Padlock Shuttlecock Calculator Radio Bullhorn
Niu1 Fan2 Mian1 Kou4 Ju1 Zi3
Pen Car Ball
Iron Stapler French Horn
Di3 She1 Jiu2
Tone mispronunciation (2 to 1) Tone mispronunciation (4 to 2) Tone mispronunciation (4 to 1) Vowel mispronunciation (height) Vowel mispronunciation (roundedness) Vowel mispronunciation (backness) Data from this trial were not analyzed (please see Footnote 1) Consonant mispronunciation (place) Consonant mispronunciation (manner) Consonant mispronunciation (articulation)
Tree Correctly pronounced
Pig Shoe Egg Ruler
References Abramson, A. S. (1978). Static and dynamic acoustic cues in distinctive tones. Language and Speech, 21(4), 319–325. Altvater-Mackensen, N., & Mani, N. (2013). The impact of mispronunciations on toddler word recognition: Evidence for cascaded activation of semantically related words from mispronunciations of familiar words. Infancy, 18(6), 1030–1052. Bailey, T. M., & Plunkett, K. (2002). Phonological specificity in early words. Cognitive Development, 17(2), 1265–1282. Ballem, K. D., & Plunkett, K. (2005). Phonological specificity in children at 1; 2. Journal of Child Language, 32(1), 159–173. Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614. Beckman, M. E., & Venditti, J. J. (2010). Tone and intonation (2nd ed.). The Handbook of Phonetic Sciences, pp. 603–652.
Bion, R. H., Borovsky, A., & Fernald, A. (2013). Fast mapping, slow learning: Disambiguation of novel word-object mappings in relation to vocabulary learning at 18, 24, and 30 months. Cognition, 126(1), 39–53. Burnham, D., Kim, J., Davis, C., Ciocca, V., Schoknecht, C., Kasisopa, B., et al. (2011). Are tones phones? Journal of Experimental Child Psychology, 108(4), 693–712. Christophe, A., Mehler, J., & Sebastián-Gallés, N. (2001). Perception of prosodic boundary correlates by newborn infants. Infancy, 2(3), 385–394. Ciocca, V., & Lui, J. Y.-K. (2003). The development of the perception of Cantonese lexical tones. The Journal of Multilingual Communication Disorders, 1, 141–147. Curtin, S., Fennell, C. T., & Escudero, P. (2009). Weighting of vowel cues explains patterns of word-object associative learning. Developmental Science, 12(15), 725–731.
Cutler, A., & Chen, H. C. (1997). Lexical tone in Cantonese spoken-word processing. Perception & Psychophysics, 59(2), 165–179. Dietrich, C., Swingley, D., & Werker, J. F. (2007). Native language governs interpretation of salient speech sound differences at 18 months. Proceedings of the National Academy of Sciences, 104(41), 16027–16031. Fernald, A. (1989). Intonation and communicative intent in mothers’ speech to infants: Is the melody the message? Child Development, 1497–1510. Fernald, A., & Mazzie, C. (1991). Prosody and focus in speech to infants and adults. Developmental Psychology, 27(2), 209. Floccia, C., Nazzi, T., Delle Luche, C., Poltrock, S., & Goslin, J. (2013). English-learning one-to two-year-olds do not show a consonant bias in word learning. Journal of Child Language, 1–30. Francis, A. L., & Ciocca, V. (2003). Stimulus presentation order and the perception of lexical tones in Cantonese. The Journal of the Acoustical Society of America, 114(3), 1611–1621. Friend, M. (2003). What should I do? Behavior regulation by language and paralanguage in early childhood. Journal of Cognition and Development, 4, 161–183. Fromkin, V. A. (1978). Tone: A linguistic survey. New York: Academic Press. Gandour, J. (1983). Tone perception in far eastern-languages. Journal of Phonetics, 11(2), 149–175. Gandour, J. (1978). The perception of tone. In V. Fromkin (Ed.), Tone: A linguistic survey. New York: Academic Press. Halberda, J. (2003). The development of a word-learning strategy. Cognition, 87, 23–34. Havy, M., Bertoncini, J., & Nazzi, T. (2011). Word learning and phonetic processing in preschool-age children. Journal of Experimental Child Psychology, 108(1), 25–43. Havy, M., & Nazzi, T. (2009). Better processing of consonantal over vocalic information in word learning at 16 months of age. Infancy, 14(4), 439–456. Howie, J. M. (1976). Acoustical studies of Mandarin vowels and tones (No. 6). Cambridge University Press.
Kisilevsky, B. S., Hains, S. M. J., Brown, C. A., Lee, C. T., Cowperthwaite, B., Stutzman, S. S., et al. (2009). Fetal sensitivity to properties of maternal speech and language. Infant Behavior and Development, 32(1), 59–71. Kuo, Y. C., Rosen, S., & Faulkner, A. (2008). Acoustic cues to tonal contrasts in Mandarin: Implications for cochlear implants. The Journal of the Acoustical Society of America, 123(5), 2815–2824. Ladd, D. R. (1996). Intonational phonology (Cambridge Studies in Linguistics 79). Cambridge: Cambridge University Press. Ladefoged, P. (2001). Vowels and consonants: An introduction to the sounds of languages. Malden, Mass. & Oxford: Blackwell Publishers. Liu, P., & Pell, M. D. (2012). Recognizing vocal emotions in Mandarin Chinese: A validated database of Chinese vocal emotional stimuli. Behavior Research Methods, 44(4), 1042–1051. Mani, N., Coleman, J., & Plunkett, K. (2008). Phonological specificity of vocalic features at 18-months. Language and Speech, 51, 3–21. Mani, N., & Plunkett, K. (2007). Phonological specificity of vowels and consonants in early lexical representations. Journal of Memory and Language, 57(2), 252–272. Mani, N., & Plunkett, K. (2008). Fourteen-month-olds pay attention to vowels in novel words. Developmental Science, 11(1), 53–59. Mani, N., & Plunkett, K. (2011). Does size matter? Subsegmental cues to vowel mispronunciation detection. Journal of Child Language, 38(3), 606. Meints, K., Plunkett, K., & Harris, P. L. (1999). When does an ostrich become a bird? The role of typicality in early word comprehension. Developmental Psychology, 35(4), 1072. Morton, J. B., & Trehub, S. E. (2001). Children’s understanding of emotion in speech. Child Development, 72(3), 834–843. Nazzi, T. (2005). Use of phonetic specificity during the acquisition of new words: Differences between consonants and vowels. Cognition, 98(1), 13–30. Nazzi, T., Floccia, C., Moquet, B., & Butler, J. (2009).
Bias for consonantal information over vocalic information in 30-month-olds: Crosslinguistic evidence from French and English. Journal of Experimental Child Psychology, 102(4), 522–537. Nazzi, T., & New, B. (2007). Beyond stop consonants: Consonantal specificity in early lexical acquisition. Cognitive Development, 22(2), 271–279.
Nespor, M., Peña, M., & Mehler, J. (2003). On the different roles of vowels and consonants in speech processing and language acquisition. Lingue & Linguaggio, 2, 203–229. Pater, J., Stager, C., & Werker, J. (2004). The perceptual acquisition of phonological contrasts. Language, 384–402. Quam, C., & Swingley, D. (2010). Phonological knowledge guides 2-year-olds’ and adults’ interpretation of salient pitch contours in word learning. Journal of Memory and Language, 62(2), 135–150. Quam, C., & Swingley, D. (2012). Development in children’s interpretation of pitch cues to emotions. Child Development, 83(1), 236–250. Reetz, H., & Jongman, A. (2008). Phonetics: Transcription, production, acoustics and perception. Oxford: John Wiley & Sons. Sambeth, A., Ruohio, K., Alku, P., Fellman, V., & Huotilainen, M. (2008). Sleeping newborns extract prosody from continuous speech. Clinical Neurophysiology, 119(2), 332–341. Schafer, G., & Plunkett, K. (1998). Rapid word learning by fifteen-month-olds under tightly controlled conditions. Child Development, 69(2), 309–320. Singh, L., & Chee, M. (submitted for publication). Effects of tone and intonation on spoken word recognition in early childhood. Singh, L., & Foong, J. (2012). Influences of lexical tone and pitch on word recognition in bilingual infants. Cognition, 124(2), 128–142. Singh, L., Tam, H. J., Chan, C., & Golinkoff, R. M. (2014). Influences of vowel and tone variation on emergent word knowledge: A cross-linguistic investigation. Developmental Science, 17(1), 94–109. Stager, C. L., & Werker, J. F. (1997). Infants listen for more phonetic detail in speech perception than in word-learning tasks. Nature, 388(6640), 381–382. Stefanics, G., Háden, G. P., Sziller, I., Balázs, L., Beke, A., & Winkler, I. (2009). Newborn infants process pitch intervals.
Clinical Neurophysiology, 120(2), 304–308. Swingley, D., & Aslin, R. N. (2000). Spoken word recognition and lexical representation in very young children. Cognition, 76(2), 147–166. Swingley, D., & Aslin, R. N. (2007). Lexical competition in young children’s word learning. Cognitive Psychology, 54(2), 99–132. Taft, M., & Chen, H. C. (1992). Judging homophony in Chinese: The influence of tones. In H. C. Chen & O. Tzeng (Eds.), Language processing in Chinese. Amsterdam: North-Holland/Elsevier. Tong, Y., Francis, A. L., & Gandour, J. T. (2008). Processing dependencies between segmental and suprasegmental features in Mandarin Chinese. Language and Cognitive Processes, 23(5), 689–708. Tsang, K. K., & Hoosain, R. (1979). Segmental phonemes and tonal phonemes in comprehension of Cantonese. Psychologia, 22, 222–224. Van der Feest, S. V. H. (2007). Building a Phonological Lexicon. The acquisition of the Dutch voicing contrast in perception and production. Ph.D. Dissertation, Radboud University, Nijmegen, Utrecht: Prince Productions B.V. van Heuven, V. J., & Haan, J. (2002). Temporal distribution of interrogativity markers in Dutch: A perceptual study. Laboratory Phonology 7, 4(1), 61. White, K. S., & Morgan, J. L. (2008). Sub-segmental detail in early lexical representations. Journal of Memory and Language, 59(1), 114–132. Wiener, S., & Turnbull, R. (2014). Constraints of tones, vowels and consonants on lexical selection in Mandarin Chinese. In Paper presentation at 88th Annual Meeting of the Linguistic Society of America. Wong, A., Ciocca, V., & Yung, S. (2009). The perception of lexical tone contrasts in Cantonese children with and without Specific Language Impairment (SLI). Journal of Speech, Language, and Hearing Research, 52, 1493–1509. Ye, Y., & Connine, C. M. (1999). Processing spoken Chinese: The role of tone information. Language and Cognitive Processes, 14(5–6), 609–630. Yeung, H. H., Chen, K. H., & Werker, J. F. (2013).
When does native language input affect phonetic perception? The precocious case of lexical tone. Journal of Memory and Language, 68(2), 123–139. Yip, M. (2002). Tone. Cambridge University Press. Yip, M. C. W., Leung, P.-Y., & Chen, H.-C. (1998). Phonological similarity effects on Cantonese spoken-word processing. Proceedings of the ICSLP, 1998, 2139–2142. Yuan, J. (2004). Perception of Mandarin intonation. Proceedings of ISCSLP, 2004, 45–48. Yuan, J. (2011). Perception of intonation in Mandarin Chinese. The Journal of the Acoustical Society of America, 130(6), 4063–4069.