Native language affects rhythmic grouping of speech

Anjali Bhatara a)
CNRS (Laboratoire Psychologie de la Perception, Unité Mixte de Recherche 8158), 45 rue des Saints-Pères, 75006 Paris, France

Natalie Boll-Avetisyan and Annika Unger
Universität Potsdam, Faculty of Cognitive Sciences, Karl-Liebknecht-Strasse 24-25, 14476 Potsdam, Germany

Thierry Nazzi b)
CNRS (Laboratoire Psychologie de la Perception, Unité Mixte de Recherche 8158), 45 rue des Saints-Pères, 75006 Paris, France

Barbara Höhle
Universität Potsdam, Faculty of Cognitive Sciences, Karl-Liebknecht-Strasse 24-25, 14476 Potsdam, Germany
(Received 21 February 2012; revised 20 August 2013; accepted 17 September 2013)

Perceptual attunement to one's native language results in language-specific processing of speech sounds. This includes stress cues, instantiated by differences in intensity, pitch, and duration. The present study investigates the effects of linguistic experience on the perception of these cues by studying the Iambic–Trochaic Law (ITL), which states that listeners group sounds trochaically (strong-weak) if the sounds vary in loudness or pitch and iambically (weak-strong) if they vary in duration. Participants were native listeners either of French or German; this comparison was chosen because French adults have been shown to be less sensitive than speakers of German and other languages to word-level stress, which is communicated by variation in cues such as intensity, fundamental frequency (F0), or duration. In experiment 1, participants listened to sequences of coarticulated syllables varying in either intensity or duration. The German participants were more consistent in their grouping than the French for both cues. Experiment 2 was identical to experiment 1 except that intensity variation was replaced by pitch variation. German participants again showed more consistency for both cues, and French participants showed especially inconsistent grouping for the pitch-varied sequences. These experiments show that the perception of linguistic rhythm is strongly influenced by linguistic experience. © 2013 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4823848]

PACS number(s): 43.71.Hw, 43.66.Mk, 43.71.Sy [BRM]
I. INTRODUCTION

There is a long history of research showing that humans' perception and processing of auditory stimuli differ across listeners of different languages. For example, the ability to discriminate and categorize consonantal sounds is affected by the native-language phonemic inventory: It is easier to discriminate an acoustic contrast when it crosses a phoneme boundary in one's native language than when it does not. This perceptual attunement, also referred to as perceptual reorganization, is a process that starts within the first year of life (Best et al., 1988; Werker and Tees, 1984). Languages also vary in their prosodic properties, mainly carried by three acoustic features: Fundamental frequency (F0), timing (duration, pauses), and intensity. The way these acoustic cues are used to mark a given function (for example, lexical stress, prosodic boundaries, etc.) varies across languages (e.g., Atterer and Ladd, 2004). Moreover, within a language, these
a) Author to whom correspondence should be addressed. Also at: Université Paris Descartes, Sorbonne Paris Cité, Paris, France. Electronic mail: [email protected]
b) Also at Université Paris Descartes, Sorbonne Paris Cité, Paris, France.
J. Acoust. Soc. Am. 134 (5), November 2013
same cues are used to serve different kinds of linguistic and non-linguistic functions, including aiding in word segmentation (Cutler and Mehler, 1993; Jusczyk et al., 1999; Spinelli et al., 2010), marking prosodic and thereby syntactic boundaries (e.g., Johnson and Seidl, 2008; Venditti et al., 1996), characterizing novelty vs givenness in an utterance (e.g., Féry and Kügler, 2008), and communicating emotion (e.g., Banse and Scherer, 1996). Thus the acquisition of the prosodic properties of one's native language and years of experience with these properties are likely to have a strong impact on adult processing. Following recent studies that showed reduced sensitivity to prosody at the word level in French infants and adults, the present study explores potential differences between the ways French and German listeners use the prosodic cues of duration, intensity, and pitch when grouping syllables while listening to streams of continuous speech.

A. Stress "deafness" in French
An important difference between French and many other languages is that French does not use stress at the word level, whereas languages such as Spanish, English, and German use stress contrastively at the word level (Cutler and Mehler, 1993; Féry et al., 2011; Goedemans and van der
Hulst, 2011a). Many recent studies have suggested that this property of French has important implications for the perception of stress by French listeners. Studies examining stress perception in adults have shown some indications of a relative "deafness" to prosodic features in French adults compared to Spanish adults (Dupoux et al., 1997; Dupoux et al., 2001). In Dupoux et al. (2001), participants had to map a sequence of target items onto two previously learned nonwords. The two nonwords differed either in stress pattern or in segmental content. French listeners showed performance comparable to Spanish listeners in the segmental condition, as well as in the stress condition if single tokens of the stimuli corresponding exactly to the two learned nonwords were used. However, if multiple tokens that varied in their F0 values were used in the stress condition, Spanish listeners significantly outperformed the French ones. More recently, similar cross-linguistic differences were found between French-learning infants and Spanish- or German-learning infants, suggesting attunement to the stress properties of the native language within the first year of life. Skoruppa et al. (2009) obtained results similar to those from adults when comparing French and Spanish 9-month-old infants. French infants' discrimination performance did not differ from that of Spanish infants when the stress patterns were instantiated by repetitions of the same disyllable, but a group difference arose when the stimuli consisted of different, segmentally varying disyllables. With these materials, the Spanish infants still discriminated the stress patterns, whereas the French infants did not show any evidence of discrimination.
Moreover, Höhle, Bijeljac-Babic, Herold, Weissenborn, and Nazzi (2009) found that German 6-month-olds showed a listening preference for a language-typical trochaic (strong-weak) pattern relative to an iambic (weak-strong) pattern, while French 6-month-old infants did not show any preference at all. Further experiments revealed that 6-month-old monolingual French infants were able to discriminate the stress pattern, but this discrimination ability decreased by the age of 10 months. At this age, relative to both 6-month-old monolingual babies and 10-month-old bilingual babies (learning a language with contrastive word-level stress as well as French), French monolingual 10-month-olds needed longer exposure to the experimental stimuli to show discrimination (Bijeljac-Babic et al., 2012). Differences in stress perception between German and French infants were found even earlier, by 4 months of age, using electrophysiological methods (Friederici et al., 2007). These studies do not show that French adults and infants are unable to discriminate between stressed and unstressed syllables. In fact, Dupoux et al. (Dupoux et al., 1997; Dupoux et al., 2001), Skoruppa et al. (2009), and Höhle et al. (2009) showed that French participants can make distinctions between trochees and iambs when they are directly compared. However, they appear to be limited in the ability to encode stress along with other information, i.e., abstracting across multiple words or nonwords (or multiple tokens of these words) for storage in short-term memory (Dupoux et al., 1997; Skoruppa et al., 2009). Another way to state this is that they are unable to draw on higher-level phonological representations that may be available to listeners of a
language with word-level stress. This stress "deafness" may be lessened through musical training (Kolinsky et al., 2009), again suggesting that it is experience-dependent. Overall, the preceding studies establish clear differences in the processing of prosodic information between French and German/Spanish listeners and show that these differences emerge very early in life. In the present study, we investigate whether the linguistic background of adult listeners has an impact on another aspect of prosodic perception: The grouping of a continuous stream of syllables into smaller chunks. Given the evidence reported in the preceding text that language experience affects stress perception, and given that French does not have stress at the word level (Féry et al., 2011), it is possible that native French listeners would have more difficulties than German listeners in using prosodic information (duration, intensity, and F0) to group syllables. Such cross-linguistic differences would be more marked when the stimuli reach a certain degree of complexity at which common prosodic properties may be easier to process if encoded on a more abstract level of representation. Alternatively, because both French and German use prosodic information to mark phrasal constituents, native speakers of both languages might be able to call on experience with prosodic information while processing sequences of continuous speech, thus leading to no group difference.

B. The Iambic–Trochaic Law and the present study
The goal of the present study is to evaluate whether the Iambic–Trochaic Law (ITL)—a perceptual principle that may account for the rhythmic structure of metrical feet across languages (Hayes, 1995)—is modulated cross-linguistically, and in particular to determine whether a phenomenon similar to the stress deafness identified at the word level in French listeners is found in the perception of rhythmic grouping. To do so, French and German listeners were compared. The ITL (Hayes, 1995) states that sequences of sounds that vary in loudness are grouped trochaically, i.e., in strong-weak pairs consisting of the loud sound followed by the soft, whereas sequences that vary in duration are grouped iambically, in weak-strong pairs with the longer-duration syllable coming second in the pair. Although not in the original formulation of the ITL, pitch may also play a role in grouping similar to that of intensity (e.g., Nespor et al., 2008). The ITL has been tested in several studies, and evidence in its favor was found for the use of all three of these cues in linguistic or non-linguistic processing in adults: Studies examined either duration and intensity (linguistic: Hay and Diehl, 2007; non-linguistic: Bolton, 1894; Hay and Diehl, 2007; Iversen et al., 2008; Kusumoto and Moreton, 1997; Vos, 1977; Woodrow, 1909, 1951) or duration and pitch (linguistic: Nespor et al., 2008; Bion et al., 2011; non-linguistic: Kusumoto and Moreton, 1997). However, grouping by pitch may not be as consistent as with intensity (Hay and Diehl, 2007; Kusumoto and Moreton, 1997; Woodrow, 1911). Although the ITL is supported by several studies, only three of the studies mentioned in the preceding text investigated its potential cross-linguistic modulation. For non-linguistic stimuli, two of those studies found differences in ITL-related performance between English and Japanese
listeners (Iversen et al., 2008; Kusumoto and Moreton, 1997), while the other did not find differences in performance between English and French listeners (Hay and Diehl, 2007). Iversen et al. (2008; see also Kusumoto and Moreton, 1997) examined grouping of sequences of complex tones. In the stimuli for this study, every second tone varied in either intensity or duration. Participants had to indicate whether the rhythm of the sequences consisted of a strong sound followed by a weak sound or of a weak sound followed by a strong sound. They found that both language groups consistently classified the sequences with intensity differences as strong-weak (i.e., trochaic) but did not consistently classify the duration-varied sequences. Although the English listeners showed a strong preference for a short-long (iambic) grouping, the Japanese listeners showed more heterogeneous grouping patterns with a bias toward a long-short (trochaic) grouping. A follow-up study of Japanese- and English-learning infants revealed similar results. Before 7 months, neither group showed grouping preferences, but at 7–8 months, the English-learning group showed evidence of grouping the duration-varied sounds as iambs while the Japanese-learning infants did not (Yoshida et al., 2010). The authors explain these results by citing between-language word-order differences (Iversen et al., 2008) or different restrictions on the placement of long vowels at the end of words (Kusumoto and Moreton, 1997), both of which would result in more linguistic experience with a long-short pattern for Japanese listeners than for English listeners. To date, only one study has examined the cross-linguistic modulation of the ITL using linguistic stimuli (Hay and Diehl, 2007). In that study, the authors compared the ITL in listeners of French and English using both speech and nonspeech stimuli that varied in either intensity or duration. The speech stimuli were repetitions of the syllable /ga/.
Similar to the present study, those languages were chosen because they differ in word-level prosody, with English having word-level stress and French lacking it. Participants indicated whether they perceived strong-weak or weak-strong groupings. In line with the predictions of the ITL, participants consistently categorized sequences varying in intensity as strong-weak (i.e., trochaic) and sequences varying in duration as weak-strong (i.e., iambic). However, for neither speech nor non-speech stimuli were differences in categorization observed between linguistic groups, leading the authors to conclude that the effects of the ITL are not modulated by linguistic experience. The lack of a cross-linguistic effect in this task might be surprising given the evidence of relative stress deafness in French-speaking adults and French-learning infants reported in the preceding text (Bijeljac-Babic et al., 2012; Dupoux et al., 1997; Dupoux et al., 2001; Skoruppa et al., 2009). Following Iversen et al. (2008), one possible explanation is that French—in contrast to Japanese—shares relevant word-order properties with English. A second possible explanation is that because French marks phrasal constituents with duration cues, French listeners are able to use these cues when processing sequences of sounds. However, in that context, it remains surprising that they could use intensity as well as duration, given that the former is not a good marker of phrasal constituents in French. Additionally, the question of how
they would fare with pitch variations remains unexplored. However, an alternative explanation for this lack of cross-linguistic variation may be related to the material and procedure used by Hay and Diehl, which consisted only of repetitions of the single syllable /ga/ or a square wave, with pauses of 200 ms between each syllable or wave. In addition, their stimuli were organized into blocks, with one block of only duration-varied stimuli and one of only intensity-varied stimuli; this may have increased the ease of the task. Hence, it may be that the stimuli and the procedure were too simple to bring out differences between French and English listeners. Processing more complex speech stimuli will impose higher demands on memory and draw on additional language-specific linguistic representations (this is an explanation that Dupoux et al., 1997; Dupoux et al., 2001; Skoruppa et al., 2009 used to account for their own results). Hence a possible explanation of the lack of a cross-linguistic difference in that study is that the task did not require a level of processing that is affected by prosodic experience. This implies that if stimuli of a higher degree of complexity are used, clearer effects of native-language experience may be observed. The present study seeks to explore this question.

C. Prosodic marking in French and German
In this section, we provide details about where stress is placed in French and German and how the two languages use changes in duration, F0, and intensity to acoustically mark stress; we refer to these details in Sec. IV when interpreting the study's results. Both languages use these three cues to mark phrase-level stress, but an important difference between them is that German uses word-level stress contrastively (e.g., the German word modern means "to rot" when stressed on the first syllable and "modern" when stressed on the second), whereas French does not (Féry et al., 2011). Here phrase-level stress is discussed first, followed by word-level stress.

1. Phrasal stress (French and German)
French is a language with a fixed phrasal stress on the final syllable of a prosodic phrase (e.g., Delattre, 1938; Jun and Fougeron, 2000, 2002; but cf. Goedemans and van der Hulst, 2011b),1 acoustically realized by both increased duration and F0 movement, generally rising utterance-medially and falling utterance-finally in declarative utterances (Féry et al., 2011; Jun and Fougeron, 2000, 2002; Michelas and D'Imperio, 2010a; also see Jun and Fougeron, 2002 for examples of French phrasal stress). Moreover, an optional secondary phrasal stress (structurally and perceptually distinct from "stress" at the word level) can occur initially in phrases with more than two syllables (and rarely in phrases with only two content-word syllables). It is marked by a rise in F0 without lengthening (Jun and Fougeron, 2002), and the peak is not associated with a particular syllable but can occur on the first, second, or, more rarely, the third syllable of the phrase (Jun and Fougeron, 2000, 2002; Vaissière, 1974; Welby, 2006). These two (early and late) rises of the accentual phrase are structurally different and together give form to the phrase (Rolland and Lœvenbruck, 2002; Welby, 2006;
Welby and Lœvenbruck, 2006). Intensity is not a relevant cue for marking phrasal stress (Delattre, 1966). Although there is no stress at the word level in French, a rise in F0 and/or lengthening of the final syllable can be used as cues for segmentation (Bagou et al., 2002; Banel and Bacri, 1994; Michelas and D'Imperio, 2010a; Rietveld, 1980), as can an F0 rise in initial position (Spinelli et al., 2010; Welby, 2007). German phrasal stress is not fixed relative to phrase boundaries but is aligned with the stressed syllables of words (Atterer and Ladd, 2004; Dogil and Williams, 1999; Féry et al., 2011), and each phrase is organized around pitch accents, which are associated with lexical stress (Féry et al., 2011). These pitch accents are usually realized as rising tones (Grabe, 1998), but utterance-finally (in declarative utterances) they are falling (Dogil and Williams, 1999), and their structure varies depending on the discourse context, information structure, and pragmatic considerations (e.g., Féry and Kügler, 2008). Phrase-final stress can also be marked by final lengthening. However, relative to French, duration plays only a minor role in prosodic phrasing in German (Féry et al., 2011). As in French, intensity increases are the least important of the three cues (Jessen, 1999; Dogil and Williams, 1999), although French and German differ regarding the interaction of intensity and pitch: In German, both F0 and intensity are reliably higher on a syllable that has phrasal stress, whereas in French, F0 and intensity may be dissociated (Nespor et al., 2008; Vaissière and Michaud, 2006).

2. Word stress (German only)
Words in German are prosodically organized into trochaic feet with an initial strong syllable. Thus the majority of German disyllabic words have initial stress (e.g., Féry, 1998; Wiese, 1996, p. 282), primarily marked by an increased duration of the vowel (Dogil and Williams, 1999; Jessen et al., 1995). Stressed syllables are sometimes accompanied by a pitch accent, so in these cases, they may be characterized by an increased F0 (Jessen et al., 1995). Furthermore, stress is acoustically realized by increased intensity of the vowel of the stressed syllable (Delattre, 1966); however, compared to duration contrasts, intensity contrasts are rather weak (Dogil and Williams, 1999; Jessen et al., 1995).

D. Hypothesis
The studies reported here aim to reassess the potential for cross-linguistic differences in the manifestation of the ITL. Rhythmic grouping preferences may result from perceptual and/or cognitive biases. However, we hypothesize that the general mechanisms that may affect rhythmic processing are modulated by language-specific prosodic experience. The influence of language-specific knowledge on rhythmic processing should become particularly important and, hence, evident if the auditory information has a certain degree of complexity. Specifically, we propose that experience with the prosodic systems of French and German would lead to differences in performance between linguistic groups on an ITL task. We attribute the lack of a difference between the French and English groups in Hay and Diehl (2007) to
their use of segmentally non-varying and hence overly simplified speech stimuli. Hence we predicted that with segmentally varying stimuli, a cross-linguistic difference would arise. Two experiments were conducted testing native listeners of German and French on their rhythmic grouping preferences for speech-like sequences. Both experiments used sequences of 16 phonetically different syllables, concatenated and co-articulated, testing the hypothesis that differences between the language groups would emerge in this condition. In experiment 1, syllables within sequences alternated in either duration or intensity, and in experiment 2, sequences instead varied in either F0 or duration.

II. EXPERIMENT 1

A. Method

1. Participants
Participants were 40 (36 female, 4 male) monolingual German listeners aged 18-50 yr (M = 24) tested in Potsdam and 40 (27 female, 13 male) monolingual French listeners aged 19-44 yr (M = 27) tested in Paris. Their status as monolinguals was verified by a questionnaire (based on the LEAP-Q; Marian et al., 2007). Those who had some school knowledge of L2 French or L2 German completed a proficiency test in their respective L2 to assess a potential influence of the L2. In addition, we recorded the number of each participant's years of musical experience, as there is evidence that musical experience may aid in the perception of lexical stress (Kolinsky et al., 2009).

2. Stimuli
Sixteen different CV syllables were constructed by combining four long and tense vowels /eː/, /iː/, /oː/, /uː/ and four consonants of mixed manner and place of articulation /b/, /z/, /m/, /l/.2 This set of phonemes was selected for two reasons: First, they are all part of both the French and German phoneme inventories. Second, although they may not be perceived the same way by both groups (for example, the German /b/ may sound like a /p/ to the French group; both are voiceless and unaspirated, with short-lag voice onset time), they should nonetheless provide the same variability in segmental material for each group, i.e., the /b/ will not sound the same as the /z/, /l/, or /m/ for any of the group/voice combinations. In each stimulus sequence, each of the 16 syllables was presented twice, once in a strong and once in a weak position. This resulted in 32 syllables per sequence (e.g., /…zuːleːboːliːloːziːmuːbeː…/). Ninety sequences were generated from these syllables. The ordering of the syllables in the sequences was constrained such that they did not contain any syllable reduplications or strings of three identical consonants or three identical vowels. Moreover, we made sure that no CVCV string within a sequence would be a disyllabic word in German or French as listed in the CELEX database (Baayen et al., 1995) or in the Lexique database (New et al., 2001). We used text-to-speech synthesis to generate the stimulus sequences because this allows all acoustic parameters to be well controlled, even across the two languages. For synthesis, we used MBROLA (Dutoit et al., 1996), in both a
German voice (De5) and a French voice (Fr4) to control for unintended effects of the language of the voice used. These stimuli resembled the artificial language streams used in similar studies (Bion et al., 2011; Saffran et al., 1996; Tyler and Cutler, 2009). Though the stimuli did not sound like natural speech, they sounded speech-like. The intensity and duration manipulations were performed using PRAAT (Boersma and Weenink, 2010). The F0 contour of all syllables was flat at 200 Hz, a value chosen to be in the middle of the range of F0 for women's spontaneous speech (Baken and Orlikoff, 2000, p. 176). The baseline intensity (mean intensity across the syllable, measured in PRAAT) was set at 70 dB, and the baseline duration was 260 ms for each syllable (100 ms for the consonant and 160 ms for the vowel). These duration values were chosen based on values reported in previous studies examining stress cues in French and German (Friedrich et al., 2009; Nazzi et al., 2006). The four levels of intensity variation were 2, 4, 6, or 8 dB above baseline, and the four levels of duration variation were 50, 100, 150, or 200 ms above baseline. These duration values were larger than those used by Hay and Diehl (2007), and the intensity values were smaller. We chose these values based on pilot testing. See Fig. 1 for a schematic illustration of the intensity variation applied to the stimuli. All intensity manipulations were applied to the entire syllable, whereas the duration manipulations were applied only to the vowel, given that vowel duration by itself is an important cue in both French and German (Dogil and Williams, 1999; Michelas and D'Imperio, 2010a) and one of the main predictors in an automated stress-accent labeling system for English (Greenberg et al., 2003). Contrary to Hay and Diehl (2007), there were no pauses between syllables, and all consecutive syllables were co-articulated.
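As a rough illustration of these ordering constraints, the following sketch (in Python; the function names and the rejection-sampling approach are our own illustrative assumptions, not the authors' actual generation pipeline) builds a 32-syllable sequence in which each of the 16 CV syllables appears twice, with no immediate syllable reduplication and no run of three identical consonants or three identical vowels. The check against the CELEX and Lexique databases for embedded real words is omitted, and a helper shows the amplitude scale factor implied by a given dB increment.

```python
import random

CONSONANTS = ["b", "z", "m", "l"]
VOWELS = ["e", "i", "o", "u"]  # standing in for the long vowels /e:/, /i:/, /o:/, /u:/
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]  # 16 CV syllables

def valid(seq):
    # no immediate syllable reduplication
    if any(a == b for a, b in zip(seq, seq[1:])):
        return False
    # no run of three identical consonants or three identical vowels
    for i in range(len(seq) - 2):
        if seq[i][0] == seq[i + 1][0] == seq[i + 2][0]:
            return False
        if seq[i][1] == seq[i + 1][1] == seq[i + 2][1]:
            return False
    return True

def make_sequence(rng):
    # each of the 16 syllables occurs twice -> 32 syllables per sequence;
    # rejection sampling: reshuffle until all ordering constraints hold
    while True:
        seq = SYLLABLES * 2
        rng.shuffle(seq)
        if valid(seq):
            return seq

def db_gain(delta_db):
    # an intensity increment of delta_db decibels corresponds to
    # multiplying the waveform amplitude by 10 ** (delta_db / 20)
    return 10 ** (delta_db / 20)
```

Rejection sampling is the simplest way to satisfy all constraints jointly; because most random shuffles violate at least one constraint, a few dozen attempts are typically needed, which is still negligible at this scale.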
To prevent participants from grouping stimuli based on the first pair, the onset of each stimulus was masked over the first 3 s by white noise that faded out according to a raised-cosine function while the stimulus faded in, its intensity increasing according to a raised-cosine function as well. As an additional
control, half of the sequences began with the strong syllable (longer or louder) and half began with the weak syllable. This control was put in place because Hay and Diehl (2007) reported a strong tendency to group the sequences based on the initial pair of sounds. MATLAB (R2007b, The MathWorks, Natick, MA) was used to create the white noise and PRAAT to combine it with the stimuli.

3. Procedure
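The onset-masking scheme can be sketched as follows (in Python/NumPy rather than the authors' MATLAB/PRAAT workflow; the function names and the exact alignment of the two ramps are illustrative assumptions): over the first 3 s, the stimulus fades in along a raised-cosine ramp while white noise fades out along the complementary ramp.

```python
import numpy as np

def raised_cosine_ramp(n):
    # monotonic 0 -> 1 ramp with a raised-cosine (Hann-shaped) profile
    return 0.5 * (1.0 - np.cos(np.pi * np.arange(n) / (n - 1)))

def mask_onset(stimulus, sr, mask_dur=3.0, seed=0):
    """Mask the first mask_dur seconds of `stimulus` (1-D array, sample
    rate `sr`): the stimulus fades in along a raised-cosine ramp while
    white noise fades out along the complementary ramp."""
    n = int(mask_dur * sr)
    ramp = raised_cosine_ramp(n)
    noise = np.random.default_rng(seed).uniform(-1.0, 1.0, n)
    out = stimulus.astype(float).copy()
    out[:n] = out[:n] * ramp + noise * (1.0 - ramp)
    return out
```

Because the two raised-cosine ramps are complementary, the crossfade avoids the abrupt level changes that a linear or rectangular gate would introduce, so the masked region blends smoothly into the unmodified remainder of the stimulus.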
Participants were seated in a quiet room, and the stimuli were presented at a comfortable listening level using PsyScope X (available at http://psy.ck.sissa.it/) on a MacBook laptop. In Potsdam, stimuli were presented through AKG K 55 headphones and in Paris through Sennheiser HD 558 headphones. Participants were instructed to listen carefully to each sequence and to report whether they heard the alternating stimuli as a strong sound followed by a weak sound or a weak sound followed by a strong sound. They were told that they did not have to wait until the end of a sequence to give their response but to respond as fast as possible. All of the stimuli were randomly presented within a single block. See the Appendix for exact instructions in French and German. Because of the lack of word-level stress in French, the instructions given to participants as to what "weak-strong" and "strong-weak" meant differed between groups. The German group was given examples of trochaic and iambic words in German. The French group was given examples of trochaic and iambic words from Spanish as well as examples of contrastive stress in French: "You say, 'J'aime le bateau' (I like the boat) and your friend says, 'Le gâteau?' (the cake?). You respond, 'Non, le BAteau,' placing the emphasis on the first syllable of the word." (This was followed by an equivalent example of emphasis on the second syllable of the word.) The testing procedure began with four practice trials: two duration-varied sequences and two intensity-varied sequences, both with the maximum variation (8 dB and 200 ms, respectively). Participants pressed one of two labeled buttons to indicate their choice (either a tall bar to the left of a short bar, symbolizing trochaic, or a short bar to the left of a tall bar, symbolizing iambic; see Fig. 2), and their responses were recorded.3 Over the course of the testing session, participants heard ten repetitions of each level of intensity or duration variation. Of these ten, five began with a strong syllable and five began with a weak syllable. Participants also heard ten repetitions of the control sequences. This resulted in a total of 90 stimuli. The left-right position of the response keys was counterbalanced between participants. After they had heard 45 stimuli, participants were told they had finished half of the experiment and could take a short break if they wished. Most participants continued immediately with the second half. Half of the participants in each language group heard the stimuli in the French voice, and the other half heard the German voice.

FIG. 1. (Color online) Schematic illustration of the intensity manipulation performed on the stimuli.

FIG. 2. (Color online) Response buttons used during the experiments: trochaic (a) and iambic (b).

B. Results

1. Effects of condition
A repeated-measures analysis of variance (ANOVA) was run with the within-subject factor of condition (intensity, duration, or control) and the between-subjects factors of language and voice. The proportion of trochaic responses across the trials for each subject/condition combination was the dependent variable. Scores from the German and French proficiency tests were included as covariates but had no significant main effects or interactions, and an independent-samples t-test showed no difference between language groups in years of musical experience or number of instruments, so these factors were not included in subsequent analyses. Results showed that the main effects of condition, F(2, 152) = 70.0, p < 0.001, and language, F(1, 76) = 6.25, p < 0.05, were significant. The effect of voice approached significance, F(1, 76) = 3.91, p = 0.052, with participants tending to give more trochaic responses to the German voice. Importantly, there was a significant interaction between condition and language, F(2, 152) = 13.5, p < 0.001, but no other interactions were significant. Results are illustrated in Fig. 3. Bonferroni-corrected (adjusted alpha = 0.017) pairwise comparisons within language groups among the three conditions showed that, relative to the duration condition (M = 31% trochaic responses), the German participants gave more trochaic responses in the intensity-varied condition (M = 75%), t(39) = 10.3, p < 0.001, as well as in the control condition (M = 69%), t(39) = 8.67, p < 0.001. However, the Germans' responses did not differ between the intensity condition and the control condition, t(39) = 1.80, p = 0.08. Using the same Bonferroni correction, this analysis was also performed for the French group, whose members gave more trochaic responses in the intensity-varied condition (M = 61%) than in both the control condition (M = 54%), t(39) = 2.59, p = 0.013, and the duration-varied condition (M = 42%), t(39) = 4.89, p < 0.001. The difference between the duration-varied condition and the control condition approached significance, t(39) = 2.41, p = 0.021. A priori hypothesis-based independent-samples t-tests comparing the linguistic groups to each other in each condition showed that the German group gave more trochaic responses than the French group in the intensity condition, t(78) = 4.08, p < 0.001, as well as in the control condition, t(78) = 3.02, p < 0.01, and more iambic responses than the French group in the duration condition, t(78) = 3.09, p < 0.01. One-sample t-tests within language groups showed that responses in the control condition were trochaic more often than chance for the Germans, t(39) = 5.53, p < 0.001, while the French were at chance, t(39) = 0.99, p = 0.33. Intensity- and duration-varied conditions were significantly different from chance for both groups, all p's < 0.01.

FIG. 3. Experiment 1: Proportion of trochaic responses by condition and language group.

2. Effects of level of manipulation
Separate ANOVAs were run on each condition to examine the effect of the level of variation in intensity or duration (called here "manipulation level"). Language and voice were included as between-subjects factors. In the duration-varied condition [see Fig. 4(a)], manipulation level had a significant main effect, F(4, 304) = 35.46, p < 0.001, and the main effect of language was marginal, F(1, 76) = 3.98, p = 0.050. There was also a significant interaction between manipulation level and language, F(4, 304) = 10.34, p < 0.001. Again, voice did not have a significant main effect, F(1, 76) = 1.69, p = 0.20, nor did it interact with any factors. To understand the interaction, we conducted Bonferroni-corrected (adjusted alpha = 0.013) paired t-tests in each language group. Comparing the German responses to each of the duration-varied levels with their responses to the control sequences showed that they gave a lower number of trochaic responses to all types of duration-varied sequences than to the control condition [50 ms: t(39) = 4.48, p < 0.001; 100 ms: t(39) = 7.89, p < 0.001; 150 ms: t(39) = 8.32, p < 0.001; 200 ms: t(39) = 8.74, p < 0.001]. For the French group, in contrast, only the responses to the 200 ms sequences were significantly different from the control sequences, t(39) = 3.33, p < 0.01. In the intensity-varied condition [see Fig. 4(b)], manipulation level and language were both significant, F(4, 304) = 7.99, p < 0.001, and F(1, 76) = 17.79, p < 0.001, but the interaction between them was not, F(4, 304) < 1. Voice did not have a significant main effect, F(1, 76) = 1.79, p = 0.19, nor did it interact with any factors. Bonferroni-corrected (adjusted alpha = 0.013) t-tests comparing the four individual levels to the control condition (M = 62%) showed that the 4, 6, and 8 dB levels were judged significantly more often as trochaic than the control condition [4 dB: t(79) = 3.05, p < 0.01; 6 dB: t(79) = 2.71, p < 0.01; 8 dB: t(79) = 3.91, p < 0.001], but the 2 dB level was not, t(79) = 0, p = 1.0.

FIG. 4. Experiment 1: Proportion of trochaic responses by manipulation level and language group, for duration (a) and intensity (b).

C. Discussion

1. Predictions of the ITL

Experiment 1 confirmed the predictions of the ITL. For the German as well as the French listeners, sequences with intensity variation were judged to be trochaic more often than chance, and sequences with duration variation were judged to be iambic more often than chance. Manipulation level in both intensity and duration affected the type of response: at higher levels of manipulation, both groups of participants responded "trochaic" more often in the intensity condition and less often in the duration condition.

2. Cross-linguistic differences

The results of experiment 1 show two main findings: (1) German listeners are more sensitive than French listeners to intensity and duration cues as used for grouping in the ITL, and (2) German listeners show a bias toward grouping unvaried control sequences as trochees, whereas the French listeners show no bias. The first finding is that the German group gave more trochaic responses to intensity-varied sequences than the French group, and they also gave more iambic responses to duration-varied sequences. With increasing level of variation in the intensity condition, the proportion of trochaic responses increased at approximately the same rate in the two language groups. However, in the duration condition, analyses of responses at each individual level of manipulation showed that the French group needed a larger amount of variation than did the German group for their responses to become significantly different from baseline, i.e., the control sequences for that group. This cross-linguistic difference could be due to the magnitude of duration variation as a prosodic cue being typically smaller in German than in French (as discussed in Sec. I); this might make the German listeners relatively more sensitive to smaller duration variation. In Hay and Diehl (2007), there had been no cross-linguistic effect. However, the stimuli in the present study differ from theirs in several ways: an increased variety of phonemes, no pauses between syllables, larger duration differences, and smaller intensity differences. The first two differences may have increased the complexity of the stimuli and the difficulty of the task, forcing participants to rely more on processing strategies based on a lifetime of linguistic experience. This could then result in the French group showing less sensitivity than (but the same response pattern as) the German group.
The second finding, of a trochaic bias in the German group, is demonstrated most clearly by the responses to the control condition. Although there was no variation in intensity or duration, the German group's responses were trochaic significantly more than 50% of the time (chance). But this trochaic bias was overridden when any duration information was present in the string: the German listeners gave significantly fewer trochaic responses to the sequences with the smallest level of duration variation, 50 ms, than to the control condition. These results suggest that the Germans' perception could have been guided by their prosodic experience, which categorically offered them perceptions of either iambs or trochees, with the trochee being the default (possibly because the majority of disyllabic words in German have initial stress, as discussed in the preceding text). To further investigate the manifestation of the ITL in these two linguistic groups, we turn to the third stress cue: F0. To date, no studies have investigated cross-linguistic modulation of the ITL for F0 variation with linguistic stimuli (although see Kusumoto and Moreton, 1997 for a study with
non-linguistic stimuli). Thus we performed an experiment identical to experiment 1 but substituting variation in F0 for the intensity variation.

III. EXPERIMENT 2

A. Method

1. Participants
Participants were 40 (36 female, 4 male) monolingual German listeners aged 18–43 yr (M = 22) tested in Potsdam and 40 (27 female, 13 male) monolingual French listeners aged 17–45 yr (M = 27) tested in Paris. As in experiment 1, they all completed the language questionnaire and proficiency tests in French or German, and their musical experience was recorded.

2. Stimuli
The stimulus sequences were the same as those in experiment 1 except that F0 manipulations replaced the intensity manipulations. The four levels of F0 manipulation were 20, 30, 40, or 50 Hz above the 200 Hz baseline. Because the aim of the F0 manipulation was to change the participants' perception of pitch, these sequences will henceforth be referred to as "pitch-varied." See Fig. 5 for a schematic illustration of the F0 manipulations.

3. Procedure
The procedure was the same as that of experiment 1.

B. Results

1. Effects of condition
A repeated-measures ANOVA was run with the within-subjects factor of condition (pitch, duration, or control) and the between-subjects factors of language and voice. Proportion of trochaic responses was the dependent variable.
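The dependent variable described above, the per-subject proportion of trochaic responses in each condition, can be sketched in a few lines (hypothetical trial data; an illustration, not the authors' analysis code):

```python
from statistics import mean

def proportion_trochaic(trial_responses):
    """Proportion of trials on which a subject grouped the
    sequence trochaically (strong-weak).

    trial_responses: list of booleans, True = "trochaic" response.
    """
    return mean(1.0 if r else 0.0 for r in trial_responses)

# Hypothetical subject: 10 trials in one condition, 7 of them grouped
# trochaically, giving a proportion of 0.7 for that cell of the ANOVA.
responses = [True] * 7 + [False] * 3
prop = proportion_trochaic(responses)  # 0.7
```

One such proportion per subject and condition forms the cells entered into the repeated-measures ANOVA.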
Proficiency scores from the German and French C-tests were included as covariates but had no significant main effects or interactions, so they were excluded from subsequent analyses. An independent-samples t-test showed that the German group played on average a greater number of instruments than the French group, t(78) = 3.52, p = 0.001, so this was included as a covariate in all analyses reported in the following text. Results showed that the main effects of condition, F(2, 150) = 6.85, p = 0.001, and voice, F(1, 75) = 12.15, p = 0.001, were significant. The main effect of language was not significant, F(1, 75) < 1. Importantly, there was a significant interaction between condition and language, F(2, 150) = 3.83, p = 0.02, but no other interactions were significant (see Fig. 6). Like the marginal effect in experiment 1, the main effect of voice was due to the German voice being rated as more trochaic than the French voice. Bonferroni-corrected (adjusted alpha = 0.017) pairwise comparisons among the three conditions showed that, relative to the pitch condition (M = 68% trochaic responses), the German participants gave fewer trochaic responses in the duration-varied condition (M = 39%), t(39) = 4.44, p < 0.001, as well as in the control condition (M = 54%), t(39) = 2.63, p = 0.012. The Germans' responses were marginally significantly different between the duration condition and the control condition, t(39) = 2.47, p = 0.018. Using the same Bonferroni correction, this analysis was also performed for the French group. Their ratings did not differ among any of the three conditions; the largest (though non-significant) difference was between the control (M = 54%) and the duration-varied (M = 46%) conditions, t(39) = 1.83, p = 0.08, and the pitch-varied condition (M = 55%) did not differ from either of these (both p's > 0.15).
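The Bonferroni correction used throughout these comparisons simply divides the family-wise alpha by the number of comparisons (0.05/3, reported as 0.017). A minimal sketch, applied to the German group's p-values reported above (the exact p < 0.001 value is represented here by its upper bound):

```python
def bonferroni_alpha(family_alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return family_alpha / n_comparisons

alpha = bonferroni_alpha(0.05, 3)  # about 0.0167, reported as 0.017

# p-values for the German group's three pairwise comparisons
p_values = {
    "pitch vs duration": 0.001,  # reported as p < 0.001
    "pitch vs control": 0.012,
    "duration vs control": 0.018,
}
decisions = {name: p < alpha for name, p in p_values.items()}
# "duration vs control" (p = 0.018) narrowly misses the corrected
# threshold, matching the "marginally significant" description above.
```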
A priori hypothesis-based independent-samples t-tests comparing the linguistic groups on each condition showed that the German group gave more trochaic responses than the French group in the pitch-varied condition, t(78) = 2.31, p < 0.05, but the groups did not differ in
FIG. 5. Schematic illustration of pitch manipulation performed on the stimuli.
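The manipulation in Fig. 5 can be sketched numerically as alternating F0 targets. This is a simplified illustration based on the description above, not the study's stimulus-generation code; which syllable in each pair carries the raised F0 is an assumption made here, and the function name is ours:

```python
BASELINE_F0 = 200          # Hz, baseline reported in the text
LEVELS = [20, 30, 40, 50]  # Hz above baseline, the four manipulation levels

def pitch_varied_sequence(n_syllables, level):
    """F0 targets for a pitch-varied sequence: even-indexed syllables
    stay at baseline, odd-indexed syllables are raised by `level` Hz
    (the assignment of raised syllables is illustrative only)."""
    if level not in LEVELS:
        raise ValueError("level must be one of the four manipulation levels")
    return [BASELINE_F0 + (level if i % 2 else 0) for i in range(n_syllables)]

seq = pitch_varied_sequence(6, 30)  # [200, 230, 200, 230, 200, 230]
```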
Bhatara et al.: Rhythmic grouping of speech
FIG. 6. Experiment 2: Proportion of trochaic responses by condition and language group.
either the control condition, t(78) = 0.04, p = 0.97, or the duration-varied condition, t(78) = 1.53, p = 0.13. One-sample t-tests showed that responses in the control condition were at chance for both groups, tGer(39) = 0.79, p = 0.43; tFr(39) = 1.01, p = 0.32. Pitch- and duration-varied conditions were significantly different from chance for the German group, all p's < 0.01, but neither pitch nor duration was different from chance for the French group, all p's > 0.24.

2. Effects of level of manipulation
Separate ANOVAs were run on each condition to examine the effect of the level of variation in pitch or duration (called here "manipulation level"). Language and voice were included as between-subjects factors, and number of instruments played was included as a covariate. In the duration-varied condition, manipulation level had a significant main effect, F(4, 300) = 3.37, p = 0.01, with ratings becoming more iambic at higher levels of duration variation. However, the main effects of language, F(1, 75) = 2.82, p = 0.1, and voice, F(1, 75) < 1, were not significant. There were no significant interactions among factors. Bonferroni-corrected (adjusted alpha = 0.013) t-tests comparing the four individual levels to the control condition showed that the 100, 150, and 200 ms levels were judged significantly more often as iambic than the control condition [100 ms: t(79) = 2.81, p < 0.01; 150 ms: t(79) = 2.74, p < 0.01; 200 ms: t(79) = 3.89, p < 0.001], but the 50 ms level was not, t(79) = 1.60, p = 0.11 (see Fig. 7). In the pitch-varied condition, the main effects of manipulation level, F(4, 300) = 4.60, p = 0.001, language, F(1, 75) = 5.21, p < 0.05, and voice, F(1, 75) = 12.2, p = 0.001, were all significant. The interaction between manipulation level and language was significant, F(4, 300) = 3.24, p < 0.05, as was the interaction among manipulation level, voice, and language, F(4, 300) = 2.54, p < 0.05. The interaction between voice and language was marginally significant, F(1, 75) = 3.80, p = 0.055. Because of the interaction among voice, language, and manipulation level, we performed separate ANOVAs with the factors of manipulation level and voice on each language group to clarify the results. The ANOVA on the responses from the German group showed a main effect of
FIG. 7. Experiment 2: Duration condition, manipulation level across voices by language group.
manipulation level, F(4, 148) = 3.54, p < 0.01, but no main effect of voice, F(1, 37) = 1.52, p = 0.23, or interactions among the factors [see Fig. 8(a)]. Bonferroni-corrected (adjusted alpha = 0.013) t-tests comparing the four individual levels to the control condition showed that the 30 Hz level was judged significantly more often as trochaic than the control condition, t(39) = 2.63, p = 0.012, whereas the other three levels were marginally significantly different from the control condition [20 Hz: t(39) = 2.40, p = 0.021; 40 Hz: t(39) = 2.53, p = 0.016; 50 Hz: t(39) = 2.49, p = 0.017]. The ANOVA on the responses from the French group showed a significant effect of manipulation level, F(4, 148) = 2.59, p < 0.05, and of voice, F(1, 37) = 11.4, p < 0.01, but the interaction between the two was not significant, F(4, 148) = 1.30, p = 0.27 [see Fig. 8(b)]. Thus the French group gave differing numbers of trochaic responses across the
FIG. 8. Experiment 2: Pitch condition, manipulation level by voice for the German group (a) and the French group (b).
different levels of pitch variation, and this was consistent for both voices (as evidenced by the lack of interaction between voice and manipulation level). However, Bonferroni-corrected (adjusted alpha = 0.013) t-tests comparing the four individual levels to the control condition showed that none of the levels was judged significantly more often as trochaic than the control condition [20 Hz: t(39) = 0.05, p = 0.96; 30 Hz: t(39) = 0.28, p = 0.78; 40 Hz: t(39) = 0, p = 1; 50 Hz: t(39) = 1.01, p = 0.32]. One-sample t-tests comparing the overall ratings of the pitch condition to chance for each voice within the French group showed that they rated the pitch-varied sequences in the German voice as trochaic significantly more often than chance, M = 69%, t(19) = 4.08, p < 0.01, but this was not true for the French voice, M = 41%, t(19) = 1.48, p = 0.15. Thus the French group rated the German voice as more trochaic than the French voice.

3. Comparisons across experiments 1 and 2: Duration-varied sequences
A univariate ANOVA was run comparing the number of trochaic responses to the duration-varied sequences in experiments 1 and 2 with the between-subjects factors of language, voice, and experiment. Experiment, F(1, 152) = 4.42, p < 0.05, and language, F(1, 152) = 9.37, p < 0.01, had significant main effects. Voice did not have a significant main effect, F(1, 152) < 1. There were no interactions among factors. As already suggested by the separate statistical analyses for experiments 1 and 2, this overall analysis confirms that the number of trochaic responses in the duration condition was lower in experiment 1 (M = 36%) than in experiment 2 (M = 43%) for both groups, and in both experiments, the German listeners gave fewer trochaic answers (M = 35%) to
the duration-varied sequences than the French listeners (M = 44%) (Table I). The difference between the experiments in the duration condition is discussed further in Sec. IV.

C. Discussion

1. Effects of the ITL
In experiment 2, the duration-varied sequences were consistently grouped as iambic by the German group but not by the French group. This contrasts with experiment 1, in which both groups showed the same pattern of iambic responses for the duration-varied sequences (see more on this finding in the following text). Though the original formulation of the ITL makes no predictions about pitch (but see Nespor et al., 2008, for a proposal of the ITL including pitch), experiment 2 showed that pitch can function similarly to intensity, at least for the German group. The German participants grouped the pitch-varied sequences trochaically and the duration-varied sequences iambically. This extends to German the findings of Bion et al. (2011), who used a different method from the present study but showed that Italian listeners (speakers of another language with word-level stress) group trochaically by pitch. The French group, on the other hand, grouped the pitch-varied sequences as trochaic only if the stimuli were presented in the German voice. If the stimuli were in the French voice, they tended to group them more iambically, although notably, for both voices, their responses varied according to the amount of pitch variation. These results, with the strong effect of voice in the French group, suggest that grouping this type of sequence by pitch was more affected by linguistic experience and context than was grouping by either intensity or duration. One interpretation of the French participants' results is that because a typical pattern in French is to have a rise in
TABLE I. Summary of main findings for each experiment. Superscripts: a, p < 0.05; b, p < 0.01; c, marginal.

Experiment 1
  Overall. Main effects: Condition^b, Language^a (Ger > Fr), Voice^c (GerV > FrV). Interactions: Language × condition^b.
  Duration only. Main effects: Duration manip level^b, Language^c (Fr > Ger). Interactions: Language × duration manip level^b.
  Intensity only. Main effects: Intensity manip level^b, Language^b (Ger > Fr). Interactions: none.
Experiment 2
  Overall. Main effects: Condition^b, Voice^b (GerV > FrV). Interactions: Condition × language^a.
  Duration only. Main effects: Duration manip level^a. Interactions: none.
  Pitch only. Main effects: Pitch manip level^b, Language^a (Ger > Fr), Voice^b (GerV > FrV). Interactions: Pitch manip level × language^a, pitch manip level × language × voice^a, voice × language^c.
  French group (pitch condition). Main effects: Manip level^a, Voice^b (GerV > FrV). Interactions: none.
  German group (pitch condition). Main effects: Manip level^b. Interactions: none.
Comparison of experiments 1 and 2. Main effects: Experiment^a, Language^b (Fr > Ger). Interactions: none.
F0 at the end of utterance-internal phrases (Jun and Fougeron, 2002; Michelas and D'Imperio, 2010b), perhaps the French participants were more inclined to group the continuous sequences in the present study as low-high rather than high-low. However, the ITL (as modified by Nespor et al., 2008) as a general principle would predict high-low rather than low-high grouping. The conflict between French phrasal pitch contours and the ITL may be a major factor in the inconsistency demonstrated by the French participants. The same problem may not apply to the German group because, in German, pitch and intensity tend to co-occur to mark stress.

2. Cross-linguistic differences
Experiment 2 showed that the German listeners used both pitch and duration cues more consistently when grouping. Relative to the French group, the German group gave more iambic responses to the duration-varied sequences and more trochaic responses to the pitch-varied sequences. Even more striking is the fact that the French did not show consistent grouping by either pitch or duration. This is crucially different from the result in experiment 1, where the French displayed grouping preferences in accordance with the ITL, even if these were relatively weak. Furthermore, the German group did not show the trochaic bias for the control sequences that they had shown in experiment 1; their responses to the control condition in experiment 2 were no different from chance, and they did not differ from those of the French group. With respect to the pitch sequences, the French behaved like the German group when hearing the German voice; their responses to the pitch condition were trochaic significantly more often than chance. However, their responses were completely different when hearing the French voice, tending to be iambic rather than trochaic. Their ratings for the German voice, as well as the significant effect of manipulation level, show that they were not insensitive to the pitch variation (consistent with the findings of Welby, 2007 on segmentation by French speakers); rather, it is the interpretation of this variation that differs between groups and voices. This suggests that the interpretation of pitch variation in speech is highly dependent both on linguistic experience and on the type of stimulus; even small variations in a stimulus, such as those between our German and French voice stimuli, can change the way pitch-varied sequences are categorized.
It is possible that the French group was applying French prosodic grouping schemas to the French voice but not to the German voice, although we were unable to find evidence that participants were aware of the language of the voice (discussed in more detail in Sec. IV B).

IV. GENERAL DISCUSSION
Our studies set out to test whether prosodic features of the phonological system of the native language have an impact on the manifestation of the ITL in adults' grouping of speech stimuli. To this end, we tested French and German adults on their use of duration, intensity, and F0 in the grouping of strings of syllables. Our hypothesis was that language
experience would affect grouping by these three cues, and this hypothesis was supported. Experiment 1 followed the hypothesis that with material more complex than that in Hay and Diehl's (2007) study, participants would have to use automatic processing procedures affected by language-specific prosodic experience, which would increase the likelihood of observing cross-linguistic differences. With this material, both language groups gave more trochaic responses in the intensity-varied condition and more iambic answers in the duration-varied condition than expected by chance. Importantly, as predicted, some clear differences across the groups emerged in experiment 1: relative to the French listeners, the German listeners gave more trochaic answers in the intensity condition and fewer trochaic answers in the duration condition. A second cross-linguistic difference was found with the German listeners showing a trochaic bias in the control condition, which had neither intensity nor duration variation. Experiment 2 tested the role of another cue related to stress: pitch, replacing intensity. Here, the Germans performed similarly to experiment 1, with more trochaic responses in the pitch condition and fewer trochaic responses in the duration condition. In contrast, French listeners did not show an overall effect of either pitch or duration manipulation on their judgments of the strings. This complex pattern of findings gives rise to a number of questions that will be discussed in the following sections. We consider each of the acoustic cues in turn: first intensity, tested in experiment 1; next pitch, present only in experiment 2; and finally duration, present in both experiments.

A. Intensity
In experiment 1, both language groups judged stimuli in the intensity condition as trochaic, in line with the predictions of the ITL. This adds to previous results showing consistent grouping by intensity of both speech-like and non-speech stimuli across language groups (French, English, and Japanese; Hay and Diehl, 2007; Iversen et al., 2008). In addition, we show a cross-linguistic difference in the consistency of grouping based on intensity cues: both groups showed the same pattern of judgment for the intensity-varied sequences, but the German group judged them as trochaic significantly more often than did the French group. This suggests that performance is not solely guided by acoustic properties of the stimuli (because the same stimuli were used for both groups) but is influenced by the native language of the participants. In addition, the German participants showed a trochaic bias for the control sequences in experiment 1 (testing duration/intensity) but not in experiment 2 (testing duration/pitch). As discussed in the preceding text, intensity is the weakest of the three cues for prosodic stress in both languages. Hayes (1985) himself said that the trochaic effect of intensity was somewhat weaker than the iambic effect of duration, and he gave an alternative to the ITL discussed in the preceding text (see also Hyde, 2011): grouping will be iambic if segmentation in a language is sensitive to the position of heavy syllables; if it is not, grouping will be trochaic. This
suggests that trochaic grouping may be the default; indeed, English-speaking participants reported mainly trochaic grouping in Bolton's (1894) seminal study of subjective rhythmization of invariant click sequences. This aligns with our results to suggest that for the German listeners, the trochaic pattern is the default in experiment 1. An alternative explanation is that the trochaic bias found for the German listeners may reflect the dominance of the trochaic word pattern in the German lexicon. This would also explain why the French participants do not show this bias. In that respect, it resembles the trochaic listening preference found for German-learning but not French-learning infants at the age of 6 months (Höhle et al., 2009).

B. Pitch
Responses to sequences varying in pitch show large differences between the two language groups. Whereas the Germans show a clear pattern of grouping the pitch-varied sequences as trochaic, the French group does not consistently group the pitch sequences, in the overall analysis at least. However, closer inspection of the results from the pitch condition reveals more complexity; although their ratings of the pitch condition were not different from chance or from the control condition, the French group did show an effect of manipulation level, with responses varying with the amount of pitch variation. In addition, they grouped the sequences spoken by the German voice as trochaic significantly more often than chance, whereas their ratings of the sequences spoken by the French voice were no different from chance. Adding to the findings by Bion et al. (2011) for Italian listeners, the German listeners' results provide the first cross-linguistic evidence that pitch provides a relevant cue for rhythmic grouping in a way similar to intensity. However, the French results do not and are thus consistent with previous results with English and Japanese speakers suggesting that pitch is not as good a cue as intensity for grouping in this particular task (Kusumoto and Moreton, 1997; Woodrow, 1911). Perhaps this pattern is related to the role of F0 in these three languages; Italian and German both demonstrate tonal alignment with stressed segmental information (Niemann et al., 2011), whereas French does not, at least for phrase-internal tones (Welby, 2006). It is possible that this relative freedom of alignment within French phrases contributed to the French listeners' inconsistent groupings.
Also, as mentioned in the preceding text, the general pitch pattern in French is low-high for utterance-internal phrases (Jun and Fougeron, 2002; Spinelli et al., 2010; Welby, 2003, 2006), so if French listeners were trying to fit this typical low-high pattern onto the stimuli in the study and at the same time reconcile it with the ITL, which would predict high-low patterns, their inconsistent grouping by pitch is not so surprising. In addition, as discussed in the preceding text, French listeners may use particular F0 contours to segment speech (e.g., Rolland and Lœvenbruck, 2002). If they are used to segmenting speech based on F0 patterns, but the patterns in the stimuli do not clearly delineate the beginning or the end of a phrase, they may
respond in an inconsistent manner, as was observed in this study. However, this does not explain why the French listeners were affected by the voice of the stimulus (which could be due to idiosyncratic properties of the voices themselves or to the fact that one voice was German and the other French). To ascertain whether participants were aware of the language of the voice, we asked the last 10 participants in each group (half hearing the German voice and half hearing the French voice) what they thought the native language of the speaker was (free response). None guessed correctly, and several refused to guess. Note, however, that this lack of conscious awareness does not rule out the possibility that the French participants could have unconsciously recognized a speaker of their native language and applied French prosodic rules to this speaker more than to the other. Regardless of the underlying cause, these data suggest that cross-linguistically, pitch might be a less consistent cue to grouping than intensity. This cross-linguistic difference could have arisen because the pitch contours in the experimental stimuli were more like those in one of the two languages. If they were more like one language, that would have given the listeners of that language a processing "advantage," or at least might have made them more likely to consistently apply a language-specific strategy to the grouping. However, pitch contours like those of the stimuli, with a peak in the middle of a syllable, are not more typical of either French or German, as in both languages peaks in F0 typically occur near the end of the vowel (Grabe, 1998; Welby and Lœvenbruck, 2006). It is important to note that even though the French listeners' grouping by pitch was less consistent than that of the German group, this does not imply that they are insensitive to pitch cues. Indeed, as discussed in Sec.
I, numerous studies have shown that they are sensitive to pitch cues in speech for tasks such as segmentation (e.g., Bagou et al., 2002; Spinelli et al., 2010). However, in this particular task, their grouping of syllables based on these cues was not consistent.

C. Duration
With respect to duration, in both experiments there was a clear difference in performance between intensity/pitch- and duration-varied sequences; this supports the predictions of the ITL and confirms previous results on French, English, and Italian adults (Bion et al., 2011; Hay and Diehl, 2007; Iversen et al., 2008). However, there were also clear differences between the two experiments in grouping biases for duration-varied sequences. The duration manipulation led to an iambic grouping for the German group in both experiments, but for the French group only in experiment 1. For both groups, the consistency of this grouping was decreased in experiment 2 relative to experiment 1. In both experiments, the German group showed consistent grouping with smaller amounts of duration variation. Recall that, as discussed in Sec. I, there is typically less duration variation in phrasal prosody in German than there is in French. This may be why the German group was more consistent with smaller duration variation: they have more
experience perceiving and interpreting smaller amounts of variation. Following this, note that (a) both groups were affected by the context change between the experiments, showing decreased iambic responses in experiment 2 and (b) the French listeners were less sensitive than the Germans in both experiments. The result of this was that the French group’s responses in experiment 2 no longer differed significantly from the control sequences. These two effects may be due to the difference between languages in duration variability, which is typically larger in French than in German. This could explain why the German group exhibited more sensitivity than the French group to smaller variations in duration. Continuing on the topic of the decrease in iambic responses from experiment 1 to experiment 2, recall that the duration and control sequences used in experiment 2 were the same as in experiment 1 and that there were no procedural differences between the two experiments. Because the only factor that could account for performance differences between the duration-varied sequences in experiments 1 and 2 is the context in which they were presented (i.e., the intensity- versus F0-varied sequences), this suggests that intensity and pitch are differently perceived and perhaps differently salient for both language groups, causing a change in the relative salience of the duration variation. Indeed, as outlined in the preceding text, it seems that intensity is the least important of the three acoustic cues discussed here in the prosodic systems of French and German, and variation in F0 is important for phrasing and intonation in both languages, although it serves different purposes in the two. In line with this, we propose that variation in pitch is more salient than variation in intensity for both groups, and this could explain the loss of the trochaic bias for the German group from experiment 1 to experiment 2. 
If, in the context of pitch-varied sequences, the missing pitch contrast in the control condition stimuli is more obvious to the listener than the missing intensity contrast was in experiment 1, the control sequences in experiment 2 would be less likely to be judged as trochaic. In turn, this increased salience of pitch might have caused a relative decrease in the salience of the duration variations, leading to the decrease in iambic responses from experiment 1 to experiment 2 for both groups.

D. Interactions of cues
The discussion in the preceding text raises the related question of why the German listeners treat pitch and intensity variation in the same way, whereas the French do not. It could be due to the alignment of phrasal pitch accents with stressed syllables, as discussed in the preceding text. Alternatively, Nespor and colleagues (2008) showed that in German, pitch and intensity are both higher on a syllable that carries phrasal stress, whereas in French pitch and intensity are dissociated. Thus, in German, pitch is often associated with intensity, leading to a high correlation of these acoustic cues in the speech signal. This may imply that pitch functions in the same way as intensity for German listeners because of an association learned from experience with their native language, whereas no such association exists for French listeners. This hypothesis would be interesting to investigate with
further experiments testing very young infants, who do not yet have much experience with their native language, on their sensitivity to pitch and intensity as grouping cues. Furthermore, comparing the roles of intensity and pitch in grouping non-speech sequences may help to disentangle the relation between these two cues with respect to the ITL. Cross-linguistic differences have been shown for grouping by duration (Iversen et al., 2008; Kusumoto and Moreton, 1997) and by pitch (Kusumoto and Moreton, 1997), but the groups in these studies did not differ in intensity grouping. One explanation could be that because intensity is the least important of the three cues in each language, grouping by intensity is the least modulated by linguistic experience. One should also keep in mind that the voice used for creating the experimental stimuli affected the outcome, with a higher proportion of trochaic responses for the German voice than for the French voice, even though we were careful to keep the acoustic cues equivalent between the two voices (for example, by using only tense vowels for both languages, because vowels can be either tense or lax in German but only tense in French). However, the syllables in each voice were necessarily phonetically different (because MBROLA captures some language-specific properties). This difference in responses occurred most strongly for the French group and was especially evident in the pitch-varied condition. It was found even though our listeners were not (consciously) able to identify the language of the voice to which they had been exposed. However, it is not unreasonable to postulate that they could unconsciously tell a "native" voice from a "foreign" voice.
Although we have no clear explanation for these results, and although they should be treated cautiously because we used only one synthesized voice in each language, they highlight the need for careful consideration when creating stimulus materials for cross-linguistic studies like ours (see Tsuji et al., 2012, for similarly subtle effects of voices on adult perception across languages). Finally, one should consider the possibility that our results were influenced by the difference in instructions between the two groups. The French, but not the German, participants were given examples of contrastive stress in addition to lexical stress contrasts. While we could have given similar instructions to the German group, we chose instead to give them instructions that were simpler and more relevant to their everyday experience of lexical stress. Could these differences in instruction (contrastive/emphatic vs lexical stress examples) explain the cross-linguistic effects we found? This is unlikely for several reasons. First, both types of stress use the three acoustic cues in similar ways: F0, intensity, and duration are increased on the emphasized syllable in French (e.g., Astesano et al., 2002), although F0 may be used less consistently in contrastive stress in German (Schneider and Möbius, 2006). According to Vaissière and Michaud (2006), for French listeners, the closest equivalent to the frequent strongly stressed syllables in English (and by extension, German) is emphatic stress. Therefore, because the French listeners had minimal experience with languages containing word-level stress, we chose to use contrastive stress or emphatic accent in their instructions; corpus analyses have shown this accent to be frequently used in everyday speech (Dahan and Bernard, 1996). Additionally, among the three main accents in French (phrase-final, phrase-initial, and emphatic; although cf. Jun and Fougeron, 2000), emphatic accent is the only one with a pragmatic rather than rhythmic function (Dahan and Bernard, 1996). Final evidence against an instruction-based explanation of our cross-linguistic differences comes from two subsequent studies (both in preparation) in which we examined the role of the instructions in this task. In one study, we tested late bilingual (French-German) participants; half of the group received one version of the instructions and the other half received the other version, and we found no effect of instruction. Similarly, in a study using sequences of non-linguistic sounds, we gave French and German speakers the same instructions (translated) and found cross-linguistic differences similar to those of the present study.
Returning to the main question of the present study, we aimed to determine whether clear cross-linguistic differences between the German and French language groups could be obtained by presenting participants with more complex stimuli than in previous work that did not find cross-linguistic differences (Hay and Diehl, 2007). The results provided clear evidence both for the ITL in both language groups and for its modulation across languages, with the German listeners showing greater consistency than the French in grouping based on the three cues in both experiments 1 and 2, as well as greater sensitivity to these cues, needing smaller acoustic variation than the French to group sequences as either iambic or trochaic.
The differences between the two language groups thus appear to be due mostly to degree of sensitivity rather than to general or categorical perceptual differences, with the possible exception of the pitch results. These findings are in line with those of Dupoux and colleagues (Dupoux et al., 1997; Dupoux et al., 2001), who showed that French listeners are not stress deaf in a strict sense but are more restricted in processing the parameters of stress than are listeners of German, who have greater experience with word stress, in particular when the task involves processing multiple cues simultaneously, as occurs in speech.
To summarize, our experiments provide the first piece of evidence for the existence of differences in the effects of the ITL between listeners of languages that differ in word-level prosody but not in word order. Iversen et al. (2008) proposed that the cross-linguistic differences they found between English and Japanese listeners in grouping non-speech stimuli were due to the difference in the order of function words and content words between these two languages; given this interpretation, our results differ fundamentally from those of Iversen et al. because, in the present studies, we found cross-linguistic differences between listeners of two languages that do not differ in their basic word order: German and French both typically place function words before content words. In addition, it appears that more complex stimuli than those used by Hay and Diehl (2007) and Iversen et al. (2008) are necessary to bring out cross-linguistic differences between these two linguistic groups. Our results are in line with observations that the processing of stress information is weakened in French listeners compared to listeners of a language with word-level stress (Dupoux et al., 1997; Dupoux et al., 2001) and extend this finding from the processing of isolated words to the processing of continuous syllable sequences. The generalizability of these cross-linguistic effects remains to be explored; in particular, it is as yet unknown whether listeners of languages with fixed word-level stress (such as Polish) would perform more like the German group (because of their experience with word-level stress) or more like the French group (because, with unchanging stress patterns, the use of stress in fixed-stress languages is primarily demarcative, helping listeners find word and phrase boundaries rather than conveying meaning as in German). Other remaining questions are whether the present cross-linguistic French/German effects could be found with simple or complex non-linguistic stimuli and whether the cross-linguistic English/Japanese effects found by Iversen et al. (2008) would be the same for speech stimuli.

ACKNOWLEDGMENTS

We thank Thomas Eckes for supplying us with the German C-test texts, and Karl-Heinz Eggensperger, the coordinator of the C-test project from the foreign languages department at the University of Potsdam, for providing us with the French text as well as assistance with the software. The German text was developed by the TestDaF-Institut, and the French version was developed by the C-Test-Arbeitsgruppe, a group of researchers from foreign language departments at various German universities. We also thank Alexis Dimitriadis for a script with which we generated the stimulus sequences and Tom Fritzsche for PRAAT scripts used in stimulus manipulation. Thanks to Leo-Lyuki Nishibayashi, Nayeli Gonzalez-Gomez, Alexandra Schmitterer, and Lisa Borkenhagen for help with recruiting and testing some of the participants. This work was supported by the Agence Nationale de la Recherche - Deutsche Forschungsgemeinschaft Grants No. 09-FASHS-018 and No. HO 1960/14-1 to T.N. and B.H.
APPENDIX

1. Instructions in French (translated to English)
In some languages, stress falls on the first or second syllable of the word, and this can change the meaning of the word. An example comes from Spanish: The words PApa and paPA have different meanings. PApa means potato, while paPA means father. Here is another example of stress conveying meaning: If you say "J'aime le bateau" [I like the boat] and your friend asks, "le gâteau?" [the cake?], you would say "Non, le BAteau" [No, the boat] and stress the first part of the word. Similarly, if you said "J'aime le bateau" and your friend said "le barreau?" [the bar?], you would say "Non, le baTEAU!" and stress the second syllable of the word. In this experiment, you will listen to sequences of syllables. In these sequences, pairs of consecutive syllables are stressed either on the first or on the second syllable. Your task will be to decide whether the stress falls on the first syllable (as in PApa) or on the second syllable (as in paPA). If the stress is on the first syllable, press [see Fig. 2(a)]. If the stress is on the second syllable, press [see Fig. 2(b)]. Please try to be as fast and precise as possible. You can press the response buttons before the end of each sequence. First, you will do a short training with four examples of the sounds you will hear during the experiment. If you have questions, do not hesitate to ask the experimenter. At the beginning of each sequence, there will be noise; do not be surprised. The experiment lasts approximately 15 min. There will be a short break after the first half of the experiment, during which you may relax a little. Good luck!

2. Instructions in German (translated to English)
German words can be accented either on the first syllable, as in KIRche, MAma, and AUto, or on the second syllable, as in BalKON, ProBAND, and StuDENT. In this experiment, you will listen to sequences of syllables. In these sequences, pairs of consecutive syllables are stressed either on the first or on the second syllable. Your task will be to decide whether the stress falls on the first syllable (as in KIRche, MAma, AUto) or on the second syllable (as in BalKON, ProBAND, StuDENT). If the stress is on the first syllable, press [see Fig. 2(a)]. If the stress is on the second syllable, press [see Fig. 2(b)]. Please try to be as fast and precise as possible. You can press the response buttons before the end of each sequence. First, you will do a short training with four examples of the sounds you will hear during the experiment. If you have questions, do not hesitate to ask the experimenter. At the beginning of each sequence, there will be noise; do not be surprised. The experiment lasts approximately 15 min. There will be a short break after the first half of the experiment, during which you may relax a little. Good luck!

1 Some authors (e.g., Dupoux et al., 2001) claim that stress is final at the word level. However, this confusion may arise because a word spoken in isolation constitutes a phrase and would thus have stress on its final syllable (Fox, 2000, p. 94).
2 One consideration in designing our stimuli was that the consonants and vowels had to exist in both languages and be discriminable from each other by both language groups. This greatly reduced the available selection. In the end, we chose four consonants and four vowels (see Sec. II) that were, although not equivalent for the two languages, still discriminable from the other consonants in the stimuli by both groups. The consonant /b/, for example, when spoken in the German voice may sound like a /p/ to French ears; this is why we did not include /p/ among the consonants. Note that the majority of monosyllables in French are words and that the
consonant /l/ forms a definite article in French when followed by certain vowels. We do not believe this impacted our study results, but researchers should consider this when designing future studies.
3 Reaction times were also recorded, but because they were not informative over and above the response data, RT data are not included in this paper.
Astesano, C., Bard, E. G., and Turk, A. E. (2002). "Functions of the French initial accent: A preliminary study," in Proceedings of the Speech Prosody Conference, Aix-en-Provence, France, pp. 139–142.
Atterer, M., and Ladd, D. R. (2004). "On the phonetics and phonology of 'segmental anchoring' of F0: Evidence from German," J. Phonetics 32(2), 177–197.
Baayen, R. H., Piepenbrock, R., and Gulikers, L. (1995). The CELEX Lexical Database (CD-ROM) (Linguistic Data Consortium, University of Pennsylvania, Philadelphia).
Bagou, O., Fougeron, C., and Frauenfelder, U. H. (2002). "Contribution of prosody to the segmentation and storage of 'words' in the acquisition of a new mini-language," in Proceedings of the Speech Prosody Conference 2002, Aix-en-Provence, France, pp. 159–162.
Baken, R. J., and Orlikoff, R. F. (2000). Clinical Measurement of Speech and Voice (Singular Publishing Group, San Diego, CA), p. 610.
Banel, M. H., and Bacri, N. (1994). "On metrical patterns and lexical parsing in French," Speech Commun. 15(1-2), 115–126.
Banse, R., and Scherer, K. R. (1996). "Acoustic profiles in vocal emotion expression," J. Pers. Soc. Psychol. 70(3), 614–636.
Best, C. T., McRoberts, G. W., and Sithole, N. M. (1988). "Examination of perceptual reorganization for nonnative speech contrasts: Zulu click discrimination by English-speaking adults and infants," J. Exp. Psychol. Hum. Percept. Perform. 14, 345–360.
Bijeljac-Babic, R., Serres, J., Höhle, B., and Nazzi, T. (2012). "Effect of bilingualism on lexical stress pattern discrimination in French-learning infants," PLoS ONE 7(2), e30843.
Bion, R. A. H., Benavides-Varela, S., and Nespor, M. (2011). "Acoustic markers of prominence influence infants' and adults' segmentation of speech sequences," Lang. Speech 54, 123–140.
Boersma, P., and Weenink, D. (2010). "PRAAT: Doing phonetics by computer (version 5.2.05) [computer program]," http://www.praat.org (Last viewed October 10, 2010).
Bolton, T. L. (1894). "Rhythm," Am. J. Psychol. 6(2), 145–238.
Cutler, A., and Mehler, J. (1993). "The periodicity bias," J. Phonetics 21, 103–108.
Dahan, D., and Bernard, J. M. (1996). "Interspeaker variability in emphatic accent production in French," Lang. Speech 39(4), 341–374.
Delattre, P. (1938). "L'accent final en français: Accent d'intensité, accent de hauteur, accent de durée" ("The final accent in French: Intensity accent, pitch accent, duration accent"), French Rev. 12(2), 141–145.
Delattre, P. (1966). "A comparison of syllable length conditioning among languages," IRAL 4, 183–198.
Dogil, G., and Williams, B. (1999). "The phonetic manifestation of word stress," in Word Prosodic Systems in the Languages of Europe, edited by H. van der Hulst (Mouton de Gruyter, Berlin), pp. 273–334.
Dupoux, E., Pallier, C., Sebastian-Galles, N., and Mehler, J. (1997). "A destressing 'deafness' in French," J. Mem. Lang. 36, 406–421.
Dupoux, E., Peperkamp, S., and Sebastian-Galles, N. (2001). "A robust method to study stress 'deafness,'" J. Acoust. Soc. Am. 110(3), 1606–1618.
Dutoit, T., Pagel, V., Pierret, N., Bataille, F., and Van Der Vreken, O. (1996). "The MBROLA Project: Towards a set of high-quality speech synthesizers free of use for non-commercial purposes," in Proceedings of the International Conference on Spoken Language Processing 3, Philadelphia, PA, pp. 1393–1396.
Fery, C. (1998). "German word stress in optimality theory," J. Comp. Ger. Linguist. 2, 101–142.
Fery, C., Hörnig, R., and Pahaut, S. (2011). "Correlates of phrasing in French and German from an experiment with semi-spontaneous speech," in Intonational Phrasing in Romance and Germanic: Cross-Linguistic and Bilingual Studies, edited by C. Gabriel and C. Lleó (John Benjamins Publishing, Amsterdam), Vol. 10, pp. 11–42.
Fery, C., and Kügler, F. (2008). "Pitch accent scaling on given, new and focused constituents in German," J. Phonetics 36, 680–703.
Fox, A. (2000). Prosodic Features and Prosodic Structure: The Phonology of Suprasegmentals (Oxford University Press, New York), p. 94.
Friederici, A. D., Friedrich, M., and Christophe, A. (2007). "Brain responses in 4-month-old infants are already language specific," Curr. Biol. 17, 1208–1211.
Friedrich, M., Herold, B., and Friederici, A. D. (2009). "ERP correlates of processing native and non-native language word stress in infants with different language outcomes," Cortex 45(5), 662–676.
Goedemans, R., and van der Hulst, H. (2011a). "Rhythm types," in The World Atlas of Language Structures Online, edited by M. S. Dryer and M. Haspelmath (Max Planck Digital Library, Munich), feature 17A, available online at http://wals.info/feature/17A (Last viewed August 23, 2011).
Goedemans, R., and van der Hulst, H. (2011b). "Fixed stress locations," in The World Atlas of Language Structures Online, edited by M. S. Dryer and M. Haspelmath (Max Planck Digital Library, Munich), Chap. 14, available online at http://wals.info/chapter/14 (Last viewed August 23, 2011).
Grabe, E. (1998). "Pitch accent realization in English and German," J. Phonetics 26(2), 129–143.
Greenberg, S., Carvey, H., Hitchcock, L., and Chang, S. (2003). "Temporal properties of spontaneous speech—a syllable-centric perspective," J. Phonetics 31(3-4), 465–485.
Hay, J. S. F., and Diehl, R. L. (2007). "Perception of rhythmic grouping: Testing the iambic/trochaic law," Percept. Psychophys. 69, 113–122.
Hayes, B. (1985). "Iambic and trochaic rhythm in stress rules," in Proceedings of the Annual Meeting of the Berkeley Linguistics Society, pp. 429–446.
Hayes, B. (1995). Metrical Stress Theory: Principles and Case Studies (University of Chicago Press, Chicago), pp. 80–81.
Höhle, B., Bijeljac-Babic, R., Herold, B., Weissenborn, J., and Nazzi, T. (2009). "Language specific prosodic preferences during the first half year of life: Evidence from German and French infants," Infant Behav. Dev. 32, 262–274.
Hyde, B. (2011). "The Iambic–Trochaic Law," in The Blackwell Companion to Phonology: Suprasegmental and Prosodic Phonology, edited by M. van Oostendorp, C. J. Ewen, E. Hume, and K. Rice (Blackwell, Oxford, UK), Vol. II, pp. 1052–1077.
Iversen, J. R., Patel, A. D., and Ohgushi, K. (2008). "Perception of rhythmic grouping depends on auditory experience," J. Acoust. Soc. Am. 124, 2263–2271.
Jessen, M. (1999). "Word stress in West-Germanic languages: German," in Word Prosodic Systems in the Languages of Europe, edited by H. van der Hulst (Mouton de Gruyter, Berlin), pp. 515–545.
Jessen, M., Marasek, K., Schneider, K., and Claßen, K. (1995). "Acoustic correlates of word stress and the tense/lax opposition in the vowel system of German," Proc. Int. Congr. Phonetic Sci. 13(4), 428–431.
Johnson, E., and Seidl, A. (2008). "Clause segmentation by 6-month-old infants: A crosslinguistic perspective," Infancy 18, 440–455.
Jun, S. A., and Fougeron, C. (2000). "A phonological model of French intonation," in Intonation: Analysis, Modeling and Technology, edited by A. Botinis (Kluwer Academic, Dordrecht), pp. 209–242.
Jun, S. A., and Fougeron, C. (2002). "Realizations of accentual phrase in French intonation," Probus 14(1), 147–172.
Jusczyk, P. W., Houston, D. M., and Newsome, M. (1999). "The beginning of word segmentation in English-learning infants," Cognit. Psychol. 39, 159–207.
Kolinsky, R., Cuvelier, H., Goetry, V., Peretz, I., and Morais, J. (2009). "Music training facilitates lexical stress processing," Music Percept. 26, 235–246.
Kusumoto, K., and Moreton, E. (1997). "Native language determines the parsing of nonlinguistic rhythmic stimuli," J. Acoust. Soc. Am. 102, 3204.
Marian, V., Blumenfeld, H. K., and Kaushanskaya, M. (2007). "The language experience and proficiency questionnaire (LEAP-Q): Assessing language profiles in bilinguals and multilinguals," J. Speech Lang. Hear. Res. 50, 940–967.
Michelas, A., and D'Imperio, M. (2010a). "Durational cues and prosodic phrasing in French: Evidence for the intermediate phrase," in Proceedings of the Speech Prosody Conference, 100881, pp. 1–4.
Michelas, A., and D'Imperio, M. (2010b). "Accentual phrase boundaries and lexical access in French," in Proceedings of the Speech Prosody Conference, 100882, pp. 1–4.
Nazzi, T., Iakimova, G., Bertoncini, J., Fredonie, S., and Alcantara, C. (2006). "Early segmentation of fluent speech by infants acquiring French: Emerging evidence for crosslinguistic differences," J. Mem. Lang. 54(3), 283–299.
Nespor, M., Shukla, M., van de Vijver, R., Avesani, C., Schraudolf, H., and Donati, C. (2008). "Different phrasal prominence realization in VO and OV languages," Lingue Linguaggio 7(2), 1–28.
New, B., Pallier, C., Ferrand, L., and Matos, R. (2001). "Une base de données lexicales du français contemporain sur internet: Lexique" ("An online contemporary French lexical database: Lexique") (version 3.55), L'Année Psychol. 101, 447–462. http://www.lexique.org (Last viewed November 11, 2010).
Niemann, H., Mücke, D., Nam, H., Goldstein, L., and Grice, M. (2011). "Tones as gestures: The case of Italian and German," in Proceedings of the International Congress of Phonetic Sciences, Hong Kong, pp. 1486–1489.
Rietveld, A. (1980). "Word boundaries in the French language," Lang. Speech 23(3), 289–296.
Rolland, G., and Lœvenbruck, H. (2002). "Characteristics of the accentual phrase in French: An acoustic, articulatory and perceptual study," in Proceedings of the Speech Prosody Conference 2002, Aix-en-Provence, France, pp. 611–614.
Saffran, J. R., Aslin, R. N., and Newport, E. L. (1996). "Statistical learning by 8-month-old infants," Science 274(5294), 1926–1928.
Schneider, K., and Möbius, B. (2006). "Production of word stress in German: Children and adults," in Proceedings of Speech Prosody 2006, Dresden, Germany, pp. 333–336.
Skoruppa, K., Pons, F., Christophe, A., Bosch, L., Dupoux, E., Sebastian-Galles, N., Limissuri, R. A., and Peperkamp, S. (2009). "Language-specific stress perception by 9-month-old French and Spanish infants," Dev. Sci. 12, 914–919.
Spinelli, E., Grimault, N., Meunier, F., and Welby, P. (2010). "An intonational cue to word segmentation in phonemically identical sequences," Atten. Percept. Psychophys. 72(3), 775–787.
Tsuji, S., Gonzalez-Gomez, N., Medina, V., Nazzi, T., and Mazuka, R. (2012). "The labial-coronal effect revisited: Japanese adults say pata, but hear tapa," Cognition 125(3), 413–428.
Tyler, M. D., and Cutler, A. (2009). "Cross-language differences in cue use for speech segmentation," J. Acoust. Soc. Am. 126(1), 367–376.
Vaissière, J. (1974). "On French prosody," Quarterly Progress Report (Research Laboratory of Electronics, Massachusetts Institute of Technology), Vol. 114, pp. 212–223.
Vaissière, J., and Michaud, A. (2006). "Prosodic constituents in French: A data-driven approach," in Prosody and Syntax, edited by I. Fónagy, Y. Kawaguchi, and T. Moriguchi (John Benjamins Publishing, Amsterdam), pp. 47–64.
Venditti, J., Jun, S. A., and Beckman, M. E. (1996). "Prosodic cues to syntactic and other linguistic structures in Japanese, Korean, and English," in Signal to Syntax: Bootstrapping From Speech to Grammar in Early Acquisition, edited by J. L. Morgan and K. Demuth (Erlbaum Associates, Mahwah, NJ), pp. 287–311.
Vos, P. (1977). "Temporal duration factors in the perception of auditory rhythmic patterns," Sci. Aesth./Sci. Art 1, 183–199.
Welby, P. S. (2003). "The Slaying of Lady Mondegreen, being a study of French tonal association and alignment and their role in speech segmentation," Ph.D. dissertation, http://etd.ohiolink.edu/view.cgi?acc_num=osu1074614793 (Last viewed February 6, 2013).
Welby, P. (2006). "French intonational structure: Evidence from tonal alignment," J. Phonetics 34(3), 343–371.
Welby, P. (2007). "The role of early fundamental frequency rises and elbows in French word segmentation," Speech Commun. 49(1), 28–48.
Welby, P., and Lœvenbruck, H. (2006). "Anchored down in Anchorage: Syllable structure and segmental anchoring in French," Ital. J. Linguist. 18, 74–124.
Werker, J. F., and Tees, R. C. (1984). "Cross-language speech perception: Evidence for perceptual reorganization during the first year of life," Infant Behav. Dev. 7, 49–63.
Wiese, R. (1996). The Phonology of German (Clarendon, Oxford), p. 282.
Woodrow, H. (1909). "A quantitative study of rhythm: The effect of variations in intensity, rate, and duration," Arch. Psychol. 14, 1–66.
Woodrow, H. (1911). "The role of pitch in rhythm," Psychol. Rev. 18(1), 54–77.
Woodrow, H. (1951). "Time perception," in Handbook of Experimental Psychology, edited by S. Stevens (Wiley and Sons, Oxford), pp. 1224–1236.
Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., and Werker, J. F. (2010). "The development of perceptual grouping biases in infancy: A Japanese-English cross-linguistic study," Cognition 115, 356–361.