Relation between voice-onset time and vowel duration
Robert F. Port
Department of Linguistics, Indiana University, Bloomington, Indiana 47405
Rosemarie Rotunno
Department of Speech and Hearing Sciences, City University of New York, New York, New York 10036
(Received 26 July 1978; revised 3 April 1979)
As part of an investigation of the temporal implementation rules of English, measurements were made of voice-onset time for initial English stops and the duration of the following voiced vowel in monosyllabic words for New York City speakers. It was found that the VOT of a word-initial consonant was longer before a voiceless final cluster than before a single nasal, and longer before tense vowels than lax vowels. The vowels were also longer in environments where VOT was longer, but VOT did not maintain a constant ratio with the vowel duration, even for a single place of articulation. VOT was changed by a smaller proportion than the following voiced vowel in both cases. VOT changes associated with the vowel were consistent across place of articulation of the stop. In the final experiment, when vowel tensity and final consonant effects were combined, it was found that the proportion of vowel duration change that carried over to the preceding VOT is different for the two phonetic changes. These results imply that temporal implementation rules simultaneously influence several acoustic intervals including both VOT and the "inherent" interval corresponding to a segment, either by independent control of the relevant articulatory variables or by some unknown common mechanism.
PACS numbers: 43.70.Gr, 43.70.Bk, 43.70.Ve
INTRODUCTION
A central problem in phonetics is to discover how the phonological elements from which words are spelled (phonemes and features) are manifested in the physical medium of sound. The mismatch between the digitized segments of linguistic analysis and the finely graded timing of articulatory and acoustic events constitutes one of the most troublesome aspects of this problem (Lisker, 1974b; Lehiste, 1970; Klatt, 1976). Some investigators treat the complexity of speech timing in roughly the same way coarticulation is often treated, as a kind of noise added to the speech signal by the production device during performance. It is sometimes proposed that physiological and mechanical properties of the articulators provide the main sources of temporal effects (Chomsky and Halle, 1968; Stevens and House, 1972); however, many of the specific hypotheses concerning such mechanical effects have not survived rigorous testing (Lisker and Abramson, 1971; Hirose and Gay, 1972; MacNeilage, 1972; Lisker, 1974a; Raphael, 1975). These results lend support to the notion that the level of phonetic recoding which converts phonological input into instructions to the articulators (Liberman, Cooper, Shankweiler, and Studdert-Kennedy, 1967; Fant, 1969; Ladefoged, 1972) must also include a set of temporal implementation rules that specify the durations of phonetic intervals according to various contextual features (Klatt, 1976). Although there may be other ways of conceptualizing these effects (Fowler, 1977), Klatt's notion of temporal implementation rules provides a useful framework for the formulation of issues regarding the manifestation of phonological units in the temporal structure of speech. In this paper we shall report some properties of the implementation rules necessary to describe the temporal variable of voice-onset time (VOT) for aspirated word-initial stops as a function of factors known to affect the duration of vowels in English. Context effects on VOT will provide an opportunity to investigate several theoretical issues regarding the nature of the hypothesized temporal implementation rules operating in speech production in English.
Voice-onset time (VOT) is the temporal interval from the release of an initial stop to the onset of glottal pulsing (if acoustic records are examined) or to the closure of the glottis for a following vowel (if articulatory records of speech are examined) (Lisker and Abramson, 1964, 1967; Abramson, 1976). It is known to play a major role in distinguishing initial voiced and voiceless (or lax and tense) stops in English as well as in a number of other languages. In English the voiceless stops /p, t, k/ have positive VOT (with voicing lagging after stop release) greater than around 30 ms. The term aspiration is used to describe the auditory effect of the glottal turbulence accompanying this voicing lag. The voiced stops /b, d, g/ have either negative VOT (with voicing beginning before stop release) or very short positive VOT (Lisker and Abramson, 1964, 1967; Zlatin, 1974). Perceptual studies employing a multidimensional synthetic analog of the VOT continuum have validated the phonological relevance of this articulatory parameter (Lisker and Abramson, 1970; Zlatin, 1974; Summerfield and Haggard, 1976). Although VOT is clearly a highly important parameter distinguishing voiced from voiceless stops in English, the absolute value of VOT depends not just on the voicing feature of the corresponding stop, but is also highly sensitive to various features of the context in which it occurs (Lisker and Abramson, 1967; Klatt, 1975). Although the main contrast in VOT between voiced and voiceless stops could be said to be due to the phonology directly, the context effects require either temporal implementation rules for their description, or a carefully developed mechanical explanation.
J. Acoust. Soc. Am. 66(3), Sept. 1979 0001-4966/79/090654-09$00.80 © 1979 Acoustical Society of America
Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 132.236.27.111 On: Thu, 18 Dec 2014 02:50:57
Assuming that the phonology of a language yields a matrix of segmental features as output, the phonetic temporal implementation rules would convert these into descriptions of the durations of various prominent
temporal intervals (such as consonant constriction durations, vowel durations, and VOT) as observed in speech production.
Although there have been several attempts to formalize such a system of rules (Lindblom and Rapp, 1973; Nooteboom, 1972), we shall build primarily on the model of Klatt (1976), to be described in more detail below, since it is the most appropriate for description of the effects of segmental features on timing. There are a number of issues regarding the form of such rules. Presumably the input to the rules is in the form of segments or segmental features as well as certain prosodic properties such as stress and word boundaries. More problematic is the domain of application of the rules. For example, are there rules that partially specify the phonetic duration of a "segment itself" in terms of its feature composition, or instead are "inherent durations" always given in advance with rules applying only to intervals in neighboring context (Klatt, 1976)? Another question regarding rule domains is whether they affect only a single interval at a time, or if a number of temporal intervals may be independently specified for durational modifications. In the latter case, for example, a single rule might modify not only the duration of a vowel, but also the duration of adjacent consonant constrictions or VOT. It has also been proposed that there may be a hierarchy of rule domains nested within each other (Lindblom and Rapp, 1973). Another issue is the arithmetic form of the temporal changes themselves. There have been data to support rules that involve addition or subtraction of a constant (Chen, 1970), multiplication by a constant (Klatt, 1973, 1976), and the use of power functions (Nooteboom, 1972; Lindblom and Rapp, 1973). Finally, there are questions to be discussed later about the combination of several rules acting on a single interval.
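The three arithmetic forms mentioned above can be made concrete with a minimal sketch. The function names, the 200-ms starting duration, and the parameter values are illustrative assumptions, not values from the cited studies, and the power form shown is a generic one rather than the specific formula of Nooteboom (1972) or Lindblom and Rapp (1973).

```python
def additive_rule(duration_ms, delta_ms):
    """Add (or, with a negative delta, subtract) a constant number of ms."""
    return duration_ms + delta_ms


def multiplicative_rule(duration_ms, k):
    """Scale the duration by a constant factor k."""
    return duration_ms * k


def power_rule(duration_ms, scale, exponent):
    """Generic power-function form: scale * duration ** exponent."""
    return scale * duration_ms ** exponent


# Hypothetical inherent vowel duration, chosen only for illustration.
inherent_ms = 200.0

shortened_add = additive_rule(inherent_ms, -20.0)
shortened_mul = multiplicative_rule(inherent_ms, 0.6)
shortened_pow = power_rule(inherent_ms, 1.0, 0.9)

# When two rules act on the same interval, the order of application can matter:
scale_then_shift = additive_rule(multiplicative_rule(inherent_ms, 0.6), -20.0)
shift_then_scale = multiplicative_rule(additive_rule(inherent_ms, -20.0), 0.6)
```

With these assumed values, scaling and then shifting gives about 100 ms, while shifting and then scaling gives about 108 ms, which is one reason the combination of several rules acting on a single interval needs to be specified explicitly.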
The following experiments derive from the observation that most contexts which have been observed to affect the duration of vowels have also been found to have a similar effect on the duration of VOT for a preceding aspirated stop. For example, both vowels (Fry, 1955) and VOT (Lisker and Abramson, 1967) are longer in stressed syllables than in unstressed syllables, and longer in one-syllable words than in two-syllable words (Klatt, 1973; Lisker and Abramson, 1967). Increasing speaking tempo shortens both vowels (Peterson and Lehiste, 1960; Port, 1976) and VOT (Summerfield, 1975). There is even indication that aspiration is longer in utterance final syllables than in nonutterance final words (Summerfield, 1975), comparable to utterance final vowel lengthening (Oller, 1973; Lehiste, 1972). Weismer (1979) measured VOT of initial voiceless stops in CVCs before both tense and lax vowels followed by both voiced and voiceless final stops. He found a significant though small effect of both vowel tensity and final consonant voicing on VOT. One question to be explored in this paper is whether these VOT effects can be derived in some way directly from the vowel duration effects or whether the rules must have separate terms to account for segmental context effects on voice-onset time.
If a simple relation, such as constant ratio, holds between the VOT of initial voiceless stops and the duration of the voiced portion of the following vowel, we should predict that any factor that affects the duration of a vowel, even a segmental phonological feature, would also affect preceding VOT by the same proportion. In our first experiment, the voicing and clustering of the postvocalic consonants were varied to see if their expected effects on vowel duration (Peterson and Lehiste, 1960; Chen, 1970) would also be observed for the VOT in a preceding voiceless stop. Since the three voiceless stops /p, t, k/ are known to have different mean VOTs, however, we also want to know if the same temporal relation applies to all three stops.
A methodological problem arises here, as elsewhere in phonetic research, since to collect data we must make measurements of physical events without knowing what kind of physical events (for example, whether articulatory or acoustic) are most appropriate to the language. We measured acoustically defined intervals and propose models to describe the results. Direct measurement of articulatorily defined intervals (e.g., duration of glottal abduction or duration of open oral cavity) might conceivably yield results that imply a rather different formal model.
I. EFFECTS OF POSTVOCALIC CONSONANTS: EXPERIMENT I
A. Methods
A symmetrical set of six monosyllabic test words was prepared beginning with the three voiceless stops /p, t, k/ and ending with either /n/ or /pt/ (in order to maximize the effects of postvocalic consonants on vowel duration). The set consisted of
pin    pipped
tin    tipped
kin    kipped
A pseudorandom list was prepared containing five occurrences of each word along with ten other words to distract attention from the test items. Eight speakers were asked to read the list at a comfortable rate and loudness. Five of the subjects were life-long residents of New York City and students at Brooklyn College; four of this group were female. The other three speakers were male students from the Midwest or South.
The recordings were made in a quiet booth on a TEAC 1 (model A-3340S) tape deck at 7½ ips after the subject was well practiced at reading the list. Wide-band spectrograms were made of each utterance on a Kay (model 6061B) sound spectrograph. A time scale was prepared for measurement from a wide-band spectrogram of a calibration tone. Voice-onset time (VOT) was measured to the nearest 10 ms from the beginning of the release burst on the initial stop to the first visible striation representing glottal pulsing. The duration of the following vowel was measured to the nearest 10 ms from the onset of glottal pulsing to the closure for the following consonant as evidenced by the abrupt cessation of energy in the lower formants.
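The two interval definitions used in the measurements can be written out as simple arithmetic on three landmark times. The sketch below is a hedged illustration only: the function names and the example landmark times are assumptions, and the original measurements were read by hand from spectrograms rather than computed.

```python
import math


def round_to_step(value_ms, step_ms=10.0):
    """Round to the nearest multiple of step_ms (halves round upward)."""
    return math.floor(value_ms / step_ms + 0.5) * step_ms


def measure_intervals(burst_ms, voicing_onset_ms, closure_ms, step_ms=10.0):
    """Return (VOT, vowel duration) in ms from three landmark times.

    VOT runs from the release burst to the first glottal pulse; the voiced
    vowel runs from the first glottal pulse to closure for the final consonant.
    """
    vot = voicing_onset_ms - burst_ms
    vowel = closure_ms - voicing_onset_ms
    return round_to_step(vot, step_ms), round_to_step(vowel, step_ms)


# Hypothetical landmark times (ms from the start of a spectrogram).
example = measure_intervals(burst_ms=100.0, voicing_onset_ms=163.0, closure_ms=331.0)
```

Here the assumed landmarks give a raw VOT of 63 ms and a raw vowel duration of 168 ms, which round to 60 ms and 170 ms at a 10-ms resolution.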
Although in general it is perhaps most appropriate to define vow-
... words ending with the voiceless cluster /pt/ than in words ending with /n/ (p ...). ... are 17 ms, 12 ms, and 16 ms, respectively, amounting to an average shortening of VOT of 20%. The table also shows that the place of articulation of the stop has its familiar effect on VOT since /p/ < /t/ < /k/. These data replicate Weismer (1979) and indicate that some segmental phonological features that affect vowel duration will also affect the duration of aspiration on a preceding voiceless stop. The results further show that the shortening effect of the final consonants has roughly the same absolute change for the three initial aspirated stops.
TABLE I. Mean voice-onset time and vowel duration pooled across subjects for the six test words in experiment 1 with standard deviations.
        Voice-onset time (ms)        Vowel duration (ms)
        -ɪn          -ɪpt            -ɪn          -ɪpt
        m    SD      m    SD         m    SD      m    SD
p-      64  (11)     47  (10)        167  (21)    ...
t-      73  (13)     61  (13)        170  (27)    ...
k-      90  (11)     74  (13)        169  (23)    ...
A plot of VOT against vowel duration for the six words in Fig. 1 demonstrates the relation between the two. If VOT and vowel duration maintain a constant ratio across the three stops, then all six points should array themselves along a single line radiating from the origin of the graph. If instead each initial stop has its own characteristic ratio, then the pairs of words beginning with the same stop should lie along such lines. Figure 1 displays dashed lines representing constant VOT/vowel duration ratios of 1, •,• and •. From the figure it can be seen first that the vowel duration is greatly affected by the following consonants, since vowels with following /pt/ were 60% of the duration of those with following /n/. This was expected from previous research (Peterson and Lehiste, 1960; Chen, 1970). It can also be seen that although the place of articulation of the initial stop affects VOT, place does not affect the duration of the following vowel in these data. Thus, no overall constant ratio is obtained.²
FIG. 1. Mean voice-onset time plotted against mean vowel duration for the six test words in experiment 1. Dashed lines represent VOT/vowel duration ratios of 1, «, and «.
We must hypothesize, then, a rule that modifies the inherently different VOTs of the three stops. Arithmetically such a rule might add 15 ms to the VOT before -pt (or equivalently subtract from the VOT of the -n words), or it might increase (or decrease) VOT by 20%. In an attempt to choose between the additive and multiplicative models, a coefficient of correlation was computed for predicted and observed values of the individual speaker means for the three stops (24 pairs of points) separately for the two models. There was no difference between them (for additive model, r = 0.893; for multiplicative model, r = 0.888). The more interesting of the two models, however, is the multiplicative one, since one might try to derive the multiplicative constant for the VOT effect (k = 80%) from the multiplicative constant (Klatt, 1976; Peterson and Lehiste, 1960; Port, 1976) for the vowel duration effect (k = 60%). This experiment replicates Weismer's (1979) result by showing that even segmental properties, such as voicing, manner, and clustering, that are known to affect preceding vowel duration also affect the duration of the quasivocalic interval of VOT for a preceding aspirated stop. These results further indicate that the proportion by which VOT is affected is considerably smaller than the amount by which the voiced vowel is affected.
If implementation rules are required to account for the effects of postvocalic consonants on VOT preceding a vowel, will phonological features of the vowel itself have effects on preceding VOT? The next experiment explored this question.
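The arithmetic behind the constant-ratio test and the two candidate rules can be checked against the pooled means in Table I. The sketch below is illustrative only: the variable and function names are assumptions, it uses the pooled means rather than the individual speaker means on which the reported correlations were computed, and it substitutes mean absolute error for the correlation coefficient used in the text.

```python
# Pooled mean VOT (ms) from Table I, keyed by initial stop.
VOT_N = {"p": 64, "t": 73, "k": 90}    # words ending in /n/
VOT_PT = {"p": 47, "t": 61, "k": 74}   # words ending in /pt/
# Pooled mean vowel duration (ms) before /n/ from Table I.
VOWEL_N = {"p": 167, "t": 170, "k": 169}

STOPS = ("p", "t", "k")


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def mean_abs_error(predicted, observed):
    return mean(abs(predicted[s] - observed[s]) for s in STOPS)


# Constant-ratio test: VOT/vowel ratio for each stop in the /n/ words.
ratio_n = {s: VOT_N[s] / VOWEL_N[s] for s in STOPS}

# Candidate rule constants estimated from the pooled means.
mean_shift_ms = mean(VOT_N[s] - VOT_PT[s] for s in STOPS)  # additive constant
mean_factor = mean(VOT_PT[s] / VOT_N[s] for s in STOPS)    # multiplicative constant

# Predictions of VOT before /pt/ from VOT before /n/ under each rule.
additive_pred = {s: VOT_N[s] - 15 for s in STOPS}
multiplicative_pred = {s: VOT_N[s] * 0.8 for s in STOPS}

additive_error = mean_abs_error(additive_pred, VOT_PT)
multiplicative_error = mean_abs_error(multiplicative_pred, VOT_PT)
```

On these pooled means the VOT/vowel ratio rises from about 0.38 for /p/ to about 0.53 for /k/, so no single ratio fits all three stops. The mean shortening is 15 ms and the mean factor is about 0.80, and both rules predict the /pt/ means to within a few milliseconds, which is consistent with the text's finding that the two models are not distinguished by these data.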