Acoustic correlates of perceived sexual identity in preadolescent children's voices

Suzanne Bennett
University of Maryland, College Park, Maryland 20742

Bernd Weinberg
Purdue University, West Lafayette, Indiana 47906

(Received 3 October 1978; accepted for publication 7 June 1979)
This project was undertaken to provide information about the sexual characteristics of preadolescent children's voices. In one series of experiments, perceptual judgments of sexual identity were obtained in response to 73 children's productions of isolated whispered and normally phonated vowels, normally spoken sentences, and sentences spoken in a monotonous fashion (Bennett and Weinberg, 1978). The purpose of this portion of the project was to describe certain acoustic and temporal characteristics of these children's speech samples, and to assess the relationship of these variables to perceptual judgments of sexual identity. Sexual differences in the frequency location of vocal tract resonances were significantly correlated with listener judgments of child sex in all four utterance conditions. The origin of the observed differences in vocal tract resonance characteristics is discussed with reference to possible sexual differences in vocal tract size as well as certain articulatory behaviors. Average fundamental frequency was significantly related to listeners' sex identifications in two utterance conditions. However, the influence of this variable was considerably less pronounced when compared to vocal tract information. Although certain measures of fundamental frequency variability (mean duration of level inflections and the rate of frequency change associated with upward shifts) were significantly related to perceptual measures of sexual identity, these cues were also interpreted to play a secondary role in defining maleness and femaleness in these children's voices.
PACS numbers: 43.70.Gr, 43.70.Dn
J. Acoust. Soc. Am. 66 (4), Oct. 1979
0001-4966/79/100989-12$00.80
© 1979 Acoustical Society of America
Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms.

INTRODUCTION

Until recently, the extent of our knowledge regarding sexual characteristics of children's voices consisted of speculative comments and opinions. Curry (1940) suggested that the voices of boys and girls are highly similar prior to the onset of pubescence. Moses (1954), on the other hand, believed that sexual differences in children's voices emerge early in life. Within the last few years, several investigators have shown that sexual characteristics are indeed perceptually prominent in the voices of many prepubertal speakers (Weinberg and Bennett, 1971; Marshall, 1972; Ingrisano and Thompson, 1975; Sachs et al., 1973; Sachs, 1975). However, the acoustic attributes which guide listener judgments of child sex have yet to be fully specified.

This project was designed to provide information about the acoustic correlates of perceived sexual identity in the voices of preadolescent boys and girls. In one series of experiments, perceptual judgments of sexual identity were obtained in response to 73 children's productions of isolated whispered and normally phonated vowels, normally spoken sentences, and sentences spoken in a monotonous fashion. The results which emerged from the four perceptual experiments have been reviewed in a recent report (Bennett and Weinberg, 1979). The aims of the present investigative series were to provide a description of certain acoustic and temporal characteristics of these children's speech samples and to assess the relationships of these variables to listener judgments of sexual identity.

I. METHOD

A. Subjects and recording procedures

The subjects were 73 preadolescent boys and girls between the ages of six years, one month and seven years, ten months. Speakers were kindergarten and first grade children in an elementary school in North Central Indiana. All subjects were free of vocal quality deviations and demonstrated error-free productions of the experimental utterances. School health records were used to verify that all children had passed a hearing screening test. Speech recordings were made in a quiet room at an elementary school using a Nagra IV-D tape recorder and a Sony microphone (ECM-50). A mouth-to-microphone distance of 15 cm was maintained for the phonated vowel and sentence productions, while the distance was 7.5 cm for the whispered vowels.

B. Speech stimuli

1. Isolated vowels

Tape recordings were obtained of each child producing sustained versions (approximately 1 s) of the vowel /æ/ in a normally phonated mode and a whispered mode. Children produced the vowels using a comfortable level of vocal effort. They were not asked to instrumentally monitor their productions.

Only phonemically representative /æ/ vowels were used in this investigation. This was necessary in order to ensure that variations in /æ/ vowel resonance frequencies measured for these speakers were representative of sexual differences and not a reflection of variations in the phonetic identity of the vowel. Vowel stimuli were considered representative when four of the five experienced listeners (age range 32-42 yr) indicated that a given sample was an acceptable /æ/ vowel (Bennett and Weinberg, 1979). Of the 146 samples evaluated, 114 met the established criteria. Thus, 57 phonated and 57 whispered vowels produced by 26 girls and 31 boys were available for acoustic and perceptual study.
2. Sentence stimuli

Tape recordings were also made of each child producing a nine-syllable declarative sentence (JACKIE AND DAD GRABBED THE BLACK RABBIT) in a normal and a monotonous fashion. A monotonous production was operationally defined as a purposeful attempt to minimize the variations in fundamental frequency throughout the course of the utterance. By contrast, a normal production represented a situation in which the subject was free to vary voice fundamental frequency. Procedures used to elicit these two utterance types are described in a previous report dealing with the perceptual prominence of sexual characteristics in these children's speech samples (Bennett and Weinberg, 1979).

It was necessary to determine that listeners could categorize the two experimental sentences according to the mode of production intended by the subject. Briefly, the two declarative sentences of each child were randomly arranged on a master tape and presented to five listeners (age range 25-35 yr) experienced in judging speech samples and familiar with the voices of children. Their task was to indicate whether each sentence was representative of an utterance spoken in a normal fashion or one uttered in a monotonous fashion (Bennett and Weinberg, 1979). Sentences were considered acceptable when four of the five listeners (80%) agreed on the mode of production intended by the subjects. Of the 126 sentences presented to the listeners for evaluation, 106 met the criterion established for acceptability. Hence, 106 normally spoken and monotonous sentences produced by 30 girls and 23 boys were available for study.

Since monotonous sentence productions were chosen to eliminate major changes in fundamental frequency (fo) associated with the implementation of intonation contours, comparisons of fo standard deviations in the two experimental sentences were of considerable interest. Variations in fundamental frequency observed for the normally intoned sentences ranged from 0.89 to 2.73 semitones. By contrast, all fo standard deviations observed in the monotone sentences were less than one semitone (Bennett and Weinberg, 1979). These data showed that the monotonous productions of the children were characterized by a substantial reduction in fo variability and confirmed the expected perceptual impression of monotonicity.

It was also necessary to verify that important physical properties other than fo variability did not undergo substantial change between the two modes of sentence production. That is, if properties other than fo variability changed significantly during the production of monotone sentences, potential differences in listeners' perceptual judgments obtained for the two sentence types could not be attributed solely to an alteration in intonation patterns. Comparisons of the average fo, speaking rate, and /æ/ vowel formant frequency characteristics measured in the two versions of each child's sentences revealed that none of these variables differed significantly as a function of the mode of sentence production (Bennett and Weinberg, 1979).

C. Perceptual judgments of sexual identity
Tape-recorded samples obtained from these children were arranged on four master tapes and presented to listeners. The four listening tapes consisted of randomly ordered samples of (a) isolated phonated /æ/ vowels, (b) isolated whispered /æ/ vowels, (c) normally spoken sentences, and (d) sentences produced with minimal variations in fundamental frequency. A total of 116 young adult females ranging in age from 18-23 yr served as listeners; that is, there were 29 different listeners in each perceptual experiment. Judges were instructed to listen to each utterance and indicate whether the speaker producing the sample was a boy or a girl.
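The tallying implied by this judgment task can be illustrated with a minimal sketch. It assumes that each stimulus receives one "boy" or "girl" response from each of the 29 listeners assigned to a tape. The function name, the data layout, and the example responses are hypothetical and are not taken from the study.

```python
from collections import Counter


def tally_judgments(responses, true_sex):
    """Summarize the listener judgments collected for one stimulus.

    responses: list of "boy"/"girl" strings, one per listener (assumed layout).
    true_sex:  "boy" or "girl", the actual sex of the speaker.
    Returns (number of "boy" responses, proportion of correct responses).
    """
    if not responses:
        raise ValueError("at least one listener response is required")
    counts = Counter(responses)
    n_male = counts["boy"]
    proportion_correct = counts[true_sex] / len(responses)
    return n_male, proportion_correct


# Illustrative data only: 29 listeners, 20 of whom judged the speaker to be a boy.
example = ["boy"] * 20 + ["girl"] * 9
n_male, p_correct = tally_judgments(example, true_sex="boy")
```

The count of "boy" responses per stimulus is the kind of quantity that the statistical analyses later treat as the dependent variable.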
D. Acoustic analyses

1. Vowel formant frequency analyses

Vowel formant frequency measurements were obtained directly from broadband (300- and 450-Hz bandwidth) spectrograms (Voiceprint, Model 700) prepared from these children's isolated vowel and sentence recordings. Measurements of the frequency of the lower two vowel formant frequencies were derived from comparatively steady-state portions of the /æ/ in the two connected utterances. Resonance data for the /æ/ in the word AND in each sentence were not obtained because of the close proximity of the nasal. Average F1 and F2 values representing the mean frequency of the lower two resonances averaged across the five remaining /æ/ vowels were calculated. All spectrograms were measured without knowledge of the sex of the speaker. The zero-frequency line was adjusted such that it was approximately 1 mm from the bottom edge of each spectrogram.

Approximately one-third (n = 78) of the spectrograms were measured by B. Weinberg (BW) to obtain an estimate of the interjudge reliability of vowel formant frequency measurements. The 78 spectrograms were chosen randomly from the total sample. BW made his measurements without benefit of the marks placed on the spectrograms, i.e., measurement included a redetermination of the midpoints of the various formant bands. The average extent of measurement error for the whispered vowels was 45.38 Hz for F1 and 53.47 Hz for F2, while for the phonated vowels the values were 52.62 and 54.08 Hz, respectively. With respect to the normally produced sentences, the error was 43.07 Hz for F1 and 48.62 Hz for F2. The measurement error for F1 in the monotone sentences was 44.53 Hz, while for F2 it was 50.29 Hz.
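The text does not spell out how the "average extent of measurement error" was computed. One common reading is the mean absolute difference between the two judges' measurements of the same spectrograms. The sketch below works under that assumption only; the function name and the example frequencies are hypothetical.

```python
def mean_absolute_difference(judge_a_hz, judge_b_hz):
    """Mean absolute difference (Hz) between two judges' measurements.

    Both lists hold formant frequencies measured from the same spectrograms,
    in the same order. This is one plausible definition of interjudge error,
    assumed here for illustration.
    """
    if len(judge_a_hz) != len(judge_b_hz) or not judge_a_hz:
        raise ValueError("both judges must measure the same non-empty set")
    total = sum(abs(a - b) for a, b in zip(judge_a_hz, judge_b_hz))
    return total / len(judge_a_hz)


# Illustrative F1 values only (Hz); not data from the study.
first_judge = [1000.0, 1100.0, 950.0, 1200.0]
second_judge = [1050.0, 1060.0, 950.0, 1230.0]
f1_error = mean_absolute_difference(first_judge, second_judge)  # 30.0
```

An unsigned difference is used so that over- and underestimates by the second judge do not cancel each other.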
2. Fundamental frequency analyses

Fundamental frequency measures were obtained by hand measuring waveform data. Speech recordings were played by a Nagra IV-D tape recorder into one channel of a direct writing oscillographic recorder (Honeywell Visicorder, model 1508) operating at a transport speed of 500 mm/s. A 100-Hz calibration signal was placed between every ten speech samples to provide a means for assessing speed variations in the writeout system.

Several measures of fundamental frequency variability were extracted from these speakers' normal sentence productions: (a) the number of upward and downward inflections and shifts, (b) the extent (in semitones) of upward and downward inflections and shifts, (c) the number and duration (in ms) of level inflections, (d) the number of zero shifts, and (e) the rate of change (in semitones/s) associated with upward and downward inflections and shifts. Inflections were defined as any rising or falling frequency modulation that occurred during an uninterrupted interval of phonation. A shift was defined as a change in fundamental frequency that occurred between the termination of one phonation and the initiation of a subsequent phonation. Adjacent segments of the oscillographic record which evidenced no change in fo during continuous voicing were defined as level inflections. A zero shift referred to instances in which there was no change in fundamental frequency between the termination of one voicing episode and the initiation of a subsequent episode.

Averaging procedures were used to obtain fundamental frequency measures. Briefly, the oscillographic recordings of each subject were divided into segments of about 50 ms (25 mm) in duration, and the average period of completed cycles within each segment was calculated. Absolute measurement error was operationally defined as 0.5 mm. Namely, it was assumed that the average degree of error in measuring temporal intervals of approximately 50 ms could reasonably be expected not to exceed 1 mm (± 0.5 mm). Hence, any modulation in frequency which exceeded the change attributable to measurement error (approximately 10 Hz) was categorized as an inflection or a shift. Similar criteria were used to identify level inflections and zero shifts. For example, if during an episode of continuous voicing the difference in average fo calculated for adjacent segments was less than or equal to that attributable to measurement error, then those segments were defined as level inflections.

The rate of fundamental frequency change associated with inflections and shifts reflected the rapidity with which fo was modulated per unit of time. These measures were derived by dividing the extent of the modulation in semitones by its duration in seconds. Since the duration of fo modulations often encompassed several averaging intervals, it was assumed that the majority of the fo change occurred in the time interval between the midpoint of the first segment and the midpoint of the final segment. The same temporal segmenting procedures were used to measure the duration of level inflections and zero shifts.

E. Statistical analyses

Multiple stepwise regression analyses were used to assess the relationships between listener judgments of talker sex and selected acoustic properties. Separate analyses were completed for each of the four utterance types spoken by these children. In each analysis the dependent variable was the listeners' perceptual responses, expressed as the number of listeners who responded male to each stimulus. The independent variables were the various acoustic and temporal variables. These statistical analyses provided a basis for estimating the individual and combined predictive power of the various acoustic variables, the magnitude of the simple correlations between each of the acoustic variables and listener perception, and the intercorrelations among the various physical measures.

II. RESULTS

A. Isolated vowel stimuli

One hypothesis under test in this project was that sexual differences in the vocal tract resonance characteristics of preadolescent children provide major cues about sexual identity when judgments are based on isolated vowels. The results of the perceptual experiments (Bennett and Weinberg, 1979) indicated that the overall rates of correct identification observed for these children's phonated and whispered vowels (65% and 66%) were not significantly different. This observation led to the conclusion that transfer function information (e.g., vowel formant frequency values) provided the primary cues about sexual identity in both vowel conditions and that average fundamental frequency characteristics did not provide critically relevant sexual information.

1. Whispered vowel stimuli

The simple correlations between listeners' perceptual responses and the lower two resonances of these children's whispered vowels are shown in Table I. Both F1 and F2 were significantly (p = 0.01) correlated with listener judgments of child sex. The negative relationships between these variables indicated that a lowering of F1 and F2 values was generally associated with an increase in the number of male responses. It may also be seen that F1 and F2 were significantly related to one another.

Results of the stepwise regression analysis provided further evidence to show that these children's vocal tract resonance characteristics exerted a significant influence on listener judgments (Table II). Differences in the frequency location of F1 and F2 accounted for approximately 61% of the total variance in listeners' perceptual responses. Differences in the frequency of

TABLE I. Intercorrelations between vocal tract resonance measures and listeners' perceptual judgments for the whispered vowels (n = 56).

        Number of male responses    F1
F1      -0.70 a
F2      -0.71 a                     0.61

a p = 0.01.
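As a rough consistency check on the figures quoted for the whispered vowels, the squared multiple correlation for two predictors can be computed from the three simple correlations in Table I using the standard two-predictor formula. The sketch below is offered as an illustration, not as the authors' computation; the function name is hypothetical, and the inputs are the rounded correlations printed in Table I.

```python
def two_predictor_r_squared(r_y1, r_y2, r_12):
    """Squared multiple correlation of y on two predictors.

    r_y1, r_y2: simple correlations of y with predictors 1 and 2.
    r_12:       correlation between the two predictors.
    Uses the standard identity
        R^2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2).
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12
    return numerator / (1.0 - r_12 ** 2)


# Rounded values from Table I: male responses vs F1, vs F2, and F1 vs F2.
r_squared = two_predictor_r_squared(-0.70, -0.71, 0.61)
```

With these rounded inputs the result is close to 0.62, which is in line with the "approximately 61%" of variance reported in the text; the small difference is plausibly due to rounding of the tabled correlations.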
TABLE II. Stepwise multiple regression summary table for the whispered vowels (n = 56).

Regression step    Variable entered    F value    Significance    R
1                  F2                  44.175
2                  F1                  12.497
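The general idea behind a stepwise summary table of this kind can be sketched as a greedy forward selection: at each step, the predictor that yields the largest R² is added to the model. This is a simplified illustration under stated assumptions. It omits the F-to-enter threshold that a full stepwise procedure applies, it is not the software used in the study, and the function names and data values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(predictors, y):
    """Return [(variable entered, cumulative R^2), ...] in order of entry."""
    remaining = dict(predictors)
    chosen, steps = [], []
    while remaining:
        base = [predictors[name] for name in chosen]
        best = max(remaining, key=lambda name: r_squared(base + [remaining[name]], y))
        chosen.append(best)
        del remaining[best]
        steps.append((best, r_squared([predictors[name] for name in chosen], y)))
    return steps


# Illustrative values only (Hz and listener counts); not data from the study.
predictors = {
    "F1": [1000.0, 1050.0, 980.0, 1100.0, 1150.0, 1120.0],
    "F2": [2100.0, 2200.0, 2300.0, 2400.0, 2500.0, 2600.0],
}
male_responses = [26.0, 21.0, 20.0, 11.0, 6.0, 5.0]
steps = forward_stepwise(predictors, male_responses)
```

In this toy data set the predictor with the stronger simple correlation enters first, and the cumulative R² cannot decrease when the second predictor is added.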
• 2450 Hz.
sured for children in the three perceptual groups were clearly nonoverlapping helps clarify why this variable was significantly related to listener judgments.
F1 and F2 values obtained for these children's phonated vowels are illustrated

Several children were perceptually ambiguous in spite of the fact that their formant frequency values fell in an area of nonoverlap, while others whose sex was correctly identified evidenced values that were more like those of the opposite sex. In these and other cases, average fundamental

TABLE IV. Stepwise multiple regression summary table for the phonated vowels (n = 56).

Regression step    Variable entered    F value to enter    R2
1                  F2                  31.865