Acoustic correlates of perceived sexual identity in preadolescent children's voices

Suzanne Bennett
University of Maryland, College Park, Maryland 20742

Bernd Weinberg
Purdue University, West Lafayette, Indiana 47906

(Received 3 October 1978; accepted for publication 7 June 1979)

This project was undertaken to provide information about the sexual characteristics of preadolescent children's voices. In one series of experiments, perceptual judgments of sexual identity were obtained in response to 73 children's productions of isolated whispered and normally phonated vowels, normally spoken sentences, and sentences spoken in a monotonous fashion (Bennett and Weinberg, 1978). The purpose of this portion of the project was to describe certain acoustic and temporal characteristics of these children's speech samples, and to assess the relationship of these variables to perceptual judgments of sexual identity. Sexual differences in the frequency location of vocal tract resonances were significantly correlated with listener judgments of child sex in all four utterance conditions. The origin of the observed differences in vocal tract resonance characteristics is discussed with reference to possible sexual differences in vocal tract size as well as certain articulatory behaviors. Average fundamental frequency was significantly related to listeners' sex identifications in two utterance conditions. However, the influence of this variable was considerably less pronounced when compared to vocal tract information. Although certain measures of fundamental frequency variability (mean duration of level inflections and the rate of frequency change associated with upward shifts) were significantly related to perceptual measures of sexual identity, these cues were also interpreted to play a secondary role in defining maleness and femaleness in these children's voices.

PACS numbers: 43.70.Gr, 43.70.Dn

J. Acoust. Soc. Am. 66(4), Oct. 1979. 0001-4966/79/100989-12$00.80. © 1979 Acoustical Society of America. Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms.

INTRODUCTION

Until recently, the extent of our knowledge regarding sexual characteristics of children's voices consisted of speculative comments and opinions. Curry (1940) suggested that the voices of boys and girls are highly similar prior to the onset of pubescence. Moses (1954), on the other hand, believed that sexual differences in children's voices emerge early in life. Within the last few years, several investigators have shown that sexual characteristics are indeed perceptually prominent in the voices of many prepubertal speakers (Weinberg and Bennett, 1971; Marshall, 1972; Ingrisano and Thompson, 1975; Sachs et al., 1973; Sachs, 1975). However, the acoustic attributes which guide listener judgments of child sex have yet to be fully specified.

This project was designed to provide information about the acoustic correlates of perceived sexual identity in the voices of preadolescent boys and girls. In one series of experiments, perceptual judgments of sexual identity were obtained in response to 73 children's productions of isolated whispered and normally phonated vowels, normally spoken sentences, and sentences spoken in a monotonous fashion. The results which emerged from the four perceptual experiments have been reviewed in a recent report (Bennett and Weinberg, 1979). The aims of the present investigative series were to provide a description of certain acoustic and temporal characteristics of these children's speech samples and to assess the relationships of these variables to listener judgments of sexual identity.

I. METHOD

A. Subjects and recording procedures

The subjects were 73 preadolescent boys and girls between the ages of six years, one month and seven years, ten months. Speakers were kindergarten and first grade children in an elementary school in North Central Indiana. All subjects were free of vocal quality deviations and demonstrated error-free productions of the experimental utterances. School health records were used to verify that all children had passed a hearing screening test. Speech recordings were made in a quiet room at an elementary school using a Nagra IV-D tape recorder and a Sony microphone (ECM-50). A mouth-to-microphone distance of 15 cm was maintained for the phonated vowel and sentence productions, while the distance was 7.5 cm for the whispered vowels.

B. Speech stimuli

1. Isolated vowels

Tape recordings were obtained of each child producing sustained versions (approximately 1 s) of the vowel /æ/ in a normally phonated mode and a whispered mode. Children produced the vowels using a comfortable level of vocal effort. They were not asked to instrumentally monitor their productions.

Only phonemically representative /æ/ vowels were used in this investigation. This was necessary in order to ensure that variations in /æ/ vowel resonance frequencies measured for these speakers were representative of sexual differences and not a reflection of variations in the phonetic identity of the vowel. Vowel stimuli were considered representative when four of the five experienced listeners (age range 32-42 yr) indicated that a given sample was an acceptable /æ/ vowel (Bennett and Weinberg, 1979). Of the 146 samples evaluated, 114 met the established criteria. Thus, 57 phonated and 57 whispered vowels produced by 26 girls and 31 boys were available for acoustic and perceptual study.
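The four-of-five acceptance rule used to screen the vowel samples amounts to a simple threshold on listener votes. The sketch below is illustrative only: the function name and the vote lists are hypothetical, since individual listener votes are not reported here.

```python
def is_representative(votes, required=4):
    """Return True when at least `required` listeners accepted the sample.

    `votes` holds one boolean per listener (True = judged an acceptable vowel).
    The default threshold of 4 follows the four-of-five rule stated in the text.
    """
    return sum(votes) >= required


# Hypothetical votes from five listeners for two samples.
accepted_sample = [True, True, True, True, False]
rejected_sample = [True, True, True, False, False]

print(is_representative(accepted_sample))  # True
print(is_representative(rejected_sample))  # False
```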

2. Sentence stimuli

Tape recordings were also made of each child producing a nine-syllable declarative sentence (JACKIE AND DAD GRABBED THE BLACK RABBIT) in a normal and a monotonous fashion. A monotonous production was operationally defined as a purposeful attempt to minimize the variations in fundamental frequency throughout the course of the utterance. By contrast, a normal production represented a situation in which the subject was free to vary voice fundamental frequency. Procedures used to elicit these two utterance types are described in a previous report dealing with the perceptual prominence of sexual characteristics in these children's speech samples (Bennett and Weinberg, 1979).

It was necessary to determine that listeners could categorize the two experimental sentences according to the mode of production intended by the subject. Briefly, the two declarative sentences of each child were randomly arranged on a master tape and presented to five listeners (age range 25-35 yr) experienced in judging speech samples and familiar with the voices of children. Their task was to indicate whether each sentence was representative of an utterance spoken in a normal fashion or one uttered in a monotonous fashion (Bennett and Weinberg, 1979). Sentences were considered acceptable when four of the five listeners (80%) agreed on the mode of production intended by the subjects. Of the 126 sentences presented to the listeners for evaluation, 106 met the criterion established for acceptability. Hence, 106 normally spoken and monotonous sentences produced by 30 girls and 23 boys were available for study.

Since monotonous sentence productions were chosen to eliminate major changes in fundamental frequency (fo) associated with the implementation of intonation contours, comparisons of fo standard deviations in the two experimental sentences were of considerable interest. Variations in fundamental frequency observed for the normally intoned sentences ranged from 0.89 to 2.73 semitones. By contrast, all fo standard deviations observed in the monotone sentences were less than one semitone (Bennett and Weinberg, 1979). These data showed that the monotonous productions of the children were characterized by a substantial reduction in fo variability and confirmed the expected perceptual impression of monotonicity.

It was also necessary to verify that important physical properties other than fo variability did not undergo substantial change between the two modes of sentence production. That is, if properties other than fo variability changed significantly during the production of monotone sentences, potential differences in listeners' perceptual judgments obtained for the two sentence types could not be attributed solely to an alteration in intonation patterns. Comparisons of the average fo, speaking rate, and /æ/ vowel formant frequency characteristics measured in the two versions of each child's sentences revealed that none of these variables differed significantly as a function of the mode of sentence production (Bennett and Weinberg, 1979).

C. Perceptual judgments of sexual identity

Tape-recorded samples obtained from these children were arranged on four master tapes and presented to listeners. The four listening tapes consisted of randomly ordered samples of (a) isolated phonated /æ/ vowels, (b) isolated whispered /æ/ vowels, (c) normally spoken sentences, and (d) sentences produced with minimal variations in fundamental frequency. A total of 116 young adult females ranging in age from 18-23 yr served as listeners; that is, there were 29 different listeners in each perceptual experiment. Judges were instructed to listen to each utterance and indicate whether the speaker producing the sample was a boy or a girl.
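Because each judge gave a forced boy-or-girl label, the responses to one stimulus reduce to a count of "boy" labels across the 29 listeners in an experiment. The sketch below shows that tally together with a percent-correct figure. The label strings, function names, and the example panel are assumptions made for illustration, not data from the study.

```python
def count_male_responses(labels):
    """Count how many listeners labelled the speaker as a boy."""
    return sum(1 for label in labels if label == "boy")


def percent_correct(labels, speaker_sex):
    """Percentage of listeners whose label matches the speaker's actual sex."""
    correct = sum(1 for label in labels if label == speaker_sex)
    return 100.0 * correct / len(labels)


# Hypothetical panel of 29 listeners judging one stimulus produced by a boy.
labels = ["boy"] * 20 + ["girl"] * 9

print(count_male_responses(labels))              # 20
print(round(percent_correct(labels, "boy"), 1))  # 69.0
```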

D. Acoustic analyses

1. Vowel formant frequency analyses

Vowel formant frequency measurements were obtained directly from broadband (300- and 450-Hz bandwidth) spectrograms (Voiceprint, Model 700) prepared from these children's isolated vowel and sentence recordings. Measurements of the lower two vowel formant frequencies were derived from comparatively steady-state portions of the /æ/ in the two connected utterances. Resonance data for the /æ/ in the word AND in each sentence were not obtained because of the close proximity of the nasal. Average F1 and F2 values representing the mean frequency of the lower two resonances averaged across the five remaining /æ/ vowels were calculated. All spectrograms were measured without knowledge of the sex of the speaker. The zero-frequency line was adjusted such that it was approximately 1 mm from the bottom edge of each spectrogram.

Approximately one-third (n = 78) of the spectrograms were measured by B. Weinberg (BW) to obtain an estimate of the interjudge reliability of vowel formant frequency measurements. The 78 spectrograms were chosen randomly from the total sample. BW made his measurements without benefit of the marks placed on the spectrograms, i.e., measurement included a redetermination of the midpoints of the various formant bands. The average extent of measurement error for the whispered vowels was 45.38 Hz for F1 and 53.47 Hz for F2, while for the phonated vowels the values were 52.62 and 54.08 Hz, respectively. With respect to the normally produced sentences, the error was 43.07 Hz for F1 and 48.62 Hz for F2. The measurement error for F1 in the monotone sentences was 44.53 Hz, while for F2 it was 50.29 Hz.
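The text does not state how the "average extent of measurement error" was computed. One common choice, assumed here, is the mean absolute difference between the two measurers' readings of the same spectrograms. The function name and the formant values in this sketch are hypothetical.

```python
def mean_absolute_difference(first, second):
    """Mean absolute difference (Hz) between paired readings from two measurers."""
    if len(first) != len(second):
        raise ValueError("both measurers must supply the same number of readings")
    return sum(abs(a - b) for a, b in zip(first, second)) / len(first)


# Hypothetical F1 readings (Hz) of the same four spectrograms by two measurers.
f1_first_measurer = [950.0, 1010.0, 880.0, 1040.0]
f1_second_measurer = [1000.0, 970.0, 930.0, 1000.0]

print(mean_absolute_difference(f1_first_measurer, f1_second_measurer))  # 45.0
```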

2. Fundamental frequency analyses

Fundamental frequency measures were obtained by hand measuring waveform data. Speech recordings were played by a Nagra IV-D tape recorder into one channel of a direct writing oscillographic recorder (Honeywell Visicorder, model 1508) operating at a transport speed of 500 mm/s. A 100-Hz calibration signal was placed between every ten speech samples to provide a means for assessing speed variations in the writeout system.

Several measures of fundamental frequency variability were extracted from these speakers' normal sentence productions: (a) the number of upward and downward inflections and shifts, (b) the extent (in semitones) of upward and downward inflections and shifts, (c) the number and duration (in ms) of level inflections, (d) the number of zero shifts, and (e) the rate of change (in semitones/s) associated with upward and downward inflections and shifts. Inflections were defined as any rising or falling frequency modulation that occurred during an uninterrupted interval of phonation. A shift was defined as a change in fundamental frequency that occurred between the termination of one phonation and the initiation of a subsequent phonation. Adjacent segments of the oscillographic record which evidenced no change in fo during continuous voicing were defined as level inflections. A zero shift referred to instances in which there was no change in fundamental frequency between the termination of one voicing episode and the initiation of a subsequent episode.

Averaging procedures were used to obtain fundamental frequency measures. Briefly, the oscillographic recordings of each subject were divided into segments of about 50 ms (25 mm) in duration, and the average period of completed cycles within each segment was calculated. Absolute measurement error was operationally defined as 0.5 mm. Namely, it was assumed that the average degree of error in measuring temporal intervals of approximately 50 ms could reasonably be expected not to exceed 1 mm (±0.5 mm). Hence, any modulation in frequency which exceeded the change attributable to measurement error (approximately 10 Hz) was categorized as an inflection or a shift. Similar criteria were used to identify level inflections and zero shifts. For example, if during an episode of continuous voicing the difference in average fo calculated for adjacent segments was less than or equal to that attributable to measurement error, then those segments were defined as level inflections.

The rate of fundamental frequency change associated with inflections and shifts reflected the rapidity with which fo was modulated per unit of time. These measures were derived by dividing the extent of the modulation in semitones by its duration in seconds. Since the duration of fo modulations often encompassed several averaging intervals, it was assumed that the majority of the fo change occurred in the time interval between the midpoint of the first segment and the midpoint of the final segment. The same temporal segmenting procedures were used to measure the duration of level inflections and zero shifts.

E. Statistical analyses

Multiple stepwise regression analyses were used to assess the relationships between listener judgments of talker sex and selected acoustic properties. Separate analyses were completed for each of the four utterance types spoken by these children. In each analysis the dependent variable was the listeners' perceptual responses, expressed as the number of listeners who responded male to each stimulus. The independent variables were the various acoustic and temporal variables. These statistical analyses provided a basis for estimating the individual and combined predictive power of the various acoustic variables, the magnitude of the simple correlations between each of the acoustic variables and listener perception, and the intercorrelations among the various physical measures.

II. RESULTS

A. Isolated vowel stimuli

One hypothesis under test in this project was that sexual differences in the vocal tract resonance characteristics of preadolescent children provide major cues about sexual identity when judgments are based on isolated vowels. The results of the perceptual experiments (Bennett and Weinberg, 1979) indicated that the overall rates of correct identification observed for these children's phonated and whispered vowels (65% and 66%) were not significantly different. This observation led to the conclusion that transfer function information (e.g., vowel formant frequency values) provided the primary cues about sexual identity in both vowel conditions and that average fundamental frequency characteristics did not provide critically relevant sexual information.

1. Whispered vowel stimuli

The simple correlations between listeners' perceptual responses and the lower two resonances of these children's whispered vowels are shown in Table I. Both F1 and F2 were significantly (p = 0.01) correlated with listener judgments of child sex. The negative relationships between these variables indicated that a lowering of F1 and F2 values was generally associated with an increase in the number of male responses. It may also be seen that F1 and F2 were significantly related to one another.

Results of the stepwise regression analysis provided further evidence to show that these children's vocal tract resonance characteristics exerted a significant influence on listener judgments (Table II). Differences in the frequency location of F1 and F2 accounted for approximately 61% of the total variance in listeners' perceptual responses. Differences in the frequency of

TABLE I. Intercorrelations between vocal tract resonance measures and listeners' perceptual judgments for the whispered vowels (n = 56).

        Number of male responses    F1
F1      -0.70 a
F2      -0.71 a                     0.61

a p = 0.01.
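The simple correlations reported for the whispered vowels (listener responses with F1: -0.70; with F2: -0.71; F1 with F2: 0.61) can be checked against the stated share of explained variance using the standard two-predictor formula for the squared multiple correlation, R^2 = (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2). The sketch below is a consistency check that assumes the final regression contains exactly these two predictors. It is not the authors' stepwise procedure, and the function name is illustrative.

```python
def two_predictor_r_squared(r_y1, r_y2, r_12):
    """Squared multiple correlation for two predictors from simple correlations.

    r_y1, r_y2: correlations of the outcome with predictors 1 and 2.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12
    return numerator / (1.0 - r_12 ** 2)


# Correlations as printed (rounded to two decimals) for the whispered vowels.
r_squared = two_predictor_r_squared(-0.70, -0.71, 0.61)
print(round(r_squared, 2))  # 0.62
```

With correlations rounded to two decimals the result is about 0.62, in line with the roughly 61% of variance that the text attributes to F1 and F2 together.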

TABLE II. Stepwise multiple regression summary table for the whispered vowels (n = 56).

Regression step    Variable entered    F value    Significance    R
1                  F2                  44.175
2                  F1                  12.497

• 2450 Hz.

sured for children in the three perceptual groups were clearly nonoverlapping helps clarify why this variable was significantly related to listener judgments.

F1 and F2 values obtained for these children's phonated vowels are illustrated

Several children were perceptually ambiguous in spite of the fact that their formant frequency values fell in an area of nonoverlap, while others whose sex was correctly identified evidenced values that were more like those of the opposite sex. In these and other cases, average fundamental

TABLE IV. Stepwise multiple regression summary table for the phonated vowels (n = 56).

Regression step    Variable entered    F value to enter    R2
1                  F2                  31.865
