© 1991 S. Karger AG, Basel 0254-4962/91/0242-0088$2.75/0
Psychopathology 1991;24:88-105
Speech Characteristics in Depression¹
H.H. Stassen, G. Bomben, E. Günther
Research Department, Psychiatric University Hospital Zurich, Switzerland
1 The study comprised 192 persons stratified according to sex, age and education, 3 different texts and repeated measurements from the same individuals. This sample serves as a reference with respect to the ‘natural’ variability of speech parameters and allows for deciding upon the significance of voice quality changes [Stassen et al., 1988a].
It is worth noting that, for a considerable number of speakers (about 40% of the general population), there exist additional resonance points in the immediate neighborhood of F0, e.g. a third or fourth below F0, or a third or fifth above F0.
All subsequently presented results are based on the counting task and on Spearman correlation coefficients. Even though correlations turned out to be essentially consistent over the speech tasks under investigation, we decided in favor of the counting task as the basis of our analyses in order to be compatible with the majority of earlier studies in the literature. Moreover, correlation coefficients derived from the reading task did not always reach significance.
Abstract. This study examined the relationship between speech characteristics and psychopathology throughout the course of affective disturbances. Our sample comprised 20 depressive, hospitalized patients who had been selected according to the following criteria: (1) first admission; (2) long-term patient; (3) early entry into study; (4) late entry into study; (5) low scorer; (6) high scorer, and (7) distinct retarded-depressive symptomatology. Since our principal goal was to model the course of affective disturbances in terms of speech parameters, a total of 6 repeated measurements were carried out over a 2-week period, including 3 different psychopathological instruments and speech recordings from automatic speech as well as from reading out loud. It turned out that neither the applicability nor the efficiency of single-parameter models depends in any way on the given, clinically defined subgroups. Likewise, no significant differences between the clinically defined subgroups showed up with regard to basic speech parameters, except for the fact that low scorers seemed to take their time when producing utterances (in contrast to all other patients who, on average, had a considerably shorter recording time). As to the relationship between psychopathology and speech parameters over time, we found significant correlations: (1) in 60% of cases between the apathic syndrome and energy/dynamics; (2) in 50% of cases between the retarded-depressive syndrome and energy/dynamics; (3) in 45% of cases between the apathic syndrome and mean vocal pitch, and (4) in 71% of low scorers between the somatic-depressive syndrome and time duration of pauses. All in all, single-parameter models turned out to cover only specific aspects of the individual courses of affective disturbances, thus speaking against a simple, generally applicable approach.
Introduction
In 1964, Hargreaves and Starkweather published an outstanding, pioneering study on voice quality changes in depression. They reported an attempt to track changes in mood in hospitalized psychiatric patients utilizing direct measurement of voice spectra. For this purpose, the authors interviewed 8 patients about 4 times per week throughout the course of hospitalization. During the interviews, speech samples were recorded and subsequently used to compute long-term spectra at a resolution of third-octave bands. Based on these spectra, multiple regression on two psychopathological rating scales was carried out and yielded moderately good prediction for some patients.
Design and realization of the investigation relied on the authors’ experiences in the field of speaker recognition [Hargreaves and Starkweather, 1963] and, on the other hand, on the well-known fact that the quality of a patient’s voice holds several cues for the psychiatrist intent on obtaining a diagnosis. However, the chosen design with repeated measurements on the same individual over the whole hospitalization period has not been taken up by other investigators. Rather, subsequent studies have concentrated on the analysis of single, scalar speech parameters, in particular ‘pause duration’ and ‘fundamental frequency’, and have primarily aimed at differences between parameter values derived from the beginning of a patient’s hospitalization and those measured after improvement [Alpert, 1983; Avery and Silverman, 1984; Blackburn, 1975; Bouhuys and Mulder-Hajonides, 1984; Bouhuys and Alberts, 1984; Godfrey and Knight, 1984; Greden et al., 1981; Hardy et al., 1984; Helfrich et al., 1984; Hinchcliffe et al., 1971; Hollien and Darby, 1979; Johnson et al., 1986; Klos and Ellgring, 1984; Newman and Mather, 1938; Nilsonne, 1987, 1988; Nilsonne et al., 1988; Pope et al., 1970; Rice et al., 1969; Roessler and Lester, 1976; Saxman and Burk, 1968; Szabadi et al., 1976; Szabadi and Bradshaw, 1980; Teasdale et al., 1980; Tolkmitt et al., 1982; Weintraub and Aronson, 1967].
Indeed, speech dysfunctions, such as slow, delayed or monotonous speech, are prominent features of severe depression, mania and schizophrenia. ‘The patients speak in a low voice, slowly, hesitatingly, monotonously, sometimes stuttering, whispering, try several times before they bring out a word, become mute in the middle of a sentence. They become silent, monosyllabic, can no longer converse’ [Kraepelin, 1921]. Accordingly, clinicians routinely monitor speed of talking among affectively disturbed patients for diagnostic purposes and as an indicator of clinical change [Greden and Carroll, 1980]. Moreover, clinicians frequently observe that the speech of depressed patients is uniform and sometimes exhibits a ‘regular repetition of gliding intervals’ [Darby and Hollien, 1977] and that the pitch alterations of these patients are narrowed, giving the voice a monotonous quality [Leff and Abberton, 1981]. Based on such experiences, researchers regarded the parameters ‘speech pause time’ and ‘fundamental frequency’, together with related quantities, as most
promising and as having a great deal to offer, not just for the very sophisticated assessment of affective range and quality, but for assessment at a sufficiently accurate level. As a consequence, a considerable number of investigations into these speech characteristics have been carried out during the past 2 decades in order to quantify the perceptual observations in an observer-independent, reproducible way. The respective results, however, are controversial: ‘While these studies suggest that depression is associated with distinctive speech patterns, cross comparison of the research has not been possible because wide variations in diagnosis and methodology exist between the various studies. Furthermore, the findings have been less than dramatic, suggesting that precise description and sensitive measuring scales are required in order to distinguish speech patterns and measure differences’ [Darby et al., 1984].
The authors themselves mostly regarded their results as preliminary and open to improvement. Andreasen et al. [1981], for example, qualified the results of their study on flat affect as follows: ‘This study is best regarded as a pilot or preliminary investigation that should be retested with additional refinements. It contains several problem areas that, although they do not negate the significant findings, should be adequately worked through in future research.’ Also, recent studies in the field of speech pause time in depression yielded no real breakthrough: ‘Measuring pause times in speech is a complex task, and the relationship between different pauses and measures of retardation needs further study’ [Nilsonne, 1988]. The author then continues: ‘The measures of fundamental frequency changeability were lower in the depressed patients than in the
control subjects: these measures could possibly be used to differentiate between depressed and non-depressed groups’. Discrimination between psychiatric patients and normal controls on the basis of single-parameter models, however, also did not fully succeed, for example, in the case of schizophrenia by means of a traditional discriminant analysis [Clemmer, 1980].
In summary, even though a wide range of clinical judgements on affective disorders can be derived from speech samples, no objective, sufficiently powerful approach to the measurement of talking behavior through acoustic variables is currently available. The main reason for this is that the underlying processes are too complex to be grasped intuitively. In particular, measurements may be contaminated by external factors, such as diurnal variation or motor retardation, amongst others, whose effects seem to be superimposed on the speech production process. Our knowledge about these phenomena, at its present state, is insufficient, and available results are more or less preliminary and sometimes inconsistent. Greden and Carroll [1980], for example, pointed out that ‘the shift between morning and evening goes in the opposite direction in the endogenous depressives than it does in normal subjects or in recovered depressives’, whereas Hoffmann et al. [1985] reported: ‘A diurnal variation in speech pause time was found in control subjects, but not in depressed patients, whether retarded or nonretarded’. With regard to motor retardation, Szabadi and Bradshaw [1983] argued: ‘Thus one cannot escape the conclusion that speech pause time and other psychomotor tests do not measure the same thing’. Future research in this field therefore involves systematically exploring all the different sources of variation in talking behavior of psychiatric patients as well as of normal controls in order to reliably assess the properties inherent to human speech.
Representing Speech Characteristics
Speech characteristics can be roughly described by a few major features: speech flow, loudness, intonation and intensity of overtones. Speech flow comprises the speed at which utterances are produced as well as the number and duration of temporary breaks in speaking. Loudness reflects, on the one hand, the amount of energy used to articulate utterances and, on the other hand, when regarded as a time-varying quantity, the speaker’s dynamic expressiveness. Intonation is the manner of producing utterances with respect to rise and fall in pitch. Accordingly, intonation leads to tonal shifts in either direction of the speaker’s mean vocal pitch. Overtones are the higher tones which faintly accompany a fundamental tone and are responsible for the tonal diversity of sounds.
The question of how to operationalize speech characteristics, however, is discussed controversially in the literature and obviously depends on the specific problem. Indeed, models for speech recognition or speech perception necessarily require different sets of parameters than do approaches to psychoacoustic sensations or psychiatric applications. In principle, one distinguishes between time domain parameters (concerning speech flow, energy, dynamics) and frequency domain parameters (concerning intonation and overtones). In what follows, we will give a short overview of speech parameters relevant to psychiatric applications.
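The speech-flow and loudness features named above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' software: the 10-ms frame length, the silence threshold and the function names are hypothetical, and energy statistics are taken over non-silent frames only. The 250-ms minimum pause duration is the value given in the Signal Processing part of the Methods; merging shorter silences into the surrounding utterance is one possible reading of ‘skipped’.

```python
from statistics import mean, pstdev

FRAME_MS = 10        # assumed frame length (not specified in the paper)
MIN_PAUSE_MS = 250   # silences shorter than this are not counted as pauses


def segment(energies, silence_threshold):
    """Split a frame-energy sequence into utterances and pauses.

    Returns (utterance_durations_ms, pause_durations_ms). Silent runs
    shorter than MIN_PAUSE_MS are merged into the surrounding utterance
    (an assumption about how short pauses are handled).
    """
    min_frames = MIN_PAUSE_MS // FRAME_MS
    # Run-length encode the per-frame silent/non-silent decision.
    runs = []  # each entry: [is_silent, number_of_frames]
    for e in energies:
        silent = e < silence_threshold
        if runs and runs[-1][0] == silent:
            runs[-1][1] += 1
        else:
            runs.append([silent, 1])
    # Relabel short silences as speech and merge neighbouring runs.
    merged = []
    for silent, n in runs:
        if silent and n < min_frames:
            silent = False
        if merged and merged[-1][0] == silent:
            merged[-1][1] += n
        else:
            merged.append([silent, n])
    utterances = [n * FRAME_MS for s, n in merged if not s]
    pauses = [n * FRAME_MS for s, n in merged if s]
    return utterances, pauses


def time_domain_parameters(energies, silence_threshold):
    """Summarise speech flow and loudness for one recording."""
    utterances, pauses = segment(energies, silence_threshold)
    voiced = [e for e in energies if e >= silence_threshold]
    return {
        "n_pauses": len(pauses),
        "mean_pause_ms": mean(pauses) if pauses else 0.0,
        "mean_utterance_ms": mean(utterances) if utterances else 0.0,
        "mean_energy": mean(voiced) if voiced else 0.0,
        # Standard deviation of energy as a stand-in for 'dynamics'.
        "energy_sd": pstdev(voiced) if voiced else 0.0,
    }
```

With per-frame energies from a recording, `time_domain_parameters(energies, threshold)` yields one set of values per registration; repeating this over the 6 registrations of a patient gives the kind of time series that is later correlated with the ratings.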
Speech Parameters of the Time Domain
Speech rate is defined either by the amount of time used to produce utterances or by the number of syllables uttered during a given time. Mean utterance duration and variability of utterance duration describe the statistical properties of utterances within a given text. These parameters are used to test the hypothesis that patients speak more slowly during depression than they do after recovery. Pause time is the summed duration of between-utterance breaks or the number of temporary breaks during a given time. Mean pause duration and variability of pause duration describe the statistical properties of temporary breaks within a given text. These parameters allow for testing the hypothesis that number and duration of temporary breaks decrease with recovery from depression.
Energy measures the total amount of energy used to articulate the utterances of a given text. Mean energy and the corresponding variability of energy, the latter representing the speaker’s dynamic expressiveness, describe the statistical properties of loudness within a given piece of speech. These parameters make it possible to test the hypothesis that ‘flat affect’ can be defined in terms of speech parameters as a signal with low average energy per second and a standard deviation very much smaller than that of the norm.
Besides the above-mentioned, most widely used parameters of the time domain, we also analyze (1) utterances per second (mean value and variability), which measure the percent proportion of utterances within each second; (2) pauses per second (mean value and variability), which measure the percent proportion of temporary breaks within each second, and (3) energy per syllable (mean value and variability), which measure the amount of energy used to articulate a syllable as well as the speaker’s dynamic expressiveness at a syllable level.
Speech Parameters of the Frequency Domain
Spectral analyses which yield a transformation of the time domain into the frequency domain result in spectra which can be directly interpreted as power-frequency distributions. However, the time interval length on which a spectral analysis is to be based is a critical value if the processes under investigation are essentially nonstationary, as is the case for speech signals. The problem of how to appropriately define time intervals for psychiatric purposes will be discussed in the following paragraph.
The fundamental frequency F0 is defined as the first maximum of a given power-frequency distribution. For a sufficiently long time interval (1-2 s), F0 is a good estimator of the speaker’s mean vocal pitch. The variability of F0 can be measured either by the standard deviation of F0 (estimated from a sufficiently representative speech sample) or by the time derivative of F0. The specific form of the F0 distribution curve is called the F0 contour. Parameters of the frequency domain particularly allow for testing the hypothesis that fundamental frequency changeability is lower during depression than after recovery.
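The definition of F0 as the first maximum of a power-frequency distribution can be sketched in a few lines of Python. This is only an illustration under assumptions: the frequency band (50-500 Hz), the 1-Hz resolution, the relative power floor used to ignore negligible ripples, and all function names are hypothetical, and no smoothing is applied. The quarter-tone helper reflects the unit in which the Methods express frequency differences (24 quarter tones per octave).

```python
import math


def power_spectrum(samples, rate, f_min=50, f_max=500, step=1.0):
    """Power at each frequency from f_min to f_max (Hz) via a direct DFT.

    A direct evaluation is slow but dependency-free; it is adequate for a
    single epoch of about 1 s at a modest sampling rate.
    """
    n = len(samples)
    freqs, power = [], []
    f = float(f_min)
    while f <= f_max:
        w = 2.0 * math.pi * f / rate
        re = sum(s * math.cos(w * k) for k, s in enumerate(samples))
        im = sum(s * math.sin(w * k) for k, s in enumerate(samples))
        freqs.append(f)
        power.append((re * re + im * im) / n)
        f += step
    return freqs, power


def first_maximum(freqs, power, rel_floor=0.1):
    """Lowest-frequency local maximum above rel_floor * max(power).

    The floor is an assumption made here to skip numerically tiny peaks.
    Returns None if no such maximum exists.
    """
    floor = rel_floor * max(power)
    for i in range(1, len(power) - 1):
        if power[i] >= floor and power[i - 1] < power[i] >= power[i + 1]:
            return freqs[i]
    return None


def quartertones(f_ref, f):
    """Signed distance from f_ref to f in quarter tones (24 per octave)."""
    return 24.0 * math.log2(f / f_ref)
```

Applied to a 1-second epoch of ‘pure’ utterance signal, `first_maximum(*power_spectrum(epoch, rate))` gives an F0 estimate for that epoch; expressing differences with `quartertones` makes values comparable across speakers with different mean pitch.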
Material and Methods
Sample
Our sample consisted of 12 male patients (mean age 40.8 years, standard deviation 12.9 years) and 8 female patients (mean age 46.8 years, standard deviation 13.8 years), who had been recently hospitalized at the Psychiatric University Hospital Zürich with the following diagnoses: 11 affective psychoses (ICD9: 296.1, 296.3), 3 schizoaffective psychoses in depressed state (ICD9: 295.7), 1 neurotic depression (ICD9: 300.4), 2 alcohol dependences (ICD9: 303), and 3 reactive depressions (ICD9: 309.0, 309.1). There were 15 entries into the study within the first 2 weeks after hospitalization (75%) and 5 entries after at least 1 full month of hospitalization (25%). This sample also comprised a subgroup of 5 first admissions, a subgroup of 5 long-term patients with more than 6 months’ hospitalization, and another subgroup of 6 patients who scored high on the retarded-depressive AMDP scale. Furthermore, we subdivided our sample with respect to psychopathology into low scorers (n = 7), high scorers (n = 5), and average scorers (n = 8), on the basis of the average AMDP and Hamilton scores at entrance into the study. A detailed description of the patients who joined this calibration study is given in table 1.
Rating Instruments
The experimental setup followed the same design as in the previous pilot study [Stassen, 1988; Stassen et al., 1989]: patients were rated by their psychiatrists by means of the AMDP and Hamilton instruments during a psychiatric exploration at a fixed time in the morning each Monday, Wednesday, and Friday throughout 2 weeks. Immediately after psychiatric exploration, patients were asked to fill out the Zung affective state self-rating test. Speech recordings were carried out in an acoustically shielded room, digitized on-line, and comprised three pieces of spoken text: (1) counting out loud from 1 to 30; (2) reading out loud a fixed text selected for its simplicity and emotional neutrality, and (3) counting out loud again from 1 to 30.
Table 1. Composition of calibration sample with respect to sex, age, age of onset, diagnosis, psychopathological score, entry into study, hospitalization time, type of depression
Patient No.  Sex  Age, years  Onset, years  ICD diagnosis  AMDP score  Hospital time, days
25           M    40          19            295.7          47          36
3            F    54          53            296.3          49          41
1            M    42          37            309.1          41          41
4            F    55          51            300.4          39          16
23           M    49          47            296.1          48          178
7            M    46          28            303.0          37          62
11           F    36          29            303.0          40          30
9            M    34          34            309.0          42          91
26           M    33          15            295.7          51          26
5            M    67          67            296.1          47          325
12           M    57          38            296.1          53          74
6            M    43          24            296.3          38          31
18           F    49          17            296.1          41          41
14           F    21          20            296.1          43          44
2            F    52          32            296.1          50          184
8            M    28          26            309.1          45          59
21           M    27          24            296.1          52          61
24           F    65          53            296.1          48          81
17           F    41          29            296.1          40          101
22           M    23          22            295.7          52          322
Entries without recoverable row position: Entry day: 2, 3, 4, 4, 4, 5, 5, 7, 7, 8, 8, 9, 11, 15, 18, 35, 35, 47, 130 (one further value illegible). Admission: ‘first’ for 5 patients. Score group: L for 7 patients, H for 5 patients. Depression: ‘ret’ for 6 patients.
L = Low; H = high; ret = retarded-depressive.
Based on the AMDP and Hamilton instruments, psychopathology was operationalized through the affect-specific AMDP scales ‘apathic syndrome’, ‘somatic-depressive syndrome’, ‘retarded-depressive syndrome’ and ‘manic-depressive syndrome’ (the latter being a second-order scale), and through the ‘Hamilton 17’ score.
Signal Processing
All speech signals were inspected visually and marked with an artifact code if necessary so that disturbed intervals could be removed prior to data analyses. In a next step, segmentation tables were set up in order to identify pauses and utterances, whereby pauses of less than 250-ms duration were skipped. Finally, we calculated long-term spectra on the basis of 1-second epochs by means of a discrete Fourier transformation (‘pure’ utterances, with pauses having been eliminated for spectral analysis) [a more detailed description of the different data processing steps is given in Stassen et al., 1988a].
As outlined before, the time interval length on which spectral analyses are based is a crucial point in the field of nonstationary processes since the theory of ‘local’ time-dependent spectra requires a problem-oriented definition of time intervals. Indeed, for speech signals the following properties apply: the shorter the time interval length, the more strongly spectra follow the ‘tonal’ composition of the underlying speech entities (e.g. vowels, diphthongs, consonants, plosives, fricatives). Accordingly, successive short-time spectral analyses with a sliding time window of about 20-100 ms are best suited in the field of automatic speech recognition. On the other hand, formant resonances move relatively slowly during continuous speech because of the physical limitations on how quickly tongue, lips and jaw can be moved. Thus, spectra derived from time intervals of 1-2 s length most clearly reveal the distinct individuality of a speaker’s overtone distribution (automatic speaker recognition problem).
Investigations into the talking behavior of affectively disturbed patients focus on modelling phenomena like apathy and depression (reflected, for example, by speech characteristics like flat timbre and lack of intonation) or aggressivity and stress (reflected, for example, by a shift of mean vocal pitch). Obviously, such phenomena are related to longer persisting processes which have to be separated (1) from short-time fluctuations due, for example, to the spoken text or to interactions with the immediate environment, and (2) from circadian variations. According to the findings of our investigation into the reproducibility and sensitivity of speech parameters in the general population, the pronounced individuality of a speaker’s fundamental frequency (F0) characteristics can be determined in a sufficiently reproducible way from a time interval of about 1-second length.
Fig. 1. Typical distributions of the fundamental frequency F0 for male speakers (a) and female speakers (b) after smoothing.
Due to the fact that spectral analysis yields a power-frequency distribution with spectral intensities
proportional to amplitudes and time durations of the underlying partial tones, the F0 contour as well as the F0 variability can be reliably estimated from a 1- to 2-second spectrum, rather than from a sequence of short-time Fourier transforms. Figure 1 shows typical F0 distributions for male and female speakers (after smoothing). The first maximum designates in each case the fundamental frequency F0, whereas the higher-frequency maxima designate the higher formants F1, F2 etc. We approximated the shape of the F0 distribution curve by a 2nd degree polynomial² and used the distance between the symmetrical 6-dB points as a measure for the F0 variability and the ratio height/width of the polynomial as a measure of ‘F0 narrowness’. All frequency differences were calculated in quartertones, thus providing for a direct comparability of F0 contours, independently of the actual fundamental frequencies.
A total of 15 speech parameters of both the time domain and the frequency domain were extracted from the cleaned speech recordings by means of a set of computer programs described elsewhere [Stassen et al., 1988a]. The applied computational algorithms were identical with those presented in the literature [Nilsonne, 1988] except for the frequency domain parameters. Since the principal goal of our study was to measure the course of affective disturbances in terms of speech parameters, we combined, for each individual separately, the results of the 6 repeated measurements into 6-dimensional vectors in order to compute correlation coefficients between ‘rating vectors’ and ‘speech vectors’. Dependencies were subsequently investigated on the basis of Spearman correlation coefficients whose significance was tested statistically. In addition, we tested differences between registrations at group level by means of the Wilcoxon matched-pairs test whereas differences between clinically defined subgroups were tested by the Mann-Whitney U test.
Thus, we could determine the efficiency of all major single-parameter models which have been proposed in the literature. In particular, we addressed the following questions: (1) Which psychopathological quantities are correlated with speech behavior changes in depressive patients during improvement? (2) Are single-parameter models powerful enough to model the individual course of depressive disturbances over time in a sufficiently representative sample of patients? (3) Are there differences in speech behavior between clinically defined subgroups of patients?
Table 2. Significant differences between first registration and last registration 14 days later for various subgroups of patients and psychopathology
Subgroup          n    APATH  SOMD  RETD  MAND  HAMD
Retarded          6    NS     1%    1%    1%    1%
Early entry       15   NS     1%    1%    1%    1%
Late entry        5    NS     NS    NS    NS    NS
High scorer       5    NS     1%    NS    NS    1%
Low scorer        5    NS     NS    NS    NS    NS
Long-term         5    NS     NS    NS    NS    NS
First admission   5    NS     NS    1%    NS    NS
Total sample      20   NS     1%    1%    1%    1%
APATH = Apathic syndrome; SOMD = somatic-depressive syndrome; RETD = retarded-depressive syndrome; MAND = manic-depressive syndrome; HAMD = Hamilton 17 score. Test results are based on the Wilcoxon matched-pairs signed-rank test.
Table 3. Differences between first registration and last registration 14 days later for various subgroups of patients and selected speech parameters
Column headings in the source: Pauses, ms; Pauses/s; Utterances, ms; Energy/s; F0 variability; Total length, s. Each parameter is given as n, X and SD. The value blocks below follow the source order; their assignment to the headings is not preserved.
First registration
                  Early entry      Late entry       First admission  Long-term        Low scorer       High scorer      Retarded         Total sample
n                 21               20               18               23               26               20               23               20
X                 355              317              288              335              437              331              343              345
SD                231              209              190              231              269              217              198              226
n                 42               43               40               42               47               40               41               42
X                 339              304              328              321              359              343              342              330
SD                167              134              156              149              175              174              162              160
n                 13               12               12               13               17               12               13               13
X                 418              429              410              445              449              408              429              421
SD                172              162              194              166              181              174              157              170
n                 25               23               22               25               32               24               25               25
X                 6.4              7.7              8.0              6.6              5.6              9.2              8.0              6.7
SD                2.7 4.5 3.9 4.1 2.2 3.6 3.2 (7 of 8 values legible; column positions not preserved)
n                 3                3                3                3                4                3                3                3
X                 10.0             10.4             10.4             10.6             11.4             9.4              9.0              10.1
SD                2.5              1.7              2.4              2.3              2.9              2.0              1.6              2.3
Total length, s   26               24               23               25               33               24               25               25
Last registration 14 days later
                  Early entry      Late entry       First admission  Long-term        Low scorer       High scorer      Retarded         Total sample
n                 16               14               14               14               20               14               19               16
X                 329              278              284              321              397              309              310              318
SD                219              280              194              319              304              206              182              235
n                 42               38               40               41               46               36               40               41
X                 328              348              331              333              333              377              358              332
SD                147              171              155              152              138              193              165              153
n                 12               11               11               11               14               11               12               11
X                 390              374              369              364              424              359              377              386
SD                175              216              185              172              179              191              163              185
n                 23               20               20               21               27               22               24               22
X                 6.7              8.3              8.7              7.1              6.5              7.4              7.3              7.1
SD                3.6              4.9              3.7              4.3              4.9              4.2              3.6              4.0
n                 3                3                3                3                3                3                4                3
X                 10.4             9.5              11.9             10.4             10.8             9.4              9.2              10.2
SD                3.1              1.5              4.9              3.6              1.9              1.9              1.9              2.9
Total length, s   23               21               21               22               28               22               24               23
F0 variability measured as 6-dB bandwidth in quartertones; n = averaged observed number per patient; total length = registration time. The experimental situation is counting out loud from 1 to 30.
Table 4. Significant differences between first registration and last registration 14 days later for various subgroups of patients and selected speech parameters
Subgroup          n    Pauses  Pauses/s  Utterances  Energy  F0 variability
Retarded          6    NS      1%        NS          NS      NS
Early entry       15   NS      1%        NS          NS      NS
Late entry        5    NS      1%        NS          NS      NS
High scorer       5    NS      1%        NS          NS      NS
Low scorer        5    NS      NS        NS          NS      NS
Long-term         5    NS      1%        NS          NS      NS
First admission   5    NS      1%        NS          NS      NS
Total sample      20   NS      NS        NS          NS      NS
The experimental situation is reading out loud emotionally neutral text (Wilcoxon matched-pairs signed-rank test).
Table 5. Significant differences between first registration and last registration 14 days later for various subgroups of patients and selected speech parameters
Subgroup          n    Pauses  Pauses/s  Utterances  Energy  F0 variability
Retarded          6    NS      NS        NS          NS      NS
Early entry       15   NS      NS        NS          NS      NS
Late entry        5    NS      NS        1%          NS      NS
High scorer       5    NS      NS        1%          NS      NS
Low scorer        5    NS      1%        NS          NS      NS
Long-term         5    NS      1%        NS          NS      NS
First admission   5    NS      NS        NS          NS      NS
Total sample      20   NS      5%        NS          NS      NS
Results
Our sample comprised depressive, hospitalized patients who had been selected according to the criteria: first admission (n = 5), long-term patient (n = 5), early entry into study (n = 15), late entry into study (n = 5), low psychopathology scorer (n = 7), high psychopathology scorer (n = 5), and distinct retarded-depressive symptomatology (n = 6). With respect to psychopathology, significant differences between first registration and last registration 14 days later showed up for various subgroups of patients as well as for the total sample (table 2). No significant changes over the period of observation could be found, however, for the apathic syndrome. There are indications that a considerable number of patients did not improve in this regard during therapy. On the other hand, differences in the other scores under investigation (somatic-depressive syndrome, Hamilton 17 score) turned out to be significant at the 1% level for the total sample, indicating good response to treatment.
Since the principal goal of this investigation was to test the efficiency of single-parameter models, we analyzed all major speech parameters proposed in the literature independently of each other. Our analyses included the following speech parameters: (1) average pause duration; (2) number of pauses; (3) average pause duration per second; (4) average utterance duration; (5) average energy/second; (6) variation of energy/second; (7) average energy/syllable; (8) variation of energy/syllable; (9) registration time; (10) total length of pauses; (11) total length of utterances; (12) average vocal pitch; (13) variation of vocal pitch; (14) F0 narrowness; (15) F0 contour.
The comparison of clinically defined subgroups yielded, with respect to speech behavior, an essentially homogeneous picture, except for the fact that low scorers had by far the longest registration time for each of the three different texts (counting from 1 to 30, reading standard text, and counting from 1 to 30 after reading). This increased registration time was obviously due to prolonged speech pause times because corresponding utterance durations were comparable to those of the other groups (table 3). With respect to the speech parameters under investigation, significant differences between first registration and last registration 14 days later showed up very inconsistently, and, moreover, not at all for the total sample. In particular, no relationship between psychopathological improvement and speech parameter changes could be found (tables 4, 5).
The experimental situation is counting out loud from 1 to 30 (Wilcoxon matched-pairs signed-rank test).
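The first-versus-last registration comparisons in tables 2, 4 and 5 rest on the Wilcoxon matched-pairs signed-rank test. The following Python sketch shows how such a test can be computed for small samples. It is an illustration only: the paper does not state whether exact or approximate p-values were used, so the exact two-sided p-value computed here, the dropping of zero differences, and the function name are assumptions.

```python
def wilcoxon_signed_rank(first, last):
    """Exact two-sided Wilcoxon matched-pairs signed-rank test.

    first, last: paired values (same patients, same order).
    Returns (W, p), where W is the smaller of the two signed rank sums.
    """
    # Paired differences; pairs without change are dropped (assumption).
    diffs = [b - a for a, b in zip(first, last) if b != a]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Average ranks of |d|, stored doubled so that tied ranks stay integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = (i + 1) + (j + 1)  # 2 * average rank
        i = j + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total2 = sum(ranks2)
    w2 = min(w_plus2, total2 - w_plus2)
    # counts[s]: number of sign patterns whose doubled positive rank sum is s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    tail = sum(counts[: w2 + 1])
    p = min(1.0, 2.0 * tail / 2 ** n)
    return w2 / 2.0, p
```

Called with one speech parameter (or one rating score) per patient at the first and at the last registration, the function returns the test statistic and a two-sided p-value for the subgroup in question.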
The psychopathology vectors computed separately for each patient from the 6 repeated measurements displayed a very individual pattern, indicating that the variation of the respective scores over time is characterized by a pronounced individuality. Changes in both directions, towards a less severe as well as towards a more severe symptomatology, can be observed, and the degree to which changes manifest themselves varies from individual to individual. Accordingly, the efficiency of speech parameter models with regard to the course of psychopathology has been tested by means of individual correlation analyses.
Detailed correlation analyses yielded several remarkable results. First of all, 4 single-parameter models proved indeed to be suited for measuring relevant aspects of the course of affective disorders: the ‘pause model’, the ‘energy model’, the ‘dynamics model’ and the ‘pitch model’. All other speech parameters in question, in particular time duration of utterances, turned out to be too unspecific or too loosely related to the psychopathological processes.
As to the correlation between psychopathology and speech parameters, our results provide ample evidence for a close relationship between the apathic syndrome and speech dynamics: we found a significant correlation in 60% of cases, 7 cases at the 5% level and 5 cases at the 10% level. The energy model yielded almost identical results. However, there were 2 additional patients whose energy values displayed a significant correlation with the apathic syndrome but whose speech dynamics remained constant during the observation period. Hence, in more than two thirds of patients, the energy/dynamics model turned out to be well suited to monitor the apathic aspect of depression.
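The individual correlation analyses referred to here pair, for each patient, a 6-dimensional ‘rating vector’ with a 6-dimensional ‘speech vector’ and compute a Spearman correlation. A minimal Python sketch of this step follows. How significance was tested is not specified in the text; the exact permutation test used below (all 720 orderings for 6 measurements) is an assumption, as are the function names.

```python
from itertools import permutations


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman_exact(rating, speech):
    """Spearman rho with a two-sided exact permutation p-value."""
    rx, ry = average_ranks(rating), average_ranks(speech)
    rho = pearson(rx, ry)
    hits = total = 0
    # Enumerate every ordering of one rank vector (720 for n = 6).
    for perm in permutations(ry):
        total += 1
        if abs(pearson(rx, perm)) >= abs(rho) - 1e-12:
            hits += 1
    return rho, hits / total
```

For one patient, `spearman_exact(rating_scores, speech_values)` takes the 6 ratings on one scale and the 6 values of one speech parameter; the sign of rho distinguishes the positive and negative correlations discussed in the text. With only 6 measurements, the smallest attainable two-sided p-value under this assumption is 2/720.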
Positive as well as negative correlations showed up. For some patients, speech changed during improvement from low-voiced, monotonous forms towards standard values, whereas in others, changes from loud and abrupt speech towards standard values could be observed. No external criteria were found allowing for a classification of patients according to their speech behavior with regard to the energy/dynamics model.
A similar picture was found for the relationship between the retarded-depressive syndrome and speech behavior with respect to energy and dynamics. Analyses yielded a significant correlation in 50% of cases, 7 cases at the 5% level and 3 cases at the 10% level. Again, positive as well as negative correlations showed up. However, it must be emphasized that 3 different types of patients were found for whom the energy/dynamics model applied: (1) patients with correlations to the apathic syndrome only; (2) patients with correlations to both the apathic and the retarded-depressive syndrome, and (3) patients with correlations to the retarded-depressive syndrome only.
Another important result concerned the subgroup of low scorers. We found a significant, positive correlation between the speech parameter ‘time duration of pauses’ and the somatic-depressive syndrome in 71% of cases (3 cases at the 5% level and 2 cases at the 10% level). Contrary to our expectations and contrary to the results of several studies in the literature, there was no clear relationship between time duration of pauses and the retarded-depressive syndrome. Only one third of the sample showed a significant correlation, 2 cases at the 5% level and 3 cases at the 10% level. Moreover, only 1 patient of the retarded-depressive subgroup turned out to display this latter correlation.
As to the correlation between vocal pitch and psychopathology, our results were somewhat disappointing. Contrary to earlier findings in the literature, our study did not lead to a clear picture. Although mean vocal pitch was significantly correlated in 45% of cases with the apathic syndrome (3 cases at the 5% level and 6 cases at the 10% level), the expected close relationship between pitch variations and psychopathology could not be confirmed. Only a few patients showed a significant correlation between pitch variation and the somatic-depressive syndrome.
As to the Zung scale, detailed analysis revealed a significant correlation between observer rating and self-rating in 13 cases (Spearman coefficient r > 0.6, p < 10%, for at least 4 of 5 interview scales), a nonsignificant correlation in 1 case, and no correlation at all in 6 cases. In other words, for one third of samples, interview rating and self-rating yielded completely independent results whereas, for the other two thirds, a high coincidence between both rating types showed up.
Another important aspect of our study concerned the old problem of how to get a patient to speak. A direct comparison of automatic speech (i.e. counting out loud) with reading out loud demonstrated the advantages and disadvantages of the 2 approaches: Automatic speech displays more variability and seems to follow changes in affect more closely. This advantage, however, is achieved at the expense of reduced reproducibility. Reading out loud enables speakers to produce more overtones and to develop speech patterns of a ‘richer’ quality. Moreover, speech parameters derived from the reading task are more reproducible than those derived from automatic speech. On the other hand, speech of this type seems to follow changes in affect more inertly, and sometimes correlations do not reach significance. In spite of all that is now known about the characteristics of speech types, further understanding of the underlying speech production processes is still necessary in order to decide upon specific application fields. Accordingly, a decision in favor of one of the different speech types seems premature.
Discussion
Considering the results of our normative investigation into the speech characteristics of 192 healthy subjects and based on the experience of our pilot study with 6 depressive patients, it was not likely that this present study could lead to a single-parameter model which meets the psychiatric requirements and applies in general. Indeed, the distinct individuality of speech characteristics together with the large variety of individual courses of affective disturbances speak against a fairly simple single-parameter model. Nevertheless, our approach to modelling the course of affective disturbances in terms of speech parameters has revealed several important clues. First, neither applicability nor efficiency of the new method depended on the clinically defined subgroups under investigation. Thus, from our current perspective, it seems likely that no principal restrictions concerning the applicability of the new method exist. Second, with regard to basic speech parameters, no significant difference between the clinically defined subgroups showed up. Thus, speech parameters seem to be less suited to differentiate between clinical categories and will certainly not allow for modelling traditional diagnostic schemes on this basis. Third, positive as well as negative correlations were found between psychopathology and speech parameters.
This latter point is of particular interest since the predictive value as well as the applicability of speech parameter models can be seriously affected by such a fact. Polarity of the speech parameters energy and dynamics, for example, implies that changes in psychopathology may be accompanied by speech parameter changes in both directions, towards increased or reduced values. In other words, for some patients speech changed during improvement from low-voiced forms without dynamics towards standard values, whereas in others, changes from loud and abrupt speech towards standard values can be observed (such standard values are available through our normative study with 192 healthy subjects stratified according to sex, age and education). As a consequence, a preclassification of patients is required when monitoring therapeutic effects, such as responses to treatment, with the help of the above speech parameters. Since the ICD diagnostic scheme failed to yield this classification at a sufficiently accurate level, we are developing, on the basis of our normative data, a fully automated procedure which makes it possible to classify patients through their first speech recording.
A solution of this classification problem, however, will not necessarily mean much progress with regard to our principal goal of developing a method of assessing affect in terms of speech parameters which is practicable in the sense that it is (1) applicable to most psychiatric patients, and (2) easy to carry out in standard form. According to the results of this present investigation, the energy/dynamics model applies only to two thirds of patients whereas, in the other patients, speech behavior does not change in this respect during improvement. Moreover, the energy/dynamics model assesses only the apathic aspect of depression sufficiently well (the relationship to the retarded-depressive syndrome is less clear). Similar findings hold for the other speech parameters under investigation: they only apply to relatively small subsets of affectively disturbed patients, thus requiring the application of complicated classification procedures prior to any analysis. A multivariate approach based on feature vectors into which scalar speech parameters are combined will possibly provide a broader basis for the method. The respective methodological development is under way.
Fig. 2. a, c Spectral voice patterns reflect the overtone structure (intensity contours and variability) of a hospitalized patient throughout the course of affective disturbances. Pattern derived from the first measurement (a) and pattern derived from the third measurement 5 days later (c). Spectral intensities are plotted along a log-proportional scale (y-axis) and frequency covers 7 octaves from 64 to 8,192 Hz (x-axis).
Fig. 2. b, d Spectral voice patterns reflect the overtone structure (intensity contours and variability) of a hospitalized patient throughout the course of affective disturbances. Pattern derived from the fourth measurement (8 days after the first measurement) (b), and pattern derived from the last measurement 14 days later (d). Spectral intensities are plotted along a log-proportional scale (y-axis) and frequency covers 7 octaves from 64 to 8,192 Hz (x-axis).
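The frequency axis given in the caption of fig. 2 (7 octaves from 64 to 8,192 Hz) can be made concrete with a small sketch. It assumes that the quartertone resolution of the spectral analyses corresponds to 24 logarithmically equal steps per octave; this grid definition and the function names are assumptions for illustration and are not taken from the analysis software.

```python
import math

F_LOW = 64.0           # lower edge of the analysed range in Hz (from the text)
OCTAVES = 7            # 64 Hz * 2**7 = 8,192 Hz (from the text)
STEPS_PER_OCTAVE = 24  # assumption: a quartertone is half a semitone


def quartertone_grid(f_low=F_LOW, octaves=OCTAVES, steps=STEPS_PER_OCTAVE):
    """Return the grid frequencies in Hz, equally spaced on a log2 axis."""
    n = octaves * steps
    return [f_low * 2.0 ** (k / steps) for k in range(n + 1)]


def band_index(freq_hz, f_low=F_LOW, steps=STEPS_PER_OCTAVE):
    """Index of the grid point nearest to freq_hz on the log-frequency axis."""
    return round(steps * math.log2(freq_hz / f_low))


if __name__ == "__main__":
    grid = quartertone_grid()
    print(f"{len(grid)} grid points from {grid[0]:.0f} to {grid[-1]:.0f} Hz")
    # Hypothetical fundamental frequency of 120 Hz mapped onto the grid.
    k = band_index(120.0)
    print(f"120 Hz falls nearest to grid point {k} ({grid[k]:.1f} Hz)")
```

On such a grid every octave contributes the same number of points, which is why a fundamental frequency and its overtones can be compared along one contour.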
Alternatively, we pursue another method of approach which aims at measuring voice quality changes in the frequency domain by pattern recognition techniques used, for example, in the field of computerized speaker verification. As outlined before, our spectral analyses yield a quartertone resolution over the full frequency range of 7 octaves between 64 and 8,192 Hz. Accordingly, we are able to simultaneously analyze the contours of all formants relevant to the underlying speech process. This is of particular importance because the tonal richness or, in turn, the lack of tonal expressiveness is made up by the complete ensemble of overtones, rather than by the mere fundamental frequency.
Within the scope of our investigations into the genetic determination of human brain wave patterns [Stassen et al., 1988b], we have developed our concept of spectral patterns. Such spectral patterns reflect, on the one hand, the above-mentioned overtone distribution and, on the other, the corresponding spectral variabilities. We applied this approach to our sample of healthy subjects in order to determine the ‘natural’ dynamic properties of individual overtone distributions. For this purpose, we determined the optimum time interval length for spectral analysis by iteratively optimizing the criterion ‘computerized recognition of persons from a sufficiently large population of speakers’. The resulting time interval length turned out to be large enough for the characteristic frequency distribution to be revealed, yet small enough for the individual variability of each spectral component to show up. In particular, we found that 90% of persons could be uniquely identified by their spectral voice patterns (n = 97). A cross-validation by means of an independent sample (n = 90) yielded identical results [Stassen, 1990]. The frequency-related variability of the formants F0, F1, F2 ... as expressed by the respective ‘contours’ and the intensity-related variability of overtones explain most (90%) of the interindividual diversity of speech sounds.
In accordance with these conclusive findings, we expect that the same kind of adaptive strategies will lead to similarly powerful solutions when addressing the problem of intraindividual changes in voice timbres, provided a sufficiently large number of characteristic speech samples is available. Thus, we are now collecting data from affectively disturbed patients with clinically well-defined changes in psychopathology over time. These data will subsequently serve as learning samples for determining significant changes (qualitatively as well as quantitatively) in the overtone structure (contours and intensities) of patients during improvement. Of particular interest in this context is the relationship between the variabilities ‘within measurement’ and ‘between measurements’. The ratio of these latter quantities represents the criterion function during optimization. Examples of overtone structure changes during therapy are given in fig. 2.
In summary, the point we have arrived at is disappointing in the sense that there obviously exists no sufficiently powerful single-parameter model for monitoring changes in the affective state of patients during improvement which meets the requirements of psychiatric applications. Nevertheless, we are optimistic since (1) the proposed method has proved to be practicable; (2) the applied adaptive procedures (‘learning to recognize’) have led to reasonable results with regard to the overtone structure of speakers; (3) we have available reference data from the general population (comprising 192 healthy subjects and repeated measurements under 3 different experimental conditions) which will allow us to distinguish between ‘natural’ and ‘significant’ fluctuations, and (4) the next steps in this research project are clearly laid out. Undoubtedly, speech characteristics in depression will be the subject of study and fascination throughout the coming years.
References
Alpert, M.: Encoding of feelings in voice; in Clayton, Barrett, Treatment of depression: old controversies and new approaches, pp. 217-228 (Raven Press, New York 1983).
Andreasen, N.C.; Alpert, M.; Martz, M.J.: Acoustic analysis, an objective measure of affective flattening. Archs gen. Psychiat. 38: 281-285 (1981).
Avery, D.; Silverman, J.: Psychomotor retardation and agitation in depression. Relationship to age, sex, and response to treatment. J. affect. Disorders 7: 67-76 (1984).
Blackburn, I.M.: Mental and psychomotor speed in depression and mania. Br. J. Psychiat. 126: 329-335 (1975).
Bouhuys, A.L.; Mulder-Hajonides, W.: Speech timing measures of severity, psychomotoric retardation, and agitation in severely depressed patients. J. commun. Disorders 17: 277-288 (1984).
Bouhuys, A.L.; Alberts, E.: An analysis of the organization of looking and speech-pause behaviour of depressive patients. Behaviour 80: 269-298 (1984).
Clemmer, E.J.: Psycholinguistic aspects of pauses and temporal patterns in schizophrenic speech. J. psycholinguist. Res. 9: 161-185 (1980).
Darby, J.K.: Speech evaluation in psychiatry (Grune & Stratton, New York 1981).
Darby, J.K.; Hollien, H.: Vocal and speech patterns of depressive patients. Folia phoniat. 29: 279-291 (1977).
Darby, J.K.; Simmons, N.; Berger, P.A.: Speech and voice parameters in depression: a pilot study. J. commun. Disorders 17: 75-85 (1984).
Godfrey, H.P.; Knight, R.G.: The validity of actometer and speech activity measures in the assessment of depressed patients. Br. J. Psychiat. 145: 159-163 (1984).
Greden, J.F.; Carroll, B.J.: Decrease in speech pause times with treatment of endogenous depression. Biol. Psychiat. 15: 575-587 (1980).
Greden, J.F.; Albala, A.A.; Smokler, I.A.; Gardner, R.; Carroll, B.J.: Speech pause time: a marker of psychomotoric retardation in endogenous depression. Biol. Psychiat. 16: 851-859 (1981).
Hardy, P.; Jouvent, R.; Widlöcher, D.: Speech pause time and the retardation rating scale for depression (FRD). J. affect. Disorders 6: 123-127 (1984).
Hargreaves, W.A.; Starkweather, J.A.: Recognition of speaker identity. Lang. Speech 6: 63-67 (1963).
Hargreaves, W.A.; Starkweather, J.A.: Voice quality changes in depression. Lang. Speech 7: 84-88/218-220 (1964).
Helfrich, H.; Standke, R.; Scherer, K.R.: Vocal indicators of psychoactive drug effects. Speech Commun. 3: 245-252 (1984).
Hinchcliffe, M.K.; Lancashire, M.; Roberts, F.J.: Depression: defence mechanisms in speech. Br. J. Psychiat. 118: 471-472 (1971).
Hoffmann, G.M.; Gonze, J.C.; Mendlewicz, J.: Speech pause time as a method for the evaluation of psychomotoric retardation in depressive illness. Br. J. Psychiat. 146: 535-538 (1985).
Hollien, H.; Darby, J.K.: Acoustic comparisons of psychotic and non-psychotic voices; in Hollien, Hollien, Current issues in the phonetic sciences, pp. 829-835 (Benjamins, Amsterdam 1979).
Johnson, W.F.; Emde, R.N.; Scherer, K.R.; Klinnert, M.D.: Recognition of emotion from vocal cues. Archs gen. Psychiat. 43: 280-283 (1986).
Klos, K.T.; Ellgring, H.: Sprechgeschwindigkeit und Sprechpausen von Depressiven; in Hautzinger, Straub, Psychologische Aspekte depressiver Störungen (Roderer, Regensburg 1984).
Kraepelin, E.: Manic-depressive insanity and paranoia (transl. M. Barclay) (Livingstone, Edinburgh 1921).
Leff, J.; Abberton, E.: Voice pitch measurements in schizophrenia and depression. Psychol. Med. 11: 849-852 (1981).
Nevlud, G.N.; Fann, W.E.; Falck, F.: Acoustic parameters of voice and neuroleptic medication. Biol. Psychiat. 18: 1081-1084 (1983).
Newman, S.; Mather, V.G.: Analysis of spoken language of patients with affective disorders. Am. J. Psychiat. 94: 912-942 (1938).
Nilsonne, A.: Acoustic analysis of speech variables during depression and after improvement. Acta psychiat. scand. 76: 235-245 (1987).
Nilsonne, A.: Speech characteristics as indicators of depressive illness. Acta psychiat. scand. 77: 253-263 (1988).
Nilsonne, A.; Sundberg, J.; Ternstrom, S.; Askenfelt, A.: Measuring the rate of change of voice fundamental frequency in fluent speech during mental depression. J. Acoust. Soc. Am. 83: 716-728 (1988).
Pope, B.; Blass, T.; Siegman, A.W.; Raher, J.: Anxiety and depression in speech. J. consult. clin. Psychol. 35: 123-133 (1970).
Rice, D.G.; Abroms, G.M.; Saxman, J.H.: Speech and physiological correlates of ‘flat’ affect. Archs gen. Psychiat. 20: 566-572 (1969).
Roessler, R.; Lester, J.W.: Voice predicts affect during psychotherapy. J. nerv. ment. Dis. 163: 166-176 (1976).
Saxman, J.H.; Burk, K.W.: Speaking fundamental frequency and rate characteristics of adult female schizophrenics. J. Speech Hearing Res. 11: 194-203 (1968).
Stassen, H.H.: Modelling affect in terms of speech parameters. Psychopathology 21: 83-88 (1988).
Stassen, H.H.: Computerized recognition of persons by spectral voice patterns. Meth. Inform. Med. (submitted, 1990).
Stassen, H.H.; Bomben, G.: Affective state and voice: reproducibility and sensitivity of speech parameters. Meth. Inform. Med. 27: 87-96 (1988a).
Stassen, H.H.; Lykken, D.T.; Propping, P.; Bomben, G.: Genetic determination of the human EEG. Survey of recent results on twins reared together and apart. Hum. Genet. 80: 165-176 (1988b).
Stassen, H.H.; Kuny, S.; Woggon, B.; Angst, J.: Affective state and voice: results of a pilot study with 6 depressive patients. Pharmacopsychiatry 22: suppl., pp. 17-22 (1989).
Szabadi, E.; Bradshaw, C.M.: Speech in depressive states; in Simpson, Psycholinguistics in clinical practice, pp. 211-252 (Irvington, New York 1980).
Szabadi, E.; Bradshaw, C.M.: Speech pause time: behavioural correlate to mood. Am. J. Psychiat. 142: 265 (1983).
Szabadi, E.; Bradshaw, C.M.; Besson, J.A.: Elongation of pause-time in speech: a simple, objective measure of motor retardation in depression. Br. J. Psychiat. 129: 592-597 (1976).
Teasdale, J.D.; Fogarty, S.J.; Williams, M.G.: Speech rate as a measure of short-term variation in depression. Br. J. soc. Psychol. 19: 271-278 (1980).
Tolkmitt, F.; Helfrich, H.; Standke, R.; Scherer, K.R.: Vocal indicators of psychiatric treatment effects in depressives and schizophrenics. J. commun. Disorders 15: 209-222 (1982).
Weintraub, W.; Aronson, H.: The application of verbal behavior analysis to the study of psychological defense mechanisms: IV. Speech patterns associated with depressive behavior. J. nerv. ment. Dis. 144: 22-28 (1967).
H.H. Stassen
Psychiatric University Hospital Zurich
Research Department
PO Box 68
CH-8029 Zurich (Switzerland)