Journal of Experimental Psychology: Human Perception and Performance 1978, Vol. 4, No. 3, 483-496

Response Bias in Category and Magnitude Estimation of Difference and Similarity for Loudness and Pitch

Bruce Schneider, University of Toronto, Toronto, Canada
Scott Parker, American University
Michael Valenti, Glenn Farrell, and Gary Kanow, Columbia University

Interval scales of sensory magnitude were derived from magnitude and category estimates of loudness differences, loudness similarities, pitch differences, and pitch similarities. In each of the four loudness experiments, a loudness scale was constructed from a nonmetric analysis of the rank order of the judgments. The four loudness scales so constructed were found to be equivalent to one another and indicated that loudness was a power function of sound pressure with an exponent of .29. A similar analysis for the four pitch experiments found the pitch scales derived in each case to be equivalent to one another and linear with the mel scale of pitch. Thus the same sensory scale formed the basis for magnitude and category estimates of differences and similarities for two distinct perceptual continua. For both pitch and loudness, these sensory scales were used to generate scales of sensory differences. A comparison of the category and magnitude estimates of sensory differences with the scale of sensory differences derived from the nonmetric analyses indicated the presence of significant response biases in both category and magnitude estimation procedures.

This research was supported by National Science Foundation Grant GB-36211 and National Research Council Grant A9952.

Requests for reprints should be sent to Bruce Schneider, Centre for Research in Human Development, Erindale College, University of Toronto, Mississauga, Ontario, Canada L5L 1C6.

Copyright 1978 by the American Psychological Association, Inc. 0096-1523/78/0403-0483$00.75

Psychological scales of sensory events have been constructed using a variety of scaling techniques. Unfortunately, scales of sensation derived from different techniques sometimes do not agree. A classic example is the well-known relationship between magnitude estimation and category estimation (Stevens & Galanter, 1957). In a magnitude estimation task, observers are asked to assign numbers to stimuli such that the ratio of any two numbers reflects the ratio of the sensory magnitudes of the corresponding stimuli. In a category estimation task they are asked to assign stimuli to categories such that the spacing among categories reflects the psychological differences among stimuli. Assuming that there is a single sensory continuum and that observers can judge both sensory intervals and sensory ratios accurately, then the numerical assignments should be linearly related. However, category scales are usually a negatively accelerated function of magnitude estimation scales (e.g., Stevens & Galanter, 1957). This nonlinearity could be due to one or more of the following factors: (a) different sensory representations for judgments of differences and ratios, (b) a failure on the part of subjects to distinguish between sensory ratios and sensory differences for some continua, and (c) response biases in one or both of the techniques that distort the true representation of the sensory continuum. All three of these explanations have their adherents in the literature. For
example, Marks (1974) has argued that we need a different perceptual representation for judgments of sensory differences than we do for judgments of sensory ratios. Torgerson (1961) and Schneider, Parker, Farrell, and Kanow (1976) have argued that observers cannot judge both sensory differences and sensory ratios for some continua. They have claimed that only one comparative perceptual relationship is defined on a sensory continuum. Thus, the numbers assigned to pairs of stimuli equally separated with respect to this perceptual relationship would depend on the instructions. In a magnitude estimation experiment, the numbers assigned to these equally separated pairs would have equal numerical ratios, whereas in a category estimation experiment, the numbers would have equal numerical differences. The end result would be a negatively accelerated function relating category to magnitude scales. Finally, some investigators have argued that response biases are responsible for the relationship between the two scales. Stevens (1971) has argued that subjects are unable to judge accurately equal intervals at different locations along a sensory continuum. According to Stevens's explanation, a constant perceptual difference seems larger near the lower end than near the higher end of the continuum. This kind of asymmetrical distortion, if it truly existed, would explain the negative acceleration of the category scale when plotted as a function of the corresponding magnitude scale.

Another potential source of response bias in magnitude estimation has been identified in a model proposed by Attneave (1962) and explored in a number of studies (e.g., Curtis, Attneave, & Harrington, 1968; Curtis & Rule, 1972; Rule, Curtis, & Markley, 1970; Rule, Laye, & Curtis, 1974). This model suggests that magnitude estimation is a two-stage process. First, there is a mapping of physical energy into psychological magnitude. Second, there is a mapping of psychological magnitude into numerical response. Both of these mappings are assumed to be power functions. Hence magnitude estimates, according to this model, are a nonlinear function of psychological magnitude.

In the present article, we explore further the role of numerical response biases in magnitude and category estimation tasks. If response biases enter into one or both of these tasks, then the observer does not accurately report the subjective magnitude of the stimulus. In most of the studies reported in the literature, the nature of the task (magnitude estimation or category estimation) is completely confounded with the stimulus relationship (ratio or difference) that is being judged. In magnitude estimation tasks, subjects are asked to base their judgments on sensory ratios, whereas in category estimation tasks, subjects are asked to judge sensory intervals and/or differences. Consequently, it is difficult to tell whether the difference in results is due to the differences between the tasks or to the differences between the perceptual relationships being judged. In the few studies (Beck & Shaw, 1967; Dawson, 1971; Marks & Cain, 1972) that have looked at magnitude estimates of psychological difference, the data set has not been complete enough to permit the response bias analyses conducted here.

In the present experiments, category estimates and magnitude estimates were obtained for judgments of loudness difference (Experiments 1 and 2), judgments of loudness similarity (Experiments 3 and 4), judgments of pitch difference (Experiments 5 and 6), and judgments of pitch similarity (Experiments 7 and 8). The intent of each pair of experiments was to determine the presence of response bias in the two techniques. If no response bias exists, for example, then the category estimates of loudness difference should be a linear function of magnitude estimates of loudness difference.
Since the same perceptual relationship (loudness difference) is being judged in both instances, any nonlinearity between the category estimates and the magnitude estimates indicates the presence of response bias. An added feature of this procedure is that it is possible to determine an interval scale of loudness from the
ordinal properties of both the magnitude estimates and the category estimates. For purposes of this analysis, the tones employed in this experiment may be conceived of as equivalent to points on a line segment. The distance between points then becomes analogous to the difference in tonal loudness. In the magnitude estimation task, the subjects were asked to estimate the magnitude of these loudness differences. Shepard (1966) has shown that provided the number of stimuli is greater than or equal to 10, the rank order of interpoint distances can be used to determine projection values along the line segment that are, for all practical purposes, unique up to addition and multiplication of a constant. Hence the rank order of the magnitude estimates can be used to determine an interval scale of loudness, provided that the magnitude estimates are at least monotonic with loudness difference. In the category estimation task, subjects were asked to assign a loudness difference to the appropriate category, where the highest category referred to the greatest loudness difference and the lowest category to the smallest loudness difference. If these assignments were monotonic with loudness difference, then the interval scale of loudness constructed from the category estimates of loudness difference should be identical to the loudness scale constructed from the magnitude estimates. Note that to obtain interval scales of loudness, the judgments of loudness differences (either magnitude or category) need only be monotonically related to loudness differences along the sensory continuum. The present study was intended to (a) show that the same perceptual scale is recovered from both kinds of scaling instructions (magnitude estimation and category estimation), (b) detect any response biases in these two procedures, and (c) show that judgments of similarity for loudness and pitch are based on the same perceptual relation, namely, loudness differences and pitch differences, respectively.
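The argument that only the rank order of the judgments matters can be illustrated with a minimal sketch. The scale values and the two response functions below are hypothetical assumptions chosen for illustration (they are not the paper's data or its fitted functions); any strictly increasing response function would behave the same way.

```python
from itertools import combinations

# Hypothetical sensory scale values for 10 tones (illustrative only).
scale = [0.0, 1.0, 2.5, 4.5, 7.0, 10.0, 13.5, 17.5, 22.0, 27.0]

# All unordered pairs of the 10 tones and their sensory differences.
pairs = list(combinations(range(len(scale)), 2))
true_diff = [scale[j] - scale[i] for i, j in pairs]

def pair_order(judgments):
    """Return pair indices sorted from smallest to largest judgment (stable sort)."""
    return sorted(range(len(judgments)), key=lambda k: judgments[k])

# Assumed response biases: two different strictly increasing functions of the
# same sensory difference. The exponents 0.6 and 1.5 are arbitrary assumptions.
magnitude_est = [10.0 * d ** 0.6 for d in true_diff]
max_diff = max(true_diff)
category_est = [1.0 + 8.0 * (d / max_diff) ** 1.5 for d in true_diff]

# Both biased response sets order the pairs exactly as the true differences do.
same_order = (
    pair_order(true_diff) == pair_order(magnitude_est) == pair_order(category_est)
)
print(len(pairs), same_order)
```

In this sketch the two sets of responses are nonlinearly related to each other, yet they carry identical ordinal information, which is the property the nonmetric analysis relies on.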

Method

Subjects

Forty undergraduate and graduate students served as subjects in these experiments. All subjects claimed to have normal hearing, and some subjects had musical training. Five subjects served in each of the eight experiments.

Apparatus

Calibrations and listening conditions were identical to those used by Carvellas and Schneider (1972) except that the subjects sat in an Industrial Acoustics Model 300 sound-resistant booth. Ten 1200-Hz tones varying in intensity [50, 56, 60, 68, 72, 80, 86, 94, 98, and 104 dB (SPL)] were used in the loudness experiments. In the pitch experiments each of the 10 tones (460, 525, 645, 760, 830, 920, 1,060, 1,130, 1,290, and 1,370 Hz) was presented at a sound pressure level of 83.3 dB.
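As a worked illustration of how the stimulus levels relate to the power function reported in the abstract, the sketch below converts the ten levels to sound pressure and applies an exponent of .29. The reference pressure of 20 micropascals is the standard definition of 0 dB SPL; the constants `a` and `b` are arbitrary assumptions, since an interval scale is fixed only up to multiplication and addition of a constant.

```python
import math

LEVELS_DB = [50, 56, 60, 68, 72, 80, 86, 94, 98, 104]  # dB SPL, loudness experiments
P_REF_PA = 20e-6   # standard reference pressure for 0 dB SPL, in pascals
EXPONENT = 0.29    # exponent reported in the abstract

def pressure_pa(level_db):
    """Convert a level in dB SPL to sound pressure in pascals."""
    return P_REF_PA * 10 ** (level_db / 20)

def loudness(level_db, a=1.0, b=0.0):
    """Power-law loudness on an interval scale; a and b are arbitrary constants."""
    return a * pressure_pa(level_db) ** EXPONENT + b

# Number of unordered pairs that can be formed from the 10 tones.
n_pairs = math.comb(len(LEVELS_DB), 2)

# Two equal 6-dB steps, one at the bottom and one at the top of the range.
low_step = loudness(56) - loudness(50)
high_step = loudness(104) - loudness(98)

# A ratio of differences does not depend on the arbitrary constants a and b.
step_ratio = high_step / low_step
print(n_pairs, round(step_ratio, 2))
```

Under these assumptions the 6-dB step at the top of the range corresponds to a loudness difference roughly five times as large as the 6-dB step at the bottom, so equal decibel separations are not equal loudness differences.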

Procedure

Category estimates of loudness difference were obtained for the 45 pairs of unequal tones constructed from the set of 10 1200-Hz tones. Each subject served in three experimental sessions. In the first session the 45 pairs of tones were presented once in irregular order. In both the second and third sessions, the same 45 pairs were presented twice, with a 10-minute break separating the presentations. Prior to the first session in Experiment 1, the subjects were instructed as follows:

This is an experiment on your perception of difference. You will hear pairs of tones. The tones in a pair will differ in loudness. Your task is to decide how different the tones in a pair are and assign them to a category between 1 and 9. The categories represent different degrees of loudness difference. Category 9 represents the greatest amount of loudness difference, while Category 1 is to be used for the smallest amount of loudness difference. At the beginning of the experiment you will hear two pairs of tones as examples. The first pair of tones will be widely separated in loudness, so it should be assigned to Category 9; the second pair will be quite close together in loudness and therefore should be assigned to Category 1. You will then hear a series of pairs of tones. You should assign each pair to the category which represents its degree of loudness difference. The categories should be considered as containing equal ranges of difference; for example, Category 5 should be used for pairs whose loudness differences are midway between those from Category 1 and those from Category 9. Remember, Category 9 represents a large loudness difference. You should use each category and feel free to use any category more than once. Are there any questions?

The tone pairs used as examples of the two extreme categories were 52 and 54 dB for Category 1, and 48 and 104 dB for Category 9. After hearing the instructions, the subject was led into the booth and shown how to wear the earphones. The subject alternated between the tones of a given pair by means of a three-position hand switch; the third position was quiet, to which he switched between trials. The subject and experimenter communicated by means of an intercom. Subjects were allowed to listen to each pair of tones as long as they wished.

In the second and third sessions, no instructions were given, but each session was preceded by identified presentations of the standard pairs. The sequences in the second and third sessions were such that each tone pair appeared twice before and twice after every other tone pair. Also, each tone appeared equally often on each operative position of the three-position switch. The second sequence of the third session was identical to that used in the first session.

For category estimation of loudness similarity, the same tone pairs were employed and the instructions were modified so that they concerned loudness similarity rather than loudness difference. The same tone pairs were employed as standards; however, the 52-54 dB pair was now labeled Category 9 (highest degree of loudness similarity) and the 48-104 dB pair was called Category 1 (lowest degree of similarity). Sometimes the subjects estimating similarity claimed that they did not understand what they were to do. They were then told to assign the pairs to categories according to "how much the tones sound alike."

The experiments on category estimation of pitch difference and pitch similarity were identical to those on loudness difference and loudness similarity, with the following exceptions: First, 10 tones varying in frequency were used to construct the 45 tone pairs. Second, the instructions were changed so that they concerned pitch instead of loudness. Third, the standard pairs were 460-525 Hz (small pitch difference, large pitch similarity) and 460-1,370 Hz (large pitch difference, small pitch similarity).

The experiments on magnitude estimation of loudness difference, loudness similarity, pitch difference, and pitch similarity were reported by Parker and Schneider (1974). The apparatus and number of subjects were the same as for the category estimation experiments. The only difference in procedure was that the subjects were given magnitude estimation instructions instead of category estimation instructions.

Results

Experiments 1-4: Loudness

In Experiment 1, subjects were asked to assign pairs of 1200-Hz tones differing in intensity to one of nine categories. They were instructed that Category 1 was reserved for pairs having the smallest loudness difference and that Category 9 was reserved for pairs having the largest loudness difference. For each subject the first of five category assignments of a tone pair was discarded. The arithmetic mean of the remaining four assignments was computed for each of the 45 tone pairs and used to rank order the tone pairs from 1 to 45, where $R_{ij} = 1$ indicates that the tone pair (i, j) was judged to have the smallest loudness difference and $R_{km} = 45$ indicates that the tone pair (k, m) was judged to have the largest loudness difference. To determine whether the five subjects were in ordinal agreement with one another, Kendall's coefficient of concordance, W (Siegel, 1956, pp. 229-238), was computed. Kendall's W was .94, indicating good ordinal agreement among the five subjects in Experiment 1.

The data analysis for Experiment 2 (magnitude estimation of loudness difference) was identical to that for Experiment 1 except that the ordinal ranking of pairs for each subject was determined from the geometric mean magnitude estimates rather than from the arithmetic mean. The geometric mean was used in this instance because the variance of magnitude estimates is known to increase with the mean estimate.

The data analysis of Experiment 3 was identical to that of Experiment 1 and the analysis of Experiment 4 was identical to that of Experiment 2, with the following exception: In Experiments 1 and 2, subjects were instructed to judge loudness difference, whereas in Experiments 3 and 4, they were instructed to judge loudness similarity. If we assume that judgments of loudness similarity are a monotonic inverse of judgments of loudness difference, then the reverse of the rank ordering of tone pairs based on similarity should be a rank ordering of tone pairs based on loudness difference. Accordingly, in Experiments 3 and 4, the rankings of tone pairs were reversed to obtain rank orders of loudness difference. Kendall's W was .94, .95, and .93 for Experiments 2, 3, and 4, respectively.
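The ranking steps described above can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes no tied ranks (so no tie correction is applied to W), and the function names are hypothetical.

```python
import math

def geometric_mean(xs):
    """Geometric mean of positive magnitude estimates."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        result[k] = rank
    return result

def reverse_ranks(r):
    """Turn a similarity ranking into a difference ranking by reversing it."""
    n = len(r)
    return [n + 1 - x for x in r]

def kendalls_w(rank_lists):
    """Kendall's coefficient of concordance for m rankings of n items (no ties)."""
    m = len(rank_lists)
    n = len(rank_lists[0])
    totals = [sum(r[i] for r in rank_lists) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical rankings of four tone pairs by three subjects (illustrative only).
subject_ranks = [[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]]
print(kendalls_w(subject_ranks))
```

W equals 1 when every subject produces the same ranking and approaches 0 as the rankings become unrelated, which is why values near .94 are read as good ordinal agreement.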

Hence ordinal agreement within each group of subjects was equally good across all four experiments. If loudness is a unidimensional experience, then the loudnesses of sounds can be represented as points on a line segment. Furthermore, in this representation the distance between any two points becomes equivalent to the loudness difference $(L_i - L_j)$ between any two tones. In each of the four experiments, a rank order of loudness differences was determined for each subject. The arithmetic mean of the ranks across the five subjects in each experiment was computed for each stimulus pair. This mean ranking provided an ordinal index of the group's perception of loudness difference. The mean rank order from each experiment was used as an input to a nonmetric scaling program (Carvellas & Schneider, 1972). This program produces numerical assignments (projection values, $P_i$s, along a line segment) for each of the 10 tones such that the differences in numerical assignments between all pairs of tones best predict the ranks of the 45 pairs of loudness differences used as input for the program. Furthermore, these projection values are unique up to addition and multiplication by a constant; that is, these values constitute interval scale measurement. Stress, Kruskal's (1964) measure of goodness of fit, was computed for each of the four experiments. Stress measures the discordance between the predicted distances, $d$s, and a set of distances, $\hat{d}$s, that are (a) monotonically related to the rank ordering of pairs that served as input to the program and (b) as much like the $d$s as they can be within the restrictions imposed by Condition a. Stress is given by [Σ(d −
