Scand. J. Psychol. 1975, 16, 72-78

Generalization of ratings based on projective tests

LARS NYSTEDT, DAVID MAGNUSSON and EVA ARONOWITSCH
University of Stockholm, Sweden

Abstract. A multivariable-multimethod analysis was performed on six professional clinical psychologists' ratings of 38 patients on three variables: Intelligence, Ability to Establish Contact, and Control of Affect and Impulses. The judges based their ratings of the patients' performance on each of the three tests (Rorschach, TAT, and Sentence Completion) separately, as well as on all tests together. The stability coefficients and the consensus among judges were rather high but decreased as the depth of interpretation required by the response variable increased. The ratings showed both convergent and discriminant validity, but their generality was low. The results support the opinion held among clinicians that tests give different kinds of information about individuals, and that clinicians, to some extent, use the information from each test as a subvariable which is integrated into a more global variable.

The problem dealt with is the generalizability of clinical judgments based on projective tests. A test may be regarded as a standardized situation especially constructed for the purpose of eliciting a sample of a person's behavior. The clinician uses the person's test responses as a basis for inferences about the personality of the person. Because such inferences may have severe consequences for the person being judged, it is natural to investigate the generalizability of such judgments. Most studies concerning the validity of clinical judgments have explored their predictive validity and compared it with that of predictions made by a statistical equation. The results of these studies have generally shown that the validity of clinical judgments is low and that relatively simple equations could be worked out which would be as good as, if not better than, clinical prediction (see Meehl, 1954, and Sawyer, 1966, for reviews). Holt (1958, 1970) has criticized the studies summarized by Meehl and Sawyer as well as their conclusions. Korman (1968) has also summarized 40 relevant studies. In Korman's comparison there was no basis for the conclusion that statistical prediction was generally better than clinical prediction; the results rather indicated that the clinical judgments were more accurate than the statistical predictions. Thus, the conclusions about the relative validity of clinical and statistical prediction are not unambiguous.

In a clinician's daily work, judgments are often formulated in such a manner that tests of their predictive validity are not feasible. Goldberg & Werts (1966) discussed three types of inferential reliability which are applicable when the predictive validity of judgments cannot be tested. They based their discussion on Cronbach, Rajaratnam & Gleser's (1963) paper on reliability as generalizability. Goldberg and Werts distinguish between (a) generalizability over time for a judge who makes estimates of the same trait from the same data (stability), (b) generalizability over judges who make estimates of the same trait from the same data (consensus), and (c) generalizability over data sources which are administered at the same time and interpreted by the same judge (convergence). According to Goldberg (1968), the stability of judges' ratings seemed to be relatively high, whereas the convergence was low. Consensus among judges varied from very high to very low. A further analysis of the studies on consensus and convergent validity summarized by Goldberg (1968), and a corresponding analysis of studies carried out by Magnusson, Gerzén & Nyman (1968), Magnusson, Heffler & Nyman (1968), Magnusson & Heffler (1969), and Nystedt (1974), indicate that inferential reliability seems to be a function of (a) characteristics of judges, (b) characteristics of the traits being evaluated, (c) amount of test information, (d) type of information, and (e) situation variables. The purpose of the present study was to investigate the generalizability of ratings based on projective tests in terms of stability, consensus among judges, and convergent and discriminant validity.


METHOD

Tests

Three different types of projective tests were used as a basis for ratings: Rorschach (R), Sentence Completion (S), and TAT (T).

Judges

Six professional clinical psychologists were used as judges. The psychologists were trained in advanced Rorschach interpretation and used the instrument in their daily work. They had had little contact with the TAT, and they did not use the Sentence Completion Test in their daily work.

Patients

The clinical protocols were collected for 38 patients (19 men and 19 women) at a Mental Hygiene Clinic. The ages of the 38 patients varied between 18 and 64 years.

Design

The judges rated the same 38 patients on three variables on four occasions. On each occasion the judges based their ratings on a different type of test information. The ratings on each variable were made on a scale ranging from 1 to 5 points. The experimental design is shown in Table 1. On each of occasions 1 to 3, the judges based their ratings on the 38 protocols from only one test. On occasion 4, the judges based their ratings on all three tests. One and a half years later, four of the six judges rerated the same patients on the basis of all three tests. The judges were instructed first to read all the protocols before beginning their ratings, in order to get a common frame of reference. The protocols were marked by numbers. They were presented to the judges in random order for each type of information, but in the same order for each judge when the ratings were based on the same type of information. Each judge was allowed as much time as he wanted. The judges had access neither to the test protocols nor to their own ratings from earlier occasions. Each judge was asked not to discuss his ratings with anyone else.

Table 1. The experimental design (occasions 1 to 4 by judges)

Response Variables

The three response variables were chosen and defined in collaboration with the six judges. Three criteria were used in selecting them: (a) the judges should consider the variables clinically meaningful, (b) the judges should consider the variables possible to judge on the basis of each of the tests, and (c) the variables were assumed to have low intercorrelations. The variables chosen were Intelligence (I), Ability to Establish Contact (AEC), and Control of Affect and Impulses (CAI). They were assumed to range, in the order mentioned, from a low to a high degree of inference from test performance. The following definitions of the variables were used.

Intelligence. Verbal ability, perceptual ability, ability to differentiate, organization and integration, productivity and structuring, originality.

Ability to Establish Contact. Ability to establish contact and connection with other people. This variable is dependent upon the understanding of, insight into, and identification with other people.

Control of Affect and Impulses. The hypothesis is that the individual must have sufficient control in experiencing and expressing his impulses and emotions, both for adjustment to reality and for the ability to satisfy his needs. Too little control implies poor adjustment, uncontrolled outbreaks of affect, poor control of anxiety, and impulsive acting-out. On the other hand, we have the overcontrolled person, too inhibited to be able to satisfy his needs or to utilize his creative capacities.

The extremes of each scale were characterized by a series of statements, and the midpoint for I and AEC by the statement "Average in relation to the group". For CAI the midpoint was characterized by the statement "Flexibly controlled in relation to the group."

Analysis of Data

Campbell & Fiske's (1959) multitrait-multimethod analysis was used to study the convergent and discriminant validity of the judgments. The coefficients were computed as product-moment correlations. The coefficients, computed within judge for each trait and condition, were transformed to Fisher's Z and averaged, and the mean Z values were then transformed back to correlation coefficients. The consensus among judges was computed by means of Ebel's (1951) formula for intraclass correlation coefficients. The between-judges variance was included in the error variance. The stability coefficients were computed as product-moment correlations between the two rating sessions.
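Ebel's (1951) coefficient can be obtained from a one-way analysis of variance of the patients-by-judges table of ratings. Leaving the between-judges variance in the error term, as described above, amounts to using the within-patients mean square as the error. The sketch below is an illustration under that reading, not the authors' computation; because the text does not say whether Table 3 refers to a single judge or to the mean of the six judges, it returns both coefficients. The function name `ebel_icc` is hypothetical.

```python
def ebel_icc(ratings):
    """Intraclass correlations for an n-patients x k-judges table of ratings.

    Between-judges variance stays in the error term (one-way layout), so a
    judge who rates everyone one point higher lowers the coefficient.
    Returns (coefficient for a single judge, coefficient for the mean of k judges).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-patients and within-patients sums of squares.
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average


# Hypothetical example: three patients, two judges; judge 2 is always one point higher.
single, average = ebel_icc([[1, 2], [3, 4], [5, 6]])
print(round(single, 2), round(average, 2))   # 0.88 0.94
```

The two judges rank the patients identically, yet both coefficients fall below 1, which is the intended effect of counting between-judges differences as error.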

Table 2. Stability coefficients

Variable   Judge 1   Judge 2   Judge 3   Judge 6   Mean
I          .42       .67       .66       .77       .64
AEC        .68       .58       .50       .88       .69
CAI        .32       .58       .24       .78       .52


Table 3. Intraclass correlation coefficients (consensus among judges)

Method     I     AEC   CAI
R          .90   .90   .68
T          .87   .82   .62
S          .88   .85   .36
R, S, T    .87   .83   .73

RESULTS

Stability

The stability coefficients for the four judges who rerated the patients on the basis of RST after one and a half years are reported in Table 2. The coefficients indicate very high stability for one judge on all variables. The lower mean stability coefficient for CAI indicates that judgments of that variable are more difficult than judgments of the other two variables.
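The Mean column of Table 2 averages the judges' stability coefficients through Fisher's Z, as described under Analysis of Data. A minimal sketch of both steps follows: a product-moment stability coefficient computed from two rating sessions, and the Z-based average. The two-session ratings are hypothetical; the four CAI coefficients are those of Table 2. Function names are illustrative only.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation, e.g. between a judge's two rating sessions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_mean(rs):
    """Transform each r to Fisher's Z, average the Z values, transform back to r."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

# Hypothetical 1-5 ratings of five patients by one judge on two occasions.
first = [3, 4, 2, 5, 1]
second = [3, 5, 2, 4, 1]
print(round(pearson_r(first, second), 2))      # 0.9

# CAI stability coefficients of judges 1, 2, 3 and 6 in Table 2.
cai = [0.32, 0.58, 0.24, 0.78]
print(round(fisher_mean(cai), 2))              # 0.52, the tabled mean
print(round(sum(cai) / len(cai), 2))           # 0.48, the plain arithmetic mean
```

Averaging in the Z metric gives more weight to high coefficients, which is why the tabled mean exceeds the arithmetic mean of the same four values.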

Consensus

The intraclass correlation coefficients for each variable rated on the basis of each test and of the three tests together are presented in Table 3. Only for one of the variables (CAI) was the consensus among judges higher when the judgments were based on all three tests together than when they were based on only one test. In all conditions, consensus was lowest for rated CAI, whereas there were no or very small differences between the two other variables. In this respect, the results agree with those presented by Howard (1963).

Convergent and discriminant validity

Averages of the convergent and discriminant validity coefficients are presented in Table 4.

A summary of the different coefficients in Table 4 is given in Table 5. An inspection of the monovariable-heteromethod coefficients in Tables 4 and 5 indicates that the ratings had low generality. However, the monovariable-heteromethod coefficients were systematically higher than the heterovariable-heteromethod coefficients. This was true for the overall comparison as well as for each variable. The ratings thus showed some convergent validity, highest for I and lowest for CAI. The average monovariable-heteromethod coefficient was higher than the average heterovariable-monomethod coefficient, and also higher than the average heterovariable-heteromethod coefficient. The coefficients in Table 4 also show a pattern of intercorrelations among the variables that was repeated whether the same or different methods were used as a basis for ratings. These three aspects of the results support the idea of discriminant validity. Thus, although the monovariable-heteromethod coefficients were low, the analysis of the convergent and discriminant validity of the ratings indicates that the ratings had some generality.
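The comparisons above rest on sorting each coefficient of Table 4 into one of Campbell & Fiske's categories. The sketch below, an illustration and not the authors' program, classifies the 36 coefficients of Table 4 (as read from the source) and averages each category through Fisher's Z, which is assumed here to follow the averaging described under Analysis of Data. All names are hypothetical.

```python
import math
from collections import defaultdict

VARIABLES = ["I", "AEC", "CAI"]
METHODS = ["R", "T", "S"]
# Row/column order of Table 4: the three methods nested within each variable.
ORDER = [(v, m) for v in VARIABLES for m in METHODS]

# Lower triangle of Table 4 as read from the source: LOWER[i][j] (j < i) is the
# average coefficient between cell ORDER[i] and cell ORDER[j].
LOWER = [
    [],
    [0.28],
    [0.26, 0.25],
    [0.59, 0.18, 0.09],
    [0.23, 0.44, 0.09, 0.24],
    [0.10, 0.02, 0.27, 0.15, 0.13],
    [0.02, -0.07, 0.11, -0.22, -0.13, -0.01],
    [-0.11, 0.03, 0.05, -0.14, -0.18, -0.08, 0.15],
    [-0.04, 0.04, 0.10, -0.12, -0.02, -0.08, 0.13, 0.06],
]

def fisher_mean(rs):
    """Average correlations through Fisher's Z and transform back to r."""
    return math.tanh(sum(math.atanh(r) for r in rs) / len(rs))

def category(a, b):
    """Campbell-Fiske category of the coefficient between two distinct cells."""
    if a[0] == b[0]:
        return "monovariable-heteromethod"   # same variable, so the methods differ
    if a[1] == b[1]:
        return "heterovariable-monomethod"
    return "heterovariable-heteromethod"

def summarize(lower):
    """Fisher-Z means per category, plus convergent (monovariable) means per variable."""
    groups = defaultdict(list)
    convergent = defaultdict(list)
    for i, a in enumerate(ORDER):
        for j in range(i):
            kind = category(a, ORDER[j])
            groups[kind].append(lower[i][j])
            if kind == "monovariable-heteromethod":
                convergent[a[0]].append(lower[i][j])
    overall = {kind: fisher_mean(rs) for kind, rs in groups.items()}
    per_variable = {v: fisher_mean(rs) for v, rs in convergent.items()}
    return overall, per_variable

overall, per_variable = summarize(LOWER)
for kind, r in overall.items():
    print(f"{kind}: {r:.2f}")
print({v: round(r, 2) for v, r in per_variable.items()})
```

With these values the script prints overall means of .18 (monovariable-heteromethod), .12 (heterovariable-monomethod) and .01 (heterovariable-heteromethod), and convergent means of .26, .17 and .11 for I, AEC and CAI, the ordering on which the convergent and discriminant validity argument relies.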

Method variance

Results from other studies show that formal aspects of the verbal material judged by raters of personality variables (verbal fluency, word subtleness, grammatical properties, etc.) affect the ratings in a systematic way, consciously or unconsciously (see, e.g., Magnusson, 1959, 1961). The effect seems to differ between kinds of verbal material, for example between kinds of projective tests, and between kinds of variables. Such formal characteristics may cause spurious correlations between ratings of different variables from the same projective technique (heterovariable-monomethod ratings), between ratings of the same variable from different techniques (monovariable-heteromethod ratings), and between ratings of different variables from different techniques. Such correlations will thus contain irrelevant method variance. Since the ratings in this study concern personality variables and are based on verbal material from three different sources, method variance can be expected to influence the ratings. The possible effects of method variance are illustrated in Table 6. The monomethod coefficients were systematically higher than the heteromethod coefficients for I and AEC, and thus indicate a systematic method effect. A systematic but smaller method effect is indicated in the intercorrelation matrix for AEC and CAI. The coefficients in the intercorrelation matrix for I and CAI do not show any systematic method effect.

Table 4. The average of the convergent and discriminant validity coefficients

                         I                  AEC                CAI
Variable  Method    R     T     S      R     T     S      R     T     S
I         R
          T        .28
          S        .26   .25
AEC       R        .59   .18   .09
          T        .23   .44   .09    .24
          S        .10   .02   .27    .15   .13
CAI       R        .02  -.07   .11   -.22  -.13  -.01
          T       -.11   .03   .05   -.14  -.18  -.08    .15
          S       -.04   .04   .10   -.12  -.02  -.08    .13   .06

Table 5. Summary of the multimethod-multivariable matrices: average correlations

Variable  Monovariable-heteromethod  Heterovariable-heteromethod
I         .26                        .06
AEC       .17                        .02
CAI       .11                       -.04
Mean      .18                        .01

Method    Heterovariable-monomethod
R         .16
T         .11
S         .10
Mean      .12

Table 6. Average monomethod and heteromethod coefficients for each pair of response variables

                         I                  AEC
Variable  Method    R     T     S      R     T     S
AEC       R        .59*  .18   .09
          T        .23   .44*  .09
          S        .10   .02   .27*
CAI       R        .02* -.07   .11   -.22* -.13  -.01
          T       -.11   .03*  .05   -.14  -.18* -.08
          S       -.04   .04   .10*  -.12  -.02  -.08*
* Indicates monomethod coefficients.

The convergence of ratings on each test and all tests together

The results of the earlier analysis indicated some convergent and discriminant validity, but the convergent validity coefficients were generally low. One possible reason for this result is that the tests did not elicit the same kind of information about a variable. There is a common opinion among clinicians that they use different kinds of information about a person in order to confirm or modify hypotheses about the person. They use different tests because they feel that these tests complement each other. Thus, very high convergent validity for two tests would be a basis for removing one of them from the battery used by the clinician. This is in line with Fiske's (1971) opinion that measures from different tests may represent subvariables which may be included in a more global variable. It is then of interest to study how the judges used the information from the different tests when they based their judgments on all tests together. Table 7 gives the product-moment correlation coefficients between ratings of the three variables made on each test and ratings of the three variables made on the three tests together. The data were further analyzed as follows. For each judge, the multiple correlation between his ratings on each test and his ratings on all tests together was computed separately for each response variable. Table 8 shows the regression coefficients and the multiple correlations. According to Tables 7 and 8, the judges used the tests in systematically different ways. With two exceptions, the coefficients indicate that the judges paid more attention to the Rorschach test than to the other two tests.

Table 7. The average correlations between variables rated on each test separately and variables rated on the three tests together (RTS)

Table 8. The multiple correlations between ratings on each test and ratings on all tests together for each variable, and the corresponding regression coefficients

                                       Judge
Variable  Test          1      2      3      4      5      6
I         R           .538*  .521*  .470*  .683*  .614*  .315*
          T           .185   .170   .261*  .153   .184   .397*
          S           .233   .120   .219   .215*  .294*  .414*
          Multiple R  .720   .673   .700   .846   .778   .883
* Indicates tests that contributed to the regression sum of squares at the 10% level.
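The multiple correlations in Table 8 come from regressing a judge's ratings on all tests together on his ratings from R, T and S separately. The sketch below assumes that the tabled regression coefficients are standardized (beta) weights, which the text does not state, and obtains them by solving the normal equations in correlation form. The ratings in the example are hypothetical, not data from the study, and all function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                         # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def multiple_correlation(predictors, criterion):
    """Standardized regression weights and multiple R of the criterion on the predictors."""
    rxx = [[pearson_r(p, q) for q in predictors] for p in predictors]
    rxy = [pearson_r(p, criterion) for p in predictors]
    betas = solve(rxx, rxy)                  # normal equations: Rxx * beta = rxy
    r_squared = sum(b * r for b, r in zip(betas, rxy))
    return betas, math.sqrt(r_squared)

# Hypothetical 1-5 ratings of eight patients by one judge (not data from the study).
rorschach = [4, 2, 5, 3, 1, 4, 3, 2]
tat       = [3, 2, 4, 4, 2, 3, 3, 1]
sentence  = [3, 3, 4, 2, 2, 5, 2, 2]
all_tests = [4, 2, 5, 3, 1, 5, 3, 2]

betas, big_r = multiple_correlation([rorschach, tat, sentence], all_tests)
print("betas R, T, S:", [round(b, 3) for b in betas], "multiple R:", round(big_r, 3))
```

A judge who leans mainly on the Rorschach when rating from all three tests would show a large first weight and small weights for T and S, the pattern the text describes for most judges.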

Table 9. The coefficients of correlation between ratings of intelligence and objectively measured intelligence

                                     Judge
Test                   1       2       3       4       5       6
R                    .45**   .35*    .43**   .21     .34*    .54**
S                    .44**   .29     .16     .19     .32     .60**
T                    .29     .25     .21     .35*    .21     .48**
RST                  .49**   .34*    .53**   .18     .47**   .61**
RST, second attempt  .61**   .49**   .49**                   .54**
* p ≤ 0.05. ** p ≤ 0.01.
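Table 9 marks coefficients at two significance levels. With n = 38 patients, a product-moment r is tested against zero through t = r * sqrt((n - 2) / (1 - r^2)) on 36 degrees of freedom. The sketch below assumes a two-tailed test, which the text does not state; under that assumption the critical values are about r = .32 and r = .41, which agree with the asterisks in Table 9. The critical t values are hard-coded from standard t tables, and the names are illustrative.

```python
import math

N = 38                 # patients in the study
DF = N - 2             # degrees of freedom for testing r against zero
# Two-tailed critical t values for df = 36, taken from standard t tables.
T_CRIT = {"*": 2.028, "**": 2.719}

def t_for_r(r, df=DF):
    """t statistic for a product-moment correlation r with df degrees of freedom."""
    return r * math.sqrt(df / (1.0 - r * r))

def critical_r(t_crit, df=DF):
    """Smallest r reaching t_crit, from inverting t = r * sqrt(df / (1 - r^2))."""
    return t_crit / math.sqrt(t_crit * t_crit + df)

def significance_mark(r):
    """Return '**', '*' or '' in the convention of Table 9 (two-tailed test assumed)."""
    t = t_for_r(r)
    if t >= T_CRIT["**"]:
        return "**"
    if t >= T_CRIT["*"]:
        return "*"
    return ""

print(round(critical_r(T_CRIT["*"]), 2), round(critical_r(T_CRIT["**"]), 2))  # 0.32 0.41
for r in (0.45, 0.35, 0.29, 0.21):   # coefficients taken from Table 9
    print(r, significance_mark(r) or "n.s.")
```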

Ratings of intelligence and objectively measured intelligence

Each patient was tested with a general intelligence test (CVB). The product-moment correlation coefficients between objectively measured intelligence and rated intelligence are reported for each judge in Table 9. All coefficients were positive, and 18 were significantly above zero. The most valid judgments were those based on the Rorschach and on the three tests together. One of the judges was able to rate intelligence efficiently on all rating occasions.

DISCUSSION

The results presented support the hypothesis that the inferential reliability of well-trained psychologists is a function of the characteristics of the traits being evaluated, the amount of test information available, and the type of information available. The average stability coefficients (Table 2) indicated considerable stability of the judgments. This is noteworthy when one considers the relatively long time span between the two rating sessions (about 18 months). However, the average stability coefficient for ratings of CAI was much lower than those for the two other rated variables. The same trend was obtained for consensus among judges (Table 3), both for ratings based on each test separately and for ratings based on the three tests together.


The results obtained indicate that rating CAI was a difficult task. It was assumed that this variable, which is more anchored in dynamic personality theory than the other two variables, demands more theoretical knowledge and deeper interpretation. Therefore, even the agreement between judges with respect to the definition of the variable is probably lower for CAI. For single tests, the intraclass coefficients were highest for ratings based on the Rorschach test. This is probably a result of the fact that the psychologists were more trained in the Rorschach technique and used the instrument in their daily work. Possibly the judges made more systematic evaluations and judgments of the Rorschach protocols and more impressionistic judgments of the other tests. The results from the analysis of the utilization of the different tests (Table 8) support such a conclusion. The monovariable-heteromethod coefficients revealed low convergent validity (Tables 4 and 5), thus supporting results from other studies. The monomethod coefficients were, however, lower than the monovariable coefficients and also lower than monomethod coefficients obtained in some other studies (Goldberg & Werts, 1966). The low convergent coefficients and the results presented in this study about the utilization of information from different sources (Tables 7 and 8) support the general opinion held among clinicians that tests give different kinds of information about individuals, so that no high convergent validity could be expected. The tendency toward lower convergent validity for rated CAI than for the other variables agrees with results from other studies. Mischel (1968), summarizing empirical studies, concluded that convergent validity has been higher for ratings of intellectual and cognitive functions than for other personality variables.
According to psychoanalytic theory, the differences in convergent validity between the variables may be understandable because the functions differ with regard to autonomy, in the sense of freedom from conflict (Hartmann, 1958). The intellectual functions are less dependent on conflicts and more autonomous, whereas CAI is closely related to conflicts. These conflicts may be actualized to different degrees in different test situations. The indications of a method effect seem reasonable and are in accordance with results obtained in other studies (see Campbell & O'Connell, 1967). Formal aspects of an individual's verbal productivity in projective test situations are likely to be more directly and similarly related to such personality characteristics as intelligence and ability to establish contact than to control of affects and impulses.

The study was supported by a grant to Lars Nystedt from the Swedish Council for Social Science Research and to David Magnusson from the Bank of Sweden Tercentenary Fund.

REFERENCES

Campbell, D. T. & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol. Bull. 56, 81-105.
Campbell, D. T. & O'Connell, E. J. (1967). Method factors in multitrait-multimethod matrices: Multiplicative rather than additive? Multiv. Behav. Res. 2, 409-425.
Cronbach, L. J., Rajaratnam, N. & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. Brit. J. Statist. Psychol. 16, 137-163.
Ebel, R. L. (1951). Estimation of the reliability of ratings. Psychometrika 16, 407-424.
Fiske, D. W. (1971). Strategies in the search for personality constructs. J. Exp. Res. Pers. 5, 323-330.
Goldberg, L. R. (1968). Simple models or simple processes? Some research on clinical judgments. Amer. Psychologist 23, 483-496.
Goldberg, L. R. & Werts, C. E. (1966). The reliability of clinicians' judgments: A multitrait-multimethod approach. J. Consult. Psychol. 30, 199-206.
Hartmann, H. (1958). Ego psychology and the problem of adaptation. New York: International Universities Press.
Holt, R. R. (1958). Clinical and statistical prediction: A reformulation and some new data. J. Abnorm. Soc. Psychol. 56, 1-12.
Holt, R. R. (1970). Yet another look at clinical and statistical prediction: Or is clinical psychology worth-while? Amer. Psychologist 25, 337-349.
Howard, K. I. (1963). Ratings of projective test protocols as a function of degree of inference. Educ. Psychol. Measmt. 23, 267-275.
Korman, A. K. (1968). The prediction of managerial performance: A review. Personnel Psychol. 21, 295-322.
Magnusson, D. (1959). A study of ratings based on TAT. Stockholm: Almqvist & Wiksell.
Magnusson, D. (1961). The influence of verbal productivity on the scoring of an achievement test. [Rep. Psychol. Lab., Univ. Stockholm, No. 97.]
Magnusson, D., Gerzén, M. & Nyman, B. (1968). The generality of behavioral data I: Generalization from observations on one occasion. Multiv. Behav. Res. 3, 295-320.
Magnusson, D., Heffler, B. & Nyman, B. (1968). The generality of behavioral data II: Replication of an experiment on generalization from observations on one occasion. Multiv. Behav. Res. 3, 415-422.
Magnusson, D. & Heffler, B. (1969). The generality of behavioral data III: Generalization potential as a function of the number of information instances. Multiv. Behav. Res. 4, 29-42.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis: University of Minnesota Press.
Mischel, W. (1968). Personality and assessment. New York: Wiley.
Nystedt, L. (1974). Consensus among judges as a function of amount of information. Educ. Psychol. Measmt. 34, 91-101.
Sawyer, J. (1966). Measurement and prediction, clinical and statistical. Psychol. Bull. 66, 178-200.

Postal address:
L. Nystedt
Psychological Laboratories
University of Stockholm
Box 6706
S-113 85 Stockholm
Sweden
