Psychological Reports, 1990, 67, 511-514.

© Psychological Reports 1990

UPDATE ON THE PARALLEL ANALYSIS CRITERION FOR DETERMINING THE NUMBER OF PRINCIPAL COMPONENTS¹

A. B. SILVERSTEIN

MRRC-Lanterman Developmental Center Research Group, UCLA School of Medicine

Summary.-Recent developments in parallel analysis with unities in the diagonal are reviewed, and the application of the parallel analysis criterion is illustrated with three examples. It is shown that the results of various approaches do not always agree. Investigators are encouraged to employ the parallel analysis criterion, along with one or more other criteria, in deciding on the number of principal components.

In a previous note I gave a brief history of parallel analysis and illustrated the application of the parallel analysis criterion with three examples (Silverstein, 1987). My purpose was to encourage investigators to employ this as one criterion for determining the number of common factors or principal components. The historical section of that note ended with the development by Allen and Hubbard (1986) of a set of regression equations for predicting the latent roots of random data matrices with unities in the diagonal. With their equations, investigators who selected the principal component model could quite easily apply the parallel analysis criterion to their data. In the past year or so, however, there have been several further developments, and so an update seems called for.

To place these recent developments in context, two features of Allen and Hubbard's equations should be noted. First, like the earlier equations of Montanelli and Humphreys (1976), who preferred the common factor model, Allen and Hubbard's equations actually predict the means of the latent roots of random data matrices, obtained from a Monte Carlo study. Second, unlike Montanelli and Humphreys' equations, which involve only the number of variables and the sample size, Allen and Hubbard's equations include, for every root beyond the first, a term involving the immediately preceding root. They are, in that sense, recursive.

Longman, Cota, Holden, and Fekken (1989b) have developed an alternative set of regression equations that are simpler than Allen and Hubbard's and that yielded more accurate results. Specifically, their equations include, in addition to the terms involving the number of variables and the sample size, a term involving both of these variables, a product or interaction term. What may prove to be a more important feature of their equations, however, is that they are not recursive. Longman, et al. also introduced a new wrinkle.
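The criterion that all of these regression equations approximate can be stated directly: retain a component only as long as its real-data latent root exceeds the mean of the corresponding roots of random data of the same order. A minimal Monte Carlo sketch in Python/NumPy follows; it illustrates the criterion itself, not any particular set of regression equations, and the function name and replication count are my own choices:

```python
import numpy as np

def parallel_analysis_means(data, n_reps=100, seed=0):
    """Parallel analysis with unities in the diagonal: compare the
    latent roots of the real correlation matrix with the mean roots
    of same-sized random data matrices (Monte Carlo sketch)."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Latent roots of the real data, largest first.
    real_roots = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    # Latent roots of random normal data of the same order.
    random_roots = np.empty((n_reps, p))
    for i in range(n_reps):
        x = rng.standard_normal((n, p))
        random_roots[i] = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]
    mean_random = random_roots.mean(axis=0)
    # Count the leading roots that exceed their random-data means.
    n_components = 0
    for real, rand in zip(real_roots, mean_random):
        if real <= rand:
            break
        n_components += 1
    return n_components, real_roots, mean_random
```

Read in these terms, the regression-equation approaches are fast approximations to `mean_random` that avoid the simulation loop altogether.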
By chance alone, a latent root extracted from real data will be larger than the mean of the corresponding roots extracted from random data about 50% of the time. To counteract this dependence on chance, Longman, et al. also developed regression equations for predicting the 95th percentiles of the roots of random data matrices with unities in the diagonal. They presented regression coefficients (for both means and 95th percentiles) for the first 33 roots based on a Monte Carlo study using from 5 to 50 variables and sample sizes ranging from 50 to 500, but expressed the opinion that the 95th percentile approach has more potential.

¹This study was supported in part by National Institute of Child Health and Human Development Research Grant No. HD-04612. Requests for reprints should be sent to A. B. Silverstein, P.O. Box 100-R, Pomona, CA 91769.

Lautenschlager, Lance, and Flaherty (1989) took a different tack. By including a term involving the immediately preceding root in their equations, Allen and Hubbard had obtained excellent results for all but the first root, but in practice, precisely because their equations are recursive, the accuracy of the results for subsequent roots is necessarily contingent on the accuracy of the results for the first root. By augmenting Allen and Hubbard's equations with a term involving both the number of variables and the sample size, the reciprocal of the subjects-to-variables ratio (cf. Longman, et al., 1989b), Lautenschlager, et al. obtained more accurate results not only for the first latent root but also for several subsequent roots. Note, however, that their equations are still recursive. They presented regression coefficients for the first 48 roots based on a Monte Carlo study using from 5 to 50 variables and sample sizes ranging from 50 to 1,000.

But the most recent development may just make regression estimation methods, or at least those involving recursive equations, obsolete. Lautenschlager (1989) observed that for certain combinations of numbers of variables and sample sizes, both Allen and Hubbard's equations and those of Lautenschlager, et al. produce unreasonable results, for example, latent roots that increase after a point. [Longman, et al. (1989b) also noted that in some cases Allen and Hubbard's equations yield anomalous results.] Lautenschlager then proceeded to offer another option for applying the parallel analysis criterion. Specifically, he presented tables giving the mean roots from principal component analyses of 17,000 random data matrices using from 5 to 80 variables and sample sizes ranging from 50 to 2,000 (in essence, the raw data from which Lautenschlager, et al. developed their equations) and demonstrated that simple linear interpolation in those tables yielded more accurate results than either Allen and Hubbard's equations or those of Lautenschlager, et al.

A word is in order about the ways in which the relative accuracy of the various approaches was assessed in these studies: (a) by comparing the correlations between the roots predicted by the different approaches and the mean roots actually obtained from a Monte Carlo study, and/or (b) by evaluating the deviations of the predicted roots from the obtained roots. Given that the purpose is to provide a criterion for deciding on the number of principal components, however, would it not be a good idea to determine how well the various approaches do in indicating the number of components in data sets of known dimensionality (i.e., plasmodes)?
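The 95th-percentile variant can likewise be sketched by simulation: instead of the mean of each random root across replications, one takes its 95th percentile, so that a real root is retained only when it exceeds a value that chance alone would reach no more than 5% of the time. The sketch below is my own simulation of this quantity; Longman, et al. estimate these percentiles by regression rather than by simulating anew for each data set:

```python
import numpy as np

def random_root_percentiles(n, p, n_reps=500, q=95, seed=0):
    """q-th percentiles of the latent roots of random data correlation
    matrices with unities in the diagonal: a Monte Carlo sketch of the
    quantity the 95th-percentile regression equations predict."""
    rng = np.random.default_rng(seed)
    roots = np.empty((n_reps, p))
    for i in range(n_reps):
        x = rng.standard_normal((n, p))
        roots[i] = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]
    return np.percentile(roots, q, axis=0)
```

Because each replication's roots are sorted, the percentile curve is itself non-increasing, and in practice it lies above the corresponding curve of means for the early roots, which is what makes the approach "more conservative."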

THREE EXAMPLES

To illustrate further the application of the parallel analysis criterion for determining the number of principal components, I have again chosen the three examples employed by Gorsuch (1983) throughout his text on factor analysis: the box problem, the ability problem, and the physique problem. My purpose is not to assess the relative accuracy of the various approaches but simply to show that in practice their results do not always agree.

TABLE 1
LATENT ROOTS FOR REAL AND RANDOM DATA

[The numerical entries of Table 1 are not legible in this copy. For each problem (Box, roots 1-2; Ability, roots 1-5; Physique, roots 1-3), the table lists the real latent roots alongside the random-data roots obtained by methods (1)-(5) described in the note below.]

Note.-Real roots taken from Table 8.3.1 in Gorsuch (1983). Random roots obtained as follows: (1) calculated by means of Equation 1 in Allen and Hubbard (1986); (2) and (3) calculated by means of Equation 3 in Longman, et al. (1989b), using coefficients for means and 95th percentiles, respectively; (4) calculated by means of Equation 2 in Lautenschlager, et al. (1989); (5) calculated using linear interpolation in Tables 2-5 in Lautenschlager (1989).
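Method (5), interpolation in Lautenschlager's tables, amounts to ordinary linear interpolation between tabled sample sizes (and, where needed, between tabled numbers of variables). A sketch with hypothetical table values; the numbers below are illustrative placeholders, not values from Lautenschlager's (1989) actual tables:

```python
import numpy as np

# Hypothetical fragment of a Lautenschlager-style table: mean first
# latent roots of random data for a fixed number of variables at two
# tabled sample sizes.  Placeholder values for illustration only.
tabled_n = np.array([100.0, 200.0])    # tabled sample sizes
tabled_root1 = np.array([1.70, 1.50])  # hypothetical mean first roots

# Linearly interpolate for an untabled sample size, n = 150,
# which falls halfway between the two tabled entries.
estimated_root1 = np.interp(150.0, tabled_n, tabled_root1)
```

Here the estimate falls halfway between the two tabled roots, about 1.60.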

Table 1 gives the first several latent roots for the real and random data of the three problems, the former taken directly from Gorsuch and the latter calculated by means of the equations developed by Allen and Hubbard, Longman, et al., and Lautenschlager, et al., and by using linear interpolation in Lautenschlager's tables. For both the box problem and the physique problem the results of the various approaches are in complete agreement: all five indicate one component for the former, which clearly underestimates the dimensionality of the boxes, and two components for the latter, which some have argued is the appropriate number of dimensions for describing physique (Gorsuch, 1983, p. 13). For the ability problem the results of the various approaches disagree to some extent, and so are more interesting. Both Allen and Hubbard's equations and those of Longman, et al. for predicting means suggest three components but do not unequivocally reject four, since the roots for the real and random data are equal. The equations of Longman, et al. for predicting 95th percentiles, which are necessarily "more conservative" than their equations for predicting means, indicate only two (or just possibly three) components. In contrast, both the equations of Lautenschlager, et al. and interpolation in Lautenschlager's tables indicate four components, the number of dimensions generally accepted for the ability problem.

As suggested earlier, these very limited results should be taken neither as an endorsement nor an indictment of any of the approaches. Lautenschlager advised investigators against using "current recursive regression estimation methods" (viz., Allen and Hubbard's and those of Lautenschlager, et al.), but is it not possible that future recursive methods or current nonrecursive methods (e.g., that of Longman, et al.) would fare better than the method that Lautenschlager now espouses? This is just one of a number of questions that calls for further research. Until more definitive answers are forthcoming, I would still encourage investigators who select the principal component model to apply the parallel analysis criterion, probably by using the equations of Longman, et al. and/or Lautenschlager's tables,² but they would be well advised to employ one or more additional criteria as well.

REFERENCES

ALLEN, S. J., & HUBBARD, R. (1986) Regression equations for the latent roots of random data correlation matrices with unities on the diagonal. Multivariate Behavioral Research, 21, 393-398.
GORSUCH, R. L. (1983) Factor analysis. (2nd ed.) Hillsdale, NJ: Erlbaum.
HAYS, R. D. (1987) PARALLEL: a program for performing parallel analysis. Applied Psychological Measurement, 11, 58.
LAUTENSCHLAGER, G. J. (1987) PARANAL: a program for estimating parallel analysis criteria for principal components analysis. (Author, University of Georgia)
LAUTENSCHLAGER, G. J. (1989) A comparison of alternatives to conducting Monte Carlo analyses for determining parallel analysis criteria. Multivariate Behavioral Research, 24, 365-395.
LAUTENSCHLAGER, G. J., LANCE, C. E., & FLAHERTY, V. L. (1989) Parallel analysis criteria: revised equations for estimating the latent roots of random data correlation matrices. Educational and Psychological Measurement, 49, 339-345.
LONGMAN, R. S., COTA, A. A., HOLDEN, R. R., & FEKKEN, G. C. (1989a) PAM: a double-precision FORTRAN routine for the parallel analysis method in principal components analysis. Behavior Research Methods, Instruments, & Computers, 21, 477-480.
LONGMAN, R. S., COTA, A. A., HOLDEN, R. R., & FEKKEN, G. C. (1989b) A regression equation for the parallel analysis criterion in principal components analysis: means and 95th percentile eigenvalues. Multivariate Behavioral Research, 24, 59-69.
MONTANELLI, R. G., JR., & HUMPHREYS, L. G. (1976) Latent roots of random data correlation matrices with squared multiple correlations on the diagonal: a Monte Carlo study. Psychometrika, 41, 341-348.
SILVERSTEIN, A. B. (1987) Note on the parallel analysis criterion for determining the number of common factors or principal components. Psychological Reports, 61, 351-354.

Accepted August 29, 1990.

²Computer programs are now available for the various regression estimation methods: PARALLEL for Allen and Hubbard's (Hays, 1987); PAM for those of Longman, et al. (Longman, Cota, Holden, & Fekken, 1989a); and PARANAL for those of Lautenschlager, et al. (Lautenschlager, 1987). Copies of Lautenschlager's tables are also available on diskette directly from him (Lautenschlager, 1989).
