Behavior Research Methods. DOI 10.3758/s13428-014-0499-2

On the number of factors to retain in exploratory factor analysis for ordered categorical data

Yanyun Yang & Yan Xia

© Psychonomic Society, Inc. 2014

Abstract Conducting exploratory factor analysis (EFA) with statistical extraction methods has been recommended, but little is known about the accuracy of decisions regarding the number of factors to retain for ordered categorical item data when a chi-square test and fit indices are considered together with conventional criteria such as eigenvalue >1 and parallel analysis. With computer-generated data, the authors examined the accuracy of decisions regarding the number of factors to retain for categorical item data by combining these pieces of information, using the weighted least squares with mean and variance adjustment estimation method based on polychoric correlations. A chi-square difference test was also conducted to compare nested EFA models. The results showed that the eigenvalue >1 criterion resulted in too many factors, in general. The chi-square test, chi-square difference test, fit indices, and parallel analysis performed reasonably well when the number of scale points was four, the number of items was 24, the sample size was at least 200, and the categorical distributions were similar across items. However, parallel analysis had a tendency toward factor underextraction when the correlation among factors was .50, particularly for two-point and 12-item scales.

Keywords Exploratory factor analysis · Polychoric correlations · Eigenvalue >1 · Parallel analysis · Chi-square test · Fit indices

Y. Yang · Y. Xia
Florida State University, Tallahassee, FL, USA

Y. Yang (*)
Educational Psychology and Learning Systems, Florida State University, Box 306-4453, Tallahassee, FL 32306-4453, USA
e-mail: [email protected]

Exploratory factor analysis (EFA) is a commonly used tool to understand the factor structure underlying a set of measured

variables. As the name implies, EFA does not require an a priori hypothesis about which measured variables are good indicators of which factor. Instead, a data-driven approach is taken to determine a factor structure that not only best reproduces the interrelationships among the measured variables (Gorsuch, 1983) but also provides a meaningful interpretation of the latent factors. Ideally, a measured variable loads mainly on the targeted factor(s), giving a clear factor structure with a factor complexity of one. However, it is very common that a measured variable also has nonzero loading(s) on untargeted factor(s). Instead of fixing the nonzero cross-loadings to zero, as in a typical confirmatory factor analysis (CFA), retaining nonzero cross-loadings in the measurement model is more reasonable. Both simulation studies and empirical data analyses have shown that under these circumstances, fixing the small nonzero cross-loadings to zero using CFA models results in distorted structural relations and overestimates of factor correlations (Asparouhov & Muthén, 2009; Marsh, Liem, Martin, Morin, & Nagengast, 2011; Marsh et al., 2009). Browne (2001) suggested that when a restrictive CFA model demonstrates poor fit, instead of relying on modification indices to improve the model-data fit in post-hoc analysis, an exploratory factor analysis with a proper rotation method is preferable. A similar recommendation was made by Gorsuch (1997, p. 536), that "EFA is an appropriate alternative to attempting to adjust the confirmatory model" when the CFA model fails to provide adequate fit.

In applied research, EFA has mainly been used for so-called less consequential purposes (Conway & Huffcutt, 2003); preliminary evaluation of, or searching for, scale dimensionality is one such use. In a review of empirical EFA studies in organizational research, Conway and Huffcutt (2003) found that about 78 % of the studies used EFA for less consequential purposes. In addition, 62 % of the studies used nonstatistical extraction methods (Kaplan, 2009), such as principal component analysis and principal axis factoring. A


greater percentage, 78 %, of empirical EFA studies using nonstatistical extraction methods was found in psychological research (Henson & Roberts, 2006). For nonstatistical extraction methods, several criteria are used to help make the decision pertaining to the number of factors to retain, including the eigenvalue-greater-than-one rule (eigenvalue >1; Kaiser, 1960), confidence intervals for eigenvalues (Larsen & Warne, 2010), the scree test (Cattell, 1966), parallel analysis (Horn, 1965), minimum average partial correlations (Velicer, 1976), percentages of the total variance explained, extracted communality, residual matrices, and substantive interpretation of the factor structure; some criteria are used more often than others. A review of empirical EFA studies revealed that the criterion of eigenvalue >1 was overly relied on, with about 19 % of EFA studies in psychological research using eigenvalue >1 as the sole criterion (Fabrigar, Wegener, MacCallum, & Strahan, 1999), and similar percentages have been reported in reviews of organizational research, with 22 % in 1986 (Ford, MacCallum, & Tait, 1986) and 18 % in 2003 (Conway & Huffcutt, 2003).

EFA can also be used for more consequential purposes (Conway & Huffcutt, 2003): Developing a hypothesized model (and then testing the model with CFA using a new random sample), refining an instrument's scales, and testing a hypothesis (e.g., examining whether a scale demonstrates the same factor structure across subpopulations or measurement occasions) are some examples (Conway & Huffcutt, 2003). Conducting an unrestricted EFA model as the first step in the four-step procedure to assess the fit of a full structural model (Mulaik & Millsap, 2000) is another example. For these purposes, using statistical extraction methods (e.g., maximum likelihood) is preferred, so that various fit indices and standard errors of parameter estimates (e.g., factor loadings, factor correlations) can be used to assess the quality of the model. The more consequential uses of EFA have been reported less frequently in the published research. In the review by Conway and Huffcutt, developing a new scale and hypothesis testing constituted only 7.3 % and 4.3 %, respectively, of the empirical EFA studies; in addition, only 3.4 % of the studies used statistical extraction methods, and the percentage was zero in psychological research (Fabrigar et al., 1999).

Two reasons may explain such rare use of statistical extraction methods in EFA. On the one hand, the rapid methodological development in CFA and the implementation of these methodologies in commercial statistical software promote the use of CFA over EFA. A major advantage of conducting CFA is the availability of fit indices and standard errors of the parameter estimates. However, as we discussed earlier, EFA is more appropriate than CFA under some circumstances. On the

other hand, methodological development in EFA appears relatively slow, and the implementation of these methodologies in commercial statistical software is limited. For example, although methods to obtain standard errors for rotated solutions have been developed (Jennrich, 2002; Jennrich & Sampson, 1966; see also Asparouhov & Muthén, 2009), they are not available in commonly used statistical programs (e.g., SPSS). During the last decade, statistical extraction methods for conducting EFA have been implemented in the well-known program Mplus (L. K. Muthén & Muthén, 1998–2012) and in the program Comprehensive Exploratory Factor Analysis (CEFA; Browne, Cudeck, Tateneni, & Mels, 2008). More recently, Asparouhov and Muthén (2009) proposed an exploratory structural equation modeling (ESEM) framework allowing for EFA measurement components in a larger structural equation model (SEM). With these new methodological developments and the accessibility of these methods in statistical software, using statistical extraction methods to conduct EFA should receive greater attention in empirical EFA studies (Schmitt, 2011).

An important decision to make when conducting an EFA is the number of factors to retain. With statistical methods, one can make this decision by examining not only the commonly used criteria, such as eigenvalue >1 and parallel analysis, but also a chi-square test and fit indices (note that sample eigenvalues, the chi-square statistic, and fit indices are provided in Mplus when conducting EFA). Numerous studies have shown that eigenvalue >1 yields too many factors and that parallel analysis gives the most accurate number of factors to retain, and thus that parallel analysis should be promoted (e.g., Dinno, 2009; Glorfeld, 1995; Schmitt, 2011; Zwick & Velicer, 1986). The majority of these studies have used normally distributed continuous data; only a few studies have considered nonnormally distributed or categorical data. Dinno (2009) and Glorfeld (1995) concluded that parallel analysis is robust to nonnormal distributions of the random data. Timmerman and Lorenzo-Seva (2011) demonstrated that when using the minimum rank factor analysis extraction method, polychoric correlations are preferred to Pearson correlations for parallel analysis when the item data are ordered categorical. Weng and Cheng (2005) found that parallel analysis yielded quite accurate decisions regarding the number of factors retained for binary data, but their study considered only a one-factor model. Cho, Li, and Bandalos (2009) evaluated the accuracy of parallel analysis for ordinal item data. They showed that parallel analysis based on polychoric correlations tended to underestimate the number of factors when the correlation among the factors was large. On the other hand, the performance and appropriateness of the chi-square test and of fit indices in model evaluation have been well studied in CFA (e.g., Hu & Bentler, 1999) but rarely studied in EFA, and no study


that we are aware of has considered combining evidence from a chi-square test, fit indices, and the commonly used criteria (e.g., eigenvalue >1, parallel analysis). Asparouhov and Muthén (2009) and Marsh et al. (2011; Marsh et al., 2009) demonstrated applications of EFA within the ESEM framework using empirical data. Models were evaluated and compared on the basis of a chi-square test, the comparative fit index (CFI), the root mean square error of approximation (RMSEA), the chi-square difference test, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). With computer-generated data, Asparouhov and Muthén also examined the performance of the chi-square test, CFI, RMSEA, the standardized root mean square residual (SRMR), and the chi-square difference test in evaluating EFA models. They indicated that when fit indices are used in conjunction with a chi-square test, one can avoid overrejecting the true EFA models. However, only continuously distributed data were considered. Marsh et al. (2009) argued that the chi-square test and fit indices in EFA may perform differently from those in CFA or SEM, because the number of parameters estimated in EFA models tends to be large, particularly when competing EFA models are compared (e.g., two-factor vs. three-factor models). For example, for a 24-item measurement consisting of continuous data, the numbers of freely estimated model parameters for one-factor, two-factor, and three-factor models are 72, 95, and 117, respectively. Consequently, the differences in the numbers of estimated parameters are 23 between the one-factor and two-factor models and 22 between the two-factor and three-factor models.

To support the promotion of EFA/ESEM in modeling, we examined the accuracy of decision making about the number of factors to retain in EFA models by considering the chi-square test, fit indices, eigenvalue >1, and parallel analysis for ordered categorical items. We also conducted the chi-square difference test to evaluate competing EFA models that differed in their numbers of factors. In this article, we focus only on ordered categorical item data, for two reasons. First, categorical measured variables (e.g., five-point Likert-type scales) are frequently used in survey research in the psychology and education disciplines. Second, no simulation study has been conducted to examine accuracy in determining the number of factors to retain by combining the chi-square test, fit indices, chi-square difference test, eigenvalue >1, and parallel analysis for ordered categorical item data.

It has been well documented that Pearson correlations computed from ordered categorical data underestimate the relationships among the measured variables, due to the loss of information caused by categorization (e.g., Bollen & Barb, 1981). Unfortunately, the most commonly used statistical software, SPSS, uses Pearson correlations to conduct EFA for categorical item data. Many researchers have recommended the use of polychoric

correlations (tetrachoric correlations for two-point scales) in factor analysis for categorical item data (e.g., Finney & DiStefano, 2006). Holgado-Tello, Chacón-Moscoso, Barbero-García, and Abad (2010) conducted a simulation study to examine the superiority of analyzing the polychoric correlation matrix relative to the Pearson correlation matrix in factor analysis for categorical items. They found that analyses based on the polychoric correlation matrix were more likely to recover the true model. Cho et al. (2009) conducted parallel analysis based on the polychoric and Pearson correlations. They found that parallel analysis based on the polychoric and Pearson correlations performed similarly when the eigenvalues were compared to those from random data sets with the same type of correlations (i.e., polychoric correlations vs. random polychoric correlations, or Pearson correlations vs. random Pearson correlations). In this article, we investigated EFA models based on the polychoric correlation matrix with the weighted least squares with mean and variance adjustment method (WLSMV; B. O. Muthén, 1984), also called the diagonally weighted least squares method (DWLS; Christoffersson, 1977). Below we describe in detail the design factors, as well as the rationale for the conditions manipulated in the simulation study.
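To make the attenuation point above concrete, the short simulation below is a minimal sketch (not part of the original study): it draws bivariate normal data with a known latent correlation, dichotomizes both variables at illustrative thresholds, and shows that the Pearson correlation of the resulting two-point items is noticeably smaller than the latent correlation that a tetrachoric/polychoric coefficient targets. The latent correlation, thresholds, and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2014)
rho = 0.50               # latent correlation between the two continuous variables
n = 100_000              # large n so sampling error is negligible
tau1, tau2 = 0.0, 1.0    # illustrative thresholds (one symmetric, one skewed item)

# Bivariate normal data with the specified latent correlation
cov = np.array([[1.0, rho], [rho, 1.0]])
x = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# Dichotomize each variable at its threshold (two-point items)
y1 = (x[:, 0] > tau1).astype(int)
y2 = (x[:, 1] > tau2).astype(int)

pearson_latent = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
pearson_items = np.corrcoef(y1, y2)[0, 1]

print(f"latent (continuous) correlation : {pearson_latent:.3f}")   # close to .50
print(f"Pearson correlation of 0/1 items: {pearson_items:.3f}")    # clearly below .50
```

A tetrachoric (polychoric) correlation estimated from the 2 × 2 table of y1 and y2 would instead target the latent value of .50, which is why the EFA models in this study are fit to polychoric rather than Pearson correlations.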

Method

To decide the levels of each design factor for data generation, we considered mainly two of the most recent review articles (Conway & Huffcutt, 2003; Henson & Roberts, 2006) summarizing empirical EFA studies in social and behavioral research. Levels of design factors were chosen so that they would represent typical scenarios encountered in empirical studies.

Design factors for data generation

EFA model In a review of empirical EFA studies in psychological research (Henson & Roberts, 2006), the median number of extracted factors was three (ranging from one to seven), and the median number of variables factored was 20 (ranging from 5 to 110). Thus, we chose two models, both measuring three factors, for data generation. Model 1 contained 12 items, and Model 2 consisted of 24 items. This resulted in ratios of the number of variables to the number of factors of four and eight; 75 % of the empirical EFA studies reviewed in Conway and Huffcutt (2003) were within this range. For both models, continuous item data with multivariate normal distributions were generated according to the factor-analytic model, such that

Σ = ΛΦΛ′ + Ψ,

where Σ is the covariance matrix among the continuous items, Λ denotes the factor-loading matrix, Φ indicates the covariance matrix among the factors, and Ψ is the covariance matrix among the residuals. For Model 1 (12 items), the loading matrix was defined, in transposed form (items in columns, factors in rows), as

Λ′ = [ .7  .7  .7  .7  .3   0   0   0   0   0   0  .3 ]
     [  0   0   0  .3  .7  .7  .7  .7  .3   0   0   0 ]
     [ .3   0   0   0   0   0   0  .3  .7  .7  .7  .7 ]

In other words, for the model with 12 items, Items 1–4, 5–8, and 9–12 primarily measured the first, the second, and the third factor, respectively, with loadings of .70. In addition, the following items had cross-loadings of .30 on other factors: Items 5 and 12 on the first factor, Items 4 and 9 on the second factor, and Items 1 and 8 on the third factor. Values of .70 and .30 were chosen to represent the typical cutoffs used to determine important and trivial factor loadings in EFA. For the model with 24 items (Model 2), the pattern of factor loadings was the same, but the number of items associated with each factor was doubled.

Three levels of covariance among the factors were considered. The three covariance matrices among the factors, Φ, for both models were defined as

Φ1 = [ 1  0  0 ; 0  1  0 ; 0  0  1 ],
Φ2 = [ 1  .25  .25 ; .25  1  .25 ; .25  .25  1 ], and
Φ3 = [ 1  .5  .5 ; .5  1  .5 ; .5  .5  1 ].

The variance of each factor was 1, and the correlation among the three factors, ρFF', was 0, .25, or .50, representing independent factors, medium-correlated factors, or highly correlated factors in EFA. The residual covariance matrix Ψ had all nondiagonal elements of zero. The diagonal elements of the matrix were

Model 1: diag(Ψ) = [.21, .51, .51, .21, .21, .51, .51, .21, .21, .51, .51, .21]
Model 2: diag(Ψ) = [.21, .21, .51, .51, .51, .51, .21, .21, .21, .21, .51, .51, .51, .51, .21, .21, .21, .21, .51, .51, .51, .51, .21, .21]

The residual variances were chosen such that the variance of each item was 1. Both factors and residuals followed normal distributions. For simplicity and without loss of generality, the mean of each variable was fixed at zero.

Sample size (n) For each data generation model, two sample sizes were considered: for Model 1, with 12 items, n = 100 and 200; for Model 2, with 24 items, n = 200 and 400. As a result, the ratios of sample size to the number of measured variables factored were 8.33 (100/12 = 200/24 = 8.33) and 16.67 (200/12 = 400/24 = 16.67). The median ratio of 11 (Henson & Roberts, 2006) was within this range.

Number of scale points (c) We considered two types of response categories: a two-point scale and a four-point scale. Scales with five or more categories were not included in this study because numerous studies (e.g., Green et al., 1997; Finney & DiStefano, 2006) have shown that it is less problematic to treat ordered categorical data as if they are continuous when the number of scale points is at least five.

Thresholds (τ) After continuous data with the desired interitem correlations were generated, thresholds (τ) were applied to the continuous data to create ordered categorical data. For the two-point scales, three sets of thresholds were chosen to represent different scenarios regarding the similarity of thresholds across items:

• Symmetric thresholds: τ = {0} for all items, yielding symmetric distributions with probabilities of [.50, .50] for the two categories.
• Asymmetric thresholds: τ = {1} for all items, yielding positively skewed distributions with probabilities of [.84, .16] for the two categories.
• Mixed thresholds: For Model 1, τ = {–1, –1, 1, 1, –1, –1, 1, 1, –1, –1, 1, 1}. The distributions of the categorical data for Items 1–2, 5–6, and 9–10 were skewed in the opposite direction from the distributions of the other items. For Model 2, such a pattern was repeated once, such that Items 1–4, 9–12, and 17–20 were skewed in the opposite direction from the distributions of the other items.

For four-point scales, we also considered three sets of thresholds generating different types of categorical distributions:

• Symmetric thresholds: τ = {–1, 0, 1} for all items, yielding symmetric distributions with probabilities of [.16, .34, .34, .16] for the four categories.
• Asymmetric thresholds: τ = {0, 0.75, 1.5} for all items, yielding positively skewed distributions with probabilities of [.50, .27, .16, .07] for the four categories.
• Mixed thresholds: For Model 1, τ = {–1.5, –0.75, 0} for Items 1–2, 5–6, and 9–10 and τ = {0, 0.75, 1.5} for all the other items, such that the distributions of one set of items were skewed in the opposite direction from the distributions of the other set of items. The probabilities for the four categories yielded from τ = {–1.5, –0.75, 0} and τ = {0, 0.75, 1.5} were [.07, .16, .27, .50] and [.50, .27, .16, .07], respectively. For Model 2, the distributions of the categorical data for Items 1–4, 9–12, and 17–20 were skewed in the opposite direction from the distributions of the other items.
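The category probabilities quoted above follow directly from the standard normal distribution function evaluated at the thresholds; the worked example below is added here for clarity and is not part of the original text. With thresholds τ1 < τ2 < ... < τ(c−1) and τ0 = −∞, τc = +∞,

P(category k) = Φ(τk) − Φ(τk−1).

For the symmetric four-point set τ = {–1, 0, 1}: Φ(–1) ≈ .16, Φ(0) = .50, and Φ(1) ≈ .84, giving probabilities of .16, .34, .34, and .16, which reproduces [.16, .34, .34, .16] above.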

The total number of conditions for data generation was 72 (2 models × 3 levels of factor correlations × 2 sample sizes × 2 types of scale point × 3 sets of thresholds). For each condition, 2,000 data sets were generated.
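For readers who wish to reproduce the data-generation step, the sketch below is illustrative only (it is not the authors' original SAS code; the seed and the choice of Model 1 with ρFF' = .25 and symmetric four-point thresholds are assumptions made for the example). It draws multivariate normal item scores from Σ = ΛΦΛ′ + Ψ and then cuts them at the thresholds to obtain ordered categorical responses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Model 1 loading matrix (12 items x 3 factors), as described in the Method section
L = np.zeros((12, 3))
L[0:4, 0] = L[4:8, 1] = L[8:12, 2] = 0.7     # primary loadings of .70
L[4, 0] = L[11, 0] = 0.3                     # cross-loadings on factor 1 (Items 5, 12)
L[3, 1] = L[8, 1] = 0.3                      # cross-loadings on factor 2 (Items 4, 9)
L[0, 2] = L[7, 2] = 0.3                      # cross-loadings on factor 3 (Items 1, 8)

rho = 0.25                                   # factor correlation (Phi_2 condition)
Phi = np.full((3, 3), rho) + (1 - rho) * np.eye(3)

Sigma = L @ Phi @ L.T                        # common part of the item covariance matrix
np.fill_diagonal(Sigma, 1.0)                 # residual variances set so item variances equal 1

n = 200
x_cont = rng.multivariate_normal(np.zeros(12), Sigma, size=n)   # continuous item data

# Symmetric four-point thresholds: tau = {-1, 0, 1} for every item
tau = np.array([-1.0, 0.0, 1.0])
x_cat = np.digitize(x_cont, tau)             # ordered categories coded 0, 1, 2, 3

print(x_cat[:5])                             # first five simulated response vectors
```

In the study proper, 2,000 such data sets were generated per condition in SAS and then analyzed in Mplus; the sketch only mirrors the generation step for a single data set.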

Data analysis

Each generated data set was analyzed with three models: a two-factor model (underspecified), a three-factor model (correctly specified), and a four-factor model (overspecified). All models were examined on the basis of the polychoric correlation matrix with the WLSMV estimation method using Mplus 7.0 (Muthén & Muthén, 1998–2012). The total number of conditions for data analysis was 216 (72 conditions for data generation × 3 EFA models). To evaluate the competing EFA models using the chi-square difference test, we used DIFFTEST, which is available in Mplus 7.0. Because DIFFTEST can only be conducted for a single data set and is not available in conjunction with the Monte Carlo procedure in Mplus 7.0, we generated the data in SAS, and then the 2,000 replications of EFA analyses were achieved by having SAS 9.2 call DOS to run Mplus (Gagné & Furlow, 2009). Because model parameters were not of interest in this study, all EFA models were conducted using the default Geomin rotation method.

For each of the generated data sets, we obtained eigenvalues and the 95 % confidence intervals (CIs) of each eigenvalue based on the polychoric correlation matrix. The CIs of the eigenvalues were estimated using the mathematical method provided in Larsen and Warne (2010). The means of the eigenvalues and the means of the corresponding 95 % CIs across the 2,000 data sets were computed for each condition. Then the eigenvalue >1 rule was used to determine the number of retained factors.

To conduct parallel analysis, for each data set 200 random samples with continuous variables, the desired sample size (e.g., 200), number of variables (e.g., 12), and zero population correlations among the variables were generated; this procedure resulted in 400,000 (i.e., 2,000 × 200) random samples for each condition. Then the thresholds (as specified in the earlier section) were applied to create categorical data. Next, eigenvalues were obtained on the basis of the polychoric correlation matrix for each random sample. The median of the eigenvalues across the 200 random samples was then compared with the eigenvalues from each of the data sets in the corresponding condition in the simulation study. The first sample eigenvalue that fell below the corresponding median eigenvalue from the random samples was marked, and the number of sample eigenvalues larger than the marked one determined the number of retained factors.

Analysis of outcome variables

For each condition, the EFA models were evaluated by computing the following indices in SAS:

• The means and 95 % CIs for the first five eigenvalues
• The numbers of factors retained on the basis of the eigenvalue >1 rule and parallel analysis
• Model rejection rates based on the chi-square test with the nominal alpha level of .05
• Rejection rates based on the chi-square difference test for nested models (e.g., two-factor model vs. three-factor model), with the nominal alpha level of .05
• The means of fit indices for each model and the mean differences in fit indices for nested models
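The two decision rules just described can be expressed compactly in code. The sketch below is a minimal illustration, not the authors' SAS/Mplus implementation: it applies the eigenvalue >1 rule and the parallel-analysis rule to a single data set, and for brevity it uses Pearson correlations of the categorized data and continuous random comparison data, whereas the study applied the item thresholds to the random samples as well and computed polychoric correlations throughout. A polychoric routine would therefore have to be substituted for np.corrcoef to match the procedure described above.

```python
import numpy as np

def n_factors_eigenvalue_rule(R):
    """Number of eigenvalues of the correlation matrix R that exceed 1."""
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))

def n_factors_parallel_analysis(data, n_random=200, rng=None):
    """Horn-style parallel analysis: retain the leading eigenvalues that exceed
    the median eigenvalues from random data of the same size with zero
    population correlations (Pearson correlations used here for simplicity)."""
    rng = np.random.default_rng(rng)
    n, p = data.shape
    sample_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    random_eigs = np.empty((n_random, p))
    for r in range(n_random):
        z = rng.standard_normal((n, p))          # uncorrelated random data
        random_eigs[r] = np.sort(np.linalg.eigvalsh(np.corrcoef(z, rowvar=False)))[::-1]
    median_eigs = np.median(random_eigs, axis=0)

    # Mark the first sample eigenvalue that falls below the random median;
    # the number of eigenvalues before it is the number of factors retained.
    below = np.nonzero(sample_eigs <= median_eigs)[0]
    return int(below[0]) if below.size else p

# Example with the categorical data x_cat generated in the previous sketch:
# print(n_factors_eigenvalue_rule(np.corrcoef(x_cat, rowvar=False)))
# print(n_factors_parallel_analysis(x_cat))
```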

Results

Factor retention based on eigenvalue >1 and parallel analysis

Tables 1 and 2 present the means of the first five eigenvalues (λ1 to λ5) based on the polychoric correlation matrix, and the means of the corresponding 95 % CIs, across the 2,000 data sets for each condition with 12 items and 24 items, respectively. Table 3 shows the detailed results of factor retention based on eigenvalue >1. As can be summarized from these three tables, the means of the first three eigenvalues were greater than 1 for all conditions, and the means of the corresponding 95 % CIs did not cover the value of 1. For the analysis with 12 items, eigenvalue >1 accurately

Table 1 Means [with 95 % confidence intervals] of the first five eigenvalues for 12 items

c  n    τ    λ1               λ2               λ3               λ4               λ5

ρFF' = 0
2  100  Sym  4.00[2.89,5.11]  2.66[1.92,3.39]  2.06[1.49,2.63]  0.87[0.63,1.11]  0.68[0.49,0.87]
2  100  Asy  4.07[2.94,5.20]  2.81[2.03,3.59]  2.05[1.48,2.62]  1.10[0.80,1.41]  0.80[0.58,1.03]
2  100  Mix  3.47[2.51,4.44]  2.45[1.77,3.13]  1.87[1.35,2.39]  1.30[0.94,1.65]  0.99[0.72,1.27]
2  200  Sym  3.92[3.16,4.69]  2.56[2.06,3.06]  2.11[1.70,2.53]  0.75[0.60,0.90]  0.62[0.50,0.74]
2  200  Asy  3.91[3.15,4.68]  2.72[2.19,3.25]  2.11[1.70,2.52]  0.93[0.75,1.11]  0.72[0.58,0.86]
2  200  Mix  3.68[2.96,4.40]  2.50[2.01,2.99]  1.99[1.60,2.39]  1.06[0.85,1.27]  0.82[0.66,0.98]
4  100  Sym  3.95[2.86,5.05]  2.59[1.87,3.31]  2.08[1.50,2.65]  0.71[0.52,0.91]  0.59[0.43,0.75]
4  100  Asy  3.96[2.86,5.06]  2.61[1.89,3.34]  2.07[1.50,2.64]  0.75[0.54,0.96]  0.61[0.44,0.78]
4  100  Mix  3.96[2.86,5.06]  2.62[1.89,3.34]  2.09[1.51,2.66]  0.76[0.55,0.98]  0.62[0.45,0.79]
4  200  Sym  3.89[3.13,4.66]  2.51[2.02,3.00]  2.14[1.72,2.56]  0.64[0.52,0.77]  0.56[0.45,0.66]
4  200  Asy  3.90[3.13,4.66]  2.53[2.04,3.03]  2.13[1.71,2.55]  0.67[0.54,0.80]  0.57[0.46,0.68]
4  200  Mix  3.90[3.14,4.67]  2.53[2.03,3.02]  2.14[1.72,2.56]  0.68[0.55,0.81]  0.58[0.46,0.69]

ρFF' = .25
2  100  Sym  5.22[3.77,6.67]  2.10[1.52,2.69]  1.58[1.15,2.02]  0.84[0.61,1.07]  0.66[0.47,0.84]
2  100  Asy  5.17[3.73,6.60]  2.36[1.70,3.01]  1.65[1.20,2.11]  1.02[0.74,1.31]  0.75[0.54,0.95]
2  100  Mix  4.52[3.27,5.77]  2.04[1.47,2.60]  1.56[1.13,2.00]  1.18[0.85,1.51]  0.91[0.66,1.16]
2  200  Sym  5.18[4.16,6.19]  1.99[1.60,2.37]  1.61[1.29,1.92]  0.73[0.59,0.88]  0.61[0.48,0.72]
2  200  Asy  5.14[4.13,6.15]  2.14[1.72,2.56]  1.62[1.30,1.94]  0.87[0.70,1.04]  0.67[0.54,0.81]
2  200  Mix  4.90[3.94,5.86]  1.95[1.56,2.33]  1.52[1.22,1.82]  0.99[0.79,1.18]  0.77[0.62,0.92]
4  100  Sym  5.21[3.77,6.66]  2.00[1.45,2.56]  1.58[1.14,2.02]  0.70[0.50,0.89]  0.57[0.41,0.73]
4  100  Asy  5.21[3.76,6.65]  2.04[1.47,2.60]  1.57[1.14,2.01]  0.73[0.53,0.93]  0.59[0.43,0.75]
4  100  Mix  5.22[3.78,6.67]  2.03[1.47,2.60]  1.59[1.15,2.03]  0.74[0.54,0.95]  0.60[0.43,0.77]
4  200  Sym  5.17[4.16,6.19]  1.92[1.55,2.30]  1.62[1.30,1.94]  0.64[0.51,0.76]  0.55[0.44,0.65]
4  200  Asy  5.17[4.15,6.18]  1.95[1.56,2.33]  1.61[1.30,1.93]  0.66[0.53,0.79]  0.56[0.45,0.67]
4  200  Mix  5.18[4.17,6.20]  1.94[1.56,2.32]  1.62[1.30,1.94]  0.67[0.54,0.80]  0.56[0.45,0.67]

ρFF' = .50
2  100  Sym  6.32[4.57,8.07]  1.60[1.16,2.05]  1.19[0.86,1.53]  0.80[0.57,1.01]  0.62[0.45,0.79]
2  100  Asy  6.24[4.51,7.97]  1.84[1.33,2.36]  1.32[0.95,1.69]  0.92[0.67,1.18]  0.68[0.49,0.87]
2  100  Mix  5.48[3.95,6.99]  1.68[1.22,2.16]  1.31[0.96,1.69]  1.04[0.75,1.32]  0.81[0.58,1.03]
2  200  Sym  6.30[5.06,7.53]  1.48[1.19,1.77]  1.18[0.95,1.41]  0.71[0.57,0.85]  0.58[0.46,0.69]
2  200  Asy  6.26[5.03,7.48]  1.61[1.30,1.93]  1.21[0.98,1.45]  0.81[0.65,0.96]  0.63[0.50,0.75]
2  200  Mix  5.93[4.77,7.09]  1.49[1.20,1.79]  1.17[0.95,1.41]  0.90[0.72,1.07]  0.71[0.57,0.85]
4  100  Sym  6.32[4.57,8.08]  1.48[1.07,1.89]  1.16[0.84,1.48]  0.68[0.49,0.87]  0.55[0.40,0.71]
4  100  Asy  6.32[4.56,8.07]  1.51[1.09,1.93]  1.16[0.84,1.48]  0.71[0.51,0.91]  0.56[0.41,0.72]
4  100  Mix  6.34[4.58,8.09]  1.51[1.09,1.93]  1.17[0.84,1.49]  0.72[0.52,0.92]  0.58[0.42,0.73]
4  200  Sym  6.30[5.06,7.53]  1.41[1.13,1.68]  1.18[0.95,1.41]  0.62[0.50,0.74]  0.53[0.43,0.64]
4  200  Asy  6.29[5.06,7.53]  1.43[1.15,1.71]  1.17[0.94,1.40]  0.64[0.52,0.77]  0.54[0.43,0.65]
4  200  Mix  6.30[5.07,7.54]  1.43[1.15,1.71]  1.18[0.95,1.41]  0.65[0.52,0.78]  0.55[0.44,0.65]

c, the number of scale points; n, sample size; τ, thresholds; λ, eigenvalue

retained three factors for 83.4 % to 100 % of the data sets across all conditions, with ten exceptions (i.e., 28 % of the conditions); all were associated with two-point scales and nonsymmetric distributions for the categorical items. For these ten conditions, the mean of the fourth eigenvalue was either greater than 1 (e.g., mean λ4 = 1.04 for the condition with ρFF' = .50) or smaller than but close to 1 (e.g., mean λ4 = .92 and mean λ4 = .90 for the conditions with ρFF' = .50). Correspondingly, eigenvalue >1 gave more than three factors for a considerably large percentage of data sets (16.0 % to 98.3 %) for these ten conditions. Note that the majority of the conditions associated with two-point scales with 12 items had the means of the 95 % CIs for the fourth eigenvalue covering the value of 1, and four

Table 2 Means [with 95 % confidence intervals] of the first five eigenvalues for 24 items

c

n

τ

λ1

λ2

λ3

λ4

λ5

ρFF' = 0 2

200

2

400

4

200

4

400

Sym Asy Mix Sym Asy Mix Sym Asy Mix Sym Asy Mix

7.50[6.03,8.98] 7.41[5.96,8.86] 6.96[5.60,8.32] 7.41[6.38,8.43] 7.38[6.35,8.40] 7.25[6.24,8.25] 7.43[5.98,8.89] 7.43[5.97,8.88] 7.45[5.99,8.91] 7.39[6.37,8.42] 7.40[6.37,8.42] 7.40[6.38,8.43]

4.66[3.75,5.57] 4.96[3.99,5.94] 4.46[3.59,5.34] 4.53[3.90,5.16] 4.72[4.07,5.37] 4.48[3.86,5.10] 4.60[3.70,5.50] 4.64[3.73,5.55] 4.63[3.72,5.53] 4.48[3.85,5.10] 4.50[3.88,5.12] 4.49[3.87,5.11]

3.90[3.14,4.67] 3.95[3.18,4.72] 3.68[2.96,4.41] 3.99[3.44,4.55] 3.98[3.43,4.54] 3.90[3.36,4.45] 3.93[3.16,4.71] 3.92[3.15,4.69] 3.94[3.17,4.71] 4.02[3.46,4.57] 4.00[3.45,4.56] 4.01[3.46,4.57]

1.02[0.82,1.22] 1.39[1.12,1.66] 1.63[1.31,1.94] 0.85[0.73,0.97] 1.09[0.94,1.24] 1.23[1.06,1.39] 0.80[0.65,0.96] 0.86[0.69,1.02] 0.87[0.70,1.04] 0.71[0.61,0.80] 0.74[0.64,0.84] 0.75[0.65,0.86]

0.90[0.72,1.07] 1.18[0.95,1.41] 1.35[1.09,1.61] 0.77[0.66,0.87] 0.95[0.82,1.08] 1.05[0.91,1.20] 0.72[0.58,0.86] 0.76[0.61,0.91] 0.78[0.62,0.93] 0.65[0.56,0.74] 0.68[0.58,0.77] 0.69[0.59,0.78]

ρFF' = .25 2

200

Sym Asy Mix

10.02[8.06,11.98] 9.94[7.99,11.89] 9.42[7.58,11.27]

3.51[2.82,4.20] 3.79[3.04,4.53] 3.32[2.67,3.97]

2.89[2.32,3.45] 2.92[2.34,3.49] 2.68[2.16,3.21]

1.00[0.80,1.19] 1.30[1.04,1.55] 1.55[1.24,1.85]

0.88[0.71,1.05] 1.10[0.88,1.32] 1.28[1.03,1.53]

2

400

4

200

4

400

Sym Asy Mix Sym Asy Mix Sym Asy Mix

10.01[8.62,11.40] 9.96[8.58,11.34] 9.78[8.42,11.13] 9.99[8.03,11.95] 9.99[8.03,11.94] 10.00[8.04,11.96] 9.99[8.61,11.38] 9.98[8.60,11.37] 9.99[8.61,11.38]

3.37[2.90,3.84] 3.52[3.03,4.01] 3.32[2.86,3.78] 3.42[2.75,4.09] 3.46[2.78,4.14] 3.45[2.77,4.12] 3.30[2.85,3.76] 3.33[2.87,3.79] 3.32[2.86,3.78]

2.93[2.52,3.33] 2.92[2.51,3.32] 2.84[2.45,3.24] 2.90[2.33,3.47] 2.89[2.32,3.46] 2.91[2.34,3.48] 2.94[2.53,3.35] 2.93[2.53,3.34] 2.94[2.53,3.35]

0.84[0.72,0.95] 1.03[0.88,1.17] 1.18[1.02,1.34] 0.79[0.64,0.95] 0.84[0.68,1.01] 0.86[0.69,1.02] 0.70[0.60,0.80] 0.73[0.63,0.83] 0.74[0.64,0.85]

0.75[0.65,0.86] 0.90[0.77,1.02] 1.02[0.88,1.16] 0.71[0.57,0.85] 0.75[0.60,0.90] 0.76[0.61,0.91] 0.65[0.56,0.73] 0.67[0.58,0.76] 0.68[0.59,0.77]

Sym Asy Mix Sym Asy Mix Sym Asy Mix

12.28[9.88,14.69] 12.17[9.79,14.56] 11.52[9.26,13.78] 12.23[10.53,13.92] 12.19[10.50,13.88] 11.95[10.29,13.61] 12.28[9.87,14.68] 12.26[9.86,14.67] 12.29[9.88,14.70]

2.46[1.98,2.94] 2.69[2.17,3.22] 2.36[1.90,2.83] 2.35[2.02,2.67] 2.47[2.12,2.81] 2.32[2.00,2.64] 2.36[1.89,2.82] 2.39[1.92,2.86] 2.38[1.92,2.85]

2.00[1.61,2.40] 2.07[1.66,2.48] 1.91[1.53,2.28] 2.01[1.73,2.29] 2.02[1.74,2.30] 1.95[1.68,2.22] 1.98[1.59,2.37] 1.98[1.59,2.37] 1.99[1.60,2.38]

0.98[0.79,1.17] 1.22[0.98,1.46] 1.46[1.17,1.74] 0.83[0.71,0.94] 0.99[0.85,1.12] 1.15[0.99,1.31] 0.78[0.63,0.94] 0.82[0.66,0.99] 0.85[0.68,1.01]

0.86[0.69,1.02] 1.03[0.83,1.23] 1.21[0.97,1.44] 0.74[0.64,0.84] 0.86[0.74,0.98] 0.99[0.85,1.13] 0.70[0.57,0.84] 0.74[0.59,0.88] 0.75[0.61,0.90]

Sym Asy Mix

12.23[10.53,13.92] 12.23[10.53,13.92] 12.23[10.54,13.93]

2.28[1.96,2.60] 2.30[1.98,2.62] 2.30[1.98,2.62]

2.02[1.74,2.30] 2.00[1.73,2.28] 2.02[1.74,2.30]

0.69[0.60,0.79] 0.72[0.62,0.82] 0.74[0.63,0.84]

0.64[0.55,0.73] 0.66[0.57,0.75] 0.67[0.58,0.77]

ρFF' = .50 2

200

2

400

4

200

4

400

c, the number of scale points; n, sample size; τ, thresholds; λ, eigenvalue

of these conditions even had the means of the 95 % CIs for the fifth eigenvalues include the value of 1. As compared to the analysis with 12 items, when the number of items increased to 24, more conditions had means of the fourth (42 % of the conditions) and the fifth (22 % of the conditions) eigenvalues that were greater than or close to 1, and 56 % and 39 % of the conditions had means of the 95 %

CIs for the fourth and fifth eigenvalues, respectively, that included the value of 1. The majority of these conditions were again associated with two-point scales. As a result, a great percentage of replications (39 % to 100 %) retained too many factors on the basis of eigenvalue >1. However, when the scale points increased to four with a sample size of at least 200, eigenvalue >1 resulted in very accurate decisions (93.2 %

Table 3 Factor retention (as percentages) based on eigenvalue >1 and parallel analysis (PA)

12 Items

24 Items

λ> 1 τ

c

ρFF' = 0 2 Sym Asy Mix 2 Sym Asy Mix 4 Sym Asy Mix 4 Sym

λ> 1

PA

PA

n

3

3

n

3

3

100

0 0 0 0 0 0 0 0 0 0

88.0 29.1 1.7 99.9 73.4 34.6 100 99.3 99.3 100

12.0 70.9 98.3 0.1 26.6 65.4 0 0.7 0.7 0

1.1 12.4 23.9 0 0.6 2.0 0 0.2 0 0

98.9 83.0 65.8 100 98.9 96.0 100 99.8 100 100

0 4.6 10.3 0 0.5 2.0 0 0 0 0

200

0 0 0 0 0 0 0 0 0 0

40.3 0.1 0 99.7 15.7 0.5 99.9 98.3 98.1 100

59.7 99.9 100 0.3 84.3 99.5 0.1 1.7 1.9 0

0 0 0 0 0 0 0 0 0 0

100 100 97.3 100 100 100 100 100 100 100

0 0 2.7 0 0 0 0 0 0 0

0 0

100 100

0 0

0 0

100 100

0 0

0 0

100 100

0 0

0 0

100 100

0 0

0.1 0.5 0.1 0 0.3 0.1 0 0.1 0.1 0 0 0

91.8 46.8 9.3 100 86.4 57.2 99.9 99.5 99.3 100 100 100

8.1 52.7 90.6 0 13.3 42.7 0.1 0.4 0.6 0 0 0

33.0 58.2 79.4 5.5 31.5 54.0 11.2 20.5 17.0 0.2 1.0 0.6

67.0 40.9 19.2 94.5 68.5 45.9 88.8 79.5 83.0 99.8 99.0 99.4

0 0.9 1.4 0 0 0.1 0 0 0 0 0 0

200

0 0 0 0 0 0 0 0 0 0 0 0

52.7 0.8 0 99.9 39.6 1.2 99.9 99.3 98.4 100 100 100

47.3 99.2 100 0.1 60.4 98.8 0.1 0.7 1.6 0 0 0

0 2 2.7 0 0 0 0 0 0 0 0 0

100 97.9 96.3 100 100 100 100 100 100 100 100 100

0 0.1 1.0 0 0 0 0 0 0 0 0 0

13.5 6.5

83.5 63.2

3 30.3

94.9 94.7

5.1 5.3

0 0

200

0 0

61 4.6

39 95.4

12.8 62.9

87.2 37.1

0 0

1.4 9 10.6 9.3 14.4 16.4 13.7 4.4 6.8 4.8

40.2 90.9 84.6 74.7 85.6 83.4 86.2 95.6 93.2 95.2

58.4 0.1 4.8 16.0 0 0.2 0.1 0 0 0

99.2 90.2 96.2 99.7 90.7 93.6 93.5 75.5 82.1 81.9

0.8 9.8 3.9 0.3 9.3 6.4 6.5 24.5 17.9 18.1

0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0

0 99.9 59.1 4.2 99.9 99.6 99.0 100 100 100

100 0.1 40.9 95.8 0.1 0.4 1 0 0 0

89.4 0.2 12.6 16.3 0.8 2.9 2.1 0 0 0

10.5 99.8 87.4 83.7 99.2 97.1 97.9 100 100 100

0.1 0 0 0 0 0 0 0 0 0

200

100

200

Asy Mix

400

200

400

ρFF' = .25 2

2

4

4

Sym Asy Mix Sym Asy Mix Sym Asy Mix Sym Asy Mix

100

Sym Asy

100

200

100

200

400

200

400

ρFF' = .50 2

2

4

4

Mix Sym Asy Mix Sym Asy Mix Sym Asy Mix

200

100

200

400

200

400

c, the number of scale points; n, sample size; τ, thresholds; λ, eigenvalue; >3, =3, <3, the number of factors retained was greater than, equal to, or less than three

to 100 %), and the eigenvalue >1 criterion was less likely to overestimate the number of factors. These findings suggested that if eigenvalue >1 was chosen as the sole criterion, it was more likely to retain too many factors when the number of items was large, the number of scale points was few, the sample


size was small, thresholds varied across items, and the correlation among factors was small.

Table 3 also shows the factor retention based on parallel analysis. It appears that the performance of parallel analysis greatly depended on the magnitude of the correlation among factors and the number of items. First, when the factors were independent in the population, parallel analysis gave very accurate decisions about the number of factors to retain: More than 96 % of analyses yielded three factors, with two exceptions. The exceptions were associated with two-point scales with 12 items, a sample size of 100, and nonsymmetric categorical distributions. For these two conditions, 12.4 % and 23.9 % of analyses, respectively, suggested too few factors. As the number of items increased to 24, nearly 100 % of analyses yielded three factors for all conditions. In general, parallel analysis performed better than the eigenvalue >1 criterion. Second, when the correlation of factors increased to .25, greater percentages of analyses resulted in too few factors when the number of items was 12 (up to 79.4 % of the analyses produced fewer than three factors). The percentage of underextraction increased with smaller sample size and greater dissimilarity of thresholds across items. For the 24-item scales, parallel analysis still performed very well, and better than the eigenvalue >1 criterion. Third, when the correlation of factors increased to .50, the majority of the analyses (75.5 % to 99.7 %) suggested too few factors when the number of items was 12. For these conditions, eigenvalue >1 performed better than parallel analysis. When the number of items increased to 24, the performance of parallel analysis improved, but this depended on the number of scale points, sample size, and the similarity of thresholds across items. Parallel analysis performed very well for four-point scales (more than 97.1 % of the analyses retained three factors), but was less satisfactory for two-point scales, particularly when the thresholds differed across items (e.g., 89.4 % and 16.3 % of the analyses retained fewer than three factors for n = 200 and n = 400, respectively).

Decision based on chi-square test and fit indices

Before summarizing the chi-square test and fit indices for EFA models, we identified the replications with inadmissible solutions, including model non-convergence and improper solutions. In general, the sample size, number of scale points, and heterogeneity of thresholds across items did not impact the rate of admissible solutions. Analyses encountered fewer inadmissible solutions when the number of items was larger and the models were correctly specified or overspecified. For the analysis with 12 items, more than 93.1 % of the underspecified models, 90 % of the correctly specified models, and 70 % of the overspecified models converged to proper solutions. For the analysis with 24 items, more than

98 % of the models converged to proper solutions across all conditions. Replications with inadmissible solutions were excluded from further analysis.

Tables 4 and 5 present, for conditions with 12 items and 24 items, respectively, the rejection rates based on the chi-square test with an alpha level of .05 and the means of two fit indices: RMSEA and the weighted root mean square residual (WRMR). We also computed the means of the CFI, but decided not to report them because the CFI performed similarly to or not as well as RMSEA in the model evaluation. Note that the two-factor model was the underspecified model, so we expected high rejection rates based on the chi-square test, high RMSEAs, and high WRMRs. Both the three-factor and four-factor models were correctly specified or overspecified models, so the corresponding rejection rates should ideally be at or lower than 5 %, and the RMSEA and WRMR were expected to be low. As expected, the chi-square test was sensitive to sample size, particularly when the number of scale points was two. Three additional observations can be made from Tables 4 and 5. First, the rejection rates based on the chi-square test for the two-factor models were satisfactory (≥86.9 %) when the number of scale points was four or when the number of items was 24. For two-point scales with 12 items, the two-factor model was rejected too rarely, particularly when the sample size was 100 and the correlation among the factors was .50 (48.1 %, 14.7 %, and 14.3 % for the symmetric, asymmetric, and mixed conditions, respectively). In other words, the chi-square test showed considerably low power to reject the underspecified model for these conditions. The rejection rates were also satisfactory in general for three-factor models (≤13.0 %, with the majority of them lower than 5 %) and four-factor models (≤0.4 %), except for several conditions with 24 items, two-point scales, and mixed thresholds. For these conditions, the rejection rate was very high for the three-factor model (26.0 % to 92.1 %), and high for the four-factor model (7.8 % to 38.6 %). Second, RMSEA did not demonstrate sufficient power to reject the two-factor model when the number of scale points was two and the correlation among factors was .50 (mean RMSEAs ≤ .06 for most of these conditions, indicating reasonable fit; Hu & Bentler, 1999), but it performed better as the number of scale points increased to four and/or the correlation among factors decreased (all mean RMSEAs ≥ .050, and all but six mean RMSEAs ≥ .060). Third, WRMR showed patterns of detecting the underspecified two-factor model that were different from those of RMSEA. The mean of the WRMRs was smaller than 1 (indicating failing to reject the model; Yu, 2002) for all conditions with 12 items and a correlation among factors of .50, but it increased considerably when the number of items was 24 (0.786 ≤ mean WRMRs ≤ 2.758). For three-factor and four-factor models, both RMSEA and WRMR suggested adequate model-data fit.
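For reference, because the RMSEA cutoff above is used throughout the Results, the standard definition of the index is reproduced here; this formula is not stated in the original text (Mplus computes it from the WLSMV-adjusted chi-square, and some programs use N rather than N − 1 in the denominator):

RMSEA = √[ max(χ² − df, 0) / (df (N − 1)) ].

Values near zero thus indicate that the model chi-square exceeds its degrees of freedom by little relative to the sample size, which helps explain why the underspecified two-factor model can still yield RMSEA < .06 when categorization weakens the observed misfit.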

Table 4 Rejection rates based on the chi-square test (as percentages) and means of the root mean square error of approximation (RMSEA) and weighted root mean square residual (WRMR) for exploratory factor analysis (EFA) models for 12 items

Two-Factor Model

Three-Factor Model

Four-Factor Model

n

τ

%

RMSEA

WRMR

%

RMSEA

WRMR

%

RMSEA

WRMR

2

100

2

200

4

100

4

200

Sym Asy Mix Sym Asy Mix Sym Asy Mix Sym Asy Mix

99.7 74.4 63.8 100 99.7 98.3 100 100 100 100 100 100

.126 .075 .068 .137 .084 .080 .189 .172 .170 .200 .180 .179

.998 .809 .832 1.369 1.008 1.058 1.176 1.093 1.091 1.668 1.510 1.504

3.0 1.8 9.4 3.4 4.4 13.0 1.9 2.3 4.1 2.7 3.3 4.3

.019 .019 .028 .014 .015 .022 .015 .016 .020 .012 .012 .013

.396 .433 .546 .398 .440 .558 .293 .310 .335 .289 .302 .326

0.1 0.1 0.4 0.4 0.1 0.7 0.1 0.0 0.4 0.2 0.2 0.2

.007 .005 .008 .004 .004 .006 .004 .004 .006 .003 .003 .003

.282 .296 .373 .286 .309 .387 .207 .218 .238 .206 .216 .233

Sym Asy Mix Sym Asy

92.3 44.7 36.3 100 92.5

.093 .057 .050 .104 .065

.794 .683 .726 1.046 .821

2.1 2.0 4.9 3.5 2.9

.018 .018 .023 .013 .014

.391 .411 .515 .396 .417

0.2 0.2 0.4 0.3 0.3

.006 .007 .006 .004 .004

.277 .285 .356 .282 .296

Mix Sym Asy Mix Sym Asy Mix

83.5 100 99.7 99.7 100 100 100

.058 .149 .132 .129 .161 .142 .139

.869 .901 .836 .840 1.252 1.136 1.132

11.1 1.7 1.6 3.0 3.1 3.2 3.9

.022 .014 .016 .017 .012 .011 .013

.558 .293 .305 .333 .290 .299 .329

0.9 0.1 0.1 0.2 0.2 0.3 0.2

.006 .003 .004 .005 .003 .003 .003

.389 .206 .214 .235 .207 .213 .236

Sym Asy Mix Sym Asy Mix Sym Asy Mix Sym

48.1 14.7 14.3 96.0 55.1 43.2 96.3 90.2 86.9 100

.058 .038 .033 .070 .044 .039 .103 .090 .086 .117

.631 .587 .630 .779 .660 .740 .662 .624 .634 .889

1.7 0.8 1.5 3 1.7 7.3 1.3 1.6 2 2.8

.015 .015 .013 .012 .011 .018 .013 .014 .015 .011

.388 .393 .453 .398 .405 .535 .299 .308 .338 .297

0.1 0 0.1 0.1 0.1 0.4 0.1 0.2 0.1 0.3

.005 .006 .004 .003 .003 .005 .003 .004 .004 .003

.275 .275 .318 .284 .288 .376 .211 .217 .240 .213

Asy Mix

100 100

.102 .098

.813 .817

2.8 3.7

.011 .012

.303 .335

0.1 0.3

.003 .003

.215 .240

c ρFF' = 0

ρFF' = .25 2

100

2

200

4

100

4

200

ρFF' = .50 2

100

2

200

4

100

4

200

c, the number of scale points; n, sample size; τ, thresholds; %, rejection rate based on χ2 test.

Model selection based on chi-square difference test and difference in fit indices

The two-factor, three-factor, and four-factor models were nested such that the model with a smaller number of factors

was nested within the model with a greater number of factors. Nested models are commonly compared via the chi-square difference test (Δχ2). If the more complex model fits significantly better than the simpler one, the more complex model is adopted. Theoretically, when the model is misspecified, the

Table 5 Rejection rates based on the chi-square test (as percentages) and means of the RMSEA and WRMR for EFA models for 24 items

Two-Factor Model

Three-Factor Model

Four-Factor Model

c

n

τ

%

RMSEA

WRMR

%

RMSEA

WRMR

%

RMSEA

WRMR

ρFF' = 0 2

200

2

400

4

200

4

400

Sym Asy Mix Sym Asy Mix Sym Asy Mix Sym Asy Mix

100 100 100 100 100 100 100 100 100 100 100 100

.108 .066 .072 .117 .073 .078 .153 .138 .139 .160 .144 .146

1.532 1.172 1.276 2.131 1.519 1.676 1.947 1.772 1.771 2.758 2.487 2.477

1.5 3.7 53.1 2.3 2.2 26.0 1.3 0.6 1.4 3.0 2.0 2.3

.009 .013 .029 .006 .007 .015 .007 .007 .008 .005 .006 .006

.522 .580 .757 .522 .566 .718 .407 .424 .452 .404 .419 .448

0.1 0.3 7.8 0.1 0.0 2.3 0.0 0.0 0.1 0.1 0.3 0.1

.004 .006 .013 .002 .003 .005 .002 .002 .003 .002 .002 .002

.456 .499 .621 .461 .497 .613 .356 .370 .396 .356 .369 .395

Sym Asy Mix Sym Asy Mix Sym Asy Mix Sym Asy Mix

100 99.6 100 100 100 100 100 100 100 100 100 100

.083 .052 .057 .091 .059 .061 .125 .111 .111 .132 .118 .118

1.190 0.956 1.084 1.606 1.217 1.371 1.480 1.352 1.355 2.070 1.868 1.858

1.4 0.9 73.3 2.0 1.2 61.4 0.8 1.0 1.8 1.4 2.0 2.1

.009 .011 .034 .006 .007 .022 .006 .007 .008 .005 .006 .006

.520 .550 .784 .519 .545 .768 .406 .420 .454 .404 .415 .449

0.1 0.1 14.8 0.2 0.2 10.9 0.1 0.1 0.0 0.1 0.2 0.1

.004 .005 .018 .002 .003 .010 .002 .002 .003 .001 .002 .002

.455 .475 .634 .459 .478 .637 .355 .366 .398 .356 .366 .397

Sym Asy Mix Sym Asy

100 88.8 99.3 100 100

.057 .037 .045 .064 .042

0.905 0.786 0.949 1.16 0.943

1 0.8 74 2.1 1.4

.009 .010 .033 .006 .006

.526 .542 .765 .526 .542

0 0.1 13.8 0.2 0.1

.004 .005 .018 .002 .003

.459 .468 .623 .464 .475

Mix Sym Asy Mix Sym Asy Mix

100 100 100 100 100 100 100

.047 .093 .081 .079 .101 .089 .087

1.15 1.06 0.979 0.979 1.45 1.32 1.32

92.1 0.7 0.9 1.4 2.3 2.1 2.8

.030 .006 .007 .008 .005 .005 .006

.842 .418 .425 .465 .416 .422 .461

38.6 0.1 0 0.1 0.2 0.2 0.3

.018 .002 .003 .003 .002 .002 .002

.679 .365 .371 .409 .368 .372 .408

ρFF' = .25 2

200

2

400

4

200

4

400

ρFF' = .50 2

200

2

400

4

200

4

400

c, the number of scale points; n, sample size; τ, thresholds; %, rejection rate based on χ2 test

sample chi-squares do not asymptotically follow a central chi-square distribution, but rather a noncentral chi-square distribution with the mean equal to the model degrees of freedom plus a noncentrality parameter. As a result, the chi-square difference statistics do not follow a central chi-square distribution (see also Hayashi, Bentler, & Yuan, 2007). However, in

real applications the true population model is unknown, and the chi-square difference test is frequently used to compare the relative fits of nested models. For this reason, we report the results from the chi-square difference test between two models, assuming that the chi-square difference statistics follow a central chi-square distribution. Specifically, we made two

Table 6 Rejection rates based on the chi-square difference test (Δχ2) and differences in the means of the RMSEAs and WRMRs for 12 items

Two-Factor vs. Three-Factor

Three-Factor vs. Four-Factor

c

n

τ

rep

Δχ2

ΔRMSEA

ΔWRMR

rep

Δχ2

ΔRMSEA

ΔWRMR

ρFF' = 0 2

100

2

200

4

100

4

200

Sym Asy Mix Sym Asy Mix Sym Asy Mix Sym Asy Mix

2000 1991 1962 2000 2000 1999 2000 1872 1862 2000 1999 1999

100 88.6 85.7 100 100 100 100 100 100 100 100 100

–.106 –.057 –.040 –.124 –.069 –.059 –.175 –.155 –.151 –.189 –.168

–.602 –.377 –.288 –.971 –.568 –.500 –.883 –.783 –.757 –1.379 –1.208

1646 1661 1652 1599 1639 1668 1668 1584 1511 1640 1617

10.8 9.9 32.4 16.6 17.8 50.7 9.9 24.7 16.8 16.8 16.4

–.014 –.014 –.022 –.010 –.011 –.017 –.011 –.013 –.014 –.009 –.009

–.119 –.140 –.176 –.114 –.132 –.174 –.088 –.094 –.100 –.084 –.089

–.166

–1.178

1612

18.6

–.010

–.093

Sym Asy Mix Sym Asy Mix Sym Asy Mix Sym Asy Mix

1998 1975 1871 2000 2000 1984 2000 1947 1954 2000 2000 2000

98 65.9 61.5 100 98.3 96.3 100 99.9 100 100 100 100

–.075 –.039 –.028 –.091 –.052 –.036 –.135 –.117 –.113 –.149 –.131

–.403 –.273 –.214 –.649 –.403 –.312 –.609 –.531 –.507 –.962 –.837

1594 1631 1523 1573 1614 1612 1668 1599 1554 1622 1651

8.3 7.9 23.2 14.7 12.8 40.8 11.9 10.9 14.3 15.2 14.6

–.013 –.012 –.018 –.010 –.010 –.017 –.011 –.013 –.013 –.009 –.009

–.117 –.129 –.163 –.116 –.123 –.171 –.088 –.093 –.099 –.085 –.088

–.127

–.803

1596

18.8

–.010

–.095

Sym Asy Mix Sym Asy

1981 1937 1761 1999 1983

72.1 28.6 39.5 99.3 79.6

–.043 –.023 –.021 –.058 –.033

–.243 –.195 –.180 –.381 –.256

1572 1593 1334 1574 1605

6.5 3.4 11.0 14.8 7.7

–.011 –.010 –.010 –.010 –.008

–.116 –.122 –.139 –.116 –.119

Mix Sym Asy Mix Sym Asy Mix

1801 2000 1979 1973 2000 2000 2000

74.3 99.6 98.0 97.1 100 100 100

–.021 –.090 –.076 –.071 –.106 –.091

–.207 –.363 –.316 –.296 –.592 –.510

1373 1621 1632 1542 1585 1618

30.1 11.4 10.5 12.2 14.6 14.4

–.014 –.011 –.011 –.012 –.008 –.009

–.162 –.090 –.094 –.100 –.086 –.089

–.086

–.482

1577

18.5

–.009

–.096

ρFF' = .25 2

100

2

200

4

100

4

200

ρFF' = .50 2 100

2

200

4

100

4

200

c, the number of scale points; n, sample size; τ, thresholds; rep, the number of legitimate replications; Δχ2, percentages of eligible replications demonstrating a significant chi-square difference test at the alpha level of .05.

comparisons: the two-factor model versus the three-factor model, and the three-factor versus the four-factor model. Only replications for which both of the models in a comparison converged to proper solutions were included. The numbers of replications involved in each

comparison are reported under the “rep” columns in Tables 6 and 7. In summary, the numbers of replications involved in the comparisons varied across conditions but were at least 1,334, warranting a valid generalization.
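As a concrete illustration of the comparison just described, the sketch below carries out a naive chi-square difference test between a two-factor and a three-factor solution from their chi-square values and degrees of freedom. It is an illustrative addition, not the authors' procedure: the chi-square values are made up, the degrees of freedom come from the common EFA formula df = [(p − m)² − (p + m)]/2 for p = 12 items, and with WLSMV the simple difference shown here is not chi-square distributed, which is exactly why the study relied on the DIFFTEST adjustment in Mplus rather than this raw calculation.

```python
from scipy.stats import chi2

def chisq_difference_test(chi2_simple, df_simple, chi2_complex, df_complex, alpha=0.05):
    """Naive nested-model chi-square difference test.

    The model with fewer factors (more restrictive) should have the larger
    chi-square and the larger degrees of freedom.  Returns the difference
    statistic, its degrees of freedom, the p value, and whether the more
    complex model fits significantly better at the given alpha level.
    """
    d_chi2 = chi2_simple - chi2_complex
    d_df = df_simple - df_complex
    p_value = chi2.sf(d_chi2, d_df)
    return d_chi2, d_df, p_value, p_value < alpha

# Hypothetical chi-square values for a 12-item, two- vs. three-factor comparison
# (df = 43 and 33 follow from the EFA df formula above for m = 2 and m = 3)
d_chi2, d_df, p, prefer_complex = chisq_difference_test(
    chi2_simple=120.4, df_simple=43,
    chi2_complex=48.7, df_complex=33,
)
print(f"delta chi-square = {d_chi2:.1f} on {d_df} df, p = {p:.4f}, "
      f"favor three-factor model: {prefer_complex}")
```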

Table 7 Rejection rates based on the chi-square difference test (Δχ2) and differences in the means of the RMSEAs and WRMRs for 24 items

Two-Factor vs. Three-Factor

Three-Factor vs. Four-Factor

c

n

τ

rep

Δχ2

ΔRMSEA

ΔWRMR

rep

Δχ2

ΔRMSEA

ΔWRMR

ρFF' = 0 2

200

2

400

4

200

4

400

Sym Asy Mix Sym Asy Mix Sym Asy Mix Sym Asy Mix

2000 2000 1997 2000 2000 2000 2000 2000 1998 1999 2000 2000

100 100 100 100 100 100 100 100 100 100 100 100

–.099 –.053 –.043 –.110 –.066 –.064 –.146 –.131 –.131 –.154 –.139

–1.011 –.592 –.518 –1.609 –.954 –.959 –1.539 –1.348 –1.319 –2.354 –2.068

1991 1991 1995 1982 1984 1979 1986 1990 1979 1987 1990

17.6 25.2 94.3 33.1 25.5 84.9 23.3 21.6 27.1 35.7 33.4

–.005 –.007 –.016 –.004 –.004 –.009 –.005 –.005 –.005 –.004 –.004

–.065 –.080 –.137 –.061 –.069 –.104 –.052 –.054 –.056 –.048 –.050

–.140

–2.030

1970

38.4

–.004

–.052

Sym Asy Mix Sym Asy Mix Sym Asy Mix Sym Asy Mix

2000 2000 1999 2000 2000 1998 2000 2000 2000 2000 1999 2000

100 100 100 100 100 100 100 100 100 100 100 100

–.074 –.041 –.023 –.085 –.052 –.039 –.119 –.104 –.103 –.127 –.112

–.669 –.406 –.300 –1.087 –.672 –.602 –1.074 –.932 –.901 –1.666 –1.453

1986 1995 1995 1964 1982 1989 1990 1987 1978 1985 1986

16.3 10.9 98.7 29.6 19.4 96.8 21.4 19.8 26.2 34.3 32.3

–.005 –.005 –.016 –.004 –.004 –.012 –.004 –.005 –.005 –.004 –.004

–.065 –.074 –.150 –.061 –.067 –.131 –.051 –.054 –.056 –.048 –.050

–.112

–1.409

1971

38.5

–.004

–.052

Sym Asy Mix Sym Asy

1999 1999 1999 2000 2000

99.1 99.3 100 100 100

–.048 –.027 –.012 –.058 –.036

–.380 –.243 –.185 –.632 –.401

1992 1992 1996 1963 1975

14.7 7.8 98.8 28.8 17.8

–.005 –.005 –.015 –.004 –.004

–.066 –.073 –.141 –.062 –.067

Mix Sym Asy Mix Sym Asy Mix

1993 2000 2000 1999 2000 2000 1998

100 100 100 100 100 100 100

–.017 –.086 –.074 –.071 –.097 –.083

–.304 –.644 –.554 –.523 –1.040 –.894

1986 1983 1981 1977 1979 1983

100 23.2 19.5 25.7 31.9 31.0

–.012 –.005 –.005 –.005 –.004 –.004

–.162 –.052 –.054 –.057 –.048 –.050

–.081

–.855

1952

37.8

–.004

–.053

ρFF' = .25 2

200

2

400

4

200

4

400

ρFF' = .50 2 200

2

400

4

200

4

400

c, the number of scale points; n, sample size; τ, thresholds; rep, the number of legitimate replications; Δχ2, percentages of eligible replications demonstrating a significant chi-square difference test at the alpha level of .05.

The results from the chi-square difference tests for model comparisons are presented in Tables 6 and 7 for 12 items and 24 items, respectively. The values in the tables under the "Δχ2" column are percentages of eligible replications

demonstrating a significant chi-square difference test at the alpha level of .05. For the comparison between the two-factor and three-factor models, we hoped for a high rejection rate (indicating that the three-factor model was favored); for the


comparison between the three-factor and four-factor models, we hoped for a low rejection rate (so that the four-factor model was not preferred). Two observations can be made from the tables. First, for the analysis with 12 items, 72.1 % to 100 % of the replications involved in the first comparison (two-factor vs. three-factor) showed that the three-factor model was superior to the two-factor model, with four exceptions. The four exceptions were related to the sample size of 100 and the twopoint scale: The rejection rate was only 28.6 % for asymmetric thresholds, and 39.5 % for mixed thresholds when the correlation among factors was .50; the corresponding rejection rates were increased to 65.9 % and 61.5 % as the correlation among factors decreased to .25. This result was consistent with the findings from the chi-square test (the rejection rate based on the chi-square test was only about 14.7 %, 14.3 %, 36.3 %, and 44.7 %, respectively) and parallel analysis (the majority of the replications produced fewer than three factors), but not consistent with the decisions based on the eigenvalue >1. For the second comparison (three-factor vs. four-factor), the rejection rate ranged from 3.4 % (for asymmetric thresholds, n= 100, ρFF' = .50, and a two-point scale) to 50.7 % (for mixed thresholds, n= 200, ρFF' =0, and a two-point scale). In other words, although the three-factor model was preferable to the four-factor model for the majority of the replications (ranging from 49.3 % to 96.6 %), still a considerable number of replications favored the overspecified four-factor model. Second, for 24-item scales, the two-factor model demonstrated a considerably worse fit than did the three-factor model across all conditions (with rejection rates ≥99.1 %). The comparison of the three-factor model to the four-factor model revealed that the three-factor model was favored 7.8 % to 38.5 % of the time, with six exceptions (84.9 % to 100 %). The six exceptions were associated with the two-point scale when the thresholds varied across items, indicating that the analysis resulted in retaining too many factors. Because of the well-known problem with the chi-square and chi-square difference tests—that they are sensitive to sample size when the model is misspecified—we computed the mean differences in RMSEAs and WRMRs (ΔRMSEA and ΔWRMR) between the two models involved in the comparison. The results are also presented in Tables 6 and 7. Both RMSEA and WRMR favored more complex models. As expected, both ΔRMSEA and ΔWRMR were negative across all conditions, thereby indicating that the RMSEA and WRMR from the more complex model were smaller than those from the simpler model (e.g., the four-factor model vs. the three-factor model). For the comparison between the threefactor and four-factor models, the absolute values of ΔRMSEA were ≤.018, and those of ΔWRMR were ≤.176 for all conditions. For the comparison between the two-factor and three-factor models, the absolute values of ΔRMSEA were ≥.020, and those of ΔWRMR were ≥.180, with two exceptions. The exceptions were again associated with the

two-point scale with sample size of 200 and 24 items when the correlation among factors was .50.

Discussion

With the promotion of integrating EFA and CFA in SEM analyses (see Asparouhov & Muthén, 2009) and the implementation of statistical extraction methods for EFA in statistical software, applied researchers may combine evidence from the chi-square test, fit indices, and conventional criteria such as eigenvalue >1 and parallel analysis to determine the number of factors to retain in EFA. However, no previous study that we were aware of had examined the accuracy of decisions about the number of factors to retain in EFA for categorical item data by combining all of these methods. The WLSMV estimation method was applied to analyze EFA models based on the polychoric correlation matrix. To determine the number of factors to retain, we considered both the model-data fit and two conventional criteria in the EFA literature, namely eigenvalue >1 and parallel analysis. Below we summarize and discuss the major findings from the simulation study.

First, it is well documented in the EFA literature that eigenvalue >1 tends to give too many factors. This conclusion is mainly based on analyses with Pearson correlations; whether it would also apply to ordered categorical item data is worth investigating. In the present study, we analyzed polychoric correlations for ordered categorical item data. We found that for two-point scales, the performance of the eigenvalue >1 rule depended on the distribution of the categorical data, the number of items in the scale, the magnitude of the correlation among the factors, and sample size. Too many factors tended to be retained when the items did not have symmetric distributions, the number of items in the scale was large (i.e., 24), the correlation among the factors was small (e.g., ρFF' = 0), and the sample size was small (e.g., 100). For four-point scales, eigenvalue >1 gave very accurate decisions about the number of factors to retain, unless the correlation among the factors was large (i.e., ρFF' = .50) and the sample size was small (e.g., 100). These findings suggest that the eigenvalue >1 criterion should not be trusted for scales with smaller sample sizes, fewer scale points, asymmetric or varying thresholds across items, and stronger correlations among factors.

Second, researchers have recommended using parallel analysis to determine the number of factors to retain, because it yields the most accurate decision (although parallel analysis is greatly underused in empirical EFA studies; see, e.g., Dinno, 2009; Henson & Roberts, 2006). The results from our study indicated that parallel analysis resulted in very accurate decisions when the factors were independent. This finding is consistent with Timmerman and Lorenzo-Seva (2011). Under these conditions, parallel analysis performed

When the factors were correlated, parallel analysis had a tendency to underextract the number of factors, particularly for scales with two-point items, low numbers of items (i.e., 12), heterogeneous thresholds, and small sample sizes (e.g., 100). This finding is consistent with Cho et al. (2009), who found that 87.7 % of the analyses based on parallel analysis resulted in an accurate decision when the correlation among factors was .30. The percentage decreased markedly to 13.8 % when the correlation among factors reached .70, and the majority of the analyses resulted in factor underextraction. The level of correlation among factors explained 54 % of the total variance in the correct decision rates in their study. By varying the magnitude of the correlation among factors, we also found that parallel analysis performed better than eigenvalue >1 when the correlation among factors was .25. However, as the correlation increased to .50, parallel analysis performed worse than eigenvalue >1 for two-point scales with 12 items. One could argue that when the correlation between factors is as high as .50, the multidimensionality of the scale may be questionable: Applied researchers might opt for a model with a single common factor, a model with multiple factors, or both, assuming that the factor structure is substantively meaningful. From this point of view, the results based on parallel analysis are informative.

We also found that parallel analysis tended to perform poorly when the sample size was small, the number of scale points was two, and the thresholds were heterogeneous across items. Two reasons may explain this unsatisfactory performance. First, polychoric correlations are likely to be overestimated when the sample size is small (e.g., Flora & Curran, 2004; Quiroga, 1992), even when the population correlations for the random data are zero. Second, for scales with heterogeneous thresholds across items, items with similar thresholds tend to merge together, yielding spurious factors (also called difficulty factors or method factors; e.g., Bernstein & Teng, 1989; Coenders, Satorra, & Saris, 1997; Green, 1983; Green, Akey, Flemming, Hershberger, & Marquis, 1997; McDonald, 1974). When factor analysis is conducted on polychoric correlations using the WLSMV method, spurious factors are less likely than in an analysis based on Pearson correlations, because WLSMV takes item thresholds into consideration (e.g., Yang & Green, 2011); however, spurious associations still occur when items demonstrate great dissimilarities in thresholds (such as the "mixed thresholds" condition manipulated in this study). Consequently, the first several eigenvalues estimated from the random sample data tend to be too large, and thus too few factors are retained on the basis of parallel analysis. On the basis of these findings, although parallel analysis performed better than the eigenvalue >1 method in most conditions, we caution against using parallel analysis when the number of scale points is two (e.g., true/false), the sample size is small (less than 200), and the thresholds are dramatically heterogeneous across items. Applied researchers can easily obtain information about the number of scale points, the sample size, and the categorical distributions of the items from the sample data; such information can help them judge whether parallel analysis is likely to yield an accurate decision about the number of factors to retain.
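To make the procedure concrete, the following is a minimal sketch of Horn's parallel analysis as discussed above. It is illustrative only and is not the code used in this study: for simplicity it computes Pearson correlations with NumPy, whereas an analysis of ordered categorical items would substitute a polychoric correlation routine at the two places marked in the comments. The eigenvalue >1 rule corresponds to replacing the random-data benchmark with a constant value of 1.

import numpy as np

def parallel_analysis(data, n_draws=500, percentile=95, seed=0):
    # data: (n_cases, n_items) matrix of item scores.
    # Returns the number of factors suggested by parallel analysis.
    rng = np.random.default_rng(seed)
    n_cases, n_items = data.shape

    # Eigenvalues of the observed correlation matrix
    # (Pearson here; a polychoric matrix would be used for ordered categories).
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Eigenvalues of correlation matrices computed from random data of the
    # same dimensions (the classic Horn approach uses normal deviates;
    # again, a polychoric matrix could be substituted for categorical data).
    rand = np.empty((n_draws, n_items))
    for d in range(n_draws):
        x = rng.standard_normal((n_cases, n_items))
        rand[d] = np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]
    benchmark = np.percentile(rand, percentile, axis=0)

    # Retain factors as long as the observed eigenvalue exceeds the benchmark.
    n_factors = 0
    for observed, random_value in zip(obs, benchmark):
        if observed > random_value:
            n_factors += 1
        else:
            break
    return n_factors

When factors are strongly correlated, much of the common variance is absorbed by the first eigenvalue, so the later observed eigenvalues fall closer to the random-data benchmark; this offers one way to understand the underextraction observed here when the factor correlation reached .50.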

Third, the chi-square test and chi-square difference test performed reasonably well in rejecting the underspecified models (i.e., the two-factor model in our study) and retaining the correctly specified models. Our results partly aligned with those of Hayashi et al. (2007), which were based on continuous data: The performance of the chi-square test is reasonable when the EFA model is correctly specified, but poor when the EFA model is overfactored (resulting in too many factors). We found that the chi-square test and chi-square difference test resulted in more accurate decisions about the number of factors to retain with more scale points, more items, and larger sample sizes. For four-point scales, the underspecified two-factor model was accurately rejected for the 12-item scale, even with a sample size of 100. As the number of items involved in the analysis increased to 24, the power to reject the underspecified two-factor model reached 100 %. For some conditions with two-point scales and heterogeneous thresholds across items, the chi-square and chi-square difference tests too often chose the overspecified models. However, researchers have argued that overfactoring is less problematic than underfactoring in EFA, because the true model is likely to be recovered by the first several factors (MacCallum, Widaman, Preacher, & Hong, 2001). In general, the chi-square test performed similarly to or better than parallel analysis in detecting underspecified models.

Fourth, considering the well-known problems with the chi-square and chi-square difference tests, we examined RMSEA and WRMR for model evaluation, as well as ΔRMSEA and ΔWRMR for model comparison. Our aim was not to establish cutoffs for RMSEA and WRMR in EFA with categorical data, but to understand the performance of these two indices when conventional cutoffs are adopted (e.g., RMSEA < .06, WRMR < 1; Hu & Bentler, 1999; Yu, 2002). The results showed that RMSEA tended to be too small (i.e., RMSEA < .06) to reject the underspecified model for two-point scales, particularly when the correlation among factors was large, but that it performed reasonably well for four-point scales. WRMR showed a slightly different pattern, in that it lacked power (i.e., WRMR < 1) to reject the underspecified model for scales with fewer items, but performed much better when the number of items increased to 24. In other words, RMSEA and WRMR should not be trusted for two-point scales and short scales, respectively. Similarly, ΔRMSEA performed slightly differently from ΔWRMR in our model comparisons. When comparing the two-factor model to the three-factor model, the reduction in RMSEA was large (ΔRMSEA ≥ .07) for four-point scales, and the reduction in WRMR was large for scales with more items (ΔWRMR ≥ .185).

This may be due to the large difference in model degrees of freedom when comparing two nested EFA models. We expected that ΔRMSEA and ΔWRMR would be more sensitive in model comparisons for EFA with more items. Further studies will be needed to understand the appropriateness and performance of RMSEA and WRMR (as well as other fit indices) for EFA with categorical item data.

Note that we also considered CFI in the model evaluation, although the detailed results are not presented here. Many studies have evaluated the sensitivity of CFI, RMSEA, and other fit indices to the type of model misspecification and other conditions in the context of CFA. Some authors have found that CFI and RMSEA are sensitive to the misspecification of measurement components (e.g., Hu & Bentler, 1999), but others have argued that these measures are not necessarily sensitive to the misspecification of measurement components once the degree of misspecification at the population level is controlled (e.g., Fan & Sivo, 2005, 2007; Fan, Thompson, & Wang, 1999). In our study, CFI performed either similarly to or slightly worse than RMSEA. This may have occurred because in EFA all measured variables load on all factors (see also Sun, 2005); the means of CFI were greater than the cutoff of .95 in many conditions, even when the number of factors was underspecified.

With a limited number of conditions, this article has presented a study of EFA models with categorical item data. On the basis of the results from this study, the chi-square test, the chi-square difference test, fit indices, and parallel analysis suggested the numbers of factors to retain quite accurately when the number of scale points was four, the number of items was 24, the sample size was at least 200, and the categorical distributions did not vary dramatically across items. Parallel analysis based on polychoric correlations yielded too few factors when the correlation among factors was large, particularly for two-point and 12-item scales.

As in many other simulation studies, only a limited number of conditions were included in the present study. More conditions should be considered in order to generalize the findings to broader situations. For example, different factor structures, factor saturations, factor correlations, numbers of items per factor, sample sizes, and numbers of scale points could be considered. Another factor to be considered is larger sample sizes: The largest sample size of 400 included in our study approached the mean sample size reported in Henson and Roberts (2006), but many empirical EFA studies may have sample sizes larger than 400. On the basis of our results, we expect that the chi-square tests and fit indices, as well as eigenvalue >1 and parallel analysis, would perform better as sample size increased.
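As a concrete illustration of how the conventional cutoff is applied, the short sketch below computes the standard RMSEA point estimate from a model chi-square, its degrees of freedom, and the sample size, and checks it against the RMSEA < .06 rule. The numerical values are hypothetical, and the sketch is not the procedure used in this study: under WLSMV, Mplus reports a mean- and variance-adjusted chi-square, and WRMR is computed from the residuals rather than from the chi-square.

import math

def rmsea(chi_square, df, n):
    # Conventional RMSEA point estimate:
    #   sqrt( max(chi2 - df, 0) / (df * (n - 1)) )
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Hypothetical chi-square values for two- and three-factor EFA solutions of a
# 12-item scale; df = ((p - m)^2 - (p + m)) / 2 gives 43 and 33 for p = 12.
n_cases = 200
fits = {"two-factor": (310.4, 43), "three-factor": (45.0, 33)}

for label, (chi2, df) in fits.items():
    value = rmsea(chi2, df, n_cases)
    print(f"{label}: RMSEA = {value:.3f} (meets RMSEA < .06: {value < .06})")

# Delta-RMSEA for the nested comparison is the difference between the more
# complex and the simpler solution's RMSEA; it is negative when the added
# factor improves fit.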

References

Asparouhov, T., & Muthén, B. O. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397–438. doi:10.1080/10705510903008204
Bernstein, I. H., & Teng, G. (1989). Factoring items and factoring scales are different: Spurious evidence for multidimensionality due to item categorization. Psychological Bulletin, 105, 467–477. doi:10.1037/0033-2909.105.3.467
Bollen, K. A., & Barb, K. H. (1981). Pearson's R and coarsely categorized measures. American Sociological Review, 46, 232–239. Retrieved from www.jstor.org/stable/2094981
Browne, M. W. (2001). An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36, 111–150. doi:10.1207/S15327906MBR3601_05
Browne, M. W., Cudeck, R., Tateneni, K., & Mels, G. (2008). CEFA: Comprehensive exploratory factor analysis, Version 3.03 [Computer software and manual]. Retrieved December 2, 2011, from http://faculty.psy.ohio-state.edu/browne/
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245–276. doi:10.1207/s15327906mbr0102_10
Cho, S.-J., Li, F., & Bandalos, D. (2009). Accuracy of the parallel analysis procedure with polychoric correlations. Educational and Psychological Measurement, 69, 748–759. doi:10.1177/0013164409332229
Christoffersson, A. (1977). Two-step weighted least squares factor analysis of dichotomized variables. Psychometrika, 42, 433–438. doi:10.1007/BF02293660
Coenders, G., Satorra, A., & Saris, W. E. (1997). Alternative approaches to structural modeling of ordinal data: A Monte Carlo study. Structural Equation Modeling, 4, 261–282. doi:10.1080/10705519709540077
Conway, J. M., & Huffcutt, A. I. (2003). A review and evaluation of exploratory factor analysis practices in organizational research. Organizational Research Methods, 6, 147–168. doi:10.1177/1094428103251541
Dinno, A. (2009). Exploring the sensitivity of Horn's parallel analysis to the distributional form of random data. Multivariate Behavioral Research, 44, 360–388. doi:10.1080/00273170902938969
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272–299. doi:10.1037/1082-989X.4.3.272
Fan, X., & Sivo, S. A. (2005). Sensitivity of fit indices to misspecified structural or measurement model components: Rationale of two-index strategy revisited. Structural Equation Modeling, 12, 343–367. doi:10.1207/s15328007sem1203_1
Fan, X., & Sivo, S. A. (2007). Sensitivity of fit indices to model misspecification and model types. Multivariate Behavioral Research, 42, 509–529. doi:10.1080/00273170701382864
Fan, X., Thompson, B., & Wang, L. (1999). Effects of sample size, estimation methods, and model specification on structural equation modeling fit indexes. Structural Equation Modeling, 6, 56–83. doi:10.1080/10705519909540119
Finney, S., & DiStefano, C. (2006). Non-normal and categorical data in structural equation modeling. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 269–314). Greenwich: Information Age.
Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9, 466–491. doi:10.1037/1082-989X.9.4.466
Ford, J. K., MacCallum, R. C., & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39, 291–314.
Gagné, P., & Furlow, C. F. (2009). Automating multiple software packages in simulation research for structural equation modeling and hierarchical linear modeling. Structural Equation Modeling, 16, 179–185. doi:10.1080/10705510802561543
Glorfeld, L. W. (1995). An improvement on Horn's parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55, 377–393. doi:10.1177/0013164495055003002
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale: Erlbaum.
Gorsuch, R. L. (1997). Exploratory factor analysis: Its role in item analysis. Journal of Personality Assessment, 68, 532–560. doi:10.1207/s15327752jpa6803_5
Green, S. B. (1983). Identifiability of spurious factors using linear factor analysis with binary items. Applied Psychological Measurement, 7, 139–147. doi:10.1177/014662168300700202
Green, S. B., Akey, T. M., Flemming, K. K., Hershberger, S. L., & Marquis, J. G. (1997). Effect of the number of scale points on chi-square fit indices in confirmatory factor analysis. Structural Equation Modeling, 4, 108–120. doi:10.1080/10705519709540064
Hayashi, K., Bentler, P. M., & Yuan, K.-H. (2007). On the likelihood ratio test for the number of factors in exploratory factor analysis. Structural Equation Modeling, 14, 505–526. doi:10.1080/10705510701301891
Henson, R. K., & Roberts, J. K. (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement, 66, 393–416. doi:10.1177/0013164405282485
Holgado-Tello, F. P., Chacón-Moscoso, S., Barbero-García, I., & Abad, E. V. (2010). Polychoric versus Pearson correlations in exploratory and confirmatory factor analysis of ordinal variables. Quality & Quantity, 44, 153–166. doi:10.1007/s11135-008-9190-y
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179–185. doi:10.1007/BF02289447
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55. doi:10.1080/10705519909540118
Jennrich, R. I. (2002). A simple general method for oblique rotation. Psychometrika, 67, 7–19. doi:10.1007/BF02294706
Jennrich, R. I., & Sampson, P. F. (1966). Rotation to simple loadings. Psychometrika, 31, 313–323. doi:10.1007/BF02289465
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141–151. doi:10.1177/001316446002000116
Kaplan, D. (2009). Structural equation modeling: Foundations and extensions (2nd ed.). Thousand Oaks: Sage.
Larsen, R., & Warne, R. T. (2010). Estimating confidence intervals for eigenvalues in exploratory factor analysis. Behavior Research Methods, 42, 871–876. doi:10.3758/BRM.42.3.871
MacCallum, R. C., Widaman, K. F., Preacher, K. J., & Hong, S. (2001). Sample size in factor analysis: The role of model error. Multivariate Behavioral Research, 36, 611–637. doi:10.1207/S15327906MBR3604_06
Marsh, H. W., Liem, G. A., Martin, A. J., Morin, A. J. S., & Nagengast, B. (2011). Methodological measurement fruitfulness of exploratory structural equation modeling (ESEM): New approaches to key substantive issues in motivation and engagement. Journal of Psychoeducational Assessment, 29, 322–346. doi:10.1177/0734282911406657
Marsh, H. W., Muthén, B. O., Asparouhov, T., Lüdtke, O., Robitzsch, A., Morin, A. J., & Trautwein, U. (2009). Exploratory structural equation modeling, integrating CFA and EFA: Application to students' evaluations of university teaching. Structural Equation Modeling, 16, 439–476. doi:10.1080/10705510903008220
McDonald, R. P. (1974). Difficulty factors in binary data. British Journal of Mathematical and Statistical Psychology, 27, 82–99. doi:10.1111/j.2044-8317.1974.tb00530.x
Mulaik, S. A., & Millsap, R. E. (2000). Doing the four-step right. Structural Equation Modeling, 7, 36–73. doi:10.1207/S15328007SEM0701_02
Muthén, B. O. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115–132. doi:10.1007/BF02294210
Muthén, L. K., & Muthén, B. O. (1998–2012). Mplus user's guide (7th ed.). Los Angeles, CA: Authors.
Quiroga, A. M. (1992). Studies of the polychoric correlation and other correlation measures for ordinal variables. Unpublished doctoral dissertation, Acta Universitatis Upsaliensis, Uppsala, Sweden.
Schmitt, T. A. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29, 304–321. doi:10.1177/0734282911406653
Sun, J. (2005). Assessing goodness of fit in confirmatory factor analysis. Measurement and Evaluation in Counseling and Development, 37, 240–256.
Timmerman, M. E., & Lorenzo-Seva, U. (2011). Dimensionality assessment of ordered polytomous items with parallel analysis. Psychological Methods, 16, 209–220. doi:10.1037/a0023353
Velicer, W. F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41, 321–327. doi:10.1007/BF02293557
Weng, L.-J., & Cheng, C.-P. (2005). Parallel analysis with unidimensional binary data. Educational and Psychological Measurement, 65, 697–716. doi:10.1177/0013164404273941
Yang, Y., & Green, S. B. (2011). Coefficient alpha: A reliability coefficient for the 21st century? Journal of Psychoeducational Assessment, 29, 377–392. doi:10.1177/0734282911406668
Yu, C.-Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes. Unpublished doctoral dissertation, University of California, Los Angeles, CA.
Zwick, W. R., & Velicer, W. F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99, 432–442. doi:10.1037/0033-2909.99.3.432
