Original Article

Received 3 January 2012; Revised 5 September 2012; Accepted 16 September 2012

Published online 21 November 2012 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/jrsm.1063

Synthesizing regression results: a factored likelihood method

Meng-Jia Wu a*† and Betsy Jane Becker b

a Loyola University Chicago, School of Education, Chicago, IL, USA
b Florida State University, College of Education, Tallahassee, FL, USA
*Correspondence to: Meng-Jia Wu, School of Education, Loyola University Chicago, Chicago, IL, USA.
†E-mail: [email protected]

Regression methods are widely used by researchers in many fields, yet methods for synthesizing regression results are scarce. This study proposes using a factored likelihood method, originally developed to handle missing data, to appropriately synthesize regression models involving different predictors. The method uses the correlations reported in the regression studies to calculate synthesized standardized slopes. It uses the available correlations to estimate missing ones through a series of regressions, allowing us to synthesize correlations among variables as if each included study contained all the same variables. Great accuracy and stability of this method under fixed-effects models were found through Monte Carlo simulation. An example is provided to demonstrate the steps for calculating the synthesized slopes through sweep operators. By rearranging the predictors in the included regression models, or by omitting a relatively small number of correlations from those models, we can easily apply the factored likelihood method to many situations involving synthesis of linear models. Limitations and other possible methods for synthesizing more complicated models are discussed. Copyright © 2012 John Wiley & Sons, Ltd.

Keywords: meta-analysis; synthesis; regression; linear models; likelihood

1. Introduction

Meta-analysis has garnered ever more attention over the past 30 years because of its power for examining accumulated evidence. Although methods for synthesizing studies involving mean differences, correlations, and odds ratios are well documented and developed (e.g., Cooper, 1998; Cooper et al., 2009; Hunter & Schmidt, 2004; Lipsey & Wilson, 2001; Sutton et al., 2000), methods for synthesizing studies using regression analyses remain underdeveloped and in need of further study. Because regression analyses are widely used by researchers in different fields, excluding regression studies for lack of methods to synthesize them sabotages our efforts to thoroughly understand complex research questions.

The essential feature of meta-analysis is extracting the same type of effect size from studies focusing on the same research question, to ensure that results from the studies in a synthesis are comparable. As Lipsey and Wilson (2001) noted, 'The effect size statistic produces a statistical standardization of the study findings such that the resulting numerical values are interpretable in a consistent fashion across all the variables and measures involved' (p. 4). One possible effect size for regression studies is the standardized slope for the predictors in the models (Kim, 2011), because its unit, the standard deviation, is the same across studies. The standardized slope for a predictor quantifies the relationship between the predictor and the outcome variable in standard deviation units while controlling for the other variables in the model.

One potential problem that arises when synthesizing standardized slopes is that the models examined in different studies are usually not identical in terms of the predictors included. Therefore, the focal standardized slopes are not fully comparable across studies, because the effects of different variables are partialed out in the collected models. This problem becomes complicated quickly when models contain many predictors. Unless the extra predictors in the models are absolutely independent of the focal predictor (which is virtually never true), comparing the slopes from non-identical models is like comparing apples to oranges.

One potential solution for synthesizing the effect of a focal predictor across models that contain different predictors is to include only models that use the same variables. However, this is an unrealistic expectation, particularly in fields where large numbers of variables are typically used to investigate a phenomenon.


An alternative solution is to ignore the fact that the models are different. Greenwald et al. (1996) summarized the strength of the relationship between school inputs and outputs studied in the format of educational production functions. They focused on the fully and half-standardized regression coefficients representing school resources and per-pupil expenditures from these production functions to overcome problems arising from scale differences in the measured variables. Although their approach solved the issue of scale differences among the slopes, no adjustments were made to accommodate the structural differences (i.e., the different variables involved) across functions. Thus, the problem of having different models remained.

A more recent method for synthesizing results from regression studies involves converting regression slopes to correlations and then using conventional methods for synthesizing the converted correlations. Peterson and Brown (2005) searched 35 journals from psychology, consumer behavior, management, marketing, and sociology dated 1975-2001 and identified 143 articles that reported both standardized slopes and correlations. Given the relationships shown in the collected articles, the authors derived the equation r = 0.98b* + 0.05λ to convert standardized slopes (b* values) into correlations, where λ is an indicator variable that equals 1 when b* is positive and 0 when b* is negative (so, for example, b* = 0.30 converts to r = 0.98 × 0.30 + 0.05 = 0.344). The authors noticed that the relationship between slopes and correlations can be affected by the number of predictors in the regression model. However, they did not propose a solution for that in their formula, and their adjustment for the slopes is so general that it may lead to bias in the synthesis results.

Other approaches to synthesizing the results from regression models use generalized least squares methods (Becker & Wu, 2007). Using multivariate methods allows the meta-analyst to synthesize standardized slopes with the estimated covariances among slopes to account for dependence among the predictors. It also accounts for the unequal variances of the slopes. Becker and Wu noted, however, that the interpretations of the slopes from regression models with only a subset of predictors would not be exactly the same as those from models with all predictors. In addition, the method requires covariance estimates, which can be very complicated when models involve several predictors.

Instead of focusing on synthesizing slopes from regression models, our approach of combining the correlation matrices that the regressions are based on provides an alternative method for synthesizing regression results. The pooled correlation matrix can then be used to fit a regression model or other types of linear model. Furlow and Beretvas (2005) conducted a series of simulations to examine two multivariate approaches, proposed by Becker (1992) and Cheung (2001), that accommodate models with different predictors, and to compare them with the univariate approach discussed by Hedges and Olkin (1985). The summarized correlation matrices from Furlow and Beretvas's simulations were used to fit structural equation models to test the bias of the approaches. They found that the results based on the multivariate methods with an average weight scheme consistently performed best among all the tested methods. Similar to the drawback discussed by Becker and Wu (2007), these methods require estimates of complicated covariances.
Cheung and Chan (2005) proposed a two-stage approach to synthesizing structural equation models. Their first step was to synthesize correlation matrices using the framework of confirmatory factor analysis, assuming the factor correlation matrices from the different studies were all equal. As in the other approaches discussed earlier, the synthesized correlation matrices can be used to fit linear models. Although this method can potentially handle the correlation matrices from regression studies with different predictors, it requires some knowledge of factor analysis, as well as structural equation modeling software, to conduct. Therefore, it may not be appealing for broad application.

In this paper, we present a factored likelihood (FL) method, originally designed for handling missing data, for synthesizing the results from non-identical regression models. Instead of synthesizing regression slopes directly, this easily calculated method focuses on the correlations the slopes are based on and does not require any specific software. We first link the problem of non-identical models to the issue of missing data in Section 2. In Section 3, we introduce the FL method using the motivating example from Section 2. In Section 4, we provide an empirical example to demonstrate the application of the FL method. In Section 5, we test the accuracy of the FL method through Monte Carlo simulation. All the demonstrations in this study are based on a simple scenario, to convey the central concepts. Possible solutions for more complicated cases are discussed in Section 6.

2. Linking the problem of synthesizing models to missing-data issues


The problem of synthesizing a set of non-identical regression models is like analyzing a dataset with missing data. Consider a simple case where four regression studies are included in a synthesis. All of the models have the same outcome Yki, where k is the study number (k = 1 to 4) and i indexes participant i in study k. Study 1 contains only the focal predictor X1; study 2 contains X1 and a second predictor X2; study 3 contains X1, X2, and X3; and study 4 contains X1, X2, X3, and X4. The four estimated regression models with standardized slopes (B̂s) are shown in column 1 of Figure 1.

Figure 1 illustrates how the data from these four regression models can be formulated as a missing-data problem. The columns to the right of the models show the data structures for each of the models. When the original data from the four studies are concatenated as shown in Figure 1, the data from these four studies (assuming scores are standardized) constitute a multivariate dataset with five variables (Y, X1, X2, X3, and X4). When a variable is not included in a study, the data for this missing variable result in a missing block in the dataset. For example, there is a small block of missing X2 values and a larger block of missing X4 values in this dataset.


Figure 1. The data structure for the four studies.

When the main interest in synthesizing these four regression models is to explore the relationship between the outcome Y and the focal predictor X1 while controlling for the rest of the variables, it is inappropriate to combine the standardized slopes of X1 from the four studies directly, because the focal slope has a different meaning across studies when different variables are involved in the regression models.

Rather than focusing on the slopes, the FL method focuses on the correlations among the variables when synthesizing regression results. Because slopes are functions of the correlations, which do not differ when different predictors are present, the FL method is able to overcome the problem associated with non-identical models discussed in Section 1. This method uses available correlations to estimate missing ones through a series of regressions. This allows us to synthesize correlations among variables as if each study contained all the same variables. The synthesized correlations are then used to calculate the standardized slopes for all predictors. Because the synthesized slopes are standardized, the relative importance of the variables in the final model can be assessed as well.
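To make the missing-data framing concrete, the following small sketch (ours, not part of the original study; the data are random toy numbers) stacks four studies' standardized records into the single five-variable dataset of Figure 1, so that variables a study did not measure appear as blocks of NaNs:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
cols = ["Y", "X1", "X2", "X3", "X4"]
# Study k measures Y, X1, ..., Xk, mirroring the four models of Figure 1
study_vars = [cols[:2], cols[:3], cols[:4], cols[:5]]

frames = []
for k, vars_k in enumerate(study_vars, start=1):
    data = rng.standard_normal((5, len(vars_k)))   # 5 toy participants per study
    df = pd.DataFrame(data, columns=vars_k)
    df.insert(0, "study", k)
    frames.append(df)

stacked = pd.concat(frames, ignore_index=True)  # unmeasured variables become NaN blocks
print(stacked)
```

The NaN blocks form exactly the staircase pattern that the FL method, introduced next, is designed to exploit.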

3. The factored likelihood method


The pattern of missingness in Figure 1 is the 'monotone' missing pattern described by Little and Rubin (2002). One method for obtaining the estimated parameters under the monotone missing pattern is to use factored likelihood estimates (Anderson, 1957). Unlike many techniques that impute values for the missing data before estimating the parameters, this method uses a series of regressions based on the available data to adjust for the impact of the missing variables. The procedure decomposes the original estimation problem into a set of smaller estimation problems by factoring the likelihood of the observed data into a product of likelihoods whose parameters are distinct (Dempster, 1969).

One important assumption in applying the FL method is that the mechanism causing predictors to be missing (which produces the blocked missing data) is missing at random (MAR). As first described by Rubin (1976) and later elaborated by Little and Rubin (2002), a missing-data mechanism is MAR if the missingness does not depend on the components that are missing, though it may be related to the observed components. In the current meta-analytic application, MAR means that whether a predictor is missing from a regression model has nothing to do with the values of that predictor. It implies that the missing predictor would relate to the outcome and the other predictors in the model with the same relationships observed in the studies that include that predictor. This is a reasonable assumption, and the situation arises frequently when we include studies based on large-scale (e.g., government) datasets, in which many variables are measured, along with smaller-scale studies in which fewer variables are measured because of constraints of time and money. If the missing variables had been measured and included in the models of the smaller-scale studies, they would have had relationships with the other variables similar to those seen in the studies where they are included. Therefore, the relationships among variables that we do observe can be 'borrowed' through the FL method, which is described in detail in the paragraphs that follow.
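Whether a collection of studies satisfies the monotone requirement is easy to check: under the interpretation above, their variable sets must be nestable, each one contained in the next. A minimal sketch (ours):

```python
def is_monotone(study_vars):
    """True if the studies' variable sets can be ordered so that each is a
    subset of the next, i.e., stacking the studies yields a monotone pattern."""
    ordered = sorted(study_vars, key=len)
    return all(a <= b for a, b in zip(ordered, ordered[1:]))

# The four studies of Figure 1 qualify:
print(is_monotone([{"Y", "X1"}, {"Y", "X1", "X2"},
                   {"Y", "X1", "X2", "X3"}, {"Y", "X1", "X2", "X3", "X4"}]))  # True
```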


For the example illustrated in Figure 1, expression (1) shows the density functions of these data. The parameters of interest in the dataset with missing data can be estimated using the existing relationships between observed variables. The likelihood is

$$
\begin{aligned}
&\prod_{i=1}^{n_4} N(Y_{4i}, X_{41i}, X_{42i}, X_{43i}, X_{44i} \mid \mu_Y, \mu_1, \mu_2, \mu_3, \mu_4, \sigma^2_Y, \sigma^2_1, \sigma^2_2, \sigma^2_3, \sigma^2_4, \sigma_{Y1}, \sigma_{Y2}, \sigma_{Y3}, \sigma_{Y4}, \sigma_{12}, \sigma_{13}, \sigma_{14}, \sigma_{23}, \sigma_{24}, \sigma_{34}) \\
&\times \prod_{i=1}^{n_3} N(Y_{3i}, X_{31i}, X_{32i}, X_{33i} \mid \mu_Y, \mu_1, \mu_2, \mu_3, \sigma^2_Y, \sigma^2_1, \sigma^2_2, \sigma^2_3, \sigma_{Y1}, \sigma_{Y2}, \sigma_{Y3}, \sigma_{12}, \sigma_{13}, \sigma_{23}) \\
&\times \prod_{i=1}^{n_2} N(Y_{2i}, X_{21i}, X_{22i} \mid \mu_Y, \mu_1, \mu_2, \sigma^2_Y, \sigma^2_1, \sigma^2_2, \sigma_{Y1}, \sigma_{Y2}, \sigma_{12}) \\
&\times \prod_{i=1}^{n_1} N(Y_{1i}, X_{11i} \mid \mu_Y, \mu_1, \sigma^2_Y, \sigma^2_1, \sigma_{Y1}) \\
&= \prod_{i=1}^{n_1+n_2+n_3+n_4} N(Y_i, X_{1i} \mid \mu_Y, \mu_1, \sigma^2_Y, \sigma^2_1, \sigma_{Y1}) \\
&\quad\times \prod_{i=1}^{n_2+n_3+n_4} N(X_{2i} \mid \beta_{20\cdot Y1} + \beta_{2Y\cdot 1} Y_i + \beta_{21\cdot Y} X_{1i},\ \sigma^2_{2\cdot Y1}) \\
&\quad\times \prod_{i=1}^{n_3+n_4} N(X_{3i} \mid \beta_{30\cdot Y12} + \beta_{3Y\cdot 12} Y_i + \beta_{31\cdot Y2} X_{1i} + \beta_{32\cdot Y1} X_{2i},\ \sigma^2_{3\cdot Y12}) \\
&\quad\times \prod_{i=1}^{n_4} N(X_{4i} \mid \beta_{40\cdot Y123} + \beta_{4Y\cdot 123} Y_i + \beta_{41\cdot Y23} X_{1i} + \beta_{42\cdot Y13} X_{2i} + \beta_{43\cdot Y12} X_{3i},\ \sigma^2_{4\cdot Y123}).
\end{aligned} \tag{1}
$$

The first four lines in expression (1) show the product of the density functions of the four studies. Each study contains a different number of predictors (Xs). The parameters, namely the means (μ's), variances (σ²'s), and covariances (σ_kk''s), are on the right sides of the density functions. The next four lines show the FL of the observed data: the product of the density functions based on the available data for the variables. By switching the focus from studies (in the first four lines) to variables (in the next four lines) and forming the density functions, the original parameters (μ_k, σ²_k, and σ_kk') are transformed so that the likelihood function is factorized into components that separate the μ's, σ²'s, and β's through a series of regressions. Distinguishing the factors permits independent maximization of the likelihood of the variables (Y and the Xs) on the basis of the available data. An easy way to obtain the regression slopes and variances in expression (1), and hence to estimate the parameters (i.e., the μ's, σ²'s, and β's), is through the sweep operator and reverse sweep operator (Dempster, 1969). The definition of the sweep operator is included in the Appendix. Through the sweep operator, the FL in expression (1) yields ML estimates of the synthesized mean vector (μ̂) and the variance-covariance matrix (Σ̂). Specifically,

"m^ #

2

6 6 ^1 m 6 ^ 2 ; ^Σ ¼ 6 ^ ¼ m m 6 6 ^3 m 4 ^4 m Y

^ 2Y s

^ Y1 s ^ 21 s

^ Y2 s ^ 12 s ^ 22 s

^ Y3 s ^ 13 s ^ 23 s ^ 23 s

3 ^ Y4 s 7 ^ 14 7 s 7 ^ 24 7 s 7: 7 ^ 34 5 s ^ 24 s

(2)


To apply this approach to synthesize regression results, we assume the data from each study were standardized to have means equal to zero and standard deviations equal to one for each variable. When the data are merged as shown in Figure 1, the mean vector (μ̂) for the dataset contains all zeros, and the variances (σ̂_jj) of the variables Y, X1, X2, X3, and X4 on the diagonal of matrix Σ̂ are expected to be 1s. The covariances in the upper triangle (σ̂_jj') in expression (2) will be the synthesized correlations among the variables. The resulting correlation matrix can then be used to calculate the standardized slope for each variable through simple matrix algebra, as described in the paragraphs that follow and in the Appendix. The standardized slopes are the final result of the synthesis.

Expression (1) implies that the data from the included studies all come from the same population, because they share the same means and variances. In other words, we assume the correlation matrices are homogeneous when applying the FL method. To test the homogeneity of the correlation matrices, Kullback (1967) provides general procedures and illustrative examples using χ² statistics. Two major methods frequently used in the meta-analysis context (Hedges & Olkin, 1985; Hunter & Schmidt, 2004) are discussed in detail by Becker (2009) and Cheung and Chan (2005). Cheung and Chan also discuss a Bonferroni adjustment approach, proposed by Cheung (2001), to control the overall Type I error. If the correlation matrices are heterogeneous, potential moderators can be investigated and used to create sets of homogeneous matrices before applying the FL method.
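To illustrate the 'simple matrix algebra' step, here is a minimal sketch (ours; Python used for illustration) that partitions a synthesized correlation matrix, here the one derived in Section 4 (expression (3)), into the predictor block R11 and the predictor-outcome vector R12 and solves for the standardized slopes:

```python
import numpy as np

# Synthesized correlation matrix from Section 4 (expression (3)),
# variables ordered (Y, X1, X2, X3, X4)
R = np.array([
    [1.000, 0.869, 0.443, 0.224, 0.078],
    [0.869, 1.000, 0.432, 0.169, 0.107],
    [0.443, 0.432, 1.000, 0.058, 0.137],
    [0.224, 0.169, 0.058, 1.000, 0.064],
    [0.078, 0.107, 0.137, 0.064, 1.000],
])

R11 = R[1:, 1:]                  # correlations among the predictors
R12 = R[1:, 0]                   # correlations of predictors with the outcome
B = np.linalg.solve(R11, R12)    # standardized slopes, B = R11^{-1} R12
R2 = R12 @ B                     # model R-squared
print(B.round(3), R2.round(3))   # approx. [0.820 0.088 0.082 -0.027] and 0.768
```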


4. Empirical examination

In this section, the application of the FL method is demonstrated with a meta-analysis of four regression models. To be consistent, this synthesis was designed according to the example illustrated in Section 2. The four models to be synthesized all have a common predictor (the focal predictor), and some have more variables. The four regression models were created using subsets of samples from a large dataset, which served as the population with known parameters of interest (i.e., correlations among the variables and the regression slopes based on those variables). The results of the synthesis based on the subsamples are compared with the parameters to assess their accuracy.

4.1. Sample creation

Four regression models were created using data from the National Education Longitudinal Study: 1988 (NELS:88; Ingels et al., 1998). A sample of 2508 students in 10th grade in 1990 with complete data on first follow-up standardized math scores (F1math), base year standardized math scores (BYmath), socioeconomic status (SES), whether the student's teacher has a bachelor's degree in math or not (BSdegree), and the 10th-grade dropout rate (Drop) of the school the student attends was randomly selected. F1math is the outcome for this example. The other four variables are used as predictors. These predictors represent the effects of the student (BYmath), family (SES), teacher (BSdegree), and school (Drop). These are four dimensions that researchers have studied extensively in models of student achievement in previous regression studies using the NELS:88 dataset (Wu & Becker, 2004).

Four samples were randomly drawn from the complete dataset of 2508 cases to create four regression models with different numbers of predictors. The first study uses BYmath as the only predictor; the second study contains both BYmath and SES as predictors; the third study contains BYmath, SES, and BSdegree as predictors; and the fourth study uses all four predictors. According to Green (1991), the suggested sample size (n) for a regression model with 0.8 power is n = 50 + 8 * p, where p is the number of predictors in the model. Therefore, the sample sizes for our four studies are 58 (50 + 8 * 1 for study 1), 66 (50 + 8 * 2 for study 2), 74 (50 + 8 * 3 for study 3), and 82 (50 + 8 * 4 for study 4), respectively. The sample sizes and the correlations among the variables for each of the four subsets and for the complete dataset are shown in Table 1.

Table 1. Correlations among five variables.

           F1math            BYmath           SES     BSdegree  Drop
F1math     1                 0.861 (n1 = 58)  0.456   0.323     0.074
                             0.874 (n2 = 66)  0.407   0.135
                             0.884 (n3 = 74)  0.466
                             0.856 (n4 = 82)
BYmath     0.867 (n = 2508)  1                0.511   0.308     0.094
                                              0.433   0.046
                                              0.370
SES        0.441             0.418            1       0.196     0.118
                                                      -0.064
BSdegree   0.180             0.136            0.077   1         0.042
Drop       0.082             0.133            0.176   0.018     1

Note. Elements in the upper triangle are correlations based on each of the four randomly selected samples, listed in study order for the studies containing both variables; elements in the lower triangle are correlations based on the total sample of 2508 students.

Our goal is to estimate the model including all four dimensions mentioned previously, which is

$$ \hat{Z}_{Y_{F1math,i}} = \hat{B}_1 Z_{X_{BYmath,i}} + \hat{B}_2 Z_{X_{SES,i}} + \hat{B}_3 Z_{X_{BSdegree,i}} + \hat{B}_4 Z_{X_{Drop,i}}, $$

where the B̂ values are the estimated standardized slopes for the predictors.

4.2. Steps in applying the factored likelihood method using sweep operators

Applying the FL method requires only two pieces of information from each study: the correlations among the variables and the sample size. Through this method, the correlations are synthesized as if all the studies had the same variables. The impact of the correlations that were missing in the studies is estimated on the basis of a series of regressions on the available correlations. These synthesized correlations can be used to calculate the standardized slopes for the regression model. These standardized slopes measure not only the strength of the relationship between the focal predictor and the outcome but also the relative importance of the predictors in the model. Specific steps in the calculation process of the FL method through the sweep operator are presented in the Appendix. For the current example, the calculation steps are illustrated as follows; the notation parallels that in the Appendix.

The first step in implementing the FL approach was to find the maximum likelihood estimate of the correlation between the variables that are used in all four studies included in the synthesis.


In this example, F1math (Y) and BYmath (X1) are in all four studies. The weighted mean correlation is the maximum likelihood estimate of the correlation in this example, which was

$$ \bar{r}_{Y1} = (0.861 \times 58 + 0.874 \times 66 + 0.884 \times 74 + 0.856 \times 82)/(58 + 66 + 74 + 82) = 0.869. $$

The estimated value was stored in matrix form and denoted as A, with rows and columns ordered (F1math, BYmath):

$$ A = \begin{bmatrix} 1 & 0.869 \\ 0.869 & 1 \end{bmatrix}. $$

The second step is to find the maximum likelihood estimates of the standardized slopes (B̂_2Y·1 and B̂_21·Y) and error variance (σ̂²_2·Y1) for regressing SES (X2), which is the second-most-used variable, on F1math (Y) and BYmath (X1), on the basis of the studies containing all those variables (i.e., studies 2, 3, and 4). Before those estimates were found, a correlation matrix, R234, was created to store the weighted mean correlations among the variables F1math (Y), BYmath (X1), and SES (X2):

$$
\begin{aligned}
\bar{r}_{Y1} &= (0.874 \times 66 + 0.884 \times 74 + 0.856 \times 82)/(66 + 74 + 82) = 0.871, \\
\bar{r}_{Y2} &= (0.456 \times 66 + 0.407 \times 74 + 0.466 \times 82)/(66 + 74 + 82) = 0.443, \\
\bar{r}_{12} &= (0.511 \times 66 + 0.433 \times 74 + 0.370 \times 82)/(66 + 74 + 82) = 0.433,
\end{aligned}
$$

$$ R_{234} = \begin{bmatrix} 1 & 0.871 & 0.443 \\ 0.871 & 1 & 0.433 \\ 0.443 & 0.433 & 1 \end{bmatrix} \quad \text{(F1math, BYmath, SES)}. $$

To obtain the standardized slopes and the error variance, F1math (Y) and BYmath (X1) were swept out of R234, as described in the first section of the Appendix. This sweep-out process was recorded as follows:

$$ \mathrm{SWP}[Y,1] \begin{bmatrix} 1 & 0.871 & 0.443 \\ 0.871 & 1 & 0.433 \\ 0.443 & 0.433 & 1 \end{bmatrix} = \begin{bmatrix} -4.134 & 3.599 & 0.275 \\ 3.599 & -4.134 & 0.194 \\ 0.275 & 0.194 & 0.794 \end{bmatrix}. $$

The last column/row in the swept matrix shows the estimates of interest, which are B̂_2Y·1 = 0.275, B̂_21·Y = 0.194, and σ̂²_2·Y1 = 0.794.

Next, F1math (Y) and BYmath (X1) were swept out of matrix A to obtain a new matrix, denoted as B. Specifically,

$$ B = \mathrm{SWP}[Y,1] \begin{bmatrix} 1 & 0.869 \\ 0.869 & 1 \end{bmatrix} = \begin{bmatrix} -4.075 & 3.540 \\ 3.540 & -4.075 \end{bmatrix}. $$

The matrix B was then augmented with the estimated standardized slopes (B̂_2Y·1 and B̂_21·Y) and error variance (σ̂²_2·Y1) obtained earlier to form a new matrix C:

$$ C = \begin{bmatrix} -4.075 & 3.540 & 0.275 \\ 3.540 & -4.075 & 0.194 \\ 0.275 & 0.194 & 0.794 \end{bmatrix}. $$

The matrix C is the matrix that would be obtained by sweeping rows and columns 1 and 2 if we had a 3 * 3 correlation matrix based on all four studies.

The next step was to find the maximum likelihood estimates of the standardized slopes (B̂_3Y·12, B̂_31·Y2, and B̂_32·Y1) and error variance (σ̂²_3·Y12) for the regression of BSdegree (X3), the third-most-used variable, on F1math (Y), BYmath (X1), and SES (X2). Again, to use the sweep operator to obtain these estimates, another correlation matrix, R34, was created with the weighted mean correlations among the variables F1math (Y), BYmath (X1), SES (X2), and BSdegree (X3), on the basis of studies 3 and 4. That is,

$$
\begin{aligned}
\bar{r}_{Y1} &= (0.884 \times 74 + 0.856 \times 82)/(74 + 82) = 0.869, \\
\bar{r}_{Y2} &= (0.407 \times 74 + 0.466 \times 82)/(74 + 82) = 0.438, \\
\bar{r}_{Y3} &= (0.323 \times 74 + 0.135 \times 82)/(74 + 82) = 0.224, \\
\bar{r}_{12} &= (0.433 \times 74 + 0.370 \times 82)/(74 + 82) = 0.400, \\
\bar{r}_{13} &= (0.308 \times 74 + 0.046 \times 82)/(74 + 82) = 0.170, \\
\bar{r}_{23} &= (0.196 \times 74 + (-0.064) \times 82)/(74 + 82) = 0.059,
\end{aligned}
$$

and

$$ R_{34} = \begin{bmatrix} 1 & 0.869 & 0.438 & 0.224 \\ 0.869 & 1 & 0.400 & 0.170 \\ 0.438 & 0.400 & 1 & 0.059 \\ 0.224 & 0.170 & 0.059 & 1 \end{bmatrix} \quad \text{(F1math, BYmath, SES, BSdegree)}. $$

To obtain the slopes of interest in this step, F1math (Y), BYmath (X1), and SES (X2) were swept out of R34:

$$ \mathrm{SWP}[Y,1,2]\, R_{34} = \begin{bmatrix} -4.262 & 3.522 & 0.459 & 0.329 \\ 3.522 & -4.100 & 0.097 & -0.097 \\ 0.459 & 0.097 & -1.240 & -0.046 \\ 0.329 & -0.097 & -0.046 & 0.946 \end{bmatrix}. $$

The last column in the matrix shows the estimates, which are B̂_3Y·12 = 0.329, B̂_31·Y2 = -0.097, B̂_32·Y1 = -0.046, and σ̂²_3·Y12 = 0.946.

Then SES (X2) was swept out of matrix C to obtain a new matrix D,

$$ \mathrm{SWP}[2] \begin{bmatrix} -4.075 & 3.540 & 0.275 \\ 3.540 & -4.075 & 0.194 \\ 0.275 & 0.194 & 0.794 \end{bmatrix} = \begin{bmatrix} -4.170 & 3.473 & 0.346 \\ 3.473 & -4.122 & 0.244 \\ 0.346 & 0.244 & -1.259 \end{bmatrix} = D, $$

and matrix D was augmented with the estimated standardized slopes (B̂_3Y·12, B̂_31·Y2, and B̂_32·Y1) and error variance (σ̂²_3·Y12) to form a new matrix denoted as E,

$$ E = \begin{bmatrix} -4.170 & 3.473 & 0.346 & 0.329 \\ 3.473 & -4.122 & 0.244 & -0.097 \\ 0.346 & 0.244 & -1.259 & -0.046 \\ 0.329 & -0.097 & -0.046 & 0.946 \end{bmatrix}. $$

Next, we find the maximum likelihood estimates of the standardized slopes (B̂_4Y·123, B̂_41·Y23, B̂_42·Y13, and B̂_43·Y12) and error variance (σ̂²_4·Y123) for regressing Drop (X4) on F1math (Y), BYmath (X1), SES (X2), and BSdegree (X3), on the basis of the studies with all those variables in the models. Because only the last sample uses all five variables, the sweep operation was applied to the correlation matrix based on study 4 only. The outcome F1math (Y), BYmath (X1), SES (X2), and BSdegree (X3) were swept out of the correlation matrix of study 4:

$$ \mathrm{SWP}[Y,1,2,3] \begin{bmatrix} 1 & 0.856 & 0.466 & 0.135 & 0.074 \\ 0.856 & 1 & 0.370 & 0.046 & 0.094 \\ 0.466 & 0.370 & 1 & -0.064 & 0.118 \\ 0.135 & 0.046 & -0.064 & 1 & 0.042 \\ 0.074 & 0.094 & 0.118 & 0.042 & 1 \end{bmatrix} = \begin{bmatrix} -4.361 & 3.415 & 0.800 & 0.483 & -0.113 \\ 3.415 & -3.839 & -0.190 & -0.297 & 0.143 \\ 0.800 & -0.190 & -1.314 & -0.183 & 0.121 \\ 0.483 & -0.297 & -0.183 & -1.063 & 0.058 \\ -0.113 & 0.143 & 0.121 & 0.058 & 0.978 \end{bmatrix}. $$

The last column in the previous matrix shows the estimates of interest in this step, which were B̂_4Y·123 = -0.113, B̂_41·Y23 = 0.143, B̂_42·Y13 = 0.121, B̂_43·Y12 = 0.058, and σ̂²_4·Y123 = 0.978.

Then BSdegree (X3) was swept out of matrix E to obtain a new matrix F,

$$ \mathrm{SWP}[3] \begin{bmatrix} -4.170 & 3.473 & 0.346 & 0.329 \\ 3.473 & -4.122 & 0.244 & -0.097 \\ 0.346 & 0.244 & -1.259 & -0.046 \\ 0.329 & -0.097 & -0.046 & 0.946 \end{bmatrix} = \begin{bmatrix} -4.284 & 3.507 & 0.362 & 0.348 \\ 3.507 & -4.132 & 0.239 & -0.103 \\ 0.362 & 0.239 & -1.261 & -0.048 \\ 0.348 & -0.103 & -0.048 & -1.058 \end{bmatrix} = F, $$

and the matrix F was augmented, with the new matrix denoted as G:

$$ G = \begin{bmatrix} -4.284 & 3.507 & 0.362 & 0.348 & -0.113 \\ 3.507 & -4.132 & 0.239 & -0.103 & 0.143 \\ 0.362 & 0.239 & -1.261 & -0.048 & 0.121 \\ 0.348 & -0.103 & -0.048 & -1.058 & 0.058 \\ -0.113 & 0.143 & 0.121 & 0.058 & 0.978 \end{bmatrix}. $$

To obtain the maximum likelihood estimate of the correlation matrix of F1math (Y), BYmath (X1), SES (X2), BSdegree (X3), and Drop (X4), the reverse sweep operation was applied to the matrix G:

$$ \mathrm{RSW}[3,2,1,Y]\, G = \begin{bmatrix} 1 & 0.869 & 0.443 & 0.224 & 0.078 \\ 0.869 & 1 & 0.432 & 0.169 & 0.107 \\ 0.443 & 0.432 & 1 & 0.058 & 0.137 \\ 0.224 & 0.169 & 0.058 & 1 & 0.064 \\ 0.078 & 0.107 & 0.137 & 0.064 & 1 \end{bmatrix}. $$

Expression (3) summarizes the process described previously. It starts with the inner section of the expression and goes outward. First, the weighted correlation between the most frequently observed variables, X1 and Y (r̄ = 0.869), was calculated on the basis of studies 1, 2, 3, and 4. Then the second-most-observed variable, X2, was regressed on X1 and Y. This is performed by sweeping X1 and Y out of the weighted correlation matrix, based on studies 2, 3, and 4, that contains Y, X1, and X2. This step calculates the standardized slopes for Y and X1 as well as the variance of the residuals for this regression, which relate to the parameters in the second line of the second half of expression (1). The slopes of Y and X1 and the residual variance of this regression are 0.275, 0.194, and 0.794, respectively. Then X3 was regressed on Y, X1, and X2 on the basis of the studies containing these variables. The process continues until the least observed variable is regressed on the more frequently observed ones. The reverse sweep operator was then used to reverse the matrix (i.e., the RSW[3, 2, 1, Y] step) with the estimated slopes and residual variances to obtain the synthesized correlations. After the reversal, the diagonal elements in the summarized correlation matrix need to be adjusted to 1 (if they are not 1s) before the matrix is used to calculate the standardized regression coefficients for the final model. In this example, the correlations of each variable with itself were all 1s, and no adjustment was needed. The whole calculation process can be summarized as follows:

$$ \hat{R} = \begin{bmatrix} 1 & 0.869 & 0.443 & 0.224 & 0.078 \\ 0.869 & 1 & 0.432 & 0.169 & 0.107 \\ 0.443 & 0.432 & 1 & 0.058 & 0.137 \\ 0.224 & 0.169 & 0.058 & 1 & 0.064 \\ 0.078 & 0.107 & 0.137 & 0.064 & 1 \end{bmatrix}, \tag{3} $$

with rows and columns ordered (F1math, BYmath, SES, BSdegree, Drop). The standardized slopes for the variables in the model can be calculated using simple algebra. When the summarized correlation matrix is set up as

$$ \begin{bmatrix} 1 & \hat{r}_{Y1} & \hat{r}_{Y2} & \hat{r}_{Y3} & \hat{r}_{Y4} \\ \hat{r}_{Y1} & 1 & \hat{r}_{12} & \hat{r}_{13} & \hat{r}_{14} \\ \hat{r}_{Y2} & \hat{r}_{12} & 1 & \hat{r}_{23} & \hat{r}_{24} \\ \hat{r}_{Y3} & \hat{r}_{13} & \hat{r}_{23} & 1 & \hat{r}_{34} \\ \hat{r}_{Y4} & \hat{r}_{14} & \hat{r}_{24} & \hat{r}_{34} & 1 \end{bmatrix} = \begin{bmatrix} 1 & R_{12}' \\ R_{12} & R_{11} \end{bmatrix}, $$

the vector containing the synthesized slopes (B) can be calculated as B = R11⁻¹R12 (Cooley & Lohnes, 1971, p. 55). The R-squared for the synthesized model (R²_Y) can be obtained as R12'B. In this example, R²_Y is 0.768.

The results of the synthesis are presented in Table 2. The estimated standardized slopes based on the FL method are very close to the estimates from the complete sample. The method is particularly accurate at estimating the slope for BYmath, which was the most frequently observed predictor. The estimated slope of the least frequently observed variable, Drop, was not adjusted, because only one regression model (study 4) used that predictor; that is, the synthesized slope of Drop (B̂4 = -0.027) is the same as the slope estimated only on data from study 4. The R-squared value for this model is 0.768, which indicates that 76.8% of the variation in the outcome F1math can be explained by the four predictors together.

Table 2. Standardized regression coefficients estimated from the complete sample, and FL estimates.

            N      BYmath   SES     BSdegree   Drop
Complete    2508   0.822    0.101   0.062      -0.046
FL          280    0.820    0.087   0.082      -0.027

The standardized slopes also allow comparisons of importance among the variables. In this example, BYmath (B̂1 = 0.820, SE = 0.033) appears to be the most important predictor because it shows by far the largest slope of the four. SES (B̂2 = 0.087, SE = 0.036) and BSdegree (B̂3 = 0.082, SE = 0.040) are similar in strength, and their slopes are very small compared with that for BYmath. The predictor Drop (B̂4 = -0.027, SE = 0.055) is the least important predictor in this model. The standard error for each predictor (SE_Bv) was estimated following Cohen et al. (2003, p. 86):

$$ SE_{B_v} = \sqrt{\frac{1 - R^2_Y}{n - k - 1}} \sqrt{\frac{1}{1 - R^2_v}}, $$

where R²_v is the R-squared for predictor v regressed on the other predictors; it can be obtained the same way as we obtained the model R-squared earlier. The R-squared based on the synthesized correlation matrix shown in expression (3) is

the R²_Y used for calculating the SEs. The sample size n and the number of studies k for calculating the SE for each predictor are based on the studies that include that predictor. For example, the variable BYmath appeared in four studies with a total sample size of 280; thus, its SE is calculated as

$$ SE_{B_{BYmath}} = \sqrt{\frac{1 - 0.768}{280 - 4 - 1}} \sqrt{\frac{1}{1 - 0.209}} = 0.033. $$

Because BYmath appeared in all four studies, its sample size is the largest among the four predictors (n_BYmath = 280) and its SE is the smallest. Drop appeared only in study 4, so its SE was estimated with the smallest sample size (n_Drop = 82); its SE is the largest, and its slope is not significantly different from 0 (95% C.I. = -0.135 to 0.081) in this case.

5. Simulation

In this section, a Monte Carlo simulation is performed to examine the accuracy of the FL method under different scenarios. The parameters that varied across the scenarios were the number of predictors in each model (p), the intercorrelations among the predictors and the outcome (rs), and the sample sizes of the studies included in the synthesis (ns). The number of predictors in the final model (four) and the number of studies included in the synthesis (four) were held constant.

SAS/IML software was used to generate subject-level data based on the assumption of normality within each study included in the synthesis. The Cholesky decomposition was used to obtain data with the desired relationships defined in the intercorrelation matrix assigned to each study. Once the data for four studies with the desired sample sizes were obtained, the FL method was used to calculate the summarized correlation matrices. The standardized slopes for the four predictors and their standard errors were then computed on the basis of the summarized correlation matrices to demonstrate the precision and stability of this method.

5.1. Choice of parameters

5.1.1. Number of predictors. The number of predictors in the models in this simulation ranged from one (p = 1, a simple regression or Pearson's correlation) to four. For the purpose of the current research, four predictors are sufficient to capture the different patterns of missing predictors. Figure 2 shows five different sets of regression models (Patterns I, II, III, IV, and V), each with a different missing-predictor pattern. The shaded blocks for each of the four studies in each pattern indicate the predictors that were included in that study. For example, in Pattern I, the first study used only predictor X1 to predict the outcome Y, whereas the fourth study used predictors X1, X2, X3, and X4 to predict the outcome Y.

Figure 2. Five sets of regression models with different numbers of predictors missing from studies.

5.1.2. Intercorrelation matrix. The intercorrelation matrix contained the correlations among the outcome and the predictor(s) in the study. In each matrix, the correlations were designed on the basis of one principle: if the model had more than one predictor, the correlation between any pair of predictors was set equal to or less than any correlation between any predictor and the outcome. This is consistent with the idea that multicollinearity is not a problem in the individual studies. According to Cohen (1988), correlations of 0.10, 0.30, and 0.50 are small, medium, and large, respectively.


Therefore, in the first matrix the largest correlation between a predictor and the outcome was designed to be 0.60, to represent a large predictive effect for a variable. The smallest correlation between a predictor and the outcome was set to 0.25, to capture a weaker predictor but one potentially worth keeping in the regression model. The correlation between any pair of predictors was 0.25 or less, to represent weak collinearity among the predictors. The four intercorrelation matrices (Rs) for the outcome Y and the four predictors X1, X2, X3, and X4 are presented on the left below, and the corresponding standardized slopes (βs) and the R² values based on the correlations are on the right.

$$ R_1 = \begin{bmatrix} 1 & 0.60 & 0.40 & 0.30 & 0.25 \\ 0.60 & 1 & 0.25 & 0.10 & 0.05 \\ 0.40 & 0.25 & 1 & 0.15 & 0.10 \\ 0.30 & 0.10 & 0.15 & 1 & 0.15 \\ 0.25 & 0.05 & 0.10 & 0.15 & 1 \end{bmatrix} \qquad \begin{aligned} \beta_{11} &= 0.5161 \\ \beta_{12} &= 0.2253 \\ \beta_{13} &= 0.1886 \\ \beta_{14} &= 0.1734 \\ R^2 &= 0.500 \end{aligned} $$

$$ R_2 = \begin{bmatrix} 1 & 0.60 & 0.40 & 0.30 & 0.25 \\ 0.60 & 1 & 0 & 0 & 0 \\ 0.40 & 0 & 1 & 0 & 0 \\ 0.30 & 0 & 0 & 1 & 0 \\ 0.25 & 0 & 0 & 0 & 1 \end{bmatrix} \qquad \begin{aligned} \beta_{21} &= 0.6000 \\ \beta_{22} &= 0.4000 \\ \beta_{23} &= 0.3000 \\ \beta_{24} &= 0.2500 \\ R^2 &= 0.673 \end{aligned} $$

$$ R_3 = \begin{bmatrix} 1 & 0.25 & 0.30 & 0.40 & 0.60 \\ 0.25 & 1 & 0.15 & 0.10 & 0.05 \\ 0.30 & 0.15 & 1 & 0.15 & 0.10 \\ 0.40 & 0.10 & 0.15 & 1 & 0.25 \\ 0.60 & 0.05 & 0.10 & 0.25 & 1 \end{bmatrix} \qquad \begin{aligned} \beta_{31} &= 0.1734 \\ \beta_{32} &= 0.1886 \\ \beta_{33} &= 0.2253 \\ \beta_{34} &= 0.5161 \\ R^2 &= 0.500 \end{aligned} $$

$$ R_4 = \begin{bmatrix} 1 & 0.25 & 0.30 & 0.40 & 0.60 \\ 0.25 & 1 & 0 & 0 & 0 \\ 0.30 & 0 & 1 & 0 & 0 \\ 0.40 & 0 & 0 & 1 & 0 \\ 0.60 & 0 & 0 & 0 & 1 \end{bmatrix} \qquad \begin{aligned} \beta_{41} &= 0.2500 \\ \beta_{42} &= 0.3000 \\ \beta_{43} &= 0.4000 \\ \beta_{44} &= 0.6000 \\ R^2 &= 0.673 \end{aligned} $$

Each correlation matrix was used to generate data for all four studies in the simulated meta-analysis, to test the method under the fixed-effects conditions that the original method was designed for.

5.1.3. Sample size sets. According to Cohen and Cohen (1975), at least 124 participants are needed to maintain 0.80 power with a single predictor that correlates with the dependent variable at 0.30 in the population. Therefore, in the current research, the minimal sample size for a study was set at 150. Because many studies adopt regression techniques to analyze data from large-scale datasets, the maximum sample size was set to 2000. Four sets of sample sizes for the four studies were investigated:

N1 = {150, 150, 150, 150},
N2 = {2000, 2000, 2000, 2000},
N3 = {150, 500, 1000, 2000}, and
N4 = {2000, 1000, 500, 150}.

The first two sample size sets represent equal small (N1) and large (N2) samples in a synthesis. The other two sets represent unequal sample sizes across studies. The varied sample size sets, combined with the different numbers of predictors in the models, allow us to examine the impact of the missing rate of the variables.

The design of the simulation yielded 80 scenarios (5 patterns * 4 model conditions * 4 sample size sets) for testing the FL method. In each scenario, the synthesized correlation matrix was calculated and used to obtain the standardized slopes, which serve as the results of the synthesis. The procedure was replicated 1000 times, producing 1000 syntheses for each of the 80 conditions. A sketch of one such replication appears below.
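The generation step can be sketched as follows (ours, not the authors' SAS/IML code; Python used for illustration). It draws standard-normal data, imposes the target intercorrelations through the Cholesky factor, and keeps only each study's observed variables:

```python
import numpy as np

rng = np.random.default_rng(2013)

# Intercorrelation matrix R1 from Section 5.1.2, variables ordered (Y, X1..X4)
R1 = np.array([
    [1.00, 0.60, 0.40, 0.30, 0.25],
    [0.60, 1.00, 0.25, 0.10, 0.05],
    [0.40, 0.25, 1.00, 0.15, 0.10],
    [0.30, 0.10, 0.15, 1.00, 0.15],
    [0.25, 0.05, 0.10, 0.15, 1.00],
])
L = np.linalg.cholesky(R1)                # Cholesky factor of the target matrix

ns = [150, 500, 1000, 2000]               # sample size set N3
observed = [[0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4]]  # Pattern I

studies = []
for n_k, obs in zip(ns, observed):
    Z = rng.standard_normal((n_k, 5)) @ L.T      # rows correlate approximately as R1
    r_k = np.corrcoef(Z[:, obs], rowvar=False)   # the study's correlation matrix
    studies.append((n_k, r_k))                   # all the FL method needs per study
```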


5.2. Data analysis

The means of the slope estimates and their standard errors based on the 1000 replications in each condition were used to evaluate the FL method. The mean slopes were compared with the population values that were used to generate the data. The relative percentage bias of each slope, defined as

$$ B(\hat{\theta}) = \frac{\bar{\hat{\theta}} - \theta}{\theta} \times 100\%, $$

was also computed in each scenario to quantify the difference between the calculated value and the population value. Here θ is the population slope, and θ̄̂ is the mean slope obtained by averaging the sample slopes across the 1000 replications. Good estimation methods should have relative bias of less than 5% (Hoogland & Boomsma, 1998). The standard errors were also examined, to show the stability of the estimates.

5.3. Results

The results of the simulation demonstrated that the FL method generally produced very stable estimates for fixed-effects models (Conditions 1 to 4). The mean estimates showed less than 2% relative percentage bias across all fixed-effects conditions. The average percentage bias values ranged from 0.038% for X1 to 0.103% for X2. No particular combination of conditions consistently produced better estimates.

The scenarios that produced the most accurate estimates of the slopes are summarized in Table 3. These scenarios produced 0% relative percentage bias on average and the smallest standard errors among all the tested scenarios. The best B̂1 occurred when there was no interrelationship among the predictors (matrix R2) and when the rest of the predictors, X2 to X4, had equal missing rates (Pattern III). The best B̂2 also occurred with Pattern III, but with a higher missing rate on the variables that have higher correlations with the outcome (matrix R3). The best B̂3 occurred when there was no interrelationship among the predictors and when the missing-data rate was higher for the variables that demonstrated higher correlations with the outcome (matrix R4). The best B̂4 occurred when only a small amount of data was missing on X4 and when X4 related most strongly to the outcome (matrix R3). All the standard errors were less than 0.0006, which demonstrates the stability of the results.

Table 3. Simulation results: the least biased estimates based on the fixed-effects models (Conditions 1-4).

                     B̂1                  B̂2                  B̂3                  B̂4
Least biased B̂1 (R2, Pattern III, N3)
  Missing rate       0%                  45%                 45%                 45%
  Parameter          0.6000              0.4000              0.3000              0.2500
  FL estimate (SE)   0.6000 (0.000375)   0.4003 (0.000412)   0.3006 (0.000408)   0.2500 (0.000394)
Least biased B̂2 (R3, Pattern III, N2)
  Missing rate       0%                  75%                 75%                 75%
  Parameter          0.1734              0.1886              0.2253              0.5161
  FL estimate (SE)   0.1739 (0.000408)   0.1886 (0.000534)   0.2257 (0.000540)   0.5170 (0.000457)
Least biased B̂3 (R4, Pattern I, N3)
  Missing rate       0%                  4%                  18%                 45%
  Parameter          0.2500              0.3000              0.4000              0.6000
  FL estimate (SE)   0.2498 (0.000387)   0.2998 (0.000405)   0.4000 (0.000398)   0.6002 (0.000387)
Least biased B̂4 (R3, Pattern II, N2)
  Missing rate       0%                  0%                  0%                  25%
  Parameter          0.1734              0.1886              0.2253              0.5161
  FL estimate (SE)   0.1737 (0.000274)   0.1881 (0.000280)   0.2252 (0.000275)   0.5161 (0.000260)

Note. In each block, the estimate of the slope named in the block label is the least biased one.

Table 4 contains the scenarios that produced the most biased estimates for each of the predictors in this simulation. The missing rate, the population slope, the estimated slope, and the standard error of each variable in each scenario are reported. The first three predictors were estimated with the largest bias when huge amounts of data were missing for the predictors (Pattern III). The largest relative percentage bias for B̂1 (1.3%) was observed when X1 had the weakest relation to the outcome (R3). Slopes for predictors X2, X3, and X4 were all estimated more poorly when much data (96%) was missing for those predictors. The largest relative percentage bias for B̂2 (0.6%) was observed when there was no interrelationship among the predictors (R4), and the largest relative percentage bias for B̂3 (1.3%) occurred under matrix R3 (see Table 4). The largest bias under the fixed-effects models (2%) occurred when estimating the slope for X4, when X4 was the only variable with a very high missing-data rate (96%, based on sample size set N4) and had the weakest correlation with the outcome (R2). Even though these scenarios produced the most biased estimates in this simulation, the relative biases of all estimates were still very small.

Table 4. Simulation results: the most biased estimates based on the fixed-effects models (Conditions 1-4).

                     B̂1                  B̂2                  B̂3                  B̂4
Most biased B̂1 (R3, Pattern III, N1)
  Missing rate       0%                  75%                 75%                 75%
  Parameter          0.1734              0.1886              0.2253              0.5161
  FL estimate (SE)   0.1757 (0.001521)   0.1891 (0.001933)   0.2272 (0.001870)   0.5318 (0.001706)
Most biased B̂2 (R4, Pattern III, N4)
  Missing rate       0%                  96%                 96%                 96%
  Parameter          0.2500              0.3000              0.4000              0.6000
  FL estimate (SE)   0.2502 (0.001375)   0.3018 (0.001557)   0.3985 (0.001591)   0.6026 (0.001486)
Most biased B̂3 (R3, Pattern III, N4)
  Missing rate       0%                  96%                 96%                 96%
  Parameter          0.1734              0.1886              0.2253              0.5161
  FL estimate (SE)   0.1736 (0.001424)   0.1894 (0.001925)   0.2223 (0.001994)   0.5176 (0.001728)
Most biased B̂4 (R2, Pattern IV, N4)
  Missing rate       0%                  0%                  0%                  96%
  Parameter          0.6000              0.4000              0.3000              0.2500
  FL estimate (SE)   0.6001 (0.000705)   0.4003 (0.000708)   0.3002 (0.000675)   0.2550 (0.001407)

Note. In each block, the estimate of the slope named in the block label is the most biased one.

6. Discussion


Improvements in the quantitative methods used in primary studies increase the difficulty of synthesizing those studies. Compared with the methods for synthesizing mean differences, correlations, and odds ratios, methods for synthesizing regressions or other forms of linear models are less well understood, and further developments are still needed. This study linked the problem of synthesizing regression models with different predictors to the issue of missing data. Instead of using the standardized slope, an intuitively appealing effect size for regression, the proposed FL method focuses on the correlations that underlie the regression models. It thereby avoids the problem of different measures, which usually have different scales, being used to measure the same construct, and makes a synthesis possible. By using a series of steps to regress less-observed variables on more-observed ones from the collected studies, this method allows the correlations among variables to be estimated on the basis of all available information. The synthesized correlations are then used to calculate standardized slopes for each of the predictors in the final model. The synthesized slopes based on this method not only quantify the relationship between the focal predictor and the outcome but also allow assessment of the relative importance of the predictors in the model through comparisons of the standardized slopes.

We demonstrated the use of the method using data from NELS:88, and the accuracy of the method under different conditions was tested through simulation. The results from the empirical example showed that the slopes estimated using the FL method were fairly accurate, especially for the focal predictor that was present in all the included studies. Minimally biased results were found in the simulation under a range of scenarios involving different sample sizes, missing-data patterns, and correlations among variables. Our findings suggest that this method will be most effective when a researcher's focus is on the relationship between the outcome and one particular predictor (the focal predictor) that appears in all the collected models.

This method also allows the combination of regression studies with correlation studies that investigate bivariate relationships between the outcome and the focal variable. The ability to appropriately include correlation studies when synthesizing regression models allows the meta-analyst to expand the number of studies included, thus increasing the power of the meta-analysis. The FL method can be used as the first stage in the two-stage structural equation modeling approach described by Cheung and Chan (2005) to pool the correlations from different studies. The synthesized correlations can also be used to fit other types of linear models.


Moreover, the FL method does not require any special statistical programming, because all calculations are based on basic algebra. For researchers who are familiar with SPSS or SAS, both programs have the sweep operator command,1 which can potentially save calculation time. Without any special statistical software, the calculation can be carried out using a calculator or an Excel spreadsheet.

Although the FL method worked well for providing accurate synthesized slopes, it is constrained in the following ways. First, using FL estimation through the sweep operator requires the predictors included in the models in the synthesis to be arranged in a monotone pattern like any of those shown in Figure 2. Those patterns make it possible to obtain the maximum likelihood estimates of the correlations without an iterative process, which makes the FL method easy to use. As pointed out by Little and Rubin (2002), 'the pattern of missing data is rarely monotone, but is often close to monotone' (p. 6). The desired pattern might be obtained by rearranging the predictors in the models, or sometimes by omitting a relatively small number of predictors and their correlations. If the variables studied in the regression models for a synthesis are too diverse and no arrangement of the predictors can produce the desired patterns, other methods for handling missing data might be workable for addressing the issue of non-identical models when meta-analyzing regressions. Wu and Pigott (2008) synthesized a set of educational production functions using the Markov chain Monte Carlo method, a Bayesian approach for conducting multiple imputations to handle missing data. Even though more investigation of their approach is needed, Wu and Pigott showed the utility of this method for synthesizing more complicated regression models.

Second, using the FL method requires correlations among the variables from the primary studies. Unfortunately, information about the zero-order correlations is not always available or might be only partially reported. Without these correlations, the method cannot be applied. Bayesian approaches might provide a possible direction for obtaining the correlations needed for synthesizing regression studies from other information, such as the slopes reported in the regression studies. A possible solution might be to use the Gibbs sampler (Casella & George, 1992; Gelfand & Smith, 1990), which is based on elementary properties of Markov chains, to generate possible correlations from the observed distributions of the slopes of the regression models. More exploration in this domain is needed.

Third, the FL method assumes that a fixed-effects model applies to the included studies. Other than exploring potential moderators that may reduce existing variability, a better way to quantify and incorporate variability across non-identical regression studies (i.e., under a random-effects model) is still needed. This is a potential topic for further investigation in the synthesis of linear models.

APPENDIX A: THE APPLICATION OF SWEEP OPERATORS

This Appendix describes the procedure for applying sweep operators to estimate the regression coefficients (the βs) and the residual variances (σ²s) shown in expression (1) in Section 3. These estimates are used to obtain the synthesized correlations, which are needed for calculating the synthesized standardized slopes for regression studies. We start with the definition of the sweep operator, then show the specific steps for calculating the synthesized slopes for a regression model.

Definition of sweep operator

The sweep operator adopted in this study is the one defined by Dempster (1969). A p * p matrix M is said to have been swept on row and column c if the elements in M are replaced by another p * p matrix N whose element n_ij is related to the element m_ij of M as follows:

1 In both SAS and SPSS, the sweep transformation is carried out with the command SWEEP. Using the first sweep operation conducted in this study (Section 4.2) as the example, the SAS and SPSS syntax for sweeping the outcome Y and the first variable out of the correlation matrix R234, to obtain a new transformed matrix, aa, that contains the desired slopes and the variance, is:

SAS syntax:

PROC IML;
r234 = {1 0.871 0.443, 0.871 1 0.433, 0.443 0.433 1};
aa = SWEEP(SWEEP(r234,1),2);
PRINT aa;
QUIT;

SPSS syntax:

MATRIX.
COMP r234 = {1, 0.871, 0.443; 0.871, 1, 0.433; 0.443, 0.433, 1}.
COMP swp1 = SWEEP(r234,1).
COMP aa = SWEEP(swp1, 2).
PRINT aa.
END MATRIX.


$$
\begin{aligned}
n_{cc} &= -1/m_{cc}, \\
n_{ic} &= m_{ic}/m_{cc}, \\
n_{cj} &= m_{cj}/m_{cc}, \text{ and} \\
n_{ij} &= m_{ij} - m_{ic} m_{cj}/m_{cc},
\end{aligned} \tag{A.1}
$$

for i ≠ c and j ≠ c. For example, let M be a 3 * 3 matrix,

$$ M = \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{12} & m_{22} & m_{23} \\ m_{13} & m_{23} & m_{33} \end{bmatrix}. $$

When we apply the previous rules and sweep row and column 1 out of M to obtain the matrix N, the matrix N will look like

$$ N = \begin{bmatrix} -1/m_{11} & m_{12}/m_{11} & m_{13}/m_{11} \\ m_{12}/m_{11} & m_{22} - m_{12}^2/m_{11} & m_{23} - m_{13} m_{12}/m_{11} \\ m_{13}/m_{11} & m_{23} - m_{13} m_{12}/m_{11} & m_{33} - m_{13}^2/m_{11} \end{bmatrix}. $$

For brevity, using the terminology defined in Beaton (1964), the matrix N can be denoted as N = SWP[c]M. The result of successively applying the operations SWP[c1], SWP[c2], . . ., SWP[ct] to matrix M can be denoted by SWP[c1, c2, . . ., ct]M. The operations are carried out successively, and each stage uses the output of only the previous stage.

The sweep operator can be reversed. It is undone by replacing the ijth element n_ij in the N matrix to recover the ijth element m_ij of the original M matrix as

$$
\begin{aligned}
m_{cc} &= -1/n_{cc}, \\
m_{ic} &= -n_{ic}/n_{cc}, \\
m_{cj} &= -n_{cj}/n_{cc}, \text{ and} \\
m_{ij} &= n_{ij} - n_{ic} n_{cj}/n_{cc}.
\end{aligned} \tag{A.2}
$$

The reverse sweep is denoted as M = RSW[c]N. Using the earlier example, we have

$$ \mathrm{RSW}[1] \begin{bmatrix} -1/m_{11} & m_{12}/m_{11} & m_{13}/m_{11} \\ m_{12}/m_{11} & m_{22} - m_{12}^2/m_{11} & m_{23} - m_{13} m_{12}/m_{11} \\ m_{13}/m_{11} & m_{23} - m_{13} m_{12}/m_{11} & m_{33} - m_{13}^2/m_{11} \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{12} & m_{22} & m_{23} \\ m_{13} & m_{23} & m_{33} \end{bmatrix} = M. $$
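For readers who prefer code to formulas, a direct transcription of (A.1) and (A.2) might look like the following (our sketch in Python; the paper itself uses SAS and SPSS, see footnote 1):

```python
import numpy as np

def swp(M, pivots):
    """Sweep operator of (A.1): for each pivot c, n_cc = -1/m_cc,
    n_ic = m_ic/m_cc, n_cj = m_cj/m_cc, n_ij = m_ij - m_ic*m_cj/m_cc."""
    N = np.array(M, dtype=float)
    for c in pivots:
        d = N[c, c]
        out = N - np.outer(N[:, c], N[c, :]) / d  # n_ij for i, j != c
        out[c, :] = N[c, :] / d                   # pivot row
        out[:, c] = N[:, c] / d                   # pivot column
        out[c, c] = -1.0 / d                      # pivot diagonal
        N = out
    return N

def rsw(N, pivots):
    """Reverse sweep operator of (A.2); undoes swp."""
    M = np.array(N, dtype=float)
    for c in pivots:
        d = M[c, c]
        out = M - np.outer(M[:, c], M[c, :]) / d
        out[c, :] = -M[c, :] / d
        out[:, c] = -M[:, c] / d
        out[c, c] = -1.0 / d
        M = out
    return M

# Check on the Section 4.2 matrix R234: RSW undoes SWP, and the last column/row
# of the swept matrix holds the slopes and residual variance (approximately the
# 0.275, 0.194, and 0.794 of Section 4.2; small differences reflect rounding).
R234 = np.array([[1, 0.871, 0.443],
                 [0.871, 1, 0.433],
                 [0.443, 0.433, 1]])
S = swp(R234, [0, 1])
assert np.allclose(rsw(S, [1, 0]), R234)
```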

Application of sweep operators

To be consistent, the steps for applying the sweep operator are demonstrated with the scenario described in Section 2, where four regression studies are synthesized in a meta-analysis.

Step 1: Find the maximum likelihood estimates of the correlations between the variables that are used in all studies included in the synthesis. In the example, Y and X1 are present in all four studies. The correlation between the two variables can be estimated by calculating the weighted mean correlation

\[
\bar{r}_{Y1} = \left( n_1 r_{1(Y1)} + n_2 r_{2(Y1)} + n_3 r_{3(Y1)} + n_4 r_{4(Y1)} \right) / (n_1 + n_2 + n_3 + n_4),
\]

and the mean correlation is stored in matrix A as

\[
A = \begin{pmatrix} 1 & \bar{r}_{Y1} \\ \bar{r}_{Y1} & 1 \end{pmatrix}.
\]

Step 2: Find the maximum likelihood estimates of the standardized slopes (\hat{B}_{2Y.1} and \hat{B}_{21.Y}) and error variance (\hat{s}^2_{2.Y1}) for regressing X2, which is the second-most-used variable, on Y and X1, on the basis of the studies containing all those variables. To use the sweep operator to obtain these estimates, we first create a correlation matrix R234, which contains the weighted mean correlations among variables Y, X1, and X2, on the basis of studies 2, 3, and 4. That is,

\[
\bar{r}_{Y1} = \left( n_2 r_{2(Y1)} + n_3 r_{3(Y1)} + n_4 r_{4(Y1)} \right) / (n_2 + n_3 + n_4),
\]
\[
\bar{r}_{Y2} = \left( n_2 r_{2(Y2)} + n_3 r_{3(Y2)} + n_4 r_{4(Y2)} \right) / (n_2 + n_3 + n_4),
\]
\[
\bar{r}_{12} = \left( n_2 r_{2(12)} + n_3 r_{3(12)} + n_4 r_{4(12)} \right) / (n_2 + n_3 + n_4),
\]

and

\[
R_{234} = \begin{pmatrix}
1 & \bar{r}_{Y1} & \bar{r}_{Y2} \\
\bar{r}_{Y1} & 1 & \bar{r}_{12} \\
\bar{r}_{Y2} & \bar{r}_{12} & 1
\end{pmatrix}.
\]
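A minimal sketch of steps 1 and 2 in Python/NumPy is given below; the study-level sample sizes and correlations are invented stand-ins for illustration, not values from the actual example.

import numpy as np

# Hypothetical sample sizes and reported correlations for studies 1-4.
n = np.array([100, 150, 120, 130])          # n_1, ..., n_4 (assumed)
r_Y1 = np.array([0.88, 0.87, 0.86, 0.88])   # r(Y, X1) in each study (assumed)

# Step 1: n-weighted mean correlation across all four studies, stored in A.
rbar_Y1_all = (n * r_Y1).sum() / n.sum()
A = np.array([[1.0, rbar_Y1_all],
              [rbar_Y1_all, 1.0]])

# Step 2: weighted means over studies 2-4 only (the studies measuring X2).
w = n[1:]
r_Y2 = np.array([0.45, 0.44, 0.44])         # r(Y, X2), studies 2-4 (assumed)
r_12 = np.array([0.43, 0.44, 0.43])         # r(X1, X2), studies 2-4 (assumed)
rbar_Y1 = (w * r_Y1[1:]).sum() / w.sum()
rbar_Y2 = (w * r_Y2).sum() / w.sum()
rbar_12 = (w * r_12).sum() / w.sum()
R234 = np.array([[1.0,     rbar_Y1, rbar_Y2],
                 [rbar_Y1, 1.0,     rbar_12],
                 [rbar_Y2, rbar_12, 1.0]])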

To obtain the slopes, we sweep Y and X1 out of R234:

\[
\mathrm{SWP}[Y, 1]\,R_{234} = \begin{pmatrix}
\text{swept} & \text{swept} & \hat{B}_{2Y.1} \\
\text{swept} & \text{swept} & \hat{B}_{21.Y} \\
\hat{B}_{2Y.1} & \hat{B}_{21.Y} & \hat{s}^2_{2.Y1}
\end{pmatrix}.
\]

The ‘swept’ entries in the previous matrix indicate elements that have been ‘swept out’ of the matrix using expression (A.1). The entries in the last column and row of the matrix then become the estimates of interest: \hat{B}_{2Y.1}, \hat{B}_{21.Y}, and \hat{s}^2_{2.Y1}.

Step 3: Sweep Y and X1 out of matrix A to obtain a new matrix

\[
B = \mathrm{SWP}[Y, 1]\,A = \begin{pmatrix}
-1 - \bar{r}_{Y1}^2/(1 - \bar{r}_{Y1}^2) & \bar{r}_{Y1}/(1 - \bar{r}_{Y1}^2) \\
\bar{r}_{Y1}/(1 - \bar{r}_{Y1}^2) & -1/(1 - \bar{r}_{Y1}^2)
\end{pmatrix}
= \begin{pmatrix} b_{11} & b_{12} \\ b_{12} & b_{22} \end{pmatrix},
\]

and augment matrix B to form a new matrix C with the estimated standardized slopes (\hat{B}_{2Y.1} and \hat{B}_{21.Y}) and error variance (\hat{s}^2_{2.Y1}) from the previous step. Thus,

\[
C = \begin{pmatrix}
b_{11} & b_{12} & \hat{B}_{2Y.1} \\
b_{12} & b_{22} & \hat{B}_{21.Y} \\
\hat{B}_{2Y.1} & \hat{B}_{21.Y} & \hat{s}^2_{2.Y1}
\end{pmatrix}.
\]
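Step 3 can be sketched in Python/NumPy as follows, reusing the swp helper from the first sketch; the pooled \bar{r}_{Y1} and the step 2 estimates are stand-in values consistent with the R234 example above.

import numpy as np

def swp(M, c):
    # SWP[c] per expression (A.1); same helper as in the first sketch.
    M = np.asarray(M, dtype=float)
    N = M - np.outer(M[:, c], M[c, :]) / M[c, c]
    N[c, :] = M[c, :] / M[c, c]
    N[:, c] = M[:, c] / M[c, c]
    N[c, c] = -1.0 / M[c, c]
    return N

rbar_Y1 = 0.87                        # assumed pooled r(Y, X1) from step 1
A = np.array([[1.0, rbar_Y1],
              [rbar_Y1, 1.0]])
B = swp(swp(A, 0), 1)                 # SWP[Y, 1] A

# Step 2 estimates from sweeping R234 (values from the first sketch).
slopes2 = np.array([0.273, 0.195])    # B^_2Y.1 and B^_21.Y
s2_2 = 0.795                          # s^2_2.Y1

# Augment B with the step 2 estimates to form the 3 x 3 matrix C.
C = np.zeros((3, 3))
C[:2, :2] = B
C[2, :2] = slopes2
C[:2, 2] = slopes2
C[2, 2] = s2_2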

Matrix C looks like the matrix we would obtain by sweeping out rows and columns 1 and 2 if all the studies contained variables Y, X1, and X2. This setup allows us to ‘borrow’ the information from studies with more variables when we reverse the operator at the end of the calculations.

Step 4: Find the maximum likelihood estimates of the standardized slopes (\hat{B}_{3Y.12}, \hat{B}_{31.Y2}, and \hat{B}_{32.Y1}) and error variance (\hat{s}^2_{3.Y12}) for regressing X3, which is the third-most-used variable, on Y, X1, and X2. To do this, we create a correlation matrix, R34, with weighted mean correlations among variables Y, X1, X2, and X3, on the basis only of studies 3 and 4:

\[
\bar{r}_{Y1} = \left( n_3 r_{3(Y1)} + n_4 r_{4(Y1)} \right) / (n_3 + n_4),
\]
\[
\bar{r}_{Y2} = \left( n_3 r_{3(Y2)} + n_4 r_{4(Y2)} \right) / (n_3 + n_4),
\]
\[
\bar{r}_{Y3} = \left( n_3 r_{3(Y3)} + n_4 r_{4(Y3)} \right) / (n_3 + n_4),
\]
\[
\bar{r}_{12} = \left( n_3 r_{3(12)} + n_4 r_{4(12)} \right) / (n_3 + n_4),
\]
\[
\bar{r}_{13} = \left( n_3 r_{3(13)} + n_4 r_{4(13)} \right) / (n_3 + n_4),
\]
\[
\bar{r}_{23} = \left( n_3 r_{3(23)} + n_4 r_{4(23)} \right) / (n_3 + n_4),
\]

and

\[
R_{34} = \begin{pmatrix}
1 & \bar{r}_{Y1} & \bar{r}_{Y2} & \bar{r}_{Y3} \\
\bar{r}_{Y1} & 1 & \bar{r}_{12} & \bar{r}_{13} \\
\bar{r}_{Y2} & \bar{r}_{12} & 1 & \bar{r}_{23} \\
\bar{r}_{Y3} & \bar{r}_{13} & \bar{r}_{23} & 1
\end{pmatrix}.
\]

We sweep Y, X1, and X2 out of R34 to obtain the slopes and the variance in the last column and row; specifically,

\[
\mathrm{SWP}[Y, 1, 2]\,R_{34} = \begin{pmatrix}
\text{swept} & \text{swept} & \text{swept} & \hat{B}_{3Y.12} \\
\text{swept} & \text{swept} & \text{swept} & \hat{B}_{31.Y2} \\
\text{swept} & \text{swept} & \text{swept} & \hat{B}_{32.Y1} \\
\hat{B}_{3Y.12} & \hat{B}_{31.Y2} & \hat{B}_{32.Y1} & \hat{s}^2_{3.Y12}
\end{pmatrix}.
\]

Step 5: Sweep X2 out of matrix C to obtain a new matrix D,

\[
\mathrm{SWP}[2]\,C = \begin{pmatrix}
d_{11} & d_{12} & d_{13} \\
d_{12} & d_{22} & d_{23} \\
d_{13} & d_{23} & d_{33}
\end{pmatrix} = D,
\]

and augment matrix D with the estimated standardized slopes (\hat{B}_{3Y.12}, \hat{B}_{31.Y2}, and \hat{B}_{32.Y1}) and error variance (\hat{s}^2_{3.Y12}) obtained from the previous step to create a new matrix E,

\[
E = \begin{pmatrix}
d_{11} & d_{12} & d_{13} & \hat{B}_{3Y.12} \\
d_{12} & d_{22} & d_{23} & \hat{B}_{31.Y2} \\
d_{13} & d_{23} & d_{33} & \hat{B}_{32.Y1} \\
\hat{B}_{3Y.12} & \hat{B}_{31.Y2} & \hat{B}_{32.Y1} & \hat{s}^2_{3.Y12}
\end{pmatrix}.
\]
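Steps 4 and 5 follow the same sweep-and-augment pattern; a Python/NumPy sketch with invented pooled correlations for R34 (and the matrix C carried over from the step 3 sketch):

import numpy as np

def swp(M, c):
    # SWP[c] per expression (A.1); same helper as in the first sketch.
    M = np.asarray(M, dtype=float)
    N = M - np.outer(M[:, c], M[c, :]) / M[c, c]
    N[c, :] = M[c, :] / M[c, c]
    N[:, c] = M[:, c] / M[c, c]
    N[c, c] = -1.0 / M[c, c]
    return N

# Step 4: hypothetical pooled correlations among (Y, X1, X2, X3), studies 3-4.
R34 = np.array([[1.00, 0.87, 0.44, 0.50],
                [0.87, 1.00, 0.43, 0.45],
                [0.44, 0.43, 1.00, 0.40],
                [0.50, 0.45, 0.40, 1.00]])
S = swp(swp(swp(R34, 0), 1), 2)       # SWP[Y, 1, 2] R34
slopes3, s2_3 = S[3, :3], S[3, 3]     # B^_3Y.12, B^_31.Y2, B^_32.Y1, s^2_3.Y12

# Step 5: sweep X2 (index 2) out of C, then augment with the step 4 results.
C = np.array([[-4.114,  3.579, 0.273],
              [ 3.579, -4.114, 0.195],
              [ 0.273,  0.195, 0.795]])   # values produced by the step 3 sketch
D = swp(C, 2)
E = np.zeros((4, 4))
E[:3, :3] = D
E[3, :3] = slopes3
E[:3, 3] = slopes3
E[3, 3] = s2_3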


Step 6: Find the maximum likelihood estimates of the standardized slopes (\hat{B}_{4Y.123}, \hat{B}_{41.Y23}, \hat{B}_{42.Y13}, and \hat{B}_{43.Y12}) and error variance (\hat{s}^2_{4.Y123}) by regressing X4 on Y, X1, X2, and X3 on the basis of the studies with all those variables in the model. Because only the last study uses all five variables, the sweep operator is applied to the correlation matrix based on study 4 only (R4),

\[
R_4 = \begin{pmatrix}
1 & r_{4(Y1)} & r_{4(Y2)} & r_{4(Y3)} & r_{4(Y4)} \\
r_{4(Y1)} & 1 & r_{4(12)} & r_{4(13)} & r_{4(14)} \\
r_{4(Y2)} & r_{4(12)} & 1 & r_{4(23)} & r_{4(24)} \\
r_{4(Y3)} & r_{4(13)} & r_{4(23)} & 1 & r_{4(34)} \\
r_{4(Y4)} & r_{4(14)} & r_{4(24)} & r_{4(34)} & 1
\end{pmatrix}.
\]

We sweep Y, X1, X2, and X3 out of R4 via

\[
\mathrm{SWP}[Y, 1, 2, 3]\,R_4 = \begin{pmatrix}
\text{swept} & \text{swept} & \text{swept} & \text{swept} & \hat{B}_{4Y.123} \\
\text{swept} & \text{swept} & \text{swept} & \text{swept} & \hat{B}_{41.Y23} \\
\text{swept} & \text{swept} & \text{swept} & \text{swept} & \hat{B}_{42.Y13} \\
\text{swept} & \text{swept} & \text{swept} & \text{swept} & \hat{B}_{43.Y12} \\
\hat{B}_{4Y.123} & \hat{B}_{41.Y23} & \hat{B}_{42.Y13} & \hat{B}_{43.Y12} & \hat{s}^2_{4.Y123}
\end{pmatrix}.
\]

The last column and row of the matrix show the estimates \hat{B}_{4Y.123}, \hat{B}_{41.Y23}, \hat{B}_{42.Y13}, \hat{B}_{43.Y12}, and \hat{s}^2_{4.Y123}.

Step 7: Sweep X3 out of matrix E to obtain a new matrix F,

\[
\mathrm{SWP}[3]\,E = \begin{pmatrix}
e_{11} & e_{12} & e_{13} & e_{14} \\
e_{12} & e_{22} & e_{23} & e_{24} \\
e_{13} & e_{23} & e_{33} & e_{34} \\
e_{14} & e_{24} & e_{34} & e_{44}
\end{pmatrix} = F,
\]

and augment the new matrix with the results of step 6, denoting the result as G:

\[
G = \begin{pmatrix}
e_{11} & e_{12} & e_{13} & e_{14} & \hat{B}_{4Y.123} \\
e_{12} & e_{22} & e_{23} & e_{24} & \hat{B}_{41.Y23} \\
e_{13} & e_{23} & e_{33} & e_{34} & \hat{B}_{42.Y13} \\
e_{14} & e_{24} & e_{34} & e_{44} & \hat{B}_{43.Y12} \\
\hat{B}_{4Y.123} & \hat{B}_{41.Y23} & \hat{B}_{42.Y13} & \hat{B}_{43.Y12} & \hat{s}^2_{4.Y123}
\end{pmatrix}.
\]

Step 8: To obtain the maximum likelihood estimates of the correlation matrix of Y, X1, X2, X3, and X4, we conduct the reverse sweep operation on the matrix G as defined in expression (A.2). The reversed matrix contains the synthesized correlations (\hat{r}s):

\[
\mathrm{RSW}[3, 2, 1, Y]\,G = \begin{pmatrix}
1 & \hat{r}_{Y1} & \hat{r}_{Y2} & \hat{r}_{Y3} & \hat{r}_{Y4} \\
\hat{r}_{Y1} & 1 & \hat{r}_{12} & \hat{r}_{13} & \hat{r}_{14} \\
\hat{r}_{Y2} & \hat{r}_{12} & 1 & \hat{r}_{23} & \hat{r}_{24} \\
\hat{r}_{Y3} & \hat{r}_{13} & \hat{r}_{23} & 1 & \hat{r}_{34} \\
\hat{r}_{Y4} & \hat{r}_{14} & \hat{r}_{24} & \hat{r}_{34} & 1
\end{pmatrix} = H.
\]
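As a consistency check on step 8, the sketch below (Python/NumPy, with the swp and rsw helpers from the first sketch) forward-sweeps a complete hypothetical correlation matrix on Y, X1, X2, and X3 and then applies the reverse sweeps in the order RSW[3, 2, 1, Y]; the original correlations are recovered exactly. In the actual method, G is assembled piecewise from studies with different variable sets, so its reverse sweep yields the synthesized, rather than any single study's, correlations.

import numpy as np

def swp(M, c):
    # SWP[c] per expression (A.1).
    M = np.asarray(M, dtype=float)
    N = M - np.outer(M[:, c], M[c, :]) / M[c, c]
    N[c, :] = M[c, :] / M[c, c]
    N[:, c] = M[:, c] / M[c, c]
    N[c, c] = -1.0 / M[c, c]
    return N

def rsw(N, c):
    # RSW[c] per expression (A.2).
    N = np.asarray(N, dtype=float)
    M = N - np.outer(N[:, c], N[c, :]) / N[c, c]
    M[c, :] = -N[c, :] / N[c, c]
    M[:, c] = -N[:, c] / N[c, c]
    M[c, c] = -1.0 / N[c, c]
    return M

# Hypothetical complete correlation matrix for (Y, X1, X2, X3, X4).
R = np.array([[1.00, 0.87, 0.44, 0.50, 0.30],
              [0.87, 1.00, 0.43, 0.45, 0.25],
              [0.44, 0.43, 1.00, 0.40, 0.20],
              [0.50, 0.45, 0.40, 1.00, 0.35],
              [0.30, 0.25, 0.20, 0.35, 1.00]])

G = swp(swp(swp(swp(R, 0), 1), 2), 3)   # SWP[Y, 1, 2, 3] R
H = rsw(rsw(rsw(rsw(G, 3), 2), 1), 0)   # RSW[3, 2, 1, Y] G
assert np.allclose(H, R)                # the reverse sweeps recover R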

The matrix H can be partitioned as

\[
H = \begin{pmatrix} 1 & R_{12} \\ R_{21} & R_{11} \end{pmatrix},
\]

where R_{11} is the 4 × 4 matrix of synthesized correlations among the predictors X1 through X4 and R_{12} = R'_{21} contains the synthesized correlations between Y and the predictors (\hat{r}_{Y1}, \hat{r}_{Y2}, \hat{r}_{Y3}, \hat{r}_{Y4}).


The standardized slope vector B can then be calculated as B = R_{11}^{-1} R_{12} (Cooley & Lohnes, 1971), treating R_{12} as a column vector, which provides a synthesized standardized regression based on the four regression models.
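To close the loop, a final Python/NumPy sketch extracts the synthesized slopes from H via the partition above; the H here reuses the hypothetical values from the preceding sketch, standing in for the actual output of step 8. Solving the linear system is used in place of forming an explicit inverse.

import numpy as np

# Hypothetical synthesized correlation matrix H for (Y, X1, X2, X3, X4).
H = np.array([[1.00, 0.87, 0.44, 0.50, 0.30],
              [0.87, 1.00, 0.43, 0.45, 0.25],
              [0.44, 0.43, 1.00, 0.40, 0.20],
              [0.50, 0.45, 0.40, 1.00, 0.35],
              [0.30, 0.25, 0.20, 0.35, 1.00]])

R11 = H[1:, 1:]   # synthesized correlations among predictors X1-X4
R12 = H[1:, 0]    # synthesized correlations between the predictors and Y

# B = R11^{-1} R12: the synthesized standardized regression of Y on X1-X4.
B = np.linalg.solve(R11, R12)
print(B)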

Acknowledgements This research was partially supported by grants DRL-0723543 and REC-0634013 from the National Science Foundation. We thank Julie Wren (Loyola University Chicago) for her input. For additional information regarding the contents of this paper, contact Meng-Jia Wu ([email protected]).

References


Anderson TW. 1957. Maximum likelihood estimates for a multivariate normal distribution when some observations are missing. Journal of the American Statistical Association 52: 200–203.
Becker BJ. 1992. Using results from replicated studies to estimate linear models. Journal of Educational Statistics 17: 341–362.
Becker BJ, Wu M. 2007. The synthesis of regression slopes in meta-analysis. Statistical Science 22(3): 414–429.
Becker BJ, Wu M. 2009. Model based meta-analysis. In Cooper HM, Hedges LV, Valentine JC (eds.), The Handbook of Research Synthesis and Meta-Analysis (pp. 377–396). Russell Sage Foundation: New York.
Casella G, George EI. 1992. Explaining the Gibbs sampler. The American Statistician 46(3): 167–174.
Cheung M, Chan W. 2005. Meta-analytic structural equation modeling: a two-stage approach. Psychological Methods 10(1): 40–64.
Cheung SF. 2001. Examining solutions to two practical issues in meta-analysis: dependent correlations and missing data in correlation matrices. Dissertation Abstracts International 61: 8B. (UMI No. AA199884691)
Cohen J. 1988. Statistical Power Analysis for the Behavioral Sciences (2nd edn). Lawrence Erlbaum: Hillsdale, NJ.
Cohen J, Cohen P. 1975. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Lawrence Erlbaum: Hillsdale, NJ.
Cohen J, Cohen P, West S, Aiken L. 2003. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (3rd edn). Lawrence Erlbaum: Hillsdale, NJ.
Cooley WW, Lohnes PR. 1971. Multivariate Data Analysis. John Wiley & Sons: New York.
Cooper H. 1998. Synthesizing Research (3rd edn). Sage: Thousand Oaks, CA.
Cooper H, Hedges LV, Valentine JC (eds.). 2009. The Handbook of Research Synthesis and Meta-Analysis (2nd edn). Russell Sage Foundation: New York.
Dempster AP. 1969. Elements of Continuous Multivariate Analysis. Addison-Wesley: San Francisco.
Furlow CF, Beretvas NS. 2005. Meta-analytic methods of pooling correlation matrices for structural equation modeling under different patterns of missing data. Psychological Methods 10(2): 227–254.
Gelfand AE, Smith AFM. 1990. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85(410): 398–409.
Green SB. 1991. How many subjects does it take to do a regression analysis? Multivariate Behavioral Research 26(3): 499–510.
Greenwald R, Hedges LV, Laine RD. 1996. The effect of school resources on student achievement. Review of Educational Research 66(3): 361–396.
Hedges LV, Olkin I. 1985. Statistical Methods for Meta-Analysis. Academic Press: Orlando, FL.
Hoogland JJ, Boomsma A. 1998. Robustness studies in covariance structure modeling: an overview and a meta-analysis. Sociological Methods and Research 26: 329–367.
Hunter JE, Schmidt FL. 2004. Methods of Meta-Analysis: Correcting Error and Bias in Research Findings (2nd edn). Sage: Newbury Park, CA.
Ingels SJ, Scott LA, Taylor JR, Owings J, Quinn P. 1998. National Education Longitudinal Study of 1988 (NELS:88) Base Year through Second Follow-Up: Final Methodology Report. Working Paper No. 98-06. U.S. Department of Education, Office of Educational Research and Improvement: Washington, DC.
Kim RS. 2011. Standardized regression coefficients as indices of effect sizes in meta-analysis. Unpublished doctoral dissertation, Florida State University.
Kullback S. 1967. On testing correlation matrices. Journal of the Royal Statistical Society, Series C (Applied Statistics) 16(1): 80–85.
Lipsey MW, Wilson DB. 2001. Practical Meta-Analysis. Sage: Thousand Oaks, CA.
Little RJA, Rubin DB. 2002. Statistical Analysis with Missing Data (2nd edn). Wiley: Hoboken, NJ.
Peterson RA, Brown SP. 2005. On the use of beta coefficients in meta-analysis. Journal of Applied Psychology 90(1): 175–181.
Rubin DB. 1976. Inference and missing data. Biometrika 63: 581–592.
Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F. 2000. Methods for Meta-Analysis in Medical Research. Wiley: London.
Wu M, Becker BJ. 2004. Synthesizing results from regression studies: what can we learn from combining results from studies using large data sets? Paper presented at the Annual Meeting of the American Educational Research Association, San Diego, CA.
Wu M, Pigott T. 2008, March. Methods for synthesizing the results of regressions. Paper presented at the Annual Meeting of the American Educational Research Association, New York, NY.
