Commentary
Received 23 February 2015; Accepted 28 February 2015; Published online 10 June 2015 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/jrsm.1141
Form meets function: A commentary on meta-analytic history

Hannah R. Rothstein*†
Unlike my colleagues, who developed meta-analysis to solve three different research problems (whether selection test validities were generalizable, whether psychotherapy was effective, and whether interpersonal expectations influenced behavior), my involvement in meta-analysis came about because I needed a solution to a very different type of problem. In 1980, as the mother of a toddler and a newborn, I had just received my PhD in Industrial/Organizational Psychology from the University of Maryland and was looking for a job that would let me combine motherhood and research. Fortunately for me, the Civil Service Reform Act had recently been passed, and agencies of the Federal Government were mandated to create part-time positions for professionals. I was offered a job developing selection tests at the US Office of Personnel Management, and, despite the fact that employee selection was my least favorite subject in graduate school, I accepted the half-time position with alacrity and gratitude. Shortly after my arrival at the US Office of Personnel Management, I met Frank Schmidt, who had by then published his first set of papers (with Jack Hunter and others) on validity generalization. My own education had taken place at a hotbed of situational specificity (the view that test validities are largely situationally specific, not general), so Frank's perspective was at first very alien to me. After some animated discussions, I became convinced both of the correctness of the theory of validity generalization and of the usefulness of the method that came to be called meta-analysis. Frank asked me to be a co-author on the seminal "Forty questions about validity generalization and meta-analysis" (Schmidt, Hunter, Pearlman and Hirsh, 1985), and shortly thereafter, I conducted and then published my first meta-analysis, on the generalizability of validities for selection into law enforcement occupations (Hirsh, Northrop and Schmidt, 1986).
I have published quite a few meta-analyses since then, but my primary interest over the past 25 years has been how meta-analysis developed and how it is used. It is from this perspective that I comment. The first point I would like to make is that differences in the research questions asked by Glass, Rosenthal, and Schmidt played a large role in what each founder's method of meta-analysis emphasized and in how each sees meta-analysis even now. Gene Glass, for example, developed Glassian meta-analysis to demonstrate the effectiveness of psychotherapy as a treatment for psychological problems. I suggest that, in large part, it is because he used meta-analysis to test the effectiveness of an intervention that he views meta-analysis as being about the evaluation of technologies and not about the testing of theories (Glass, 2015). Of course, if we look at the application of meta-analysis, it is true that the majority of applications have been to assess the effectiveness of interventions (consider, for example, the thousands of meta-analyses published by the Cochrane and Campbell Collaborations). This does not, however, mean that meta-analysis is fundamentally unsuited to theory testing. As Frank Schmidt notes elsewhere in this issue (Schmidt, 2015), he and Jack Hunter originally developed validity generalization to provide evidence to refute the theory of situational specificity. There are many other uses of meta-analysis to test theories; I will mention two that I think deserve particular attention because of the broad impact they have had in their respective fields. One is Alice Eagly and colleagues' programmatic meta-analytic work examining social role theory as an explanation for gender similarities, differences, and stereotypes, particularly in the area of leadership (cf. Eagly et al., 2003; Eagly and Johnson, 1990; and Koenig et al., 2011).
The other is Jessica Gurevitch and colleagues’ meta-analysis on the role of competition in natural populations, which put an end to a decades-long theoretical controversy by providing overwhelming evidence for the importance and ubiquity of competition in nature across many different groups of organisms in many different kinds of ecological communities (Gurevitch et al., 1992).
*Correspondence to: Hannah R. Rothstein, Management Department, Baruch College, City University of New York, New York, NY, 10010, USA. † [email protected]
a From 1976 to 1988, I was known as Hannah Rothstein Hirsh.
Copyright © 2015 John Wiley & Sons, Ltd.
Res. Syn. Meth. 2015, 6 290–292
Bob Rosenthal's first foray into meta-analysis was to demonstrate that an experimenter expectancy effect existed. The combined p-value approach that he used was sufficient to address this issue, because the hypothesis tested in the combined p-value approach is whether a non-chance effect was found in at least one of the studies in the set. Furthermore, because the question posed by Rosenthal had to do with whether a phenomenon existed, rather than whether the phenomenon existed in some situations but not others, or in different degrees in different situations, he had no need to emphasize indices of heterogeneity, even when he moved from p-values to effect sizes. Once the initial question about the existence of expectancy effects was resolved meta-analytically, and more complex questions about the conditions that facilitated or inhibited those effects were raised, meta-analyses on the topic migrated to the approach pioneered by Hedges and Olkin, with its ability to quantify differences in effect sizes across studies.

Validity generalization, the form of meta-analysis created by Jack Hunter and Frank Schmidt, was developed to enable refutation of the situational specificity hypothesis, that is, the hypothesis that the true population validities of employment tests vary substantially from situation to situation. Thus, their research focused on the variation in effect sizes across a set of studies, and on how much of it was artifactual, and only secondarily on the mean overall effect. A second purpose of their research was to estimate, when there was true variation in effect sizes, the proportion of true validities that was above a threshold level. This resulted in the introduction of the credibility interval as an important parameter in meta-analysis; to my mind, this is arguably their most important contribution.
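Rosenthal's combined p-value logic, described above, can be made concrete in a few lines. What follows is a minimal sketch, assuming Stouffer's Z (one of the combination rules Rosenthal used); the function name and the illustrative p-values are mine, not drawn from any study discussed here.

```python
from statistics import NormalDist

def stouffer_combined_p(p_values):
    """Combine one-tailed p-values with Stouffer's Z.

    Each p is converted to a standard-normal z-score; the z's are
    summed and divided by sqrt(k). The combined test asks only whether
    a non-chance effect was found somewhere in the set of studies;
    it says nothing about heterogeneity of the underlying effects.
    """
    nd = NormalDist()
    k = len(p_values)
    z = sum(nd.inv_cdf(1.0 - p) for p in p_values) / k ** 0.5
    return z, 1.0 - nd.cdf(z)

# Three studies, each individually unconvincing at the .05 level,
# combine to a clearly non-chance result (p < .01).
z, p = stouffer_combined_p([0.10, 0.08, 0.06])
```

This also shows why the approach suited Rosenthal's existence question and not later moderator questions: the combined z is driven by the pooled evidence for any effect at all, so studies with very different effect sizes contribute indistinguishably.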
Strangely, although users of the Hunter–Schmidt method of meta-analysis routinely reported credibility intervals starting in the 1980s, and lobbied their colleagues in other disciplines, using other methods of meta-analysis, to do the same, estimation of the distribution of true effects remained rare outside industrial/organizational psychology until Higgins et al. (2009) advocated for the use of a similar concept called the prediction interval. Schmidt and Hunter's background in psychometrics, coupled with the unreliability of the dependent variables typically used in employment selection research and the restriction of range in the samples used, led them to propose correcting observed validities for measurement error and range restriction. Although this seems like sound practice to me, it has yet to be adopted by meta-analysts who do not use the Hunter–Schmidt approach, even though Hedges and Olkin described how to make these corrections in their 1985 book. I view it as a priority for the research synthesis community to integrate the Hunter–Schmidt psychometric corrections even when researchers choose to use the Hedges and Olkin approach to meta-analysis, and for primary researchers to routinely collect information on the reliability of their measures and on the degree to which the samples they use correspond to the population about which they wish to draw inferences.

The second point I would like to make about the development of meta-analysis is that although each "school" of meta-analysis had its origins in a psychological question, meta-analysis was quickly adopted across the social sciences and in the health sciences. Unlike what happens in many fields, research synthesis methodologists do not stay within their academic silos but actively pursue interdisciplinary relationships.
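To make the psychometric corrections discussed above concrete: the algebra is simple. The sketch below is a hedged illustration, assuming the standard disattenuation formula for criterion unreliability followed by Thorndike's Case II formula for direct range restriction; the function name and the numbers are mine, chosen for illustration rather than taken from any real validity study.

```python
def correct_validity(r_obs, r_yy, u):
    """Correct an observed validity for two statistical artifacts.

    r_obs: observed predictor-criterion correlation (restricted sample)
    r_yy:  reliability of the criterion measure
    u:     ratio of unrestricted to restricted predictor SDs (u > 1)
    """
    # Disattenuate for measurement error in the criterion.
    r = r_obs / r_yy ** 0.5
    # Thorndike Case II correction for direct range restriction.
    return u * r / (1.0 + r ** 2 * (u ** 2 - 1.0)) ** 0.5

# An observed validity of .25, a criterion reliability of .60, and a
# sample whose predictor SD is two-thirds of the applicant pool's
# yield a corrected validity of about .46 -- nearly double r_obs.
rho = correct_validity(0.25, 0.60, 1.5)
```

The point of the example is how large the artifacts' bite can be: uncorrected observed validities can understate true validities badly, which is why uncorrected study-to-study variation looked like situational specificity.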
The interaction among meta-analysts working in different disciplines has provided great insight into the assumptions made in each discipline, raised the possibility of alternative approaches, and limited redundancy as we learned from each other's experience. For example, interactions between social science-based and health care-based meta-analysts led to beneficial changes in the way health care meta-analysts approached heterogeneity and to advances in the way social science meta-analysts approached critical appraisal of meta-analyses, as well as reporting standards. As meta-analysis matures in our disciplines, and is introduced in other fields, it will become harder but no less necessary for meta-analysts to maintain their cross-disciplinary conversations. I suggest that this journal is a primary forum for those conversations, but there are bound to be others. We should all be thinking of additional ways to make them happen.

On a related note, I would like to address Bob Rosenthal's (2015) comment on the inevitability of the increasing complexity and "techy/mathy" nature of meta-analysis. In my view, one of the singular features of meta-analysis is its accessibility, on a conceptual level, to a broad audience of researchers. Unlike some other types of statistical analysis, it is not hard, at present, for an interested researcher to acquire a good understanding of what takes place inside the black box of formulas and calculations. It would be a pity to lose this accessibility, so I encourage the methodologists and statisticians to make sure they are able to translate what they are doing into plain language, and I encourage substantive researchers to ask enough questions that they know exactly what assumptions are being made and what the statistical operations do. Finally, I would like to mention something that I know each of the founders experienced but did not emphasize in their papers.
There was an intellectual thrill in being part of a paradigm shift, but more than that, participation in the development and dissemination of meta-analysis was just plain fun.
References

Eagly AH, Johannesen-Schmidt MC, Van Engen ML. 2003. Transformational, transactional, and laissez-faire leadership styles: a meta-analysis comparing women and men. Psychological Bulletin 129: 569–591.
Eagly AH, Johnson BT. 1990. Gender and leadership style: a meta-analysis. Psychological Bulletin 108: 233–256.
Glass GV. 2015. Meta-analysis at middle age: a personal history. Research Synthesis Methods 6(3): 221–231.
Gurevitch J, Morrow LL, Wallace A, Walsh JS. 1992. A meta-analysis of competition in field experiments. The American Naturalist 140: 539–572.
Hedges LV, Olkin I. 1985. Statistical Methods for Meta-Analysis. New York: Academic Press.
Higgins JPT, Thompson SG, Spiegelhalter DJ. 2009. A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society, Series A 172: 137–159.
Hirsh HR, Northrop LC, Schmidt FL. 1986. Validity generalization results for law enforcement occupations. Personnel Psychology 39: 399–420.
Koenig AM, Eagly AH, Mitchell AA, Ristikari T. 2011. Are leader stereotypes masculine? A meta-analysis of three research paradigms. Psychological Bulletin 137: 616–642.
Rosenthal R. 2015. Reflections on the origins of meta-analysis. Research Synthesis Methods 6(3): 240–245.
Schmidt FL. 2015. History and development of the Schmidt–Hunter meta-analysis methods. Research Synthesis Methods 6(3): 232–239.
Schmidt FL, Hunter JE, Pearlman K, Hirsh HR. 1985. Forty questions about validity generalization and meta-analysis. Personnel Psychology 38: 697–798.