This article was downloaded by: [University of Birmingham] On: 21 September 2013, At: 23:24 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Research Quarterly for Exercise and Sport Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/urqe20

What is Missing in p < .05? Effect Size

Jerry R. Thomas(a), Walter Salazar(b) & Daniel M. Landers(b)

(a) Department of Exercise Science and Physical Education, Arizona State University, USA
(b) Arizona State University, USA

To cite this article: Jerry R. Thomas, Walter Salazar & Daniel M. Landers (1991) What is Missing in p < .05? Effect Size, Research Quarterly for Exercise and Sport, 62:3, 344-348, DOI: 10.1080/02701367.1991.10608733
To link to this article: http://dx.doi.org/10.1080/02701367.1991.10608733


Research Note

Research Quarterly for Exercise and Sport © 1991 by the American Alliance for Health, Physical Education, Recreation and Dance, Vol. 62, No. 3, pp. 344-348

There is currently much interest in reporting effect sizes (not simply probabilities or confidence intervals) for treatment effects.

Becker (June 1991, "Alternative methods of reporting research results," American Psychologist, 46, pp. 654-655) has argued that the Publication Manual of the American Psychological Association should include such alternative ways of presenting data in future editions. Authors submitting work to RQES are encouraged to include effect sizes or the statistics necessary for their calculation (i.e., means, standard deviations, and ns) in their manuscripts. This will assist readers in interpreting the significance of results.


James R. Morrow, Jr., Editor-in-Chief

What Is Missing in p < .05? Effect Size

Jerry R. Thomas, Walter Salazar, and Daniel M. Landers

Key words: effect size, statistics, meaningfulness, research reporting

Numerous authors (e.g., Kirk, 1982; Thomas & Nelson, 1990; Tolson, 1980; Winer, 1971) have indicated the need to estimate the magnitude of differences between groups as well as to report the significance of the effects. In his initial "Editor's Viewpoint" for Research Quarterly for Exercise and Sport (RQES), Thomas (1983) stressed the need for researchers published in RQES to report the strength of the relationships between independent and dependent variables as a way to describe the "meaningfulness" of the findings, and recommended the use of omega squared (ω²) (see Tolson, 1980, for recommended procedures) (Note 1). Another useful way to describe the meaningfulness of findings is to estimate the magnitude of the differences between groups using a standardized value, effect size (ES), suggested by Cohen (1969; newest edition, 1988):

ES = (M1 - M2) / SD    (1)

where M1 = mean for Group 1, M2 = mean for Group 2, and SD = standard deviation.
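Equation 1 is straightforward to compute from reported summary statistics. A minimal sketch in Python (the function name and the example values are illustrative assumptions, not data from the article):

```python
def effect_size(m1: float, m2: float, sd: float) -> float:
    """Standardized mean difference (Equation 1): ES = (M1 - M2) / SD."""
    return (m1 - m2) / sd

# Hypothetical example: treatment mean 52, control mean 48, SD 8.
es = effect_size(52.0, 48.0, 8.0)
print(es)  # 0.5, a moderate difference by Cohen's benchmarks
```

By Cohen's behavioral-science benchmarks quoted below, 0.2 is small, 0.5 moderate, and 0.8 large.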

Jerry R. Thomas is professor and chair of the Department of Exercise Science and Physical Education at Arizona State University. Walter Salazar is a doctoral student and Daniel M. Landers is a Regents Professor at the same institution. Direct correspondence to Dr. Jerry R. Thomas, 201 PEBW, Arizona State University, Tempe, AZ 85287. Submitted: July 12, 1990. Revision accepted: March 6, 1991.

Most readers are probably familiar with using ES as the unit of analysis for meta-analysis (e.g., Glass, 1976). However, Cohen's (1969) original proposal was to use ES as a standardized way of estimating differences between groups so these differences could be compared across studies and dependent variables. He suggested that in the behavioral sciences an ES of 0.2 represented small differences; 0.5, moderate differences; and 0.8, large differences. Cohen's proposal never had widespread use before Glass (1976) popularized the idea for meta-analysis, and ESs are seldom reported in studies. Authors need to be convinced that they should report the magnitude of effects: small differences can easily be declared significant based on some combination of small variances and large Ns (Thomas & Nelson, 1990), and the reverse can occur, with large differences declared nonsignificant due to large variances and small Ns. Cohen (1990) suggests the primary purpose of research should be to measure effect size rather than p values. A single study resulting in a "yes/no" decision at the p < .05 level is unlikely to have an impact on theory or practice. However, reporting of effect size offers valid standards of comparison with past and future research and indicates important characteristics to guide subsequent research. A wonderful quote from Rosnow and Rosenthal (1989) seems appropriate: "... surely, God loves the .06 level nearly as much as the .05" (p. 1277). In this research we determined the frequency with which any standardized estimate of treatment effect was presented in two volumes of RQES, 1978 and 1988, and reported the frequency with which articles in these volumes supplied the data to determine meaningful treatment effects. In addition, we discussed why effect size should be used to estimate the meaningfulness of treatments.
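The point that significance tracks sample size while ES does not can be illustrated from summary statistics alone. A minimal sketch (the equal-n, equal-SD two-sample t formula is standard; all numbers are our own illustrative assumptions, not data from the article):

```python
import math

def effect_size(m1: float, m2: float, sd: float) -> float:
    """ES = (M1 - M2) / SD; does not depend on sample size."""
    return (m1 - m2) / sd

def t_from_summary(m1: float, m2: float, sd: float, n: int) -> float:
    """Two-sample t for two groups of size n with common SD."""
    return (m1 - m2) / (sd * math.sqrt(2.0 / n))

# Hypothetical groups differing by 0.25 SD (a small effect by Cohen's scale).
for n in (10, 200):
    t = t_from_summary(50.5, 48.5, 8.0, n)
    es = effect_size(50.5, 48.5, 8.0)
    # |t| > ~1.96 is roughly significant at p < .05 for large df:
    # the ES stays 0.25 in both cases, but only n = 200 reaches significance.
    print(n, round(es, 2), round(t, 2), abs(t) > 1.96)
```

The same mean difference is nonsignificant at n = 10 per group (t ≈ 0.56) yet significant at n = 200 (t = 2.5), which is exactly why a p value alone says little about meaningfulness.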

RQES: September 1991

Thomas, Salazar, and Landers


Data on RQES Volume 49 (1978) and Volume 59 (1988)

Table 1 provides descriptive data on Volume 49, in which 54 regular papers (data-based but excluding Research Notes) were published. We have chosen to use only papers in which an ANOVA model was presented, although ES can also be calculated from correlational designs. ANOVA models were used in 36 of 54 papers (67%). Of the 36 papers using ANOVA models, 20 (56%) provided data (Ms, SDs, and Ns) from which ES could be computed (Note 2). Table 1 provides the total number of ESs that could be computed; we have selected the three most important effects (the most relevant variables as identified by the authors of the study) and calculated ESs (multiple ESs from one study may represent more than two levels of a main effect or more than one dependent variable) (Note 3). Table 2 includes similar data from Volume 59 (1988). Volume 59 had 34 papers, of which 18 (53%) used ANOVA models. Of these 18 papers using ANOVA models, 10 (56%) provided data from which to calculate ES. As in Table 1, we have reported the number of ESs that can be estimated and calculated ESs for the three most important comparisons.

Table 1. Data on articles in Volume 49, 1978

First Author        ES Info   N(c)      Primary ES*
Bain                no
Falls               yes       14-54     -0.37*   -0.75*   -1.34*
Hatfield            yes       15         0.99*    0.08    -0.80*
Iso-Ahola           no
Anshel              yes       16         0.39*    0.54*    0.16
Cooter              ns(a)     6         -0.84     0.22     0.99
Coyle (p. 119)      yes       9         18.77*   14.08*   18.43*
Katch               ns(a)     14        -0.73     0.05     0.41
Sage                ns(a)     12         0.14     0.49     0.55
Singer(b)           yes       16        -1.94*   -2.20*   -2.51*
Weltman             yes       11        -1.32*   -1.17*   -0.80*
Baker               yes       16         0.25*    0.46*    0.33
Bird                no
Christensen         no
Coyle (p. 278)      yes       8          3.65*   -1.89     3.09*
Frekany             no
Heyward             yes       18        -0.74*   -0.41    -0.17
Igbanugo            yes       4         -4.55*   -2.73*   -6.73*
Nelson              yes       25         0.53*    0.10     0.80*
Noland              no
Rikli               yes       40         0.34*   -0.60*   -0.57*
Skrinar             ns(a)     14         0.35    -0.23    -0.59
Stamford (p. 351)   no
Stamford (p. 363)   yes       11         0.71*    0.30     0.82*
Hovell              yes       20-52      0.77*    0.15     0.87*
Landers             yes       12-200    -0.79*    0.25    -0.66*
Marlowe             yes       11         1.70*    0.23    -1.31*

*Comparison of Ms forming the ES was significant, p < .05. (a) No significant main effects. (b) Includes only initial and final scores in ES estimates. (c) Per comparison group.

Table 2. Data on articles in Volume 59, 1988

First Author   ES Info   N(c)      Primary ES*
Doody          no
Kamen(b)       yes       9          0.64     0.72     0.14
Alexander      yes       26-48      0.33     0.73*    0.39
Era            yes       5-6        0.50*    0.10     1.42(b)
Kokhonen       yes       9-12      -1.97*   -2.64*   -1.78(b)
Farrell        yes       45-368     0.77*   -0.51(b)  0.37(b)
Heinert        no
Kamen          ns(a)     10         1.14     0.81     0.90
Ober           no
Simard         yes       7         -1.59*    0.52    -2.71*
Berger         no
Stewart        no
Abernethy      no
Etnyre         no
Nelson(b)      yes       13         0.73     1.76     0.85
Wesson         no
Housh          yes       20        -0.53(b) -2.11(b)  0.25(b)
Hutcheson      yes       34        -0.06     0.63(b) -0.30(b)

*Comparison of Ms forming the ES was significant, p < .05. (a) No significant main effects. (b) The main effect is significant, but no information is provided regarding the significance of the post-hoc comparison. (c) Per comparison group.

In Table 3 we have followed Cohen's (1969/1988) categories of small (< 0.41), moderate (0.41 to 0.70), and large (> 0.70) to classify the ESs, as well as year and significance. Calculating a two-way ANOVA (significance × year) results in one trend. ESs between significant and nonsignificant findings (regardless of category: small, medium, or large) were reliably different (three outlier ESs [beyond 3 SDs] from 1978 in the large category have been removed from all the following statistical analyses), F(1, 83) = 15.35, p < .05, with the differences favoring the studies finding significance, Ms = 1.33, Mns = 0.49. Neither year nor the interaction for ESs was significant, ps > .28. What appeared to separate the significant and nonsignificant groups was the average sample size (N). In fact, if the same ANOVA (significance × year) was calculated using sample size as the dependent variable, the studies reporting significance were reliably different (M = 33.8) from those reporting nonsignificance (M = 17.3), F(1, 83) = 4.37, p < .05. Neither year nor interaction effects were significant, ps > .08. While the magnitude of ESs clearly influenced this outcome, sample size played a major role. Thus, if other characteristics of the nonsignificant studies had remained the same, an increase in N for these studies would have allowed detection of additional significant differences. In 1978, 31.7% (19/60) of the ESs were small, 18.3% (11/60) were moderate, and 50.0% (30/60) were large.



Ten years later, 26.7% (8/30) were small, 20.0% (6/30) were moderate, and 53.3% (16/30) were large. While the percentages reported do not appear very different from 1978 to 1988, examining values within the "large" ES category produced some interesting findings. In 1978, 45% (27/60) of the significant ESs were large as compared to 30% (9/30) in 1988. In 1978, only 5% (3/60) of the large ESs were nonsignificant, but in 1988, 23% (7/30) of the large ESs were nonsignificant. In fact, the seven nonsignificant ESs in 1988 averaged 0.99. Certainly ESs of this magnitude with small average sample sizes (N = 11) suggested potential areas for follow-up research. Yet this was not evident unless the ESs were reported. Thus, there appeared to be a trend toward more nonsignificant but large ESs even though power was likely increasing because the average N per study was going up (18 to 35 for large significant ESs and 7 to 11 for nonsignificant ESs), a difficult finding to explain.

Some aspects of these data were disappointing. First, the percentage of studies reporting significant effects decreased, from 62% (37/60) in 1978 to 35% (16/46) in 1988, despite a relatively minor decrease in average ES (1.4 [3 outliers omitted] to 1.2) and an increase in average sample size (23 [3 outliers omitted] to 52). Second, despite calls from numerous sources (including the editor-in-chief of RQES [Thomas, 1983]), the number of papers reporting Ms and SDs has not increased (56% for both 1978 and 1988), especially for all important aspects (e.g., factors and variables) of the design. Failure to report these data not only makes accurate estimation of ES impossible, but also makes comparisons to past and future research difficult. Fundamental to the reporting of good science is information for comparison to other work; at a minimum, that includes the Ms and SDs of the variables of interest (main effects and interactions).

Table 3. Comparisons of effect sizes

                                  1978                 1988
Effect Size                   S        NS          S        NS
Small (< 0.41)      M       0.337    0.202       0.306    0.204
                    n       4        15          3        5
                    N(c)    26       22          87       24
Medium (0.41-0.70)  M       0.560    0.490       0.542    0.580
                    n       6        5           4        2
                    N(c)    40       14          66       8
Large (> 0.70)      M       1.794(d) 1.240       1.746    0.990
                    n       24       3           9        7
                    N(c)    18       7           35       11

(c) Per comparison group.
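The category cutoffs used in Table 3 can be expressed as a small classifier. A sketch (the function name is ours; we assume classification is by the magnitude of the ES, since the tables contain signed values, and that the medium bin is inclusive at both 0.41 and 0.70, matching the table's labels):

```python
def classify_es(es: float) -> str:
    """Bin an effect size by magnitude using Table 3's cutoffs:
    small (< 0.41), medium (0.41-0.70), large (> 0.70)."""
    mag = abs(es)
    if mag < 0.41:
        return "small"
    if mag <= 0.70:
        return "medium"
    return "large"

# The seven nonsignificant 1988 ESs in the large category averaged 0.99:
print(classify_es(0.99))   # large
print(classify_es(-0.37))  # small (the sign only marks direction)
```

Classifying by absolute value treats -0.79 and 0.79 alike, which is how standardized differences are compared across studies regardless of which group happened to score higher.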
