GAMES FOR HEALTH JOURNAL: Research, Development, and Clinical Applications Volume 3, Number 1, 2014 © Mary Ann Liebert, Inc. DOI: 10.1089/g4h.2014.1713

Editorial

Lying (or Maybe Just Misleading) With (or Without) Statistics

Tom Baranowski, PhD
Pediatrics (Behavioral Nutrition & Physical Activity), USDA/ARS Children’s Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, Texas.

Having been Editor-in-Chief of the Games for Health Journal (G4HJ) for almost a year, and thereby having been responsible for the review of almost 100 submitted manuscripts, it is clear that we need to re-emphasize the importance of research design and statistics to advance our understanding of whether and how games influence health and health-related behaviors. Although statistics is a large discipline addressing many diverse issues involving probability, for us statistics primarily concerns the confidence we can have in drawing conclusions about relationships, especially whether our game-based interventions are related to, or cause, intended (or unintended) outcomes. National health policy issues (e.g., should specific games be used and reimbursed by the government to treat certain health problems?) are informed by properly designed and executed research. Because we are all taxpayers, we should want the best information (positive or negative) to inform policy decisions, which have implications for both our health and our pocketbooks.

There has been extensive academic and policy-related turbulence about statistics and research design in health research.1 It has been clear for some time that the randomized clinical trial (RCT) provides the clearest insight into whether one variable is causally related to another.2 Meta-analyses of the literature (upon which policy is based) often restrict the studies included in their review of a topic to RCTs only. Thus, although non-RCTs may be useful in early stages to refine an intervention concept or procedure, they do not even get considered in summaries of whether certain types of intervention work, or not. What a waste of research resources, investigator effort, and journal pages to not be considered in summaries of whether games are related to outcomes! Priority for publication in G4HJ will be given to manuscripts with RCTs as their primary evaluation design.

Many of the manuscripts submitted to G4HJ had small sample sizes (n = 1, 2, 5, 6, etc.). Sometimes, when conditions are very unusual (e.g., very low prevalence behavioral or medical diagnoses), or when treatments/interventions (e.g., games) are early in development and pose substantial risk, small sample sizes are appropriate. They are not appropriate for most research involving low-risk videogames with normal populations. A problem with small samples is that they distort the accuracy of statistical tests, requiring larger observed differences before one can conclude that an effect occurred.3 Clearly, confidence in the results and the precision of the estimates increase with higher statistical power (i.e., larger samples).4 Ioannidis5 concluded that "most published research findings are false." This was especially true of studies with small samples and small effect sizes, among other influences. He also concluded that true associations were inflated when studies were underpowered and exploratory statistical analysis procedures were used.6 These problems were worse in behavioral research in the United States than elsewhere.7 Small samples, small effect sizes, exploratory statistical analysis procedures, and behavioral outcomes are common in games for health research. Because we want our research to be taken seriously by other scientists and considered for policy determinations, priority for publication in G4HJ will go to manuscripts that report calculated power, adequately power a study to test a hypothesis, prespecify statistical analyses, and use objective measures where behaviors are assessed.
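To make the power issue concrete, consider the sketch below. It is purely illustrative and not drawn from this editorial or any submitted study: the assumed effect size (Cohen's d = 0.4), power target, and sample sizes are hypothetical choices, and the statsmodels library is simply one convenient tool for such calculations.

```python
# Illustrative a priori power calculation for a two-arm parallel trial
# (e.g., a health videogame vs. a control condition).
# All numbers below are assumptions for demonstration only.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

effect_size = 0.4  # assumed standardized mean difference (Cohen's d)

# Sample size per arm needed to detect d = 0.4 with 80% power at alpha = 0.05.
n_per_arm = analysis.solve_power(effect_size=effect_size, alpha=0.05,
                                 power=0.80, alternative='two-sided')
print(f"Participants needed per arm: {n_per_arm:.0f}")

# Power actually achieved by a very small study (n = 6 per arm) for the same effect.
power_small = analysis.solve_power(effect_size=effect_size, alpha=0.05,
                                   nobs1=6, alternative='two-sided')
print(f"Power with n = 6 per arm: {power_small:.2f}")
```

Under these assumptions, roughly 100 participants per arm are needed, whereas a study with 6 participants per arm has only a small chance of detecting a real effect of that size, which is exactly the situation in which spurious or inflated findings are most likely.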

When an intervention (e.g., a videogame for health) is innovative, there is good reason to conduct a pilot or feasibility study, both to make sure the game functions properly in the hands of targeted users and to assess whether a larger study would be worthwhile. Feasibility studies, however, are too underpowered to support any confidence in determining whether the intervention was effective.8–12 Feasibility studies, instead, are designed to answer questions such as: "How did you recruit participants?" "Were you able to recruit participants?" "Did you have problems recruiting them?" "What percentage of people contacted agreed to participate?" "What percentage of people who agreed and consented actually participated?" "Why did you lose anyone?" "Were you able to retain all baseline participants to post-assessments?" "What were the characteristics of those retained in the sample versus those lost (i.e., retention bias)?" "What were the rationale and design of your intervention?" "Were you able to deliver the intervention?" "Did you have problems in delivering the intervention?" "Did your participants do what you wanted?" "Did they face problems in doing it?" "Were you able to implement your measures pre and post?" "Did you have problems implementing them?" "Were the scales you employed shown to be reliable in this sample?" "What other problems did you encounter?" "How might you modify your intervention, procedures, and measures to enhance your chances of conducting the fully powered study?"
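Several of these questions reduce to simple proportions that are worth reporting explicitly. The sketch below is a hypothetical illustration: the counts and the function name are invented for this example, not taken from any submitted study.

```python
# Hypothetical feasibility-study flow counts; every number here is invented
# purely to illustrate the calculations a feasibility report might include.
def report_rates(contacted, consented, started, completed_post):
    print(f"Consent rate:   {consented / contacted:.0%} of those contacted")
    print(f"Uptake rate:    {started / consented:.0%} of those consenting")
    print(f"Retention rate: {completed_post / started:.0%} of those who began the game")

report_rates(contacted=120, consented=54, started=48, completed_post=39)
```

Reporting these proportions, along with a comparison of the characteristics of completers versus dropouts, addresses the feasibility questions above without making claims about intervention effectiveness.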



Although properly designed feasibility studies can be valuable, and should be reported, authors of feasibility studies should refrain from inferring whether their intervention worked (i.e., influenced targeted outcomes), or did not, because they can have little confidence in any such statements. Priority for publication in G4HJ of studies with small samples will go to manuscripts that address feasibility issues, not outcomes.

Along these lines, authors of manuscripts tend to take liberties in determining statistical significance. Several submitted manuscripts treated 0.05 < P ≤ 0.10, or even higher values, as statistically significant. Although the P < 0.05 criterion for statistical significance is an arbitrary cut point, agreed to by statisticians as reasonable in the early part of the 20th century,13 accepting less stringent probability levels runs the risk of spuriously touting the benefit of an intervention. In the interest of having confidence in our findings, priority for publication in G4HJ will go to manuscripts that adopt conservative probability levels for determining significance. For example, one review found a disproportionate number of studies reporting obtained significance levels just below the P < 0.05 criterion, suggesting that in some studies investigators under pressure to report primarily positive outcomes manipulated the data inappropriately before reporting the results.13 A statistician concluded that the P < 0.05 criterion for statistical significance led to too many false-positive results and suggested more conservative cut points of P < 0.005, or even P < 0.001.14
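The interaction between low power and a liberal significance threshold can be illustrated with a small simulation. The sketch below is my own illustration, not an analysis from any of the cited papers: it assumes, arbitrarily, that only 10% of tested hypotheses reflect a real effect (d = 0.4) and that each study enrolls 20 participants per arm, then asks what share of "significant" results are actually false positives at several cut points.

```python
# Simulation of false-positive "discoveries" under low power.
# All parameters (10% true effects, d = 0.4, n = 20 per arm) are
# illustrative assumptions, not estimates from the games-for-health literature.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_per_arm, true_effect = 20_000, 20, 0.4
is_real = rng.random(n_studies) < 0.10  # only 10% of tested effects are real

p_values = np.empty(n_studies)
for i in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_arm)
    shift = true_effect if is_real[i] else 0.0
    treatment = rng.normal(shift, 1.0, n_per_arm)
    p_values[i] = stats.ttest_ind(treatment, control).pvalue

for alpha in (0.10, 0.05, 0.005):
    sig = p_values < alpha
    false_share = np.mean(~is_real[sig])  # share of "significant" findings that are false
    print(f"alpha = {alpha:<5}  'significant' findings that are false: {false_share:.0%}")
```

Under these assumptions, the proportion of false "discoveries" shrinks as the cut point tightens from 0.10 to 0.05 to 0.005, which is consistent with the argument for more conservative probability levels.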


As scientists, we should be primarily interested in the truthfulness of our work so that the findings can be appropriately used to build a stronger knowledge base and for the betterment of mankind. A journal editorial cannot provide a tutorial in research design and statistical analysis. Nonetheless, G4HJ wants to be at the forefront of knowledge about health games and wants to have substantial confidence in the statements made about the findings of the research it publishes. We certainly don't want to lie, or even mislead, about our findings. Strong research design and sophisticated statistical analyses are required for us to make correct decisions and advance our understanding of games, behavior, and health. We must all work hard to use the most sophisticated research designs and statistics, just as we want to use the most relevant and effective game mechanics.

References

1. Macintyre S. Good intentions and received wisdom are not good enough: The need for controlled trials in public health. J Epidemiol Community Health 2011; 65:564–567.
2. Jakobsen JC, Gluud C. The necessity of randomized clinical trials. Br J Med Med Res 2013; 3:1453–1468.
3. Krzywinski M, Altman N. Significance, P values and t-tests. Nat Methods 2013; 10:1041–1042.
4. Button KS, Ioannidis JP, Mokrysz C, et al. Confidence and precision increase with high statistical power. Nat Rev Neurosci 2013; 14:585–586.
5. Ioannidis JP. Why most published research findings are false. PLoS Med 2005; 2:e124.
6. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology 2008; 19:640–648.
7. Fanelli D, Ioannidis JP. US studies may overestimate effect sizes in softer research. Proc Natl Acad Sci U S A 2013; 110:15031–15036.
8. Leon AC, Davis LL, Kraemer HC. The role and interpretation of pilot studies in clinical research. J Psychiatr Res 2011; 45:626–629.
9. Arain M, Campbell MJ, Cooper CL, et al. What is a pilot or feasibility study? A review of current practice and editorial policy. BMC Med Res Methodol 2010; 10:67.
10. Kraemer HC, Mintz J, Noda A, et al. Caution regarding the use of pilot studies to guide power calculations for study proposals. Arch Gen Psychiatry 2006; 63:484–489.
11. Bowen DJ, Kreuter M, Spring B, et al. How we design feasibility studies. Am J Prev Med 2009; 36:452–457.
12. Stevens J, Taber DR, Murray DM, et al. Advances and controversies in the design of obesity prevention trials. Obesity (Silver Spring) 2007; 15:2163–2170.
13. Masicampo EJ, Lalande DR. A peculiar prevalence of p values just below .05. Q J Exp Psychol (Hove) 2012; 65:2271–2279.
14. Johnson VE. Revised standards for statistical evidence. Proc Natl Acad Sci U S A 2013; 110:19313–19317.
