Physical Therapy in Sport 16 (2015) 1–2


Editorial

So what does this all mean? For all the reductionist, quantitative, hypothesis-testing scientists out there (other research philosophies and paradigms are available!) the following scenarios are probably very familiar. <Operator scans computer screen> … "YES, phew (especially for the PhD student), that's a relief, P = 0.02" … or … <operator scans computer screen> … "DAMN! P = 0.56; right, let's have a look at the bottom end of the journal league tables and see where I can get this in" … or, if you are a PhD student, … "I wonder if my supervisor knows what to do?" … or, potentially even worse … <operator scans computer screen> … "Oh damn, P = 0.07; OK, let me think: MORE subjects, MORE time, MORE cost? … or just maybe I can play the 'feasibility, pilot, trend' card when I go to the journal" … OK, a little over-dramatic, but not uncommon! This is the consequence of (a) tradition, (b) your own experiences and education, (c) some journal guidelines, editors, and reviewers, and (d) the hugely productive mathematical branch of statistics that has churned out a plethora of analytical tools and tests over the last 150 years … Oh, it was so much easier in the 1800s! We have ended up in a situation where P < 0.05 is the holy grail for quantitative researchers. It is largely understandable, but somewhat sad, that for many researchers it is often "null hypothesis tests" that dictate the answers to research questions, are the ONLY tool employed for data analysis, and largely define if and where you might try to get your work published. Does it have to be this way?

Well, the trap that all three researchers in the above examples have fallen into is to rely on the P value to draw conclusions: they are equating statistical significance with scientific importance. Let's look at what typically happens in the quantitative research process. We first carry out a study with a sample of participants drawn from a larger population. We then analyse the data from our sample to get the effect, for example, the average difference in outcome between control and intervention groups. This observed effect is our 'best guess' for what we really want to know: the effect we would see if we studied the entire population. However, whenever we estimate a population effect using a sample of participants, there will be a degree of uncertainty. Imagine we did a randomised trial with 30 participants in the control group and 30 in the intervention group. We analyse the data and get the average (mean) difference between groups: the effect of the intervention. If we were to repeat the study with different participants we would get a different estimate for the effect. Do it again and we'd get another value, and so on.
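Purely to illustrate that sampling variability, here is a minimal simulation sketch in Python; the population values (a true 10 Nm benefit, a 15 Nm between-subject standard deviation, 30 participants per group) are assumptions made up for this example, not taken from any real trial.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed (hypothetical) population values for the simulation.
true_control_mean, true_intervention_mean = 0.0, 10.0   # mean change in strength (Nm)
sd, n = 15.0, 30                                         # between-subject SD and group size

# Each loop iteration is one imaginary "repeat" of the whole trial.
for repeat in range(5):
    control = rng.normal(true_control_mean, sd, n)
    intervention = rng.normal(true_intervention_mean, sd, n)
    effect = intervention.mean() - control.mean()        # estimated mean difference
    print(f"Repeat {repeat + 1}: estimated effect = {effect:.1f} Nm")
```

Every repeat returns a different estimate of the same underlying effect; that run-to-run scatter is precisely the uncertainty the confidence interval is designed to summarise.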

Of course, we're not going to keep repeating the study over and over again. Instead, to express the uncertainty in our estimate of the population value we calculate and present the confidence interval for the effect. The common interpretation of the confidence interval is that it gives an indication of the "likely range" for the true population effect. So, for example, suppose we are evaluating the effect of a resistance training intervention on knee flexor strength, and we find a mean difference for the change in strength between intervention and control of 10 Nm (95% confidence interval 2 to 18 Nm, P = 0.015). Our best guess of the true population effect is 10 Nm, but the true population effect 'could be' as low as 2 Nm or as high as 18 Nm.

How do we interpret this finding? Is the effect of the intervention clinically/practically important? We can't answer this question using the P value and simply declaring that the training "significantly improves strength" (P < 0.05). What we need is some sense of what is important with respect to increases in strength in the population being studied: the smallest effect that you would consider worthwhile. This value has been termed the "Minimum Clinically Important Difference", the "Minimum Important Difference", the "Minimum Practically Important Difference", the "targeted difference", the "signal", and so on. Comparing the effect and its confidence interval to the smallest worthwhile effect allows us to make an inference about clinical or practical importance.

At this point you'll be asking "So how on earth do we decide on the smallest worthwhile effect?!" There are three main approaches to determining the effect that you would consider clinically/practically important: 'anchor-based', 'distribution-based', and expert opinion. In anchor-based methods we would establish the change in outcome that was linked to a clinically/practically relevant change in another variable. For example, suppose hypothetically that there was epidemiological literature indicating that a 15 Nm increase in knee flexor strength was associated with (i.e. anchored to) a substantial reduction in hamstring injury. Distribution-based methods, often used when there is no robust anchor, set the smallest worthwhile effect at a 'small' standardised mean difference, typically 0.2 between-subject standard deviations. Methods based on expert opinion do what it says on the tin: usually a consensus of a panel of experts on what value of the outcome constitutes the smallest clinically or practically worthwhile effect.

Going back to our hypothetical example, suppose that before the study was carried out we'd defined and justified the smallest worthwhile effect as a change in knee flexor strength of 15 Nm. Our observed mean effect (the difference between intervention and control groups) was 10 Nm, with a 95% confidence interval of 2 to 18 Nm.
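For readers who like to see the arithmetic, below is a minimal sketch (Python, using simulated and entirely hypothetical change scores) of how the mean difference and its 95% confidence interval can be computed with a pooled-SD, t-based approach, together with the distribution-based 0.2 SD rule mentioned above. The numbers are illustrative assumptions, not data from a real trial.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical change-in-strength scores (Nm) for 30 participants per group.
control = rng.normal(2, 15, 30)
intervention = rng.normal(12, 15, 30)

effect = intervention.mean() - control.mean()            # point estimate of the effect

# Pooled-SD standard error and t-based 95% confidence interval.
pooled_sd = np.sqrt((29 * control.var(ddof=1) + 29 * intervention.var(ddof=1)) / 58)
se = pooled_sd * np.sqrt(1 / 30 + 1 / 30)
t_crit = stats.t.ppf(0.975, df=58)
ci_low, ci_high = effect - t_crit * se, effect + t_crit * se

# Distribution-based smallest worthwhile effect: 0.2 of a between-subject SD
# (here, purely as an assumption, taken from the control group's scores).
swe_distribution = 0.2 * control.std(ddof=1)

print(f"Effect = {effect:.1f} Nm, 95% CI {ci_low:.1f} to {ci_high:.1f} Nm")
print(f"Distribution-based smallest worthwhile effect = {swe_distribution:.1f} Nm")
```

In the worked example above, the comparison of interest is then between the reported interval (2 to 18 Nm) and the pre-defined smallest worthwhile effect of 15 Nm.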


The first thing to note is that the mean effect (the 'point estimate') is smaller than 15 Nm, so we already have a good idea about the likely importance. You can see from the upper limit of the confidence interval (18 Nm) that there will only be a small chance that the true population effect is clinically important (greater than the smallest worthwhile effect of 15 Nm). Methods are available to quantify precisely the probability that the true population effect is beneficial, trivial, or harmful. In this contrived example, the effect of the intervention was statistically significant but unlikely to be beneficial: you might want to retract that fist pump at this point! On the flip side, you can also get a result that has a good chance of being clinically important but is not statistically significant. The take-home message is: don't rely on P values to make inferences; use the observed effect and its uncertainty in relation to the pre-defined smallest worthwhile effect.
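One simple way to approximate such probabilities, sketched below, is to assume the sampling distribution of the effect is roughly normal and to recover its standard error from the reported 95% confidence interval; this is only an illustration of the idea, not necessarily the exact method used by the formal approaches.

```python
from scipy import stats

effect, ci_low, ci_high = 10.0, 2.0, 18.0   # Nm, from the worked example
swe = 15.0                                  # pre-defined smallest worthwhile effect (Nm)

se = (ci_high - ci_low) / (2 * 1.96)        # standard error implied by the 95% CI

p_beneficial = 1 - stats.norm.cdf(swe, loc=effect, scale=se)   # P(true effect > +15 Nm)
p_harmful = stats.norm.cdf(-swe, loc=effect, scale=se)         # P(true effect < -15 Nm)
p_trivial = 1 - p_beneficial - p_harmful

print(f"P(beneficial) = {p_beneficial:.2f}, "
      f"P(trivial) = {p_trivial:.2f}, P(harmful) = {p_harmful:.2f}")
```

With these numbers the chance that the true effect exceeds 15 Nm comes out at roughly 0.1, which matches the informal reading of the confidence interval above.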

Finally, before you run any complex statistics we urge you to perform a vital and frequently overlooked task: LOOK AT YOUR DATA! DRAFT your figures and tables (OK, we understand you will need means and standard deviations for these tasks; these are simple statistics). Does something immediately jump out: a big rise, a big drop, or a strong relationship? This 'exploratory data analysis' helps you get a feel for your data and avoid rushing in and making mistakes. When you do run your main analysis, and before you look at the main output, save and plot the 'residuals' to check that your statistical model was correctly specified. The residuals can reveal issues like non-normality, non-uniform variance, non-linearity, severe outliers etc., and help you decide whether you need to re-specify the model, perhaps by applying a transformation, for example.
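To make that last step concrete, here is a short sketch of two common residual checks (a residuals-versus-fitted plot and a normal Q-Q plot) for a simple two-group model fitted to hypothetical data; the data, model, and choice of plots are assumptions for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical data: group indicator (0 = control, 1 = intervention)
# and change-in-strength outcome (Nm).
group = np.repeat([0, 1], 30)
outcome = 2 + 10 * group + rng.normal(0, 15, 60)

# Fit the simple two-group model (each group's fitted value is its mean)
# and save the residuals.
fitted = np.where(group == 1, outcome[group == 1].mean(), outcome[group == 0].mean())
residuals = outcome - fitted

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].scatter(fitted, residuals)        # look for non-uniform variance and outliers
axes[0].axhline(0, color="grey")
axes[0].set(xlabel="Fitted value (Nm)", ylabel="Residual (Nm)")
stats.probplot(residuals, plot=axes[1])   # Q-Q plot: look for non-normality
plt.tight_layout()
plt.show()
```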

In summary, statistical significance does not equal clinical/practical importance. The P value is flawed because it combines the estimated size of the effect and its uncertainty into one number that is impossible to disentangle. We urge researchers to accept the challenge of defining and justifying the smallest worthwhile effect, to calculate and present confidence intervals for the observed effect, and then to make robust inferences about clinical/practical importance.

Keith George, Associate Dean for Research, Scholarship and Knowledge Transfer (Science)*
Liverpool John Moores University, UK

Alan M. Batterham, Professor of Exercise Science
Teesside University, UK

*Corresponding author. Tel.: +44 1 519046228; fax: +44 1 519046284
