Individual responses made easy.

Articles in PresS. J Appl Physiol (February 19, 2015). doi:10.1152/japplphysiol.00098.2015

1

Individual Responses Made Easy

2 3 4 5 6 7 8 9 10

We all inherit and acquire different characteristics. When we experience a treatment aimed at changing our physiology, these characteristics may modify the effect of the treatment, making it more or less beneficial, harmful or ineffective in different individuals. The issue of individual responses to treatments is therefore one of the most important in experimental research, yet few researchers acknowledge the issue in their published studies, and attempts to quantify individual responses are rare and usually deficient. This climate of neglect and ignorance needs to change, especially now that genome sequencing and pervasive monitoring of individuals can provide researchers with the subject characteristics that account for individual responses and allow more efficient ethical targeting of treatments to individuals.

11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

The review by Hecksteden and colleagues in the current issue (2) is timely, because it deals with some of the methodological challenges in quantifying individual responses in parallel-group controlled trials, where an experimental and control group are measured before and after their respective treatments. The authors assert that proper quantification of individual responses requires repeated administration of the experimental treatment to determine the extent to which each individual's response consists of reproducible and random components additional to the random variation due to error of measurement experienced equally by all individuals. In essence, the reproducible individual responses are those that could be explained by differences between subjects in inherited and acquired stable characteristics or traits, whereas the apparently random responses could be due to changes in subject characteristics or states between administrations of the treatment. Although two or more administrations are indeed required to partition individual responses into these two components, partitioning is possible only if effects of the treatment wash out fully between administrations. Repeated administrations therefore make sense for acute effects of short-term treatments, where the treatments should be administered in crossover fashion (7). For training and other long-term treatments, repeated administrations are seldom logistically feasible, nor are they even possible in principle, if the intent of the study is to provide evidence for a permanent change in behavior or physiology. In any case, for long-term treatments, subject states average out to become subject traits, so one can expect the random component of individual responses due to subject states to become trivial or meaningless.

30 31 32 33 34 35

The Hecksteden et al. review is based on sound but challenging statistical principles, and the focus on repeated administration of treatments may distract researchers from the straightforward and legitimate analysis of individual responses to a single administration of a treatment. The authors have referenced several of my own publications on the appropriate methods (3-6), but I will update and summarize the methods here, in the hope of improving the analysis and reporting of controlled trials at least in this journal.

36 37 38 39 40

Individual responses are manifest as a larger standard deviation of the change scores in the experimental group than in the control group. It is therefore imperative that researchers report the standard deviations of the change scores, along with the means. This simple requirement will also provide all the inferential information needed for inclusion of the study in meta-analyses not only of the mean effect of the treatment but also of the individual responses.

41 42

The individual responses are summarized by a standard deviation (SDIR) given by the square root of the difference between the squares of the standard deviations of the change scores in the

Copyright © 2015 by the American Physiological Society.

43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

experimental (SDExp) and control (SDCon) groups: SDIR = (SDExp2–SDCon2). One should consider this standard deviation to be the amount by which the net mean effect of the treatment differs typically between individuals. Confidence limits for the standard deviation are obtained by assuming its sampling variance is normally distributed, with standard error given by [2(SDExp4/DFExp+SDCon4/DFCon)], where DFExp and DFCon are the degrees of freedom of the standard deviations in the two groups (usually their sample sizes minus 1). The upper and lower confidence limits for the true value of the variance of individual responses (SDIR2) are given by its observed value plus or minus this standard error times 1.65, 1.96 or 2.58, for 90, 95 or 99% confidence limits respectively (5). This formula is the basis for the confidence limits provided via mixed modeling with a procedure such as Proc Mixed in the Statistical Analysis System (SAS Institute, Cary, NC). Negative values of variance for the observed individual responses and the confidence limits can occur, especially when there is large uncertainty arising from a small sample size or a large error of measurement. Taking the square root of a negative number is not possible, so I advocate changing the sign first, then presenting the result as a negative standard deviation, interpreted as more variation in the control group than in the experimental group. If the upper confidence limit is negative, the researcher should consider explanations beyond mere sampling variation, such as compression of the responses in the experimental group arising from the treatment bringing subjects to a similar level.

61 62 63 64 65 66 67 68 69

The magnitude of the standard deviation and its confidence limits should also be interpreted. The default approach for interpretation of the change in a mean is standardization, where the mean change is divided by the standard deviation of all subjects at baseline (before the treatment). The thresholds for interpreting the standardized mean change (0.2, 0.6, 1.2, 2.0 and 4.0 for small, moderate, large, very large and extremely large) (6) need to be halved (0.1, 0.3, 0.6, 1.0 and 2.0) for interpreting the magnitude of effects represented by standardized standard deviations (8), including individual responses. Halving the thresholds implies that the magnitude of the standard deviation is evaluated as the difference between a typically low value (mean – SD) and a typically high value (mean + SD).

70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85

To exemplify this approach, I will draw on data for effects of training on maximum oxygen uptake (VO2max) in one of the classic studies of Bouchard and colleagues (1) cited by Hecksteden et al. They reported a change of 384 ± 202 ml.min-1 (mean ± SD) in an experimental group of 720 sedentary young adults. Means and SD at baseline were not shown, although the median was 2159 ml.min-1. Let us assume a baseline of 2200 ± 330 ml.min-1 (mean ± SD). There were no data for changes in a control group, but the "coefficient of variation" for VO2max was stated to be 5%, presumably based on two tests separated by ~1 wk. The standard deviation of change scores over this short period would therefore be 52 or 7.1%, and probably somewhat larger over the 20 weeks of the training study. Let us assume 8%, or 176 ml.min-1. Evidently much of what seems like individual responses was nothing more than noise, and the SD of individual responses to training free of this noise was only (2022–1762) = 99 ml.min-1. The lower and upper 90% confidence limits, assuming 50 subjects contributed to the estimate of the coefficient of variation for VO2max, were -33 and 144 ml.min-1. In standardized units (dividing the values by the assumed baseline SD of 330 ml.min-1), the individual responses were 0.30 (90% confidence limits -0.10 and 0.44). Thus the observed individual responses were borderline small-moderate, but given the uncertainty, the individual responses could have been trivial or at most moderate. Assuming a control group showed

86 87 88 89 90

no mean change over 20 weeks, the standardized mean effect itself was 384/330 = 1.16, or borderline moderate-large. The overall effect of training, after removing the effects of noise, could be summarized as 384 ± 99 (mean ± SD for individual responses), or 1.16 ± 0.30 in standardized units. Thus, ignoring the uncertainties in the mean and SD, in this sample the effect on individuals ranged typically from moderate (mean – SD = 0.86) to large (mean + SD = 1.46).

91 92 93 94 95 96 97 98

The uncertainty in the estimate of individual responses in the above example was large, disappointingly so in view of the assumed reasonably large sample size in the control (50) and the unusually large sample size in the training group (720, which is effectively infinite compared with 50). The main source of the uncertainty is the error of measurement (5%), which is somewhat larger than the smallest important effect (0.2 of the assumed 330 ml.min-1, or 3%). Evidently investigation of individual responses will be successful only with good sample sizes and measures with low noise. As suggested by Hecksteden et al., averaging repeated measurements on each subject before and after the treatments is one solution to the problem of low sample size and noisy measures.

99 100 101 102 103 104 105 106 107 108 109 110

Hecksteden et al. have also reviewed the approaches to identifying positive responders, nonresponders and negative responders. The uncertainty in each change-score response can be expressed as confidence limits by multiplying the appropriate value of the t statistic by SDCon (or by 2 times the standard error of measurement from a relevant reliability study) (3). For many outcome measures, the uncertainty is considerably greater than the smallest important change, so even an inactive control treatment will appear to produce positive and negative responses. True proportions of positive and negative responses are therefore provided by counts of observed responses only rarely, when the error of measurement is much less than the smallest important change. The proportions can be estimated otherwise by assuming the responses have a t distribution defined by the net mean response and the standard deviation for individual responses, and confidence limits for the proportions could be derived by bootstrapping. I do not advise such calculations when the lower confidence limit for the standard deviation is negative.

111 112 113 114 115 116 117 118 119 120 121 122 123 124 125

Finally, Hecksteden et al. provide advice on the complexities of analytical models that include subject characteristics to account for individual responses. In my view these analyses need not be so daunting: it is enough to include each subject characteristic separately as a numeric linear or nominal effect interacted with the group effect to predict the change scores in each group. The predictor can represent a trait or a state of the subjects, in which case it is a moderator of the treatment effect. The predictor can also be a change score for a physiological state variable, in which case its effect represents potential mediation of the treatment effect and/or its individual responses (best understood with a scatterplot of change scores of dependent vs predictor). To the extent that the predictor has a substantial effect, the standard deviation of individual responses will decrease substantially. My spreadsheet for analysis of controlled trials allows for such analyses and provides confidence limits and appropriate evaluation of all effects (5). When subject characteristics are correlated and the researcher is interested in the unique contribution of each, a statistical package is needed to perform multiple linear regression, with each characteristic included as a main effect. Models that include interactions of subject characteristics should be considered only if there is a good theoretical basis for such interactions.

126 127

In summary, adequate quantification of individual responses in a controlled trial requires a large sample size or averaging of repeated measurements to compensate for a large error of

128 129 130 131 132 133 134 135

measurement. The write-up should include means and standard deviations of change scores in all treatment groups. Individual responses should be reported as a standard deviation with confidence limit derived from the standard deviations of the change scores before and after inclusion in the analysis of any subject characteristics representing potential moderators and of any change scores representing potential mediators. The standard deviation at baseline should be used to assess the magnitudes of effects and individual responses by standardization. These recommendations will be novel for most researchers, but they are not especially rocket science, and they need to be implemented.

136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152

1.

153 154

2.

3. 4. 5. 6. 7. 8.

Bouchard C, and Rankinen T. Individual differences in response to regular physical activity. Med Sci Sports Exerc 33: S446-451, 2001. Hecksteden A, Kraushaar J, Scharhag-Rosenberger F, Theisen D, Senn S, and Meyer T. Individual response to exercise training – a statistical perspective. J Appl Physiol (in press): 2015. Hopkins WG. How to interpret changes in an athletic performance test. Sportscience 8: 1-7, 2004. Hopkins WG. Measures of reliability in sports medicine and science. Sports Med 30: 1-15, 2000. Hopkins WG. Spreadsheets for analysis of controlled trials, with adjustment for a subject characteristic. Sportscience 10: 46-50, 2006. Hopkins WG, Marshall SW, Batterham AM, and Hanin J. Progressive statistics for studies in sports medicine and exercise science. Med Sci Sports Exerc 41: 3-12, 2009. Senn S, Rolfe K, and Julious SA. Investigating variability in patient response to treatment: a case study from a replicate cross-over study. Stat Methods Med Res 20: 657-666, 2011. Smith TB, and Hopkins WG. Variability and predictability of finals times of elite rowers. Med Sci Sports Exerc 43: 2155-2160, 2011.

Growth rates made easy.

Bath-taking made easy.

Epigenome editing made easy.

Arterial blood gases made easy.

Breath figure patterns made easy.

Enzyme catalysis: Evolution made easy.

GVHD prophylaxis made safe, easy, and inexpensive.

Hydroacenes Made Easy by Gold(I) Catalysis.

Cryostat sectioning of large brains made easy.

gHRV: Heart rate variability analysis made easy.

Comment on "Stereo-imaging made easy".

Antimicrobials: Targeting of C. difficile made easy.

Muscular ventricular septal defect repair made easy.

Gene therapy. Muscle transfection made easy.

An easy-to-use and ready-made "cricothyrotomy kit".

Synthesis of Barbaralones and Bullvalenes Made Easy by Gold Catalysis.

Editor's note: Clinical trial designs and statistics made easy.

Helicopter emergency medical service scene communications made easy.

Diagnosis of Diseases of the Eye Made Easy.

Haplessly hoping: macaque major histocompatibility complex made easy.

A living light bulb, ultrasensitive biodetection made easy.

Molecular biology made easy. The polymerase chain reaction.

Microfluidics made easy: A robust low-cost constant pressure flow controller for engineers and cell biologists.

A simple click by click protocol to perform docking: AutoDock 4.2 made easy for non-bioinformaticians.