Preventive Veterinary Medicine 113 (2014) 331–337


Bias—Is it a problem, and what should we do?

Ian R. Dohoo
Department of Health Management, Atlantic Veterinary College, University of Prince Edward Island, Charlottetown, PEI C1A 4P3, Canada

Article info

Article history: Received 24 January 2013; received in revised form 28 September 2013; accepted 6 October 2013.

Keywords: Bias; Systematic error; Confounding; Selection bias; Misclassification bias; Quantitative bias adjustment

Abstract

Observational studies are prone to two types of errors: random and systematic. Random error arises as a result of variation between samples that might be drawn in a study and can be reduced by increasing the sample size. Systematic error arises from problems with the study design or the methods used to obtain the study data and is not influenced by sample size. Over the last 20 years, veterinary epidemiologists have made great progress in dealing more effectively with random error (particularly through the use of multilevel models) but have paid relatively little attention to systematic error. Systematic errors can arise from unmeasured confounders, selection bias and information bias. Unmeasured confounders include both factors which are known to be confounders but which were not measured in a study and factors which are not known to be confounders. Confounders can bias results toward or away from the null. The impact of selection bias can also be difficult to predict and can be negligible or large. Although the direction of information bias is generally toward the null, this cannot be guaranteed, and its impact might be very large. Methods of dealing with systematic errors include: qualitative assessment, quantitative bias analysis and incorporation of bias parameters into the statistical analyses.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Observational studies can be compromised by both random error and systematic error; the latter is also referred to as bias. Over the past 30 years, veterinary epidemiologists have made great progress in improving their handling of random error. Specifically, the advent and widespread adoption of multilevel modelling techniques (also known as random effects models) has enabled researchers to assess the statistical significance of factors operating at multiple levels of the data hierarchy. Although the addition of random effects for "groups" does have a role to play in removing confounding due to unmeasured group-level confounders (Dohoo and Stryhn, 2006), the main contribution of multilevel modelling has been to improve our ability to obtain valid confidence intervals for estimates of effects at various levels of the hierarchy (i.e. to quantify the random error correctly).


Unfortunately, very little attention has been paid to systematic error. Improving study designs to minimize or eliminate systematic errors is the crucial first step in addressing this problem (and I believe we have made considerable progress in this area), but we cannot eliminate all errors. We should be doing more to address the impact of systematic errors in our research. The objectives of my paper are to:

1. briefly review random and systematic errors,
2. present some thoughts as to the nature, origin and impact of selection bias,
3. present some thoughts as to the magnitude of misclassification bias,
4. provide an overview of approaches for dealing with bias,
5. introduce quantitative bias analysis (QBA), and
6. show how bias parameters can be incorporated into the analysis of observational study data.



Fig. 1. Graphic representation of random and systematic errors. β is an estimate of the parameter of interest and the shaded area of the graph shows the 95% confidence interval around this estimate. This reflects the fact that random error exists, and the width of the confidence interval reflects the precision of the estimate. The dotted vertical line shows the true value of the parameter of interest. Assuming that the estimate (β) represents the estimate which would be obtained from an infinitely large sample, the discrepancy between the true value and β is attributable to systematic error.

Fig. 2. General framework for selection bias. Adapted from Hernan et al. (2004).

2. Random and systematic errors

Observational studies inevitably collect data on a subset of animals in the population of interest. Even if this subset is a true random sample, the estimate(s) (e.g. risk ratio, odds ratio) will vary somewhat from the true population value as a result of random variation inherent in the sampling process. We usually express our uncertainty in the estimate by computing a confidence interval, which gives the reader some idea of the range of values within which the true population value might lie. The key features of random error are that we can always reduce it by increasing the sample size (i.e. increasing the precision of the estimate), and that the point estimate is asymptotically unbiased (i.e. in the long run – either with lots of data or with multiple repetitions of the study – the estimate will be correct). Systematic error arises when some feature of the study leads us to obtain an estimate which is not equal to the true population value; increasing the sample size does nothing to reduce its magnitude. Fig. 1 portrays random and systematic errors graphically.
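To make the distinction concrete, the short simulation below (not part of the original paper; the risks and test characteristics are invented) mimics a cohort study in which the outcome is classified imperfectly: increasing the sample size narrows the confidence interval around the estimate, but does not move it any closer to the true risk ratio.

```python
# Sketch: random error shrinks with sample size, systematic error does not.
# Assumed (illustrative) values: true risks 0.30 (exposed) and 0.10 (unexposed);
# outcome classified with sensitivity 0.80 and specificity 0.95.
import numpy as np

rng = np.random.default_rng(1)
se, sp = 0.80, 0.95            # assumed sensitivity/specificity of outcome classification
p_exp, p_unexp = 0.30, 0.10    # assumed true risks in exposed and unexposed animals

def observed_rr(n):
    """Simulate one study with n animals per group; return observed RR and 95% CI."""
    d_exp = rng.random(n) < p_exp           # true disease status, exposed group
    d_unexp = rng.random(n) < p_unexp       # true disease status, unexposed group
    # imperfect classification of the outcome
    test_exp = np.where(d_exp, rng.random(n) < se, rng.random(n) < 1 - sp)
    test_unexp = np.where(d_unexp, rng.random(n) < se, rng.random(n) < 1 - sp)
    a, c = test_exp.sum(), test_unexp.sum()
    rr = (a / n) / (c / n)
    se_log = np.sqrt(1 / a - 1 / n + 1 / c - 1 / n)   # SE of log(RR)
    return rr, np.exp(np.log(rr) - 1.96 * se_log), np.exp(np.log(rr) + 1.96 * se_log)

print("true RR:", p_exp / p_unexp)   # 3.0
for n in (200, 2000, 20000):
    rr, lo, hi = observed_rr(n)
    print(f"n per group = {n:>6}: RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
# The interval narrows as n increases, but the estimate settles near the biased value of
# about 2.2 (= (0.30*0.80 + 0.70*0.05) / (0.10*0.80 + 0.90*0.05)), not near 3.0.
```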

There are three main types of bias which generate systematic errors: confounding, selection bias and information bias. The latter two will be discussed in subsequent sections of this paper. Confounding arises when an unmeasured (or measured but ignored) factor is related to both the exposure and outcome of interest and is not intermediate (in the causal pathway) between the exposure and outcome. Unmeasured confounders can either be "known but not measured" or "unknown (and hence not measured)". Confounders that are known but not measured might include factors such as a livestock producer's managerial ability (a very difficult factor to measure), which we suspect is related both to the risk factors being investigated and to an outcome (e.g. disease) of interest. Unknown confounders are ones which we have not even considered as possible sources of confounding.

For known (and measured) confounders, confounding can be controlled through the use of: (i) restriction, (ii) matching, or (iii) analytical control (e.g. inclusion in a statistical model). Analytical control is the most widely used form of control. Given the substantial advances in methods for multilevel modelling over the last 15 years, we can now appropriately control for known confounders at all levels of the hierarchy. For unmeasured confounders at the group level, controlling confounding is facilitated by the use of statistical models which take the hierarchical structure of the data into account. If a confounder is a true group-level confounder (i.e. there is no variation within groups), simulations have shown that its confounding effects are completely removed by including a random effect for group in the model (Dohoo and Stryhn, 2006; Dohoo et al., 2009 – Section 20.4.2).

3. Selection bias

There is no consistency in the literature as to how various study populations are named, but for the purposes of this manuscript I will use the definitions presented in Dohoo et al. (2009) – Section 2.1.3, which are broadly consistent with the terminology used by Rothman et al. (2008).

Target population – the population to which it might be possible to extrapolate results (the target population is often not clearly defined).
Source population – the population from which the study subjects are drawn.
Study group (or sample) – the actual subjects (animals or groups of animals) which end up in the study and whose data are used in the analysis.

Selection bias arises whenever the study group is not representative of the source population. Selection bias can arise in many ways, such as non-response, loss to follow-up, selective survival, and admission risk bias. The possible causes are numerous, and the reader is referred to general textbooks (Dohoo et al., 2009 – Chapter 12; Rothman et al., 2008 – Chapter 12) for a more complete discussion. Throughout this paper, I will generally consider selection bias arising from non-response.

Hernan et al. (2004) published a general framework for understanding selection bias. Any factor which is related to – and consequent to – both the exposure and disease can be a source of selection bias.


Fig. 2 shows the simplest representation of this situation: both the exposure (E) and the outcome (D) are related to the probability of selection but, in this example, are independent of each other. It is not intuitively obvious why a factor (the probability of selection) which is consequent to the outcome of interest should be of concern in a study. However, once selection is controlled for, a spurious association is created between E and D. Selection is always controlled for by a form of restriction – the analysis is restricted to study participants.

Although we generally consider non-response to be a potential source of selection bias, that might not always be the case. In some situations, non-response might be closely associated with an unmeasured confounder, making non-response a surrogate for that confounder. In these situations, restricting the analysis to responders will produce a less biased estimate than would an analysis based on both responders and non-responders (which is, of course, not possible). This is represented in Fig. 3. In panel A, it appears that non-response is a source of selection bias in a study of the impact of off-farm employment on mastitis levels in dairy herds. However, as can be seen in panel B, if off-farm employment is strongly associated with low managerial ability (designated "poor farmer" – a confounding factor which is hard to measure), then non-response serves as a surrogate for the confounder, and control of the confounder by limiting the analysis to responders would be beneficial. This highlights the need to think carefully about potential causes of selection bias and how they might relate to factors under investigation.

Fig. 3. Example of when a factor (non-response) initially considered to be a source of selection bias might be a surrogate for an unmeasured confounder.

One particular cause of selection bias deserves mention: bias arising from missing data. Although no more (or less) important than other forms of selection bias, recent advances in analytical methods might make it feasible to address missing data during the analysis. Most researchers present results based on "complete-case analyses" – analyses based on those observations for which the data were complete. Depending on the mechanism which generated the missing values, a complete-case analysis might simply have less power than an analysis which included all cases, or it might be biased (i.e. contain systematic error). Two general approaches to dealing with missing values – maximum likelihood estimation of models which incorporate missing-value parameters, and multiple imputation – have become more widely available in standard statistical software and should be considered. Discussion of these methods is beyond the scope of this paper, so the reader is referred to texts on the subject (van Buuren, 2012; Little and Rubin, 2002; Molenberghs and Kenward, 2007).
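As a purely illustrative sketch of the second approach (the data, variable names and missingness mechanism below are invented, and the pooling of the imputed estimates is deliberately simplified relative to Rubin's rules), a complete-case analysis can be compared with a multiple-imputation analysis as follows:

```python
# Sketch (hypothetical data): complete-case analysis vs. a simple multiple imputation.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
n = 1000
herd_size = rng.normal(100, 30, n)
exposure = rng.binomial(1, 0.4, n)
p_dis = 1 / (1 + np.exp(-(-2 + 0.7 * exposure + 0.01 * (herd_size - 100))))
disease = rng.binomial(1, p_dis)
df = pd.DataFrame({"exposure": exposure, "herd_size": herd_size, "disease": disease})
# herd size is missing more often on exposed farms (missing at random given exposure)
df.loc[rng.random(n) < 0.1 + 0.3 * exposure, "herd_size"] = np.nan

def exposure_or(data):
    """Odds ratio for exposure from a logistic model adjusting for herd size."""
    X = sm.add_constant(data[["exposure", "herd_size"]])
    return float(np.exp(sm.Logit(data["disease"], X).fit(disp=0).params["exposure"]))

print("complete-case OR:", round(exposure_or(df.dropna()), 2))

# crude multiple imputation: several completed data sets, log-ORs averaged
log_ors = []
for i in range(10):
    imputer = IterativeImputer(sample_posterior=True, random_state=i)
    completed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    log_ors.append(np.log(exposure_or(completed)))
print("multiple-imputation OR:", round(float(np.exp(np.mean(log_ors))), 2))
```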

3.1. Magnitude of selection bias

Consider the following scenario:

• A study in which the response rate was only 30% in both the exposed and non-exposed groups.
• Evidence that disease rates were substantially (50%) higher in non-responders than in responders.
• Responders: p(D+|E+) = 0.2, p(D+|E−) = 0.1.
• Non-responders: p(D+|E+) = 0.3, p(D+|E−) = 0.15.

Most people would assume that selection bias was a serious problem in this scenario because of the very poor response rate. In fact, there would be no selection bias, because the response rate was equally poor in the exposed and non-exposed groups. However, suppose the example is modified so that the response rates differ substantially between the two exposure groups: 70% non-response in the exposed group but only 30% in the non-exposed group. Selection bias is then present, but the bias evident in the risk ratio is modest (true value = 2.35, observed value = 2.0) despite the large disparity in response rates (the calculation is sketched at the end of this section).

This example was presented to highlight three features of selection bias. First, it is often easy to "suspect" selection bias even when none is present. Secondly, a quick, superficial consideration of the problem of selection bias can often lead to very incorrect conclusions as to its impact. Thirdly, the magnitude of selection bias might – or might not – be much less than initially expected.

Two recent studies of the impact of selection bias on controversial topics in human epidemiology exemplify this problem. In an assessment of the impact of selection bias on case–control studies of household pesticide exposure and childhood leukemia, Rudant et al. (2010) found that selection bias would not completely explain the positive effects observed in these studies (i.e. the conclusion of a positive effect was legitimate), but most of the uncorrected estimates were biased away from the null (i.e. larger than the corrected estimates). On the other hand, Aydin et al. (2011) carried out a simulation study to evaluate the potential effects of both selection bias and information bias (recall error) on studies of the effect of mobile phone use on the risk of brain tumours. Those authors found that, within plausible ranges of selection probabilities and levels of recall bias, there would be very little impact on observed associations.
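The arithmetic behind this scenario can be checked in a few lines (a sketch added here for illustration; the probabilities are exactly those listed above):

```python
# Sketch: selection bias scenario from Section 3.1.
# Disease risks among responders and non-responders, as given above.
p_resp = {"E+": 0.20, "E-": 0.10}       # responders
p_nonresp = {"E+": 0.30, "E-": 0.15}    # non-responders

def risk_ratios(resp_frac_exposed, resp_frac_unexposed):
    """True RR in the whole source population vs. RR observed among responders only."""
    true_risk_exp = resp_frac_exposed * p_resp["E+"] + (1 - resp_frac_exposed) * p_nonresp["E+"]
    true_risk_unexp = resp_frac_unexposed * p_resp["E-"] + (1 - resp_frac_unexposed) * p_nonresp["E-"]
    observed_rr = p_resp["E+"] / p_resp["E-"]   # only responders are analysed
    return true_risk_exp / true_risk_unexp, observed_rr

# Scenario 1: 30% response in both groups -> no selection bias
print(risk_ratios(0.30, 0.30))   # (2.0, 2.0)
# Scenario 2: 30% response in the exposed group, 70% in the non-exposed -> modest bias
print(risk_ratios(0.30, 0.70))   # (approx. 2.35, 2.0)
```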


4. Information bias

Information bias arises whenever an investigator has the wrong information about an exposure or outcome of interest, or about other covariates included in the model. As with selection bias, there are many causes (recall bias, imperfect diagnostic tests, errors in data collection or transcription, etc.). If the error is in a variable measured on a continuous scale, it is termed "measurement error"; errors in categorical variables are referred to as "misclassification errors". Errors may also be classified as non-differential (if they are equal in the two groups being compared) or differential (if they are not equal) (Dohoo et al., 2009 – Section 12.6).

Given that virtually all diagnostic tests used to generate data for observational studies are imperfect, it behoves us to think about the potential magnitude of those errors. Table 1 shows some hypothetical data with the true population values presented in the first two columns, and the observed data generated by a diagnostic test with sensitivity = 0.6 and specificity = 0.9 presented in the last two columns. As can be seen, the true odds ratio (OR) for these data (OR = 6.0) is reduced to an observed OR = 2.67 by the information bias present – a very serious underestimation of the true effect. In my experience, misclassification bias frequently produces serious distortion of study results, given the nature of many of the diagnostic tests we use.

Table 1
Effect of misclassification bias on study results. The exposure (E) was measured without error, but the disease (D) was measured using a diagnostic test with sensitivity = 0.6 and specificity = 0.9. Although the "true" OR = 6.0, the "observed" OR = 2.67.

        "True" population data        Observed data
        E+          E−                E+          E−
D+      300         100               200         100
D−      200         400               300         400

One reason why the problem of misclassification bias is frequently overlooked is that, if the errors are non-differential, we assume that the bias will always be toward the null (i.e. we will under-estimate the effect) – which is sometimes considered a less serious error than over-estimating the effect. Although non-differential misclassification bias is generally toward the null, this is not assured, for the following reasons.

• Although the expectation of a non-differential misclassification of a binary exposure or outcome is toward the null, observed results in a given study are not totally predictable and can be in either direction (Jurek et al., 2006).
• It is not possible to predict the direction of the bias for specific levels of exposure variables that have more than two categories (Weinberg et al., 1994).
• If misclassification errors of both the exposure and outcome are dependent, then even if they are both non-differential, they can produce bias in either direction (Kristensen, 1992). This situation might arise when the same method (e.g. a questionnaire) is used to obtain data on both the exposure and outcome.
• If a confounder is measured with error, our ability to remove the confounding effect from the model might be greatly reduced, resulting in bias in either direction.

Given the potentially large effects of misclassification bias on our study results, this source of bias deserves more attention in the veterinary literature.
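The calculation behind Table 1 can be reproduced directly (a sketch added for illustration; only the sensitivity, specificity and true counts given above are used): within each exposure group the observed positives are the sum of true positives detected and false positives generated by the imperfect test.

```python
# Sketch: how imperfect outcome classification (Se = 0.6, Sp = 0.9) distorts the OR in Table 1.
se, sp = 0.6, 0.9

def odds_ratio(cells):
    """cells = {(D, E): count} for a 2x2 table."""
    return (cells[("D+", "E+")] * cells[("D-", "E-")]) / (cells[("D-", "E+")] * cells[("D+", "E-")])

true = {("D+", "E+"): 300, ("D-", "E+"): 200, ("D+", "E-"): 100, ("D-", "E-"): 400}

observed = {}
for e in ("E+", "E-"):
    d_pos, d_neg = true[("D+", e)], true[("D-", e)]
    observed[("D+", e)] = se * d_pos + (1 - sp) * d_neg   # true positives + false positives
    observed[("D-", e)] = (1 - se) * d_pos + sp * d_neg   # false negatives + true negatives

print(round(odds_ratio(true), 2))      # 6.0
print(round(odds_ratio(observed), 2))  # 2.67
```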

5. Approaches for dealing with bias

There are five major approaches to dealing with systematic errors in observational research.

1. Modify the study design to minimize or eliminate the bias. This should always be done and will not be discussed further; it is beyond the scope of this paper.
2. Ignore the bias, or insert some sort of "wishy-washy" statement such as "The results of this study may have been affected by selection bias so should be interpreted with caution". This statement is meaningless and is usually inserted by authors who want to cover themselves in case future work shows their results to be completely incorrect. Neither ignoring the bias nor inserting such a statement is acceptable, and these options will not be considered further.
3. Perform a qualitative assessment of the biases which are of the greatest concern.
4. Perform a quantitative assessment (QBA) of the biases which are of the greatest concern.
5. Modify the analysis to incorporate bias parameters into the statistical modelling.

A qualitative assessment should consider both the direction and the expected magnitude of the bias(es) of concern. This is not an easy process. As was seen in the selection bias example presented above, a superficial consideration of the issue is very likely to produce incorrect results. A qualitative assessment becomes extremely difficult if there is concern about multiple biases: their impacts can be in opposite directions, and it is virtually impossible to estimate their net effect correctly. In the absence of good estimates of bias parameters (described in Section 6), qualitative analyses are the best we can do. However, to get an idea of the potential magnitude of non-response and misclassification biases, spreadsheets in which you can create a hypothetical study population and superimpose various levels of bias are available at www.upei.ca/ver.

Reviewers and journals have a role to play in encouraging thorough qualitative analyses but, in my experience, often fail to do so. In 2012, I had the discouraging experience of an editor (not from this journal) insisting that I remove all discussion about the potential impact of systematic errors in a paper I published because I did not have quantitative estimates of bias parameters and references to back them up. Ultimately, the bias discussion was left in the manuscript – but it was discouraging that the suggestion to remove it was even made at the editorial level.

6. Quantitative bias analysis

A QBA is an analysis conducted after a study has been completed; its purpose is to adjust the study results for suspected systematic errors. The estimates of systematic error can then be combined with estimates of random error to give a more complete picture of the overall uncertainty in the study results. Carrying out a QBA requires estimates of the relevant bias parameters:


• Selection bias – the sampling fractions for each of the exposure/disease categories (i.e. the proportion of the source population included in the study group within each exposure/disease category).
• Misclassification bias – the sensitivity and specificity of the tests used for parameters measured with error (e.g. the sensitivity and specificity of the exposure or outcome classification).
• Unmeasured confounders – the prevalence of the confounder, along with the strength of its associations with both the exposure and the outcome.

Ways of generating estimates of the bias parameters are described in Lash et al. (2009), but it should be kept in mind that an educated guess is probably better than assuming that no bias exists. With estimates of those parameters in hand, the investigator can perform either a simple or a probabilistic QBA. In a simple analysis, point estimates of the bias parameters are used to adjust observed results for the bias of concern. In a probabilistic analysis, rather than using point estimates of the bias parameters, the investigator specifies distributions for the parameters to reflect the fact that the parameters are probably not known with certainty. An iterative process ensues in which bias parameter estimates are drawn repeatedly from the distribution(s) and used to compute adjusted estimates of the parameter of interest. The result of this process is a distribution of results which may be presented graphically.

Example 12.4 in Dohoo et al. (2009) is repeated here (with permission) to demonstrate simple and probabilistic QBAs. The example is based on results from a case–control study by Nodtvedt et al. (2007), who observed that lactating dogs whose diet included no home-cooked foods had an increased risk of atopic dermatitis (OR = 2.33). For pedagogic purposes, we postulated that selection bias could have arisen if owners of cases, and owners who fed home-cooked foods, were more likely to participate in the study than other owners. (Note: I do not in any way imply that selection bias did affect this study; this example is for pedagogic purposes only.) Table 2 shows the observed data for the study and the estimates of the bias parameters used in the simple and probabilistic QBAs. Triangular distributions were chosen for the sampling fractions because they are among the easiest distributions to specify and visualize. The simple QBA produced an adjusted estimate of the OR of 1.40 – suggesting that, if the estimates of the sampling fractions were correct, the original estimate of the detrimental effect of having no home-cooked foods was over-estimated. Results from the probabilistic QBA are shown in Fig. 4; the median value of the resultant distribution is 1.29, with 30% of the individual estimates being <1.
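The mechanics of a simple and a probabilistic QBA for selection bias can be sketched as follows. The sampling fractions and the limits of the triangular distributions used here are invented for illustration (they are not the values from Table 2), and the adjustment uses a standard selection-bias factor of the kind described in Lash et al. (2009).

```python
# Sketch: simple and probabilistic QBA for selection bias acting on an observed OR.
# The sampling fractions below are hypothetical values chosen for illustration only.
import numpy as np

rng = np.random.default_rng(3)
or_observed = 2.33   # observed OR from the case-control study

# Point estimates of the sampling fractions (proportion of the source population
# that ended up in the study group), by case/control and exposure status.
s = {"case_exp": 0.8, "case_unexp": 0.7, "control_exp": 0.5, "control_unexp": 0.6}

def adjust(or_obs, s):
    """Divide the observed OR by the selection-bias factor."""
    bias_factor = (s["case_exp"] * s["control_unexp"]) / (s["case_unexp"] * s["control_exp"])
    return or_obs / bias_factor

print("simple QBA:", round(adjust(or_observed, s), 2))

# Probabilistic QBA: draw the sampling fractions from triangular distributions
# (lower limit, mode, upper limit are again hypothetical) and adjust repeatedly.
limits = {"case_exp": (0.7, 0.8, 0.9), "case_unexp": (0.6, 0.7, 0.8),
          "control_exp": (0.4, 0.5, 0.6), "control_unexp": (0.5, 0.6, 0.7)}
adjusted = []
for _ in range(20000):
    draw = {k: rng.triangular(lo, mode, hi) for k, (lo, mode, hi) in limits.items()}
    adjusted.append(adjust(or_observed, draw))
adjusted = np.array(adjusted)
print("median:", round(float(np.median(adjusted)), 2),
      " 2.5th-97.5th percentiles:", np.round(np.percentile(adjusted, [2.5, 97.5]), 2))
```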
7. Incorporation of bias parameters into the analysis

Bias parameters can also be incorporated directly into the statistical analysis of a study. Dufour et al. (2012) did this in a study of risk factors for coagulase-negative staphylococci (CNS) intramammary infections in dairy cattle, using estimates of the sensitivity and specificity of bacteriological culture (Dohoo et al., 2011a,b) to adjust for misclassification of the outcome. The adjusted estimates of effect generally moved away from the null. This was to be expected, given that the study assumed non-differential misclassification errors and the outcome was binary. However, the magnitude of the errors was surprising. Looking at the effect of sand bedding (compared with straw) on the incidence of new intramammary infections, the OR was underestimated by a factor of 1.9 (0.51 vs. 0.27). For the elimination of existing infections, the discrepancy was even larger – the effect of sand bedding was underestimated by a factor of 2.9 (1.7 vs. 4.9). Although the underestimated effects remained statistically significant, this would constitute a form of Type-II error (failing to detect the full effect that exists).

Interestingly, even though effects were biased toward the null, it was possible to have Type-I errors as well. Adjustment for misclassification of the effect of type of ration on the prevalence of CNS moved the OR from 1.3 to 1.6 – but the estimate lost statistical significance as a result of the uncertainty in the estimates of sensitivity and specificity used in the adjustment. Failure to adjust would have resulted in this risk factor being (erroneously) declared "statistically significant".
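For a single 2 × 2 table with fixed, known sensitivity and specificity of the outcome classification, incorporating the bias parameters amounts to back-correcting the observed cell counts. The sketch below (added for illustration) uses the hypothetical Se = 0.6 and Sp = 0.9 from Table 1, not values from Dufour et al. (2012); an analysis such as theirs would, in addition, propagate the uncertainty in the sensitivity and specificity estimates rather than treating them as fixed.

```python
# Sketch: back-correcting observed 2x2 counts for outcome misclassification
# using fixed sensitivity and specificity (values from the hypothetical Table 1).
se, sp = 0.6, 0.9

# Observed counts (D+ and D- within each exposure group), as in Table 1.
observed = {"E+": {"D+": 200, "D-": 300}, "E-": {"D+": 100, "D-": 400}}

corrected = {}
for e, counts in observed.items():
    n = counts["D+"] + counts["D-"]
    # Standard correction: true positives = (observed positives - (1 - Sp) * n) / (Se + Sp - 1)
    true_pos = (counts["D+"] - (1 - sp) * n) / (se + sp - 1)
    corrected[e] = {"D+": true_pos, "D-": n - true_pos}

def odds_ratio(tab):
    return (tab["E+"]["D+"] * tab["E-"]["D-"]) / (tab["E+"]["D-"] * tab["E-"]["D+"])

print(round(odds_ratio(observed), 2))   # 2.67 (biased toward the null)
print(round(odds_ratio(corrected), 2))  # 6.0  (true OR recovered)
```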

8. Conclusions

Much progress has been made in the quality of statistical analyses in the veterinary epidemiology literature. This has been brought about by a combination of advances in methodology (particularly in the area of multilevel models) and a much greater awareness of the need for appropriate analyses. Consequently, most published research now appropriately accounts for – and presents – the random error associated with a study. Unfortunately, comparable progress has not been made in dealing with systematic error.

Lack of information about the bias parameters necessary to adjust for systematic errors almost certainly contributes to researchers' unwillingness to tackle this problem. Some might assume that – in the absence of reliable estimates of these parameters – it is better not to adjust for systematic errors. However, ignoring the issue is equivalent to making the assumption that no biases exist (e.g. that the sensitivities and specificities of the procedures used for classifying exposure and outcome were all 100%). This is almost certainly incorrect, and the use of educated guesses of bias parameters in a QBA is almost certainly better than ignoring the issue.

I recommend that authors carefully consider the potential impacts of systematic errors in their research and, at a minimum, provide a carefully considered qualitative assessment of those impacts. If any information at all is available on the bias parameters needed to adjust for systematic errors, the application of some form of QBA is encouraged. Finally, when designing studies, investigators should consider incorporating (into the study) data collection which will enable estimation of the bias parameters required to carry out a QBA of the study's results. Editors also have a role to play in dealing with this issue. Manuscript submissions which ignore the possibility of bias, or deal with it in a superficial and non-informative manner, should not be accepted.


Conflict of interest

The author declares that he has no financial or personal conflict of interest that would interfere with this article.

References

Aydin, D., Feychting, M., Schuz, J., Andersen, T.V., Poulsen, A.H., Prochazka, M., Klaeboe, L., Kuehni, C.E., Tynes, T., Roosli, M., 2011. Impact of random and systematic recall errors and selection bias in case–control studies on mobile phone use and brain tumors in adolescents (CEFALO study). Bioelectromagnetics 32, 396–407.
van Buuren, S., 2012. Flexible Imputation of Missing Data. CRC Press, Boca Raton, FL.
Dohoo, I., Martin, S., Stryhn, H., 2009. Veterinary Epidemiologic Research. VER Inc., Charlottetown.
Dohoo, I., Andersen, S., Dingwell, R., Hand, K., Kelton, D., Leslie, K., Schukken, Y., Godden, S., 2011a. Diagnosing intramammary infections: comparison of multiple versus single quarter milk samples for the identification of intramammary infections in lactating dairy cows. J. Dairy Sci. 94, 5515–5522.
Dohoo, I.R., Stryhn, H., 2006. Simulation studies on the effects of clustering. In: Proc. Int. Symp. Vet. Epidemiol. Econ., Vina del Mar, Chile.
Dohoo, I.R., Smith, J., Andersen, S., Kelton, D.F., Godden, S., 2011b. Diagnosing intramammary infections: evaluation of definitions based on a single milk sample. J. Dairy Sci. 94, 250–261.
Dufour, S., Dohoo, I.R., Barkema, H.W., Descoteaux, L., Devries, T.J., Reyher, K.K., Roy, J.P., Scholl, D.T., 2012. Epidemiology of coagulase-negative staphylococci intramammary infection in dairy cattle and the effect of bacteriological culture misclassification. J. Dairy Sci. 95, 3110–3124.


Greenland, S., 2003. The impact of prior distributions for uncontrolled confounding and response bias: a case study of the relation of wire codes and magnetic fields to childhood leukemia. J. Am. Stat. Assoc. 98, 47–54.
Hernan, M.A., Hernandez-Diaz, S., Robins, J.M., 2004. A structural approach to selection bias. Epidemiology 15, 615–625.
Jurek, A.M., Maldonado, G., Greenland, S., Church, T.R., 2006. Exposure-measurement error is frequently ignored when interpreting epidemiologic study results. Eur. J. Epidemiol. 21, 871–876.
Kristensen, P., 1992. Bias from nondifferential but dependent misclassification of exposure and outcome. Epidemiology 3, 210–215.
Lash, T., Fox, M., Fink, A., 2009. Applying Quantitative Bias Analysis to Epidemiologic Data. Springer, New York.
Little, R., Rubin, D., 2002. Statistical Analysis with Missing Data, second ed. John Wiley and Sons, Hoboken, NJ.
McInturff, P., Johnson, W.O., Cowling, D., Gardner, I.A., 2004. Modelling risk when binary outcomes are subject to error. Stat. Med. 23, 1095–1109.
Molenberghs, G., Kenward, M., 2007. Missing Data in Clinical Studies. John Wiley and Sons, Chichester, UK.
Nodtvedt, A., Bergvall, K., Sallander, M., Egenvall, A., Emanuelson, U., Hedhammar, A., 2007. A case–control study of risk factors for canine atopic dermatitis among boxer, bullterrier and West Highland white terrier dogs in Sweden. Vet. Dermatol. 18, 309–315.
Rothman, K.J., Greenland, S., Lash, T., 2008. Modern Epidemiology, third ed. Lippincott, Philadelphia.
Rudant, J., Clavel, J., Infante-Rivard, C., 2010. Selection bias in case–control studies on household exposure to pesticides and childhood acute leukemia. J. Exp. Sci. Environ. Epidemiol. 20, 299–309.
Weinberg, C.R., Umbach, D.M., Greenland, S., 1994. When will nondifferential misclassification of an exposure preserve the direction of a trend? Am. J. Epidemiol. 140, 565–571.
