REVIEWS

Epidemiology research in rheumatology—progress and pitfalls

Deborah P. M. Symmons

Abstract | Epidemiology research is a vital component of clinical studies in all medical fields. This Review provides a brief introduction to the methodology and interpretation of population and clinical epidemiology studies of musculoskeletal disorders. Data sources (including ‘big data’ and the issue of missing data), study design (cross-sectional, case–control and cohort studies, including clinical trial design) and the interpretation of study results are discussed with examples from the field of rheumatology, particularly findings in patients with rheumatoid arthritis. Two or more treatments can be compared in clinical trials using superiority, noninferiority or equivalence designs. The different types of risk in epidemiological studies (absolute, attributable, background and relative) are important concepts in epidemiological research, and their relative usefulness to clinicians and patients should be considered carefully. The potential pitfalls and challenges of generalizing the results of epidemiological studies to the understanding of disease aetiology and to clinical practice are also emphasized. The aim of the Review is to help readers to critically appraise published articles that use epidemiological designs or methods.

Symmons, D. P. M. Nat. Rev. Rheumatol. advance online publication 7 July 2015; doi:10.1038/nrrheum.2015.92

Introduction

Arthritis Research UK Centre for Epidemiology, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, UK. deborah.symmons@manchester.ac.uk

What is ‘epidemiology’? Epidemiology is defined as the study of the distribution and determinants of disease in populations. Clinical epidemiology is the study of the distribution and determinants of outcomes in populations of patients with a specific disease. This definition makes epidemiological research an extremely wide field of endeavour. It embraces the appropriate design, analysis and interpretation of all studies of risk factors for the development of disease, including genetic, biochemical, lifestyle and environmental risk factors. All studies of the outcome of disease, and of the determinants of that outcome, also fall within the epidemiology remit. Treatment choice and treatment response (efficacy and adverse events) are important determinants of disease outcome. The design and analysis of clinical trials, as well as longitudinal observational studies of treatment outcome (for example, drug registries), are therefore encompassed within clinical epidemiology. The increasing ability of epidemiological research to answer these pivotal questions is largely due to the growth in computational power over the past few decades. This growth has made it possible to assemble and link huge quantities of health-related data from large numbers of individuals, systematically and at an affordable cost. Data assembled for reasons unrelated to epidemiology have often proved, serendipitously, to be amenable to epidemiological research.

As statistical methodologies developed in parallel with database design and capacity, it became possible to address multiple hypotheses simultaneously (for example, in analyses of the whole human genome). It also became possible to stratify data in multiple dimensions and, by adjusting for confounders, to focus on one specific risk factor. None of these approaches would be possible with pen and paper or even a handheld calculator. These developments have brought several challenges to the epidemiologist:
- the need for robust and logical study design, because databases and statistical methods cannot determine whether the question being addressed is plausible;
- how to deal with missing data;
- how to interpret apparently conflicting results generated by different analyses.

This Review explores data sources (including ‘big data’), epidemiological study design and the challenges and potential pitfalls of interpreting the results of epidemiological studies applied to rheumatology. Most examples used in this Review are taken from the study of rheumatoid arthritis (RA).

Competing interests
The author declares no competing interests.

NATURE REVIEWS | RHEUMATOLOGY
ADVANCE ONLINE PUBLICATION © 2015 Macmillan Publishers Limited. All rights reserved

Key points
■ Epidemiology is the study of the distribution and determinants of disease in populations
■ The recent increase in computing power has enabled epidemiological studies to use very large databases and to explore multiple risk factors for outcomes within the same analysis
■ The main epidemiological study designs are cross-sectional, cohort and case–control
■ Clinical trials, a type of cohort study, can test the superiority, noninferiority or equivalence of two or more treatments
■ Absolute and attributable risks are more meaningful to the clinician and patient than relative risks

Data sources

Big data
The ability to analyse large amounts of data has led to a search for databases that contain enough high-quality, relevant data to answer specific scientific questions. Such databases can be established de novo. Examples are the disease registries of the Consortium Of Rheumatology Researchers Of North America (CORRONA)1,2 and the Norfolk Arthritis Register (NOAR).3 However, recruiting enough patients with the relevant diagnosis, and following them long enough for the outcomes of interest to occur, takes time and considerable financial support. An alternative approach is to seek out existing datasets, assembled for a different purpose, and conduct secondary analyses.

Musculoskeletal epidemiological studies have used large national, regional or occupation-based health interview surveys or health examination surveys of random samples of the general population. Examples include the US National Health and Nutrition Examination Survey (NHANES),4 the UK Biobank5 and the Nurses’ Health Studies6 in the USA. Some health surveys, such as the Nurses’ Health Studies, include a longitudinal component. Participants are contacted every 2–5 years and asked to update their medical history and lifestyle factors such as tobacco smoking. Some surveys, such as the Health Survey for England and NHANES, allow investigators outside the main survey team to propose questions or examinations related to the musculoskeletal system for future rounds of contact.

Several Scandinavian countries, including Sweden and Finland, have population-based national registers of, for example, diagnoses for hospital discharges and outpatient attendances.7 These registers can be linked using each individual’s unique identity number, enabling the study of, for example, cancer incidence in patients with RA.8

The term ‘big data’ refers to sources of large, complex and linkable information, including genomic, medical, environmental, financial, geographic and social media data. Information on the occurrence and outcome of rheumatic and musculoskeletal diseases can increasingly be derived from healthcare administrative databases.
Examples include the UK Clinical Practice Research Datalink (CPRD).9 The CPRD was established to maximise the ways in which anonymised UK National Health Service clinical data can be linked to enable multiple types of observational research. In the USA, health administrative databases designed to support patient care and billing, such as Medicare,10 Kaiser Permanente11 and the Veterans Administration,12 can also be used for epidemiological research. Analyses based on such large datasets carry an increased risk of false-positive findings. In many cases, weak associations might reach statistical significance but lack clinical relevance. Genetics research addressed this problem by requiring replication of findings in independent datasets and by setting stringent thresholds for statistical significance.13 The time has come for nongenetic epidemiology researchers to do the same.14 An additional important point is that the analysis of ‘big data’ is largely hypothesis-generating rather than hypothesis-testing. Medicine must remain evidence-based, and findings from epidemiological studies should be tested in randomised clinical trials if they are to improve future patient management.

Secondary analysis of these large administrative and healthcare databases presents important challenges, because the data were not collected for the purposes for which they are now being used.15 The main challenges are:
- Researchers are likely to discover that the specific data they would like to analyse have not been collected.
- Standard case definitions cannot be applied in some databases, so new algorithms have to be developed; as a result, findings might not be comparable across studies.
- Administrative databases are subject to surveillance bias: patients who have more frequent contact with health services are more likely to have additional medical problems identified.
- Administrative databases tend to underreport procedures that are not billed.
- Individual patients might be monitored for varying and non-overlapping lengths of time.

Authors are often not transparent when reporting the inclusion and exclusion criteria for a particular analysis or when explaining how missing data were handled. Standard multivariable-adjusted analyses can include only individuals with complete sets of data. Yet in administrative databases, having every required item of data present for each individual might be the exception rather than the rule. Confining the analysis to individuals with complete data can exclude a large number of patients and so reduce statistical power. If these exclusions follow a nonrandom pattern, the result can also be biased.
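A small simulation illustrates how a nonrandom pattern of exclusions can bias a complete-case analysis. The sketch is hypothetical throughout. The measurement, its distribution and the missingness probabilities are invented for illustration, and `complete_case_mean` is an illustrative helper, not a library function.

- When values are missing completely at random (MCAR), the complete-case mean stays close to the true mean; only precision is lost.
- When higher values are more likely to be missing (missing not at random), the complete-case mean is biased downwards.

```python
import random
import statistics


def complete_case_mean(n=20000, mechanism="mnar", seed=1):
    """Return (true mean, complete-case mean, number of complete cases).

    mechanism="mcar": every value has the same 60% chance of being missing.
    mechanism="mnar": higher values are more likely to be missing.
    All distributions and probabilities are hypothetical.
    """
    rng = random.Random(seed)
    # Hypothetical continuous measurement, e.g. a disease-activity score
    true_values = [rng.gauss(4.0, 1.0) for _ in range(n)]
    observed = []
    for v in true_values:
        if mechanism == "mcar":
            p_missing = 0.6
        elif mechanism == "mnar":
            # Probability of being missing rises with the value itself (capped at 0.9)
            p_missing = min(0.9, max(0.0, 0.15 * v))
        else:
            raise ValueError("mechanism must be 'mcar' or 'mnar'")
        if rng.random() >= p_missing:
            observed.append(v)
    return statistics.mean(true_values), statistics.mean(observed), len(observed)


for mech in ("mcar", "mnar"):
    true_mean, cc_mean, n_cc = complete_case_mean(mechanism=mech)
    print(f"{mech}: true mean {true_mean:.2f}, complete-case mean {cc_mean:.2f} (n = {n_cc})")
```

Both scenarios discard a similar number of records. Only the pattern of loss differs, and that pattern alone decides whether the complete-case estimate is biased.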

Missing data
Addressing the problem of missing data requires an understanding of whether data are missing at random, or whether data for specific time points or types of individual are systematically absent.16 If the data are missing at random, the missing values can be imputed. This assumption should be examined as far as possible, although it cannot be fully verified from the observed data alone. Examples of single imputation include:
- ‘last observation carried forward’, often used in the analysis of clinical trials;
- replacing the missing value with the mean of that variable in the remainder of the dataset;
- using a regression model to estimate the missing value from the variables available for that individual and for the entire dataset.

Alternatively, whole categories of individuals might be underrepresented; for example, young men might have a lower response rate to a survey than elderly women. In that case, the responses obtained within that category can be weighted in the analysis. A general problem with single imputation is that the dataset tends to become more homogeneous and the uncertainty around the missing data is artificially removed. Standard errors are therefore underestimated, and the analysis is more likely to yield statistically significant results than is warranted.

Multiple imputation, by contrast, retains the uncertainty associated with missing data.17 Several plausible values are drawn for each item of missing data using statistical models, generating several completed datasets. Each dataset is analysed separately, and the results are then pooled for the final analysis. When multiple imputation is carried out, researchers should also conduct sensitivity analyses to establish whether the conclusions depend on assumptions about the pattern of ‘missingness’.18 The imputation process should be reported in manuscripts (in an online supplement, if necessary) so that readers can understand the assumptions made. This approach extends the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) initiative.19 The STROBE statement is a checklist of 22 items relating to the structure of epidemiological articles. It guides authors in improving the reporting of observational studies and helps readers to interpret and critically appraise such articles.

Box 1 | Development of a case definition
■ Agree on a ‘gold standard’, which is usually subjective (e.g. the opinion of the physician)
■ Assemble a list of variables likely to discriminate, in combination, between the disease in question and other similar diseases
■ Assemble a large series of patients (defined according to the gold standard) and control individuals (who have similar diseases) from a wide variety of sources, and collect information on the list of variables
■ Use statistical modelling to develop algorithms that discriminate between cases and controls with the greatest accuracy
■ Test the algorithms in a wide range of settings
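The contrast between single (mean) imputation and multiple imputation, described under ‘Missing data’, can be sketched for the simplest estimate, a mean. This is a minimal illustration under stated assumptions, not a recommended analysis pipeline:

- The data are simulated, and values are assumed to be missing completely at random.
- The imputation model is a simple normal model whose parameters are drawn from their approximate posterior.
- The completed-data results are combined with Rubin's rules: total variance = within-imputation variance + (1 + 1/m) × between-imputation variance.
- The function names are illustrative only.

```python
import math
import random
import statistics


def rubin_pool(estimates, variances):
    """Combine m completed-data results with Rubin's rules; return (estimate, SE)."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)       # pooled point estimate
    w = statistics.mean(variances)           # average within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance
    total = w + (1 + 1 / m) * b              # total variance
    return q_bar, math.sqrt(total)


def compare_imputation(n=500, p_missing=0.4, m=50, seed=7):
    """Return standard errors of a mean from three approaches (simulated data)."""
    rng = random.Random(seed)
    full = [rng.gauss(4.0, 1.0) for _ in range(n)]
    # Assumption: values are missing completely at random with probability p_missing
    observed = [v for v in full if rng.random() >= p_missing]
    n_obs, n_mis = len(observed), n - len(observed)
    y_bar, s2 = statistics.mean(observed), statistics.variance(observed)

    # Complete-case analysis: unbiased under this assumption, used as the benchmark
    se_complete_case = math.sqrt(s2 / n_obs)

    # Single (mean) imputation: every gap is filled with the observed mean
    filled = observed + [y_bar] * n_mis
    se_mean_imputation = statistics.stdev(filled) / math.sqrt(n)

    # Multiple imputation: draw sigma^2 and mu from their approximate posterior
    # under a normal model, then draw each missing value from N(mu, sigma^2)
    estimates, variances = [], []
    for _ in range(m):
        chi2 = rng.gammavariate((n_obs - 1) / 2, 2)   # chi-square with n_obs - 1 df
        sigma2 = (n_obs - 1) * s2 / chi2
        mu = rng.gauss(y_bar, math.sqrt(sigma2 / n_obs))
        completed = observed + [rng.gauss(mu, math.sqrt(sigma2)) for _ in range(n_mis)]
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / n)
    _, se_multiple_imputation = rubin_pool(estimates, variances)
    return se_complete_case, se_mean_imputation, se_multiple_imputation


se_cc, se_single, se_mi = compare_imputation()
print(f"SE: complete case {se_cc:.4f}, mean imputation {se_single:.4f}, "
      f"multiple imputation {se_mi:.4f}")
```

Mean imputation reports a standard error well below the complete-case benchmark, so it overstates precision. Multiple imputation recovers a standard error close to the benchmark. In real analyses, established multiple-imputation software would normally be used instead of a hand-written loop like this.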

Case definition and outcomes

Building up a picture of the epidemiology of a disease depends on having a robust case definition that is used uniformly across studies. Case definitions are important as entry criteria for clinical studies (of disease aetiology and outcome, for example) and for clinical trials, as well as for studies of disease occurrence. Developing an acceptable case definition involves a series of steps (Box 1). Several case definitions in current use have been developed by the ACR20 and the EULAR.21 Robust outcome measures and measures of treatment response should be developed in similar steps. Rheumatologists often fail to recognize that recent improvements in disease outcome depend on two things as much as on new therapeutic agents: a core set of outcome measures for each disease,22 and valid, reproducible measures of treatment response.23–25
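The last step in Box 1, testing a candidate case definition against the gold standard, is usually summarized by sensitivity and specificity. The sketch below uses invented counts and a hypothetical class name. It also illustrates one caveat. The positive predictive value seen in a validation sample depends on the case-to-control ratio chosen by the investigators. For that reason, the sketch recalculates it for an assumed population prevalence using Bayes' theorem.

```python
from dataclasses import dataclass


@dataclass
class CaseDefinitionValidation:
    """Counts from testing a candidate case definition against the gold standard."""
    true_pos: int   # gold-standard cases who meet the case definition
    false_neg: int  # gold-standard cases who do not meet it
    false_pos: int  # controls (similar diseases) who meet it
    true_neg: int   # controls who do not meet it

    @property
    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)

    def ppv_at_prevalence(self, prevalence: float) -> float:
        """Positive predictive value at a given prevalence (Bayes' theorem)."""
        true_pos_rate = self.sensitivity * prevalence
        false_pos_rate = (1 - self.specificity) * (1 - prevalence)
        return true_pos_rate / (true_pos_rate + false_pos_rate)


# Hypothetical validation sample: 400 gold-standard cases and 600 controls
v = CaseDefinitionValidation(true_pos=360, false_neg=40, false_pos=60, true_neg=540)
print(f"sensitivity {v.sensitivity:.2f}, specificity {v.specificity:.2f}")
print(f"PPV if prevalence were 1%: {v.ppv_at_prevalence(0.01):.3f}")
```

With 90% sensitivity and specificity, fewer than 1 in 10 people meeting this hypothetical definition would truly have the disease at a prevalence of 1%. This is one reason why case definitions are tested in a wide range of settings.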

Study design

Study designs in epidemiology can be divided into three main types: cross-sectional, cohort and case–control. These are described in the following sections.

Cross-sectional studies
In cross-sectional studies, information about risk factors (for disease onset or outcome) is collected at the same time as data on disease status or outcome. The major disadvantage of these studies is that the direction of any association observed cannot be established. For example, in a cross-sectional study of the association of obesity with osteoarthritis of the knee, researchers could not conclude whether obesity was a risk factor for osteoarthritis or vice versa. One exception is the study of genetic risk factors, because germline DNA remains constant throughout the lifespan. Thus, in genome-wide association studies, blood samples are generally collected at the same time as the phenotype of the individual is determined.26–28

By contrast, cohort and case–control studies incorporate the important dimension of time. Time is particularly important in the study of environmental risk factors, which include lifestyle, occupational and recreational factors and wider environmental exposures such as air pollution and sunlight. It is equally important in studies of the influence of comorbidities and their treatment. Some environmental factors can have an influence many years before disease development, whereas others have an effect close to the onset of symptoms.

Table 1 | Examples of studies that identified risk factors for the development of RA

| Study name | Study design | Risk factor identified | Source of risk factor data | Reference |
|---|---|---|---|---|
| NHS | Nested case–control; new cases of RA vs matched controls from within the NHS | Tobacco smoking | Baseline and two-yearly follow-up questionnaires | Costenbader et al. (2006)35 |
| NHS | Nested case–control; new cases of RA vs matched controls from within the NHS | Traffic pollution | Geographic information system linked to area of residence in the year 2000 | Puett et al. (2009)36 |
| NHS | Nested case–control; new cases of RA vs matched controls from within the NHS | Circulating 25-hydroxyvitamin D | Nested blood sample collection on 25% of participants | Hiraki et al. (2014)37 |
| NOAR-EPIC | Case–cohort; linkage of a population study and an inception cohort of IP/RA | Tobacco smoking, obesity, low alcohol consumption | Baseline recording in EPIC | Lahiri et al. (2012)38 |
| SEIRA | Case–control; new cases of RA vs population-based matched controls | Tobacco smoking | Baseline questionnaire | Stolt et al. (2003)39 |
| SEIRA | Case–control; new cases of RA vs population-based matched controls | Silica exposure | Occupational history (baseline questionnaire) | Stolt et al. (2005)40 |

Abbreviations: IP, inflammatory polyarthritis; NHS, Nurses’ Health Studies; NOAR-EPIC, Norfolk Arthritis Register–European Prospective Investigation of Cancer; RA, rheumatoid arthritis; SEIRA, Swedish Epidemiologic Investigation of RA.

Table 2 | Examples of studies that identified predictors for the outcome of RA and the methodology used

| Study name | Study design | Outcome | Method of identifying outcome | Predictor identified | Reference |
|---|---|---|---|---|---|
| Mayo Clinic studies41 | Inception cohort of RA vs matched cohort without RA | Cardiovascular mortality | Medical record review | Low BMI | Maradit Kremers et al. (2005)42 |
| Norfolk Arthritis Register3 | Inception cohort | Mortality | Record linkage to national death registry | RF | Goodson et al. (2002)43 |
| Norfolk Arthritis Register3 | Inception cohort | Lymphoma | Record linkage to hospital admission data | RF, DMARD therapy | Franklin et al. (2006)44 |
| Norfolk Arthritis Register3 | Inception cohort | X-ray progression | X-rays taken as part of the study | ACPAs, age at onset | Bukhari et al. (2007)45 |
| Medicare10 | Nested retrospective cohort study of patients with RA within the Medicare database | Prescription of DMARDs | Prescription of a DMARD as recorded in the Medicare database | Age, socioeconomic status, geographical location | Schmajuk et al. (2011)46 |
| One Canadian and one US health insurance database | Nested retrospective cohort studies of patients with RA or psoriasis within the health insurance databases | Diabetes mellitus | New diagnosis of diabetes mellitus; use of diabetes-mellitus-related medication reported in the databases | Risk of diabetes mellitus lower in patients treated with a TNF inhibitor or hydroxychloroquine than in those treated with other nonbiologic DMARDs | Solomon et al. (2011)47 |
| Swedish registers7 | Multiple population cohort studies | Incidence of cancer | Record linkage between three RA registers and the cancer register | TNF inhibitor not associated with an increased risk of cancer | Askling et al. (2005)8 |

Abbreviations: ACPAs, anti-citrullinated protein antibodies; RA, rheumatoid arthritis; RF, rheumatoid factor.

Cohort studies
In cohort studies, individuals are often recruited before the onset of disease and then monitored prospectively. The cohort can be divided into individuals with and without the exposure of interest. The incidence of disease in the two groups can then be compared as a relative incidence (risk ratio). Because information about exposure status is collected before the onset of disease, cohort studies are not subject to recall bias. However, individuals might have to be monitored for long periods before the disease of interest develops, so a robust system of case ascertainment and follow-up is necessary.29 Consequently, cohort studies are expensive and often less efficient than case–control studies.

Cohorts can sometimes be identified retrospectively, for example from health administrative databases. These databases can include information on exposures such as tobacco smoking, body weight, socioeconomic status and occupation, as well as the subsequent medical history of millions of individuals. Cases can also be identified by record linkage between different databases.

In prospective studies of disease outcome, patients with a disease (RA, for example) are recruited and monitored until they develop the outcome of interest, such as radiological erosions or cardiovascular disease. Prospective studies can include patients with either prevalent or incident disease; cohorts of patients with incident disease are called inception cohorts. Again, retrospective cohorts of patients with a particular disease can be assembled from administrative databases. A cohort study has two key defining features:
- information about exposures (risk factors or predictors) is collected before the outcome of interest has occurred;
- individuals are divided into groups according to the exposure of interest.
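The risk ratio from a cohort study, described above, can be computed from a simple 2 × 2 table. The sketch below uses invented counts and a hypothetical helper name. It attaches an approximate 95% confidence interval using the usual log-scale (Wald) standard error for cumulative incidence data.

```python
import math


def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Risk ratio with an approximate 95% CI (Wald interval on the log scale)."""
    risk_exp = cases_exposed / n_exposed
    risk_unexp = cases_unexposed / n_unexposed
    rr = risk_exp / risk_unexp
    # Standard error of log(RR) for cumulative incidence (risk) data
    se_log_rr = math.sqrt(1 / cases_exposed - 1 / n_exposed
                          + 1 / cases_unexposed - 1 / n_unexposed)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 30 of 5,000 exposed and 20 of 10,000 unexposed develop the disease
rr, lo, hi = risk_ratio(30, 5000, 20, 10000)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With these invented numbers, the exposed group has three times the risk of the unexposed group, with a confidence interval of roughly 1.7 to 5.3. The width of the interval is driven mainly by the small number of cases, not by the size of the cohort.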

Case–control studies
In case–control studies, patients are selected after they have already developed the disease or outcome of interest.30 Information about exposure status is then collected retrospectively, for example by questionnaire, by electronic searching of the patients’ medical records or by record linkage.31 Selecting an appropriate control group is often challenging.32–34 Controls should be individuals who would have been eligible for inclusion as cases had they developed the disease.

In matched case–control studies, controls are matched to cases for characteristics that are not the immediate subject of study, such as age or sex. Matching makes the design more efficient, but the matching must then be preserved in the analysis.

‘Nested’ case–control studies can also be performed within a cohort study. Baseline information on risk factors already collected for cases is compared with that of controls from within the cohort. These controls can be matched either individually or by frequency for factors such as time under follow-up, age, gender or socioeconomic status. If the cases are compared with a random sample of the whole cohort, the study is known as a case–cohort study. In this approach, differences between cases and controls, in particular in time under follow-up, must be adjusted for in the analysis.

Results from case–control studies are expressed as odds ratios. Examples of studies that have addressed risk factors for the development of RA, and their designs, are shown in Table 1.35–47 Table 2 shows some examples of studies of predictors of outcome in RA.

Risk factors and disease prevention

The increase in the complexity of statistical methodology and in computing power has enabled the study of multiple risk factors and their interactions within the same analysis, including gene–gene and gene–environment interactions. One of the first gene–environment interactions described in rheumatology was the proposed process by which tobacco smoking (or exposure to silica dust) leads to citrullination of proteins in the lung. The shared epitope is a 5-amino-acid sequence motif in the third allelic hypervariable region of the HLA-DRβ chain. In individuals with at least one copy of the shared epitope, smoking can be associated with the development of anti-citrullinated protein antibodies (ACPAs). A second trigger seems to be needed for ACPA-positive individuals to develop RA (Figure 1).48 A similar gene–environment interaction has been described as a risk factor for cardiovascular mortality in patients with RA (Figure 2).49

Figure 1 | Gene–environment interaction in the aetiology of RA. [Schematic labels: SE; peptidylarginine deiminase; protein citrullination; ACPAs; second trigger (e.g. infection, breast-feeding); RA.] Tobacco smoking may lead to citrullination of proteins within the lungs. In individuals with at least one copy of the HLA-DRβ SE, autoantibodies directed against citrullinated proteins can be produced. ACPAs alone are not sufficient to cause RA. A second trigger, such as an infection or breast-feeding, is required, which, in individuals with the SE, can lead to RA.48 Abbreviations: ACPAs, anti-citrullinated protein antibodies; RA, rheumatoid arthritis; SE, shared epitope.

Figure 2 | Gene–environment interaction in the aetiology of increased cardiovascular mortality in RA: results from the Norfolk Arthritis Register. [Schematic labels: ACPAs; 2 SE; CVD mortality (HR 7.8); patients with RA.] Patients with RA who smoke tobacco, have two copies of the HLA-DRβ SE and are ACPA-positive have a 7.8-fold higher risk of dying from cardiovascular disease than patients with RA who are nonsmokers, have no copies of the HLA-DRβ SE and are ACPA-negative. Within the cohort of 751 patients with RA in these studies, only 14 had this high-risk combination.43 Abbreviations: ACPAs, anti-citrullinated protein antibodies; CVD, cardiovascular disease; RA, rheumatoid arthritis; SE, shared epitope.
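The odds ratios reported by case–control studies, and the gene–environment interactions discussed under ‘Risk factors and disease prevention’, can be sketched together. The counts below are invented and do not come from SEIRA or any other cited study.

Each joint exposure category (smoking × shared epitope) is compared with the doubly unexposed group using a Woolf (log-scale) confidence interval. Two summaries of interaction follow:
- the ratio of the joint odds ratio to the product of the single-exposure odds ratios (multiplicative scale);
- the relative excess risk due to interaction (RERI), an additive-scale measure that is not described in the text.

Substituting odds ratios for risk ratios in RERI is an assumption that holds reasonably only when the disease is rare.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """OR for a 2x2 case-control table with a Woolf (log-scale) 95% CI.

    a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


# Hypothetical counts (cases, controls) for each joint exposure category
REFERENCE = ("non-smoker", "SE-negative")
counts = {
    REFERENCE: (50, 200),
    ("smoker", "SE-negative"): (60, 150),
    ("non-smoker", "SE-positive"): (100, 200),
    ("smoker", "SE-positive"): (200, 150),
}
ref_cases, ref_controls = counts[REFERENCE]

ors = {}
for group, (cases, controls) in counts.items():
    if group == REFERENCE:
        continue
    ors[group] = odds_ratio(cases, controls, ref_cases, ref_controls)
    print(group, "OR = {:.2f} (95% CI {:.2f} to {:.2f})".format(*ors[group]))

or10 = ors[("smoker", "SE-negative")][0]
or01 = ors[("non-smoker", "SE-positive")][0]
or11 = ors[("smoker", "SE-positive")][0]
# Multiplicative scale: > 1 if the joint OR exceeds the product of the separate ORs
multiplicative_ratio = or11 / (or10 * or01)
# Additive scale: relative excess risk due to interaction, RERI = OR11 - OR10 - OR01 + 1
reri = or11 - or10 - or01 + 1
print(f"multiplicative interaction ratio = {multiplicative_ratio:.2f}, RERI = {reri:.2f}")
```

In this invented example, the joint odds ratio exceeds what either exposure alone would predict on both scales. As Figure 2 shows, however, the highest-risk combination may contain very few individuals, and the corresponding confidence intervals will then be wide.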

Risk prediction models
Over the past 10 years, an increasing number of genetic and environmental risk factors for the development of RA have been identified. Models that assign an individual’s risk of developing RA have been developed using only genetic factors,50 only environmental factors38 or both combined.51 Those at highest risk could be targeted with preventive strategies to prevent the development of RA.52 These strategies include lifestyle modification (for example, smoking cessation or weight loss) and immunomodulatory therapy. Similar approaches to prevention are being pursued for other rheumatic diseases.53–56
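A risk prediction model of the kind described above is often a logistic regression. Such a model turns a weighted sum of risk factors into an individual probability of developing disease. The sketch below shows only the mechanics: the coefficients are invented and are not taken from any published RA model, and `predicted_risk` is a hypothetical helper.

```python
import math

# Hypothetical coefficients on the log-odds scale; NOT taken from any published RA model
INTERCEPT = -7.0
BETA_SMOKER = 0.7
BETA_PER_SE_COPY = 0.6   # per copy of the shared epitope (0, 1 or 2)
BETA_OBESE = 0.3


def predicted_risk(current_smoker: bool, se_copies: int, obese: bool) -> float:
    """Predicted probability of developing RA from a logistic model."""
    if se_copies not in (0, 1, 2):
        raise ValueError("se_copies must be 0, 1 or 2")
    linear_predictor = (INTERCEPT
                        + BETA_SMOKER * current_smoker
                        + BETA_PER_SE_COPY * se_copies
                        + BETA_OBESE * obese)
    return 1 / (1 + math.exp(-linear_predictor))


people = {
    "never-smoker, no SE, not obese": (False, 0, False),
    "smoker, 1 SE copy, not obese": (True, 1, False),
    "smoker, 2 SE copies, obese": (True, 2, True),
}
# Rank individuals from highest to lowest predicted risk
ranked = sorted(people.items(), key=lambda item: predicted_risk(*item[1]), reverse=True)
for label, profile in ranked:
    print(f"{label}: {predicted_risk(*profile):.4f}")
```

Ranking by predicted risk identifies candidates for preventive strategies. Even the highest-ranked individual here has a low absolute risk, a point taken up again under ‘Interpreting epidemiological studies’.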

Studying treatment response

Clinical trials
Treatment is one of the major determinants of disease outcome, and its role should be formally assessed in clinical trials. A clinical trial is an epidemiological cohort study in which the study design assigns the exposure (the treatment under study, an active comparator or placebo) to the patient. Both pharmacological and nonpharmacological interventions (including surgery, physiotherapy, psychological therapies or education programmes) can be assessed in clinical trials. Comparison groups are often not included in phase I and II trials, which can be analysed as descriptive or hypothesis-generating studies.

In phase III trials, patients are assigned (usually randomly) to receive the study treatment, an active comparator or a placebo. Wherever possible, the patient should be unaware of which arm of the trial they have been assigned to (single-blind trial). Better still, both the patient and the medical staff treating the patient should be unaware (double-blind trial). Importantly, the primary outcome measure must be specified in advance. Phase III trials are, therefore, hypothesis-testing and should be analysed like any other longitudinal cohort study comparing the outcome of interest in two or more groups.57 Conventionally, phase III trials are designed and powered to test whether the treatment of interest is superior to the comparator. Newer designs include noninferiority trials and equivalence trials (Figure 3). Noninferiority trials aim to show that the new treatment is no worse than an active comparator. Equivalence trials aim to show that the new treatment is neither better nor worse than an active comparator.

In equivalence trials, an acceptable equivalence margin (Δ, the minimum important difference between the treatments) should be set before the trial begins. The goal of the trial is to demonstrate that the two therapies are equally good. In other words, the confidence interval for the difference between them must lie entirely between −Δ and +Δ. Trials of biosimilar drugs often have this format.58

A noninferiority trial is designed to show that the new treatment is not unacceptably worse than, or is ‘noninferior’ to, an active control. Statistically, such a study differs from an equivalence trial because the margin is one-sided: only −Δ is tested. Noninferiority is claimed if the lower limit of the CI of the treatment-effect difference lies above −Δ. This means that the risk of the new treatment being inferior to the control is within acceptable boundaries. An example is the comparison of etoricoxib and celecoxib for the treatment of osteoarthritis in two noninferiority trials.59

Notably, noninferiority designs have several limitations,60,61 and conclusions from noninferiority trials are highly sensitive to the method of analysis. An intention-to-treat analysis is usually preferred in a superiority trial because it is more robust. In a noninferiority trial, however, it can be biased towards confirming noninferiority, for two reasons:
- If many patients cross over from one treatment to the other, the groups become ‘blended’, and outcomes are likely to look similar in an intention-to-treat analysis.
- Loss to follow-up also makes the groups more similar when patients who do not complete the trial are assumed not to have met the primary end point.

A noninferiority trial should therefore always report both intention-to-treat and per-protocol (or as-treated) analyses, because each has strengths and limitations. The intention-to-treat analysis preserves the advantages of randomization, whereas the per-protocol analysis reflects the primary hypothesis of noninferiority more closely. Reporting guidelines for these types of trial were published in 2012.62 In all cases, the trial design must be specified in advance.

Figure 3 | Different types of trial design comparing treatment ‘A’ with ‘control’. [Schematic labels: Maximum allowable difference; ‘Control’ is better; ‘A’ is better; Superiority; Equivalence; Noninferiority; −Δ; 0; True effect.] The vertical line at zero represents no difference between the two treatments, whereas the dotted vertical lines represent the maximum allowable difference (Δ) on either side of zero. If the 95% confidence limits of the difference between the two treatments lie entirely to the right of the purple area, treatment A is superior to control. If these limits lie entirely to the left of the purple area, the control is superior. If the 95% confidence interval lies entirely within the purple area, the two treatments are equivalent. If the 95% confidence limits do not extend to the left of the dashed line, treatment A is noninferior to control. A trial must specify in advance which of these outcomes it is testing, and should then be powered accordingly.
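The decision rules for equivalence and noninferiority described above can be applied mechanically once the margin Δ has been prespecified. The sketch below makes two assumptions. The outcome is a response proportion, and the difference (treatment A minus control) has a simple Wald 95% CI. The trial counts and helper names are invented for illustration.

```python
import math


def risk_difference_ci(resp_a, n_a, resp_c, n_c, z=1.96):
    """Difference in response proportions (A minus control) with a Wald 95% CI."""
    p_a, p_c = resp_a / n_a, resp_c / n_c
    diff = p_a - p_c
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_c * (1 - p_c) / n_c)
    return diff, diff - z * se, diff + z * se


def interpret(lower, upper, delta):
    """Apply the rules in the text for a prespecified margin delta > 0.

    Positive differences favour treatment A.
    """
    return {
        "noninferior": lower > -delta,                    # lower limit lies above -delta
        "equivalent": -delta < lower and upper < delta,   # whole CI within -delta to +delta
    }


# Hypothetical trial: 300/500 responders on A, 310/500 on control
diff, lo, hi = risk_difference_ci(300, 500, 310, 500)
for delta in (0.10, 0.05):
    print(f"delta = {delta}: CI {lo:.3f} to {hi:.3f} ->", interpret(lo, hi, delta))
```

The same confidence interval supports both conclusions with a margin of 0.10 but neither with a margin of 0.05, which is why Δ must be fixed before the trial begins. In a noninferiority trial, the calculation would be repeated for both the intention-to-treat and per-protocol populations.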

A common assumption in clinical research is that trial results can be generalized to all patients who fulfil the trial’s inclusion and exclusion criteria (and who would have been willing to take part). In fact, trial results can be generalized only to patients whose baseline characteristics resemble those of the patients actually enrolled. For example, even if both men and women were eligible to enter a trial, its results can be generalized only to women if no men were recruited.

Pharmacoepidemiology
Pharmacoepidemiology is a relatively new discipline that studies the use and effects of drugs in large numbers of people.63 Treatment efficacy (how a drug works in ideal circumstances) can be assessed in clinical trials, which correspond to controlled epidemiological experiments. Effectiveness (how a drug works in routine clinical practice) is best judged in large observational studies. The study of drug safety is an important component of pharmacoepidemiology, because clinical trials are usually too small to detect any but the most common adverse events. In addition, clinical trials often have strict inclusion criteria. These criteria exclude many patients, usually because of comorbidities, who are later prescribed the new medication after licensing. Studies of drug effectiveness and safety are therefore usually conducted in one of two ways: through registries or longitudinal observational studies set up for the purpose, or within existing large health administrative databases.

Personalized or stratified medicine
The identification of predictors of treatment response and of adverse events in patients with rheumatic diseases, particularly for biologic agents, is currently of much interest. Such predictors can be genetic, environmental, lifestyle-related, sociological or psychological (for example, illness beliefs). The goal is to combine such predictors into algorithms that work in two steps. They first identify the patients with the worst prognosis. They then identify, for each patient, the drug most likely to produce a response and least likely to cause adverse effects. This approach is known as stratified medicine.64

Interpreting epidemiological studies

The results of analytical epidemiological studies are usually expressed as either risk ratios or odds ratios. In both cases, these ratios describe the risk associated with a particular determinant as a multiple of the background risk. Ratios […] 50 times more likely to have an MI than women without SLE in the Framingham Offspring Study (RR 52.4; 95% CI 21.6–98.5).67 Yet the background rate of MI in young women is so low that, even with this much higher relative risk, both the absolute rate of MI and the rate attributable to SLE are much higher in older than in younger women with SLE (Table 3). Failure to appreciate these differences might lead to preventive strategies being targeted exclusively at the young.
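The distinction between relative risk and absolute (attributable) risk can be illustrated with a minimal Python sketch. It assumes simple cohort counts over a fixed follow-up period, so risks rather than person-time rates are compared, and the two age strata use invented counts chosen only to mimic the pattern described above; they are not the data in Table 3. The class name `TwoByTwo` is hypothetical, and "attributable risk" is computed here as the risk difference between exposed and unexposed groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cohort counts for one stratum (hypothetical helper)."""
    exposed_events: int
    exposed_total: int
    unexposed_events: int
    unexposed_total: int

    @property
    def risk_exposed(self) -> float:
        return self.exposed_events / self.exposed_total

    @property
    def background_risk(self) -> float:
        # Risk in the unexposed group, i.e. the background risk.
        return self.unexposed_events / self.unexposed_total

    def risk_ratio(self) -> float:
        return self.risk_exposed / self.background_risk

    def odds_ratio(self) -> float:
        # Odds = events / non-events within each group.
        odds_exposed = self.exposed_events / (self.exposed_total - self.exposed_events)
        odds_unexposed = self.unexposed_events / (self.unexposed_total - self.unexposed_events)
        return odds_exposed / odds_unexposed

    def attributable_risk(self) -> float:
        # Risk difference: absolute excess risk among the exposed.
        return self.risk_exposed - self.background_risk


if __name__ == "__main__":
    # INVENTED counts: events per 1,000 exposed and per 10,000 unexposed people.
    strata = {
        "young": TwoByTwo(10, 1_000, 2, 10_000),
        "older": TwoByTwo(60, 1_000, 150, 10_000),
    }
    for name, t in strata.items():
        print(f"{name}: background {t.background_risk:.4f}, "
              f"RR {t.risk_ratio():.1f}, OR {t.odds_ratio():.1f}, "
              f"excess risk {t.attributable_risk():.4f}")
```

In the invented "young" stratum the risk ratio is 50 but the excess risk is under 1 percentage point, whereas in the "older" stratum a risk ratio of 4 corresponds to an excess of 4.5 percentage points. The sketch also shows that the odds ratio approximates the risk ratio closely when the outcome is rare and drifts away from it as the outcome becomes more common.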

Conclusion

The contribution of epidemiology to the growing understanding of disease aetiology, outcome and treatment response in rheumatology is considerable and is increasing. Information about risk factors and predictors can be combined into algorithms and used to guide decisions about disease prevention and treatment. The process of developing prediction algorithms that can assign a probability of a particular outcome to an individual is in its infancy, but will play a key role in stratified medicine in the future.
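As an illustration of how a prediction algorithm can assign a probability of a particular outcome to an individual, the sketch below combines predictors in a logistic model. This is one common modelling choice, not necessarily the one any specific algorithm uses. The predictor names, intercept and coefficients are hypothetical, invented for illustration; a real algorithm would estimate them from cohort data and validate them in independent populations.

```python
import math
from typing import Mapping

# HYPOTHETICAL model: intercept and coefficients (log-odds scale) are invented
# for illustration only and do not come from any published algorithm.
INTERCEPT = -3.0
COEFFICIENTS: Mapping[str, float] = {
    "current_smoker": 0.7,      # 1 if currently smoking, else 0
    "seropositive": 1.1,        # 1 if autoantibody-positive, else 0
    "genetic_risk_score": 0.5,  # per 1-SD increase in a standardized score
}


def predicted_probability(predictors: Mapping[str, float]) -> float:
    """Return p = 1 / (1 + exp(-(b0 + sum(b_i * x_i)))) for one individual.

    Every predictor named in COEFFICIENTS must be supplied; a missing value
    raises KeyError rather than being silently treated as zero.
    """
    linear_predictor = INTERCEPT + sum(
        coef * predictors[name] for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


if __name__ == "__main__":
    # Two hypothetical individuals at opposite ends of the predictor range.
    high_risk = {"current_smoker": 1, "seropositive": 1, "genetic_risk_score": 1.0}
    low_risk = {"current_smoker": 0, "seropositive": 0, "genetic_risk_score": -1.0}
    for label, person in (("high", high_risk), ("low", low_risk)):
        print(f"{label}-risk individual: p = {predicted_probability(person):.3f}")
```

Adding modest predictors on the log-odds scale in this way yields an individual probability that can be used to rank patients; for stratified medicine, one could fit separate models of this kind for response and for adverse events to each candidate drug.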

http://www.cms.gov/Research-Statistics-Data-and-Systems/Research-Statistics-Data-and-Systems.html (2015).
Kaiser Permanente. Division of Research [online], http://www.dor.kaiser.org (2015).
US Department of Veterans Affairs. Office of Research & Development [online], http://www.research.va.gov (2015).
Khoury, M. J. & Ioannidis, J. P. A. Medicine. Big data meets public health. Science 346, 1054–1055 (2014).
Ioannidis, J. P. A., Loy, E. Y., Poulton, R. & Chia, K. S. Researching genetic versus nongenetic determinants of disease: a comparison and proposed unification. Sci. Transl. Med. 1, 7ps8 (2009).
Sarrazin, M. S. & Rosenthal, G. E. Finding pure and simple truths with administrative data. JAMA 307, 1433–1435 (2012).
Little, R. J. A. & Rubin, D. B. Statistical Analysis With Missing Data 2nd edn (Wiley and Sons, 2002).
Sterne, J. A. C. et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 338, b2393 (2009).
Ware, J. H., Harrington, D., Hunter, D. J. & D’Agostino, R. B. Missing data. N. Engl. J. Med. 367, 1353–1354 (2012).


19. von Elm, E. et al. Strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ 335, 806–808 (2007).
20. American College of Rheumatology. Practice Management [online], http://www.rheumatology.org/practice/clinical/classification/classification_criteria_for_rheumatic_diseases (2014).
21. EULAR. The European League Against Rheumatism [online], http://www.eular.org (2015).
22. OMERACT. Outcome Measures in Rheumatology [online], http://www.omeract.org (2015).
23. Prevoo, M. L. et al. Modified disease activity scores that include twenty-eight-joint counts. Development and validation in a prospective longitudinal study of patients with rheumatoid arthritis. Arthritis Rheum. 38, 44–48 (1995).
24. van Gestel, A. M. et al. Development and validation of the European League Against Rheumatism response criteria for rheumatoid arthritis. Comparison with the preliminary American College of Rheumatology and the World Health Organization/International League Against Rheumatism Criteria. Arthritis Rheum. 39, 34–40 (1996).


25. Felson, D. T. et al. American College of Rheumatology. Preliminary definition of improvement in rheumatoid arthritis. Arthritis Rheum. 38, 727–735 (1995).
26. Wellcome Trust Case Control Consortium et al. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature 464, 713–720 (2010).
27. Panoutsopoulou, K. et al. Insights into the genetic architecture of osteoarthritis from stage 1 of the arcOGEN study. Ann. Rheum. Dis. 70, 864–867 (2011).
28. Mirkov, M. U. et al. Genome-wide association analysis of anti-TNF drug response in patients with rheumatoid arthritis. Ann. Rheum. Dis. 72, 1375–1381 (2013).
29. Hunt, J. R. & White, E. Retaining and tracking cohort study members. Epidemiol. Rev. 20, 57–70 (1998).
30. Miettinen, O. S. The “case-control” study: valid selection of subjects. J. Chronic Dis. 38, 543–548 (1985).
31. Correa, A., Stewart, W. F., Yeh, H. C. & Santos-Burgoa, C. Exposure measurement in case-control studies: reported methods and recommendations. Epidemiol. Rev. 16, 18–32 (1994).
32. Wacholder, S., McLaughlin, J. K., Silverman, D. T. & Mandel, J. S. Selection of controls in case-control studies: I. Principles. Am. J. Epidemiol. 135, 1019–1028 (1992).
33. Wacholder, S., Silverman, D. T., McLaughlin, J. K. & Mandel, J. S. Selection of controls in case-control studies: II. Types of controls. Am. J. Epidemiol. 135, 1029–1041 (1992).
34. Wacholder, S., Silverman, D. T., McLaughlin, J. K. & Mandel, J. S. Selection of controls in case-control studies: III. Design options. Am. J. Epidemiol. 135, 1042–1050 (1992).
35. Costenbader, K. H., Feskanich, D., Mandl, L. A. & Karlson, E. W. Smoking intensity, duration, and cessation, and risk of rheumatoid arthritis in women. Am. J. Med. 119, 503–511 (2006).
36. Puett, R. C., Hart, J. E., Laden, F., Costenbader, K. H. & Karlson, E. W. Exposure to traffic pollution and increased risk of rheumatoid arthritis. Environ. Health Perspect. 117, 1065–1069 (2009).
37. Hiraki, L. T. et al. Circulating 25-hydroxyvitamin D level and risk of developing rheumatoid arthritis. Rheumatology (Oxford) 53, 2243–2248 (2014).
38. Lahiri, M., Morgan, C., Symmons, D. P. & Bruce, I. N. Modifiable risk factors for RA: prevention, better than cure? Rheumatology (Oxford) 51, 499–512 (2012).
39. Stolt, P. et al. Quantification of the influence of smoking on rheumatoid arthritis: results from a population-based case-control study. Ann. Rheum. Dis. 62, 835–841 (2003).

40. Stolt, P. et al. Silica exposure is associated with increased risk of developing rheumatoid arthritis: results from the Swedish EIRA study. Ann. Rheum. Dis. 64, 582–586 (2005).
41. Maradit-Kremers, H., Crowson, C. S. & Gabriel, S. E. Rochester Epidemiology Project: a unique resource for research in the rheumatic diseases. Rheum. Dis. Clin. North Am. 30, 819–834 (2004).
42. Maradit-Kremers, H., Nicola, P. J., Crowson, C. S., Ballman, K. V. & Gabriel, S. E. Cardiovascular death in rheumatoid arthritis: a population based study. Arthritis Rheum. 52, 722–732 (2005).
43. Goodson, N. J. et al. Mortality in early inflammatory polyarthritis: cardiovascular mortality is increased in seropositive patients. Arthritis Rheum. 46, 2010–2019 (2002).
44. Franklin, J., Lunt, M., Bunn, D., Symmons, D. & Silman, A. Incidence of lymphoma in a large primary care derived cohort of inflammatory arthritis. Ann. Rheum. Dis. 65, 617–622 (2006).
45. Bukhari, M. et al. The performance of anti-cyclic citrullinated peptide antibodies in predicting the severity of radiologic damage in inflammatory polyarthritis: results from the Norfolk Arthritis Register. Arthritis Rheum. 56, 2929–2935 (2007).
46. Schmajuk, G. et al. Receipt of disease-modifying antirheumatic drugs among patients with rheumatoid arthritis in Medicare managed care plans. JAMA 305, 480–486 (2011).
47. Solomon, D. H. et al. Association between disease-modifying antirheumatic drugs and diabetes risk in patients with rheumatoid arthritis and psoriasis. JAMA 305, 2525–2531 (2011).
48. Klareskog, L. et al. A new model for the aetiology of rheumatoid arthritis: smoking may trigger HLA-DR (shared epitope)-restricted immune reactions to autoantigens modified by citrullination. Arthritis Rheum. 54, 38–46 (2006).
49. Farragher, T. et al. Association of HLA-DR1 gene with premature death, especially from cardiovascular disease, in patients with rheumatoid arthritis. Arthritis Rheum. 58, 359–369 (2008).
50. Yarwood, A. et al. A weighted prediction score using all known susceptibility variants to estimate rheumatoid arthritis risk. Ann. Rheum. Dis. 74, 170–176 (2015).
51. Karlson, E. W. et al. Association of environmental and genetic factors and gene-environment interactions with risk of developing rheumatoid arthritis. Arthritis Care Res. (Hoboken) 65, 1147–1156 (2013).
52. Deane, K. D. Can rheumatoid arthritis be prevented? Best Pract. Res. Clin. Rheumatol. 27, 467–485 (2013).
53. Eder, L. et al. Association between environmental factors and onset of psoriatic arthritis in patients with psoriasis. Arthritis Care Res. (Hoboken) 63, 1091–1097 (2011).


54. Singh, J. A., Reddy, S. G. & Kundukulam, J. Risk factors for gout and prevention: a systematic review of the literature. Curr. Opin. Rheumatol. 23, 192–202 (2011).
55. Simard, J. F. & Costenbader, K. H. What can epidemiology tell us about systemic lupus erythematosus? Int. J. Clin. Pract. 61, 1170–1180 (2007).
56. Felson, D. T. et al. Osteoarthritis: new insights. Part 1: the disease and its risk factors. Ann. Intern. Med. 133, 635–646 (2000).
57. Hackshaw, A. K. A Concise Guide to Clinical Trials (Wiley-Blackwell, 2009).
58. Yoo, D. H. et al. A randomised, double-blind, parallel-group study to demonstrate equivalence in efficacy and safety of CT-P13 compared with innovator infliximab when coadministered with methotrexate in patients with active rheumatoid arthritis: the PLANETRA study. Ann. Rheum. Dis. 72, 1613–1620 (2013).
59. Bingham III, C. O. et al. Efficacy and safety of etoricoxib 30 mg and celecoxib 200 mg in the treatment of osteoarthritis in two identically designed randomised placebo controlled noninferiority studies. Rheumatology (Oxford) 46, 496–507 (2007).
60. Le Henanff, A., Giraudeau, B., Baron, G. & Ravaud, P. Quality of reporting of non-inferiority and equivalence randomised trials. JAMA 295, 1147–1151 (2006).
61. Head, S. J., Kaul, S., Bogers, A. J. & Kappetein, A. P. Non-inferiority study design: lessons to be learned from cardiovascular trials. Eur. Heart J. 33, 1318–1324 (2012).
62. Piaggio, G. et al. Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement. JAMA 308, 2594–2604 (2012).
63. Strom, B. L., Kimmel, S. E. & Hennessy, S. (eds) Pharmacoepidemiology 5th edn (Wiley-Blackwell, 2012).
64. Isaacs, J. D. & Ferraccioli, G. The need for personalised medicine for rheumatoid arthritis. Ann. Rheum. Dis. 70, 4–7 (2011).
65. Galloway, J. B. et al. Anti-TNF therapy is associated with an increased risk of serious infections in patients with rheumatoid arthritis especially in the first 6 months of treatment: updated results from the British Society for Rheumatology Biologics Register with special emphasis on risk in the elderly. Rheumatology (Oxford) 50, 124–131 (2011).
66. Cook, R. J. & Sackett, D. L. The number needed to treat: a clinically useful measure of treatment effect. BMJ 310, 452–454 (1995).
67. Manzi, S. et al. Age-specific incidence rates of myocardial infarction and angina in women with systemic lupus erythematosus: comparison with the Framingham Study. Am. J. Epidemiol. 145, 408–415 (1997).

