Comment

of GWAS. Additionally, the small effect of common variants might mean that they are not identified as significant in GWAS;7 a real association between such common variants and the disease might be absent because of strong linkage disequilibrium with other loci.1 Another limitation of GWAS is the inability to detect variants with low frequency but a relevant effect size (ie, they increase risk of disease), which can contribute substantially to missing heritability.8 α-1 antitrypsin deficiency is the only accepted and established genetic risk factor for COPD;1 after its first identification in six patients 51 years ago, it is now recognised as the major cause of COPD in thousands of patients worldwide. It has an effect size greater than do common variants. Nevertheless, large-scale GWAS of COPD or lung function have not identified the gene encoding α-1 antitrypsin as a major genetic determinant.9 In other words, if α-1 antitrypsin deficiency had not been detected five decades ago because of the patent phenotypic correlate (ie, absence of α-1 globulin band on agar-gel electrophoresis), SERPINA1 (which encodes α-1 antitrypsin) would not have been recognised as the main COPD gene. Therefore, could a COPD gene with a frequency and effect size similar to SERPINA1 affecting COPD pathways remain unidentified? The chance is small, but the possibility cannot be ruled out. A goal of COPD research is to advance understanding of pathways underlying COPD mechanisms to allow development of therapeutic and preventive strategies. Key developments in the genetics of complex traits in the next few years will depend on technological advances that will reduce the cost of

next-generation whole-genome or exome sequencing, allow investigation of disease-specific stem cells,10 and improve access to epigenomes, miRNA, and replication time profiling. With these advances, it is possible that no COPD gene will be left unturned.

Ilaria Ferrarotti, *Maurizio Luisetti
Department of Molecular Medicine, Division of Pneumology, Center for Diagnosis of Inherited Alpha1-antitrypsin Deficiency, San Matteo Hospital Foundation, University of Pavia, 27100 Pavia, Italy
[email protected]

ML has received an unrestricted grant from Grifols to support scientific activities at the Center for Diagnosis of Inherited Alpha1-antitrypsin Deficiency. IF declares that she has no conflicts of interest.

1 Berndt A, Leme AS, Shapiro SD. Emerging genetics of COPD. EMBO Mol Med 2012; 4: 1144–55.
2 Silverman EK, Vestbo J, Agusti A, et al. Opportunities and challenges in the genetics of COPD 2010: an International COPD Genetics Conference report. COPD 2011; 8: 121–35.
3 Obeidat M, Wain LV, Shrine N, et al. A comprehensive evaluation of potential lung function associated genes in the SpiroMeta general population sample. PLoS One 2011; 6: e19382.
4 Cho MH, McDonald M-LN, Zhou X, et al, on behalf of the NETT, ICGN, ECLIPSE, and COPDGene Investigators. Risk loci for chronic obstructive pulmonary disease: a genome-wide association study and meta-analysis. Lancet Respir Med 2014; published online Feb 7. http://dx.doi.org/10.1016/S2213-2600(14)70002-5.
5 Repapi E, Sayers I, Wain LW, et al. Genome-wide association study identifies five loci associated with lung function. Nat Genet 2011; 42: 36–44.
6 Coxson HO, Dirksen A, Edwards LD, et al. The presence and progression of emphysema in COPD as determined by CT scanning and biomarker expression: a prospective analysis from the ECLIPSE study. Lancet Respir Med 2013; 1: 129–36.
7 Zhou JJ, Cho MH, Castaldi PJ, Hersh CP, Silverman EK, Laird N. Heritability of chronic obstructive pulmonary disease and related phenotypes in smokers. Am J Respir Crit Care Med 2013; 188: 941–47.
8 Manolio T, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases. Nature 2009; 461: 747–53.
9 Thun GA, Imboden M, Ferrarotti I, et al. Causal and synthetic associations of variants in the SERPINA gene cluster with Alpha1-antitrypsin serum levels. PLoS Genetics 2013; 8: e1003585.
10 Heard E, Tishkoff S, Todd JA, et al. Ten years of genetics and genomics: what have we achieved and where are we heading? Nat Rev Genet 2010; 11: 723–23.

Phenotyping of COPD: challenges and next steps

See Comment page 174

172 www.thelancet.com/respiratory Vol 2 March 2014

That FEV1 alone does not account for the complexity and heterogeneity of chronic obstructive pulmonary disease (COPD) is now well recognised.1 The Global Initiative for Chronic Obstructive Lung Disease (GOLD) global strategy has recently moved away from the recommendation of stratification of the severity of COPD, and choice of treatment, by FEV1 alone.2 A new approach to risk assessment has been proposed, including the assessment of symptoms, health status, exacerbation rate, and severity.2 This approach is a tentative move towards personalised treatment for patients with COPD, matching therapy more closely to a multidimensional assessment of specific patient attributes, that is, the patient’s phenotype.

There is huge interest in definition of phenotypes in COPD. The definition “a single or combination of disease attributes that describe differences between individuals with COPD as they relate to clinically meaningful outcomes (symptoms, exacerbations, response to therapy, rate of disease progression or death)” was recently proposed.3 With progress in knowledge and developments in physiology, lung imaging, medical biology, and genetics, identification of phenotypes

Cluster analysis
Definition: A method used to group individuals in a dataset that are similar to each other on several variables while separating groups of individuals that differ from each other. It consists of Steps 1 and 2, described below.
Method: Uses statistical measures of distance such as Gower’s distance or Euclidean distance. Clusters are created by finding the optimum grouping of individuals closest to each other, resulting in high within-cluster similarities, and low inter-cluster similarities.
Comments: Being a statistical, data-driven technique, clusters generated are less susceptible to a priori hypotheses than are the results of hypothesis-driven analyses, and are prone to spurious groupings. However, clusters can be created from any dataset, and will match selection biases.

Step 1: selection of variables

Principal component analysis (PCA)
Definition: A dimension-reduction technique that identifies the components of the data that account for maximum variability between individuals.
Method: Each subject is projected on to these components to form linear combinations of the original variables (eigenvectors). Cluster analysis is then performed on these eigenvectors.
Comments: The vectors formed by PCA and subsequently used for cluster analysis do not have any clinical interpretation, which makes the interpretation of clusters based on PCA vectors difficult.

Factor analysis
Definition: A dimension-reduction technique that analyses co-variance between individuals.
Method: Shared co-variances are used to identify “factors” that are tied specifically to constructs of the disease that are understood.
Comments: As the factors identified are real attributes, and not mathematical abstractions, factor analysis is more closely related to biological plausibility than is PCA. However, it is also dependent on selection of variables and the assumptions of linear versus non-linear relationships between variables.

Self-organising maps
Definition: A dimension-reduction technique that uses artificial neural network methodology.
Method: Topographical maps are created using nodes (two-dimensional vectors created from the data). Clusters are then derived from these topographical maps.
Comments: A newer method with strengths and limitations that are similar to those of PCA.

A priori identification of variables
Definition: Consulting published work or experts to choose the variables used to define phenotypes.
Comments: Less data-driven than PCA or factor analysis, and might avoid spurious selection of variables, but might miss previously unidentified characteristics.

Step 2: formation of the clusters

Tree-based analysis
Definition: A hierarchical clustering method.
Method: Subjects are clustered in a way that each cluster is part of a larger cluster (tree structure).
Comments: The number of clusters depends on the level at which the data are partitioned, and as such does not need to be pre-specified.

K-means clustering
Definition: A non-hierarchical clustering method.
Method: Using statistical techniques, the data are split into varying numbers of clusters that are mutually exclusive.
Comments: For k number of subjects, the number of clusters can be as high as (k-1).

Table: Terms used and methods of cluster analysis
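As an illustration only, the two steps summarised in the table (selection of variables, then formation of clusters) can be sketched in a few lines of Python. The sketch assumes three hypothetical, a priori selected variables (FEV1 % predicted, exacerbations per year, and a symptom score) with invented values, standardises them, and applies a simple k-means procedure with Euclidean distance. The function names and the deterministic seeding of centroids are assumptions made for this example, not the method of any study cited here.

```python
import math

# Hypothetical patients: (FEV1 % predicted, exacerbations per year, symptom score).
# Values are invented for illustration only; they are not study data.
PATIENTS = [
    (82.0, 0.0, 6.0),
    (78.0, 1.0, 8.0),
    (85.0, 0.0, 5.0),
    (41.0, 3.0, 24.0),
    (38.0, 4.0, 27.0),
    (45.0, 3.0, 22.0),
]


def standardise(rows):
    """Z-score each variable so that no single variable dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant variable has SD 0; fall back to 1.0 to avoid division by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]


def euclidean(a, b):
    """Euclidean distance between two patients' standardised profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iter=100):
    """Split points into k mutually exclusive clusters (non-hierarchical).

    Seeding is deterministic (an assumption of this sketch): the first point,
    then repeatedly the point farthest from the centroids chosen so far.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assign each patient to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


labels = k_means(standardise(PATIENTS), k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

With real data, the number of clusters, the distance measure, and the seeding would all need to be varied to check how robust the resulting groups are, and any grouping would still reflect the way patients were sampled.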

of COPD that provide prognostic information to alter clinically meaningful outcomes is an urgent medical need. However, major challenges remain in studies on COPD phenotyping: many methodological problems at the stages of selection of patients, completeness of dataset, statistical analysis, and longitudinal follow-up limit our capacity to properly define COPD phenotypes.

Studies usually sample patients from respiratory or COPD clinics in referral centres (ie, a convenience sample).4–7 Such studies could derive phenotypes that reflect the severe end of the disease range, and not necessarily the early disease that exists in the population. A study might include only individuals with very severe disease,6 limiting the external validity of the results. A population-based sampling strategy might overcome this limitation and better mirror the population of asymptomatic and symptomatic COPD patients at large.

Another problem often seen in the studies is selective inclusion of patients with complete datasets used for the statistical derivation of phenotypes.4–6,8 If excluded patients differ on important characteristics from patients included in the analysis, this could lead to biased results. Multiple imputation, a strategy used to substitute missing data with a range of simulated data, provides estimates for outcomes being studied, while statistically accounting for the uncertainty associated with missing data, and could be used in future studies to overcome this limitation.

Finally, derivation of phenotypes from cross-sectional data is the usual strategy in most existing studies,4,6,7,9 and such a strategy, although very useful for understanding of phenotypes at a point in time, does not enable assessment of the temporal stability of these phenotypes. A longitudinal cohort with repeated measures would be needed to explore sources of variability in phenotypes and their evolution. Additionally, the effect of environmental exposures, medications, and interventions can be best examined in prospective studies. Prospective assessment of the usefulness of the established phenotypes is the only way to enable an iterative validation process in which candidate phenotypes are identified before their relevance to clinical outcomes is established.

Another major challenge in studies on COPD phenotyping is the approach to derivation of phenotype. There are two steps to derivation of a phenotype: choice of the variables; and clustering of individuals with similar profiles (table). Researchers have used various statistical methods for both steps. Selection of variables is crucial, as true clinical clusters might be overlooked in favour of random clusters formed because of the inclusion of too many variables. Principal component analysis is often used to simplify the data, but the clinical meaningfulness of the components is difficult to interpret.10 Factor analysis is thought to be a robust method. Consulting experts or published work for the a priori selection of relevant variables might be preferable to inclusion of huge amounts of data to derive phenotypes. This approach avoids selection of variables that might not be truly important to the phenotype, at the risk of missing previously unidentified patient characteristics.

Cluster analysis is a method used to group individuals in a dataset that are similar to each other in several variables, while separating groups of individuals that differ from each other. Several different clustering methods have been used, such as k-means clustering,6,8,9 tree-based supervised cluster analysis,11 and newer methods that need further validation, such as self-organising maps.7 Because these are statistical, data-driven techniques, derived clusters are less susceptible to a priori hypotheses, but are prone to spurious groupings. Moreover, clusters can be created out of any dataset, and will match the subset of patients sampled.

An important step after derivation of phenotypes is to test their internal and external validity. Few studies investigate the robustness of phenotypes to the statistical methods used in clustering, and the variables used to define the clusters. Poor validation of the phenotypes in an external population is also a major problem in most studies, limiting the validity of these derived phenotypes.

In summary, many methodological limitations are observed in COPD phenotyping studies. Derivation of phenotypes would be best done with prospective data, enabling the assessment of temporal variability and stability of the features. An ideal study design would be longitudinal with repeated measurements to assess multiscale abnormalities: clinical expression (person scale), physiological and lung imaging (organ scale), airway or systemic inflammation (cell-tissue scale), and genetic variants (cell-genome scale). However, we will probably obtain the full spectrum of COPD phenotypes through multiple cohorts from different settings.

*Jean Bourbeau, Lancelot M Pinto, Andrea Benedetti
Respiratory Epidemiology and Clinical Research Unit, Montréal Chest Institute, 3650 St Urbain, Room K1.32, Montréal, QC, Canada H2X 2P4 (JB, LMP, AB), Department of Medicine (JB, AB), and Department of Epidemiology, Biostatistics and Occupational Health (AB), McGill University Health Centre, McGill University, Montréal, QC, Canada H2X 2PA
[email protected]

We declare that we have no competing interests.

1 Agusti A, Calverley PM, Celli B, et al. Characterisation of COPD heterogeneity in the ECLIPSE cohort. Respir Res 2010; 11: 122.
2 Vestbo J, Hurd SS, Agusti AG, et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD executive summary. Am J Respir Crit Care Med 2013; 187: 347–65.
3 Han MK, Agusti A, Calverley PM, et al. Chronic obstructive pulmonary disease phenotypes: the future of COPD. Am J Respir Crit Care Med 2010; 182: 598–604.
4 Burgel PR, Paillasseur JL, Caillaud D, et al. Clinical COPD phenotypes: a novel approach using principal component and cluster analyses. Eur Respir J 2010; 36: 531–39.
5 Burgel PR, Roche N, Paillasseur JL, et al. Clinical COPD phenotypes identified by cluster analysis: validation with mortality. Eur Respir J 2012; 40: 495–96.
6 Cho MH, Washko GR, Hoffmann TJ, et al. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation. Respir Res 2010; 11: 30.
7 Vanfleteren LE, Spruit MA, Groenen M, et al. Clusters of comorbidities based on validated objective measurements and systemic inflammation in patients with chronic obstructive pulmonary disease. Am J Respir Crit Care Med 2013; 187: 728–35.
8 Garcia-Aymerich J, Gomez FP, Benet M, et al. Identification and prospective validation of clinically relevant chronic obstructive pulmonary disease (COPD) subtypes. Thorax 2011; 66: 430–37.
9 Spinaci S, Bugiani M, Arossa W, Bucca C, Rolla G. A multivariate analysis of the risk in chronic obstructive lung disease (COLD). J Chronic Dis 1985; 38: 449–53.
10 Ringner M. What is principal component analysis? Nat Biotech 2008; 26: 303–04.
11 Disantostefano RL, Li H, Rubin DB, Stempel DA. Which patients with chronic obstructive pulmonary disease benefit from the addition of an inhaled corticosteroid to their bronchodilator? A cluster analysis. BMJ Open 2013; 3: e001838.

Coming off the GOLD standard

See Comment page 172

174

The one second forced expiratory volume (FEV1), as Sir Winston Churchill once remarked of the unfortunately named Mr Bossom, is neither one thing nor the other. FEV1 is a measure of air flow but, because it is highly correlated with the forced vital capacity (FVC), is a poor indicator of airflow obstruction. Identification of two basic spirometric measures of ventilatory function is therefore useful: the vital capacity (VC, more usually the FVC), which gives a rough indication of lung size and assesses restriction, and the FEV1/FVC ratio, which assesses airflow obstruction adjusted for lung size. A low FEV1 by itself indicates either obstruction or restriction. It is puzzling that every method of assessment of severity of COPD1–3 relies on the FEV1 when the measurement of choice for airflow obstruction is generally agreed to be the FEV1/FVC ratio. The most likely
