Papers in Press. Published October 27, 2014 as doi:10.1373/clinchem.2014.226571 The latest version is at http://hwmaint.clinchem.org/cgi/doi/10.1373/clinchem.2014.226571 Clinical Chemistry 61:1 000 – 000 (2015)

Cancer Diagnostics

Chromosomal Instability in Cell-Free DNA Is a Serum Biomarker for Prostate Cancer

Ekkehard Schütz,1 Mohammad R. Akbari,2,5 Julia Beck,1 Howard Urnovitz,1 William W. Zhang,3 Kirsten Bornemann-Kolatzki,1 William M. Mitchell,4* Robert K. Nam,3 and Steven A. Narod2,5

BACKGROUND: Genomic instability resulting in copy number variation is a hallmark of malignant transformation and may be identified through massive parallel sequencing. Tumor-specific cell-free DNA (cfDNA) present in serum and plasma provides a real-time, easily accessible surrogate.

METHODS: DNA was extracted from serum of 204 patients with prostate cancer (Gleason score 2–10), 207 male controls, and patients with benign hyperplasia (n = 10) and prostatitis (n = 10). DNA was amplified by use of random primers, tagged with molecular identifiers, sequenced on a SOLID system, and aligned to the human genome. We evaluated the number of sequence reads of cfDNA in sliding 100-kbp intervals for variation from controls. We tested chromosomal regions with significant variations in alignment hits for their ability to separate patients from matched controls.

RESULTS: Using ROC curves to assess diagnostic performance, we found that the number of regions with variations in alignment hits alone, evaluated in a first subset (n = 177), provided an area under the curve (AUC) of 0.81 (95% CI 0.7–0.9, P < 0.001). Using 5 rounds of 10-fold cross-validation with the full dataset, we established a final model that discriminated prostate cancer from controls with an AUC of 0.92 (0.87–0.95), reaching a diagnostic accuracy of 83%. Both benign prostatic hypertrophy and prostatitis could be distinguished from prostate cancer by use of cfDNA, with an accuracy of 90%.

CONCLUSIONS: Assessment of a limited number of chromosomal structural instabilities by use of massive parallel sequencing of cfDNA was sufficient to distinguish between prostate cancer and controls. This large cohort demonstrates, for prostate cancer, the utility of cfDNA that was recently established in other malignant neoplasms.

© 2014 American Association for Clinical Chemistry

1 Chronix Biomedical, Göttingen, Germany; 2 Women’s College Research Institute, Women’s College Hospital, University of Toronto, Toronto, Canada; 3 Division of Urology, Sunnybrook Research Institute, University of Toronto, Toronto, Canada; 4 Department of Pathology, Microbiology, and Immunology, Vanderbilt University, Nashville, TN; 5 Dalla Lana School of Public Health, University of Toronto, Toronto, Canada.
* Address correspondence to this author at: Department of Pathology, Microbiology, and Immunology, Vanderbilt University, Nashville, TN 37235. Fax 615-343-7023; e-mail [email protected].

Received May 1, 2014; accepted September 19, 2014. Previously published online at DOI: 10.1373/clinchem.2014.226571
6 Nonstandard abbreviations: PSA, prostate specific antigen; cfDNA, cell-free DNA; NGS, next-generation sequencing; WGA, whole-genome amplification; CNV, copy number variation; CNI, copy number instability; CpG, C-phosphate-G; MSA, mass sequence and assembly.
Abstract presentation: ASCO 2013; Quantifying copy number variations in cell-free DNA for potential clinical utility from a large prostate cancer cohort. J Clin Oncol 2013;31(15S):5072.

The prostate is the most common site of cancer in men, with 240 000 new cases diagnosed annually in the US and approximately 28 000 yearly deaths. Prostate screening typically relies on digital rectal exam and prostate specific antigen (PSA).^6 Limitations of the PSA test include its relatively low diagnostic sensitivity and specificity and its inability to distinguish between low-grade and high-grade lesions. Recent screening trials suggest that PSA-based screening programs result in small or no reduction in mortality, with significant treatment-related adverse events (1). Better serum/plasma biomarkers are needed to supplement the inexpensive PSA test in the diagnosis and management of a disease with a multiplicity of presentations and clinical outcomes.

The hypothesis that DNA from cell-free plasma or serum can be used for the preclinical detection of human malignancies has been studied for >15 years (2). These studies are all based on the ability to distinguish cancer-specific DNA markers from those of nonmalignant tissues. Our interest in prostate cancer is based on the recent discoveries that almost all cancer types exhibit chromosomal structural instability, through either progressive evolution or catastrophic genomic events (3). The constant background of cell-free DNA (cfDNA) released by routine apoptosis can be distinguished from that released by neoplastic apoptosis (4). In cancer patients, cfDNA is released from apoptotic cells as nucleosomes from both healthy and diseased tissue, which includes tumor cells, and is accompanied by microbial nucleic acids from systemic infections (5). The clinical utility of cfDNA as a biomarker is contingent on the ability to distinguish between those origins.

The statistical power provided by massive parallel sequencing in next-generation sequencing (NGS) platforms is ideally suited for the detection in cfDNA of differences in chromosomal instability as regional DNA ploidy heterogeneity (6). Embryonic cfDNA can be detected in maternal serum even if it represents <0.5% of the total DNA present (7). The number of unique sequence reads is the determining factor in identifying regional DNA ploidy heterogeneity in trace amounts of pathologic cfDNA. By calculating a genome-wide z-score, Heitzer et al. (8) could identify tumor-associated aneuploidy by low-coverage cfDNA sequencing and were able to discriminate plasma samples from men with and without prostate cancer at a detection limit of 1% tumor-derived cfDNA.

The recent availability of NGS platforms with massive capacity has provided the ability to use cfDNA in blood as the basis for developing cancer biomarkers, either for detection or for monitoring of therapeutic efficacy (9–21). Quantitative detection of cfDNA as a “liquid biopsy” has been demonstrated to accurately reflect the evolving genomic instabilities observed in cancer (9–12, 15–21). Canine mammary carcinomas are a heterogeneous collection of histopathological types. Despite this heterogeneity, paired-end massively parallel sequencing by Beck et al. (20) showed that copy-number imbalances of the tumors were reflected by cfDNA. Moreover, minimal residual disease was detected after surgery as a solitary metastasis a year later. Similar genomic rearrangements and mutations in a variety of human cancers can be detected in cfDNA at a high rate without false positives (15–19). Three recently published analyses of cfDNA sequences focused on the evaluation of the genomic origin of coding and noncoding regions that included repetitive elements (5, 13, 14).
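The z-score idea mentioned above can be illustrated with a minimal sketch. It is not the pipeline of Heitzer et al. or of this study: the function names, the normalization to read fractions, and the threshold of 3 are all assumptions chosen for illustration. Each genomic bin of a test sample is compared with the mean and SD of the same bin in a control panel.

```python
from statistics import mean, stdev

def to_fractions(counts):
    """Convert raw per-bin read counts to fractions of the sample's total reads."""
    total = sum(counts)
    return [c / total for c in counts]

def bin_z_scores(sample_counts, control_counts):
    """Per-bin z-scores of one sample against a panel of controls.

    sample_counts:  list of read counts, one per genomic bin.
    control_counts: list of such lists, one per control (at least 2 controls,
                    and every bin must vary among controls so that SD > 0).
    """
    sample = to_fractions(sample_counts)
    controls = [to_fractions(row) for row in control_counts]
    scores = []
    for i, value in enumerate(sample):
        reference = [row[i] for row in controls]
        scores.append((value - mean(reference)) / stdev(reference))
    return scores

def count_deviating_bins(scores, threshold=3.0):
    """Number of bins with |z| above the threshold (the threshold is an assumption)."""
    return sum(1 for z in scores if abs(z) > threshold)

# Toy data: 5 controls, 4 bins; the test sample has extra reads in bin 0.
controls = [
    [100, 100, 100, 100],
    [110, 90, 100, 100],
    [90, 110, 100, 100],
    [100, 100, 110, 90],
    [100, 100, 90, 110],
]
sample = [200, 100, 100, 100]
z = bin_z_scores(sample, controls)
print(count_deviating_bins(z))
```

Normalizing to read fractions removes differences in sequencing depth between samples, which is why the toy sample's unchanged bins receive negative z-scores once bin 0 gains reads.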
In this study, we explored the utility of assessing cfDNA fragments from the entire prostate cancer genome in serum as a genetic signature of apoptotic prostatic cancer cells and as a potential clinically useful biomarker.

Materials and Methods

We studied 204 biopsy-proven cases of prostate cancer and 207 age-matched healthy controls who were asymptomatic and/or had recent negative prostate biopsies. Samples were obtained from 3 sources under informed consent and institutional review board approval. An additional 10 patients with benign prostate hyperplasia and 10 patients with prostatitis were included in the study. Whole blood was collected, and serum was recovered by centrifugation immediately after clotting and stored frozen (−20 °C) in aliquots. We analyzed 89 sera from


prostate carcinoma with a Gleason score <7 and 84 with a Gleason score ≥7. Gleason score was not recorded for the remaining 31 samples. Of the sera from patients with prostate carcinoma, 76 were from patients diagnosed at age ≤65 years.

Immediately before whole-genome amplification (WGA), serum samples (≥200 µL) were thawed and centrifuged at 4000g for 20 min to pellet any cellular debris. Total nucleic acids were extracted from 200 µL supernatant with the High Pure Viral Nucleic Acid Kit (Roche Applied Science) according to the manufacturer’s instructions, but without the use of carrier RNA. All samples provided sufficient cfDNA for NGS. Each sample was subjected to commercial WGA (GenomePlex Whole Genome Amplification Kit, WGA, Sigma-Aldrich) in independent duplicates, conducted in a LightCycler480 with the addition of 1× EvaGreen. To avoid bias caused by overamplification during WGA, the amplifications were monitored in real time and were stopped at the first cycle trending outside the linear amplification range. Massive parallel sequencing used the SOLID-P2 NGS platform.

We used ROC analyses to evaluate the performance of the final model in the entire sample and in subgroups defined by Gleason score and age of diagnosis. Specific details of sample preparation for sequencing, sequence alignment, data normalization, analysis of copy number variations (CNVs), and hotspot identification including ROC analysis are provided in the Supplemental Data, which accompanies the online version of this article at http://www.clinchem.org/content/vol61/issue1.

All samples contained cfDNA sufficient for low cycle number WGA [mean (SD) 16 (4) cycles per sample]. The mean DNA yield was 3 (1) µg. Online Supplemental Table 1 provides the mean number of sequences and sequence size per sample. Sequence data have been deposited with the European Nucleotide Archive.

Results

SEQUENCING RESULTS

A total of 1.5 × 10^9 sequence reads were generated for the 204 patients with prostate cancer and 1.4 × 10^9 sequence reads were generated for the 207 healthy controls, for a total combined sequence length of 1.1 × 10^11 nucleotides. Nucleotides of uniquely human origin were defined by the mapping software. Online Supplemental Table 1 provides a summary of the mean fragment lengths and the nucleotide counts in each group and the proportions of sequences/nucleotides that uniquely map to the human genomic database (HG18).

PREANALYSIS OF COPY NUMBER INSTABILITY IN cfDNA

We performed the initial analysis on a subset of 177 samples to test globally whether CNV detection could identify regions of chromosomal structural instability in cfDNA that distinguish patients with prostate cancer from controls. The copy number instability (CNI) score, which can be interpreted as a general measure of genomic instability, is directly related (within the technology limits) to the regional chromosomal DNA ploidy heterogeneity. It ranged from 0 to 2008 in the 177 samples; the median was 9 (95% CI 5–12) in controls and 44 (30–67) in prostatic carcinoma. The ROC curve on the basis of this CNI score provided an area under the curve (AUC) of 0.81 (95% CI 0.69–0.90) (see online Supplemental Fig. 2). These preliminary data demonstrated that a higher occurrence of copy number imbalances could be detected in cfDNA of patients with malignant prostate tumors than in controls. Comparative Circos plots from 5 patient and 5 control samples, showing the regions that deviated significantly from euploidy, are illustrated in Fig. 1. Evidence of chromosomal structural instability detected as DNA ploidy heterogeneity was widespread in the prostate cancer cohort.

Fig. 1. Comparative Circos plots of control and prostate cancer cfDNA quantified by massive parallel sequencing. The chromosome map is located on the external periphery with the kinetochore in red. The relative chromosomal deviations of individual cfDNA samples from combined control average typical cfDNA frequency (green represents gain; red represents loss) are illustrated as inner wheels. (A), Example of 5 healthy controls showing minimal numbers of copy number imbalances. (B), Example of 5 prostate cancer patients showing significant numbers of regional copy number imbalances (DNA ploidy heterogeneity).

CROSS-VALIDATION OF cfDNA GENOMIC ALIGNMENT HIT VARIANCE AS A PREDICTOR OF PROSTATE CANCER

The selected consolidated genomic clusters were subjected to 5 independent rounds of 10-fold cross-validation each. The number of regional clusters in the final model was restricted to 20. The mean AUC for the ROC calculated from each of the 50 validation sets was 0.85 (0.06) (P < 10^−8).

CHROMOSOMAL REGIONAL CLUSTERS OF DNA PLOIDY HETEROGENEITY

Table 1 illustrates the chromosomal distribution of relative locations and cluster compositions (bin components including closely adjacent genes) with their respective increases or decreases in observed apoptotic DNA serum frequencies, on the basis of the cross-validation results. The 20 clusters selected most often across the cross-validation rounds defined the final model that was used for all subsequent analyses of the regional CNI index. The frequencies of the consolidated regions used for cross-validation and the regions that were in the final model are shown in online Supplemental Fig. 3.

ROC ANALYSIS

Table 2 summarizes the ROC analysis by use of the CNI index from cfDNA for the study and its component parts. The AUC of the ROC curve for the entire study was 0.92 (95% CI 0.87–0.95) (Fig. 2A). A nonsignificant improvement in AUC (Fig. 2B) was obtained in patients with a low- or moderate-grade (Gleason score <7) neoplasm vs patients with a high-grade (Gleason score ≥7) neoplasm. Age had little effect on cfDNA distribution. The inclusion of patients with benign

Table 1. Regions of genomic instability basis for cfDNA prostate cancer biomarkers.

| Chromosome | Region, kbp^a | Bin size, kbp | Abundance vs controls^b | Global/local^c | Weight factor^d | Bin components^e,f |
|---|---|---|---|---|---|---|
| HS1 | 39 200–39 400 | 200 | Loss | Global | 1.0 | MACF1,^g HSPE1P8*, 11 transcribed loci; [RRAGC, MYCBP, GJA9, RHBDL2] |
| HS1 | 149 250–149 450 | 200 | Loss | Global | 1.0 | SEC22B (lie), PDE4DIP, NOTCH2, NBPF1, 2 uncharacterized gene loci, 21 transcribed loci |
| HS1 | 212 200–212 600 | 400 | Loss | Global | 1.0 | PPP2R5A, SNORA16B, TMEM206, NENF, ATF3, HNRNPH1, 1 uncharacterized gene locus, 17 transcribed loci; [INTS7, DTL, MIR3122] |
| HS3 | 110 000–110 600 | 600 | Gain | Global | 1.0 | 6 transcribed loci |
| HS4 | 186 600–187 500 | 900 | Loss | Local | 1.0 | FAT1, MRPS36P2*, 3 uncharacterized gene loci, 21 transcribed loci; [TLR3, FAM149A, ORAOV1P1*, CYP4V2, CKAP2, F11, SLC25A5P6*, MTNR1A] |
| HS6 | 46 750–47 150 | 400 | Gain | Global | 1.0 | ANKRD66, MEP1A, GPR116, GPR110, 1 uncharacterized gene locus, 9 transcribed loci |
| HS8 | 43 150–43 450 | 300 | Gain | Local | 1.0 | HGSNAT, RNY5P6*, VN1R45/46P*, AFG3L2*, POTEA, 3 poorly characterized loci, 3 transcribed loci |
| HS8 | 43 050–43 550 | 500 | Gain | Global | 0.5 | FNTA, POMK, HGSNAT, RNY5P6, VN1R45/46P, AFG3L2P1, POTEA, SNX18P27*, 10 transcribed loci |
| HS8 | 120 750–121 150 | 400 | Gain | Local | 0.5 | SNTB1, 1 uncharacterized gene locus, 7 transcribed loci; [TAF2, DSCC1, COL14A1] |
| HS9 | 88 650–88 850 | 200 | Gain | Global | 1.5 | MIR4289, 1 uncharacterized gene locus, 3 transcribed loci; [GOLM1] |
| HS10 | 17 900–18 300 | 400 | Gain | Local | 1.5 | MRC1, SLC39A12, CACNB2, 1 uncharacterized gene locus, 8 transcribed loci |
| HS10 | 27 600–28 000 | 400 | Gain | Local | 1.0 | MKX, ARMC4, RPL36AP55*, 1 uncharacterized gene locus, 4 transcribed loci; [PTCHD3, RAB18] |
| HS12 | 46 750–47 050 | 300 | Gain | Local | 1.0 | SLC38A2, 1 uncharacterized gene locus, 2 transcribed loci |
| HS12 | 109 500–109 900 | 400 | Gain | Global | 0.5 | UBE3B, MMAB, MVK, FAM222A, TRPV4, GLTP, TCHP, 1 uncharacterized gene locus, 25 transcribed loci; [USP30, ALKBH2, UNG, ACACB, FOXN4, MYO1H, KCTD10] |
| HS13 | 21 200–22 100 | 900 | Loss | Local | 1.5 | ESRRAP2*, MIPEPP3*, GRK6P1*, GAPDHP52*, RNA5SP25*, ZDHHC20, HIST1H2BPS3*, MICU2, FNTAP2*, RNU6–59P*, RPS7P10*, FGF9, LINC00424, NME1P1*, 3 uncharacterized gene loci (+/−), 32 transcribed loci; [IFT88, IL17D, N6AMT2, XPO4, PPIAP27*, LATS2, SAP18, SKA3, MRPL57] |
| HS15 | 19 450–19 750 | 300 | Gain | Global | 1.0 | No genes or transcribed loci, high density of repetitive elements |
| HS15 | 52 900–53 200 | 300 | Gain | Local | 1.5 | EEF1A1, 1 transcribed locus; [FAM214A, ONECUT1, RPSAP55*] |
| HS15 | 61 500–62 000 | 500 | Gain | Global | 0.5 | VPS13C, 3 uncharacterized gene loci, 12 transcribed loci (RORA) |

Continued below.

prostatic hypertrophy or prostatitis had no effect on the ROC analysis of the entire cohort or as a function of microscopic prostate cancer grade. In a 10-fold cross-validation in which 10 random groups of prostate cancer samples were compared with benign samples, the AUC was 0.90 (0.055), similar to that for cancer vs control sera. Fig. 3 illustrates the CNI index for all prostate cancer patients and as a function of Gleason score and benign prostatic disease. The benign prostatic disease group differed significantly from the cancer cohort (P < 0.00001). The genes that were present in the identified regions are summarized in online Supplemental Table 2.

CORRELATION WITH CONCURRENT PSA CONCENTRATIONS

PSA concentrations were available in a subset of samples (126 cases and 108 controls). The majority of patients with prostate cancer had an increased PSA concentration. Five patients with prostatic carcinoma had a PSA value ≤4 µg/L with clear evidence of chromosomal instability, as measured by cfDNA. Overall there was little correlation between serum PSA concentration and the cfDNA biomarker (r = 0.3).

Discussion

Cancer is characterized by genomic instability (aneuploidy) and is clonal in origin (22). The Mitelman–National Cancer Institute database currently catalogs >63 000 human cancers with individual clonal karyotypes distinct from their cells of origin, including >2000 gene fusions (23). This massive collection of individual clonal karyotypes is derived from individual progressive mutational events provided by single-nucleotide polymorphisms and chromosomal CNVs in a cellular form of Darwinian speciation (24, 25). Tumor structural chromosomal variations observed as regional DNA ploidy heterogeneity are recapitulated in the cfDNA as a function of apoptosis and the cellular release of nucleosomes that can be distinguished from typical cellular apoptotic DNA.

Table 1. Regions of genomic instability basis for cfDNA prostate cancer biomarkers. (Continued)

| Chromosome | Region, kbp^a | Bin size, kbp | Abundance vs controls^b | Global/local^c | Weight factor^d | Bin components^e,f |
|---|---|---|---|---|---|---|
| HS16 | 77 050–77 250 | 200 | Gain | Local | 1.0 | MON1B, SYCE1L, VN2R10P*, 11 transcribed loci |
| HS20 | 42 250–43 250 | 1000 | Gain | Local | 1.5 | PTPRT, PPIAP21*, 2 uncharacterized gene loci, 10 transcribed loci; [IFT52, MYBL2, GTSF1L, TOX2, JPH2, LOC100505783, GDAP1L1, FITM2, R3HDML, HNF4A, MIR3646, TTPAL, SERINC3, PKIG, ADA] |

a Chromosomal location can be found at http://www.ncbi.nlm.nih.gov/genome/?term=homo+sapiens (accessed April 2014).
b Gain indicates a significant increase in sequence reads in prostate cancer of the indicated bin vs controls; loss indicates a significant decrease in sequence reads of the indicated bin.
c Normalization on either the whole genome (global) or normalized per chromosome (local).
d Factor used for calculation of the numeric biomarker.
e Genes [brackets] are located immediately adjacent to the defined bins. Pseudogenes are identified by an asterisk (*).
f Transcribed loci include both observed cDNA and sequences compatible with transcription.
g MACF1, microtubule-actin crosslinking factor 1; HSPE1P8, heat shock 10-kDa protein 1 pseudogene 8; RRAGC, Ras-related GTP binding C; MYCBP, MYC binding protein; GJA9, gap junction protein, alpha 9, 59 kDa; RHBDL2, rhomboid, veinlet-like 2 (Drosophila); SEC22B, SEC22 vesicle trafficking protein homolog B (S. cerevisiae) (gene/pseudogene); PDE4DIP, phosphodiesterase 4D interacting protein; NOTCH2, notch 2; NBPF1, neuroblastoma breakpoint family, member 1; PPP2R5A, protein phosphatase 2, regulatory subunit B′, alpha; SNORA16B, small nucleolar RNA, H/ACA box 16B; TMEM206, transmembrane protein 206; NENF, neudesin neurotrophic factor; ATF3, activating transcription factor 3; HNRNPH1, heterogeneous nuclear ribonucleoprotein H1 (H); INTS7, integrator complex subunit 7; DTL, denticleless E3 ubiquitin protein ligase homolog (Drosophila); MIR3122, microRNA 3122; FAT1, FAT atypical cadherin 1; MRPS36P2, mitochondrial ribosomal protein S36 pseudogene 2; TLR3, toll-like receptor 3; ORAOV1P1, oral cancer overexpressed 1 pseudogene 1; CYP4V2, cytochrome P450, family 4, subfamily V, polypeptide 2; CKAP2, cytoskeleton associated protein 2 (a.k.a. LB1); F11, coagulation factor XI; SLC25A5P6, solute carrier family 25 (mitochondrial carrier; adenine nucleotide translocator), member 5 pseudogene 6; MTNR1A, melatonin receptor 1A; ANKRD66, ankyrin repeat domain 66; MEP1A, meprin A, alpha (PABA peptide hydrolase); GPR110/116, G protein-coupled receptors 110 and 116; HGSNAT, heparan-alpha-glucosaminide N-acetyltransferase; RNY5P6, RNA, Ro-associated Y5 pseudogene 6; VN1R45P/46P, vomeronasal 1 receptor 45/46 pseudogenes; AFG3L2P1, AFG3-like AAA ATPase 2 pseudogene 1; POTEA, POTE ankyrin domain family, member A; FNTA, farnesyltransferase, CAAX box, alpha; POMK, protein-O-mannose kinase; SNX18P27, sorting nexin 18 pseudogene 27; SNTB1, syntrophin, beta 1 (dystrophin-associated protein A1, 59 kDa, basic component 1); TAF2, TAF2 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 150 kDa; DSCC1, DNA replication and sister chromatid cohesion 1; COL14A1, collagen, type XIV, alpha 1; MIR4289, microRNA 4289; GOLM1, golgi membrane protein 1; MRC1, mannose receptor, C type 1; SLC39A12, solute carrier family 39 (zinc transporter), member 12; CACNB2, calcium channel, voltage-dependent, beta 2 subunit; MKX, mohawk homeobox; ARMC4, armadillo repeat containing 4; RPL36AP55, ribosomal protein L36a pseudogene 55; PTCHD3, patched domain containing 3; RAB18, RAB18, member RAS oncogene family; SLC38A2, solute carrier family 38, member 2; UBE3B, ubiquitin protein ligase E3B; MMAB, methylmalonic aciduria (cobalamin deficiency) cblB type; MVK, mevalonate kinase; FAM222A, family with sequence similarity 222, member A; TRPV4, transient receptor potential cation channel, subfamily V, member 4; GLTP, glycolipid transfer protein; TCHP, trichoplein, keratin filament binding; USP30, ubiquitin specific peptidase 30; ALKBH2, alkB, alkylation repair homolog 2 (E. coli); UNG, uracil-DNA glycosylase; ACACB, acetyl-CoA carboxylase beta; FOXN4, forkhead box N4; MYO1H, myosin IH; KCTD10, potassium channel tetramerization domain containing 10; ESRRAP2, estrogen-related receptor alpha pseudogene 2; MIPEPP3, mitochondrial intermediate peptidase pseudogene 3; GRK6P1, G protein-coupled receptor kinase 6 pseudogene 1; GAPDHP52, glyceraldehyde 3 phosphate dehydrogenase pseudogene 52; RNA5SP25, RNA, 5S ribosomal pseudogene 25; ZDHHC20, zinc finger, DHHC-type containing 20; HIST1H2BPS3, HIST1H2B pseudogene 3; MICU2, mitochondrial calcium uptake 2; FNTAP2, farnesyltransferase, CAAX box, alpha pseudogene 2; RNU6–59P, RNA, U6 small nuclear 59, pseudogene; RPS7P10, ribosomal protein S7 pseudogene 10; FGF9, fibroblast growth factor 9; LINC00424, long intergenic non-protein coding RNA 424; NME1P1, NME/NM23 nucleoside diphosphate kinase 1 pseudogene 1; IFT88, intraflagellar transport 88; IL17D, interleukin 17D; N6AMT2, N-6 adenine-specific DNA methyltransferase 2 (putative); XPO4, exportin 4; PPIAP27, peptidylprolyl isomerase A (cyclophilin A) pseudogene 27; LATS2, large tumor suppressor kinase 2; SAP18, Sin3A-associated protein, 18 kDa; SKA3, spindle and kinetochore associated complex subunit 3; MRPL57, mitochondrial ribosomal protein L57; EEF1A1, eukaryotic translation elongation factor 1 alpha 1; FAM214A, family with sequence similarity 214, member A; RPSAP55, ribosomal protein SA pseudogene 55; VPS13C, vacuolar protein sorting 13 homolog C (S. cerevisiae); MON1B, MON1 secretory trafficking family member B; SYCE1L, synaptonemal complex central element protein 1-like; VN2R10P, vomeronasal 2 receptor 10 pseudogene; PTPRT, protein tyrosine phosphatase, receptor type, T; PPIAP21, peptidylprolyl isomerase A (cyclophilin A) pseudogene 21; IFT52, intraflagellar transport 52; MYBL2, v-myb avian myeloblastosis viral oncogene homolog-like 2; GTSF1L, gametocyte specific factor 1-like; JPH2, junctophilin 2; FITM2, fat storage-inducing transmembrane protein 2; R3HDML, R3H domain containing-like; HNF4A, hepatocyte nuclear factor 4, alpha; MIR3646, microRNA 3646; TTPAL, tocopherol (alpha) transfer protein-like; SERINC3, serine incorporator 3; PKIG, protein kinase (cAMP-dependent, catalytic) inhibitor gamma; ADA, adenosine deaminase.

Table 2. AUC based on frequencies of apoptotic DNA in serum of patients with prostatic cancer.

| Prostate cancer, n | Control, n | Prostate cancer group | Control group | AUC (95% CI) | Accuracy (95% CI) |
|---|---|---|---|---|---|
| 204 | 207 | All | Healthy | 0.92 (0.87–0.95) | 0.84 (0.80–0.88) |
| 89 | 207 | Gleason score <7 | Healthy | 0.94 (0.89–0.97) | 0.89 (0.84–0.92) |
| 84 | 207 | Gleason score ≥7 | Healthy | 0.91 (0.85–0.96) | 0.87 (0.83–0.91) |
| 204 | 227 | All | Healthy and OMC^a | 0.92 (0.87–0.95) | 0.84 (0.79–0.88) |
| 89 | 20 | Gleason score <7 | OMC | 0.93 (0.83–0.98) | 0.91 (0.86–0.96) |
| 84 | 20 | Gleason score ≥7 | OMC | 0.90 (0.78–0.97) | 0.88 (0.81–0.94) |
| 192 | 201 | 41 < age < 81^b | 41 < age < 81 | 0.91 (0.86–0.95) | 0.84 (0.79–0.88) |
| 76 | 174 | 41 < age < 65 | 41 < age < 65 | 0.92 (0.85–0.97) | 0.86 (0.80–0.90) |

a Other medical conditions (OMC) samples were 10 benign prostate hypertrophy and 10 prostatitis. These were not included in the original ROC AUC and cross-validation analyses. When added as additional controls for confirmation, the ROC AUC did not deteriorate the specificity or sensitivity of the original set, which serves as additional rationale for the use of the selected regions.
b Age range between youngest prostate cancer (n = 41) and oldest control (n = 81).
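The quantities reported in Table 2 can be illustrated with a minimal sketch. It is not the authors' analysis code, and the source does not state which confidence-interval method was used: the rank-based AUC estimate, the Wilson score interval, and the back-calculated count of 345 correct calls out of 411 (about 84%) are all assumptions made here for illustration.

```python
import math

def auc_mann_whitney(case_scores, control_scores):
    """AUC as the probability that a random case scores above a random control.

    Ties between a case and a control count as half a win.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95% coverage)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Toy CNI-like scores (hypothetical values, not study data).
toy_auc = auc_mann_whitney([3, 4, 5], [1, 2, 3])

# Assumed count: 345 of 411 samples classified correctly (about 84% accuracy).
low, high = wilson_interval(345, 411)
print(f"AUC = {toy_auc:.3f}; accuracy 95% CI = {low:.2f} to {high:.2f}")
```

Under these assumptions the interval is of similar width to the accuracy range reported in the first row of Table 2, although a different interval method would shift the bounds slightly.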

The clinical variability of prostate cancer is predicated on its immense genomic variability. Massive parallel sequencing on a limited number of primary prostate carcinomas (n = 7) first demonstrated the vast genomic complexity of advanced disease (Gleason score ≥7/stage ≥Tc2). A mean of 3866 somatic mutations (range 3192–5865) per genome was documented with a C-phosphate-G (CpG) mutation rate 10-fold higher than non-CpG mutations (26). Chromosomal fusions were common, with breakpoints clustered in regions of high transcriptional activity (transcriptomes located in euchromatin). A median of 90 rearrangements (range 43–213) per genome were found with multiple intragenic breakpoints predicted to encode truncated proteins of potentially altered biological function. In contrast, we used the statistical power provided by massive parallel sequencing for a population-based analysis of a large cohort of patients with prostate cancer and a wide range of disease severity. Although the NGS platform provided relatively short sequences (40 or 50 bp), they were appropriate for unique assignment to individual chromosomal loci. The advantage

Fig. 2. ROC curve of validated data set of prostate cancer vs controls (black line). Dashed line depicts the borders of the 95% CI. (A), Prostate cancer patients (n = 202) compared with controls (n = 207). (B), Prostate cancer patients with Gleason score <7 (n = 89) compared with controls (n = 207).


Fig. 3. CNI index determined in cfDNA by sequence ploidy heterogeneity for prostate carcinoma, benign prostatic hypertrophy, and prostatitis compared with controls. Boxes (interquartile ranges) and whiskers (5th and 95th percentiles) are shown together with the median (black horizontal line) of the investigated groups.

of our assignment of 100 kbp and greater individual contiguous chromosomal loci is the ability to recognize chromosomal hotspots of regional DNA ploidy heterogeneity. The limitation of this approach is that single or small nucleotide deletion/duplication mutations, individual breakpoints, or breakpoints involving translocations and/or sequence inversions will not be recognized.

We demonstrated variations in the number of cfDNA sequences circulating in the serum of patients with prostate cancer compared with healthy controls. These sequences derive from apoptotic cell death of cancer cells that provide detectable differences from routine cell turnover. The numbers of observed chromosomal fragments in prostate cancer cfDNA can derive from duplications, deletions, and the association of sequences with euchromatin or heterochromatin. More condensed heterochromatin is relatively protected from DNA degradation and would be expected to yield increased DNA sequences similar to duplications, with the expected opposite sequence frequencies from euchromatin or deletions. The regional chromosomal ploidy heterogeneity detected in cfDNA is an early indicator of progressive genomic instability.

Twenty loci of chromosomal instability (hotspots) (on chromosomes 1, 3, 4, 6, 8, 9, 10, 12, 13, 15, 16, and 20) were incorporated in the final model (Table 1). At a diagnostic specificity of 95%, the model based on these regions of genomic instability had a diagnostic sensitivity of 73% and was independent of PSA results in Gleason score <7 prostate cancer. No separation of cfDNA chromosomal markers was observed on the basis of the Gleason score that would allow aggressive prostate cancer to be distinguished from indolent disease. This observation is consistent with the lack of the hemizygous 3-Mb deletion generated by the fusion of TMPRSS2 (transmembrane protease, serine 2)^7 and ERG (v-ets erythroblastosis virus E26 oncogene homolog) on chromosome 21 within the confines of our 20-loci hotspot model, although TMPRSS2-ERG and related fusions are associated with aggressive disease. TMPRSS2-ERG fusion can be detected by low-coverage high-throughput sequencing of cfDNA from advanced-stage prostate cancer patients preselected for high tumor cfDNA concentrations (8). ERG was not recognized by our group analysis, which was performed without any patient preselection.

An independent method of assessing chromosomal imbalances in cancer is the clinically useful tool of cytogenetics. Although the Progenetix cytogenetics database of 702 prostate carcinomas (http://www.progenetix.org) cannot be directly compared with our more discrete top 20 loci of gains and losses, comparison of gains and losses is of interest. Our data span the Gleason score spectrum, whereas the cytogenetic clinical data will be skewed toward more clinically advanced disease. The cytogenetics recorded chromosomal imbalances associated with gains and losses over large regions of all 22 autosomes, whereas we focused on discrete chromosome regions (Table 1 bins) with the greatest number of gains or losses. Nevertheless, chromosome 8 was recorded as having the greatest number of gains by cytogenetics, in agreement with our identification of 3 gain loci. Similarly, the largest cytogenetic recorded losses occurred in chromosome 13. We found a large region of chromosome 13 with a loss.
The major disparities were observed on chromosomes 6 and 16: minor cytogenetic p-arm gains overlapped our observed gains, but the q-arm losses observed by cytogenetics were not found among the top 20 loci of our cfDNA analysis.

⁷ Human genes: TMPRSS2, transmembrane protease, serine 2; ERG, v-ets erythroblastosis virus E26 oncogene homolog; RORA, RAR-related orphan receptor A; FAM149, family with sequence similarity 149, member A; ONECUT1, one cut homeobox 1; TOX2, tox high mobility group box family member 2; GDAP1L1, ganglioside induced differentiation associated protein 1-like 1; HTRA1–4, HtrA serine peptidase 1 through 4.

Analysis of the chromosomal genetic content identified in Table 1 and the relevant functional content in online Supplemental Table 2 provides insight into the dynamic real-time elements of chromosomal instability in prostate cancer. These 20 regions of chromosomal instability represent <0.5% of the total chromosomal DNA and are distributed over 12 chromosomes. Five regions were associated with significant reductions in copies of cfDNA in 3 chromosomes (HS1, HS4, and HS13) (Table 1) vs healthy controls. The remainder of these chromosomal hotspots were associated with significant increases in copies of cfDNA. Among the 99 functional genes and 25 pseudogenes contained within these loci of chromosomal instability (see online Supplemental Table 2), 49 of the functional genes (49%) have functions directly relevant to chromosomal instability and cancer. These include genes involved in cell division, transcription/translation, altered membrane structure and function, differentiation, and apoptosis. Target functions include kinetochore and sister chromatid stabilization; S-phase signaling and G1-S regulation; transcription factors controlling cell cycle progression, epigenetic regulation, membrane oncogenes, and release of metabolic growth factors; and morphogenesis/cytoskeleton organization. One notable gene of interest is a hormone gene [RORA (RAR-related orphan receptor A)] enhancer within a fragile region of the genome (see online Supplemental Table 2). Genes overexpressed in a variety of cancers include FAM149 (family with sequence similarity 149; includes members FAM149A, FAM149B1, FAM149B1P1), ONECUT1 (one cut homeobox 1), TOX2 (tox high mobility group box family member 2), GDAP1L1 (ganglioside induced differentiation associated protein 1-like 1), and HTRA1–4 (HtrA serine peptidase 1 through 4). Additionally, within these foci of instability are 9 genes of unknown function and a large number of transcribed but untranslated RNAs that are potential sources of regulatory miRNAs. The large number of cancer-related genes found in the abnormal cfDNA distribution is consistent with chromosomal instability that is reflected in the cfDNA.
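As a minimal illustrative sketch (not the authors' pipeline), the idea of flagging genomic bins whose cfDNA read counts are gained or lost relative to healthy controls can be expressed as a per-bin z-score. The function names, the toy read counts, and the threshold of 3 are assumptions chosen for illustration only; the normalization and statistics actually used in the study are not described in this section.

```python
from statistics import mean, stdev

def bin_z_scores(control_counts, sample_counts):
    """Z-score of each bin in one sample against the control distribution.

    control_counts: list of per-control lists of normalized read counts per bin
                    (hypothetical input layout).
    sample_counts:  list of normalized read counts per bin for one sample.
    """
    z_scores = []
    for i, value in enumerate(sample_counts):
        reference = [control[i] for control in control_counts]
        mu, sd = mean(reference), stdev(reference)
        # Guard against a zero-variance bin in the controls.
        z_scores.append((value - mu) / sd if sd > 0 else 0.0)
    return z_scores

def call_bins(z_scores, threshold=3.0):
    """Label each bin as 'gain', 'loss', or 'neutral' (threshold is an assumption)."""
    calls = []
    for z in z_scores:
        if z >= threshold:
            calls.append("gain")
        elif z <= -threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls

# Toy data: 4 controls x 3 bins, then one sample with a gain and a loss.
controls = [[100, 200, 50], [102, 198, 52], [98, 202, 48], [100, 200, 50]]
sample = [100, 220, 40]
calls = call_bins(bin_z_scores(controls, sample))
print(calls)
```

In this toy input the first bin matches the control mean, while the second and third bins deviate strongly upward and downward, so they are labeled as a gain and a loss respectively.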
By massive parallel sequencing of cfDNA, we identified a small number of foci (<0.5% of total DNA) of chromosomal instability, a recognized common phenomenon of malignant neoplasia. These foci, which are populated with a variety of genes relevant to the process of neoplasia, accurately predict prostatic carcinoma without dependence on the stage of disease or tumor aggressive potential. These core genes may provide clues to the genomic core for prostate cancer, distinct from the progressive genomic instabilities driven by the multiplicity of fusions with their individual related gene dysfunctions responsible for the wide variance in clinical phenotypes.

The clinical development of cfDNA is currently in an exponential growth phase. Cancer-derived DNA present in blood was first reported in 1948 (27), but the field remained in a dormant state for >50 years. Measurement of absolute concentrations of cfDNA has been suggested for the diagnosis (28) and prognosis (29) of breast and lung cancer (30) but has little clinical utility. The rapid development of NGS platforms with their massive sequence capacities, however, has provided the statistical power to drive clinical applications. Massive parallel sequencing to detect genomic alterations in chromosomal copy number in blood was reported initially for the detection of fetal trisomy 21 in 2008 (31). The method has been validated for trisomies 13, 18, and 21 as a clinical laboratory procedure with a remarkable diagnostic accuracy of >99% (32). The statistical power provided by NGS platforms through massive parallel sequencing coupled with mass assembly [mass sequence and assembly (MSA)] allowed us to report algorithms predictive of variant Creutzfeldt-Jakob in ruminantia (33, 34) and of multiple sclerosis (35) and breast cancer (14, 36) in humans without genomic knowledge, as is the case in this study. Despite the potential for bias introduced by whole-genome sequencing, cancer-derived cfDNA has been demonstrated to recapitulate genomic tumor DNA (11, 20, 21, 37).

Individual tumor genomic analysis brings significant advantages to the power of cfDNA diagnostics. Preliminary data recently reported at the 2014 American Society of Clinical Oncology meeting on small numbers of head/neck and colorectal cancers indicate that DNA-targeted therapy (radiation, 5-fluorouracil, and/or cis-platinum) is most effective (P ≤ 0.01) in patients with chromosomal number imbalance (38, 39). Although tumor tissue is still the gold standard for clinical molecular diagnostics, acquiring tissue samples has major disadvantages (i.e., biopsies may be complicated and are invasive procedures). On the other hand, liquid biopsies, owing to their minimal invasiveness, can be scheduled more frequently and may be the only option for some patients.
Furthermore, a tissue biopsy is often obtained from only 1 tumor site, and the genetics of distant metastases cannot be assessed (8, 21). It is arguable that cfDNA sequencing better reflects the genomes of all cancer subclones present in a patient. If validated with larger numbers of patients, massive parallel sequencing will provide substantial value to the selection of therapeutic regimens by liquid biopsy analysis.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.


Authors' Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the author disclosure form. Disclosures and/or potential conflicts of interest: Employment or Leadership: E. Schütz, Chronix Biomedical; J. Beck, Chronix Biomedical; H. Urnovitz, Chronix Biomedical; K. Bornemann-Kolatzki, Chronix Biomedical; W.M. Mitchell, Chronix Biomedical, Vanderbilt University. Consultant or Advisory Role: W.M. Mitchell, Chronix Biomedical. Stock Ownership: E. Schütz, Chronix Biomedical; J. Beck, Chronix Biomedical; W.M. Mitchell, Vanderbilt University. Honoraria: E. Schütz, Chronix Biomedical. Research Funding: H. Urnovitz, Chronix Biomedical. Expert Testimony: None declared. Patents: E. Schütz, patent no. WO/2013/086352; H. Urnovitz, patent no. WO/2013/086352. Other Remuneration: E. Schütz, Chronix Biomedical. Role of Sponsor: The funding organizations played a direct role in the design of study, review and interpretation of data, and preparation and final approval of manuscript.

References
1. Chou R, Croswell JM, Dana T, Bougatsos C, Blazina I, Fu R, et al. Screening for prostate cancer: a review of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med 2011;155:762–71.
2. Stroun M, Anker P, Lyautey J, Lederrey C, Maurice PA. Isolation and characterization of DNA from the plasma of cancer patients. Eur J Cancer Clin Oncol 1987;23:707–12.
3. Meyerson M, Pellman D. Cancer genomes evolve by pulverizing single chromosomes. Cell 2011;144:9–10.
4. Thierry AR, Mouliere F, Gongora C, Ollier J, Robert B, Ychou M, et al. Origin and quantification of circulating DNA in mice with human colorectal cancer xenografts. Nucleic Acids Res 2010;38:6159–75.
5. Beck J, Urnovitz HB, Riggert J, Clerici M, Schütz E. Profile of the circulating DNA in apparently healthy individuals. Clin Chem 2009;55:730–8.
6. Lo YM, Chiu RW. Next-generation sequencing of plasma/serum DNA: an emerging research and molecular diagnostic tool. Clin Chem 2009;55:607–8.
7. Lo YM, Tein MS, Lau TK, Haines CJ, Leung TN, Poon PM, et al. Quantitative analysis of fetal DNA in maternal plasma and serum: implications for noninvasive prenatal diagnosis. Am J Hum Genet 1998;62:768–75.
8. Heitzer E, Ulz P, Belic J, Gutschi S, Quehenberger F, Fischereder K, et al. Tumor-associated copy number changes in the circulation of patients with prostate cancer identified through whole-genome sequencing. Genome Med 2013;5:30.
9. Forshew T, Murtaza M, Parkinson C, Gale D, Tsui DW, Kaper F, et al. Noninvasive identification and monitoring of cancer mutations by targeted deep sequencing of plasma DNA. Sci Transl Med 2012;4:136ra68.
10. Diaz LA Jr, Bardelli A. Liquid biopsies: genotyping circulating tumor DNA. J Clin Oncol 2014;32:579–86.
11. Leary RJ, Sausen M, Kinde I, Papadopoulos N, Carpten JD, Craig D, et al. Detection of chromosomal alterations in the circulation of cancer patients with whole-genome sequencing. Sci Transl Med 2012;4:162ra154, 1–12.
12. Leary RJ, Kinde I, Diehl F, Schmidt K, Clouser C, Duncan C, et al. Development of personalized tumor biomarkers using massively parallel sequencing. Sci Transl Med 2010;2:20ra14, 1–7.
13. van der Vaart M, Semenov DV, Kuligina EV, Richter VA, Pretorius PJ. Characterisation of circulating DNA by parallel tagged sequencing on the 454 platform. Clin Chim Acta 2009;409:21–7.
14. Beck J, Urnovitz HB, Mitchell WM, Schütz E. Next generation sequencing of serum circulating nucleic acids from invasive ductal breast cancer patients reveals differences to healthy and nonmalignant controls. Mol Cancer Res 2010;8:335–42.
15. Murtaza M, Dawson SJ, Tsui DW, Gale D, Forshew T, Piskorz AM, et al. Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA. Nature 2013;497:108–12.
16. McBride DJ, Orpana AK, Sotiriou C, Joensuu H, Stephens PJ, Mudie LJ, et al. Use of cancer-specific genomic rearrangements to quantify disease burden in plasma from patients with solid tumors. Genes Chromosomes Cancer 2010;49:1062–9.
17. Dawson SJ, Tsui DW, Murtaza M, Biggs H, Rueda OM, Chin SF, et al. Analysis of circulating tumor DNA to monitor metastatic breast cancer. N Engl J Med 2013;368:1199–209.
18. Bettegowda C, Sausen M, Leary RJ, Kinde I, Wang Y, Agrawal N, et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci Transl Med 2014;6:224ra24.
19. Newman AM, Bratman SV, To J, Wynne JF, Eclov NC, Modlin LA, et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med 2014;20:548–54.
20. Beck J, Hennecke S, Bornemann-Kolatzki K, Urnovitz HB, Neumann S, Ströbel P, et al. Genome aberrations in canine mammary carcinomas and their detection in cell-free plasma DNA. PLoS One 2013;8:e75485.
21. Chan KC, Jiang P, Zheng YW, Liao GJ, Sun H, Wong J, et al. Cancer genome scanning in plasma: detection of tumor-associated copy number aberrations, single-nucleotide variants, and tumoral heterogeneity by massively parallel sequencing. Clin Chem 2013;59:211–24.
22. Stricker TP, Kumar V. Neoplasia. In: Kumar V, Abbas AK, Fausto N, Aster JC, eds. Robbins and Cotran pathologic basis of disease, 8th Ed. Philadelphia: Elsevier Saunders 2010;259–330.
23. Mitelman F, Johansson B, Mertens F, eds. Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer. http://cgap.nci.nih.gov/Chromosomes/Mitelman (Accessed October 2014).
24. Duesberg P, Mandrioli D, McCormack A, Nicholson JM. Is carcinogenesis a form of speciation? Cell Cycle 2011;10:2100–14.
25. Duesberg P, Iacobuzio-Donahue C, Brosnan JA, McCormack A, Mandrioli D, Chen L. Origin of metastases: subspecies of cancers generated by intrinsic karyotypic variations. Cell Cycle 2012;11:1151–66.
26. Berger MF, Lawrence MS, Demichelis F, Drier Y, Cibulskis K, Sivachenko AY, et al. The genomic complexity of primary human prostate cancer. Nature 2011;470:214–20.
27. Mandel P, Metais P. Les acides nucléiques du plasma sanguin chez l'homme. C R Seances Soc Biol Fil 1948;142:241–3.
28. Gal S, Fidler C, Lo YM, Taylor M, Han C, Moore J, et al. Quantitation of circulating DNA in the serum of breast cancer patients by real-time PCR. Br J Cancer 2004;90:1211–5.
29. Silva JM, Silva J, Sanchez A, Garcia JM, Dominguez G, Provencio M, et al. Tumor DNA in plasma at diagnosis of breast cancer patients is a valuable predictor of disease-free survival. Clin Cancer Res 2002;8:3761–6.
30. Sozzi G, Conte D, Leon M, Ciricione R, Roz L, Ratcliffe C, et al. Quantification of free circulating DNA as a diagnostic marker in lung cancer. J Clin Oncol 2003;21:3902–8.
31. Lun FM, Tsui NB, Chan KC, Leung TY, Lau TK, Charoenkwan P, et al. Noninvasive prenatal diagnosis of monogenic diseases by digital size selection and relative mutation dosage on DNA in maternal plasma. Proc Natl Acad Sci U S A 2008;105:19920–5.
32. Palomaki GE, Deciu C, Kloza EM, Lambert-Messerlian GM, Haddow JE, Neveux LM, et al. DNA sequencing of maternal plasma reliably identifies trisomy 18 and trisomy 13 as well as Down syndrome: an international collaborative study. Genet Med 2012;14:296–305.
33. Beck J, Urnovitz HB, Groschup MH, Ziegler U, Brenig B, Schütz E. Serum nucleic acids in an experimental bovine transmissible spongiform encephalopathy model. Zoonoses Public Health 2009;56:384–90.
34. Gordon PM, Schütz E, Beck J, Urnovitz HB, Graham C, Clark R, et al. Disease-specific motifs can be identified in circulating nucleic acids from live elk and cattle infected with transmissible spongiform encephalopathies. Nucleic Acids Res 2009;37:550–6.
35. Beck J, Urnovitz HB, Saresella M, Caputo D, Clerici M, Mitchell WM, Schütz E. Serum DNA motifs predict disease and clinical status in multiple sclerosis. J Mol Diagn 2010;12:312–9.
36. Beck J, Schütz E, Urnovitz HB, Tabchy A, Mitchell WM, et al Taylor M, Han C. Cell-free DNA copy number variations as a marker for breast cancer in a large study cohort. J Clin Oncol 2013;31(Suppl):abstract 11013.
37. Mohan S, Heitzer E, Ulz P, Lafer I, Lax S, Auer M, et al. Changes in colorectal carcinoma genomes under anti-EGFR therapy identified by whole-genome plasma DNA sequencing. PLoS Genet 2014;10:e1004271.
38. Beck J, Bornemann-Kolatzki K, Richardson BE, Lee JH, Mitchell WM, Schütz E. Detection of novel HPV mutations and chromosomal number imbalance (CNI) in laryngeal cancer using next-generation sequencing (NGS). J Clin Oncol 2014;32(Suppl):5s (abstract 6072).
39. Beck J, Gaedcke J, Urnovitz HB, Bornemann-Kolatzki K, Grade M, Mitchell WM, et al. Comprehensive analyses of rectal cancer genomes to reveal copy number variations as potential predictor of induction therapy efficacy. J Clin Oncol 2014;32(Suppl):abstract e14549.
