The Journal of Molecular Diagnostics, Vol. 18, No. 3, May 2016

jmd.amjpathol.org

Plasmid-Based Materials as Multiplex Quality Controls and Calibrators for Clinical Next-Generation Sequencing Assays

David J. Sims,* Robin D. Harrington,* Eric C. Polley,† Thomas D. Forbes,* Michele G. Mehaffey,* Paul M. McGregor, III,* Corinne E. Camalier,* Kneshay N. Harper,* Courtney H. Bouk,* Biswajit Das,* Barbara A. Conley,† James H. Doroshow,† P. Mickey Williams,* and Chih-Jian Lih*

From the Molecular Characterization and Clinical Assay Development Laboratory,* Frederick National Laboratory for Cancer Research, Frederick; and the Division of Cancer Treatment and Diagnosis,† National Cancer Institute, Bethesda, Maryland

Accepted for publication November 20, 2015.

Address correspondence to Chih-Jian Lih, Ph.D., Frederick National Laboratory for Cancer Research, Bldg. 320, Room 5, 1050 Boyles St., Frederick, MD 21702. E-mail: jason.lih@nih.gov.

Although next-generation sequencing technologies have been widely adopted for clinical diagnostic applications, there is an urgent need for multianalyte calibrator materials and controls to evaluate the performance of these assays. Control materials will also play a major role in the assessment, development, and selection of appropriate alignment and variant calling pipelines. We report an approach to provide effective multianalyte controls for next-generation sequencing assays, referred to as the control plasmid spiked-in genome (CPSG). Control plasmids containing approximately 1000 bases of human genomic sequence, with a specific mutation of interest positioned near the middle of the insert and a nearby 6-bp molecular barcode, were synthesized, linearized, quantitated, and spiked at defined copy number ratios into genomic DNA derived from formalin-fixed, paraffin-embedded preparations of hapmap cell lines. Serial titration experiments demonstrated that CPSGs performed with variant detection efficiency similar to that of formalin-fixed, paraffin-embedded cell line genomic DNA. Repeated analysis of one lot of CPSGs 90 times over 18 months showed that the reagents were stable, with each plasmid consistently detected at similar variant allele frequencies. CPSGs are designed to work across most next-generation sequencing methods, platforms, and data analysis pipelines. CPSGs are robust controls that can be used to evaluate the performance of different next-generation sequencing diagnostic assays, assess data analysis pipelines, and ensure robust assay performance. (J Mol Diagn 2016, 18: 336-349; http://dx.doi.org/10.1016/j.jmoldx.2015.11.008)

Supported by National Cancer Institute, NIH, grants HHSN261200800001E and NO1-CO-2008-00001. This work does not express or represent the opinion of the National Cancer Institute, National Institutes of Health, or Department of Health and Human Services.

Disclosures: None declared.

Current address of M.G.M., Department of Pediatrics, University of Washington, Seattle, WA.

Published by Elsevier Inc. on behalf of the American Society for Investigative Pathology and the Association for Molecular Pathology. http://dx.doi.org/10.1016/j.jmoldx.2015.11.008

Next-generation sequencing (NGS) technology is having major effects on biomedical research. Decreasing costs and increasing data output are driving rapid uptake of the method, and clinical applications have quickly followed.1,2 NGS technology is currently under evaluation for guiding treatment selection for patients with cancer.3,4 However, it remains uncertain whether interlaboratory concordance is sufficient for meaningful clinical use. The rapid proliferation of sequencing methods, platforms, and data analysis tools has resulted in high discordance among the mutations reported by different clinical NGS assays.5,6 Reference and control materials that contain known analytes (variants) at known allele fractions [variant allele frequency (VAF)], in a form comparable to clinical specimens, are essential for comparing and monitoring assay performance and will be valuable in cross-platform comparison studies and in identifying weaknesses in informatics pipelines (ie, alignment and variant calling). However, unlike most conventional assays (eg, Sanger sequencing and PCR-based methods), which typically detect a single analyte or only a few, an NGS assay usually measures hundreds to thousands of genomic loci. Currently, there is no standardized set of clinically relevant materials that can serve as controls or calibrators to standardize the assessment of NGS data across platforms, assays, and informatics pipelines. Genome in a Bottle, a public consortium led by the National Institute of Standards and Technology, has released a reference genome and will soon release several others.7 These are valuable resources but do not directly address the need for clinically relevant controls and calibrators. Therefore, there is an urgent need to implement highly multiplexed materials as calibrators and controls for the clinical use of NGS assays.5,6,8 One approach to NGS calibrators and controls relies on cell line genomic DNA: a mixture of variant types and VAFs can be manufactured by combining genomes at defined molar ratios.9 This approach is limited by the number of genomes that can be mixed while maintaining an adequate VAF and by the number of different mutations that can be introduced into a single cell line. Another approach is the use of synthetic nucleic acid molecules, such as the long oligonucleotides used in the SNaPshot assay10 and the in vitro transcribed RNA molecules from the External RNA Control Consortium (ERCC) used in gene expression and RNAseq assays.11 As a first step toward building highly multiplexed control materials, we report the development and characterization of a control plasmid-based multianalyte calibrator and control material for NGS assays, termed the control plasmid spiked-in genome (CPSG). We found that these materials are scalable in their ability to incorporate many different variants at different allele frequencies in a complex mixture, are easy to design and manufacture, are distinguishable from a clinical specimen, and are detectable by various genomic assays.
Our results indicate that CPSGs can serve as routine assay controls to monitor the performance of NGS assays, as standards for cross-site and cross-platform comparison studies, and as valuable tools for the evaluation, development, and testing of new informatics pipelines. Such an approach was previously accepted by the US Food and Drug Administration as an effective method of validating the detection of rare germline variants on an NGS platform, in a 510(k) premarket notification submitted by Illumina for the Illumina MiSeqDx Cystic Fibrosis Clinical Sequencing Assay (Illumina Inc., San Diego, CA) (Food and Drug Administration, http://www.accessdata.fda.gov/cdrh_docs/pdf13/K132750.pdf, last accessed November 20, 2015). Importantly, we also found that the efficiency of variant detection in CPSG samples is similar to that of formalin-fixed, paraffin-embedded (FFPE) genomic DNA samples.

Materials and Methods

Design and Construction of Control Plasmids

To evaluate the performance of various NGS assays on different types of mutations, a panel of 69 control plasmids was designed and constructed, and subsets of the panel were used in this study. The panel contains 38 single-nucleotide variants (SNVs), nine SNVs in a homopolymeric region (HP; >3 identical bases in a row), 12 insertions/deletions (indels), five indels at HP, and five large indels (gap size >4 bp). The mutations of interest (MOIs) in these control plasmids were selected because of their known clinical actionability and high recurrence in the Catalogue of Somatic Mutations in Cancer database, or because they represent rare mutation types. For each MOI, a region of approximately 1000 bp flanking the MOI (approximately 500 bp upstream and 500 bp downstream) was synthesized (DNA 2.0, Menlo Park, CA). In addition, a 6-bp insert sequence (ACATCG), which functions as a molecular barcode, was placed 5 to 20 bp away from the MOI.

Figure 1 Design of control plasmids. A: Map of a representative control plasmid, pNF1_34041. Each control plasmid was constructed by inserting approximately 1000 bp of genomic DNA (blue box) spanning the mutation of interest (MOI) (red star). A 6-bp (ACATCG) molecular barcode (orange rectangle in B) was inserted near the MOI to track variant reads. Single-cut restriction sites are indicated by yellow triangles. B: Co-occurrence of the molecular barcode with the MOI. A subset of sequencing reads carrying the MOI [an A deletion (red box)] and the 6-bp molecular barcode confirms that the mutation is plasmid borne.

Table 1 List of 69 Control Plasmids

Plasmid name | Mutation position (hg19) | Transcript | CDS mutation | AA mutation | Mutation type | Restriction enzyme used | CPSG13 (yes/no) | CPSG51 (yes/no) | NCI-MPACT (yes/no) | TSCA (yes/no) | WES (yes/no)
pAKT1_33765 | 14:105246551 | ENST00000349310 | c.49G>A | p.E17K | SNV at HP | ScaI | No | Yes | Yes | Yes | Yes
pAKT1_36918 | 14:105246455 | ENST00000349310 | c.145G>A | p.E49K | SNV | PvuI | No | Yes | Yes | Yes | Yes
pAKT2_93894 | 19:40761084 | ENST00000392038 | c.268G>T | p.V90L | SNV | BglI | No | Yes | Yes | Yes | Yes
pAKT3_48227 | 1:243809253 | ENST00000366539 | c.371A>T | p.Q124L | SNV at HP | BglI | No | Yes | Yes | Yes | Yes
pAPC_13127 | 5:112175639 | ENST00000457016 | c.4348C>T | p.R1450* | SNV | BglI | No | Yes | Yes | Yes | Yes
pAPC_18561 | 5:112175957 | ENST00000457016 | c.4666_4667insA | p.T1556fs*3 | Indel at HP | BglI | No | Yes | Yes | No | Yes
pAPC_18584 | 5:112175539 | ENST00000457016 | c.4248delC | p.I1417fs*2 | Indel | BglI | No | No | Yes | Yes | Yes
pARHGAP5_88502 | 14:32561739 | ENST00000345122 | c.1864G>A | p.E622K | SNV | BglI | No | Yes | Yes | Yes | Yes
pATM_21924 | 11:108117847 | ENST00000278616 | c.1058_1059delGT | p.C353fs*5 | Indel | PvuI | No | No | Yes | Yes | Yes
pATM_41596† | 11:108175462 | ENST00000278616 | c.5557G>A | p.D1853N | SNV at HP | BglI | No | Yes | Yes | Yes | Yes
pATR_20627 | 3:142254973 | NM_001184 | c.3790_3796delATAAAAG | p.I1264fs*24 | Large indel | ScaI | Yes | Yes | Yes | No | Yes
pBRAF_476 | 7:140453135 | ENST00000288602 | c.1799T>A | p.V600E | SNV | PvuI | Yes | Yes | Yes | Yes | Yes
pCTNNB1_5664 | 3:41266124 | ENST00000349496 | c.121A>G | p.T41A | SNV | ScaI | No | Yes | Yes | Yes | Yes
pDNMT3A_53042 | 2:25457243 | ENST00000321117 | c.2644C>T | p.R882C | SNV | BglI | No | No | No | No | Yes
pEGFR_12378 | 7:55249012 | ENST00000275493 | c.2310_2311insGGT | p.D770_N771insG | Indel | ScaI | Yes | No | Yes | Yes | Yes
pEGFR_6224 | 7:55259515 | ENST00000275493 | c.2573T>G | p.L858R | SNV at HP | BglI | No | Yes | Yes | No | Yes
pEGFR_6225 | 7:55242466 | ENST00000275493 | c.2236_2250del15 | p.E746_A750del | Large indel | BglI | Yes | Yes | Yes | Yes | Yes
pEGFR_6240 | 7:55249071 | ENST00000275493 | c.2369C>T | p.T790M | SNV | BglI | No | Yes | Yes | Yes | Yes
pERBB2_682 | 17:37880993 | ENST00000269571 | c.2322_2323ins12 | p.M774_A775insAYVM | Large indel | ScaI | No | Yes | Yes | No | Yes
pERCC1_140843 | 19:45924470 | ENST00000013807 | c.287C>A | p.A96E | SNV at HP | ScaI | No | Yes | Yes | No | Yes
pEZH2_37028 | 7:148508727 | ENST00000320356 | c.1937A>T | p.Y646F | SNV | ScaI | No | No | No | No | Yes
pFBXW7_22965 | 4:153249384 | ENST00000281708 | c.1394G>A | p.R465H | SNV | ScaI | No | Yes | Yes | Yes | Yes
pFGFR3_715 | 4:1803568 | ENST00000440486 | c.746C>G | p.S249C | SNV at HP | ScaI | No | Yes | Yes | Yes | Yes
pFLT3_783 | 13:28592642 | ENST00000241453 | c.2503G>T | p.D835Y | SNV | BglI | No | Yes | Yes | Yes | Yes
pGABRA6_70853 | 5:161117296 | ENST00000274545 | c.763G>C | p.V255L | SNV | ScaI | No | Yes | Yes | No | Yes
pGABRG2_74722 | 5:161580301 | ENST00000356592 | c.1355A>G | p.Y452C | SNV | ScaI | No | Yes | Yes | Yes | Yes
pGNAQ_28758 | 9:80409488 | ENST00000286548 | c.626A>C | p.Q209L | SNV at HP | ScaI | No | Yes | Yes | No | Yes
pGNAS_27887 | 20:57484420 | ENST00000371085 | c.601C>T | p.R201C | SNV | ScaI | No | Yes | Yes | No | Yes
pIDH1_28747 | 2:209113113 | ENST00000345146 | c.394C>T | p.R132C | SNV | BglI | No | Yes | Yes | Yes | Yes
pIDH2_33733 | 15:90631838 | ENST00000330062 | c.515G>A | p.R172K | SNV | ScaI | No | No | No | No | Yes
pIDH2_41590 | 15:90631934 | ENST00000330062 | c.419G>A | p.R140Q | SNV | ScaI | No | No | No | No | Yes
pJAK2_12600 | 9:5073770 | ENST00000381652 | c.1849G>T | p.V617F | SNV at HP | BglI | No | Yes | Yes | No | Yes
pKIT_1314 | 4:55599321 | ENST00000288135 | c.2447A>T | p.D816V | SNV | BglI | No | Yes | Yes | No | Yes
pKRAS_521 | 12:25398283 | ENST00000311936 | c.35G>A | p.G12D | SNV | PvuI | Yes | Yes | Yes | Yes | Yes
pMET_700 | 7:116423428 | ENST00000318493 | c.3757T>G | p.Y1253D | SNV | PvuI | No | Yes | Yes | Yes | Yes
pMLH1_26085 | 3:37067240 | ENST00000231790 | c.1151T>A | p.V384D | SNV | BglI | No | Yes | Yes | Yes | Yes
pMPL_18918 | 1:43815009 | ENST00000372470 | c.1544G>T | p.W515L | SNV | BglI | No | Yes | Yes | Yes | Yes
pMSH2_111644 | 2:47705450 | ENST00000233146 | c.2250delG | p.G751fs*12 | Indel at HP | BglI | Yes | Yes | Yes | Yes | Yes
pMSH2_26122 | 2:47705559 | ENST00000233146 | c.2359_2360delCT | p.L787fs*11 | Indel | BglI | No | No | Yes | Yes | Yes
pMTOR_94356 | 1:11291097 | ENST00000361445 | c.2664A>T | p.L888F | SNV | ScaI | No | Yes | Yes | No | Yes
pMYD88_85940 | 3:38182641 | ENST00000396334 | c.794T>C | p.L265P | SNV | ScaI | No | No | No | No | Yes
pNBN_35664 | 8:90947833 | NM_006904 | c.2242C>T | p.P748S | SNV | ScaI | No | Yes | Yes | No | Yes
pNF1_24443 | 17:29576111 | ENST00000358273 | c.4084C>T | p.R1362* | SNV at HP | BglI | Yes | Yes | Yes | Yes | Yes
pNF1_24468 | 17:29679318 | ENST00000358273 | c.7501delG | p.E2501fs*22 | Indel | BglI | Yes | No | Yes | Yes | Yes
pNF1_34041 | 17:29554610 | ENST00000358274 | c.2395delA | p.M799fs*22 | Indel at HP | BglI | Yes | Yes | Yes | Yes | Yes
pNF1_41820 | 17:29556989 | ENST00000358273 | c.2987_2988insAC | p.R997fs*16 | Indel | BglI | No | No | Yes | Yes | Yes
pNPM1_17559 | 5:170837547 | ENST00000517671 | c.863_864insTCTG | p.W288fs*12 | Large indel | BglI | Yes | Yes | Yes | Yes | Yes
pNRAS_584 | 1:115256529 | ENST00000369535 | c.182A>G | p.Q61R | SNV | ScaI | Yes | Yes | Yes | No | Yes
pPARP1_21691 | 1:226551692 | ENST00000366794 | c.2738delG | p.G913fs*4 | Indel at HP | ScaI | No | Yes | Yes | Yes | Yes
pPARP2_75849 | 14:20820412 | NM_005484.2 | c.398A>C | p.D133A | SNV | ScaI | No | Yes | Yes | No | Yes
pPDGFRA_28053 | 4:55141048 | ENST00000257290 | c.1694_1695insA | p.S566fs*6 | Indel | BglI | Yes | No | Yes | Yes | Yes
pPDGFRA_736‡ | 4:55152093 | ENST00000257290 | c.2525A>T | p.D842V | SNV | ScaI | No | Yes | Yes | Yes | Yes
pPIK3CA_12464 | 3:178952149 | NM_006218.1 | c.3204_3205insA | p.N1068fs*4 | Indel | BglI | No | No | Yes | Yes | Yes
pPIK3CA_763 | 3:178936091 | NM_006218.1 | c.1633G>A | p.E545K | SNV | PvuI | Yes | Yes | Yes | Yes | Yes
pPIK3CA_775 | 3:178952085 | NM_006218.1 | c.3140A>G | p.H1047R | SNV | PvuI | No | Yes | Yes | Yes | Yes
pPTEN_4986 | 10:89717716 | ENST00000371953 | c.741_742insA | p.P248fs*5 | Indel | BglI | No | No | Yes | Yes | Yes
pPTEN_5152 | 10:89692904 | ENST00000371953 | c.388C>T | p.R130* | SNV | ScaI | No | Yes | Yes | No | Yes
pPTEN_5809 | 10:89717775 | ENST00000371953 | c.800delA | p.K267fs*9 | Indel at HP | BglI | No | Yes | Yes | Yes | Yes
pPTPN11_13000 | 12:112888210 | ENST00000351677 | c.226G>A | p.E76K | SNV | ScaI | No | Yes | Yes | Yes | Yes
pRAD51_117943 | 15:41001312 | ENST00000267868 | c.433C>T | p.Q145* | SNV | ScaI | No | Yes | Yes | Yes | Yes
pRB1_891 | 13:48941648 | ENST00000267163 | c.958C>T | p.R320* | SNV | ScaI | No | Yes | Yes | Yes | Yes
pRET_965 | 10:43617416 | ENST00000355710 | c.2753T>C | p.M918T | SNV | BglI | No | Yes | Yes | No | Yes
pSMAD4_14105 | 18:48603093 | ENST00000342988 | c.1394_1395insT | p.A466fs*28 | Indel | BglI | No | No | Yes | Yes | Yes
pTP53_10648 | 17:7578406 | ENST00000269305 | c.524G>A | p.R175H | SNV | BglI | No | Yes | Yes | Yes | Yes
pTP53_10660 | 17:7577120 | ENST00000269305 | c.818G>A | p.R273H | SNV | BglI | No | Yes | Yes | Yes | Yes
pTP53_10662 | 17:7577538 | ENST00000269305 | c.743G>A | p.R248Q | SNV | BglI | No | Yes | Yes | No | Yes
pTP53_18610 | 17:7579419 | ENST00000269305 | c.263delC | p.S90fs*33 | Indel | PvuI | No | No | Yes | Yes | Yes
pTP53_6530 | 17:7577558 | ENST00000269305 | c.723delC | p.C242fs*5 | Indel | BglI | No | No | Yes | Yes | Yes
pVHL_18578 | 3:10188282 | ENST00000256474 | c.426_429delTGAC | p.G144fs*14 | Large indel | PvuI | No | No | Yes | Yes | Yes

*Premature stop codon created by mutation.
†Excluded from the limit of detection analysis using three data analysis pipelines and the next-generation sequencing data set from the NCI-MPACT assay because this variant occurs as a single-nucleotide polymorphism in the NA12878 hapmap background into which the plasmid was spiked.
‡Excluded from analysis because of overdilution during CPSG51 preparation.
AA, amino acid change notation; CDS, coding DNA sequence change notation; CPSG, control plasmid spiked-in genome; HP, homopolymeric region; indel, insertion/deletion; NCI, National Cancer Institute; TSCA, TruSeq Custom Amplicon; SNV, single-nucleotide variant; WES, whole exome sequencing.

Each of the approximately 1000-bp fragments was flanked with attB sites (attB1, ACAACTTTGTACAAAAAAGTTGGC, at the 5′ end and attB2, TCAACTTTCTTGTACAAAGTTG, at the 3′ end) and then cloned into an entry vector (pDONR253; Thermo Fisher Scientific, Waltham, MA) using the Gateway cloning system. The full-length insert sequences, including the MOI and molecular barcode in each entry clone, were verified by Sanger sequencing. The entry clones were then used to generate the final constructs by recombining the insert fragments via an LR reaction into pDEST-318, a small pUC19-based, ampicillin-resistant holding vector. An example construct, pNF1_34041, is shown in Figure 1A. The control plasmid DNAs were purified using the GenElute XL kit (Sigma-Aldrich, St. Louis, MO) and quantitated by spectrophotometry using a NanoDrop 2000 (Thermo Fisher Scientific). The pertinent mutation information for the 69 control plasmids, the CPSG pool compositions, and the NGS assays used in this study are listed in Table 1. Because the 69 plasmids were constructed gradually over time, only two pools (CPSG13 and CPSG51) were made, each from the plasmids available when the corresponding characterization experiments began.
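Because the ACATCG barcode exists only in the plasmid inserts, reads that span it can be attributed to the spike-in rather than to the background genome, as in Figure 1B. The sketch below illustrates that idea with plain string matching on both strands. It is not the authors' pipeline: the helper names are hypothetical, and `variant_kmer` stands for an assumed short sequence spanning the MOI that a user would derive from the plasmid design.

```python
# Hypothetical sketch: classify reads by the presence of the CPSG molecular barcode.
BARCODE = "ACATCG"  # 6-bp barcode placed 5 to 20 bp from each MOI (from the text)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T/N sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def contains_either_strand(read: str, motif: str) -> bool:
    """True if `motif` occurs in the read on the forward or reverse strand."""
    read = read.upper()
    return motif in read or reverse_complement(motif) in read


def tally_reads(reads, variant_kmer: str, barcode: str = BARCODE) -> dict:
    """Count reads by whether they carry the variant k-mer and/or the barcode.

    `variant_kmer` is an assumed short sequence spanning the MOI; real
    analyses would normally inspect aligned reads instead of raw strings.
    """
    counts = {"variant+barcode": 0, "variant_only": 0, "barcode_only": 0, "neither": 0}
    for read in reads:
        has_variant = contains_either_strand(read, variant_kmer)
        has_barcode = contains_either_strand(read, barcode)
        if has_variant and has_barcode:
            counts["variant+barcode"] += 1
        elif has_variant:
            counts["variant_only"] += 1
        elif has_barcode:
            counts["barcode_only"] += 1
        else:
            counts["neither"] += 1
    return counts


if __name__ == "__main__":
    toy_reads = ["AAAGGGACATCGTT", "AAAGGGTTTT", "CCCACATCGCC", "TTTTTTTT"]
    print(tally_reads(toy_reads, variant_kmer="AAAGGG"))
```

A read that carries the variant but not the barcode may simply not span the barcode, or it may come from the background genome (as with the NA12878 polymorphism underlying pATM_41596), so string matching alone cannot distinguish these cases.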

Preparation of CPSG Samples

The workflow of CPSG DNA sample preparation is illustrated in Figure 2. Briefly, the control plasmids were linearized with a single-cut restriction enzyme within the vector backbone, as indicated in Table 1, and purified using the Qiagen PCR Cleanup kit (Qiagen, Valencia, CA). An aliquot of each was run on a Bioanalyzer DNA 7500 Chip (Agilent Technologies Inc., Santa Clara, CA) to verify complete digestion and the correct plasmid size. The purified linear plasmids were quantitated by spectrophotometry using a NanoDrop 2000, and the number of copies per microliter was calculated for each plasmid from its size. Genomic DNA extracted from FFPE cell pellets of hapmap CEPH NA12878 (Coriell Institute for Medical Research, Camden, NJ) was also quantitated by spectrophotometry on a NanoDrop 2000, and the number of genome copies per microliter was calculated. The quantitated plasmids were pooled at equal molar ratios, and the pooled plasmids were spiked into CEPH DNA at the indicated copy number ratios (plasmid versus hapmap genome). The compositions of the plasmid pools used in each of the studies below are given in Table 1.
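The copy-number arithmetic behind the pooling and spike-in steps can be sketched as follows. This is a minimal illustration, not the authors' worksheet. The average mass of 650 g/mol per base pair, the 3.3-pg mass of a haploid human genome, and the reading of the copy number ratio as copies of each plasmid per haploid genome copy are all assumptions, and the plasmid names, sizes, and concentrations in the usage lines are hypothetical.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one double-stranded base pair
HAPLOID_GENOME_PG = 3.3    # assumed mass (pg) of one haploid human genome


def plasmid_copies_per_ul(ng_per_ul: float, length_bp: int) -> float:
    """Copies per µL of a linearized plasmid from its concentration and size."""
    moles_per_ul = (ng_per_ul * 1e-9) / (length_bp * BP_MASS_G_PER_MOL)
    return moles_per_ul * AVOGADRO


def genome_copies_per_ul(ng_per_ul: float) -> float:
    """Haploid genome equivalents per µL of genomic DNA."""
    return ng_per_ul * 1e3 / HAPLOID_GENOME_PG  # ng -> pg, then divide by pg per genome


def equimolar_pool_volumes(stock_copies_per_ul: dict, copies_each: float) -> dict:
    """µL of each plasmid stock needed to contribute `copies_each` copies to the pool."""
    return {name: copies_each / conc for name, conc in stock_copies_per_ul.items()}


def spike_in_volume_ul(pool_copies_each_per_ul: float,
                       genome_copies_ul: float,
                       genome_volume_ul: float,
                       ratio: float) -> float:
    """µL of pool giving `ratio` copies of each plasmid per genome copy (assumed definition)."""
    return ratio * genome_copies_ul * genome_volume_ul / pool_copies_each_per_ul


if __name__ == "__main__":
    # Hypothetical stocks: name -> (ng/µL, linear size in bp)
    stocks = {"plasmid_A": (10.0, 3700), "plasmid_B": (12.0, 3800)}
    copies = {name: plasmid_copies_per_ul(c, size) for name, (c, size) in stocks.items()}
    volumes = equimolar_pool_volumes(copies, copies_each=1e9)
    pool_conc_each = 1e9 / sum(volumes.values())  # copies/µL of each plasmid in the pool
    genome = genome_copies_per_ul(50.0)           # hypothetical 50 ng/µL genomic DNA
    print(volumes)
    print(spike_in_volume_ul(pool_conc_each, genome, genome_volume_ul=100.0, ratio=0.25))
```

With these hypothetical numbers the required spike-in volume is far below a pipettable amount, which is why, in practice, a pool would typically be serially diluted before being added to the genomic DNA.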

Figure 2 Control plasmid spiked-in genome (CPSG) titration workflow. Schematic showing the procedure for generating a CPSG sample. Briefly, plasmids are linearized, quantified, pooled, and spiked into a background genome at a determined copy number ratio.

Description of NGS Assays

The CPSG DNA samples were characterized by three different NGS assays; the library preparation methods and sequencers used are described below. The National Cancer Institute's MPACT (NCI-MPACT) assay12 is a targeted amplicon sequencing assay using the AmpliSeq technology on the Personal Genome Machine (PGM) sequencer (Thermo Fisher Scientific). Briefly, 20 ng of CPSG pool DNA was used to generate the library by multiplex PCR using the NCI-MPACT custom amplicon panel and the Ion AmpliSeq Library Kit version 2.0 with barcode incorporation (Thermo Fisher Scientific). The libraries were quantified using the Ion Library Quantification Kit (Thermo Fisher Scientific), and 10 µL of a 10 pM library dilution was used for clonal amplification onto ion sphere particles using the Ion Template OT2 200 Kit (Thermo Fisher Scientific) on the Ion OneTouch 2 instrument before sequencing. Templated ion sphere particles were subjected to 500 flows of 200-bp bidirectional sequencing on an Ion Torrent PGM system using Ion 316 chips. All procedures were performed according to the manufacturer's instructions.

The TruSeq Custom Amplicon (TSCA) assay is based on Illumina's TSCA technology and is sequenced on the MiSeq sequencer (Illumina). The same list of mutations used to design the NCI-MPACT assay was submitted to the TSCA panel design website (Illumina) to build the NCI-MPACT TSCA panel (Supplemental Table S1). Libraries were prepared from 250 ng of CPSG according to the manufacturer's guidelines and quantified using the KAPA Library Quantification Kit (KAPA Biosystems, Wilmington, MA). After quantification, libraries were normalized to 4 nmol/L, pooled in equal volumes, denatured with 0.2 N NaOH, and diluted to 12 pM (MiSeq version 3 chemistry). The libraries were then sequenced on the MiSeq in 2 × 300 paired-end mode.

The whole exome sequencing (WES) assay uses the Agilent SureSelect XT Human All Exon version 5.0 baits (Agilent Technologies Inc.) on a HiSeq 2000 sequencer (Illumina). The library preparation and sequencing procedures followed the vendor's user manuals. For WES, 500 ng of CPSG was sheared to 150 to 200 bp using a Covaris E220 sonicator (Covaris, Woburn, MA). After cleanup with AMPure XP Beads (Beckman Coulter, Brea, CA), samples were checked for correct size distribution using a Bioanalyzer 2100 system (Agilent Technologies Inc.). These fragmented DNA samples were then processed to add sequencing adaptors, hybridized with a biotinylated RNA bait set (SureSelect XT Human All Exon version 5; Agilent Technologies Inc.), and enriched for the captured fragments. The AMPure XP-purified libraries were examined for size distribution (300 to 400 bp) using an Agilent Bioanalyzer and quantified using the KAPA Library Quantification Kit (KAPA Biosystems). A pooled library made by mixing the two final libraries at an equal molar ratio was clustered at 16 pM per flow cell lane using the Illumina cBot before sequencing on an Illumina HiSeq 2000 platform (Illumina). Sequencing reactions were run in 2 × 100 paired-end mode.

NGS Data Analysis and Bioinformatics

NGS data from the NCI-MPACT assay were analyzed with the Torrent Suite Software (TSS) version 4.4.2 (Thermo Fisher Scientific, Waltham, MA), which includes alignment and variant calling. The data analysis parameters recommended by the manufacturer were maintained, except that the necessary minimum number of variant reads for each variant type was increased to 25 reads (snp_min_coverage, indel_min_coverage, etc.) and the strand-specific error threshold was relaxed to 36% (sse_prob_threshold = 0.36), as per our clinical protocol. For variants called in flow space, VAF was calculated by the pipeline as the number of flow space alternate allele observations (FAOs) in the variant call format (VCF) file divided by the sum of FAOs and flow space reference allele observations (FROs) in the VCF [approximately equivalent to the flow evaluator read depth at the locus (FDP)]. For variants called by the long indel assembler module, VAF was calculated as the number of alternate allele observations (AOs) in the VCF divided by the sum of AOs and reference allele observations (ROs) in the VCF (approximately equivalent to the read depth). To assess the limit of detection, the cutoffs for the lowest allele frequency called by the pipeline were manually reduced to 1%. In addition, comparisons were run with versions 3.2.1, 4.0.2, and 4.4.2 of the pipeline to reveal performance improvements as the bioinformatics algorithms developed over time. NGS data from the TSCA assay were analyzed using the built-in MiSeq Reporter version 2.5 pipeline (Illumina) with the default parameters and cutoffs provided by the manufacturer. VAF was calculated by dividing the allelic depth in the VCF by the approximate read depth in the VCF. For WES data analysis, demultiplexed FASTQ files were generated from .bcl files with the Casava version 1.8.2 configureBclToFastq.pl script (Illumina). The multiple FASTQ files generated by this script were concatenated and primer trimmed using the ea-utils fastq-mcf tool with the options -l 30 -q 10 -u -P 33 to remove Illumina PCR and sequencing primers from the sequences. The trimmed sequences were mapped to the human reference genome hg19 using the Burrows-Wheeler Aligner version 0.6.2 in aln and sampe modes with default settings.13 The resulting SAM files were converted to BAM format, sorted, deduplicated, realigned, and base quality score recalibrated using samtools, Picard, and GATK tools,14 following the best practices guidelines version 3 on the GATK website (https://www.broadinstitute.org/gatk/guide/bestpractices?bpm=DNAseq, last accessed November 5, 2015). Variants were called using HaplotypeCaller within GATK version 3.3 with ploidy = 20 to increase sensitivity to the low allele frequency variants expected in the samples. VAF was calculated by dividing the allelic depth in the VCF by the approximate read depth in the VCF.
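The three VAF definitions described above can be expressed as small helper functions. This sketch assumes that the FAO/FRO and AO/RO counts are available as INFO keys (in some VCFs they are per-sample FORMAT fields instead) and that only the first ALT allele is of interest. The function names are hypothetical and are not part of TSS, MiSeq Reporter, or GATK.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column ("KEY=VAL;FLAG;...") into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        if item:
            key, _, value = item.partition("=")
            fields[key] = value  # flags are stored with an empty string
    return fields


def first_value(value: str) -> float:
    """First entry of a comma-separated VCF value (the first ALT allele)."""
    return float(value.split(",")[0])


def vaf_flow_space(info: dict) -> float:
    """TSS flow-space calls: FAO / (FAO + FRO)."""
    fao, fro = first_value(info["FAO"]), first_value(info["FRO"])
    return fao / (fao + fro)


def vaf_read_space(info: dict) -> float:
    """Long indel assembler calls: AO / (AO + RO)."""
    ao, ro = first_value(info["AO"]), first_value(info["RO"])
    return ao / (ao + ro)


def vaf_allelic_depth(format_keys: str, sample: str) -> float:
    """MiSeq Reporter / GATK style: alternate allelic depth (AD) / read depth (DP)."""
    values = dict(zip(format_keys.split(":"), sample.split(":")))
    alt_depth = int(values["AD"].split(",")[1])  # AD is "ref,alt[,alt2...]"
    return alt_depth / int(values["DP"])


if __name__ == "__main__":
    info = parse_info("FAO=250;FRO=750;AO=240;RO=760;HS")
    print(vaf_flow_space(info), vaf_read_space(info))
    print(vaf_allelic_depth("GT:AD:DP", "0/1:300,100:400"))
```

AD and DP are not necessarily computed from exactly the same set of reads, so AD/DP is an approximation of VAF, consistent with the text's description of the denominator as an approximate read depth.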
Because we were interested only in the variants present in the spiked-in plasmids, we used a custom BED file (Supplemental Table S2) containing 100 bp of flanking sequence around each target variant to limit variant calling to these regions (GATK option -L) and to decrease the time needed to call variants in the exome samples both before and after base recalibration. NGS data were visualized using the Integrative Genomics Viewer version 2.3.5615 and CLC Genomics Workbench version 8.0.2 (Qiagen). Statistical analysis was performed using the R statistical software suite,16 and graphs were generated using the R package ggplot2.17 This work used the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov, last accessed November 5, 2015).
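A BED file of this kind can be generated from the hg19 positions listed in Table 1. The sketch below assumes one 1-based position per variant and UCSC-style "chr" contig names. BED intervals are 0-based and half-open, so a 100-bp flank on each side of position p becomes the interval [p - 101, p + 100). Supplemental Table S2 itself is not reproduced here, and the function names are hypothetical.

```python
def flank_interval(position: str, flank: int = 100, prefix: str = "chr") -> str:
    """Convert "chrom:pos" (1-based) into a BED line covering pos +/- flank bp."""
    chrom, pos_text = position.split(":")
    pos = int(pos_text)
    start = max(0, pos - 1 - flank)  # BED start is 0-based
    end = pos + flank                # BED end is exclusive
    return f"{prefix}{chrom}\t{start}\t{end}"


def positions_to_bed(positions, flank: int = 100, prefix: str = "chr") -> str:
    """Build BED text with one interval per variant, in input order."""
    return "\n".join(flank_interval(p, flank, prefix) for p in positions) + "\n"


if __name__ == "__main__":
    # hg19 positions of pBRAF_476 and pKRAS_521 from Table 1
    print(positions_to_bed(["7:140453135", "12:25398283"]), end="")
```

Passing `prefix=""` would suit a reference that names contigs without "chr". A 100-bp flank also comfortably exceeds the span of the largest indels in Table 1 (a 15-bp deletion and a 12-bp insertion).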

Performance Comparison Study

To evaluate whether the efficiency of detecting variants in CPSG samples is comparable to that in FFPE genomic DNA, we conducted a parallel spike-in study using two pairs of sample series. In each pair, one series harbored a mutation contained in a plasmid, and the paired series contained the identical mutation in genomic DNA derived from FFPE-processed cell pellets (Figure 3A). The first pair, pBRAF_476 and the FFPE-prepared melanoma cell line MALME-3M, carries the same BRAF c.1799T>A SNV (ENST00000288602, p.V600E, COSM476). Both pBRAF_476 plasmid DNA and MALME-3M FFPE genomic DNA were spiked into FFPE CEPH genomic DNA at the same copy number ratios (50%, 25%, 12.5%, 6.25%, and 3.125%). The DNA samples of both series were sequenced in triplicate by the NCI-MPACT assay on the PGM platform, and regression lines of observed versus expected VAF were generated. The slopes of these regression lines were used as a marker of detection efficiency over the dilution series, and an analysis of covariance was performed to determine whether the slopes of the two series differed. The second pair of sample series carries the same APC c.4248delC indel (ENST00000457016, p.I1417fs*2, COSM18584), in the pAPC_18584 control plasmid and in the FFPE colon cancer cell line HCT-15. The starting titration point for this series was increased to 75%, followed by a similar twofold dilution series (ie, 50%, 25%, 12.5%, and 6.25%), to accommodate the 10% VAF cutoff in the default pipeline parameters.
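One standard way to carry out this slope comparison is an F-test for the interaction between expected VAF and sample series: a model with separate intercepts and separate slopes is compared against one with separate intercepts and a common slope. The pure-Python sketch below computes that F statistic. It illustrates the technique rather than reproducing the authors' R code, and the observed VAF values in the usage lines are invented for demonstration.

```python
def _centered_sums(x, y):
    """Return (Sxx, Sxy, Syy) about the means of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy


def slope(x, y) -> float:
    """Ordinary least-squares slope of y on x."""
    sxx, sxy, _ = _centered_sums(x, y)
    return sxy / sxx


def slope_difference_f(x1, y1, x2, y2):
    """F statistic (df = 1, n1 + n2 - 4) testing equal slopes of two regressions."""
    sxx1, sxy1, syy1 = _centered_sums(x1, y1)
    sxx2, sxy2, syy2 = _centered_sums(x2, y2)
    # Full model: separate intercepts and separate slopes.
    rss_full = (syy1 - sxy1 ** 2 / sxx1) + (syy2 - sxy2 ** 2 / sxx2)
    # Reduced model: separate intercepts, one pooled (common) slope.
    rss_common = (syy1 + syy2) - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)
    df_resid = len(x1) + len(x2) - 4
    if rss_full <= 0 or df_resid <= 0:
        raise ValueError("need residual variation and at least 5 points in total")
    f_stat = (rss_common - rss_full) / (rss_full / df_resid)
    return f_stat, (1, df_resid)


if __name__ == "__main__":
    expected = [50, 25, 12.5, 6.25, 3.125]    # titration points (%)
    plasmid = [51.0, 26.5, 12.0, 6.8, 3.0]    # invented observed VAFs
    cell_line = [49.0, 24.0, 13.1, 5.9, 3.4]  # invented observed VAFs
    print(slope(expected, plasmid), slope(expected, cell_line))
    print(slope_difference_f(expected, plasmid, expected, cell_line))
```

Triplicate runs can simply be concatenated into each series. If SciPy is available, `scipy.stats.f.sf(f_stat, 1, df_resid)` converts the statistic into a P value.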

Reproducibility Assessment

To assess the reproducibility of CPSG performance, a 13-plasmid pool named CPSG13, consisting of four SNVs, one SNV at HP, three indels, two indels at HP, and three large indels (Table 1), was made and spiked into CEPH (NA12878) hapmap genomic DNA at an estimated ratio of 25%. A large preparation of this material was aliquoted into 25-use tubes and stored frozen at -80°C until use. The CPSG13 sample was used as a positive control for the NCI-MPACT assay and characterized 90 times over 18 months by three operators (30 times each). The VAFs of the 13 mutations detected by the NCI-MPACT assay were plotted against the assay date to assess variation within and between operators across the 13 plasmids. A two-way analysis of variance was used to evaluate whether the VAFs measured by the three operators differed significantly after adjusting for differences among plasmids.
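For a balanced design like this one, where every operator assays every plasmid the same number of times, the operator F statistic from an additive two-way analysis of variance can be computed directly from group means. The sketch below assumes an additive model (operator + plasmid, with no interaction term) and a balanced layout. It is a hypothetical illustration rather than the authors' R analysis, and the four records in the usage line are invented.

```python
from collections import defaultdict


def _ss_between(groups: dict, grand_mean: float) -> float:
    """Sum over groups of n_g * (group mean - grand mean)^2."""
    return sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values())


def operator_f_statistic(records):
    """F for the operator effect in an additive two-way ANOVA (operator + plasmid).

    `records` is an iterable of (operator, plasmid, vaf). The design must be
    balanced: every operator x plasmid cell has the same number of replicates.
    """
    records = list(records)
    n = len(records)
    grand = sum(v for _, _, v in records) / n
    by_operator, by_plasmid, cells = defaultdict(list), defaultdict(list), defaultdict(int)
    for operator, plasmid, vaf in records:
        by_operator[operator].append(vaf)
        by_plasmid[plasmid].append(vaf)
        cells[(operator, plasmid)] += 1
    a, b = len(by_operator), len(by_plasmid)
    if len(cells) != a * b or len(set(cells.values())) != 1:
        raise ValueError("design is not balanced")
    ss_total = sum((v - grand) ** 2 for _, _, v in records)
    ss_operator = _ss_between(by_operator, grand)
    ss_plasmid = _ss_between(by_plasmid, grand)
    ss_resid = ss_total - ss_operator - ss_plasmid  # interaction pooled into error
    df_operator, df_resid = a - 1, n - a - b + 1
    f_stat = (ss_operator / df_operator) / (ss_resid / df_resid)
    return f_stat, (df_operator, df_resid)


if __name__ == "__main__":
    demo = [("A", "P", 10.0), ("A", "Q", 20.0), ("B", "P", 12.0), ("B", "Q", 24.0)]
    print(operator_f_statistic(demo))  # prints (9.0, (1, 1))
```

In the study design described above, each operator x plasmid cell would hold 30 replicates, and the residual term then absorbs any operator-by-plasmid interaction.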

Figure 3 Parallel spike-in study strategy. A: Schematic diagram of the parallel spike-in study. A pair of serially diluted sample sets was made by spiking control plasmid pBRAF_476, or genomic DNA from cell line MALME-3M carrying the same BRAF V600E mutation, into hapmap genomic DNA at the same titration points, followed by sequencing and regression analysis of expected versus observed allele frequency. B: Scatterplot and regression analysis for pBRAF_476/MALME-3M and pAPC_18584/HCT15. Linear regression models were fit for each. FFPE, formalin-fixed, paraffin-embedded; PGM, Personal Genome Machine; VAF, variant allele frequency.

Assessment of the Effect of Data Analysis Pipelines on NGS Results

To assess the effect of different NGS data analysis pipelines on variant calling, a pool of 51 control plasmids, named CPSG51, was spiked into hapmap CEPH (NA12878) genomic DNA at five titration points: 50%, 25%, 12.5%, 6.25%, and 3.125% (Table 1). CPSG51 samples were sequenced by the NCI-MPACT assay, and the same raw NGS data (ie, pre-base-called, unaligned sequence) were analyzed by TSS versions 3.2.1, 4.0.2, and 4.4.2. The default limit of detection for SNVs and indels in all three pipelines was manually reduced to 1% so that variants could be called at lower allele frequencies, thereby establishing the low end of detection for each mutation in each software version. pATM_41596 (c.5557G>A, p.D1853N) was excluded from the calculation because it is a naturally occurring heterozygous single-nucleotide polymorphism (rs1801516) in CEPH NA12878, which inflates the observed VAF relative to the expected VAF of the plasmid alone. Plasmid pPDGFRA_736 (c.2525A>T, p.D842V) was also excluded because of a dilution error during CPSG51 preparation. The data from each pipeline were used to calculate its variant detection rate, defined as the number of detected variants divided by the total number of variants spiked in at each titration point, expressed as a percentage, for the 49 mutations in four mutation types. The 95% CI was estimated using the Clopper and Pearson method.16,18
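The detection-rate definition above amounts to counting, for each titration point and mutation type, how many spiked-in MOIs were called. A minimal sketch follows, assuming the calls have already been reduced to a set of detected plasmid names per titration point. The data structures and names are hypothetical; only the two exclusions come from the text.

```python
from collections import defaultdict

# Plasmids excluded from the CPSG51 detection-rate calculation (see text).
EXCLUDED = frozenset({"pATM_41596", "pPDGFRA_736"})


def detection_rates(spiked: dict, called: dict, excluded=EXCLUDED) -> dict:
    """Detection rate per (titration point, mutation type).

    spiked: plasmid name -> mutation type (eg, "SNV", "Indel at HP")
    called: titration point (%) -> set of plasmid names whose MOI was called
    Returns (titration, type) -> (detected, total, percent detected).
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for titration, detected in called.items():
        for plasmid, mutation_type in spiked.items():
            if plasmid in excluded:
                continue
            key = (titration, mutation_type)
            totals[key] += 1
            if plasmid in detected:
                hits[key] += 1
    return {k: (hits[k], totals[k], 100.0 * hits[k] / totals[k]) for k in totals}


if __name__ == "__main__":
    spiked = {"pBRAF_476": "SNV", "pKRAS_521": "SNV",
              "pNF1_34041": "Indel at HP", "pATM_41596": "SNV at HP"}
    called = {50.0: {"pBRAF_476", "pKRAS_521", "pNF1_34041", "pATM_41596"},
              3.125: {"pBRAF_476"}}  # invented calls for illustration
    for key, value in sorted(detection_rates(spiked, called).items()):
        print(key, value)
```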

Performance Comparison of Three NGS Assays on Different Platforms

CPSG51 series samples were used to assess the limit of detection of three NGS assays run on different platforms: the AmpliSeq/PGM-based NCI-MPACT assay, the TruSeq Custom Amplicon/MiSeq-based TSCA assay, and the SureSelect Human All Exon v5 baits/HiSeq 2000-based WES assay. The following 17 plasmids in CPSG51 were excluded from the three-platform comparison: pAPC_18561, pATR_20627, pEGFR_6224, pERBB2_682, pERCC1_140843, pGABRA6_70853, pGNAQ_28758, pGNAS_27887, pJAK2_12600, pKIT_1314, pMTOR_94356, pNBN_35664, pNRAS_584, pPARP2_75849, pPTEN_5152, pRET_965, and pTP53_10662. These plasmids could not be detected by the locked TSCA design and data analysis methods, either because the MOI was outside the region covered by the custom TSCA panel or because the molecular barcode resided within the library primer binding region and interfered with detection (see Using CPSG to Access Performance of NGS Assays Designed for Different Platforms). For the reasons given above, pATM_41596 and pPDGFRA_736 were also excluded. Therefore, only 32 plasmids were used in all three assays for the interassay comparison. The data from each assay were used to calculate variant detection rates, as defined above, for the 32 mutations in four mutation types. The 95% CI was estimated using the Clopper and Pearson method.16,18

Results

Construction of Control Plasmids and Function of Molecular Barcode

All 69 control plasmids were successfully constructed and manufactured with a mean yield >100 mg. Given a mean size of […] control plasmid and genomic DNA samples (Figure 3B). Similar slope values (plasmid = 1.0516; cell line = 0.9864) were observed for the two species. An analysis of covariance testing for a difference between the slopes found no evidence that the plasmid and cell line slopes differed significantly (F = 1.03, P = 0.321). A similar analysis of a deletion mutation, APC c.4248delC, found nearly identical slopes (plasmid = 1.0967; cell line = 1.0643); again, the two slopes did not differ significantly (F = 1.02, P = 0.321). These results indicate that CPSG performs very similarly to FFPE genomic DNA in detecting different types of mutations over a wide range of allele frequencies.
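The slope comparison above can be illustrated with the standard homogeneity-of-slopes F test from analysis of covariance: fit one straight line per species, refit with a single shared slope (keeping separate intercepts), and compare residual sums of squares. This is a minimal sketch that assumes ordinary least-squares fits of observed on expected VAF; the authors' exact model (for example, any log transformation of the axes) is not described in this section, and the function name and example data are hypothetical.

```python
from statistics import fmean


def _centered_sums(xs, ys):
    """Centered sums of squares and cross-products for one group."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def slope_homogeneity_f(group_a, group_b):
    """ANCOVA-style F statistic (df = 1, N - 4) for H0: both groups share one slope.

    Each group is an (xs, ys) pair, e.g. expected vs observed VAF.
    Returns (slope_a, slope_b, F).
    """
    (xa, ya), (xb, yb) = group_a, group_b
    sxx_a, sxy_a, syy_a = _centered_sums(xa, ya)
    sxx_b, sxy_b, syy_b = _centered_sums(xb, yb)
    # Full model: separate intercepts and separate slopes.
    rss_full = (syy_a - sxy_a**2 / sxx_a) + (syy_b - sxy_b**2 / sxx_b)
    # Reduced model: separate intercepts, one pooled within-group slope.
    rss_reduced = syy_a + syy_b - (sxy_a + sxy_b) ** 2 / (sxx_a + sxx_b)
    df_resid = len(xa) + len(xb) - 4
    if rss_full <= 0 or df_resid <= 0:
        raise ValueError("need residual variation and at least 5 points in total")
    f_stat = (rss_reduced - rss_full) / (rss_full / df_resid)
    return sxy_a / sxx_a, sxy_b / sxx_b, f_stat
```

The P value is the upper tail of the F distribution with 1 and N − 4 degrees of freedom, obtainable, for example, with `scipy.stats.f.sf(F, 1, N - 4)`.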

Assessment of Reproducibility of CPSG Samples

Because these CPSG samples were intended to be used as internal controls to monitor assay performance over time, we assessed whether different operators could detect the mutations in these plasmids reproducibly at different times. All 13 mutations in CPSG13 were detected in each of 90 runs performed by three operators during an 18-month period (Figure 4A). Aggregate data for each of the 13 plasmids for each operator are given in Table 2, together with the VAF SD for each plasmid. A two-way analysis of variance revealed only minimal differences in the detected VAFs among the three operators (F = 1.54, P = 0.040). Across the 90 replicates, the mean detected VAF of the 13 mutations was 25.55%, with a mean SD of approximately 5.4% VAF. Because of difficulties in quantifying these materials, the VAFs observed for the individual plasmids spanned a large range, which does not seem to correlate with the type of variant being detected (Table 3). Analysis of the CPSG13 DNA samples over 18 months revealed only minor changes in the VAF of each of the 13 plasmids (mean = 25.55%, SD = 1.40%) (Figure 4B), well within the expected normal variance of an NGS assay.9 These results indicate that CPSG material is highly stable and that variant detection is highly reproducible among different operators over a long period. In addition, subjecting the material to at least 25 freeze/thaw cycles during the testing period had no observable effect on variant detection or allele frequencies. Taken together, these data indicate that CPSG samples are reliable, stable positive controls for monitoring the performance of NGS assays.
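Summaries like those in Table 2 start from a per-run VAF, the fraction of reads at the locus that support the variant. The following minimal sketch computes that VAF and then the per-plasmid mean and SD across replicate runs; the function names, plasmid names, and read counts are hypothetical, and the sketch ignores the operator factor that the two-way analysis of variance accounts for.

```python
from statistics import fmean, stdev


def vaf_percent(alt_reads, total_depth):
    """Variant allele frequency (%): variant-supporting reads / all reads at the locus."""
    if total_depth <= 0:
        raise ValueError("total_depth must be positive")
    return 100.0 * alt_reads / total_depth


def summarize_replicates(replicates):
    """Per-plasmid (mean VAF %, SD) across replicate runs.

    replicates maps a plasmid name to a list of (alt_reads, total_depth) pairs,
    one per run; at least two runs are needed to compute an SD.
    """
    summary = {}
    for plasmid, runs in replicates.items():
        vafs = [vaf_percent(alt, depth) for alt, depth in runs]
        summary[plasmid] = (fmean(vafs), stdev(vafs))
    return summary


def pooled_stats(summary):
    """Mean of the per-plasmid mean VAFs and mean of the per-plasmid SDs."""
    means = [m for m, _ in summary.values()]
    sds = [s for _, s in summary.values()]
    return fmean(means), fmean(sds)


if __name__ == "__main__":
    runs = {  # hypothetical read counts, not data from this study
        "pEXAMPLE_1": [(25, 100), (30, 100), (20, 100)],
        "pEXAMPLE_2": [(200, 1000), (300, 1000)],
    }
    print(pooled_stats(summarize_replicates(runs)))
```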

Figure 4 CPSG13 reproducibility during 18 months with three operators. A: CPSG13 was sequenced 90 times with the National Cancer Institute's MPACT (NCI-MPACT) assay by three operators (OP1, OP2, and OP3) during an 18-month period. Boxplots show the variant allele frequency (VAF) distribution binned by plasmid for each operator. The dashed line marks the 25% VAF to which the plasmids were intended to be titrated. B: The mean VAF across the 13 plasmids and three operators is plotted as a green solid line over 18 months to evaluate the variability and stability of the material. The blue dashed line represents the expected 25% VAF, and the red line represents a regression model fitted through the series. The gray shading represents the range of VAFs for the 13 plasmids at each test date on the x axis.

Using CPSG to Assess Performance of Different Data Analysis Pipelines

It is known that different data analysis pipelines can markedly affect the results of variant calling, yet it is difficult to evaluate which pipelines are more accurate because well-characterized calibrator materials are lacking. To demonstrate the value of CPSG as a standard for assessing the performance of NGS data analysis pipelines, the same prealigned NGS data generated by sequencing CPSG51 samples with the NCI-MPACT assay were analyzed with three sequential versions of the TSS data analysis pipeline: 3.2.1, 4.0.2, and 4.4.2. The detection rates at each titration point for 49 plasmids in CPSG51 (excluding pATM_41596 and pPDGFRA_736, as described above) were used to evaluate the performance of the three pipelines. The overall detection rates for all 49 plasmids at each of the five titration points are given in Table 4. A tile plot was generated for each of the three pipelines (Figure 5) to show how each plasmid performed as the alignment and variant calling pipelines were developed. The detection rates for each variant type by the three pipeline versions are summarized in Supplemental Table S3. In general, the three pipelines performed similarly for the SNV, SNV at HP, and large indel variant types. However, improvements in TSS versions 4.0.2 and 4.4.2 made them more effective than the earlier TSS version 3.2.1 at calling three of the five indel-at-HP variants at lower titration points. Two variants appear to be challenging for all versions of the pipeline: an SNV in pAKT1_33765 located 1 bp from the end of the amplicon (Supplemental Figure S1) and a 1-bp insertion in pAPC_18561 within a long homopolymeric region of six consecutive adenosine residues (Supplemental Figure S2). These results indicate that CPSG samples can serve as powerful calibrators for assessing how well data analysis pipelines detect and call different types of variants, and can identify weaknesses in these pipelines, facilitating further improvement of these algorithms.
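A tile plot such as Figure 5 is a plasmid × titration matrix of detected/not-detected calls, and the counts in Table 4 are its column totals. The sketch below shows that tally and a plain-text rendering of the tiles; it assumes each pipeline's output has already been reduced to one boolean per plasmid and titration point, and the function names and example calls are hypothetical.

```python
TITRATIONS = (50, 25, 12.5, 6.25, 3.125)  # CPSG51 titration points (%)


def detection_table(calls):
    """Tally a detected/not-detected tile matrix into per-titration detection rates.

    calls maps a plasmid name to a tuple of booleans, one per titration point
    (True = variant called, a red tile; False = not called, a gray tile).
    Returns a list of (titration, detected, total, rate_percent).
    """
    total = len(calls)
    rows = []
    for i, titration in enumerate(TITRATIONS):
        detected = sum(row[i] for row in calls.values())  # True counts as 1
        rows.append((titration, detected, total, 100.0 * detected / total))
    return rows


def render_tiles(calls):
    """Plain-text tile plot: '#' = detected, '.' = not detected."""
    width = max(len(name) for name in calls)
    return "\n".join(
        f"{name:<{width}} " + "".join("#" if hit else "." for hit in row)
        for name, row in calls.items()
    )
```

Applying `detection_table` separately to each pipeline's call matrix yields one column group of a table like Table 4; the CIs would then be added per row.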

Using CPSG to Assess Performance of NGS Assays Designed for Different Platforms

In addition to the AmpliSeq/PGM-based NCI-MPACT assay, the same CPSG51 series was used to evaluate the performance of two other NGS assays based on different chemistries and sequencing platforms: the MiSeq-based TSCA assay and the HiSeq-based WES assay. A tile plot indicating detection of the plasmids on each of the three platforms was generated (Figure 6). With the TSCA assay, variants in 32 plasmids were detected at the expected VAF (Figure 6), but variants in 15 plasmids (pMTOR_94356, pRET_965, pPTEN_5152, pPARP2_75849, pERBB2_682, pERCC1_140843, pGNAS_27887, pATR_20627, pKIT_1314, pAPC_18561, pGABRA6_70853, pEGFR_6224, pNBN_35664, pJAK2_12600, pGNAQ_28758) were either not detected or detected with VAFs […] T SNV also occurs within the 5′ library primer. Therefore, neither of these two variants could be detected using this panel because the MOI would be masked by the library primer. To obtain an informative comparison among the three platforms, we removed these 15 plasmids from the analysis, leaving 32 that were detected consistently across all three platform-based assays. The WES assay detected all 32 variants in the control plasmids at the higher titration points. One plasmid, pNPM1_17559, was missed at the 12.5% titration point; a variant in pTP53_10648 was missed at the 6.25% titration point; and six plasmids, pFGFR3_715, pIDH1_28747, pMLH1_26085, pPTEN_5809, pNPM1_17559, and pPIK3CA_763, were missed at the 3.125% titration point (Figure 6). Further analyses suggest that these low-VAF variants were missed because of low read depth within the targeted regions. On average, the WES assay produced a read depth of 176.5× in targeted regions. At this mean depth, the number of sequencing reads carrying the variant alleles derived from the spiked-in plasmids falls to single digits at the lowest titration points, and the default cutoffs of the data analysis pipelines prevent these variants from being called at such low variant allele frequencies. Overall, detection rates for the 32 mutations were highly concordant among the three assay platforms over the five titration points. These data indicate that the plasmids are detected with similar sensitivity across the three platforms, whether summarized over the 32 plasmids (Table 5) or by variant type (Supplemental Table S4).
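The read-depth argument can be made concrete with a simple binomial sampling model. This sketch assumes that each titration point equals the expected VAF and that reads sample variant and reference alleles independently; `MIN_ALT_READS` is a hypothetical caller threshold, not the WES pipeline's actual default, and the function names are illustrative.

```python
from math import comb

MEAN_WES_DEPTH = 176.5  # mean read depth in targeted regions reported for the WES assay
TITRATIONS = (50, 25, 12.5, 6.25, 3.125)  # treated here as expected VAFs (%)


def expected_alt_reads(depth, vaf_percent):
    """Expected number of variant-supporting reads at a locus."""
    return depth * vaf_percent / 100.0


def prob_at_least(k, depth, vaf_percent):
    """P(X >= k) for X ~ Binomial(depth, vaf): chance that at least k reads carry the variant.

    Assumes each read independently samples the plasmid (variant) or genomic (reference) allele.
    """
    p = vaf_percent / 100.0
    return 1.0 - sum(comb(depth, i) * p**i * (1 - p) ** (depth - i) for i in range(k))


if __name__ == "__main__":
    MIN_ALT_READS = 5  # hypothetical threshold, for illustration only
    depth = round(MEAN_WES_DEPTH)  # the binomial model needs an integer depth
    for t in TITRATIONS:
        print(f"{t:>6}%  expected alt reads {expected_alt_reads(MEAN_WES_DEPTH, t):6.2f}  "
              f"P(>= {MIN_ALT_READS}) {prob_at_least(MIN_ALT_READS, depth, t):.3f}")
```

At the 3.125% point, the reported mean depth implies only about 5.5 variant-supporting reads (176.5 × 0.03125), consistent with the single-digit read counts noted above; actual per-locus depth varies around the mean, so some loci fall well below it.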

Table 4 Comparison of Detection Rates across Three Data Analysis Pipelines

Each pipeline column gives the No. of detected mutations; detection rate, % (95% CI).

Titration, %   Total No. of mutations   TSS version 3.2.1          TSS version 4.0.2          TSS version 4.4.2
50             49                       48; 97.96 (89.15–99.95)    47; 95.92 (86.02–99.50)    47; 95.92 (86.02–99.50)
25             49                       46; 93.88 (83.13–98.72)    47; 95.92 (86.02–99.50)    46; 93.88 (83.13–98.72)
12.5           49                       43; 87.76 (75.23–95.37)    46; 93.88 (83.13–98.72)    47; 95.92 (86.02–99.50)
6.25           49                       42; 85.71 (72.76–94.06)    46; 93.88 (83.13–98.72)    47; 95.92 (86.02–99.50)
3.125          49                       42; 85.71 (72.76–94.06)    45; 91.84 (80.40–97.73)    46; 93.88 (83.13–98.72)

TSS, Torrent Suite Software.

Figure 5 Comparison of the three CPSG51 data analysis pipelines. CPSG51 was sequenced by the National Cancer Institute's MPACT assay, and the same raw data were analyzed with Torrent Suite Software (TSS) versions 3.2.1, 4.0.2, and 4.4.2. Tile plots indicating whether a variant was detected (red box) or not (gray box) were generated for each of the three pipeline versions. Each row represents a mutation in a control plasmid, and each column represents a titration point. The plasmid name and the variant type (in parentheses) are indicated on the left. HP, homopolymeric region; indel, insertion/deletion; SNV, single-nucleotide variant.

Figure 6 Comparison of the three CPSG51 assay platforms. A 32-plasmid subset detectable across three different next-generation sequencing chemistries and platforms was compared. A tile plot was generated for each of the three platforms, indicating whether each variant was detected (red box) or not (gray box). Each row represents a mutation in a control plasmid, and each column represents a titration point. The plasmid name and the variant type (in parentheses) are indicated on the left. HP, homopolymeric region; indel, insertion/deletion; TSCA, TruSeq Custom Amplicon; SNV, single-nucleotide variant; WES, whole exome sequencing.

Table 5 Comparison of Detection Rates of 32 Plasmids across Three Platforms

Each platform column gives the No. of detected mutations; detection rate, % (95% CI).

Titration, %   Total No. of mutations   NCI-MPACT                  TSCA                        WES
50             32                       31; 96.88 (82.00–99.84)    32; 100.00 (86.66–100.00)   32; 100.00 (86.66–100.00)
25             32                       31; 96.88 (82.00–99.84)    32; 100.00 (86.66–100.00)   32; 100.00 (86.66–100.00)
12.5           32                       31; 96.88 (82.00–99.84)    32; 100.00 (86.66–100.00)   31; 96.88 (82.00–99.84)
6.25           32                       31; 96.88 (82.00–99.84)    32; 100.00 (86.66–100.00)   31; 96.88 (82.00–99.84)
3.125          32                       31; 96.88 (82.00–99.84)    32; 100.00 (86.66–100.00)   26; 81.25 (62.96–92.14)

NCI, National Cancer Institute; TSCA, TruSeq Custom Amplicon; WES, whole exome sequencing.

Discussion

To date, there are no widely accepted multianalyte standards or controls for clinical NGS assays. Well-characterized and readily available multianalyte controls would be valuable for a variety of applications of clinical NGS assays used for oncology patient diagnosis and treatment selection. They can be used as routine run controls to assess the analytical performance of a given run or reagent lot, and to assess assay and reagent analytical performance over time and across operators, laboratories, and instruments. These materials may even serve as controls for the validation of clinical NGS assays in the absence of samples with hard-to-find mutations, as was reported in a recent 510(k) premarket notification submitted to the Food and Drug Administration by Illumina for its MiSeqDx Cystic Fibrosis 139-Variant Assay panel. We designed and generated a plasmid-based calibrator and quality control material by constructing 69 control plasmids that contain mutations frequently occurring in tumors and represent many different types of variants (SNVs, small indels, large indels, and variants located within or near homopolymeric sequence) (Table 1). We found that the performance of the synthetic plasmids is nearly identical to that of endogenous variants in genomic DNA derived from FFPE cell lines (Figure 3B). Our data indicate that CPSG samples are highly stable and generate reproducible results when assayed by different operators over a long period (Figure 4). Importantly, we illustrated the utility of CPSG samples in evaluating different data analysis pipelines (Figure 5) and in comparing the performance of NGS assays designed for different NGS platforms (Figure 6).

To our knowledge, our work represents the first example of a highly multiplexed, multianalyte control material suitable for clinical NGS assays. Although only two plasmid pools (CPSG13 and CPSG51) were used to establish proof of principle and demonstrate the utility of the 69 control plasmids, we believe that these results are representative and that larger panels should perform similarly. Compared with homologous recombination–mediated site-specific mutagenesis of cell line genomes, CPSG materials offer a far less expensive, less time-consuming, and more straightforward approach to manufacturing high-quality controls for NGS assays, with no limitation on the number and types of variants. Because each plasmid is much smaller than a human genome, the mass of each plasmid spiked into a reference genome is negligible, allowing a very large number of unique targets to be spiked into a single genome at varying quantities. Plasmid-based panels could include clinically relevant, frequently occurring, and rare mutations in tumors with actionable value. In contrast to the cell line genomic DNA blending approach, in which the highest achievable allele fraction is inherently limited, the allele fraction of every mutation in a CPSG pool can be set flexibly over a very wide range. In addition, genetic instability is often problematic in transformed cancer-derived cell lines,19,20 whereas plasmids offer clonal selection and stability. The HapMap genomic DNA (NA12878) selected as the background for the plasmid spike-ins is derived from normal cells, is extremely well characterized by multiple sequencing platforms, and is available as a certified reference material from the National Institute of Standards and Technology.7 In addition, our data clearly indicate that CPSG is applicable to a variety of frequently used library construction methods, sequencing chemistries, and sequencers. Taken together, we believe that its advantages in cost, turnaround time, scalability, flexibility, stability, and applicability make CPSG an ideal control material for NGS assays.

The 6-bp molecular barcode sequence was originally designed to function as a distinguishable marker and to make it possible to spike the control plasmid directly into a clinical specimen. Although these barcodes served as distinguishable markers of the mutations carried by the control plasmids (Figure 1B), this sequence also created difficulty for NGS informatics pipelines, especially when the mutation was located at the end of an amplicon (Supplemental Figure S1). We also experienced either a failure to detect or a severely reduced VAF for several mutations when using our TSCA panel, which we originally thought might be due to an inability of the alignment and variant calling pipeline to map this sequence. However, after examining the resulting reads in the BAM file in depth and identifying the locations of the target-specific library primers, we learned that the difficulty was actually due to a less than ideal panel design in which the molecular barcode was located within the primer binding region. This reduced the robustness of amplification of these variant-containing sequence reads (Supplemental Figures S3 and S4). Because the TSCA assay panel was designed after the control plasmids had been designed and synthesized, the interference of the molecular barcode with the assay chemistry and its effect on target selection were neither anticipated nor discovered until the results were obtained. Because the MiSeq Reporter pipeline, like most alignment and variant calling pipelines, routinely soft clips and/or trims sequences, it is unknown whether these variants would have been detected even in the absence of the molecular barcode sequence. Aggressive sequence trimming, perhaps induced by the presence of the molecular barcode, led to strand bias in almost all of these cases (Supplemental Figures S3 and S4). These problems can likely be rectified by moving the primers farther away from and downstream of the MOI, or by moving the molecular barcode toward the center of the amplicon, on the other side of the MOI. These two issues highlight how useful such material is, in lieu of samples with rare variants, for ensuring not only that the chemistry is suited to identifying the mutations of interest but also that the pipeline is performing as well as possible. We also observed that in many cases the observed VAF deviated substantially (by up to 10.5%) from the expected 25% titration point in CPSG13 samples (Table 3), and we speculate that this could result from the inaccurate quantification method we used. Better upfront quantification methods, such as digital PCR, may allow more precise quantification of each plasmid species before pooling, which may result in more similar observed VAFs across the plasmids to be detected.
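Two quantitative points from this Discussion, the negligible mass of the spiked plasmids and the sensitivity of observed VAF to plasmid quantification, can be checked with back-of-the-envelope arithmetic. This sketch rests on assumptions not stated in this section: mutant-only plasmids added to a diploid genome, equal capture or amplification efficiency for plasmid and genomic templates, a haploid genome of about 3.1 Gb, and a hypothetical plasmid size of 4 kb. The function names and the copies-per-genome parameterization are illustrative only.

```python
HAPLOID_GENOME_BP = 3.1e9  # approximate haploid human genome size (assumption)


def expected_vaf(plasmid_copies, germline_alt_copies=0, ploidy=2):
    """Expected VAF (fraction) at a locus after spiking in mutant-only plasmid.

    plasmid_copies: plasmid copies added per genome (hypothetical parameterization).
    germline_alt_copies: genomic copies already carrying the variant
    (1 for a heterozygous SNP). Assumes equal efficiency for plasmid and genomic templates.
    """
    return (plasmid_copies + germline_alt_copies) / (plasmid_copies + ploidy)


def copies_for_vaf(target_vaf, ploidy=2):
    """Plasmid copies per genome giving target_vaf where the genome has no variant.

    Solves P / (P + ploidy) = f for P.
    """
    if not 0 <= target_vaf < 1:
        raise ValueError("target_vaf must be in [0, 1)")
    return ploidy * target_vaf / (1 - target_vaf)


def plasmid_mass_fraction(n_plasmids, copies_per_genome, plasmid_bp, ploidy=2):
    """Total spiked plasmid mass relative to genomic DNA mass (dsDNA mass scales with length)."""
    return n_plasmids * copies_per_genome * plasmid_bp / (ploidy * HAPLOID_GENOME_BP)
```

Under these assumptions, a 25% target needs about 2/3 plasmid copy per genome; if the plasmid were overquantified so that 50% more copies were added (1 copy per genome), the observed VAF would be about 33% rather than 25%. A site that is already heterozygous in the background genome, such as rs1801516 for pATM_41596, would read about 62.5% instead of 25%. Even at the 50% titration point (2 copies per genome), 51 plasmids of 4 kb would add only about 0.007% of the genomic DNA mass.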
To add further value to these materials, we have designed plasmids containing specific gene fusion transcripts that can be transcribed in vitro for RNA-seq experiments, and we have identified the possibility of using the relative copy number of plasmids as calibration standards for CNV assays and for circulating tumor DNA variant detection. Circulating tumor DNA is an area of keen research interest because data support the quantitative assessment of circulating tumor DNA as a marker of therapy response and disease progression.21,22 CPSG could be a good calibration control for circulating tumor DNA assessment because VAFs in this setting are often ultralow, and the allele fractions in CPSG can be titrated over a wide range. For the NGS-based diagnostics field to progress, it is imperative that assay control and calibration standards be made available. On the basis of the nearly identical analytical performance of CPSG and genomic DNA, we have established a reagent that can act as a control material for already developed assays, as a test material for the development of new assays, informatics pipelines, and algorithms, and for routine testing of new reagent lots and informatics pipelines. We recognize that the CPSG material would not be useful as a full process control from tissue biopsy to final clinical report, because preanalytical procedures, such as tissue fixation and embedding, tumor enrichment, and nucleic acid extraction, are outside the scope of the CPSG material. However, the use of control materials such as CPSG would eliminate the need to find and consume limited tumor specimens and materials from laboratory archives for assay and pipeline development and improvement, saving these clinical materials for more informative uses, such as analytical validation and proficiency testing. Furthermore, this would allow a common set of materials to be used across sites for standardization testing of clinical assay validation results.


Acknowledgments

We thank the core facilities in Frederick National Laboratory for Cancer Research: the Protein Expression Laboratory for support in constructing and preparing the control plasmids, the Laboratory of Molecular Technology for Sanger sequencing support, and the Advanced Biomedical Computation Center for computation support.

D.J.S., P.M.W., and C.-J.L. conceived and designed the study; D.J.S., T.D.F., M.G.M., P.M.W., and C.-J.L. developed the methodology; D.J.S., R.D.H., T.D.F., B.D., K.N.H., P.M.M., C.E.C., and C.H.B. acquired data; D.J.S., R.D.H., T.D.F., B.D., P.M.W., and C.-J.L. acquired data; D.J.S., R.D.H., T.D.F., B.D., B.A.C., J.H.D., P.M.W., and C.-J.L. wrote, reviewed, and revised the manuscript.

Supplemental Data

Supplemental material for this article can be found at http://dx.doi.org/10.1016/j.jmoldx.2015.11.008.

References

1. Schrijver I, Aziz N, Farkas DH, Furtado M, Gonzalez AF, Greiner TC, Grody WW, Hambuch T, Kalman L, Kant JA, Klein RD, Leonard DG, Lubin IM, Mao R, Nagan N, Pratt VM, Sobel ME, Voelkerding KV, Gibson JS: Opportunities and challenges associated with clinical diagnostic genome sequencing: a report of the Association for Molecular Pathology. J Mol Diagn 2012, 14:525–540
2. Xuan J, Yu Y, Qing T, Guo L, Shi L: Next-generation sequencing in the clinic: promises and challenges. Cancer Lett 2013, 340:284–295
3. Tran B, Dancey JE, Kamel-Reid S, McPherson JD, Bedard PL, Brown AM, Zhang T, Shaw P, Onetto N, Stein L, Hudson TJ, Neel BG, Siu LL: Cancer genomics: technology, discovery, and translation. J Clin Oncol 2012, 30:647–660
4. Andre F, Mardis E, Salm M, Soria JC, Siu LL, Swanton C: Prioritizing targets for precision cancer medicine. Ann Oncol 2014, 25:2295–2303
5. Rehm HL, Bale SJ, Bayrak-Toydemir P, Berg JS, Brown KK, Deignan JL, Friez MJ, Funke BH, Hegde MR, Lyon E: ACMG clinical laboratory standards for next-generation sequencing. Genet Med 2013, 15:733–747
6. Gargis AS, Kalman L, Berry MW, Bick DP, Dimmock DP, Hambuch T, et al: Assuring the quality of next-generation sequencing in clinical laboratory practice. Nat Biotechnol 2012, 30:1033–1036
7. Zook JM, Chapman B, Wang J, Mittelman D, Hofmann O, Hide W, Salit M: Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol 2014, 32:246–251
8. Pant S, Weiner R, Marton MJ: Navigating the rapids: the development of regulated next-generation sequencing-based clinical trial assays and companion diagnostics. Front Oncol 2014, 4:78
9. Singh RR, Patel KP, Routbort MJ, Reddy NG, Barkoh BA, Handal B, Kanagal-Shamanna R, Greaves WO, Medeiros LJ, Aldape KD, Luthra R: Clinical validation of a next-generation sequencing screen for mutational hotspots in 46 cancer-related genes. J Mol Diagn 2013, 15:607–622
10. Dias-Santagata D, Akhavanfard S, David SS, Vernovsky K, Kuhlmann G, Boisvert SL, Stubbs H, McDermott U, Settleman J, Kwak EL, Clark JW, Isakoff SJ, Sequist LV, Engelman JA, Lynch TJ, Haber DA, Louis DN, Ellisen LW, Borger DR, Iafrate AJ: Rapid targeted mutational analysis of human tumours: a clinical platform to guide personalized cancer medicine. EMBO Mol Med 2010, 2:146–158
11. Baker SC, Bauer SR, Beyer RP, Brenton JD, Bromley B, Burrill J, et al: The External RNA Controls Consortium: a progress report. Nat Methods 2005, 2:731–734
12. Lih C-J, Sims DJ, Harrington RD, Polley EC, Zhao Y, Mehaffey MG, Forbes TD, Das B, Datta V, Harper KN, Bouk CH, Rubinstein LV, Simon RM, Conley BA, Chen AP, Kummar S, Doroshow JH, Williams PM: Analytical validation and application of a targeted next generation sequencing mutation detection assay for use in treatment assignment in the NCI-MPACT trial (NCT01827384). J Mol Diagn 2016, 18:51–67
13. Li H, Durbin R: Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 2010, 26:589–595
14. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; Genome Project Data Processing Subgroup: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25:2078–2079
15. Thorvaldsdottir H, Robinson JT, Mesirov JP: Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 2013, 14:178–192
16. R Development Core Team: R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2014
17. Wickham H: ggplot2: Elegant Graphics for Data Analysis. New York, Springer-Verlag, 2009
18. Clopper CJ, Pearson ES: The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 1934, 26:404–413
19. Roschke AV, Tonon G, Gehlhaus KS, McTyre N, Bussey KJ, Lababidi S, Scudiero DA, Weinstein JN, Kirsch IR: Karyotypic complexity of the NCI-60 drug-screening panel. Cancer Res 2003, 63:8634–8647
20. Stults DM, Killen MW, Shelton BJ, Pierce AJ: Recombination phenotypes of the NCI-60 collection of human cancer cells. BMC Mol Biol 2011, 12:23
21. Sausen M, Phallen J, Adleff V, Jones S, Leary RJ, Barrett MT, Anagnostou V, Parpart-Li S, Murphy D, Kay Li Q, Hruban CA, Scharpf R, White JR, O'Dwyer PJ, Allen PJ, Eshleman JR, Thompson CB, Klimstra DS, Linehan DC, Maitra A, Hruban RH, Diaz LA Jr, Von Hoff DD, Johansen JS, Drebin JA, Velculescu VE: Clinical implications of genomic alterations in the tumour and circulation of pancreatic cancer patients. Nat Commun 2015, 6:7686
22. Piotrowska Z, Niederst MJ, Karlovich CA, Wakelee HA, Neal JW, Mino-Kenudson M, Fulton L, Hata AN, Lockerman EL, Kalsy A, Digumarthy S, Muzikansky A, Raponi M, Garcia AR, Mulvey HE, Parks MK, DiCecca RH, Dias-Santagata D, Iafrate AJ, Shaw AT, Allen AR, Engelman JA, Sequist LV: Heterogeneity underlies the emergence of EGFRT790 wild-type clones following treatment of T790M-positive cancers with a third-generation EGFR inhibitor. Cancer Discov 2015, 5:713–722
