Vaccine 32 (2014) 7115–7121


Evaluation of cells and biological reagents for adventitious agents using degenerate primer PCR and massively parallel sequencing

Shasta D. McClenahan a, Christine Uhlenhaut b, Philip R. Krause a,∗

a Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA
b Highly Pathogenic Viruses, Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Berlin, Germany

Article info

Article history: Received 20 August 2014; received in revised form 14 October 2014; accepted 15 October 2014; available online 27 October 2014

Keywords: Cell substrate; Adventitious agent; Virus; Degenerate PCR; Massively parallel sequencing (MPS)

Abstract

We employed a massively parallel sequencing (MPS)-based approach to test reagents and model cell substrates including Chinese hamster ovary (CHO), Madin-Darby canine kidney (MDCK), African green monkey kidney (Vero), and High Five insect cell lines for adventitious agents. RNA and DNA were extracted either directly from the samples or from viral capsid-enriched preparations, and then subjected to MPS-based non-specific virus detection with degenerate oligonucleotide primer (DOP) PCR. MPS by 454, Illumina MiSeq, and Illumina HiSeq was compared on independent samples. Virus detection using these methods was reproducibly achieved. Unclassified sequences from CHO cells represented cellular sequences not yet submitted to the databases typically used for sequence identification. The sensitivity of MPS-based virus detection was consistent with theoretically expected limits based on dilution of virus in cellular nucleic acids. Capsid preparation increased the number of viral sequences detected. Potential viral sequences were detected in several samples; in each case, these sequences were either artifactual or (based on additional studies) shown not to be associated with replication-competent viruses. Virus-like sequences were more likely to be identified in BLAST searches using virus-specific databases that did not contain cellular sequences. Detected viral sequences included previously described retrovirus and retrovirus-like sequences in CHO, Vero, MDCK and High Five cells, and nodavirus and endogenous bracovirus sequences in High Five insect cells. Bovine viral diarrhea virus, bovine hokovirus, and porcine circovirus sequences were detected in some reagents. A recently described parvo-like virus present in some nucleic acid extraction resins was also identified in cells and extraction controls from some samples. The present study helps to illustrate the potential for MPS-based strategies in evaluating the presence of viral nucleic acids in various sample types, including cell culture substrates and vaccines.

© 2014 Published by Elsevier Ltd.

1. Introduction

Adventitious viruses pose a potential threat to the production, and sometimes the safety, of vaccines that are produced in metazoan cell substrates. When adventitious agents are present in biological products, they are often associated with cell substrates, both because cell substrates are required for adventitious viruses to replicate and because long passage histories can increase the likelihood of an exposure to an adventitious agent, often from an animal-sourced reagent. The recent use of metagenomic techniques to discover the presence of porcine circovirus 1 (PCV-1) as an adventitious agent in a rotavirus vaccine [1] has led to the suggestion that deep (or massively parallel) sequencing (MPS)-based methods could be used to augment or possibly replace current tests of vaccines or the cells used to produce them. These techniques, which promise to yield sequences that represent all of the nucleic acids present in a sample, are particularly attractive because they have the potential to detect both known and unknown viruses. However, experience using these techniques to evaluate vaccines is limited, and there are challenges to using a method with the potential for false positives and for at least some uninterpretable results.

MPS has been used to detect adventitious agents in cells and supernatants, identifying parvovirus sequences in fetal bovine serum and endogenous retroviruses (ERVs) in a human cell line [2], ERVs in the Vero cell line [1,3], and nodavirus and latent errantivirus in the High Five cell line [3]. Most recently, PCV-1, likely introduced from porcine trypsin, was identified in Vero cells used to produce some live attenuated rotavirus vaccines and was also found in the final product [1,4].

∗ Corresponding author (P.R. Krause). Tel.: +1 301 796 1862.
http://dx.doi.org/10.1016/j.vaccine.2014.10.022
0264-410X/© 2014 Published by Elsevier Ltd.


S.D. McClenahan et al. / Vaccine 32 (2014) 7115–7121

We previously reported on conditions that improve the sensitivity of deep sequencing for adventitious virus detection in cells [5]. In the present study, we used degenerate oligonucleotide primer PCR (DOP-PCR) to pre-amplify samples from cells representative of those typically used to produce vaccines, and used MPS to evaluate sequences present in these cells.

2. Materials and methods

2.1. Cell lines, sample preparation, and virus detection

Cell lines tested included CHO (ATCC, Manassas, VA), Vero (ATCC), MDCK (ATCC), and High Five (Invitrogen, Carlsbad, CA). Cell-culture reagents tested included fetal bovine serum (FBS; Invitrogen, and HyClone, Thermo Scientific, Logan, UT), cell-culture medium (DMEM, EMEM, Express Five SFM, CHO medium; Invitrogen), and trypsin (Invitrogen). Nucleic acids were prepared as previously described [5–8]. DNA and RNA were separately extracted directly (DE) from freeze-thawed cells using the All Prep Kit (Qiagen, Valencia, CA), sometimes after virus particle enrichment (capsid preparation, CP) by nuclease digestion and ultracentrifugation. Reverse-transcribed RNA (using random hexamers) or DNA was subjected to DOP-PCR using a single PCR primer specifically adapted to adding either 454 or Illumina sequencing primers, in some cases with multiplex identifying sequences (MID) [5]. Primers used are described in Table 1. Each MPS run comprised a pool with 12 indexes. 454 sequencing (Frederick National Laboratory for Cancer Research) was completed in 2010 with both Titanium and FLX chemistry (Table 1). Illumina HiSeq sequencing (Beckman-Coulter Genomics) using 8 lanes of 2 × 100 nt reads was completed in 2011, and the MiSeq instrument was used in-house with 2 × 150 or 2 × 250 nt read lengths in 2013–2014.

2.2. Spiked virus samples

Virus particles, either 1 × 10⁶ particles of minute virus of mice (MVM; ATCC VR-1346) or 1 × 10⁵ particles of MS2 phage (ATCC 15597-B1), were spiked into CHO cell pellets containing 1 × 10⁶ cells to test the sensitivity of DOP-PCR paired with MPS. Total DNA and RNA were extracted by DE and after CP. The RNA was reverse-transcribed, and DNA and cDNA were amplified by DOP-PCR and prepared for MPS by MiSeq as described above. The MVM and MS2 spikes were verified by qPCR to contain the spiked virus quantity using published assays [9,10].

2.3. Sequence analysis

Virus sequences were identified using blastn and tblastx searches with the NCBI Basic Local Alignment Search Tool (BLAST), using either the entire NCBI nt database or an in-house viral database containing all viral sequences in GenBank as of July 2014. Sequences were considered “hits” if the search revealed a match with an E-value of 10⁻⁶ or less, while potential matches above this threshold were considered “no hits”. BLAST outputs were summarized using metagenomic analysis software (MEGAN) [11] with a minimum bit score of 60 and a minimum taxon hit of one. Additional analyses of “no hits” from the CHO cell line were performed against a database of Cricetulus griseus CHO-K1 cell line genome sequences available under the whole genome shotgun (WGS) contigs database (GenBank Assembly ID GCA_000223135.1, GenBank ID AFTD00000000.1) [12]. “No hit” sequences from each cell line were directly compared with each other using in-house software. For each cell sample analyzed, the proportion of obtained sequences attributable to each potential adventitious virus was calculated, along with 99% binomial confidence intervals. Sequence coverage of viral genomes was calculated with the High-performance Integrated Virtual Environment (HIVE) hexagon alignment algorithm [13].

2.4. Confirmatory testing and PCRs

Some virus sequences detected by MPS were confirmed with specific PCRs to verify that these sequences were in fact present in the original samples. Primers were chosen based on sequences from MPS, known full-length genes, or published primers. For bovine viral diarrhea virus (BVDV), additional confirmatory PCRs were performed using a 630-bp polymerase gene PCR and a published real-time PCR for BVDV-1 [14] (Table 1). We cultured samples on Madin-Darby bovine kidney (MDBK) cells for two passages, alongside positive-control BVDV, to test whether BVDV could grow from samples in which MPS had detected BVDV sequences.

3. Results

Cell lines and reagents similar to those used in the manufacturing process for biological products were analyzed for potential adventitious viral agents using DOP-PCR/MPS. DOP-PCR uses a single PCR primer that contains a 3′ anchor sequence, allowing amplification of a representational library of both DNA and RNA in a sample, while permitting efficient addition of MPS primers by additional PCR. Three independent experiments were performed, each on an independently obtained and prepared sample, using different sequencing modalities (454, Illumina MiSeq, and Illumina HiSeq). Previous experiments showed DOP-PCR could detect virus genomes with coverage ranging from 1% to 91% [5]. DNA and RNA were analyzed separately, both directly from each sample and after particle enrichment. Table 2 shows the number of sequences obtained from each sample using each extraction technique and each sequencing modality. As previously observed, more sequences were obtained using Illumina sequencing than 454 sequencing [5].
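The hit criterion and the 99% binomial intervals described in Section 2.3 can be illustrated with a short sketch. The paper does not say which binomial interval was used, so the Wilson score interval below is an assumption chosen because it needs only the standard library; the function names `classify` and `wilson_interval` are hypothetical and are not taken from the authors' in-house software.

```python
import math

E_VALUE_CUTOFF = 1e-6           # hit threshold stated in Section 2.3
Z_99 = 2.5758293035489004       # two-sided 99% quantile of the standard normal


def classify(evalue, cutoff=E_VALUE_CUTOFF):
    """Label a read by its best BLAST E-value: at or below the cutoff is a hit."""
    return "hit" if evalue <= cutoff else "no hit"


def wilson_interval(successes, total, z=Z_99):
    """Wilson score interval for a proportion (assumed method, not the paper's)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Example counts taken from Table 3: 4 MVM reads among 2,508,424 DE DNA reads.
    low, high = wilson_interval(4, 2_508_424)
    print(classify(1e-7), classify(1e-3))
    print(f"99% interval for the MVM proportion: {low:.2e} to {high:.2e}")
```

With read counts this small the interval is wide relative to the point estimate, which is one reason overlapping error bars between runs are a reasonable informal check of reproducibility.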
The number of sequences obtained using CP was similar to that using DE, indicating that (as previously reported), although viral sequences are generally enriched, CP does not completely eliminate non-viral nucleic acids [5,15].

We noted two types of artifact with Illumina sequencing, both previously described [16–19]. First, there was evidence of carry-over of sequences from previous runs on the MiSeq [18]. For example, although PhiX was not used as a control in these experiments, we occasionally noticed PhiX phage sequences in these sequencing data, likely carried over from the immediately preceding sequencing run on the same machine. Illumina has suggested improved cleaning protocols [20], and other investigators have suggested using different barcodes for subsequent runs to eliminate this. Second, we noted some barcode reading errors, in which sequences run with different barcode primers on the same run appeared grouped with the wrong sample. Other investigators have suggested using double barcodes to eliminate this problem [16,17]. Obviously erroneous sequences (PhiX sequences and viral sequences clearly belonging to different simultaneously run samples) were not considered in the analyses presented in this report.

Most sequences from CHO cell-derived nucleic acids did not align to the GenBank nt database (data not shown). For example, only 28% of sequences in the DNA DE MiSeq sample aligned to the nt database. Using less stringent criteria (an E-value cutoff of 10) for comparison only slightly increased the number of sequences aligning to the nt database, still leaving 68% of sequences as “no hits” without clear identity. Because DOP-PCR primer constant anchor sequences lead to amplification of identical sequences on replicate samples (data not shown), sequences found in common between different cell


Table 1. PCR primers used for these experiments. For 454 high-throughput sequencing, both FLX and Titanium (Ti) primers and reagents were used. Multiplex identifier (MID) or index sequences were added to some samples for multiplexing.

| Primer name | Application | Sequence | Sense |
|---|---|---|---|
| DOP | Nonspecific amplification | CCGACTCGAGINNNNNNTGTGG | + |
| DOP 454 A FLX | Nonspecific 454 amplification | GCCTCCCTCGCGCCATCAGINNNNNNTGTGG | + |
| DOP 454 B FLX | Nonspecific 454 amplification | GCCTTGCCAGCCCGCTCAGINNNNNNTGTGG | − |
| DOP 454 MID A Ti | Overlap PCR for 454 library | CGTATCGCCTCCCTCGCGCCATCAG-MID-CCGACTCGAGINNNNNNTGTGG | + |
| DOP 454 MID B Ti | Overlap PCR for 454 library | CTATGCGCCTTGCCAGCCCGCTCAG-MID-CCGACTCGAGINNNNNNTGTGG | − |
| IL-DOP | Nonspecific Illumina amplification | GCTCTTCCGATCTINNNNNNTGTGG | + |
| Illumina P5 adapter | Overlap PCR for Illumina library | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT | + |
| Illumina P7 adapter | Overlap PCR for Illumina library | CAAGCAGAAGACGGCATACGAGAT-Index-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC | − |
| MVM F | Specific PCR | AGTTTGCCATGCTATTTGC | + |
| MVM R | Specific PCR | ACTGGTTTACTTGCTGTCC | − |
| MVM Probe | qPCR probe | FAM-ATTTCTTTTGCCTCCTTGTCTGTTT-TAMRA | + |
| MS2 F | Specific PCR | CCAGCATCCGTAGCCTTATTGG | + |
| MS2 R | Specific PCR | GTTGCTTGTTCAGCGAACTTCTT | − |
| MS2 Probe | qPCR probe | FAM-TAAGGCGCTGCATCCTGCAACTTGTGC-TAMRA | + |
| BVDV-1 | Specific PCR | TAGCCATGCCCTTAGTAGGAC | + |
| BVDV-1 | Specific PCR | GACGACTACCCTGTACTCAGG | − |
| BVDV-1 probe | qPCR probe | FAM-AACAGTGGTGAGTTCGTTGGATGGCTT-TAMRA | + |
| BVDV Pol | Specific PCR | CTGGTAGARCAAYTGRTCAG | + |
| BVDV Pol | Specific PCR | CRTCATCMCCACAGACRTG | − |
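Several primers in Table 1 contain degenerate positions (for example R, Y, and M in the BVDV Pol pair, and the N₆ stretch in the DOP primers). As an illustration only, the sketch below counts and enumerates the concrete sequences a degenerate primer stands for, using the standard IUPAC nucleotide ambiguity codes. The helper names `degeneracy` and `expand` are hypothetical. The "I" in the DOP primers (presumably inosine) is not an IUPAC ambiguity code, so this sketch deliberately does not handle it and is shown only on the BVDV Pol primers.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])  # raises KeyError on non-IUPAC symbols such as "I"
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, seq in [("BVDV Pol (+)", "CTGGTAGARCAAYTGRTCAG"),
                      ("BVDV Pol (-)", "CRTCATCMCCACAGACRTG")]:
        print(name, degeneracy(seq), expand(seq)[:2])
```

Counting before expanding matters in practice: a six-base N stretch alone already stands for 4⁶ sequences, so full enumeration is only sensible for lightly degenerate primers such as the BVDV Pol pair.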

lines might suggest origination from reagents used in sequencing or consistent sequencing artifacts. However, the vast majority of “no hit” sequences were unique to each cell line, and very few “no hit” sequences were found in common between the different cell lines or negative controls. Based on BLAST alignment (E-value of 10⁻⁶ or less) of “no hit” sequences from directly extracted DNA on the MiSeq, 96% or more of these “no hit” sequences were unique to each cell line, and only 58 (0.02%) exact common sequences were identified between Vero and MDCK cells (n = 243,416). No other identical sequences were identified between CHO, High Five, or negative controls. This indicates that almost all of these “no hit” sequences were not likely due to reagents used in sequencing or to consistent sequencing artifacts.

Comparing “no hit” sequences from CHO cells with the separately available WGS genome of the CHO-K1 cell line [12] reduced the proportion of “no hits” from 72% of the total reads (when compared with the nt database) to 2% when compared with the CHO genome database. Sequences initially classified as “no hits” therefore appear to largely represent hamster sequences that are not present in the nt database. While the Vero cell genome has not been published, a blastn search of Vero “no hit” sequences against the entire WGS database showed that 85% of “no hits” match sequences in the WGS database (data not shown). Thus, unclassified sequences from animal-derived cells may often represent cellular sequences not yet submitted to databases typically used for sequence identification.

DOP-PCR analysis of extraction (negative) controls revealed mammalian, bacterial, and fungal sequences. Many of these exact sequences were also identified in the CHO cells, suggesting that some of the “no hit” sequences could also represent sequences related to reagents or to the sample extractions. In one experiment, 95% of bacterial sequences identified in CHO cells (representing 0.01–10% of the total sequences) were also found in “no template” controls, indicating that many of these sequences were derived from reagents. Sequences from some DNA and RNA bacteriophages were identified in some samples and negative controls. Because bacterial sequences were also identified in these same samples, it is not unexpected that some phage sequences would also be present.

We used virus spikes at known concentrations to test the sensitivity of DOP-PCR paired with MPS for virus detection in CHO cells with the Illumina MiSeq. 1 × 10⁶ particles of MVM (a DNA

Table 2. Total sequences obtained by massively parallel sequencing (MPS) for each cell line. Sample libraries were prepared for MPS by degenerate oligonucleotide primer PCR (DOP-PCR). 454 and Illumina (IL) MiSeq and HiSeq platforms were used for MPS. Samples were prepared either by direct extraction (DE) or capsid preparation (CP).

| Cell line | Sequencing platform | Total sequences | DE DNA, No. | DE DNA, % | DE RNA, No. | DE RNA, % | CP DNA, No. | CP DNA, % | CP RNA, No. | CP RNA, % |
|---|---|---|---|---|---|---|---|---|---|---|
| CHO | 454 | 340,234 | 98,900 | 29.07 | 57,420 | 16.88 | 81,032 | 23.82 | 102,882 | 30.24 |
| CHO | IL-HiSeq | 41,059,180 | 12,495,577 | 30.43 | 9,009,263 | 21.94 | 9,653,808 | 23.51 | 9,900,532 | 24.11 |
| CHO | IL-MiSeq | 11,881,802 | 2,042,928 | 17.19 | 3,652,592 | 30.74 | 3,044,136 | 25.62 | 3,142,146 | 26.45 |
| Vero | 454 | 339,655 | 74,651 | 21.98 | 62,084 | 18.28 | 73,065 | 21.51 | 129,855 | 38.23 |
| Vero | IL-HiSeq | 19,153,424 | 2,257,523 | 11.79 | 4,644,064 | 24.25 | 5,827,775 | 30.43 | 6,424,062 | 33.54 |
| Vero | IL-MiSeq | 7,347,861 | 2,388,667 | 32.51 | 1,524,782 | 20.75 | 2,139,772 | 29.12 | 1,294,640 | 17.62 |
| MDCK | 454 | 911,226 | 218,430 | 23.97 | 157,172 | 17.25 | 206,013 | 22.61 | 329,611 | 36.17 |
| MDCK | IL-HiSeq | 51,610,231 | 14,769,181 | 28.62 | 14,394,289 | 27.89 | 14,608,957 | 28.31 | 7,837,804 | 15.19 |
| MDCK | IL-MiSeq | 3,591,720 | 1,185,481 | 33.01 | 924,909 | 25.75 | 807,242 | 22.48 | 674,088 | 18.77 |
| High Five | 454 | 654,970 | 80,362 | 12.27 | 156,636 | 23.91 | 233,801 | 35.70 | 184,171 | 28.12 |
| High Five | IL-HiSeq | 43,161,466 | 17,269,445 | 40.01 | 6,976,786 | 16.16 | 9,845,893 | 22.81 | 9,069,342 | 21.01 |
| High Five | IL-MiSeq | 13,189,524 | 5,942,754 | 45.06 | 3,056,580 | 23.17 | 2,641,311 | 20.03 | 1,548,879 | 11.74 |
| Total | | 193,241,293 | 58,823,899 | 30.44 | 44,616,577 | 23.09 | 49,162,805 | 25.44 | 40,638,012 | 21.03 |


Table 3. Spiking studies. Purified stocks containing 1 × 10⁶ minute virus of mice (MVM) particles and 1 × 10⁵ MS2 phage particles were spiked into 1 × 10⁶ CHO cells and subjected to DOP-PCR and massively parallel sequencing on the Illumina MiSeq. The number of reads for each virus is given following direct extraction (DE) and capsid preparation (CP).

| Nucleic acid | Extraction | Total sequences | MVM, No. | MVM, % | MS2, No. | MS2, % |
|---|---|---|---|---|---|---|
| DNA | DE | 2,508,424 | 4 | 0.0002 | – | – |
| DNA | CP | 2,330,764 | 12 | 0.0005 | – | – |
| RNA | DE | 2,596,846 | – | – | 2 | 0.0001 |
| RNA | CP | 2,401,261 | – | – | 602 | 0.025 |

Table 4. Viral genome coverage from massively parallel sequencing data with 454 and Illumina HiSeq sequences.

| Cell substrate | Virus | Genome size (bp) | Sequencing platform | Unique viral sequences | Genome coverage (%) |
|---|---|---|---|---|---|
| CHO | ERV | 9,603 | 454 | 27 | 3.3 |
| CHO | ERV | 9,603 | Illumina | 75 | 8.1 |
| CHO | BVDV | 12,573 | 454 | 19 | 1.7 |
| CHO | Hokovirus | 5,105 | 454 | 1 | 4 |
| Vero | SERV | 8,393 | 454 | 14 | 7.0 |
| Vero | SERV | 8,393 | Illumina | 165 | 13.4 |
| Vero | BaEV | 8,018 | 454 | 3 | 5.2 |
| Vero | BaEV | 8,018 | Illumina | 143 | 9.3 |
| Vero | BVDV | 12,573 | 454 | 17 | 1.7 |

virus) or 1 × 10⁵ particles of MS2 RNA phage were spiked into CHO cell pellets containing 1 × 10⁶ cells. By qPCR, this corresponded to 1 × 10⁶ genome copies for MVM and 3 × 10⁵ genome copies for MS2. Based on calculations of expected copy numbers for a virus-sized genome (∼5 × 10³ bp) in a background of directly extracted DNA from a mammalian genome (3 × 10⁹ bp), we expected approximately 1.6 MVM reads for each million cellular DNA reads obtained with Illumina MPS. For RNA sequences, the expected number of reads depends on the quantity of RNA present in the cell, which in turn depends on whether ribosomal RNA is present. We found 4 MVM sequences among approximately 2.5 million sequences obtained from directly extracted MVM-spiked cells, consistent with the theoretically expected proportion (Table 3). Capsid preparation enriched the spiked MVM sequences about 3-fold. For MS2, capsid preparation enriched the viral sequences about 300-fold (Table 3).

We determined the proportion of sequences in each cell line that appeared viral in origin, using the GenBank nt database (Fig. 1A, C, G) or smaller, more specific viral databases (Fig. 1B, D–F, H). Virus-like sequences were more likely to be identified using the virus-specific databases. BVDV sequences were detected using 454 sequencing in one of the three independent samples from each cell line (sequence totals include 13 from CHO, 17 from Vero and 6 from MDCK), suggesting that a reagent used in common among these experiments (likely serum used to grow the cells in 2010, which tested positive for BVDV by IFA) probably contained BVDV sequences (data not shown).

Previously described ERV sequences (U09104.1) were detected in CHO cells [21,22], and CP did not appear to enrich these sequences relative to DE.
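The spike-in arithmetic reported above (Table 3) can be reproduced with a few lines of code. This is a minimal sketch of the dilution reasoning, assuming one viral genome per cell and a haploid-size host genome as in the text; the function names are hypothetical, and the read counts are copied from Table 3.

```python
def expected_reads_per_million(virus_genome_bp, host_genome_bp, copies_per_cell=1.0):
    """Expected viral reads per million reads if reads sample DNA in proportion to mass."""
    return 1e6 * copies_per_cell * virus_genome_bp / host_genome_bp


def observed_reads_per_million(viral_reads, total_reads):
    """Observed viral reads scaled to one million total reads."""
    return 1e6 * viral_reads / total_reads


def fold_enrichment(reads_cp, total_cp, reads_de, total_de):
    """Ratio of the viral read fraction after capsid preparation to that after direct extraction."""
    return (reads_cp / total_cp) / (reads_de / total_de)


if __name__ == "__main__":
    # MVM: ~5 x 10^3 bp genome, one genome copy per cell, 3 x 10^9 bp host genome.
    print("expected MVM per million:", expected_reads_per_million(5e3, 3e9))
    print("observed MVM per million (DE):", observed_reads_per_million(4, 2_508_424))
    print("MVM enrichment by CP:", fold_enrichment(12, 2_330_764, 4, 2_508_424))
    print("MS2 enrichment by CP:", fold_enrichment(602, 2_401_261, 2, 2_596_846))
```

The outputs should land close to the figures quoted in the text (roughly 1.6 expected reads per million, about 3-fold enrichment for MVM and about 300-fold for MS2); with only 2 to 12 reads in some cells of Table 3, the fold values are rough estimates rather than precise measurements.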
Other ERV sequences identified from CHO cells aligned most closely with mouse mammary tumor virus (MMTV; AF228552.1), murine leukemia virus (AF019230), Abelson murine leukemia virus (NC_001499) and feline leukemia virus (NC_001940). A single bovine hokovirus (a parvovirus) sequence was also detected in the 454 run for the CHO cells, suggesting that hokovirus nucleic acids were present in the serum used to culture the cells (data not shown).

Previously described simian endogenous retrovirus (SERV) and baboon-like endogenous retrovirus (BaEV) sequences were detected in DNA and RNA of Vero cells [1,3,23], and also were not enriched by CP, suggesting that they likely were not particle-associated.

In MDCK cells, BVDV was detected in the 454 sample. Likely ERV sequences were also identified in MDCK cells by comparison with a viral database (Fig. 1E) and prototypical canine retrovirus sequences [24,25] (Fig. 1F). Two sequences from directly extracted DNA sequenced on the Illumina MiSeq aligned with feline endogenous retrovirus RD114 in searches of the viral database, and sequences that matched a UR2 sarcoma virus were found both in the earlier 454 run and in extraction controls of MDCK cells using a 2012 version of GenBank. This homology was no longer apparent in 2014, with these sequences showing better matches to other mammalian cellular sequences.

In High Five cells, 10–20% of detected sequences from RNA samples were nodavirus sequences, consistent with previous

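Table 4 (continued). Viral genome coverage from massively parallel sequencing data with 454 and Illumina HiSeq sequences.

| Cell substrate | Virus | Genome size (bp) | Sequencing platform | Unique viral sequences | Genome coverage (%) |
|---|---|---|---|---|---|
| MDCK | BVDV | 12,573 | 454 | 6 | 1.7 |
| High Five | Nodavirus | 3,107 | 454 | 32,120 | 77.3 |
| High Five | Nodavirus | 3,107 | Illumina | 1,386,823 | 85.7 |
| High Five | Bracovirus | 540,215 | 454 | 12 | 0.6 |
| High Five | Bracovirus | 540,215 | Illumina | 189 | 3.5 |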

reports [3,26]. With this high representation in the DE nucleic acids, significant enrichment by CP may not have been possible. Detection of nodavirus sequences in the DNA samples may have resulted from RT activity of the Taq polymerase. Some sequences initially appeared to align to baculovirus, but consistent with other studies, further evaluation identified these as transposable retroelements [3,27]. ERV-like sequences identified as errantivirus [27,28] and (likely endogenous) bracovirus sequences were also observed in High Five cells. Sequences of a recently described parvo-like virus (PHV), present in some nucleic acid extraction resins [29,30], were also identified in cells and extraction controls from samples prepared with Qiagen columns in 2011, but not in later samples. Using the described methods, these viral sequences align only to recently published PHV sequences but not to other parvovirus sequences in GenBank. Similar results were obtained in each of the three experiments, which used different preparations of cell lines and different sequencing methods, suggesting that virus detection using these general methods could be reproducibly achieved. For example, the viral sequences identified in the CHO (Fig. 1A) and Vero (Fig. 1C) cells were consistent between the three runs and error bars overlapped for each virus identified. The largest difference in the percentage of viral sequences identified was with retrovirus-like elements in High Five cells. Of the identified viral sequences, with the exception of hokovirus, all putative viruses aligned with more than one independent sequence, with genome coverage ranging from
