LETTER

Risks of double-counting in deep sequencing

E1560 | PNAS | April 22, 2014 | vol. 111 | no. 16

Lou et al. (1) report a technique for increased sensitivity of DNA sequencing that they call “circle sequencing.” The authors’ method involves circularizing single-stranded DNA and performing rolling-circle amplification (RCA). Multiple duplicates are thereby generated, and errors can be removed by comparing the sequences of the duplicates. Lou et al. report accuracy similar to that of single-stranded tagging techniques (2, 3), but less than that achieved with double-stranded tagging (4). The authors propose their method as a high-efficiency tool for deep sequencing of heterogeneous DNA samples.

To obtain accurate deep sequencing from a targeted region, it is necessary to ensure that each starting molecule is counted only once in the final sequence data. However, the method, as described, does not include steps to avoid multiple counting of the same starting molecule. Overcounting could occur for several reasons: (i) A large excess of random DNA primers is used. Thus, each template may be primed multiple times, thereby generating multiple extension products from each circular molecule. (ii) RCA results in long DNA products. The authors shear the DNA after the RCA step to an average length of three duplicates. Thus, the RCA reaction might generate 12 duplicates of a given molecule, for example, which are then sheared to generate four smaller molecules. All four of these molecules could inadvertently be scored as independent molecules. (iii) Y-shaped adapters are annealed to the amplified DNA. Y-adapters independently amplify each of the two strands, and could thereby overestimate the data yield by a factor of two. (iv) The assay uses double-stranded DNA. Each of the two initial DNA strands can be circularized and counted as a unique molecule, potentially overestimating data yield by an additional factor of two.

These problems can be overcome; because the starting DNA is randomly sheared before circularization, unique molecules can be identified by virtue of having unique circularization points. Erroneous scoring of duplicate molecules could thereby be avoided by filtering out consensus sequences that have matching shear points or transposed shear points (to account for molecules arising from the complementary strand). Although it is conceivable that these problems do not affect the dataset described in the paper by Lou et al. (1), because the number of input circles substantially exceeded the number of reads produced (1,000- to 100,000-fold; table S2 in ref. 1), incorporation of shear-point filtering should be considered in future applications.

Notably, for ultradeep-sequencing applications, such as detection of specific rare mutations among tumor cell populations, obtaining such molar excesses of a targeted genomic region is often not practical. There are a limited number of possible shear points flanking any given DNA sequence, and shear points do not occur randomly. Thus, the maximal depth that can be obtained by circle sequencing with shear-point filtering is limited. This limitation could be overcome by incorporation of a random tag sequence into the workflow to uniquely label each starting molecule before amplification. With this modification, the circle sequencing assay could theoretically be extended to allow for unlimited sequencing depth with no risk of overcounting.

Michael W. Schmitt[a,b,1], Edward J. Fox[a], and Jesse J. Salk[a,b]
Departments of [a]Pathology and [b]Medicine, University of Washington School of Medicine, Seattle, WA 98195

1. Lou DI, et al. (2013) High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing. Proc Natl Acad Sci USA 110(49):19872–19877.
2. Kinde I, Wu J, Papadopoulos N, Kinzler KW, Vogelstein B (2011) Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci USA 108(23):9530–9535.
3. Jabara CB, Jones CD, Roach J, Anderson JA, Swanstrom R (2011) Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proc Natl Acad Sci USA 108(50):20166–20171.
4. Schmitt MW, et al. (2012) Detection of ultra-rare mutations by next-generation sequencing. Proc Natl Acad Sci USA 109(36):14508–14513.

Author contributions: M.W.S., E.J.F., and J.J.S. analyzed data and wrote the paper.

The authors declare no conflict of interest.

[1] To whom correspondence should be addressed. E-mail: [email protected].

www.pnas.org/cgi/doi/10.1073/pnas.1400941111
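
The shear-point filtering proposed in this letter could be sketched roughly as follows. This is a minimal illustration and not code from Lou et al. or from the letter's authors: the `Consensus` record, its field names, and the use of reference coordinates for the two shear points are all assumptions made for the example. Two consensus sequences are treated as the same starting molecule when their shear points match, or when they match after being swapped (the transposed case arising from the complementary strand).

```python
from typing import Iterable, List, NamedTuple, Tuple


class Consensus(NamedTuple):
    """Hypothetical record for one consensus sequence (names are assumptions)."""
    five_prime: int   # reference coordinate of the shear point at the 5' end
    three_prime: int  # reference coordinate of the shear point at the 3' end
    sequence: str     # consensus sequence derived from the RCA duplicates


def shear_point_key(rec: Consensus) -> Tuple[int, int]:
    # Order the two shear points so that a molecule and its
    # complementary-strand counterpart (transposed shear points)
    # map to the same key.
    return (min(rec.five_prime, rec.three_prime),
            max(rec.five_prime, rec.three_prime))


def filter_by_shear_points(records: Iterable[Consensus]) -> List[Consensus]:
    """Keep the first consensus sequence seen for each pair of shear points."""
    seen = set()
    unique = []
    for rec in records:
        key = shear_point_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


if __name__ == "__main__":
    reads = [
        Consensus(100, 250, "ACGT"),  # first copy of a starting molecule
        Consensus(100, 250, "ACGT"),  # same shear points: counted once
        Consensus(250, 100, "ACGT"),  # transposed shear points: same molecule
        Consensus(100, 260, "ACGA"),  # different shear point: separate molecule
    ]
    print(len(filter_by_shear_points(reads)))
```

Under these assumptions the sketch keeps one record per pair of shear points. It also makes the depth limit discussed above visible: two genuinely independent molecules that happen to share both shear points collapse into a single key, which is the case a random tag sequence added before amplification would resolve.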
