news and views


© 2015 Nature America, Inc. All rights reserved.

Nanopore sequencing gets a boost with accurate error modeling and variant-calling tools for Oxford Nanopore Technologies’ highly anticipated MinION platform.

The MinION is the first nanopore sequencer to be commercialized, and it represents a significant departure from existing technologies. In this issue of Nature Methods, Jain et al.1 characterize sequence data from the device and provide new tools to improve alignment and variant calling. The results help to establish the utility of the sequencer and suggest a promising future for nanopore sequencing.

The MinION measures changes in electrical current as individual strands of DNA pass through one of its 500 tiny protein pores. This device is particularly notable for its size: smaller than a smartphone, it is operated through the USB port of a laptop computer. When the MinION was announced in 2012, the company stunned the sequencing community by claiming theoretically unlimited read lengths and the ability to sequence directly from a blood sample. However, it was over 2 years before the first users got access to the device. Reports of its performance ‘in the wild’ are now starting to emerge2,3.

Beyond its size and portability, the technology is unique in how it reads sequence: it is the only instrument to date that directly measures a single DNA strand rather than incorporation events relative to a template strand. Owing to the size and shape of the pore, raw output is an electronic trace of current changes generated not by individual bases but by 5-nucleotide ‘words’ known as k-mers. A new branch of bioinformatics will undoubtedly be dedicated to turning this trace into accurate DNA sequence (or directly into variants relative to a reference).
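To make the idea of 5-nucleotide ‘words’ concrete, the following is a minimal Python sketch of how a strand decomposes into overlapping k-mers as it advances one base at a time. The function name `kmers` and the example strand are hypothetical and chosen only for illustration; this is not code from Jain et al. or from ONT, and it does not model current levels.

```python
from itertools import product

K = 5  # word length given in the text (5-nucleotide k-mers)


def kmers(seq, k=K):
    """Return the overlapping k-mers seen as a strand advances one base at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Every possible 5-mer over the four DNA bases: 4**5 distinct words.
ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

strand = "GATTACAGATTACA"  # hypothetical example strand
words = kmers(strand)

print(len(ALL_KMERS))  # number of distinct words a signal model must separate
print(words[:3])       # consecutive words share 4 of their 5 bases
```

Because consecutive words overlap by four bases, each step in the trace reflects a transition between two closely related k-mers. Going from sequence to words, as here, is the easy direction; recovering the sequence from a noisy current trace is the harder, inverse problem.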

Cloud-based software from Oxford Nanopore Technologies (ONT) called Metrichor currently performs this task, and additional tools are emerging for MinION data analysis4,5. Early reports using the MinION demonstrated decent data quality and real-world applications2,3, but Jain et al.1 are the first to thoroughly evaluate and optimize for the peculiarities of MinION data. They reveal rapid improvements in accuracy due to sequencing chemistry updates over 6 months, to the present level of 85% for reads from both DNA strands (Fig. 1).

Senior author Mark Akeson is one of the pioneers of the field who helped solve a formidable challenge impeding nanopore sequencing (he used an enzyme to control the speed at which DNA transits the pores) and set the scene, with others, for commercialization of the technology6–9. Yet Akeson fulfills a different role here, as a participant among hundreds of others in the MinION early access program in May 2014.

Jain et al.1 assess the ability of nanopore reads to form an accurate consensus sequence to detect variants, the first hurdle for all sequencing technologies. Without good genotyping accuracy, a sequencing platform has little hope of becoming a success, despite other advantages. The results are encouraging: they demonstrate a precision and recall of up to 99% for detecting single-nucleotide polymorphisms on a simple bacteriophage genome when sequenced to high coverage. This was achieved by modeling the error profile of the MinION, requiring precise read alignment to

Nicholas J. Loman is at the Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK; Mick Watson is at The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, and Edinburgh Genomics, Edinburgh, UK.

Figure 1 | Accuracy of two-directional reads on the MinION nanopore sequencer has improved rapidly since the device was introduced to early access users (months shown for 2014). [Plot: y axis, read accuracy (%), with ticks at 60, 70, 80 and 90; x axis, months Jun, Jul, Oct and Nov.] Credit: Katie Vicari/Nature Publishing Group

nature methods | VOL.12 NO.4 | APRIL 2015 | 303

Nicholas J Loman & Mick Watson

Successful test launch for nanopore sequencing

a known reference sequence, despite the fact that available aligners do not model MinION data explicitly. To overcome this, the authors accounted for how transitions between different k-mers result in signal changes, which allowed them to estimate instrument accuracy independently of the alignment software. The results demonstrate that the MinION may already be producing data of sufficient quality for many uses, such as microbial variant calling and diagnostics.

Those hoping for a fully stochastic error model, meaning that 100% accuracy would be easily achievable, will be disappointed. Certain transitions between k-mers are much harder to distinguish than others, particularly those associated with runs of the same nucleotide. This, coupled with the inherent noise associated with single-molecule sequencing, may mean that accurate sequences are difficult to achieve in certain genomic regions. Other sequencing platforms have failed to gain traction because of systematic error modes, so why should the MinION be any different?

We are upbeat about this technology for several reasons. First, many members of the sequencing and genomics community hope that nanopore data can evolve similarly to data produced by the Pacific Biosciences RS instrument (PacBio). The first PacBio reads also had high error rates but reached high accuracy through incremental improvements and better bioinformatics, and the technology is now used routinely in genome assembly. Long reads allow researchers to resolve repeats within the genome that challenge short-read platforms,

and long reads also complement the cheap, highly accurate reads from Illumina’s platforms. Jain et al.1 used the MinION to determine the number of repeats in the previously inaccessible human X chromosome CT47-repeat region using spanning reads as long as 42 kilobases.

Second, sequencers have become smaller and cheaper and are entering the hands of individual academic labs and clinical environments. The MinION takes this trend one step further: it fits in your pocket, suggesting that real-time pathogen detection, bedside sequencing in hospitals and real-time environmental monitoring could soon become a reality. Several challenges need to be overcome for truly portable sequencing, not least the reliance on a laboratory to prepare samples, a high input requirement of high-quality DNA (>1 microgram at present) and the need to keep reagents chilled.

What can we expect next? In the research space, the challenge of de novo assembly is likely to fall next, meaning that the MinION may become a useful rival to the PacBio, particularly for closing gaps in microbial genomes. The fact that ONT technology sequences a single strand directly means that it has the potential to measure base modifications such as methylated DNA. Finally, there is the tantalizing possibility

of direct RNA sequencing, or even protein sequencing (another field in which Akeson is active)10. Indeed, anything that could be forced through a nanopore is potentially detectable. The future is awash with amazing possibilities, and the detailed treatment of nanopore error models presented here is an important first step in making the MinION a useful and viable platform.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details are available in the online version of the paper (doi:10.1038/nmeth.3327).

1. Jain, M. et al. Nat. Methods 12, 351–356 (2015).
2. Quick, J., Quinlan, A.R. & Loman, N.J. Gigascience 3, 22 (2014).
3. Ashton, P.M. et al. Nat. Biotechnol. 33, 296–300 (2015).
4. Loman, N.J. & Quinlan, A.R. Bioinformatics 30, 3399–3401 (2014).
5. Watson, M. et al. Bioinformatics 31, 114–115 (2015).
6. Cherf, G.M. et al. Nat. Biotechnol. 30, 344–348 (2012).
7. Kasianowicz, J.J., Brandin, E., Branton, D. & Deamer, D.W. Proc. Natl. Acad. Sci. USA 93, 13770–13773 (1996).
8. Branton, D. et al. Nat. Biotechnol. 26, 1146–1153 (2008).
9. Bayley, H. Clin. Chem. 61, 25–31 (2015).
10. Nivala, J., Marks, D.B. & Akeson, M. Nat. Biotechnol. 31, 247–250 (2013).

Better together: multiplexing samples to improve the preparation and reliability of gene expression studies

Ali Mortazavi

Two methods for early tagging of sample RNA before RT-qPCR or full RNA-seq open the door to experiments with fewer technical batch effects.

Gene expression studies can be divided into two groups according to the number of samples involved. Studies with few samples and few replicates are typically done with RNA-seq, which offers the ability to quantify known genes and potentially discover new genes and transcripts. The primary reason that RNA-seq studies are not done on larger sample numbers is not the cost of sequencing itself (sequencing samples to a moderate depth of 10–20 million reads is affordable) but the cost and effort involved in preparing the samples for sequencing. Most of the larger studies, those involving more than a few dozen samples, instead still use microarrays or reverse-transcription quantitative PCR (RT-qPCR) with a panel of gene candidates. Two papers in this issue improve sample preparation using molecular tagging of RNA1,2.

Ali Mortazavi is in the Department of Developmental and Cell Biology, and at the Center for Complex Biological Systems, University of California, Irvine, Irvine, California, USA.


Modular, early-tagged amplification (META) RNA profiling, developed by Narayan et al.1, is a method for shallow sequencing of pooled qPCR targets (either microRNA or mRNA) that is intended primarily to replace the current methods for measuring the expression of known gene panels. Shishkin et al. introduce RNAtag-Seq2, which tags RNA that is then pooled for ribosomal depletion, reverse transcription and library building. Both groups show the reproducibility and scalability of their methods, and they offer the opportunity to better address technical batch effects of sample preparation, one of the great problems of gene expression analysis with qPCR, microarrays and single-library RNA-seq3.

It is worth considering why batch effects are a concern given that previous studies have demonstrated high technical reproducibility, even when repeatedly sequencing the same library was necessary to achieve a minimum of 10–20 million reads. This was much more common in the late 2000s than today given that we can now get many more reads per sequencing lane. Most of the batch effects today come from the preparation of the RNA-seq libraries, which is a complex multistep process involving extraction of RNA, depletion of rRNA, reverse transcription, and the actual addition of sequencing primers to appropriately sized fragments. Each of these steps differs slightly between protocols; more insidiously, the same person following a single protocol could inadvertently introduce slight technical variations between libraries built at different times for the same study (Fig. 1a). This variability at the level of the technical steps can lead to differences in read coverage that are systematic but not biological. Robotic or microfluidic approaches to library building can alleviate much of this variation but are expensive, which puts them out of reach of most laboratories. Although several software packages have been developed to detect and correct these issues, batch effects still represent a difficult problem4.

The strategy of early parallelization, which is to tag samples and to pool them before further processing steps, has been used previously to sequence tags from single cells3 and for the indexing-first chromatin immunoprecipitation protocol5. In both cases, the driving motivation was to lower the amount of input material necessary, and although this is also a feature of META RNA profiling and RNAtag-Seq, it is not the prime focus.
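The logic of tagging first and pooling afterwards can be illustrated with a minimal Python sketch of the computational counterpart, in which pooled reads are assigned back to their samples by tag. Everything here is an assumption made for illustration: the 4-nucleotide tag length, the tag sequences, the sample names and the function `demultiplex` are hypothetical, and this is not the analysis pipeline of META RNA profiling or RNAtag-Seq.

```python
from collections import defaultdict

TAG_LENGTH = 4  # hypothetical tag length, not taken from either paper
SAMPLE_TAGS = {"ACGT": "sample_A", "TGCA": "sample_B"}  # hypothetical tags


def demultiplex(reads, tags=SAMPLE_TAGS, tag_length=TAG_LENGTH):
    """Group pooled reads by their leading sample tag, stripping the tag."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        # Reads whose tag is not recognized are kept apart, not discarded silently.
        by_sample[tags.get(tag, "unassigned")].append(insert)
    return dict(by_sample)


# Hypothetical pooled reads: all samples went through the same downstream steps,
# so any variation introduced after pooling is shared by every sample.
pooled = ["ACGTGGGCCC", "TGCAAAATTT", "ACGTCCCGGG", "NNNNACGTAC"]
assigned = demultiplex(pooled)

for sample, inserts in sorted(assigned.items()):
    print(sample, len(inserts))
```

Exact prefix matching is the simplest possible assignment rule; a practical implementation would typically also need to tolerate sequencing errors within the tag.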
