Molecular Ecology Resources (2015)

doi: 10.1111/1755-0998.12375

Does complete plastid genome sequencing improve species discrimination and phylogenetic resolution in Araucaria?

MARKUS RUHSAM,* HARDEEP S. RAI,† SARAH MATHEWS,‡ T. GREGORY ROSS,§ SEAN W. GRAHAM,§ LINDA A. RAUBESON,¶ WENBIN MEI,¶,** PHILIP I. THOMAS,* MARTIN F. GARDNER,* RICHARD A. ENNOS†† and PETER M. HOLLINGSWORTH*

*Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh, EH3 5LR, UK, †Department of Wildland Resources, Utah State University, 5230 Old Main Hill, Logan, UT 84322, USA, ‡The Arnold Arboretum of Harvard University, 22 Divinity Avenue, Cambridge, MA 02138, USA, §Department of Botany, University of British Columbia, 3529-6270 University Blvd., Vancouver, BC, Canada, V6T 1Z4, ¶Central Washington University, University Way, Ellensburg, WA 98926-7537, USA, **Department of Biology, University of Florida, Gainesville, FL 32611, USA, ††Institute of Evolutionary Biology, University of Edinburgh, West Main Rd, Edinburgh, EH3 9JT, UK

Abstract

Obtaining accurate phylogenies and effective species discrimination using a small standardized set of plastid genes is challenging in evolutionarily young lineages. Complete plastid genome sequencing offers an increasingly easy-to-access source of characters that helps address this. The usefulness of this approach, however, depends on the extent to which plastid haplotypes track morphological species boundaries. We have tested the power of complete plastid genomes to discriminate among multiple accessions of 11 of 13 New Caledonian Araucaria species, an evolutionarily young lineage where the standard DNA barcoding approach has so far failed and phylogenetic relationships have remained elusive. Additionally, 11 nuclear gene regions were Sanger sequenced for all accessions to ascertain the success of species discrimination using a moderate number of nuclear genes. Overall, fewer than half of the New Caledonian Araucaria species with multiple accessions were monophyletic in the plastid or nuclear trees. However, the plastid data retrieved a phylogeny with a higher resolution compared to any previously published tree of this clade and supported the monophyly of about twice as many species and nodes compared to the nuclear data set. Modest gains in discrimination thus are possible, but using complete plastid genomes or a small number of nuclear genes in DNA barcoding may not substantially raise species discriminatory power in many evolutionarily young lineages. The big challenge therefore remains to develop techniques that allow routine access to large numbers of nuclear markers scalable to thousands of individuals from phylogenetically disparate sample sets.

Keywords: Araucaria, DNA barcoding, next generation sequencing, phylogeny, plastid genome

Received 28 October 2014; revision received 15 January 2015; accepted 16 January 2015

Correspondence: Markus Ruhsam, Fax: +44 (0) 131 248 2901; E-mail: [email protected]

© 2015 John Wiley & Sons Ltd

Introduction

DNA barcoding aims to discriminate among species using a small, standardized set of genes. This has been successful in many animal groups such as birds, fishes and amphibians, where the use of a 648-bp portion of the mitochondrial CO1 gene has resulted in high levels of species discrimination (Ward et al. 2005; Kerr et al. 2007; Smith et al. 2008). The search for a barcode with equivalent discriminatory power in land plants has proved problematic, and despite the use of two ‘core barcode’ plastid regions (rbcL and matK) and additional loci such as trnH-psbA and the internal transcribed spacers (ITS) of nuclear ribosomal DNA, species discrimination success in plants is usually lower (Kress & Erickson 2007; Hollingsworth et al. 2009b, 2011).

Recent advances in next-generation sequencing (NGS) techniques have made it possible to obtain large amounts of sequence data for multiple individuals at relatively low cost (Harrison & Kidner 2011; Egan et al. 2012). Entire organellar genomes are increasingly being used for phylogenetic analyses and have proven to be effective in resolving evolutionary relationships, especially at lower taxonomic levels where recent divergence and rapid radiations have resulted in limited sequence variation (Parks et al. 2009; Whittall et al. 2010; Zhang et al. 2011; Kane et al. 2012; Yang et al. 2013). The possibility that a lack of variation in the modest number of plastid genes assayed for DNA barcoding may hamper species discrimination has led to suggestions of using entire plastid genome sequences to improve resolution (Parks et al. 2009; Nock et al. 2011). However, concerns have also been raised that a shortage of variable characters is not necessarily the main limiting factor in species discrimination using plastid DNA. Plastid haplotypes simply might not track the morphological species boundaries due to hybridization and lineage sorting of ancestral polymorphism (Fazekas et al. 2009; Hollingsworth et al. 2011).

Nuclear loci are an alternative to plastid DNA. Previously, obtaining single-copy nuclear sequence data from multiple homologous loci from many different species has been laborious. However, NGS of transcriptomes and genome sequences provides a rich resource for primer design and has greatly facilitated access to the nuclear genome of plants. This availability of multiple independent loci, coupled with the often faster mutation rates of nuclear over plastid regions, makes nuclear markers an attractive proposition for DNA barcoding and reconstructing evolutionary relationships among closely related species. On the other hand, the larger effective population size of nuclear genes leads to increased retention of ancestral polymorphisms. Also, reticulation and complex histories, coupled with heterozygosity, can lead to patterns of variation in nuclear genes that do not correspond to species boundaries or species trees (e.g. Maddison 1997; Maddison & Knowles 2006). We have evaluated the use of both complete plastid genome sequencing and nuclear loci to see if they increase species discrimination and phylogenetic resolution in a set of closely related conifer species.
The study group is a monophyletic assemblage of 13 species of Araucaria section Eutacta, endemic to New Caledonia, a small island of ~19 000 km² in the Pacific 1500 km east of Australia. New Caledonia is famous for its rich flora and high levels of endemism (Morat 1993). It is regarded as one of the world’s biodiversity hotspots (Myers et al. 2000) and is well known for its species radiations (Morat 1993; Duangjai et al. 2009; Barrabe et al. 2013). New Caledonia is particularly rich in Araucaria species, as 68% (13 of 19) of the currently recognized species occur only there. They are considered to be fairly well defined morphologically and ecologically, and most of the species are relatively straightforward to distinguish, although character differences between some pairs of species can be subtle. For instance, A. montana and A. laubenfelsii can be difficult to distinguish unambiguously, especially in areas where the species ranges overlap and/or only juvenile plants are available.

Early attempts to resolve the evolutionary relationships of New Caledonian Araucaria failed due to the lack of variation in the rbcL gene (Setoguchi et al. 1998). Similarly, various combinations of seven plastid DNA barcodes only had about 30% species discrimination success at best (Hollingsworth et al. 2009a). More recent attempts to resolve the evolutionary relationships within the New Caledonian clade using AFLPs (Gaudeul et al. 2012), a combination of eight plastid genes and morphological characters (Escapa & Catalano 2013), and a study of 11 plastid genes plus the nuclear ribosomal internal transcribed spacer ITS2 (Kranitz et al. 2014), all recovered three major clades (‘coastal’, ‘small leaved’ and ‘large leaved’). However, depending on the study, there was little or no bootstrap support for most of these clades and relationships among clades were ambiguous. With few exceptions, species resolution within clades was poor (Gaudeul et al. 2012; Escapa & Catalano 2013; Kranitz et al. 2014). Therefore, despite recent insights into the evolutionary relationships of New Caledonian Araucaria species, a fully resolved and well-supported phylogeny remains elusive, as does effective DNA barcode discrimination of individual species.

Our approach involved complete sequencing of plastid genomes from multiple individuals per species using a combination of 454 and Illumina platforms, and also the use of 454 and Illumina sequenced transcriptomes to develop 11 putative single-copy nuclear regions for direct Sanger sequencing. Specifically, we tackled the following questions: (i) Does the increased number of variable characters in completely sequenced plastid genomes improve species discrimination in this closely related species complex? (ii) Do the completely sequenced plastid genomes provide increased phylogenetic resolution compared to previous sequencing of 11 plastid genes and ITS2, and 142 AFLP markers? (iii) Do the plastid genome sequences provide more or less resolution compared with direct Sanger sequencing of 11 nuclear loci?

Materials and methods

Plant material and DNA extraction

All 13 New Caledonian Araucaria species and their sister species, A. heterophylla, endemic to Norfolk Island (Setoguchi et al. 1998; Gaudeul et al. 2012; Escapa & Catalano 2013) were included in the study (Table 1, Table S3, Supporting information). We included multiple individuals per species from 11 of 13 New Caledonian species. DNA was extracted using the Qiagen DNeasy Mini Plant Kit following the manufacturer’s protocol, apart from letting samples incubate in extraction buffer at 65 °C for 2 h.


Table 1 Araucaria accessions plus voucher information, Solexa read length (se = single end, pe = paired end) and number of reads produced. ‘E’ voucher numbers refer to herbarium specimens held at the Royal Botanic Garden Edinburgh (E), samples with RBGE numbers refer to living material which was wild collected by staff of the Royal Botanic Garden Edinburgh and is held at Montgomery Botanical Garden, and vouchers ‘Watt 577’ and ‘EB1024’ were deposited at University of Adelaide (ADU)

| Species | Collecting number | Location | Voucher | Read length | # of reads |
|---|---|---|---|---|---|
| A. bernieri | NC05_47 | 22.27S/166.90E Pic du Grand Kaori | E00215017 | 80 se | 783467 |
| A. bernieri | NC01_670 | 22.25S/166.82E Pic des Pins | n/a | 100 pe | 5021370 |
| A. biramulata | NC05_176 | 21.93S/166.25E Mt Tonta | E00215116 | 80 se | 567032 |
| A. biramulata | NC03_4061 | 21.76S/166.00E Mt Do | E00166477, E00166478 | 100 pe | 4335740 |
| A. columnaris | Reference | Of unknown wild origin | n/a | 80 se | n/a |
| A. columnaris | NC02_128 | 22.60S/167.52E Ile des Pins | E00215046 | 80 se | 815375 |
| A. columnaris | NC01_806 | 22.65S/167.45E Ile des Pins | E00137308, E00137307 | 80 se | 974188 |
| A. muelleri ‘Goro’ | NC08_446 | 22.18S/166.89E Col de Yate | E00304204 | 100 pe | 5094162 |
| A. heterophylla | n/a | Of cultivated origin, University of Adelaide | EB1024 | 100 pe | 21175962 |
| A. humboldtensis | NC08_42 | 21.88S/166.41E Mamie | E00251948 | 100 pe | 4508705 |
| A. humboldtensis | NC05_116 | 22.11S/166.89E Mt Humboldt | E00182995 | 100 pe | 5150272 |
| A. laubenfelsii | NC01_601 | 22.07S/166.35E Mt Mou | E00137218 | 100 pe | 4673603 |
| A. laubenfelsii | NC03_4055 | 21.76S/166.00E Mt Do | E00166495 | 80 se | 750406 |
| A. luxurians | NC08_385 | 21.32S/165.10E Foret Francais | E00304081 | 100 pe | 3481726 |
| A. montana | NC08_125 | 21.53S/165.67E Me Ori | n/a | 100 pe | 5652096 |
| A. montana | NC03_4246 | 20.56S/164.78E Mt. Panie | E00215450 | 80 se | 952294 |
| A. muelleri | NC03_5054 | 22.12S/166.60E Montagne des Sources | RBGE 20022532 | 80 se | 1669692 |
| A. muelleri | NC05_18 | 22.28S/166.89E Pic du Grand Kaori | E00182947 | 100 pe | 4623039 |
| A. nemorosa | NC03_3066 | 22.34S/167.00E Cap Reine | n/a | 100 pe | 5747575 |
| A. nemorosa | NC03_4010 | 22.18S/166.85E Port Boise | E00166494 | 80 se | 484263 |
| A. rulei | NC01_73 | 20.48S/164.23E Tiebaghi | n/a | 100 pe | 4104741 |
| A. rulei | NC05_170 | 21.90S/166.38E Refuge Volcain | E00215051, E00182993 | 80 se | 1138716 |
| A. schmidii | NC03_5062 | 20.59S/164.77E Mt. Panie | RBGE 20022536 | 80 se | 1117431 |
| A. scopulorum | NC01_82 | 20.48S/164.23E Tiebaghi | n/a | 100 pe | 4474826 |
| A. scopulorum | NC01_154 | 21.20S/165.51E Cap Bocage | E00137877 | 80 se | 863351 |
| A. subulata | Watt 577 | 22.03S/166.45E Mt Dzumac | Watt 577 (S/86-20B) | 80 se | 2251942 |
| A. subulata | NC05_69 | 22.12S/166.66E Parc Territorial de la Riviere Bleue | E00215105 | 100 pe | 4476850 |
| A. subulata | NC01_679 | 22.03S/166.45E Mt Dzumac | E00131569 | 80 se | 226512 |


PCR amplification and sequencing

Plastid genome. The plastid genome of each accession except A. heterophylla was amplified with 58 primer pairs (designed using the complete reference plastid genome of A. columnaris, see below; Table S1, Supporting information) in single reactions yielding fragments between 2000 and 5500 base pairs. Most fragments overlapped by a few hundred base pairs with neighbouring amplicons. PCR reactions for each primer pair were performed in volumes of 20 µL using the following protocol: 1× Phusion HF buffer (Finnzymes), 0.2 mM dNTPs, 3% DMSO (dimethyl sulfoxide), 0.5 µM of each forward and reverse primer, 0.025 U PhusionTaq (Finnzymes) and 0.8 µL of unquantified DNA. The mixture was then cycled through the profile: 30 s at 98 °C, 35 cycles of 8 s at 98 °C, 30 s at 60 °C and 2 min at 72 °C, ending with 5 min at 72 °C to complete extension and subsequent storage at 4 °C. Amplicons were checked on a 1% agarose gel, and the brightness of bands was noted as either ‘bright’ or ‘faint’. All amplified PCR products of each species were pooled in a single tube, adding 3 and 8 µL of bright and faint PCR amplifications, respectively. Illumina library preparation of pooled samples marked as ‘single end (se)’ in Table 1 followed Cronn et al. (2008) using 4-bp tags, and the libraries were sequenced at the FAS Center for Systems Biology at Harvard, Cambridge, MA. Illumina library preparation and sequencing of pooled samples marked as ‘paired end (pe)’ were outsourced to the Edinburgh Genomics facility at the University of Edinburgh. To obtain the plastid genome of Araucaria heterophylla, we used genomic DNA to construct a whole-genome shotgun library using a commercial library prep kit (Bioo Scientific), which was run on an Illumina HiSeq 2000 platform producing 100-bp paired-end reads.

Assembly. Sequences for all species apart from A. heterophylla were assembled using the short read assembly software YASRA (http://www.bx.psu.edu/miller_lab/), which performs comparative assemblies of short reads using a reference genome. The reference genome consisted of a complete A. columnaris plastid genome which was produced by rolling circle amplification of plastid DNA followed by Sanger sequencing to fill gaps and check low-quality regions (Mei 2010). The YASRA command ‘make TYPE=solexa ORIENT=circular PID=medium’ was used to assemble the reads of each species into contigs that were then manually aligned to the reference in SEQUENCHER v.4.7 (Gene Codes Corporation). All alignments were visually inspected for inconsistencies. Ambiguous species-specific or individual-specific SNPs and indels, or regions with unusually high interspecific variation, were checked by Sanger sequencing (up to 20 000 bp per accession).

To assemble the A. heterophylla plastid genome, a combination of de novo and reference-based assembly methods was performed on the resulting paired-end reads to minimize bias in reference-based assembly using short-length reads. An initial de novo assembly was performed using CLC GENOMICS WORKBENCH (V5.1.5, CLC Bio). The BLAST function in CLC was then used to identify contigs containing plastid sequences; these were then mapped onto the full plastid genome of Araucaria columnaris using SEQUENCHER v. 4.2.2 (Gene Codes Corporation). This hybrid assembly was used as a reference for a final assembly of contigs using the original A. heterophylla reads. These contigs were then manually re-aligned to the reference plastid genome in SEQUENCHER v 4.7 (Gene Codes Corporation).

Nuclear regions. Unpublished annotated transcriptome data of A. laubenfelsii (Ruhsam et al.) were screened to identify putative single or low copy nuclear genes using the list compiled by Duarte et al. (2010). Twenty-six single-copy genes were trialled and compared against the corresponding gene in Arabidopsis thaliana or Zea mays, downloaded from the PLAZA website (Van Bel et al. 2012), to ascertain the number and position of possible introns. PRIMER 3 (Rozen & Skaletsky 2000) was then used to design primers anchored in different exons to amplify possible introns (Table S2, Supporting information). Of the 26 trialled regions, 11 were chosen for the amplification of all accessions based on amplification success, high sequence variability between species, and electropherogram quality (Table 2). PCR reactions for each primer pair were performed in volumes of 10 µL using the following protocol: 1× buffer (Bioline, London, UK), 0.2 mM dNTPs, 2 mM MgCl2, 0.2 µM of each forward and reverse primer, 0.025 U BioTaq (Bioline) and 1 µL of unquantified DNA. The mixture was then cycled through the profile: 4 min at 94 °C, 35 cycles of 30 s at 94 °C, 45 s at 56 °C and 1 min at 72 °C, ending with 10 min at 72 °C to complete extension and subsequent storage at 4 °C. Amplification success was checked on a 1% agarose gel. Successful product (5 µL) was purified for sequencing by adding 2 µL of ExoSAP-IT (USB Corporation, Cleveland, OH, USA) diluted 1:10 in ddH2O and incubating the mixture at 37 °C for 15 min followed by 80 °C for 15 min, with subsequent storage at 4 °C. Sequencing was performed in 10 µL reactions containing 1.5 µL 5× BigDye buffer (Life Technologies, Carlsbad, CA, USA), 0.88 µL BigDye enhancing buffer BDX64 (MCLAB, San Francisco, CA, USA), 0.125 µL BigDye v3.1 (Life Technologies), 0.32 µM primer and 1 µM of purified PCR product. The sequences were run on an ABI-3730 and edited using Geneious v5.5 (Biomatters, Auckland, New Zealand). No cloning was carried out for heterozygous loci.


Table 2 Nuclear gene regions used for the nuclear phylogeny of New Caledonian Araucaria species. For two gene regions, two noncontiguous regions were sequenced and concatenated for further analyses. AT, Arabidopsis thaliana; ZM, Zea mays

| Gene region | Description | Compared against | Product length (bp) |
|---|---|---|---|
| ABA4 | Abscisic acid deficient 4 | AT1G67080 | 694 |
| ABI3 | Abscisic acid-insensitive 3 | AT3G24650 | 659 |
| L34 | Putative 60S ribosomal protein L34 | n/a | 232 |
| Lac11 | Putative laccase 11 | AT5G03260 | 636 (Lac11_1, 238 bp, and Lac11_2, 398 bp) |
| PRF1 | Profilin-1, profilin homolog1 | ZM06G29180 | 491 |
| sks4 | SKU5 similar 4 | AT4G22010 | 685 |
| UBI | Ubiquitin family protein | AT4G06599 | 460 |
| K1 | Galactose oxidase/kelch repeat superfamily protein | AT5G50310 | 341 |
| MTHFR2 | Methylenetetrahydrofolate reductase 2 | AT2G44160 | 778 (MTHFR2_1, 279 bp, and MTHFR2_4, 499 bp) |
| PHYO | Phytochrome O | n/a | 436 |
| PHYP | Phytochrome P | n/a | 631 |
| Total | | | 6044 |

Analyses. We used a combination of parsimony, distance and Bayesian inference methods to assess (i) species discrimination, measured as the number of species whose individuals fall into mutually exclusive clusters (i.e. species-level monophyly), and (ii) phylogenetic resolution, measured as the number of nodes resolved and their levels of support. All sequences have been deposited in GenBank (Table S3, Supporting information). The plastid alignment (Table S4, Supporting information) included the entire genome (including introns and intergenic spacer regions) and was produced manually in SEQUENCHER 4.7 (Gene Codes Corporation) following the alignment principles described in Kelchner (2000). This was straightforward to produce because the sequences were very closely related. Plastid (Table S4, Supporting information) and nuclear (Table S5, Supporting information) data were treated separately for all analyses.

Maximum Parsimony analysis (MP) used PAUP*4.0b10 (Swofford 2003), with gaps treated as missing data and polymorphic states as uncertain. A ‘Branch and Bound’ search with MulTrees on was carried out for both data sets. Combinability of the 11 nuclear genes was determined using the incongruence length difference (ILD) test of Farris et al. (1994), implemented in PAUP*4.0b10 (Swofford 2003) as the partition-homogeneity test. Statistical branch support was estimated via bootstrapping with the ‘Branch and Bound’ option and 1000 replicates. Distance analysis was carried out in PAUP*4.0b10 (Swofford 2003) using the Neighbour-joining algorithm and breaking ties randomly with the following distance settings: unweighted least squares, uncorrected (“p”) distance, negative branch lengths set to zero and equal rate for variable sites.

Bayesian inference analyses (BI) were performed with MRBAYES v. 3.1.2 (Huelsenbeck & Ronquist 2001; Ronquist & Huelsenbeck 2003) using the GTR+I+Γ model, which was selected for both the plastid and nuclear data sets using MRMODELTEST v. 2.3 (Nylander 2004) under the Akaike Information Criterion. The analysis was run for 1 million generations with four MCMC chains in two independent parallel analyses, with one tree sampled every 500 generations. The average standard deviation of split frequencies was 0.0033 for the plastid data and 0.0074 for the nuclear data at the end of the run. AWTY (Nylander et al. 2008) and TRACER v1.5 (Rambaut & Drummond 2007) were used to assess the quality of the MCMC simulations: for both data sets, the bivariate plot of the split frequencies for the first and second run of the simulations showed a nearly perfect correlation, suggesting a high degree of convergence between runs; the cumulative split frequency for a number of selected splits for both simulations revealed no trend in frequencies; and a plot of chain variability among and between chains showed convergence. The effective sample size values (ESS), that is the number of effectively independent draws from the posterior, were >300 for all parameters, indicating that sufficient sampling occurred. Trees were drawn using MrEnt (Zuccon & Zuccon 2014).
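The uncorrected (“p”) distance named in the distance settings is simply the proportion of compared sites at which two aligned sequences differ. The Python sketch below illustrates that calculation on a toy alignment. It is not PAUP*'s implementation: the function names, the toy accessions and the choice to skip any site where either sequence has a gap or ambiguity code (pairwise deletion) are assumptions made for illustration only.

```python
from itertools import combinations

UNAMBIGUOUS = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of comparable aligned sites at which two sequences differ.

    Assumption: sites where either sequence is not A/C/G/T (gaps, N, ?)
    are skipped entirely (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in UNAMBIGUOUS or b not in UNAMBIGUOUS:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def distance_table(alignment: dict) -> dict:
    """Pairwise p-distances for every pair of named sequences."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment, not data from this study.
toy = {
    "acc1": "ACGTACGTAC",
    "acc2": "ACGTACGTTC",
    "acc3": "ACG-ACGTTN",
}
table = distance_table(toy)
```

A table of this kind is the input a neighbour-joining algorithm would cluster; with very closely related sequences such as these plastid genomes, most pairwise values are expected to be small.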

Results

Plastid data

The data matrix consisted of 28 samples and was 147 286 nucleotides long. Plastid genome sizes were very similar between accessions (Table 3), with A. muelleri_5054 having the smallest (146 477 bp) and A. humboldtensis_42 the largest plastid genome (146 892 bp). The average plastid genome was 146 693 nucleotides long. Long range PCR success varied between samples and resulted on average in 3.8% (5574 bp) of the plastid genome being undetermined (range: 0–15.2%; Table 3). Of 493 variable characters, 232 were parsimony informative (0.16% of total alignment). MP analysis resulted in nine most parsimonious trees of 883 steps with a consistency index of CI = 0.82 and a retention index of RI = 0.88. BI retrieved exactly the same topology as the strict consensus MP tree (Fig. 1). Distance analysis using the Neighbour-Joining algorithm resulted in a very similar, albeit slightly less resolved, tree compared to MP and BI (Table 4 and Fig. S1, Supporting information). Three major clades were recovered which closely resembled the coastal (BS = 100%, PP = 1), small-leaved (BS = 88%, PP = 1) and large-leaved (BS = 76%, PP = 0.98) clades in Gaudeul et al. (2012) and Escapa & Catalano (2013). A. schmidii was sister to a combined small-leaved and coastal clade, and A. humboldtensis was sister to the rest of the large-leaved clade. Some species with multiple accessions were paraphyletic, such as A. scopulorum in the small-leaved clade, A. columnaris in the coastal clade as well as A. montana, A. biramulata, A. muelleri and A. rulei in the large-leaved clade. Monophyletic taxa included A. subulata (BS = 100%, PP = 1) and A. bernieri (BS = 74%, PP = 0.94) in the small-leaved clade and A. laubenfelsii (BS = 86%, PP = 0.99) and A. humboldtensis (BS = 100%, PP = 1) in the large-leaved clade (Table 4).

Table 3 Sequencing statistics of the plastid genome and 11 nuclear genes of 28 Araucaria accessions: number of nucleotides in plastid genome (length); number (bp) and percentage (%) of missing bases in the plastid and nuclear (6044 bp) data set

| Species | Plastid genome: length | Plastid genome: missing (bp) | Plastid genome: missing (%) | 11 nuclear genes: missing (bp) | 11 nuclear genes: missing (%) |
|---|---|---|---|---|---|
| Araucaria bernieri_47 | 146 545 | 8182 | 5.60 | 70 | 1.16 |
| Araucaria bernieri_670 | 146 696 | 4886 | 3.30 | 52 | 0.86 |
| Araucaria biramulata_176 | 146 728 | 3180 | 2.20 | 31 | 0.51 |
| Araucaria biramulata_4061 | 146 622 | 7314 | 5.00 | 120 | 1.99 |
| Araucaria columnaris_128 | 146 480 | 12 336 | 8.40 | 52 | 0.86 |
| Araucaria columnaris_806 | 146 649 | 7858 | 5.40 | 109 | 1.80 |
| Araucaria columnaris_Ref | 146 799 | 0 | 0.00 | 146 | 2.42 |
| Araucaria heterophylla | 146 723 | 0 | 0.00 | 308 | 5.10 |
| Araucaria humboldtensis_42 | 146 892 | 2510 | 1.70 | 201 | 3.33 |
| Araucaria humboldtensis_116 | 146 804 | 3328 | 2.30 | 353 | 5.84 |
| Araucaria laubenfelsii_601 | 146 774 | 3594 | 2.40 | 52 | 0.86 |
| Araucaria laubenfelsii_4055 | 146 614 | 3309 | 2.30 | 135 | 2.23 |
| Araucaria luxurians_385 | 146 729 | 4882 | 3.30 | 206 | 3.41 |
| Araucaria montana_125 | 146 764 | 2951 | 2.00 | 79 | 1.31 |
| Araucaria montana_4246 | 146 517 | 3404 | 2.30 | 45 | 0.74 |
| Araucaria muelleri_18 | 146 686 | 2783 | 1.90 | 117 | 1.94 |
| Araucaria muelleri_5054 | 146 477 | 3493 | 2.40 | 181 | 2.99 |
| Araucaria muelleri “Goro”_446 | 146 747 | 2862 | 2.00 | 19 | 0.31 |
| Araucaria nemorosa_3066 | 146 763 | 3412 | 2.30 | 350 | 5.79 |
| Araucaria nemorosa_4010 | 146 608 | 14 536 | 9.90 | 75 | 1.24 |
| Araucaria rulei_73 | 146 787 | 3250 | 2.20 | 43 | 0.71 |
| Araucaria rulei_170 | 146 610 | 5675 | 3.90 | 114 | 1.89 |
| Araucaria schmidii_5062 | 146 697 | 1585 | 1.10 | 116 | 1.92 |
| Araucaria scopulorum_82 | 146 794 | 6003 | 4.10 | 129 | 2.13 |
| Araucaria scopulorum_154 | 146 605 | 8908 | 6.10 | 117 | 1.94 |
| Araucaria subulata_69 | 146 730 | 3566 | 2.40 | 66 | 1.09 |
| Araucaria subulata_577 | 146 810 | 9954 | 6.80 | 374 | 6.19 |
| Araucaria subulata_679 | 146 746 | 22 322 | 15.20 | 97 | 1.60 |
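The distinction between variable and parsimony-informative characters reported for the plastid matrix can be made concrete with a short Python sketch. It uses the standard definition (a site is parsimony informative when at least two different states each occur in at least two sequences) and treats gaps and ambiguity codes as missing, in line with the MP settings described in the methods. The function name and the toy alignment are hypothetical and are not taken from the study's data or software.

```python
from collections import Counter


def classify_sites(sequences):
    """Return (variable, parsimony_informative) site counts.

    A site is variable if it shows at least two unambiguous states, and
    parsimony informative if at least two of those states each occur in
    at least two sequences. Non-ACGT symbols are treated as missing.
    """
    seqs = [s.upper() for s in sequences]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    variable = 0
    informative = 0
    for column in zip(*seqs):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) >= 2:
            variable += 1
            shared_states = sum(1 for n in counts.values() if n >= 2)
            if shared_states >= 2:
                informative += 1
    return variable, informative


# Hypothetical toy alignment: sites 2, 3 and 5 vary; only site 3 is informative.
toy_alignment = ["ACGTAC", "ACGTTC", "ACCTTC", "AGCT-C"]
n_variable, n_informative = classify_sites(toy_alignment)

# Share of the alignment that is informative, using the reported plastid figures.
plastid_informative_percent = 100 * 232 / 147286
```

Applied to the reported figures, 232 informative characters in a 147 286-position alignment correspond to roughly 0.16% of aligned positions, which is why even a complete plastid genome yields few characters per species in such a young clade.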

Nuclear data

The data matrix consisted of 11 genes with a combined length of 6044 bp. Overall sequencing success varied between samples and resulted in 0.3% (19 bp) to 6.2% (374 bp) of undetermined base pairs per accession (2.2% or 133 bp on average; Table 3). The ILD test yielded no significant incongruence between pairwise comparisons of 11 genes apart from sks4 and PRF1. However, this was a false positive as the strict consensus trees did not contradict each other, and so the genes were concatenated and analysed together. Of 69 variable characters, 55 were parsimony informative (0.91% of total alignment). On average, 54% of the parsimony informative loci were heterozygous in at least one accession. All three phylogenetic analysis methods (MP, BI, distance) gave the same overall tree topology, although with minor differences within clades and varying levels of resolution (Fig. 1, Figs S2 and S3, Supporting information, respectively). MP analysis resulted in 567 trees of 152 steps with CI = 0.82 and RI = 0.89 (Fig. 1). The nuclear data retrieved the same three clades as the plastid data set: small-leaved clade (BS = 97%, PP = 1), coastal clade (BS = 100%, PP = 1) and large-leaved clade (BS = 85%, PP = 0.98). However, a few differences were observed. A. schmidii was nested within the small-leaved clade (sister to a combined small-leaved and coastal clade in the plastid tree), and A. humboldtensis was nested in the large-leaved clade (as opposed to a sister-group relationship with the rest of the clade). Furthermore, the large-leaved clade was the sister group of the coastal clade (although with little support; BS = 71%, PP < 0.50; Fig. 1) in contrast to a sister relationship with the combined coastal and small-leaved clade in the plastid tree. Relationships within clades were not fully resolved and showed a lower resolution compared to the plastid tree, especially in the large-leaved clade. Species that were paraphyletic in the plastid tree were also nonmonophyletic in the nuclear tree apart from A. biramulata and A. scopulorum, which were part of a small polytomy that is potentially consistent with monophyly of individual species. Taxa which were retrieved as monophyletic by at least one phylogenetic analysis method included A. subulata (BS = 83%, PP = 1) and A. bernieri (BS = 62%, PP = 0.92) in the small-leaved clade, A. humboldtensis (BS = 100%, PP = 1) in the large-leaved clade as well as A. nemorosa (BS < 50%, PP = 0.97) in the coastal clade (Table 4). Overall, fewer than half of the 11 New Caledonian Araucaria species with multiple accessions were monophyletic in the plastid or nuclear trees. However, the plastid data supported the monophyly of about twice as many species nodes (BS > 50%) compared to the nuclear data (Table 4). Specifically, the most resolved trees inferred here resolved 23 nodes in the plastid tree (NJ) and 16 nodes in the nuclear tree (BI) (Table 4).

Fig. 1 Maximum parsimony strict consensus phylogenetic tree based on separate analysis of the complete plastid genome (cpDNA) and 11 nuclear genes (nDNA). Support values are bootstrap values (BS, n = 1000) and posterior probability (PP), respectively. * denotes BS ≥ 99% and PP = 1.
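Species discrimination is scored here as species-level monophyly: all accessions of a species must form a clade that contains no other accession. The Python sketch below shows one way such a check could be automated on a rooted tree. The nested-tuple tree encoding, the function names, the accession-naming rule and the toy topology are all assumptions for illustration; they do not reproduce the trees or software used in the study. In this sketch a species whose accessions sit in an unresolved polytomy with other taxa is counted as not monophyletic.

```python
def collect_clades(tree):
    """Return (leaf set of this subtree, list of leaf sets of all its clades).

    A tree is either a leaf name (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def species_monophyly(tree, species_of):
    """Map each species with more than one accession to True or False."""
    all_leaves, clades = collect_clades(tree)
    clade_set = set(clades)
    members = {}
    for leaf in all_leaves:
        members.setdefault(species_of(leaf), set()).add(leaf)
    return {
        species: frozenset(accessions) in clade_set
        for species, accessions in members.items()
        if len(accessions) > 1
    }


def species_from_label(label):
    """Assumed naming rule: 'species_accession', e.g. 'spA_1'."""
    return label.rsplit("_", 1)[0]


# Hypothetical toy topology, not a result from this study.
toy_tree = (
    ("spA_1", "spA_2"),
    (("spB_1", "spC_1"), ("spB_2", "spC_2")),
)
result = species_monophyly(toy_tree, species_from_label)
discriminated = sum(result.values())
```

Counting the `True` values gives the number of discriminated species for a given tree, which is the quantity compared between the plastid and nuclear data sets in Table 4.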

Discussion

Where species show limited variation in standard markers such as ITS or routinely sequenced plastid regions, it can be difficult to test species limits and elucidate species relationships. New Caledonian Araucaria species are a good example of this: they comprise a recently diverged lineage of slowly growing trees with long generation times, where fully resolved evolutionary relationships remain elusive, and to date no DNA markers have been developed which can tell all the species apart (Gaudeul et al. 2012; Escapa & Catalano 2013; Kranitz et al. 2014). Here, we have sequenced plastid genomes from multiple accessions to investigate whether the complete plastid genome sequences can fully resolve phylogenetic relationships and provide markers which serve as DNA barcodes to discriminate among the species. In parallel, we have tested the efficacy of Sanger sequencing 11 nuclear genes to gain insights into phylogeny and species discrimination in this group.

Table 4 Phylogenetic resolution in the plastid and nuclear data sets for all Araucaria species with multiple accessions using different tree building algorithms

| Node | Plastid sp-sp bases | Plastid MP res | Plastid MP BS | Plastid BI res | Plastid BI PP | Plastid NJ res | Plastid NJ BS | Nuclear sp-sp bases | Nuclear MP res | Nuclear MP BS | Nuclear BI res | Nuclear BI PP | Nuclear NJ res | Nuclear NJ BS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A. bernieri | 1 | Y | 74 | Y | 0.94 | N | – | 1 | Y | 62 | Y | 0.92 | N | – |
| A. biramulata | 0 | N | – | N | – | N | – | 0 | N | – | N | – | N | – |
| A. columnaris | 0 | N | – | N | – | N | – | 0 | N | – | N | – | N | – |
| A. humboldtensis | 9 | Y | 100 | Y | 1 | Y | 100 | 5 | Y | 100 | Y | 1 | Y | 97 |
| A. laubenfelsii | n/a | Y | 86 | Y | 0.99 | Y | 70 | n/a | N | – | N | – | N | – |
| A. montana | 0 | N | – | N | – | N | – | 0 | N | – | N | – | N | – |
| A. muelleri | 0 | N | – | N | – | N | – | 0 | N | – | N | – | N | – |
| A. nemorosa | 0 | N | – | N | – | N | – | 0 | N | – | Y | 0.97 | N | – |
| A. rulei | 0 | N | – | N | – | N | – | 0 | N | – | N | – | N | – |
| A. scopulorum | 0 | N | – | N | – | N | – | 0 | N | – | N | – | N | – |
| A. subulata | 22 | Y | 100 | Y | 1 | Y | 100 | 2 | Y | 83 | Y | 1 | Y | 89 |
| Small-leaved clade | | Y | 77 | Y | 1 | Y | 79 | | Y | 97 | Y | 1 | Y | 97 |
| Coastal clade | | Y | 100 | Y | 1 | Y | 100 | | Y | 1 | Y | 1 | Y | 99 |
| Large-leaved clade | | Y | 100 | Y | 1 | Y | 100 | | Y | 85 | Y | 0.98 | Y | 78 |
| # nodes >0.5 | | 22 | | 22 | | 23 | | | 11 | | 16 | | 10 | |

Y, yes; N, no; MP, Maximum Parsimony; BI, Bayesian Inference; NJ, Distance method (Neighbour Joining); res, node resolution; BS, bootstrap support; PP, posterior probability; # nodes >0.5, number of nodes with BS > 50% (PP > 0.5) support; sp-sp bases, number of species-specific base pairs in taxa with multiple accessions.

Overall phylogenetic resolution

Comparing the 147 286 bp of plastid sequences with the 6044 bp from 11 nuclear loci, there was in general little disagreement between the plastid and nuclear phylogenies. Analyses of both data sets recovered the same three well-supported clades, which were originally defined by Kranitz (2005): a small-leaved clade consisting of A. bernieri, A. scopulorum and A. subulata (cpBS = 100%, nBS = 95%), a large-leaved clade including A. rulei, A. muelleri, A. laubenfelsii, A. biramulata, A. humboldtensis and A. montana (cpBS = 100%, nBS = 85%) and a coastal clade with A. columnaris, A. luxurians and A. nemorosa (cpBS = 100%, nBS = 100%). These clades were also retrieved by Gaudeul et al. (2012) using 142 polymorphic AFLP markers, by Escapa & Catalano (2013) using eight plastid genes and by Kranitz et al. (2014) using 11 plastid regions and ITS. However, support for the relationships between these three clades was generally poor and in some cases incongruent between studies (Gaudeul et al. 2012; Escapa & Catalano 2013; Kranitz et al. 2014).

Relationships among the three clades were well resolved in the complete plastid genome phylogeny, which retrieved a fully supported [(small leaved, coastal) large leaved] relationship (BS = 100%, PP = 1). This is somewhat in contrast to the nuclear data set where a [(large leaved, coastal) small leaved] relationship was indicated. However, given the limited branch support for this relationship in the nuclear phylogeny (BS = 71%, PP < 0.50), this may be a soft incongruence. In general, this study supports the clade relationship of [(small leaved, coastal) large leaved] previously recovered by Escapa & Catalano (2013) and by Kranitz et al. (2014), while Gaudeul et al.’s (2012) study does not have strongly supported conflict. Thus, the plastid genome gives the most strongly supported indication of relationships among these clades, but the possible correspondence between plastid gene trees and species trees remains tentative, given that the ~150 kbp still essentially represent a single-locus (linkage group) phylogeny.

Species monophyly

Multiple accessions of all New Caledonian taxa apart from A. schmidii and A. luxurians were included in both data sets. More than half of the species with more than one individual (7 of 11) are non-monophyletic in the plastid phylogeny, and none of these seven species resolved as monophyletic in the nuclear phylogeny. There are a number of possible explanations for this lack of monophyly. Firstly, this could result partly from technical issues, as sequencing success was not 100% for all samples. On average, 3.8% of bases were missing for the plastid genome and 2.2% for the nuclear data (Table 3). However, this is unlikely to be the explanation, as the samples with the most missing plastid and nuclear data are A. subulata accessions (15.2% and 6.2%, respectively), which always formed a fully supported clade. A. nemorosa_4010 had 9.9% missing plastid data, but all the missing sequences occurred in regions where there was no variation between any of the other Araucaria species. This is also the case for accessions with far less missing data, suggesting that undetermined nucleotides did not cause the lack of monophyly. Secondly, some of the samples might have been hybrids or have hybridization events in their ancestry. Although hybridization seems to be rare in New Caledonian Araucaria species (R. Mill, personal communication) and there are no confirmed observations of hybrids in the field (T. Jaffre, personal communication), results from Gaudeul et al. (2012, 2014) suggest that there might have been instances of hybridization between some Araucaria species. Hybridization can therefore not be excluded as a possibility. Another explanation for the observed non-monophyly is the young evolutionary age of the New Caledonian Araucaria clade. Results from Kranitz et al. (2014) suggest that the New Caledonian Araucaria clade is of recent origin and likely to have diversified ca. 20–3 MYA. A young age for this clade is also inferred by Leslie et al. (2012). Hence, there may have been insufficient time for species-specific mutations to occur and/or for complete sorting of ancestral polymorphisms. Mutation rates are generally slow in gymnosperms, where the synonymous substitution rate in the plastid and nuclear genome is about two thirds and one quarter of the rate in angiosperms, respectively (Li et al. 2011). Additionally, the slow turn-over of generations due to the long-lived nature of Araucaria trees, which can attain up to 1000 years in age (Rigg et al. 1998; Enright et al. 2009), is likely to contribute to delayed monophyly. This scenario of a young evolutionary age coupled with slow mutation rates agrees with the low number of species-specific base pairs in the plastid and nuclear data sets. Only three of 11 species with multiple accessions had species-specific character states. Interestingly, these were the same species in both data sets (Table 4). Equally, only 232 base pairs (0.16%) in the complete plastid genome of the New Caledonian Araucaria species were parsimony informative. Limited infra-specific gene flow is likely to confound estimates of monophyly even further, as the slow spread of mutational variants throughout a species’ range will increase the time until monophyly is achieved for a certain locus (Hollingsworth et al. 2011). A wind-pollinated group of taxa on an island experiencing tropical cyclones seems an unlikely candidate to show paraphyly due to limited infra-specific gene flow. Nonetheless, population genetic studies on New Caledonian Araucaria species using nuclear and plastid microsatellites can show highly differentiated populations (M. Ruhsam, A. Finger and P.M. Hollingsworth, unpublished data). Young evolutionary ages, slow substitution rates, limited infra-specific gene flow, the existence of many overlapping generations in populations with long-lived individuals and occasional hybridization events may all combine to explain the shortage of species monophyly in both the plastid and nuclear genome data sets.
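The monophyly criterion used throughout this section can be illustrated with a minimal sketch. It assumes a rooted tree written as nested Python tuples and leaf labels of the form species_accession. The topology below is invented purely for illustration and is not one of the trees recovered in this study, and the function names are hypothetical rather than part of any phylogenetics package.

```python
from collections import defaultdict

def clades(tree):
    """Return (leaf set of this subtree, list of leaf sets of every clade in it)."""
    if isinstance(tree, str):              # a leaf is a plain label
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    found = []
    for child in tree:                     # an internal node is a tuple of children
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)                   # the node itself defines a clade
    return leaves, found

def monophyletic_species(tree):
    """For each species with >1 accession, report whether its accessions form a clade."""
    all_leaves, all_clades = clades(tree)
    clade_set = set(all_clades)
    by_species = defaultdict(set)
    for leaf in all_leaves:
        species = leaf.rsplit("_", 1)[0]   # assumed label format: species_accession
        by_species[species].add(leaf)
    return {sp: frozenset(acc) in clade_set
            for sp, acc in by_species.items() if len(acc) > 1}

# Invented toy topology (NOT a result from the paper).
toy_tree = ((("subulata_1", "subulata_2"), ("bernieri_1", "scopulorum_1")),
            (("bernieri_2", "scopulorum_2"), ("montana_1", "montana_2")))

result = monophyletic_species(toy_tree)
for species, ok in sorted(result.items()):
    print(species, "monophyletic" if ok else "non-monophyletic")
```

In this toy case, two species group exclusively with their own accessions while the other two are interleaved. Adding a further accession to a species can only keep or break its monophyly, which is why sampling depth matters for this kind of test.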

Implications for barcoding – plastid data

One obvious solution to cases in which DNA barcodes have limited discriminatory power is to add more data. Given the conserved structure of the plastid genome across land plants and the increased ease of plastome sequencing, moving from the standard two to four plastid barcoding genes to entire plastome sequences is one potential option for addressing this issue (Kane et al. 2012). However, more than 50% (7 of 11) of New Caledonian Araucaria species from which more than one individual was sampled failed to form monophyletic groups in our complete plastid analysis. This suggests that a shortage of characters alone does not completely explain the problem and that failure of plastid genome evolution to track species evolution is a contributing factor. This issue has been highlighted elsewhere (Fazekas et al. 2009; Hollingsworth et al. 2011). Nevertheless, there are some gains in discriminatory power observed here. Using rbcL and matK (2932 bp; subsampled from these data) fails to resolve any of the 11 species as monophyletic. Using rbcL, matK and trnH-psbA (3939 bp) resolves only one species as monophyletic, and using rbcL, matK, trnH-psbA, atpF-atpH and psbK-psbI (5136 bp) resolves only two as monophyletic. These commonly used barcode loci were thus only able to resolve 18% of the species at best. In addition to modest gains in discriminatory power, one practical benefit of using completely sequenced plastid genomes is that it bypasses the issue of locus choice, which has dominated much of the early DNA barcoding literature. This reduces the problem of different laboratories establishing data sets using mutually exclusive loci. Overall, our results indicate some gains in discrimination, and as costs continue to fall, complete plastid genome sequencing is likely to be viable for sample sets of thousands of individuals from many species. However, our results also add a clear note of caution against viewing complete plastid genome sequences as a simple fix for limited DNA barcode resolution in plants. A further qualifier is that our use of about two samples per species gives only a weak test of monophyly: adding further individuals may ‘disrupt’ the species-specific clustering and reduce discrimination success.
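The discrimination rates quoted above can be tabulated with a short worked calculation. The sketch below uses only the alignment lengths and counts stated in the text. The value of four monophyletic species for the complete plastid genome is derived here as 11 minus the 7 non-monophyletic species, and the variable and function names are illustrative.

```python
N_SPECIES = 11  # species represented by more than one accession

# (locus set, alignment length in bp, species resolved as monophyletic)
datasets = [
    ("rbcL + matK", 2932, 0),
    ("rbcL + matK + trnH-psbA", 3939, 1),
    ("rbcL + matK + trnH-psbA + atpF-atpH + psbK-psbI", 5136, 2),
    ("complete plastid genome", 147286, 4),  # derived: 11 - 7 non-monophyletic
]

def success_rate(resolved, total=N_SPECIES):
    """Percentage of multiply sampled species resolved as monophyletic."""
    return 100.0 * resolved / total

rates = {label: round(success_rate(resolved), 1) for label, _, resolved in datasets}

for label, length, resolved in datasets:
    print(f"{label:50s} {length:>7d} bp  {resolved}/{N_SPECIES} = {rates[label]}%")
```

Read this way, a roughly 29-fold increase in sequence length over the five-locus barcode set doubles the number of species resolved, from two to four of 11.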

Implications for barcoding – nuclear data

Adding nuclear genes like ITS to the standard set of plastid barcodes may lead to improved species resolution (Yao et al. 2010; Li et al. 2011). In this study, we used Sanger sequencing of 11 nuclear genes to see whether we could obtain species-specific markers. This recovered species-specific markers for the same three taxa as the plastome sequencing (Table 4). Although there were fewer species-specific base pairs in the nuclear data in absolute numbers (8 in the nuclear data compared to 32 in the plastid data, summed over the three species), the proportion of species-specific markers relative to sequenced base pairs was about six times higher (0.13% compared to 0.02%, respectively). One limitation of our study is the Sanger sequencing approach, which was designed to replicate standard barcoding pipelines. This leaves open the question of whether cloning regions with heterozygous loci would have led to higher discriminatory power. NGS approaches such as Hyb-Seq (Weitemier et al. 2014) essentially circumvent the traditional cloning step that is impractical for large-scale DNA barcoding studies. An examination of the trace files in our study, however, showed that on average, more than half (54%) of the parsimony informative characters from the 11 nuclear loci were heterozygous in at least one individual. Importantly, for those species that did not resolve as monophyletic, there were no cases of homozygous taxon-specific characters in the data set, suggesting that even with direct allele sequencing (via cloning or NGS), these markers will not show simple taxonomically informative characters that uniquely match species boundaries. In cases such as these, very large numbers of characters from the nuclear genome will be required to develop species-specific markers. However, the time involved and the costs (laboratory, informatics) of such an approach remain prohibitive for the application of high-throughput routine DNA barcoding at present.
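To make the notion of a species-specific base pair concrete, the following sketch counts, for each species with at least two accessions, the alignment columns at which all of its accessions share one state that no other accession carries. It is an illustration under stated assumptions, not the procedure used in this study: the toy alignment is invented, labels are assumed to follow the species_accession pattern, and columns where the focal species has missing data (N, ? or -) are skipped. The final two lines reproduce the proportions discussed above from the counts given in the text.

```python
MISSING = set("N?-")

def species_specific_sites(alignment):
    """alignment: dict of 'species_accession' -> aligned sequence (equal lengths)."""
    by_species = {}
    for name, seq in alignment.items():
        by_species.setdefault(name.rsplit("_", 1)[0], []).append(seq)
    length = len(next(iter(alignment.values())))
    counts = {}
    for species, seqs in by_species.items():
        if len(seqs) < 2:                  # single accessions cannot be assessed
            continue
        others = [s for sp, ss in by_species.items() if sp != species for s in ss]
        n = 0
        for i in range(length):
            own = {s[i] for s in seqs}
            if len(own) != 1 or own & MISSING:
                continue                   # variable within species, or missing
            # Missing data in other taxa are not treated as a match.
            if all(o[i] not in own for o in others):
                n += 1
        counts[species] = n
    return counts

# Invented toy alignment (NOT data from the paper).
toy = {
    "columnaris_1": "ACGTA",
    "columnaris_2": "ACGTA",
    "nemorosa_1":   "ATGTC",
    "nemorosa_2":   "ATGTA",
    "luxurians_1":  "ATGTA",
}
counts = species_specific_sites(toy)

# Proportions from the counts reported in the text.
nuclear_pct = 100 * 8 / 6044      # 8 species-specific bp in 6044 bp
plastid_pct = 100 * 32 / 147286   # 32 species-specific bp in 147 286 bp
print(counts, round(nuclear_pct, 2), round(plastid_pct, 2))
```

The per-site density of diagnostic characters is higher in the nuclear loci, but the absolute counts are small in both data sets, which is the point the paragraph above makes.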

Conclusions

Complete plastid genome sequencing is becoming technically routine and is used by many labs at low cost. In closely related species complexes, this can provide a useful additional source of characters for phylogenies and species discrimination, and the current study has provided the most strongly supported estimate of phylogenetic relationships among the New Caledonian Araucaria to date. However, the classic problem of an imperfect match between plastid relationships and species boundaries places a constraint on the information gains from next-generation sequencing of plastid genomes at lower taxonomic levels. Likewise, sequencing a modest number of nuclear genes also has its limitations, especially considering that these markers were developed specifically for the group at hand. To reach the position of full species discrimination in all plant groups across the land plants (i.e. the ultimate goal of DNA barcoding), the big challenge remains to develop a technique that allows routine access to large numbers of nuclear markers scaleable to thousands of individuals from phylogenetically disparate sample sets.

Acknowledgements

This work was partly supported by the NSF Assembling the Tree of Life program (DEB-0629890) to SM, the Arnold Arboretum, and an NSERC (Natural Sciences and Engineering Research Council of Canada) Discovery grant to SWG. The Royal Botanic Garden Edinburgh is supported by the Scottish Government’s Rural and Environment Science and Analytical Services Division. Parts of the next-generation sequencing were carried out by Edinburgh Genomics, The University of Edinburgh. Thanks to Michael Möller for his help with the phylogenetic analyses.

References

Barrabe L, Maggia L, Pillon Y et al. (2013) New Caledonian lineages of Psychotria (Rubiaceae) reveal different evolutionary histories and the largest documented plant radiation for the archipelago. Molecular Phylogenetics and Evolution, 71, 15–35.
Cronn R, Liston A, Parks M, Gernandt DS, Shen R, Mockler T (2008) Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology. Nucleic Acids Research, 36, e122.
Duangjai S, Samuel R, Munzinger J et al. (2009) A multi-locus plastid phylogenetic analysis of the pantropical genus Diospyros (Ebenaceae), with an emphasis on the radiation and biogeographic origins of the New Caledonian endemic species. Molecular Phylogenetics and Evolution, 52, 602–620.
Duarte JM, Wall PK, Edger PP et al. (2010) Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels. BMC Evolutionary Biology, 10, 61.
Egan AN, Schlueter J, Spooner DM (2012) Applications of next-generation sequencing in plant biology. American Journal of Botany, 99, 175–185.
Enright NJ, Miller BP, Jaffre T (2009) Ecology and population dynamics of the endemic New Caledonian conifer, Araucaria muelleri (Araucariaceae). In: Araucariaceae (eds Bieleski RD, Wilcox MD), pp. 359–364. International Dendrological Society, Dunedin, New Zealand.
Escapa IH, Catalano SA (2013) Phylogenetic analysis of Araucariaceae: integrating molecules, morphology, and fossils. International Journal of Plant Sciences, 174, 1153–1170.
Farris JS, Kallersjo M, Kluge AG, Bult C (1994) Testing significance of incongruence. Cladistics, 10, 315–319.
Fazekas AJ, Kesanakurti PR, Burgess KS et al. (2009) Are plant species inherently harder to discriminate than animal species using DNA barcoding markers? Molecular Ecology Resources, 9, 130–139.
Gaudeul M, Rouhan G, Gardner MF, Hollingsworth PM (2012) AFLP markers provide insights into the evolutionary relationships and diversification of New Caledonian Araucaria species (Araucariaceae). American Journal of Botany, 99, 68–81.
Gaudeul M, Gardner MF, Thomas P, Ennos RA, Hollingsworth PM (2014) Evolutionary dynamics of emblematic Araucaria species (Araucariaceae) in New Caledonia: nuclear and chloroplast markers suggest recent diversification, introgression, and a tight link between genetics and geography within species. BMC Evolutionary Biology, 14, 171.
Harrison N, Kidner CA (2011) Next-generation sequencing and systematics: what can a billion base pairs of DNA sequence data do for you? Taxon, 60, 1552–1566.
Hollingsworth ML, Clark AA, Forrest LL et al. (2009a) Selecting barcoding loci for plants: evaluation of seven candidate loci with species-level sampling in three divergent groups of land plants. Molecular Ecology Resources, 9, 439–457.
Hollingsworth PM, Forrest LL, Spouge JL et al. (2009b) A DNA barcode for land plants. Proceedings of the National Academy of Sciences, USA, 106, 12794–12797.
Hollingsworth PM, Graham SW, Little DP (2011) Choosing and using a plant DNA barcode. PLoS ONE, 6, e19254.
Huelsenbeck JP, Ronquist F (2001) MrBAYES: Bayesian inference of phylogenetic trees. Bioinformatics, 17, 754–755.
Kane N, Sveinsson S, Dempewolf H et al. (2012) Ultra-barcoding in cacao (Theobroma spp.; Malvaceae) using whole chloroplast genomes and nuclear ribosomal DNA. American Journal of Botany, 99, 320–329.
Kelchner SA (2000) The evolution of non-coding chloroplast DNA and its application in plant systematics. Annals of the Missouri Botanical Garden, 87, 482–498.
Kerr KCR, Stoeckle MY, Dove CJ, Weigt LA, Francis CM, Hebert PDN (2007) Comprehensive DNA barcode coverage of North American birds. Molecular Ecology Notes, 7, 535–543.
Kranitz M-L (2005) Systematics and evolution of New Caledonian Araucaria. PhD thesis, University of Edinburgh.
Kranitz M, Biffin E, Clark A et al. (2014) Evolutionary diversification of New Caledonian Araucaria. PLoS ONE, 9, e110308.
Kress WJ, Erickson DL (2007) A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS ONE, 2, e508.
Leslie AB, Beaulieu JM, Rai HS, Crane PR, Donoghue MJ, Mathews S (2012) Hemisphere-scale differences in conifer evolutionary dynamics. Proceedings of the National Academy of Sciences, USA, 109, 16217–16221.
Li D-Z, Gao L-M, Li H-T et al. (2011) Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. Proceedings of the National Academy of Sciences, USA, 108, 19641–19646.
Maddison WP (1997) Gene trees in species trees. Systematic Biology, 46, 523–536.
Maddison WP, Knowles LL (2006) Inferring phylogeny despite incomplete lineage sorting. Systematic Biology, 55, 21–30.
Mei W (2010) A whole plastome approach to inferring relationships in the Araucariaceae. MSc thesis, Central Washington University.
Morat P (1993) Our knowledge of the flora of New Caledonia: endemism and diversity in relation to vegetation types and substrates. Biodiversity Letters, 1, 72–81.
Myers N, Mittermeier RA, Mittermeier CG, da Fonseca GAB, Kent J (2000) Biodiversity hotspots for conservation priorities. Nature, 403, 853–858.
Nock CJ, Waters DLE, Edwards MA et al. (2011) Chloroplast genome sequences from total DNA for plant identification. Plant Biotechnology Journal, 9, 328–333.
Nylander JAA (2004) MrModeltest v2. Program distributed by the author. Evolutionary Biology Centre, Uppsala University. Available from http://www.downloadatoz.com/developer-johan-nylander.html.
Nylander JAA, Wilgenbusch JC, Warren DL, Swofford DL (2008) AWTY (are we there yet?): a system for graphical exploration of MCMC convergence in Bayesian phylogenetics. Bioinformatics, 24, 581–583.
Parks M, Cronn R, Liston A (2009) Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. BMC Biology, 7, 84.
Rambaut A, Drummond A (2007) Tracer Version 1.5. Available from http://tree.bio.ed.ac.uk/software/tracer/.
Rigg LS, Enright NJ, Jaffre T (1998) Stand structure of the emergent conifer Araucaria laubenfelsii, in maquis and rainforest, Mont Do, New Caledonia. Australian Journal of Ecology, 23, 528–538.
Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics, 19, 1572–1574.
Rozen S, Skaletsky HJ (2000) Primer3 on the WWW for general users and for biologist programmers. In: Bioinformatics Methods and Protocols: Methods in Molecular Biology (eds Misener S & Krawetz SA). Humana Press, Totowa, NJ.
Setoguchi H, Asakawa Osawa T, Pintaud J-C, Jaffre T, Veillon J-M (1998) Phylogenetic relationships within Araucariaceae based on rbcL gene sequences. American Journal of Botany, 85, 1507–1516.
Smith MA, Poyarkov NA, Hebert PDN (2008) DNA barcoding: CO1 DNA barcoding amphibians: take the chance, meet the challenge. Molecular Ecology Resources, 8, 235–246.
Swofford DL (2003) PAUP*—Phylogenetic Analysis using Parsimony (*and other methods), version 4.0.
Van Bel M, Proost S, Wischnitzki E et al. (2012) Dissecting plant genomes with the PLAZA comparative genomics platform. Plant Physiology, 158, 590–600.
Ward RD, Zemlak TS, Innes BH, Last PR, Hebert PDN (2005) DNA barcoding Australia’s fish species. Philosophical Transactions of the Royal Society B: Biological Sciences, 360, 1847–1857.
Weitemier K, Straub SCK, Cronn RC et al. (2014) Hyb-Seq: combining target enrichment and genome skimming for plant phylogenomics. Applications in Plant Sciences, 2, 1400042.
Whittall JB, Syring J, Parks M et al. (2010) Finding a (pine) needle in a haystack: chloroplast genome sequence divergence in rare and widespread pines. Molecular Ecology, 19, 100–114.
Yang J-B, Tang M, Li H-T, Zhang Z-R, Li D-Z (2013) Complete chloroplast genome of the genus Cymbidium: lights into the species identification, phylogenetic implications and population genetic analyses. BMC Evolutionary Biology, 13, 84.
Yao H, Song J, Liu C et al. (2010) Use of ITS2 region as the universal DNA barcode for plants and animals. PLoS ONE, 5, e13102.
Zhang Y-J, Ma P-F, Li D-Z (2011) High-throughput sequencing of six bamboo chloroplast genomes: phylogenetic implications for temperate woody bamboos (Poaceae: Bambusoideae). PLoS ONE, 6, e20596.
Zuccon A, Zuccon D (2014) MrEnt: an editor for publication-quality phylogenetic tree illustrations. Molecular Ecology Resources, 14, 1090–1094.


P.M.H. designed research; H.S.R., S.M., L.A.R., W.M., S.W.G., P.T., M.F.G. and R.A.E. contributed new reagents or analytical tools; M.R., W.M., H.S.R. and T.G.R. assembled the plastid genomes; M.R. and P.M.H. analysed the data and wrote the paper.

Data accessibility

All sequence data have been deposited into GenBank (accession numbers in Table S3, Supporting information). The alignment of plastid and nuclear sequences used for the phylogenetic analysis can be found in Tables S4 and S5, Supporting information, respectively.

Supporting Information

Additional Supporting Information may be found in the online version of this article:

Fig. S1 Neighbor joining tree based on complete plastid sequences.
Fig. S2 Bayesian inference tree based on 11 nuclear genes.
Fig. S3 Neighbor joining tree based on 11 nuclear genes.
Table S1 List of primer pairs used to amplify the plastid genome of New Caledonian Araucaria species (P101 to P158).
Table S2 List of primer pairs used to amplify 11 nuclear single copy genes of New Caledonian Araucaria species.
Table S3 GenBank accession numbers.
Table S4 Plastid genome alignment used for phylogenetic analyses.
Table S5 Alignment of 11 nuclear gene regions used for phylogenetic analyses.
