Research

The chloroplast view of the evolution of polyploid wheat

Piotr Gornicki1, Huilan Zhu2*, Junwei Wang2*, Ghana S. Challa2, Zhengzhi Zhang2, Bikram S. Gill3,4 and Wanlong Li2,5

1Department of Molecular Genetics and Cell Biology, University of Chicago, 920 E 58th St, Chicago, IL 60637, USA; 2Department of Biology and Microbiology, South Dakota State University, 252 North Plain Biostress, Brookings, SD 57007, USA; 3Wheat Genetics Resource Center, Department of Plant Pathology, Kansas State University, 4024 Throckmorton Hall, Manhattan, KS 66506, USA; 4Biotechnology Section, Faculty of Sciences, King Abdulaziz University, Jeddeh, Saudi Arabia; 5Department of Plant Science, South Dakota State University, 247 North Plain Biostress, Brookings, SD 57007, USA

Authors for correspondence:
Wanlong Li
Tel: +1 605 688 5743
Email: [email protected]

Piotr Gornicki
Tel: +1 773 702 1081
Email: [email protected]

Received: 10 January 2014
Accepted: 7 June 2014

New Phytologist (2014)
doi: 10.1111/nph.12931

Key words: Aegilops, chloroplast genome, domestication, molecular evolution, polyploid wheat, Speltoides, Triticum.

Summary

• Polyploid wheats comprise four species: Triticum turgidum (AABB genomes) and T. aestivum (AABBDD) in the Emmer lineage, and T. timopheevii (AAGG) and T. zhukovskyi (AAGGAmAm) in the Timopheevi lineage. Genetic relationships between chloroplast genomes were studied to trace the evolutionary history of the species.

• Twenty-five chloroplast genomes were sequenced, and 1127 plant accessions were genotyped, representing 13 Triticum and Aegilops species.

• The A. speltoides (SS genome) diverged before the divergence of T. urartu (AA), A. tauschii (DD) and the Aegilops species of the Sitopsis section. Aegilops speltoides forms a monophyletic clade with the polyploid Emmer and Timopheevi wheats, which originated within the last 0.7 and 0.4 Myr, respectively. The geographic distribution of chloroplast haplotypes of the wild tetraploid wheats and A. speltoides illustrates the possible geographic origin of the Emmer lineage in the southern Levant and the Timopheevi lineage in northern Iraq.

• Aegilops speltoides is the closest relative of the diploid donor of the chloroplast (cytoplasm), as well as the B and G genomes to Timopheevi and Emmer lineages. Chloroplast haplotypes were often shared by species or subspecies within major lineages and between the lineages, indicating the contribution of introgression to the evolution and domestication of polyploid wheats.

*These authors contributed equally to this work.

© 2014 The Authors. New Phytologist © 2014 New Phytologist Trust. New Phytologist (2014), www.newphytologist.com

Introduction

The genus Triticum is an allopolyploid complex of agricultural importance. It contains two diploids, two tetraploids and two hexaploids. One diploid and both tetraploid species were domesticated and the two hexaploid species arose under cultivation in Eurasia during the last 10 000 yr (Salamini et al., 2002). Hexaploid common wheat, or bread wheat (T. aestivum, AABBDD genomes), originated in the Caspian Sea region by hybridization between a cultivated form of tetraploid T. turgidum (AABB genomes) and diploid goatgrass Aegilops tauschii (DD genome). The second hexaploid wheat, T. zhukovskyi (AAGGAmAm genomes), arose by hybridization between tetraploid Timopheevi wheat (T. timopheevii, AAGG genomes) and diploid einkorn (T. monococcum ssp. monococcum, AmAm genome). Triticum turgidum and T. aestivum constitute the Emmer lineage, and T. timopheevii and T. zhukovskyi the Timopheevi lineage of polyploid wheat (Gill & Friebe, 2002).

The diploid wheat T. urartu (genome AA), as the male parent, contributed the A genome to the Emmer and Timopheevi lineages (Dvorak et al., 1988; Kerby & Kuspira, 1988) and A. speltoides (SS genome), as the female parent, contributed cytoplasm (Wang et al., 1997) and the G genome (Kimber, 1974; Dvorak & Zhang, 1992) to T. timopheevii. The origin of the B genome and the cytoplasm of T. turgidum has been debated for 85 yr since Jenkins first proposed A. speltoides as the possible donor (Jenkins, 1929). Other studies have suggested that the B genome was derived from one of the other diploid species of the Sitopsis section of the genus Aegilops: A. bicornis (SbSb genome), A. longissima (SlSl genome), A. searsii (SsSs genome) or A. sharonensis (SshSsh genome) (reviewed in Haider, 2013). However, the monophyletic origin of A. speltoides (subsection Truncata) and the remaining Sitopsis species (subsection Emarginata) (Dvorak & Zhang, 1992; Badaeva et al., 1996a,b), consistent with the current taxonomy (Van Slageren, 1994), has been contradicted by other studies based on morphology and molecular data (Wang et al., 1997; Huang et al., 2002; Provan et al., 2004; Yamane & Kawahara, 2005).

Morphology (Sarkar & Stebbins, 1956), karyotype structure (Riley et al., 1958), repeated DNA sequences (Dvorak & Zhang, 1990; Daud & Gustafson, 1996; Zhang et al., 2002), chromosome banding (Gill & Kimber, 1974), meiotic pairing (Gill & Kimber, 1974) and molecular phylogeny (Mori et al., 1997; Huang et al., 2002; Provan et al., 2004; Salse et al., 2008) have failed to identify the donor(s) of the B nucleus and cytoplasm of the Emmer lineage. A polyphyletic origin and intercrossing of two or more T. turgidum amphiploids with the same A genome but different S (B or G) genomes raises another possible scenario for T. turgidum speciation (Sarkar & Stebbins, 1956; Zohary & Feldman, 1962; Kimber & Athwal, 1972). Diversity of the plasmon (organellar) genomes and the outcrossing nature of A. speltoides provide an alternative explanation: the B and G genomes could have been derived from two different genotypes of A. speltoides (Wang et al., 1997), an explanation supported by nuclear marker genotyping of a large collection of accessions of polyploid wheat and the Sitopsis species (Kilian et al., 2007). This explanation assumes that no gene flow exists between the two polyploid wheat lineages, due to the high sterility of their F1 hybrids. Finally, the B genome donor may be extinct or not yet collected.

Plasmon variation in wheat and its relatives has been studied based on individual loci (Tsunewaki, 2009). Maternal inheritance and homoplasmy of the chloroplast (cp) simplify sequence-based phylogenetic analysis of polyploid wheats, but a robust phylogenetic analysis requires a substantial sequence length because of the low nucleotide substitution rates in cp genomes compared to nuclear genomes (Wolfe et al., 1987; Khakhlova & Bock, 2006). Here we report a sequence analysis of 25 entire cp genomes from 11 Triticum and Aegilops species complemented by genotype analysis of a large collection of Triticum and Aegilops species, using plasmon and nuclear markers, to reveal the origin of the chloroplast in the Emmer and Timopheevi lineages of wheat, and provide new molecular phylogenetic data for improving taxonomy of the Triticum/Aegilops complex. The results reshape our understanding of the evolution of the polyploid wheats and their close relatives.

Materials and Methods

Plant material

Plant accessions used in this study, including country of origin, collection site, if available, and seed source are listed in Supporting Information Table S1. In addition, we used eight alloplasmic lines where common wheat Triticum aestivum ssp. aestivum cv Chinese Spring (CS) nucleus was combined with the cytoplasm of Aegilops speltoides (A. speltoides-CS), T. timopheevii (T. timopheevii-CS) and T. zhukovskyi (T. zhukovskyi-CS) (Tsunewaki, 2009) and seven CS-A. speltoides disomic addition lines containing individual A. speltoides chromosomes added to the genome of CS wheat (Friebe et al., 2000).

Sequencing T. timopheevii chloroplast DNA

Cp DNA was isolated from T. timopheevii ssp. timopheevii accession Tim01 (Tim) as previously described (Triboush et al., 1998) and sheared to 3–6-kb fragments using HydroShear® DNA shearer (Digilab Genomic Solutions, Ann Arbor, MI, USA). After end repair, the DNA fragments were cloned into pCR®4Blunt-TOPO® vector (Invitrogen) and end-sequenced using an ABI 3730 DNA analyzer (Applied Biosystems/Life Technologies, Grand Island, NY, USA) to 18 genome equivalents. cp DNA sequences were sorted, based on BLASTN searches of the wheat, barley, rice and maize chloroplast genome sequences, and assembled into contigs using the Consed program (Gordon et al., 1998). The contigs were assembled using CS cp sequence (NC_002762) as a template, removing low coverage sequences at the contig termini and merging contigs overlapping with 100% identity. The GenBank accession number of the cp Tim01 sequence is KJ614410.

Genotyping

Ten indels identified by comparison of Tim and CS cp sequences, one mitochondrial (mt) marker and five genomic markers were used in the genotype analysis (Table S2). The cp markers are located throughout the LSC block (Fig. 1), nine in intergenic regions and one (WL1072) at the 3′-end of rpl22. The number of different indel sizes for each marker found in the entire set of accessions varied from two (A in CS and B in Tim) to 10 (Table S2), with additional length variation caused by different lengths of A/T stretches. Many of the size differences resulted from rearrangements of short repetitive sequences found at the marker loci.

Chromosome 3B-specific genomic clone PSR907 was provided by Michael D. Gale, John Innes Centre (Norwich, UK), and 5B-specific genomic clone FBA348 was provided by Michael Bernard, INRA Station d’Amelioration des Plantes (Clermont-Ferrand Cedex, France). These clones were sequenced to design chromosome-specific PCR primers for markers psr907 and fba348, respectively, and the primer specificity was tested using DNA from CS nulli-tetrasomics. Primers for chromosome 3B-specific STS marker wgp118, mt STS marker Orf256 and Pinb-A marker were adapted from previous studies (Hedgcoth et al., 2002; Liu et al., 2003; Li et al., 2008). The Orf256 marker was scored for all accessions and the four chromosome-specific markers were scored for 341 accessions of the Emmer and Timopheevi lineages. For genotyping the T. timopheevii Q/q locus, PCR marker WL189 targeted the functional single nucleotide polymorphism distinguishing the Q and q alleles (Simons et al., 2006). Identity of the corresponding PCR product was confirmed by sequencing and DpnI digestion.

PCR primers were designed using the Primer3 program (Rozen & Skaletsky, 2000). Total plant DNA isolated as previously described (Li et al., 2008) was used as PCR template, and PCR products were separated by agarose gel electrophoresis or polyacrylamide gel electrophoresis. The PCR primers for the plasmon markers were tested using DNA from CS, Tim, A. speltoides-CS and T. timopheevii-CS cytoplasm substitution lines, and CS-A. speltoides disomic addition lines to ensure that the PCR products did not originate from the nuclear genomes. Scoring of 398 accession/marker combinations was verified by PCR sequencing. A complete list of plasmon haplotypes and nuclear genotypes is shown in Table S3.

Fig. 1 Organization of the Triticum and Aegilops chloroplast genomes. Annotations inside and outside of the circle are on opposite strands. Genes are color-coded by function. Positions of the seven largest indels are marked with black triangles, and positions of the 10 haplotype markers are marked with red arrows. The complete list of genes, large indels and markers, with their position in the genome, is given in Supporting Information Fig. S1.

Next generation sequencing of chloroplast genomes

Cp DNA of the following species was sequenced: T. aestivum ssp. aestivum cv CS (plant accession number TA3008, GenBank accession number KJ614396) and ssp. spelta (PI348000, KJ614403); T. turgidum ssp. carthlicum (TA2836, KJ614397 and TA2801, KJ614399), ssp. durum (PI520121, KJ614398) and ssp. dicoccoides (TA73, KJ614400; TA60, KJ614401 and TA1133, KJ614402); A. speltoides ssp. ligustica (AE918, KJ614404 and TA1796, KJ614405) and ssp. speltoides (PI487232, KJ614406); T. timopheevii ssp. armeniacum (TA941, KJ614407; TA944, KJ614409 and TA1485, KJ614408); A. bicornis (Clae57, KJ614418); A. searsii (TA1926, KJ614413; TA1837, KJ614414 and TA1841, KJ614415); A. sharonensis (TA1995, KJ614419 and TA1996, KJ614417); A. longissima (TA1924, KJ614416); A. kotschyi (TA1980, KJ614420); T. urartu (PI428335, KJ614411); and A. tauschii (AL8/78, KJ614412).

Chloroplasts were isolated using Chloroplast Isolation Kit (bioWORLD, Dublin, OH, USA): 1–5 g of fresh leaves were ground in liquid nitrogen and homogenized in 50–120 ml of chloroplast isolation buffer with 0.1% BSA. The homogenate was filtered through four layers of cheesecloth and centrifuged at 200 g for 3 min. Chloroplasts collected by centrifugation at 1000 g for 7 min were resuspended in 20–50 ml of 1× isolation buffer with 0.1% BSA and centrifuged at 1000 g for 7 min, all at 4–6°C. The chloroplast pellet was resuspended in 2–4 ml DNA isolation buffer (0.5 M NaCl, 0.1 M Tris, 0.05 M EDTA, 0.84% SDS, pH 8.0) and incubated at 65°C for 1 h. After extraction with phenol-chloroform, chloroplast DNA was precipitated with 1 volume of isopropanol. A Whole Genome Amplification kit (Qiagen) was used to amplify DNA.

Single-run 75- or 50-bp sequences were generated at the University of Chicago Genomics Facility using SOLiD 5500XL from Life Tech (Applied Biosystems/Life Technologies) according to the manufacturer’s procedures, with 10 barcoded samples run together on a single lane. Sequences were assembled using LifeScope 2.5.1 (Applied Biosystems/Life Technologies) and Sequencher v5.1 (Gene Codes, Ann Arbor, MI, USA), initially using CS and Tim cp DNA sequences as templates and later, other sequences, as they became available, selected as the closest relatives based on haplotype and other phylogenetic information. Ambiguities in the template-assisted assemblies were corrected manually and by de novo sequence assembly of sub-sets of the sequencing reads, both using Sequencher. Substitutions in protein-coding sequences were verified by comparison of the raw sequencing data for multiple pairs of sequences. T. aestivum CS (GenBank accession number KC912694) and T. urartu (GenBank accession number KC912693) cp sequences from an independent project were used to assess quality of the assembled sequences. Incorrect length of A/T stretches is the major source of possible sequence errors, with a less frequent incorrect introduction of other indels also possible. Such errors do not affect the outcome of the phylogenetic analysis performed in this study because gap positions are excluded from all sequence comparisons. No errors were detected in the protein-coding sequences.

Annotation

The cp genome was annotated online using the DOGMA program (Wyman et al., 2004) and visualized using the GenomeVx program (Conant & Wolfe, 2008).

Sequence alignment and phylogenetic analysis

Sequences were aligned using the LAGAN tool (Brudno et al., 2003) on the Vista server (http://genome.lbl.gov/vista). Local alignments were tested using ClustalX (Larkin et al., 2007). Alignment of the protein-coding sequences was unambiguous. Gaps within sequences containing long stretches of the same nucleotide and varying in length in different genomes, short sequences appearing in opposite orientation in different genomes and some sequences consisting of short repeats were adjusted manually or removed. We identified 150 sites affected by these alignment ambiguities (some possible sequencing errors). They were excluded from the phylogenetic analysis (their inclusion did not affect tree topologies but changed length of some branches). The barley cp sequence (Hordeum vulgare cv Morex, GenBank accession number EF115541) was used as an outgroup for the Triticum/Aegilops complex. The total length of the alignment (including gaps) of the 26 cp sequences was 138 282 bp, comprising 82 186 bp in the Large Single Copy (LSC) block, 12 922 bp in the Small Single Copy (SSC) block and 21 587 bp in one Inverted Repeat (IR) block (the total counts both IR copies).

Protein-coding sequences of 77 genes were concatenated in the original order on the chloroplast genome with the repeat sequences represented only once. Corresponding barley, maize, Sorghum and Brachypodium sequences were extracted from the complete chloroplast genome sequences of Hordeum vulgare cv Morex (GenBank accession number EF115541), Zea mays (GenBank accession number X86563), Sorghum bicolor (GenBank accession number EF115542) and Brachypodium distachyon (GenBank accession number NC_011032.1). The total length of the coding sequence alignment was 56 337 bp. Separate calculations were made for 70 concatenated protein-coding sequences found in the SC blocks (including 3′-ends of rps12 and ndhH) and nine concatenated protein-coding sequences found in the IR blocks (including 5′-ends of rps12 and ndhH). With all gap positions removed, the alignment (including maize and Sorghum sequences as outgroups) was 51 243 bp long (42 510 bp LSC plus 8733 bp SSC) with 17 079 synonymous sites for the SC blocks, and 4155 bp long with 1385 synonymous sites for the IR blocks. Separate calculations were also made for concatenated sequences not encoding proteins (including introns) found in the SC blocks and the IR blocks. The length of the alignments with all gap positions removed was 39 728 bp (35 532 bp LSC plus 4196 bp SSC) and 17 010 bp, respectively.

The following phylogenetic analysis was performed using MEGA 5.1 and 5.2 (Tamura et al., 2011). Distances and Neighbor-Joining (NJ) phylogenetic trees were calculated based on synonymous substitutions in protein-coding sequences (Nei–Gojobori and Jukes–Cantor methods) and on all substitutions in other sequences (Jukes–Cantor method). The trees are shown in Figs 2, S2 and S3, respectively. Bootstrap values were calculated using 1000 replicates. Standard errors for substitution rates were calculated, and Tajima’s relative rate test and a molecular clock test using the Maximum-Likelihood (ML) method were also performed. The hypothesis of equal evolutionary rate at third codon positions throughout the tree including maize and Sorghum as outgroups (Fig. S2) was not rejected. The best ML tree (not shown) was calculated for the same 29 protein-coding sequences as in Fig. S2 using MEGA 5.2 (Tamura et al., 2011). GTR+G was identified as the best substitution model by the Bayesian Information Criterion, also using MEGA 5.2. Bootstrap values were calculated using 500 replicates (not shown).

Divergence times were calculated based on synonymous substitutions (Gaut, 1998) as well as based on all substitutions in protein-coding sequences using the ML method and the strict clock (Table S4) using MEGA 5.2 (Tamura et al., 2011). The divergence time between Pooideae and Panicoideae at 60 Myr ago (Ma) (Huang et al., 2002; Chalupska et al., 2008), an estimate based on fossil records, and between the Triticum/Aegilops complex and barley at 11.6 Ma, were used to calibrate the clock. Molecular estimates of the divergence time between barley and wheat have been used in previous studies to date more recent events in wheat evolution. Divergence time errors shown in this paper reflect only intrinsic properties of the sequence datasets. The topology of the NJ and ML trees based on the protein-coding sequences is the same, except for T. urartu and A. tauschii apparently forming a separate clade in the ML tree (not shown), but without bootstrap support. The relative branch length is also very similar. Some dates calculated by the ML method are slightly earlier than the dates calculated by the NJ method, but they remain within the overlapping error ranges (Table S4). For simplicity, only the dates based on synonymous substitutions are shown in the text and figures.

Fig. 2 Neighbor-joining (NJ) phylogenetic tree of Emmer, Timopheevi, Speltoides and Sitopsis lineages based on synonymous substitutions in 70 concatenated protein-coding sequences in SC blocks of the chloroplast genomes. Barley was used as an outgroup. Divergence times (shown in red for selected nodes) were calculated using 11.6 Myr ago (Ma; Chalupska et al., 2008) for the divergence between barley and the Triticum/Aegilops complex. Species names are followed by accession numbers for plant seed collections and haplotype names (Supporting Information Table S1, Table 2). ‘H- -’ indicates one of the singleton haplotypes which was not numbered. Dots at nodes indicate bootstrap values > 80%. An expanded NJ tree including Brachypodium, Sorghum and maize as outgroups is shown in Fig. S2. A NJ phylogenetic tree based on concatenated SC sequences other than protein-coding exons is shown in Fig. S3. The inset on the left lists 16 major chloroplast haplotypes grouped into five clusters (Table 2) corresponding to five clades of the tree as indicated by color shading. The haplotype clusters were assigned to the tree clades based on cp sequence of plants carrying each haplotype. Names of the six haplotypes not represented on the tree are offset to the right.

Phylogenetic analysis was also performed on joined sequences of the LSC and SSC blocks. The best ML trees were calculated using PhyML (Guindon et al., 2010). From the list of models implemented by PhyML, MEGA 5.2 identified TN93+G+I and jModelTest2 (Darriba et al., 2012) identified GTR+I as the best nucleotide substitution models using the Bayesian and Akaike Information Criteria. The two best trees are essentially identical. In this case, the Shimodaira–Hasegawa-like (SH-like) method was used to estimate branch support. Comparison of the best ML tree to the NJ tree of Fig. 2 is shown in Fig. S4.
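The distance and dating steps described in this section can be illustrated with a minimal Python sketch. It is not the MEGA implementation: it only shows the Jukes–Cantor correction of an observed proportion of differences with gap positions excluded, and a strict-clock date obtained by scaling against a calibration pair. The function names are hypothetical, and the example distance of 0.014 is an invented illustrative value, not a measurement from this study; 0.028 is the barley–Triticum/Aegilops dS listed in Table 1.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites; columns with a gap or ambiguous base are skipped."""
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p):
    """Jukes-Cantor correction: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p is too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def strict_clock_age(d_pair, d_calibration, t_calibration_ma):
    """Date a split by linear scaling against a calibrated pairwise distance.

    Both distances must be of the same kind (e.g. both pairwise dS values),
    so that the factor of two between distance and branch depth cancels.
    """
    return t_calibration_ma * d_pair / d_calibration


if __name__ == "__main__":
    p = p_distance("ACGT-ACGTA", "ACGA-ACGTA")   # one difference in nine compared sites
    print(round(jukes_cantor(p), 4))
    # Hypothetical pairwise dS of 0.014, calibrated with dS = 0.028 at 11.6 Ma.
    print(strict_clock_age(0.014, 0.028, 11.6))
```

Scaling against a single calibration pair is only valid when the equal-rate hypothesis holds for the sequences being compared, which is why the relative rate and molecular clock tests are run before dating.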

Results

The Triticum and Aegilops chloroplast genomes

The 26 cp genomes (including barley) are conserved in size and gene content (Figs 1, S1). They are 135 568- to 136 875-bp long and consist of four blocks: Large Single Copy (LSC), Inverted Repeat A (IRA), Small Single Copy (SSC) and Inverted Repeat B (IRB). The sequences of the IRs are identical. Seven large indels (153–1096 bp) accounting for the majority of the size differences are in intergenic regions. Smaller indels are located mostly in intergenic regions. No large genome rearrangements were found. The cp genomes contain the same set of 131 genes encoding 84 proteins, eight ribosomal RNAs and 39 tRNAs. Seven of the protein-coding genes are present in two copies encoded by the inverted repeats (rps19, rpl2, rpl23, ycf15, ndhB, rps7 and rps15). The C-terminal part of ndhH is encoded by the SSC block and its N-terminal part by the neighboring IRB region. The N-terminal part of rps12 is encoded by the LSC block and the C-terminal part is encoded by the IRs. Ten protein-coding genes contain at least one intron (rps16, atpF, ycf3, petB, petD, rpl16, rpl2, ndhB, rps12 and ndhA), and two use non-ATG start codons (rps19, GTG and rpl2, ACG).

Phylogeny of the Triticum/Aegilops complex

The substitution rates (at both synonymous and other sites) are on average four- and 23-fold lower for the SC and IR sequences, respectively, relative to the rates in grass nuclear genes, based on tree distances between Triticum and Aegilops, and other grasses (Table 1). Substitution rates at all codon positions and at third codon positions are significantly lower in the Triticum/Aegilops lineages relative to Brachypodium with maize as an outgroup and relative to barley with Brachypodium as an outgroup, as determined by Tajima’s relative rate test (P-values < 0.05), which further reduces cp sequence divergence in the lineage.

Table 1 Substitution rates in chloroplast and nuclear genomes of grasses

                   Panicoideae–Barley/Triticum/Aegilops    Barley–Triticum/Aegilops
                   dS                                      dS                 dN*
Cp LSC plus SSC    0.158 ± 0.004†                          0.028 ± 0.002†     0.018 ± 0.001
Cp IR              0.028 ± 0.005‡                          0.006 ± 0.003‡     0.003 ± 0.001
Nuclear genes      0.59 ± 0.07§                            0.114 ± 0.027§     0.084 ± 0.005¶

*Concatenated SC sequences not encoding proteins.
†70 concatenated protein-coding sequences.
‡9 concatenated protein-coding sequences.
§Multi-gene average rate (Chalupska et al., 2008).
¶Rate for Acc1 and Acc-2 introns (Chalupska et al., 2008).

Full-length sequences of the cp genome were used in the phylogenetic analysis to overcome low sequence divergence at the species level and below. The NJ and ML phylogenetic trees were constructed using concatenated SC protein-coding sequences (Figs 2, S2), concatenated SC sequences that do not encode proteins (Fig. S3), as well as the combined complete sequences of the LSC and SSC blocks (Fig. S4). The topology of these trees is the same. Divergence times calculated for major nodes using alternative evolutionary models and clock calibration are either identical or very similar (Table S4).

The phylogeny showed that A. speltoides diverged from the other Sitopsis species, including A. bicornis, A. longissima, A. searsii and A. sharonensis, before they diverged from A. tauschii and T. urartu (Figs 2, S3). Aegilops speltoides, together with Emmer and Timopheevi species, forms a monophyletic clade (Speltoides lineage), indicating that diploid species of the Speltoides lineage are the maternal donors of the plasmons to the polyploid wheat lineages. In the Emmer lineage, hexaploid T. aestivum ssp. spelta forms a clade with wild tetraploid T. turgidum ssp. dicoccoides accessions TA1133 from Turkey and TA60 from northern Israel, separate from the clade formed by the other T. aestivum and T. turgidum accessions (Figs 2, S3). This association reveals the origin of the cytoplasm as well as a possible source of nuclear genetic material, responsible for the differentiation of the spelt wheat.

Our data also reveal low sequence variation within species (Figs 2, S3). For example, A. speltoides ssp. ligustica accession AE918 collected from a site near Arbil, Iraq, and TA1796 collected from Kahta, Turkey, 500 km apart, differ by only two substitutions in the entire cp sequence. Similarly, A. speltoides ssp.
speltoides accession PI487232 collected near Idlib, Syria, 350 km from the AE918 site, differs by three substitutions and a small indel; and it differs from TA1796 collected 750 km away by five substitutions and an indel. Sequence divergence is low within each of the two Emmer sub-lineages and only slightly higher among the four Timopheevi accessions (three wild accessions collected near Arbil and Dahuk in Iraq and one cultivated accession). Furthermore, there is little sequence divergence in the Sitopsis lineage, revealing both low haplotype variation and possibly very recent speciation. Interspecific chloroplast introgression may explain the observed Sitopsis divergence pattern. However, it seems unlikely, as it would require an introgression sweep involving all eight accessions representing five species. The branch length comparison with timescale bars in Figs 2 and S3 illustrates this phenomenon.

Plasmon and nuclear genotypes

A total of 1127 accessions of the 13 species were analyzed in this study using markers selected to distinguish wild Emmer from wild Timopheevi, in order to facilitate the search for the B- and G-genome donor species. In addition to polyploid wheat species and diploid species of the Sitopsis section of Aegilops, we included eight accessions of the SsSsUU tetraploid species, seven of A. kotschyi and one of A. peregrina, which presumably acquired the Ss genome and plasmon from A. searsii (Tsunewaki, 2009). Of the 455 A. speltoides accessions, 85 were used in a previous study on wheat evolution (Kilian et al., 2007), and the eight cytoplasm substitution lines were used for a Triticum/Aegilops plasmon evolution study (Wang et al., 1997).

Of the 1127 euploid accessions, 1092 were grouped into 16 cp haplotypes, each represented by at least three accessions (Table 2). The 16 haplotypes form five clusters corresponding to five major clades of the phylogenetic tree, which included representatives of 10 haplotypes (Fig. 2). In the haplotype tree (Fig. S5), Sitopsis species appear more closely related to Timopheevi/Speltoides than to Emmer due to bias of the cp markers selected based on polymorphism between T. aestivum and T. timopheevii. Each of the remaining 35 accessions either carries a unique haplotype not found in any other accession or shows multiple states for one or more markers (singles in Table S3). The plasmon variation of wild species is higher than variation of their domesticated relatives (Table 2). The plasmon variation among Sitopsis species is low, consistent with the low sequence variation of their cp genomes.

In the Emmer lineage, H01 is the dominant haplotype, which was found in 164 (58%) of the 290 T. turgidum accessions (all subspecies) and in 72 (71%) of the 101 T. aestivum accessions. H04, the second most frequent haplotype in the Emmer lineage (23% of all T. turgidum accessions), was found in 63 of the 242 T. turgidum ssp. dicoccoides accessions, in 1 of the 10 T. turgidum ssp. turgidum accessions tested, TA1167, and in the only T. aestivum ssp. macha tested, TA2602. Accessions of T. aestivum ssp. spelta belong to either the H07 (77%, all European spelt) or H01 haplotype (23%, two accessions of European and one accession of Iranian spelt). H07 was also found in three accessions of ssp. dicoccoides (TA0095, TA1133 and G5048 from Turkey) and in one accession of T. aestivum ssp. aestivum (CItr 12157). Other haplotypes are subspecies-specific: H03 was found only in ssp. aestivum (13% of T. aestivum accessions), whereas H02, H05 and H06 were found only in ssp. dicoccoides.

In the Timopheevi lineage, H08 is the dominant haplotype found in 72% of the accessions. H09 and H10 account for 19% and 5% of the accessions, respectively. Although H09 is found exclusively in ssp. armeniacum, H08 and H10 were found in both subspecies of T. timopheevii and in hexaploid T. zhukovskyi. In A. speltoides, H11 is carried by 94% of the accessions. In the Sitopsis lineage, H15 is the sole haplotype found in diploid A. bicornis, A. longissima and A. sharonensis, and in tetraploids A. kotschyi and A. peregrina. H16 is a minor haplotype and only found in three of the 16 A. searsii accessions.

At the mt locus orf256, state A was observed for all accessions of the Emmer lineage, state B for all accessions of the Sitopsis lineage (Table 2). However, either A or B (the majority of cases) was observed for both the Timopheevi and the Speltoides lineages. The WL189 marker targets the functional single nucleotide polymorphism that differentiates the q allele of the non-free-threshing forms from the Q allele of the free-threshing forms of T. aestivum and T. turgidum (Simons et al., 2006). Two cultivated Timopheevi accessions (TA2729 and PI542472), which are free-threshing, have the expected Q allele, but they carry different cytoplasms, Emmer haplotype H01 and Timopheevi haplotype H10, respectively. Finally, for 98% of accessions tested, the combinations of states at the four chromosome-specific loci were unique for either the Emmer or the Timopheevi lineage (Table 2), further differentiating the two lineages.
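The grouping of accessions into the 16 major haplotypes can be pictured as a lookup from the ten cp marker states to a haplotype name. The Python sketch below is only an illustration of that idea: the state strings are transcribed from Table 2 as extracted (so any extraction error carries over), the function and variable names are hypothetical, and accessions with unique or mixed states are simply returned as unassigned rather than handled as in Table S3.

```python
# cp marker order as given in the Table 2 header (WL markers).
CP_MARKERS = ("343", "540", "1058", "1062", "1068",
              "1072", "1082", "1405", "1407", "1425")

# State strings transcribed from Table 2; assumed accurate, not independently verified.
HAPLOTYPES = {
    "AAAAAAAAAA": ("H01", "Emmer"),
    "AACAAAAAAA": ("H02", "Emmer"),
    "AAAAAAAAAB": ("H03", "Emmer"),
    "AADAACAAAA": ("H04", "Emmer"),
    "AAEAACBAAA": ("H05", "Emmer"),
    "AAFAACBAAA": ("H06", "Emmer"),
    "AAGAACBAAA": ("H07", "Emmer"),
    "BCBBBBBBBB": ("H08", "Timopheevi"),
    "BCBCBBBBBB": ("H09", "Timopheevi"),
    "BBBBBBBBBB": ("H10", "Timopheevi"),
    "BCBBBBBBAB": ("H11", "Speltoides"),
    "BABCBBBBBB": ("H12", "Speltoides"),
    "BABCBBBCBB": ("H13", "Speltoides"),
    "BAHCBBBBAA": ("H14", "Speltoides"),
    "BABCBBBBAA": ("H15", "Sitopsis"),
    "BAICBBBBBA": ("H16", "Sitopsis"),
}


def call_haplotype(states):
    """Map a dict of marker -> state letter to (haplotype, lineage).

    Returns ("unassigned", None) when the combination matches none of the
    16 major haplotypes (for example a singleton or a mixed-state accession).
    """
    missing = [m for m in CP_MARKERS if m not in states]
    if missing:
        raise ValueError(f"missing marker(s): {', '.join(missing)}")
    key = "".join(states[m] for m in CP_MARKERS)
    return HAPLOTYPES.get(key, ("unassigned", None))


if __name__ == "__main__":
    scored = dict(zip(CP_MARKERS, "AADAACAAAA"))
    print(call_haplotype(scored))
```

A flat dictionary keeps the sketch readable and makes it obvious that, with these states, H04 differs from H01 only at markers 1058 and 1072, and H15 differs from H12 only at 1407 and 1425.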
The geographic distribution of the cytoplasm in 719 accessions of wild diploid and tetraploid species throughout the Middle East showed both spatial separation and overlap of the ranges

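Table 2 Major plasmon haplotypes (cp and mt markers) and their prevalence in Emmer, Timopheevi, Speltoides and Sitopsis lineages¹

In each sub-table, "% of lineage" is the frequency of the haplotype within the lineage and "% of haplotype" is the frequency of the mt orf256 state within the haplotype. Accession counts are listed in the order of the accession columns named above each sub-table; empty cells are omitted.

Emmer lineage haplotypes. Accession columns: T. turgidum (AABB) ssp. dicoccoides, dicoccum, carthlicum, durum, paleocolchicum, polonicum, turanicum, turgidum; T. aestivum (AABBDD) ssp. aestivum, sphaerococcum, spelta, macha.

| Haplotype | % of lineage | WL343 | WL540 | WL1058 | WL1062 | WL1068 | WL1072 | WL1082 | WL1405 | WL1407 | WL1425 | mt orf256 | % of haplotype | Number of accessions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H01 | 62 | A | A | A | A | A | A | A | A | A | A | A | 100 | 120, 3, 17, 7, 1, 6, 1, 9, 68, 1, 3 |
| H02 | 6 | A | A | C | A | A | A | A | A | A | A | A | 100 | 23 |
| H03 | 3 | A | A | A | A | A | A | A | A | A | B | A | 100 | 13 |
| H04 | 17 | A | A | D | A | A | C | A | A | A | A | A | 100 | 63, 1, 1 |
| H05 | 1 | A | A | E | A | A | C | B | A | A | A | A | 100 | 4 |
| H06 | 3 | A | A | F | A | A | C | B | A | A | A | A | 100 | 11 |
| H07 | 3 | A | A | G | A | A | C | B | A | A | A | A | 100 | 3, 1, 10 |

Timopheevi lineage haplotypes. Accession columns: T. timopheevii (AAGG) ssp. armeniacum, timopheevii; T. zhukovskyi (AAGGAmAm).

| Haplotype | % of lineage | WL343 | WL540 | WL1058 | WL1062 | WL1068 | WL1072 | WL1082 | WL1405 | WL1407 | WL1425 | mt orf256 | % of haplotype | Number of accessions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H08 | 73 | B | C | B | B | B | B | B | B | B | B | A | 10 | 16 |
| | | | | | | | | | | | | B | 90 | 147, 2, 1 |
| H09 | 19 | B | C | B | C | B | B | B | B | B | B | A | 2 | 1 |
| | | | | | | | | | | | | B | 98 | 42 |
| H10 | 5 | B | B | B | B | B | B | B | B | B | B | A | 8 | 1 |
| | | | | | | | | | | | | B | 92 | 3, 3, 4 |

Speltoides lineage haplotypes. Accession column: Ae. speltoides (SS).

| Haplotype | % of lineage | WL343 | WL540 | WL1058 | WL1062 | WL1068 | WL1072 | WL1082 | WL1405 | WL1407 | WL1425 | mt orf256 | % of haplotype | Number of accessions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H11 | 94 | B | C | B | B | B | B | B | B | A | B | A | 4 | 17 |
| | | | | | | | | | | | | B | 96 | 406 |
| H12 | 1 | B | A | B | C | B | B | B | B | B | B | B | 100 | 4 |
| H13 | 2 | B | A | B | C | B | B | B | C | B | B | B | 100 | 9 |
| H14 | 1 | B | A | H | C | B | B | B | B | A | A | B | 100 | 6 |

Sitopsis lineage haplotypes. Accession columns: A. bicornis (SbSb), A. longissima (SlSl), A. sharonensis (SshSsh), A. searsii (SsSs), A. kotschyi (SsSsUU), A. peregrina (SsSsUU).

| Haplotype | % of lineage | WL343 | WL540 | WL1058 | WL1062 | WL1068 | WL1072 | WL1082 | WL1405 | WL1407 | WL1425 | mt orf256 | % of haplotype | Number of accessions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H15 | 89 | B | A | B | C | B | B | B | B | A | A | B | 100 | 12, 8, 7, 12, 8, 1 |
| H16 | 6 | B | A | I | C | B | B | B | B | B | A | B | 100 | 3 |

Emmer lineage genotypes (nuclear markers)

| wgp118 | psr907 | a348 | pinb-A | % |
|---|---|---|---|---|
| A | A | B | B | 47 |
| A | A | A | B | 36 |
| A | B | B | B | 12 |
| A | B | A | B | 1 |
| B | B | B | B | 2 |
| B | A | B | B | 2 |

Timopheevi lineage genotypes (nuclear markers)

| wgp118 | psr907 | a348 | pinb-A | % |
|---|---|---|---|---|
| B | B | A | A | 82 |
| B | A | A | A | 11 |
| B | B | B | A | 2 |
| B | B | A | B | 1 |
| B | B | B | B | 2 |
| B | A | A | B | 1 |
| A | B | A | A | 1 |

¹The inset tables show frequency (%) of nuclear genotypes in Emmer and Timopheevi lineages. Marker states are indicated by letters (Supporting Information Table S2). The complete list of haplotypes and genotypes is shown in Table S3 and the information for individual plant accessions is shown in Table S1.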

occupied by representatives of the Sitopsis lineage and the Speltoides lineage, which is further divided into Emmer and Timopheevi lineages of polyploid wheat (Fig. 3). Five of the six cp haplotypes of wild Emmer (ssp. dicoccoides) were found in Israel and Jordan, where they overlap with Speltoides. The overlapping range continues towards southern Turkey, where a rare wild Emmer haplotype (H07) is also found, in addition to H01 and H04. H07 is the major haplotype found in European spelt (ssp. spelta). The ranges of A. speltoides and the wild Timopheevi overlap in northern Iraq and southeastern Turkey. The wild Timopheevi range extends further northeast to Armenia and Azerbaijan, whereas the A. speltoides range extends to other parts of Turkey. The partially overlapping range of Sitopsis species, mostly confined to the East Mediterranean region, from north Egypt to west Syria (Fig. 3), is consistent with their recent origin (Fig. 2). Although their ranges overlap, cp sequence analysis excludes Sitopsis as the source of the cytoplasm for Emmer. Sitopsis is not the source of the cytoplasm for Timopheevi either. In this case, the species show significant spatial separation as well.

Fig. 3 Geographic distribution of wild Emmer (217 accessions of ssp. dicoccoides), wild Timopheevi (207 accessions of ssp. armeniacum), Speltoides (253 accessions) and Sitopsis (42 accessions of all species). The sizes of the pie-charts are proportional to the number of accessions analyzed. Color-coded sectors reflect frequency of haplotypes. The collection sites are grouped by country or region (in Turkey and Iraq). Two heterogeneous populations of wild Emmer and wild Timopheevi are labeled wE and wT, respectively.

Haplotypes are often shared by species within the major lineages, but infrequently by species across the major lineages. Two T. timopheevii ssp. armeniacum accessions (TA9 and TA11), collected near Arbil (Iraq), belong to the major haplotype H11B of A. speltoides (Table S3). Two T. timopheevii accessions, one from ssp. timopheevii (TA2729) and another from ssp. armeniacum (TA976), belong to the major haplotypes of the Emmer lineage, H01 and H04, respectively, with the mt marker typical of the Emmer lineage as well. Both accessions have morphology and karyotype structure characteristic of Timopheevi, such as coarse and hairy leaf surfaces, and carry the Timopheevi-specific 6AS1GS translocation (Badaeva et al., 1994), suggesting that they are nuclear-cytoplasm hybrids formed by interspecific introgression of the nuclear genome of T. timopheevii with the cytoplasm of T. turgidum.

Finally, we analyzed 22 T. turgidum ssp. dicoccoides accessions collected in Turkey, northern Iraq and western Iran (Harlan & Zohary, 1966). This Turkish-Iraqi race of wild emmer was intermediate between Emmer and Timopheevi types in morphology and chromosome pairing (Sachs, 1953; Wagenaar, 1961, 1966; Rawal & Harlan, 1975). Nine of the accessions carried the major Timopheevi cp haplotypes, either H08 or H09, as well as both A and B states of the mt orf256 marker and specific nuclear genotypes that are characteristic of the Timopheevi lineage. The remaining 13 accessions carried Emmer cp haplotypes H01, H04, H06 and H07, the A state of the mt orf256 marker and specific nuclear genotypes that are characteristic of the Emmer lineage (Table S3).
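The bookkeeping behind these assignments can be illustrated with a minimal Python sketch. It assumes only the haplotype-to-lineage grouping of Table 2 and the accession haplotypes and totals quoted in the text; the function and variable names are hypothetical and do not come from the study, and the mt state suffix of H11B is dropped so that only the cp haplotype is used.

```python
# Haplotype-to-lineage assignment as listed in Table 2 (H01-H16).
LINEAGE_OF = {
    **{f"H{i:02d}": "Emmer" for i in range(1, 8)},         # H01-H07
    **{f"H{i:02d}": "Timopheevi" for i in range(8, 11)},   # H08-H10
    **{f"H{i:02d}": "Speltoides" for i in range(11, 15)},  # H11-H14
    **{f"H{i:02d}": "Sitopsis" for i in range(15, 17)},    # H15-H16
}


def cytoplasm_lineage(haplotype):
    """Return the lineage whose cytoplasm a cp haplotype represents."""
    return LINEAGE_OF[haplotype]


def lineage_shares(counts):
    """Convert accession counts per lineage into percentages (one decimal)."""
    total = sum(counts.values())
    return {lineage: round(100 * n / total, 1) for lineage, n in counts.items()}


# Accessions named in the text with the cp haplotypes reported for them.
reported = {"TA9": "H11", "TA11": "H11", "TA2729": "H01", "TA976": "H04"}
for accession, haplotype in reported.items():
    print(accession, haplotype, cytoplasm_lineage(haplotype))

# Totals reported for the 22 Turkish-Iraqi wild emmer accessions.
turkish_iraqi = {"Timopheevi": 9, "Emmer": 13}
print(lineage_shares(turkish_iraqi))
```

With the reported totals, the shares come out at about 41% Timopheevi-type and 59% Emmer-type cytoplasm among the 22 accessions.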

Discussion

Phylogenetic analysis of chloroplast genomes

The paternal ancestors of polyploid wheat, Triticum urartu, T. monococcum and A. tauschii, have been well established, based on the sequence analysis of many nuclear genes. The history of the maternal ancestors of the G genome of Timopheevi and the B genome of Emmer wheat has been more difficult to decipher based on nuclear genome sequences. In this study, we show the utility of the chloroplast genome sequence to identify the maternal parents of the AABB and AAGG tetraploids, to describe the subsequent cytoplasmic inheritance in polyploid wheats, and to define the evolution of their close relatives. A cp-based phylogenetic analysis complemented with an indel-based chloroplast haplotype analysis of a large collection of Aegilops/Triticum species revealed the cytoplasmic constitution of each species and lineage, as well as possible cytoplasmic exchanges between them. Accessions representing different species and haplotypes were then correlated with the geographic distribution of their collection sites throughout the Middle East to complete the analysis.

The marker genotype information alone, however, is not sufficient for establishing phylogenetic relationships between species. The high copy number of the higher plant chloroplast genome (Bendich, 1987) and gene conversion biased toward the wild-type sequence (Khakhlova & Bock, 2006) are likely to be the major factors contributing to the much lower substitution rates in the cp genomes than in the nuclear genomes (Wolfe et al., 1987). Due to the low sequence divergence, high-quality full-length cp genomic sequences are needed for a robust phylogenetic analysis at and below the species level. Comparison of cp sequences for multiple accessions, which represent species of the three major Aegilops/Triticum lineages, revealed a low level of haplotype variation.
Haplotype divergence at nuclear loci could exceed 0.5 Myr, at least for some species (Chalupska et al., 2008), but it is much lower for the chloroplast genome. The low rate at which mutations in the cp genome are fixed apparently operates at the population level as well. Low haplotype variation facilitates phylogenetic analysis and divergence time estimates of the closely related species (and subspecies) of the Aegilops/Triticum complex, because the estimated chloroplast divergence date might be closer to the time of species separation. Uniparental inheritance of the plastome and homoplasmy of the chloroplast simplify sequence-based phylogenetic analysis. Finally, homoplasmy makes application of next-generation sequencing methods possible.

The history of polyploid wheats from the chloroplast perspective

The time of barley divergence from the Triticum/Aegilops complex is estimated at 10.6 ± 0.6 Ma based on the synonymous substitutions in protein-coding sequences (Table S4), placing it within the 11.6 ± 2.4 Ma range calculated previously based on sequence comparison of multiple nuclear genes (Chalupska et al., 2008). However, the divergence time of 1.4 ± 0.2 Ma (Fig. 2) between T. urartu and A. tauschii (1.2 ± 0.2 Ma, if just these two species are considered; Table S4), based on the cp sequences, is more recent than the previous estimates of 2.3 ± 0.5 Ma based on Acc-1 sequences (Chalupska et al., 2008) and 2.7 Ma (95% confidence interval 1.4–4.1 Ma) (Dvorak & Akhunov, 2005). This difference might be explained by the much lower haplotype sequence divergence for the cp genome than for the nuclear genome. High haplotype divergence could inflate species divergence time estimates in a nuclear locus-dependent manner (Chalupska et al., 2008).

Classically, A. speltoides has been considered a part of the Sitopsis section. The phylogenetic reconstructions (Figs 2, S3), however, showed that A. speltoides is not a member of Sitopsis, but together with Emmer and Timopheevi it forms a clade (Speltoides lineage) paraphyletic with respect to T. urartu, A. tauschii as well as other Sitopsis species. The Speltoides lineage diverged 1.2 Myr before divergence of Sitopsis, T. urartu and A. tauschii (Fig. 2).
Although understanding the source of the taxonomic discrepancy requires further investigation, our study clearly demonstrates that the cytoplasmic genomes of polyploid wheats were inherited from Speltoides, not from any of the Sitopsis species (known as Emarginata). Accordingly, the Sitopsis lineage is not the source of the B or G genomes.

The Emmer lineage diverged from A. speltoides and Timopheevi 0.7 ± 0.2 Ma. This date is at the higher end or slightly above earlier estimates based on Acc sequences (Chalupska et al., 2008) and the incidence of sequence duplications (Dvorak & Akhunov, 2005). Based on the intraspecific levels of restriction fragment length polymorphism and single-strand conformation polymorphism, speciation of T. turgidum was proposed to be more ancient than speciation of T. timopheevii (Tsunewaki et al., 1991; Wang et al., 1997). Our study provides the first sequence-based divergence time comparison. The Emmer and Timopheevi tetraploidization events occurred within the last 0.7 and 0.4 Myr, respectively. Timopheevi and A. speltoides cp genomes are most closely related, consistent with A. speltoides being the maternal donor of the cytoplasm and the G genome to the Timopheevi tetraploid as previously proposed (Kimber, 1974; Dvorak & Zhang, 1992; Wang et al., 1997). None of the haplotypes carried by 391 Emmer accessions was found among the 450 A. speltoides accessions. The female donor of the cytoplasm and B genome to the Emmer wheat is either a distant relative of A. speltoides, as we know it today (perhaps even extinct), or the tetraploidization event occurred much earlier than the Timopheevi tetraploidization.

The Emmer lineage, founded within the last 0.7 Myr, radiated from south Levant, the present day chloroplast diversity center of the wild emmer, to southeastern Turkey and northern Iraq, where haplotypes H01, H04 and H07 are established. H07 is a minor haplotype that was likely to have been derived from H05/H06 (nearest haplotype neighbors different only at marker WL1058, Table 2), but is not found in south Levant. The time span allowed Emmer to differentiate both from Speltoides (cytoplasm and B genome) and within the Emmer lineage. The Timopheevi lineage is less differentiated, as the chloroplast sequence shows. It was founded within the last 0.4 Myr, probably in the northern Iraq region around Arbil, where many accessions of both major haplotypes of wild Timopheevi (H08 and H09) were collected. Polymorphism at the mt orf256 locus was found in both Timopheevi and Speltoides in this region. Two accessions (TA9 and TA11) of wild Timopheevi collected near Arbil carry H11, the dominant haplotype of A. speltoides, hinting that they represent the ancestral state of the species. The high level of karyotype diversity in wild Timopheevi collected in northern Iraq (Badaeva et al., 1994) is consistent with the species originating there. The major haplotype H01 of wild Emmer was the founder of all domesticated species of the lineage, most of which also carry H01 (Table 2). Emmer itself was domesticated in the Diyarbakir region of Turkey (Ozkan et al., 2011).
The hexaploid T. aestivum (AABBDD) originated under cultivation in the southwestern Caspian Sea region via hybridization between domesticated AABB tetraploid and A. tauschii ssp. strangulata (Dvorak et al., 1998; Wang et al., 2013). Two cp haplotypes, H08 and H10, were present in both tetraploid and hexaploid domesticated Timopheevi, T. timopheevii ssp. timopheevii and T. zhukovskyi, either due to recurrent hexaploidization events or outcrossing between these two species. H10 is a minor haplotype found in northern Iraq (Dahuk) and Azerbaijan. Similar to hexaploid T. aestivum, speciation of the hexaploid species T. zhukovskyi occurred under cultivation via hybridization of ssp. timopheevii with T. monococcum, probably in Transcaucasia, where ssp. timopheevii is cultivated.

Chloroplast intra- and inter-lineage introgression

According to the pivotal genome hypothesis of Zohary & Feldman (1962), tetraploid Emmer and Timopheevi species, which share the A genome, can acquire adaptive traits through introgression leading to mosaicity of the B and G genomes, a product of inter-lineage outcrosses. An intermediate race predicted by this hypothesis was previously found (Sachs, 1953; Wagenaar, 1961, 1966; Rawal & Harlan, 1975). Our study provides molecular evidence that evolution of the polyploid wheats was punctuated by chloroplast (plasmon) introgression.

Wild Emmer and wild Timopheevi grow sympatrically in southeastern Turkey (Fig. 3), northern Iraq and western Iran (Harlan & Zohary, 1966). They have very similar morphology, and the key phenotypic difference is the hairiness of their leaf surfaces (Tanaka & Kawahara, 1976). Cytogenetically, the Turkish-Iraqi race of wild Emmer shows a range of chromosome pairing capabilities, from complete affinity to T. turgidum to high affinity to T. timopheevii (Sachs, 1953; Rawal & Harlan, 1975; Tanaka & Kawahara, 1976). The Turkish-Iraqi race has a mixed cp haplotype composition: 60% of the accessions belong to the Emmer lineage (H01, H04, H06 and H07) and 40% to the Timopheevi lineage (H08 and H09), confirming its heterogeneity (Rawal & Harlan, 1975). Triticum turgidum ssp. dicoccoides accession G4991, for example, which shows high chromosome pairing with both T. turgidum and T. timopheevii (Rawal & Harlan, 1975), carries Timopheevi cp haplotype H09 as a result of a cross between wild Emmer and wild Timopheevi. Conversely, wild Timopheevi accession TA976 carries Emmer lineage cp haplotype H04.

In the cp genome, we detected higher diversity in the southern Levant population of the wild Emmer. In the nuclear genome, however, a higher diversity was found in the Turkish-Iraqi-Iranian population of wild Emmer (Ozkan et al., 2011). The increased nuclear genome diversity was most probably due to absorption of the Timopheevi variation via the inter-lineage introgression. The Timopheevi introgression further contributed to domestication of Emmer and its derivatives. For example, marker fba348, which is predominant in wild Timopheevi but rare in wild Emmer, is highly enriched in the domesticated tetraploid and hexaploid wheat of the Emmer lineage, and almost fixed in T. turgidum ssp. carthlicum (Tables 1, S1). Furthermore, the Q allele in T. timopheevii ssp.
timopheevii, represented in this study by accession TA2799 with Emmer cytoplasm H01, was likely introgressed from free-threshing domesticated wheat. It may have been passed on to other populations such as accession PI542472 with the Timopheevi cytoplasm H10. This inter-lineage introgression is consistent with both polyploid wheat lineages sharing the Ph1 locus and other landmark loci (Feldman, 1966; Dhaliwal, 1977).

The Emmer lineage is subdivided into aestivum and spelta sub-lineages (Figs 2, S3). Haplotypes H01, H02 and H03 constitute the aestivum sub-lineage. H02 is found exclusively in ssp. dicoccoides, and H03 is found exclusively in ssp. aestivum. H03 carries the B state at WL1425, which is specific for Timopheevi and A. speltoides, and was not found in other subspecies of the Emmer lineage (Table 2). Haplotypes H05, H06 and H07 constitute the spelta sub-lineage. H06 is exclusive to ssp. dicoccoides (10 accessions from Israel, mostly northern Israel, and one from Iran). H07 is a rare haplotype found in three accessions of ssp. dicoccoides from Turkey, but it is the most common haplotype (10 of the 13 accessions) in European spelt (Table 2). H05, with three ssp. dicoccoides accessions from northern Israel and one from the West Bank, is not represented on the phylogenetic tree, but it most likely belongs to the spelta sub-lineage, because of the marker similarity to H06 and H07 (Fig. S5). Finally, H04, an intermediate between H01/H02 and H05/H06/H07, was found almost exclusively in ssp. dicoccoides (Table 2). Without cp sequence information, it is not known how closely H03, H04 and H05 are related to the other Emmer haplotypes.

The European spelt cytoplasm was inherited from the rare T. turgidum ssp. dicoccoides haplotype H07 from southeastern Turkey. The macha cytoplasm was inherited from wild T. turgidum ssp. dicoccoides haplotype H04 found throughout the Middle East. Because the geographic distributions of wild Emmer and A. tauschii do not overlap (Nesbitt & Samuel, 1996), the non-free-threshing European spelt and macha were more likely to have been formed by concomitant plasmon and nuclear introgression carrying the q allele from the two non-free-threshing wild Emmer types to ssp. aestivum, than directly derived from independent hexaploidization events involving wild Emmer. Because the H07-carrying wild Emmer accessions were only found in Turkey, the spelt wheat most probably originated there and radiated west to Europe more recently than the free-threshing form of ssp. aestivum. H04 was found in only one accession of domesticated free-threshing tetraploid wheat, T. turgidum ssp. turgidum. Reciprocal introgression events were also identified. Two accessions of European spelt carry H01, the major ssp. aestivum haplotype, but are non-free-threshing. Conversely, H07 was found in one accession of free-threshing ssp. aestivum. Therefore, the introgression between tetraploid wild Emmer and hexaploid common wheat played an important role in differentiation of other hexaploid forms in the Emmer lineage.

Analysis of the plasmon inheritance based on a chloroplast genome sequence and haplotype sampling of a large plant collection provides an independent view of the origin and evolution of polyploid wheats. It sheds new light on the history of the B and G genomes, which were inherited from the female parents together with the plasmon. Inter- and intra-lineage introgression further enriched the evolutionary history of polyploid wheats.
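The marker-based comparisons of Emmer haplotypes made in the Discussion (for example, H07 differing from H05 and H06 only at WL1058) can be reproduced with a minimal Python sketch. The marker states are those listed for H01 to H07 in Table 2. Treating each marker as an unordered category and counting mismatches is an assumption made here for illustration only; it is not the procedure used to build the haplotype tree of Fig. S5, and the function names are hypothetical.

```python
# The 10 cp markers of Table 2, in column order.
MARKERS = ["WL343", "WL540", "WL1058", "WL1062", "WL1068",
           "WL1072", "WL1082", "WL1405", "WL1407", "WL1425"]

# cp marker states of the Emmer lineage haplotypes (Table 2), one letter per marker.
EMMER = {
    "H01": "AAAAAAAAAA",
    "H02": "AACAAAAAAA",
    "H03": "AAAAAAAAAB",
    "H04": "AADAACAAAA",
    "H05": "AAEAACBAAA",
    "H06": "AAFAACBAAA",
    "H07": "AAGAACBAAA",
}


def differing_markers(hap_a, hap_b):
    """List the markers at which two haplotypes carry different states."""
    return [m for m, x, y in zip(MARKERS, EMMER[hap_a], EMMER[hap_b]) if x != y]


def nearest_haplotypes(hap):
    """Return the haplotype(s) with the fewest marker differences from `hap`."""
    distance = {other: len(differing_markers(hap, other))
                for other in EMMER if other != hap}
    smallest = min(distance.values())
    return sorted(other for other, d in distance.items() if d == smallest)


print(differing_markers("H07", "H05"))  # markers separating H07 from H05
print(nearest_haplotypes("H07"))        # closest neighbors of H07
print(nearest_haplotypes("H04"))        # neighbors of the intermediate H04
```

Under this simple count, H04 lies two marker differences from H01 and H02 and two from H05, H06 and H07, which agrees with its description as an intermediate haplotype.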

Acknowledgements

We thank the scientists listed in Table S1 for providing seed samples; we thank Bernd Friebe (Kansas State University) for C-banding analysis, Michael Bernard (INRA, France) and Michael D. Gale (John Innes Centre, UK) for providing DNA clones, Jiang Li (South Dakota State University) for computational assistance, William J. Buikema (University of Chicago Cancer Research DNA Sequencing Facility) for help with sequencing data analysis, and Robert Haselkorn (University of Chicago) for help with manuscript preparation. This research was supported by South Dakota Agricultural Experimental Station (Brookings, SD, USA), South Dakota Wheat Commission (Pierre, SD, USA), University of Chicago (Chicago, IL), and WGRC I/UCRC NSF award (IIP-1338897) to BSG. Kansas Agricultural Experiment Station contribution no. 14-339-J.

References

Badaeva ED, Badaev NS, Gill BS, Filatenko AA. 1994. Intraspecific karyotype divergence in Triticum araraticum (Poaceae). Plant Systematics and Evolution 192: 117–145.


Badaeva ED, Friebe B, Gill BS. 1996a. Genome differentiation in Aegilops. 2. Physical mapping of 5S and 18S.26S ribosomal RNA genes. Genome 39: 1150–1158. Badaeva ED, Friebe B, Gill BS. 1996b. Genome differentiation in Aegilops. 1. Distribution of highly repetitive DNA sequences on chromosomes of diploid species. Genome 39: 293–306. Bendich AJ. 1987. Why do chloroplasts and mitochondria contain so many copies of their genome. BioEssays 6: 279–282. Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Program NCS, Green ED, Sidow A, Batzoglou S. 2003. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Research 13: 721–731. Chalupska D, Lee HY, Faris JD, Evrard A, Chalhoub B, Haselkorn R, Gornicki P. 2008. Acc homoeoloci and the evolution of wheat genomes. Proceedings of the National Academy of Sciences, USA 105: 9691–9696. Conant GC, Wolfe KH. 2008. GenomeVx: simple web-based creation of editable circular chromosome maps. Bioinformatics 24: 861–862. Darriba D, Taboada GL, Doallo R, Posada D. 2012. jModelTest 2: more models, new heuristics and parallel computing. Nature Methods 9: 772. Daud HM, Gustafson JP. 1996. Molecular evidence for Triticum speltoides as B-genome progenitor of wheat (Triticum aestivum). Genome 39: 543–548. Dhaliwal HS. 1977. Ph gene and origin of tetraploid wheats. Genetica 47: 177–182. Dvorak J, Akhunov ED. 2005. Tempos of gene locus deletions and duplications and their relationship to recombination rate during diploid and polyploid evolution in the Aegilops-Triticum alliance. Genetics 171: 323–332. Dvorak J, Luo M-C, Yang Z-L, Zhang H-B. 1998. The structure of the Aegilops tauschii genepool and the evolution of hexaploid wheat. Theoretical and Applied Genetics 97: 657–670. Dvorak J, McGuire PE, Cassidy B. 1988. Apparent sources of the A genomes of wheats inferred from polymorphism in abundance and restriction fragment length of repeated nucleotide sequences. Genome 30: 680–689.
Dvorak J, Zhang HB. 1990. Variation in repeated nucleotide sequences sheds light on the phylogeny of the wheat B and G genomes. Proceedings of the National Academy of Sciences, USA 87: 9640–9644. Dvorak J, Zhang HB. 1992. Reconstruction of the phylogeny of the genus Triticum from variation in repeated nucleotide-sequences. Theoretical and Applied Genetics 84: 419–429. Feldman M. 1966. The mechanism regulating pairing in Triticum timopheevii. Wheat Information Service 2: 1–2. Friebe B, Qi LL, Nasuda S, Zhang P, Tuleen NA, Gill BS. 2000. Development of a complete set of Triticum aestivum–Aegilops speltoides chromosome addition lines. Theoretical and Applied Genetics 101: 51–58. Gaut BS. 1998. Molecular clocks and nucleotide substitution rates in higher plants. Evolutionary Biology 30: 93–120. Gill BS, Friebe B. 2002. Cytogenetics, phylogeny and evolution of cultivated wheats. In: Curtis BC, Rajaram S, Gomez-Macperson H, eds. Bread wheat improvement and production. Rome, Italy: FAO, 71–88. Gill BS, Kimber G. 1974. Giemsa C-banding and the evolution of wheat. Proceedings of the National Academy of Sciences, USA 71: 4086–4090. Gordon D, Abajian C, Green P. 1998. Consed: a graphical tool for sequence finishing. Genome Research 8: 195–202. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic Biology 59: 307–321. Haider N. 2013. The origin of the B-genome of bread wheat (Triticum aestivum L.). Russian Journal of Genetics 49: 263–274. Harlan JR, Zohary D. 1966. Distribution of wild wheats and barley. Science 153: 1074–1080. Hedgcoth C, El-Shehawi AM, Wei P, Clarkson M, Tamalis D. 2002. A chimeric open reading frame associated with cytoplasmic male sterility in alloplasmic wheat with Triticum timopheevi mitochondria is present in several Triticum and Aegilops species, barley, and rye. Current Genetics 41: 357–365. 
Huang S, Sirikhachornkit A, Su XJ, Faris J, Gill B, Haselkorn R, Gornicki P. 2002. Genes encoding plastid acetyl-CoA carboxylase and 3-phosphoglycerate kinase of the Triticum/Aegilops complex and the evolutionary history of polyploid wheat. Proceedings of the National Academy of Sciences, USA 99: 8133–8138. Jenkins JA. 1929. Chromosome homologies in wheat and Aegilops. American Journal of Botany 16: 238–245. Kerby K, Kuspira J. 1988. Cytological evidence bearing on the origin of the B genome in polyploid wheats. Genome 30: 36–43. Khakhlova O, Bock R. 2006. Elimination of deleterious mutations in plastid genomes by gene conversion. Plant Journal 46: 85–94. Kilian B, Ozkan H, Deusch O, Effgen S, Brandolini A, Kohl J, Martin W, Salamini F. 2007. Independent wheat B and G genome origins in outcrossing Aegilops progenitor haplotypes. Molecular Biology and Evolution 24: 217–227. Kimber G. 1974. A reassessment of the origin of the polyploid wheats. Genetics 78: 487–492. Kimber G, Athwal RS. 1972. Reassessment of course of evolution of wheat. Proceedings of the National Academy of Sciences, USA 69: 912–915. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R et al. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947–2948. Li WL, Huang L, Gill BS. 2008. Recurrent deletions of Puroindoline genes at the grain Hardness locus in four independent lineages of polyploid wheat. Plant Physiology 146: 200–212. Liu B, Segal G, Rong JK, Feldman M. 2003. A chromosome-specific sequence common to the B genome of polyploid wheat and Aegilops searsii. Plant Systematics and Evolution 241: 55–66. Mori N, Miyashita NT, Terachi T, Nakamura C. 1997. Variation in coxII intron in the wild ancestral species of wheat. Hereditas 126: 281–288. Nesbitt M, Samuel D. 1996. From staple crop to extinction? The archaeology and history of hulled wheat. In: Padulosi S, Hammer K, Heller J, eds. First International workshop on hulled wheats. Castelvecchio Pascoli, Tuscany, Italy: International Plant Genetic Resources Institute, Rome, Italy, 41–100. Ozkan H, Willcox G, Graner A, Salamini F, Kilian B. 2011.
Geographic distribution and domestication of wild emmer wheat (Triticum dicoccoides). Genetic Resources and Crop Evolution 58: 11–53. Provan J, Wolters P, Caldwell KH, Powell W. 2004. High-resolution organellar genome analysis of Triticum and Aegilops sheds new light on cytoplasm evolution in wheat. Theoretical and Applied Genetics 108: 1182–1190. Rawal K, Harlan JR. 1975. Cytogenetic analysis of wild emmer populations from Turkey and Israel. Euphytica 24: 407–411. Riley R, Unrau J, Chapman V. 1958. Evidence on the origin of the B genome of wheat. Journal of Heredity 49: 90–98. Rozen S, Skaletsky H. 2000. Primer3 on the WWW for general users and for biologist programmers. Methods in Molecular Biology 132: 365–386. Sachs L. 1953. Chromosome behaviour in species hybrids with Triticum timopheevi. Heredity 7: 49–58. Salamini F, Ozkan H, Brandolini A, Schafer-Pregl R, Martin W. 2002. Genetics and geography of wild cereal domestication in the Near East. Nature Reviews Genetics 3: 429–441. Salse J, Chague V, Bolot S, Magdelenat G, Huneau C, Pont C, Belcram H, Couloux A, Gardais S, Evrard A et al. 2008. New insights into the origin of the B genome of hexaploid wheat: evolutionary relationships at the SPA genomic region with the S genome of the diploid relative Aegilops speltoides. BMC Genomics 9: 555. Sarkar P, Stebbins GL. 1956. Morphological evidence concerning the origin of the B genome in wheat. American Journal of Botany 43: 297–304. Simons KJ, Fellers JP, Trick HN, Zhang Z, Tai YS, Gill BS, Faris JD. 2006. Molecular characterization of the major wheat domestication gene Q. Genetics 172: 547–555. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony Methods. Molecular Biology and Evolution 28: 2731–2739. Tanaka M, Kawahara T. 1976. Wild tetraploid wheats from northern Iraq cytogenetically closely related to each other. 
Wheat Information Service 43: 3–4. Triboush SO, Danilenko NG, Davydenko OG. 1998. A method for isolation of chloroplast DNA and mitochondrial DNA from sunflower. Plant Molecular Biology Reporter 16: 183–189.


Tsunewaki K. 2009. Plasmon analysis in the Triticum-Aegilops complex. Breeding Science 59: 455–470. Tsunewaki K. 1991. Origin of polyploid wheat revealed by RFLP analysis. In: Sasakuma T, Kinoshita T, eds. Nuclear and organellar genomes of wheat species. Yokohama, Japan: Kihara Memorial Yokohama Foundation, 31–39. Van Slageren MV. 1994. Wild wheats: a monograph of Aegilops L. and Amblyopyrum (Jaub. & Spach) Eig (Poaceae). Wageningen, the Netherlands: Agricultural University. Wagenaar EB. 1961. Studies on genome constitution of Triticum timopheevi Zhuk. 1. Evidence for genetic control of meiotic irregularities in tetraploid hybrids. Canadian Journal of Genetics and Cytology 3: 47–60. Wagenaar EB. 1966. Studies on genome constitution of Triticum timopheevi Zhuk. 2. T. Timopheevi complex and its origin. Evolution 20: 150–164. Wang G-Z, Miyashita NT, Tsunewaki K. 1997. Plasmon analyses of Triticum (wheat) and Aegilops: PCR-single-strand conformational polymorphism (PCR-SSCP) analyses of organellar DNAs. Proceedings of the National Academy of Sciences, USA 94: 14570–14577. Wang JR, Luo MC, Chen ZX, You FM, Wei YM, Zheng YL, Dvorak J. 2013. Aegilops tauschii single nucleotide polymorphisms shed light on the origins of wheat D-genome genetic diversity and pinpoint the geographic origin of hexaploid wheat. New Phytologist 198: 925–937. Wolfe KH, Li WH, Sharp PM. 1987. Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proceedings of the National Academy of Sciences, USA 84: 9054–9058. Wyman SK, Jansen RK, Boore JL. 2004. Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20: 3252–3255. Yamane K, Kawahara T. 2005. Intra- and interspecific phylogenetic relationships among diploid Triticum-Aegilops species (Poaceae) based on base-pair substitutions, indels, and microsatellites in chloroplast noncoding sequences. American Journal of Botany 92: 1887–1898.
Zhang W, Qu L, Gu H, Gao W, Liu M, Chen J, Chen Z. 2002. Studies on the origin and evolution of tetraploid wheats based on the internal transcribed spacer (ITS) sequences of nuclear ribosomal DNA. Theoretical and Applied Genetics 104: 1099–1106. Zohary D, Feldman M. 1962. Hybridization between amphidiploids and the evolution of polyploids in the wheat (Aegilops–Triticum) group. Evolution 16: 44–61.

Supporting Information

Additional supporting information may be found in the online version of this article.


Fig. S1 Organization of the Triticum and Aegilops chloroplast genomes (expanded Fig. 1).

Fig. S2 Neighbor-joining phylogenetic tree of Emmer, Timopheevi, Speltoides and Sitopsis lineages based on synonymous substitutions in 70 concatenated protein coding sequences in SC blocks of the chloroplast genomes with Brachypodium, maize and Sorghum sequences as outgroups.

Fig. S3 Neighbor-joining phylogenetic tree of Emmer, Timopheevi, Speltoides and Sitopsis lineages based on concatenated SC sequences other than protein coding exons with barley sequence as an outgroup.

Fig. S4 Comparison of the best ML tree based on joined sequences of the LSC and SSC blocks and the NJ tree of Fig. 2.

Fig. S5 Haplotype tree based on 10 chloroplast markers.

Table S1 Triticum and Aegilops species analyzed in this study.

Table S2 Plasmon and nuclear markers.

Table S3 Complete list of plasmon haplotypes and nuclear genotypes, and their prevalence in Emmer, Timopheevi, Speltoides and Sitopsis lineages (expanded Table 1).

Table S4 Divergence times calculated based on synonymous substitutions in 70 concatenated protein coding sequences (NJ method) and all substitutions (ML method) in SC blocks of the chloroplast genome.

Please note: Wiley Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.
