Advance Publication by J-STAGE Genes & Genetic Systems
Received for publication: January 15, 2017 Accepted for publication: August 1, 2017 Published online: October 6, 2017
Complete chloroplast genome and 45S nrDNA sequences of the medicinal plant species Glycyrrhiza glabra and Glycyrrhiza uralensis

Sang-Ho Kang1*, Jeong-Hoon Lee2, Hyun Oh Lee3,4, Byoung Ohg Ahn5, So Youn Won1, Seong-Han Sohn1 and Jung Sun Kim1

1 Genomics Division, National Institute of Agricultural Sciences, 370 Nongsaengmyeong-ro, Jeonju, Jeollabuk-do 54874, Republic of Korea
2 Department of Herbal Crop Research, National Institute of Horticultural and Herbal Science, 92 Bisanro, Eumseong, Chungbuk-do 27709, Republic of Korea
3 Department of Plant Science, College of Agriculture and Life Sciences, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
4 Phyzen Genomics Institute, 605 Baekgoong Plaza 1, Seongnam-daero 331 beon-gil, Bundang-gu, Seongnam, Gyeonggi-do 13558, Republic of Korea
5 R&D Coordination Division, Rural Development Administration, 370 Nongsaengmyeong-ro, Jeonju, Jeollabuk-do 54875, Republic of Korea

Running Head: Chloroplast and 45S nrDNA of Glycyrrhiza species

Key words: Glycyrrhiza species, chloroplast, 45S nrDNA, medicinal plant

*Corresponding author.
Sang-Ho Kang
Genomics Division, National Institute of Agricultural Sciences, 370 Nongsaengmyeong-ro, Jeonju, Jeollabuk-do 54874, Republic of Korea
E-mail: [email protected]
Tel: +82-63-238-4560
Fax: +82-63-238-4554

ABSTRACT

Glycyrrhiza uralensis and G. glabra, members of the Fabaceae, are medicinally important species that are native to Asia and Europe. Extracts from these plants are widely used as natural sweeteners because they are much sweeter than sucrose. In this study, three complete chloroplast genomes and five 45S nuclear ribosomal (nr)DNA sequences of these two licorice species and an interspecific hybrid are presented. The chloroplast genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp in length, respectively. The three chloroplast genomes harbored 110 annotated genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The 45S nrDNA sequences were either 5,947 or 5,948 bp in length. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. The complete 45S nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA. We identified simple sequence repeat and tandem repeat sequences. We also developed four reliable markers for the analysis of Glycyrrhiza diversity and authentication.

INTRODUCTION

Licorice is a perennial herb belonging to the family Fabaceae. The genus Glycyrrhiza includes about 18 species in Asia, Europe, and the Americas. G. uralensis occurs from Central Asia to the northeastern part of China, whereas G. glabra is distributed from southern Europe to the northwestern part of China. The roots and stolons of Glycyrrhiza uralensis Fisch. and Glycyrrhiza glabra L. produce the most important crude drugs in the world (Gibson, 1978), mainly glycyrrhizin, an oleanane-type triterpene saponin. Glycyrrhiza plants have been used traditionally as anti-inflammatory (Finney and Somers, 1958; Kroes et al., 1997), antiviral (Fiore et al., 2008), antiallergy (Park et al., 2004) and antiulcer (He et al., 2001) agents. Because licorice extracts are ca. 150 times sweeter than sucrose (Kitagawa, 2002), they are also widely used around the world as a natural sweetener, with an annual value of over US $42.1 million (Parker, 2006). Because licorice is a medicinal plant, correct authentication of its ingredients is essential for their safe use.
Chloroplast (CP) genome sequences are of central importance for plant taxonomy and authentication because they are highly conserved across plant species. The CP genome is typically composed of a large single-copy (LSC) region, a small single-copy (SSC) region and two inverted repeats (IRs) (Gary et al., 1984; Shinozaki et al., 1986; Leseberg and Duvall, 2009). Interestingly, licorice belongs to the inverted repeat-lacking clade (IRLC) (Wojciechowski et al., 2004) of papilionoid legumes, which is characterized by the loss of one copy of the IR. To date, only the CP genome of G. glabra has been sequenced among the Glycyrrhiza species (Sabir et al., 2014).
The sequence of the 45S nuclear ribosomal DNA (nrDNA), bearing the 18S-5.8S-26S ribosomal RNA genes, provides additional information that can be very useful in plant taxonomy and DNA barcoding (Chen et al., 2014; Techen et al., 2014; Mishra et al., 2016). In particular, the internal transcribed spacer (ITS1 and ITS2) sequences in nrDNA are potential barcodes (Álvarez and Wendel, 2003; Yao et al., 2010). Although these sequences are valuable for the identification of medicinal plants, there is little information on their comparison and polymorphism among Glycyrrhiza species.
In the current study, we analyzed complete sequences of the CP genome and nrDNA of Glycyrrhiza species. In addition, we identified 160 polymorphic sites in the CP genome and 10 polymorphic sites in the nrDNA that are valuable for the identification and authentication of G. glabra and G. uralensis as well as G. glabra × G. uralensis interspecific hybrids. Despite the usefulness of Glycyrrhiza species as medicinal ingredients and food resources, there is limited information regarding their complete chloroplast genomes and nrDNA sequences. The results of this study will provide insight into the genetic relationships among the various species in the genus Glycyrrhiza.

MATERIALS AND METHODS

Plant materials and DNA extraction

European licorice (G. glabra L.; the female parent) and Chinese licorice (G. uralensis Fisch.; the male parent) were planted in the greenhouse and artificially crossed in May 2007. In June 2008, stolons were separated from F1 (G. glabra × G. uralensis) licorice seedlings and cultivated, resulting in 32 clonal lines of interspecific hybrids. The aerial parts of Glycyrrhiza species were collected from Eumseong (36° 56′ 38.68″ N, 127° 45′ 17.60″ E), and identified by Dr. JH Lee from the Department of Herbal Crop Research, National Institute of Horticultural and Herbal Science, Rural Development Administration. Voucher specimens (G. glabra: MPS000350-1, G. uralensis: MPS004535, G. glabra × G. uralensis F1: MPS002499) are deposited at the Korea Medicinal Resources Herbarium, Eumseong, Korea. Total DNA was extracted from the young and fully expanded leaves of Glycyrrhiza species using the modified cetyltrimethylammonium bromide (CTAB) method (Allen et al., 2006). DNA purity and concentration were checked by electrophoresis on a 1.2% agarose gel and with a DropSense96 Spectrophotometer (Trinean, Belgium). High-quality DNA (concentration > 100 ng/µl; A260/230 > 1.7; A260/280 = 1.8–2.0) was used for further analysis.
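The acceptance criteria above can be written as a simple filter. The following is only a minimal sketch: the `DnaSample` container, the function name and the example readings are hypothetical and are not part of the original protocol; only the three thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    """Spectrophotometer readings for one DNA extract (hypothetical container)."""
    name: str
    conc_ng_per_ul: float
    a260_230: float
    a260_280: float


def passes_qc(sample: DnaSample) -> bool:
    """Apply the thresholds quoted in the text:
    concentration > 100 ng/ul, A260/230 > 1.7, A260/280 within 1.8-2.0."""
    return (
        sample.conc_ng_per_ul > 100
        and sample.a260_230 > 1.7
        and 1.8 <= sample.a260_280 <= 2.0
    )


# Made-up readings, for illustration only.
samples = [
    DnaSample("example_pass", 152.0, 2.05, 1.87),
    DnaSample("example_fail", 80.0, 1.50, 1.65),
]
passed = [s.name for s in samples if passes_qc(s)]
print(passed)
```

Whether the A260/280 bounds were treated as inclusive is not stated in the text; the sketch assumes they were.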

Illumina sequencing and de novo assembly of CP and nrDNA

A paired-end (PE) library was constructed with insert sizes ranging from 280 to 430 bp, following the manufacturer's protocol for the TruSeq PE Cluster Kit. The PE libraries were sequenced on the Illumina HiSeq 1000 platform (Illumina, USA) at the in-house facility (Genomics Division, NAS, Korea). De novo assembly of the CP genome and nrDNA was performed as described previously (Kim et al., 2015). In short, low-quality sequences (Phred score below 20) were trimmed using CLC quality trim software. The remaining high-quality sequences were assembled into contigs using CLC genome assembler beta 4.06 (CLC Inc., Aarhus, Denmark) with a minimum of 150–500 bp autonomously controlled overlap size at Phyzen Inc. (Seongnam, South Korea). The obtained CP genome sequence was assembled using the G. glabra (KF201590) genome as a reference sequence. The assembled nrDNA contig fully covered the 45S nrDNA cistron unit and partially covered the intergenic spacer sequence.
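To illustrate what a Phred 20 cutoff means for a single read, the sketch below trims bases from the 3' end until a base with quality of at least 20 is reached. This is a deliberately simplified stand-in and not the algorithm used by the CLC quality trim software; the function names and the Phred+33 encoding are assumptions.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove trailing bases whose Phred score is below min_q.

    Simplified illustration only: real trimmers usually use windowed or
    cumulative-score criteria rather than a plain end scan.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(qual, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2 and '!' encodes Q0.
trimmed_seq, trimmed_qual = trim_3prime("ACGTAC", "III5#!")
print(trimmed_seq, trimmed_qual)
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means roughly one expected error in 100 base calls.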

Gene annotation, SNP genotyping and repeat sequence analysis

CP sequences were annotated using DOGMA (Jansen Lab, UT Austin, TX, USA) (Wyman et al., 2004) and BLAST searches. The tRNA genes were identified using DOGMA and tRNAscan-SE (Lowe Lab, University of California, Santa Cruz, CA, USA) (Schattner et al., 2005). The circular CP genome map was constructed using the OGDraw software (http://ogdraw.mpimp-golm.mpg.de/) (Lohse et al., 2007). Repeats in the CP sequences of the Glycyrrhiza species were investigated using Tandem repeats finder, version 4.0 (LBI, Boston University, Boston, MA, USA) (Benson, 1999) with 100% similarity and a minimum size of 10 bp. Simple sequence repeat (SSR) motifs with a minimum size of 10 bp were identified using MISA (http://pgrc.ipk-gatersleben.de/misa/).
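The idea of scanning for SSRs with a 10-bp minimum length can be sketched as follows. This is not MISA and does not reproduce its parameters; it is a small illustrative scanner that assumes unit sizes of 2 to 6 nt (mononucleotide repeats are skipped), at least two perfect copies of the unit, and 1-based start positions. All names are hypothetical.

```python
def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq, min_len=10, unit_sizes=(2, 3, 4, 5, 6)):
    """Return (start, motif, copies) for perfect repeats of at least min_len bp."""
    seq = seq.upper()
    hits = []
    for k in unit_sizes:
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            copies = 1
            # Count consecutive perfect copies of the motif starting at i.
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if (
                copies >= 2
                and copies * k >= min_len
                and set(motif) <= set("ACGT")
                and is_primitive(motif)
            ):
                hits.append((i + 1, motif, copies))
                i += copies * k  # continue after the repeat
            else:
                i += 1
    return sorted(hits)


# Toy sequence: an (AT)5 dinucleotide repeat and an (AATTC)2 pentanucleotide repeat.
toy = "GGC" + "AT" * 5 + "CCG" + "AATTC" * 2 + "G"
ssrs = find_ssrs(toy)
print(ssrs)
```

With a 10-bp minimum, two copies of a pentanucleotide already qualify, whereas a dinucleotide needs five copies.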

Sequence divergence analysis

The CP genome of G. glabra (KF201590) was downloaded from the NCBI database and aligned using MAFFT version 7 (http://mafft.cbrc.jp/alignment/server/). The four CP genomes of G. glabra (KU891817), G. uralensis (KU862308), G. glabra × G. uralensis (KU862307) and G. glabra (KF201590) were compared using the mVISTA program in Shuffle-LAGAN mode (Frazer et al., 2004).

Identification of polymorphisms that can distinguish Glycyrrhiza species

Four PCR primer pairs (Supplementary Table S1) were designed based on CP InDels and nrDNA-specific sequence regions among Glycyrrhiza species. These primers were used to distinguish G. glabra and G. uralensis as well as G. glabra × G. uralensis. The PCR conditions were 4 min at 94 °C followed by 38 cycles of 94 °C for 30 s, 60 °C for 30 s and 72 °C for 15 s, followed by a final extension at 72 °C for 1 min. Gel electrophoresis was performed using a 1% agarose gel stained with a fluorescent dye.

RESULTS AND DISCUSSION

After sequencing, we employed a combination of de novo assembly and reference-guided strategies using Illumina PE reads ranging from 587 to 741 Mbp, which represent approximately 226× to 400× CP genome coverage. The complete CP genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were circular and 127,895 bp, 127,716 bp and 127,939 bp in length, respectively (Table 1). The complete CP gene content and order were identical among the Glycyrrhiza species (Fig. 1). These three CP genomes belong to the IRLC (Wojciechowski et al., 2004) of papilionoid legumes, in which one copy of the IR has been lost. The Glycyrrhiza CP genomes harbor 110 unique genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes (Table 2). Among them, 9 protein-coding and 6 tRNA genes contain a single intron, whereas one gene (ycf3) contains two introns. infA, rpl22 and rps16 were absent in Glycyrrhiza species. Two CP-encoded genes, infA and rpl22, are missing from the CP genomes of legumes (Doyle et al., 1995) but are present in the nucleus (Gantt et al., 1991). Loss of the rps16 gene from CP DNA has also been reported in Medicago and Populus (Ueda et al., 2008). Whole-genome alignments of the Glycyrrhiza species, with the annotation of G. glabra (KF201590) (Sabir et al., 2014) as a reference, were generated with mVISTA and revealed their sequence variations (Fig. 2). The whole CP genome alignment showed that the coding regions are more conserved than the intergenic regions, as is the case in most angiosperms. Analysis of sequence variation between G. glabra (KF201590) and G. glabra (KU891817) showed 30 single nucleotide polymorphisms (SNPs) and 24 insertions-deletions (InDels). These SNPs and InDels may provide information for the authentication of Glycyrrhiza species. The CP genome of G. glabra × G. uralensis shared 99.98% and 99.85% similarity with those of G. glabra (the female parent) and G. uralensis, respectively, indicating that Glycyrrhiza species also follow the mode of maternal plastid inheritance (Hagemann, 2004).
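How a percent similarity between two aligned genomes can be obtained is sketched below. The text does not state how the 99.98% and 99.85% values were calculated, so the definition used here (identical columns divided by all columns in which at least one sequence has a base, with a gap against a base counted as a mismatch) is an assumption, as are the function name and the toy sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Columns where both rows carry a gap are ignored; a gap opposite a base
    counts as a mismatch (assumed convention, not taken from the paper).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy alignment rows, not real chloroplast sequence.
identity = percent_identity("ACGT-A", "ACGTTA")
print(round(identity, 2))
```

Under maternal plastid inheritance, the hybrid row would be expected to score higher against the female parent than against the male parent, which is the pattern reported above.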
The nrDNA sequences were each assembled into a single contig of either 5,947 bp or 5,948 bp in length. G. glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type of nrDNA (Table 1). The complete nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA (Fig. 3). The average GC content ranged from 53.86% to 53.91% and was thus almost identical among the five nrDNA sequences (Fig. 3).
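The GC content figures can be reproduced in principle with a few lines of code. The sketch below computes overall GC percentage and a windowed GC profile; the 40-nt window follows the legend of Fig. 3, but the non-overlapping step, the function names and the toy sequence are assumptions, and this is not the UGENE implementation.

```python
def gc_percent(seq):
    """GC content of a sequence in percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=40, step=40):
    """Return (1-based window start, GC percent) for each full window."""
    return [
        (start + 1, gc_percent(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: 40 nt of pure GC followed by 40 nt of pure AT.
toy = "GC" * 20 + "AT" * 20
profile = gc_windows(toy)
print(gc_percent(toy), profile)
```

A smaller `step` than `window` would give an overlapping, smoother profile.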
Repeat sequences in the CP genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were analyzed using Tandem repeats finder, version 4.0. A total of 20 unique tandem repeat sequences were detected in the Glycyrrhiza CP genomes (Supplementary Table S2). The lengths of the tandem repeat units ranged from 11 to 39 bp, and most of the tandem repeats were present in 2 copies. As in Bupleurum falcatum (Shin et al., 2016), most of the tandem repeat sequences were identified in non-coding regions, with only three genic regions, namely rps11, rpl20 and ycf1, containing tandem repeat sequences. The tandem repeats identified in the Glycyrrhiza CP genomes were predominantly shorter than 40 bp, a length that is nevertheless sufficient for illegitimate recombination (Sherman-Broyles et al., 2014).
SSRs, also known as microsatellites, occur frequently in CP genomes. In this study, mononucleotide SSRs were excluded. We identified 350, 349 and 352 SSRs with a length of at least 10 bp in G. glabra, G. uralensis and G. glabra × G. uralensis, respectively (Fig. 4). Among the SSRs, pentanucleotide SSRs were the most commonly detected in the CP genomes, accounting for 84% of the total. Di-, tri- and tetranucleotide repeats were composed mostly of A or T, which reflects the A-T richness of the CP genomes (Zhang et al., 2011; Yi and Kim, 2012). These SSRs may further serve as genetic markers for phylogenetic and medicinal plant authentication studies (Zhang et al., 2016).
We detected 160 and 10 SNPs in the Glycyrrhiza CP genomes and nrDNA, respectively (Supplementary Tables S3 and S4). Like the SSRs, most SNPs in the chloroplast were identified in non-coding regions, whereas SNPs in nrDNA were detected in ITS1, ITS2 and 26S. Furthermore, we identified 83 InDels in the Glycyrrhiza CP genomes. PCR primers were designed based on InDels and specific sequence regions (Supplementary Table S1). We successfully amplified four PCR products that can distinguish between G. glabra and G. uralensis (Fig. 5). The primer pairs ycf3F01/ycf3R01, atpHF01/atpHR01 and ycf2F01/ycf2R01 amplified PCR products from the Glycyrrhiza CP genomes. In contrast, the 5.8SF01/5.8SR01 primer pair amplified a PCR product from nrDNA only in G. glabra and G. glabra × G. uralensis. These primers can be used as Glycyrrhiza authentication markers.
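As an illustration of how two of these markers could be combined into a call, the sketch below uses the ycf3 ~ psaA product size and the presence or absence of the 5.8S ~ ITS2 product. It assumes, from our reading of Supplementary Table S1, that the ycf3 marker yields 251 bp in G. uralensis and 300 bp in the other two samples; the size tolerance, the function name and the output labels are hypothetical. These two markers alone do not separate G. glabra from the hybrid.

```python
from typing import Optional

# Product sizes as read from Supplementary Table S1 (assumed reading).
YCF3_URALENSIS_BP = 251
YCF3_GLABRA_TYPE_BP = 300


def call_sample(
    ycf3_product_bp: Optional[int],
    has_58s_product: bool,
    tolerance_bp: int = 10,  # hypothetical gel-sizing tolerance
) -> str:
    """Combine the ycf3 InDel marker and the 5.8S presence/absence marker."""

    def near(observed: Optional[int], expected: int) -> bool:
        return observed is not None and abs(observed - expected) <= tolerance_bp

    if near(ycf3_product_bp, YCF3_URALENSIS_BP) and not has_58s_product:
        return "G. uralensis"
    if near(ycf3_product_bp, YCF3_GLABRA_TYPE_BP) and has_58s_product:
        return "G. glabra or G. glabra x G. uralensis"
    return "inconclusive"


calls = [
    call_sample(251, False),
    call_sample(300, True),
    call_sample(300, False),  # markers disagree
]
print(calls)
```

Returning "inconclusive" when the two markers disagree keeps a failed PCR from being read as a species call.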
In this study, the complete Glycyrrhiza CP genomes and nrDNA have been sequenced. These genomes belong to the IRLC of papilionoid legumes, which is characterized by the loss of one copy of the IR. The complete CP genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp in size, respectively. The nrDNA sequences were either 5,947 bp or 5,948 bp in length. G. glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type of nrDNA. We developed four reliable markers for the analysis of Glycyrrhiza diversity and authentication. This study will open up further avenues of research toward a better understanding of the molecular ecology and molecular phylogeny of Glycyrrhiza species.

ACKNOWLEDGEMENTS

The authors thank the National Institute of Agricultural Sciences (NAS) Genome Sequencing Core facility for their services. This work was carried out with the support of the National Institute of Agricultural Sciences (Project No. PJ010889), Republic of Korea.

REFERENCES

Allen, G.C., Flores-Vergara, M.A., Krasynanski, S., Kumar, S., and Thompson, W.F. (2006) A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nat. Protoc. 1, 2320-2325.
Álvarez, I., and Wendel, J.F. (2003) Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogenet. Evol. 29, 417-434.
Benson, G. (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573-580.
Chen, S., Pang, X., Song, J., Shi, L., Yao, H., Han, J., and Leon, C. (2014) A renaissance in herbal medicine identification: from morphology to DNA. Biotechnol. Adv. 32, 1237-1244.
Doyle, J.J., Doyle, J.L., and Palmer, J.D. (1995) Multiple independent losses of two genes and one intron from legume chloroplast genomes. Syst. Bot. 20, 272-294.
Finney, R.S.H., and Somers, G.F. (1958) The anti-inflammatory activity of glycyrrhetinic acid and derivatives. J. Pharm. Pharmacol. 10, 613-620.
Fiore, C., Eisenhut, M., Krausse, R., Ragazzi, E., Pellati, D., Armanini, D., and Bielenberg, J. (2008) Antiviral effects of Glycyrrhiza species. Phytother. Res. 22, 141-148.
Frazer, K.A., Pachter, L., Poliakov, A., Rubin, E.M., and Dubchak, I. (2004) VISTA: computational tools for comparative genomics. Nucleic Acids Res. 32, W273-W279.
Gantt, J.S., Baldauf, S.L., Calie, P.J., Weeden, N.F., and Palmer, J.D. (1991) Transfer of rpl22 to the nucleus greatly preceded its loss from the chloroplast and involved the gain of an intron. EMBO J. 10, 3073-3078.
Gary, M.W., Sankoff, D., and Cedergren, R.J. (1984) On the evolutionary descent of organisms and organelles: a global phylogeny based on a highly conserved structural core in small subunit ribosomal RNA. Nucleic Acids Res. 12, 5837-5852.
Gibson, M.R. (1978) Glycyrrhiza in old and new perspectives. Lloydia 41, 348-354.
Hagemann, R. (2004) The sexual inheritance of plant organelles. In Molecular Biology and Biotechnology of Plant Organelles. (eds.: Daniell, H., and Chase, C.), pp. 99-113. Springer, Heidelberg.
He, J.X., Akao, T., Nishino, T., and Tani, T. (2001) The influence of commonly prescribed synthetic drugs for peptic ulcer on the pharmacokinetic fate of glycyrrhizin from Shaoyao-Gancao-tang. Biol. Pharm. Bull. 24, 1395-1399.
Kim, K., Lee, S.C., Lee, J., Yu, Y., Yang, K., Choi, B.S., Koh, H.J., Waminal, N.E., Choi, H.I., Kim, N.H., et al. (2015) Complete chloroplast and ribosomal sequences for 30 accessions elucidate evolution of Oryza AA genome species. Sci. Rep. 5, 15655.
Kitagawa, I. (2002) Licorice root. A natural sweetener and an important ingredient in Chinese medicine. Pure Appl. Chem. 74, 1189-1198.
Kroes, B.H., Beukelman, C.J., van den Berg, A.J.J., Wolbink, G.J., van Dijk, H., and Labadie, R.P. (1997) Inhibition of human complement by β-glycyrrhetinic acid. Immunology 90, 115-120.
Leseberg, C.H., and Duvall, M.R. (2009) The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals. J. Mol. Evol. 69, 311-318.
Lohse, M., Drechsel, O., and Bock, R. (2007) OrganellarGenomeDRAW (OGDRAW): a tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr. Genet. 52, 267-274.
Mishra, P., Kumar, A., Nagireddy, A., Mani, D.N., Shukla, A.K., Tiwari, R., and Sundaresan, V. (2016) DNA barcoding: an efficient tool to overcome authentication challenges in the herbal market. Plant Biotechnol. J. 14, 8-21.
Park, H.Y., Park, S.H., Yoon, H.K., Han, M.J., and Kim, D.H. (2004) Anti-allergic activity of 18β-glycyrrhetinic acid-3-O-β-D-glucuronide. Arch. Pharm. Res. 27, 57-60.
Parker, P.M. (2006) The World Market for Licorice Roots: A 2007 Global Trade Perspective. ICON Group International Inc, San Diego.
Sabir, J., Schwarz, E., Ellison, N., Zhang, J., Baeshen, N.A., Mutwakil, M., Jansen, R., and Ruhlman, T. (2014) Evolutionary and biotechnology implications of plastid genome variation in the inverted-repeat-lacking clade of legumes. Plant Biotechnol. J. 12, 743-754.
Schattner, P., Brooks, A.N., and Lowe, T.M. (2005) The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 33, W686-W689.
Sherman-Broyles, S., Bombarely, A., Grimwood, J., Schmutz, J., and Doyle, J. (2014) Complete plastome sequences from Glycine syndetika and six additional perennial wild relatives of soybean. G3 (Bethesda) 4, 2023-2033.
Shin, D.H., Lee, J.H., Kang, S.H., Ahn, B.O., and Kim, C.K. (2016) The complete chloroplast genome of the Hare's root, Bupleurum falcatum: its molecular features. Genes (Basel) 7, E20. doi:10.3390/genes7050020.
Shinozaki, K., Ohme, M., Tanaka, M., Wakasugi, T., Hayashida, N., Matsubayashi, T., Zaita, N., Chunwongse, J., Obokata, J., Yamaguchi-Shinozaki, K., et al. (1986) The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J. 5, 2043-2049.
Techen, N., Parveen, I., Pan, Z., and Khan, I.A. (2014) DNA barcoding of medicinal plant material for identification. Curr. Opin. Biotechnol. 25, 103-110.
Ueda, M., Nishikawa, T., Fujimoto, M., Takanashi, H., Arimura, S., Tsutsumi, N., and Kadowaki, K. (2008) Substitution of the gene for chloroplast RPS16 was assisted by generation of a dual targeting signal. Mol. Biol. Evol. 25, 1566-1575.
Wojciechowski, M.F., Lavin, M., and Sanderson, M.J. (2004) A phylogeny of legumes (Leguminosae) based on analysis of the plastid matK gene resolves many well-supported subclades within the family. Am. J. Bot. 91, 1846-1862.
Wyman, S.K., Jansen, R.K., and Boore, J.L. (2004) Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20, 3252-3255.
Yao, H., Song, J., Liu, C., Luo, K., Han, J., Li, Y., Pang, X., Xu, H., Zhu, Y., Xiao, P., et al. (2010) Use of ITS2 region as the universal DNA barcode for plants and animals. PLoS One 5, e13102.
Yi, D.K., and Kim, K.J. (2012) Complete chloroplast genome sequences of important oilseed crop Sesamum indicum L. PLoS One 7, e35872.
Zhang, Y., Du, L., Liu, A., Chen, J., Wu, L., Hu, W., Zhang, W., Kim, K., Lee, S.C., Yang, T.J., et al. (2016) The complete chloroplast genome sequences of five Epimedium species: lights into phylogenetic and taxonomic analyses. Front. Plant Sci. 7, doi:10.3389/fpls.2016.00306.
Zhang, Y.J., Ma, P.F., and Li, D.Z. (2011) High-throughput sequencing of six bamboo chloroplast genomes: phylogenetic implications for temperate woody bamboos (Poaceae: Bambusoideae). PLoS One 6, e20596.

Table 1. Statistics of WGS and assembly summary of three Glycyrrhiza species

WGS amount and CP genome
Scientific name             Amount (Mbp)   Length (bp)   Coverage (X)   GenBank Acc. No.
G. glabra                   741.68         127,895       367.81         KU891817
G. uralensis                721.47         127,716       225.95         KU862308
G. glabra × G. uralensis    587.42         127,939       399.91         KU862307

nrDNA
Scientific name             Type     Length (bp)   Coverage (X)   GenBank Acc. No.
G. glabra                   Type 1   5,947         616.43         KX530462
G. glabra                   Type 2   5,947         600.79         KX530463
G. uralensis                Type 1   5,948         1259.83        KX530461
G. glabra × G. uralensis    Type 1   5,948         739.21         KX530459
G. glabra × G. uralensis    Type 2   5,947         684.44         KX530460

Table 2. Gene composition in Glycyrrhiza CP genome

Category of genes   Group of genes              Name of genes
Self replication    Ribosomal RNAs              16S (rrn16), 23S (rrn23), 4.5S (rrn4.5), 5S (rrn5)
                    Transfer RNAs               trnA-UGC †, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU,
                                                trnG-GCC, trnG-UCC †, trnH-GUG, trnI-CAU, trnI-GAU †, trnK-UUU †,
                                                trnL-UAA †, trnL-UAG, trnL-CAA, trnM-CAU, trnN-GUU, trnP-UGG,
                                                trnQ-UUG, trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA,
                                                trnT-GGU, trnT-UGU, trnV-UAC †, trnV-GAC, trnW-CCA, trnY-GUA
                    Small subunit of ribosome   rps2, rps3, rps4, rps7, rps8, rps11, rps12 †, rps14, rps15, rps18, rps19
                    Large subunit of ribosome   rpl2 †, rpl14, rpl16 †, rpl20, rpl23, rpl32, rpl33, rpl36
                    RNA polymerase              rpoA, rpoB, rpoC1 †, rpoC2
Photosynthesis      NADH-dehydrogenase          ndhA †, ndhB †, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
                    Photosystem I               psaA, psaB, psaC, psaI, psaJ, ycf3 #
                    Photosystem II              psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL,
                                                psbM, psbN, psbT, psbZ
                    Cytochrome b6/f             petA, petB †, petD †, petG, petL, petN
                    ATP synthase                atpA, atpB, atpE, atpF †, atpH, atpI
                    Rubisco                     rbcL
Other genes                                     accD, ccsA, cemA, clpP, matK
Unknown function    ORFs ¥                      ycf1, ycf2, ycf4

† indicates the existence of a single intron in the corresponding genes;
# indicates the existence of two introns in the corresponding genes;
¥ indicates open reading frames.

Figure Legends

Fig. 1. The map of the CP genome of the Glycyrrhiza species. Genes shown outside the circle are transcribed clockwise, while those drawn inside the circle are transcribed counterclockwise. Functionally annotated genes are shown in color. The darker gray area in the inner circle shows the GC content.

Fig. 2. Comparison of the CP genomes of G. glabra, G. uralensis and G. glabra × G. uralensis using G. glabra (KF201590) as a reference sequence. The top line shows the order of genes (transcriptional direction indicated by arrows). Genome regions are color-coded as follows: conserved gene = blue, tRNA and rRNA = sky blue and intergenic region = red.

Fig. 3. Schematic diagram of the nrDNA cistron unit of five Glycyrrhiza sequences. (A) Mapped read depth of the nrDNA cistron unit sequences. (B) GC content plot drawn with a window size of 40 nucleotides using the UGENE program.

Fig. 4. Number of simple sequence repeats in the Glycyrrhiza CP genomes. Classification of SSRs by repeat type in G. glabra (A), G. uralensis (B) and G. glabra × G. uralensis (C).

Fig. 5. Validation of InDel and sequence-specific polymorphic sites. PCR analysis of InDel regions from the CP genome and sequence-specific regions from nrDNA. M indicates the 100-bp size marker; GG, GU and F1 correspond to G. glabra, G. uralensis and G. glabra × G. uralensis, respectively. 1-4 represent the ycf3F01-ycf3R01, atpHF01-atpHR01, ycf2F01-ycf2R01 and 5.8SF01-5.8SR01 primer pairs, respectively. a, b: PCR product is derived from CP genome- and nrDNA-based markers, respectively.

Supplementary Table S1. Primers that detect polymorphism among Glycyrrhiza species

                                                             Product size (bp)
Primer name   Primer sequence             Location      G. glabra   G. uralensis   G. glabra × G. uralensis
ycf3F01 a     GGGCGTTTTGAATAAGAACA        ycf3 ~ psaA   300         251            300
ycf3R01 a     TGACTGATGGGGACAACAAA
atpHF01 a     TCAATTGACTAACCAATTCAAACAA   atpH ~ atpF   445         464            440
atpHR01 a     AACTCGCACACACTCCCTTT
ycf2F01 a     GTATCGAAAGGCCCAATGAA        ycf2 intron   331         382            358
ycf2R01 a     GTTCCACCCTGCAAGAACTC
5.8SF01 b     CAGACCGTTGCCCGACAA          5.8S ~ ITS2   227         No product     227
5.8SR01 b     GTCTCATCACGAGCGTTCAA

a, b: PCR product is derived from CP genome- and nrDNA-based markers, respectively.

Supplementary Table S2. Tandem repeat analysis in the Glycyrrhiza species used in this study

                                                              Unit          Copy number
No.   Tandem repeat unit sequence                 length   G. glabra   G. uralensis   G. glabra × G. uralensis   Position
1     GCTATTAATTAATTT                             15       1.9         1.9            1.9                        trnK-UUU ~ rbcL
2     AATTAAATTCAATAT                             15       2.1         2.1            2.1                        trnK-UUU ~ rbcL
3     AAAAGAATATTAAT                              14       2.1         2.1            2.1                        trnL-UAA ~ trnT-UGU
4     AAAATATTATTAA                               13       2.1         2.1            2.1                        trnL-UAA ~ trnT-UGU
5     ATATCAAAATAGATGAAG                          18       2.1         3.1            2.1                        trnL-UAA ~ trnT-UGU
6     AATATCAAATAAAT                              14       2.1         2.1            2.1                        trnL-UAA ~ trnT-UGU
7     TCTGATTTCTAGTATAAT                          18       -           2.1            -                          petN ~ trnC-GCA
8     TTGAATATAATTCAAAATATTAA                     23       2           2              2                          atpA ~ trnR-UCU
9     TAGAAGATATAAT                               13       -           2              -                          trnQ-UUG ~ accD
10    ACATATATAGTG                                12       2.1         2.1            2.1                        trnQ-UUG ~ accD
11    AAATAGAAGATTTAAGTGAATCAAAAAACC              30       2.1         2.1            2.1                        psaJ ~ rpl33
12    TCTTTTAATTCTGGTCATTG                        20       2           2              2                          rpl20
13    TAGAAATATTCTATTAAA                          18       2.1         2.1            3.1                        rps12 ~ clpP
14    TTATATTGTAACTATAATCACTA                     23       2           2              2                          rps12 ~ clpP
15    ACTATTTTCTAAC                               13       2           2              2                          rps11 ~ rpl36
16    AGAATTAATAT                                 11       2.5         2.5            2.5                        rps11 ~ rpl36
17    AATAATAAATAATCAAATCATTATAA                  26       1.9         1.9            1.9                        trnN-GUU ~ ycf1
18    ATATATTTAAATAT                              14       3           3              3                          ycf1 ~ rps15
19    ATAATTTTATATTAACGATAAGTATAGTAATTGATTATT     39       -           2.4            -                          trnL-UAG ~ rpl32
20    TATCAAAATAAACGAATG                          18       2.1         2.1            2.1                        rpl32 ~ ndhF

Supplementary Table S3. Summary of nucleotide polymorphisms in Glycyrrhiza CP genomes
No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
Site 165 211 1762 3841 4575 5046 5316 6126 6156 9685 9823 10738 10824 13847 14478 14545 14798 15046 15171 17174 17244 18076 18143 18770 20425 20749 20848 22426 23510 24218 24695 25760 25897 27109 27110 27112 27113 27114 27115 27117 27118 27556 28012 29387 30206 30706 30707 30971
G. glabra A G A C A C C A T G T T T T C T G A A A T T T T A C G C T G T A T A G G G T G T A A T T C G C T
G. uralensis G T G C C A C T C C C A T A T G A T T T C A G G G G A G C T C T G T A C A C C C T G C C T C T G
G. glabra x G. uralensis A G A G A C T A T G T T G T T T G A A A C T T T A C G C T G T T T A G G G T G T A A T T C G T T
Location trnH-GUG~psbA trnH-GUG~psbA psbA~trnK-UUU matK~trnK-UUU trnK-UUU~rbcL trnK-UUU~rbcL trnK-UUU~rbcL rbcL rbcL atpE~trnM-CAU trnM-CAU~trnV-UAC trnV-UAC~ndhC trnV-UAC~ndhC trnF-GAA~trnL-UAA trnL-UAA~trnT-UGU trnL-UAA~trnT-UGU trnL-UAA~trnT-UGU trnL-UAA~trnT-UGU trnL-UAA~trnT-UGU rps4~trnS-GGA rps4~trnS-GGA ycf3~ycf3 ycf3~ycf3 ycf3~ycf3 ycf3~psaA psaA psaA psaA psaB psaB psaB trnfM-CAU~trnG-GCC trnG-GCC~psbZ trnS-UGA~psbC trnS-UGA~psbC trnS-UGA~psbC trnS-UGA~psbC trnS-UGA~psbC trnS-UGA~psbC trnS-UGA~psbC trnS-UGA~psbC psbC psbC psbC~psbD psbD~trnT-GGU psbD~trnT-GGU psbD~trnT-GGU psbD~trnT-GGU
No. 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Site 32530 34271 34566 34749 34958 35046 35370 35601 35762 35769 35787 35828 36164 36411 36892 40421 44199 44603 44690 45617 46647 48795 48799 49195 49338 50047 50293 50422 50512 52258 52435 52466 52486 53775 53884 53971 55213 56595 56756 57159 57205 57404 57489 57665 57927 58078 58773 59465 60303 60318 60566 60567
G. glabra G A T G T T T A A G T C G C A A A C C G C A T C C C G T G T C T C A A G G T T C C A C C A A A G G A G A
G. uralensis A T G A C T C G C A C C G A G G G A T A A G G T T A A G A T T C T T G A T G A T T T G T G G G A G C G A
G. glabra x G. uralensis G A T G T C T A A G T A T C A A A A C A C A G C C C G G G C C T C A A G T T T C T A C C A A A G A A T C
Location trnY-GUA~trnD-GUC petN~trnC-GCA petN~trnC-GCA petN~trnC-GCA petN~trnC-GCA petN~trnC-GCA trnC-GCA~rpoB trnC-GCA~rpoB trnC-GCA~rpoB trnC-GCA~rpoB trnC-GCA~rpoB trnC-GCA~rpoB trnC-GCA~rpoB rpoB rpoB rpoC1~rpoC1 rpoC2 rpoC2 rpoC2 rpoC2 rpoC2 atpI~atpH atpI~atpH atpI~atpH atpI~atpH atpH~atpF atpH~atpF atpF atpF atpA atpA atpA atpA trnR-UCU~trnG-UCC trnR-UCU~trnG-UCC trnG-UCC~trnG-UCC trnG-UCC~trnS-GCU psbK~trnQ-UUG psbK~trnQ-UUG trnQ-UUG~accD trnQ-UUG~accD trnQ-UUG~accD trnQ-UUG~accD trnQ-UUG~accD trnQ-UUG~accD trnQ-UUG~accD accD accD accD~psaI psaI psaI~ycf4 psaI~ycf4
No. 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152
Site 62406 63387 65701 65812 65864 66231 67640 68179 69504 69631 69818 70097 72170 72445 72471 72705 79885 81141 81262 83184 83437 83460 87766 88758 88949 89050 89698 90420 90925 98588 98724 101638 101963 103516 107576 107877 108127 109159 110096 110196 110273 111543 112561 113959 114098 114217 117293 117370 117372 117373 117376 117377
G. glabra A A G C G G A A C G A G T A G T G T G G A T A T C C T G G T C G G T T C T A G G T G A G A T G G A G A G
G. uralensis A T T A A A C C T C C T G G A C T A A A G G G G T A C A T C T A A G G C C T T T A C C C T G G A C T C T
G. glabra x G. uralensis T A G G G C C C G C G T A G T G T G G A T A G C C T G G T C G G T T G T A T G T G C G A T A G A G A G
Location petA petA~psbJ psbE~petL psbE~petL psbE~petL psbE~petL trnP-UGG~psaJ psaJ~rpl33 rps18~rpl20 rpl20 rpl20 rpl20~rps12 clpP~psbB psbB psbB psbB rpl36 rps8~rpl14 rpl14 rpl16~rpl16 rps3 rps3 ycf2 ycf2 ycf2 ycf2 ycf2 ycf2 ycf2 rps12~trnV-GAC rps12~trnV-GAC trnI-GAU trnI-GAU~trnI-GAU trnA-UGC~rrn23 trnR-ACG~trnN-GUU trnN-GUU trnN-GUU~ycf1 ycf1 ycf1 ycf1 ycf1 ycf1 ycf1 ycf1~rps15 ycf1~rps15 ycf1~rps15 ndhA~ndhA ndhA~ndhA ndhA~ndhA ndhA~ndhA ndhA~ndhA ndhA~ndhA
No. 153 154 155 156 157 158 159 160
Site 117379 117416 118533 119378 121060 122114 122429 127983
G. glabra T A T A T T C T
G. uralensis C G C C C C T C
G. glabra x G. uralensis T A T A T T C T
Location ndhA~ndhA ndhA~ndhA ndhI ndhG ndhD ndhD ndhD ndhF
Supplementary Table S4. Summary of nucleotide polymorphisms in nrDNA

                         G. glabra × G. uralensis                    G. glabra
No.   Site   Location    major type   minor type   G. uralensis   major type   minor type
1     1881   ITS1        T            T            T              T            G
2     1995   ITS1        C            T            C              T            T
3     2219   ITS2        T            C            T              C            C
4     2220   ITS2        G            A            G              A            A
5     2221   ITS2        C            A            C              A            A
6     2235   ITS2        A            A            A              A            C
7     3837   26S         G            A            G              A            A
8     5649   26S         T            T            T              T            C
9     5661   26S         T            C            T              C            C
10    5705   26S         T            C            T              C            C
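
The ten polymorphic nrDNA sites can be compared programmatically. The sketch below encodes the five nrDNA types as strings of alleles in site order, as read from Supplementary Table S4, and lists the sites at which two types differ. The dictionary keys and the function name are hypothetical, and the comparison covers only these ten sites rather than the full 45S sequences.

```python
# Polymorphic sites and alleles as read from Supplementary Table S4.
SITES = [1881, 1995, 2219, 2220, 2221, 2235, 3837, 5649, 5661, 5705]
NRDNA_TYPES = {
    "hybrid major": "TCTGCAGTTT",
    "hybrid minor": "TTCAAAATCC",
    "G. uralensis": "TCTGCAGTTT",
    "G. glabra major": "TTCAAAATCC",
    "G. glabra minor": "GTCAACACCC",
}


def differing_sites(type_a, type_b):
    """Return the site positions at which two nrDNA types carry different alleles."""
    alleles_a = NRDNA_TYPES[type_a]
    alleles_b = NRDNA_TYPES[type_b]
    return [s for s, x, y in zip(SITES, alleles_a, alleles_b) if x != y]


print(differing_sites("hybrid major", "G. uralensis"))
print(differing_sites("hybrid minor", "G. glabra major"))
print(differing_sites("G. uralensis", "G. glabra major"))
```

At these ten sites, the major type of the hybrid carries the same alleles as G. uralensis and its minor type carries the same alleles as the major type of G. glabra, which is what would be expected if the hybrid holds nrDNA copies from both parents.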