Advance Publication by J-STAGE Genes & Genetic Systems

Received for publication: January 15, 2017 Accepted for publication: August 1, 2017 Published online: October 6, 2017

Complete chloroplast genome and 45S nrDNA sequences of the medicinal plant species Glycyrrhiza glabra and Glycyrrhiza uralensis

Sang-Ho Kang1*, Jeong-Hoon Lee2, Hyun Oh Lee3,4, Byoung Ohg Ahn5, So Youn Won1, Seong-Han Sohn1 and Jung Sun Kim1

1 Genomics Division, National Institute of Agricultural Sciences, 370 Nongsaengmyeong-ro, Jeonju, Jeollabuk-do 54874, Republic of Korea

2 Department of Herbal Crop Research, National Institute of Horticultural and Herbal Science, 92 Bisanro, Eumseong, Chungbuk-do 27709, Republic of Korea

3 Department of Plant Science, College of Agriculture and Life Sciences, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea

4 Phyzen Genomics Institute, 605 Baekgoong Plaza 1, Seongnam-daero 331 beon-gil, Bundang-gu, Seongnam, Gyeonggi-do 13558, Republic of Korea

5 R&D Coordination Division, Rural Development Administration, 370 Nongsaengmyeong-ro, Jeonju, Jeollabuk-do 54875, Republic of Korea

Running Head: Chloroplast and 45S nrDNA of Glycyrrhiza species

Key words: Glycyrrhiza species, chloroplast, 45S nrDNA, medicinal plant

*Corresponding author.
Sang-Ho Kang
Genomics Division, National Institute of Agricultural Sciences, 370 Nongsaengmyeong-ro, Jeonju, Jeollabuk-do 54874, Republic of Korea
E-mail: [email protected]
Tel: +82-63-238-4560
Fax: +82-63-238-4554

ABSTRACT

Glycyrrhiza uralensis and G. glabra, members of the Fabaceae, are medicinally important species that are native to Asia and Europe. Extracts from these plants are widely used as natural sweeteners because they are much sweeter than sucrose. In this study, three complete chloroplast genomes and five 45S nuclear ribosomal (nr)DNA sequences of these two licorice species and an interspecific hybrid are presented. The chloroplast genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp, respectively. The three chloroplast genomes harbored 110 annotated genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The 45S nrDNA sequences were either 5,947 or 5,948 bp in length. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. The complete 45S nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA. We identified simple sequence repeat and tandem repeat sequences. We also developed four reliable markers for the analysis of Glycyrrhiza diversity and for authentication.

INTRODUCTION

Licorice is a perennial herb belonging to the family Fabaceae. The genus Glycyrrhiza includes about 18 species in Asia, Europe and the Americas. G. uralensis occurs from Central Asia to the northeastern part of China, whereas G. glabra is distributed from southern Europe to the northwestern part of China. The roots and stolons of Glycyrrhiza uralensis Fisch. and Glycyrrhiza glabra L. produce some of the most important crude drugs in the world (Gibson, 1978), mainly glycyrrhizin, an oleanane-type triterpene saponin. Glycyrrhiza plants have been used traditionally as anti-inflammatory (Finney and Somers, 1958; Kroes et al., 1997), antiviral (Fiore et al., 2008), antiallergy (Park et al., 2004) and antiulcer (He et al., 2001) agents. Because licorice extracts are ca. 150 times sweeter than sucrose (Kitagawa, 2002), they are also widely used throughout the world as natural sweeteners, with an annual value of over US $42.1 million (Parker, 2006). Because licorice is a medicinal plant, correct authentication of licorice plant ingredients is essential for their safe use.

Chloroplast (CP) genome sequences are of central importance for plant taxonomy and authentication because they are highly conserved across plant species. The CP genome is typically composed of a large single-copy (LSC) region, a small single-copy (SSC) region and two inverted repeats (IRs) (Gray et al., 1984; Shinozaki et al., 1986; Leseberg and Duvall, 2009). Interestingly, licorice belongs to the inverted repeat-lacking clade (IRLC) (Wojciechowski et al., 2004) of papilionoid legumes, which is characterized by the loss of one copy of the IR. To date, only the CP genome of G. glabra has been sequenced among the Glycyrrhiza species (Sabir et al., 2014).

The sequence of the 45S nuclear ribosomal DNA (nrDNA), which bears the 18S-5.8S-26S ribosomal RNA genes, provides additional information that can be very useful in plant taxonomy and DNA barcoding (Chen et al., 2014; Techen et al., 2014; Mishra et al., 2016). In particular, the internal transcribed spacer (ITS1 and ITS2) sequences in nrDNA are potential barcodes (Álvarez and Wendel, 2003; Yao et al., 2010). Although nrDNA is valuable for the identification of medicinal plants, little information is available on its comparison and polymorphism among Glycyrrhiza species.

In the current study, we analyzed complete sequences of the CP genome and nrDNA of Glycyrrhiza species. In addition, we identified 160 polymorphic sites in the CP genome and 10 polymorphic sites in the nrDNA that are valuable for the identification and authentication of G. glabra and G. uralensis as well as G. glabra × G. uralensis interspecific hybrids. Despite their useful applications as medicinal ingredients and food resources, there is limited information regarding the complete chloroplast genomes and the nrDNA sequences of Glycyrrhiza species. The results of this study will provide insight into the genetic relationships among the various species in the genus Glycyrrhiza.

MATERIALS AND METHODS

Plant materials and DNA extraction

European licorice (G. glabra L.; the female parent) and Chinese licorice (G. uralensis Fisch.; the male parent) were planted in a greenhouse and artificially crossed in May 2007. In June 2008, stolons were separated from F1 (G. glabra × G. uralensis) licorice seedlings and cultivated, resulting in 32 clonal lines of interspecific hybrids. The aerial parts of Glycyrrhiza species were collected from Eumseong (36° 56′ 38.68″ N, 127° 45′ 17.60″ E) and identified by Dr. JH Lee from the Department of Herbal Crop Research, National Institute of Horticultural and Herbal Science, Rural Development Administration. Voucher specimens (G. glabra: MPS000350-1, G. uralensis: MPS004535, G. glabra × G. uralensis F1: MPS002499) are deposited at the Korea Medicinal Resources Herbarium, Eumseong, Korea. Total DNA was extracted from young, fully expanded leaves of Glycyrrhiza species using a modified cetyltrimethylammonium bromide (CTAB) method (Allen et al., 2006). DNA purity and concentration were checked by electrophoresis on a 1.2% agarose gel and with a DropSense96 spectrophotometer (Trinean, Belgium). High-quality DNA (concentration > 100 ng/µl; A260/230 > 1.7; A260/280 = 1.8–2.0) was used for further analysis.
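The DNA quality thresholds stated above can be expressed as a simple filter. The following is a minimal sketch, not part of the original workflow; the `DnaSample` class and `passes_qc` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    """Spectrophotometer readings for one DNA extract (hypothetical container)."""
    name: str
    conc_ng_per_ul: float
    a260_230: float
    a260_280: float


def passes_qc(sample: DnaSample) -> bool:
    """Apply the thresholds given in the text:
    concentration > 100 ng/ul, A260/230 > 1.7 and A260/280 within 1.8-2.0."""
    return (
        sample.conc_ng_per_ul > 100.0
        and sample.a260_230 > 1.7
        and 1.8 <= sample.a260_280 <= 2.0
    )
```

A sample failing any one of the three criteria would be excluded from library construction under this reading of the thresholds.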

Illumina sequencing and de novo assembly of CP and nrDNA

Paired-end (PE) libraries were constructed with insert sizes ranging from 280 to 430 bp, following the manufacturer's protocol for the TruSeq PE Cluster Kit. The PE libraries were sequenced on the Illumina HiSeq 1000 platform (Illumina, USA) at the in-house facility (Genomics Division, NAS, Korea). De novo assembly of the CP genome and nrDNA was accomplished using the approach described by Kim et al. (2015). In short, low-quality sequences with Phred scores lower than 20 were trimmed using CLC quality trim software. The remaining high-quality sequences were assembled into contigs using CLC genome assembler beta 4.06 (CLC Inc., Aarhus, Denmark) with a minimum autonomously controlled overlap size of 150–500 bp at Phyzen Inc. (Seongnam, South Korea). The obtained CP genome sequence was assembled using the G. glabra (KF201590) genome as a reference sequence. The assembled nrDNA contig fully covered the 45S nrDNA cistron unit and partially covered the intergenic spacer sequence.
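To illustrate what trimming at a Phred threshold of 20 means, the sketch below removes low-quality bases from the 3' end of a read. It is not the CLC quality trim algorithm, whose internals are not described here. It is a simplified stand-in that assumes Phred+33 quality encoding, and the function names are hypothetical.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20) -> tuple[str, str]:
    """Remove bases from the 3' end until a base with score >= min_q is reached."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings must have equal length")
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]
```

For example, a read whose last three bases have a score of 2 would lose exactly those three bases, while a read composed entirely of low-quality bases would be reduced to an empty string and could be discarded.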

Gene annotation, SNP genotyping and repeat sequence analysis

The CP sequences were annotated using DOGMA (Jansen Lab, UT Austin, TX, USA) (Wyman et al., 2004) and BLAST searches. The tRNA genes were identified using DOGMA and tRNAscan-SE (Lowe Lab, University of California, Santa Cruz, CA, USA) (Schattner et al., 2005). The circular CP genome map was constructed using the OGDraw software (http://ogdraw.mpimp-golm.mpg.de/) (Lohse et al., 2007). Repeats in the CP sequences of the Glycyrrhiza species were investigated using Tandem Repeats Finder, version 4.0 (LBI, Boston University, Boston, MA, USA) (Benson, 1999), with 100% similarity and a minimum size of 10 bp. Simple sequence repeat (SSR) motifs with a minimum size of 10 bp were identified using MISA (http://pgrc.ipk-gatersleben.de/misa/).
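The SSR criterion (a perfect repeat of at least 10 bp) can be sketched with a regular expression. This is not MISA itself. It is an illustrative approximation that assumes motif sizes of 2 to 6 bp, skips mononucleotide repeats, and derives the minimum repeat count for each motif size from the 10-bp length threshold. The function name `find_ssrs` is hypothetical.

```python
import re


def find_ssrs(seq: str, min_len: int = 10, motif_sizes=(2, 3, 4, 5, 6)):
    """Return (1-based start, motif, copy number) for perfect SSRs of at least min_len bp."""
    seq = seq.upper()
    hits = []
    for k in motif_sizes:
        # Smallest copy number whose total length reaches min_len (at least 2 copies).
        min_repeats = max(2, -(-min_len // k))
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. ATAT, AAAAA).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start() + 1, motif, len(m.group(0)) // k))
    return sorted(hits)
```

Under these assumptions a pentanucleotide motif qualifies with only two copies (10 bp), whereas a dinucleotide motif needs five copies.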

Sequence divergence analysis

The CP genome of G. glabra (KF201590), a member of the Fabaceae, was downloaded from the NCBI database and aligned using MAFFT version 7 (http://mafft.cbrc.jp/alignment/server/). Comparison of the four CP genomes, G. glabra (KU891817), G. uralensis (KU862308), G. glabra × G. uralensis (KU862307) and G. glabra (KF201590), was performed using the mVISTA program in Shuffle-LAGAN mode (Frazer et al., 2004).
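Once two genomes are aligned, substitutions and InDels can be tallied column by column. The sketch below assumes two gapped sequences of equal length, such as a pair of rows taken from a MAFFT alignment. It counts each contiguous run of gaps on one side as a single InDel event. The function name and the returned keys are hypothetical, and this is not the procedure used by mVISTA.

```python
def count_variants(ref_aln: str, qry_aln: str) -> dict:
    """Count SNPs, InDel events and identity between two aligned (gapped) sequences."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    snps = indels = matches = columns = 0
    gap_side = None  # which sequence carries the currently open gap run
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-" and q == "-":
            continue  # column gapped in both sequences carries no information
        columns += 1
        if r == "-" or q == "-":
            side = "ref" if r == "-" else "qry"
            if side != gap_side:
                indels += 1  # a new gap run starts here
            gap_side = side
            continue
        gap_side = None
        if r == q:
            matches += 1
        else:
            snps += 1
    identity = 100.0 * matches / columns if columns else 0.0
    return {"snps": snps, "indels": indels, "identity_pct": identity}
```

Identity is computed here over all informative alignment columns, including gap columns. Other definitions (for example, excluding gaps) would give slightly different percentages.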

Identification of polymorphisms that can distinguish Glycyrrhiza species

Four PCR primer pairs (Supplementary Table S1) were designed based on CP InDels and nrDNA-specific sequence regions among Glycyrrhiza species. These primers were used to distinguish G. glabra and G. uralensis as well as G. glabra × G. uralensis. The PCR conditions were 4 min at 94 °C, followed by 38 cycles of 94 °C for 30 s, 60 °C for 30 s and 72 °C for 15 s, followed by a final extension at 72 °C for 1 min. Gel electrophoresis was performed using a 1% agarose gel stained with a fluorescent dye.
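As a quick arithmetic check of the cycling program described above, the total programmed hold time can be computed as follows. This is only a sketch: it ignores ramp times between temperatures, which depend on the thermocycler, and the function name is hypothetical.

```python
def pcr_hold_time_s(initial_s: int, cycles: int, step_s: tuple, final_s: int) -> int:
    """Sum of programmed hold times in seconds (temperature ramping not included)."""
    return initial_s + cycles * sum(step_s) + final_s


# 4 min initial denaturation, 38 cycles of 30 s + 30 s + 15 s, 1 min final extension
total_s = pcr_hold_time_s(4 * 60, 38, (30, 30, 15), 60)
total_min = total_s / 60
```

With the stated conditions this gives 240 + 38 × 75 + 60 = 3,150 s, i.e. 52.5 min of hold time per run.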

RESULTS AND DISCUSSION

After sequencing, we employed a combination of de novo assembly and reference-guided strategies using Illumina PE reads ranging from 587 to 741 Mbp, which represent approximately 226X to 400X coverage of the CP genome. The complete CP genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were circular, with lengths of 127,895 bp, 127,716 bp and 127,939 bp, respectively (Table 1). The CP gene content and order were identical among the Glycyrrhiza species (Fig. 1). These three CP genomes belong to the IRLC (Wojciechowski et al., 2004) of papilionoid legumes, in which one copy of the IR has been lost. The Glycyrrhiza CP genomes harbor 110 unique genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes (Table 2). Among them, 9 protein-coding and 6 tRNA genes contain a single intron, and one gene (ycf3) contains two introns. infA, rpl22 and rps16 were absent in Glycyrrhiza species. Two CP-encoded genes, infA and rpl22, are missing from the CP genomes of legumes (Doyle et al., 1995) but are present in the nucleus (Gantt et al., 1991). Loss of the rps16 gene from CP DNA has also been identified in Medicago and Populus (Ueda et al., 2008). Whole-genome alignments of Glycyrrhiza species with the annotation of G. glabra (KF201590) (Sabir et al., 2014) as a reference using mVISTA revealed their sequence variations (Fig. 2). The whole CP genome alignment showed that the coding regions are more conserved than the intergenic regions, as is the case in most angiosperms. Analysis of sequence variation between G. glabra (KF201590) and G. glabra (KU891817) showed 30 single nucleotide polymorphisms (SNPs) and 24 insertions-deletions (InDels). These SNPs and InDels may provide information for the authentication of Glycyrrhiza species. The CP genome of G. glabra × G. uralensis shared 99.98% and 99.85% similarity with G. glabra and G. uralensis, respectively, indicating that Glycyrrhiza species also follow the maternal mode of plastid inheritance (Hagemann, 2004).

The nrDNA sequences were assembled into single contigs that were 5,947 bp or 5,948 bp in length. G. glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type of nrDNA (Table 1). The complete nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA (Fig. 3). The average GC content ranged from 53.86% to 53.91% and was thus almost identical among the five nrDNA sequences (Fig. 3).
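The GC content values reported above, and a windowed GC profile of the kind plotted in Fig. 3, can be computed as in the following sketch. The window size of 40 nucleotides follows the Fig. 3 legend. The non-overlapping step and the function names are assumptions, and the plot in the paper was produced with UGENE, not with this code.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int = 40, step: int = 40) -> list[float]:
    """GC percentage for each full window of the given size along the sequence."""
    return [
        gc_percent(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]
```

Applying `gc_percent` to each full-length nrDNA sequence would give the overall value per sequence, while `gc_windows` gives the local profile along the cistron.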

Repeat sequences in the CP genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were analyzed using Tandem Repeats Finder, version 4.0. A total of 20 unique tandem repeat sequences were detected in the Glycyrrhiza CP genomes (Supplementary Table S2). The lengths of the tandem repeat units ranged from 11 to 39 bp, and most of the tandem repeats were present in two copies. As in Bupleurum falcatum (Shin et al., 2016), most of the tandem repeat sequences were identified in non-coding regions, with only three genic regions, namely rps11, rpl20 and ycf1, containing tandem repeat sequences. The tandem repeats identified in the Glycyrrhiza CP genomes were predominantly shorter than 40 bp, a size that is nevertheless sufficient for illegitimate recombination (Sherman-Broyles et al., 2014).

SSRs, also known as microsatellites, occur frequently in CP genomes. In this study, mononucleotide SSRs were excluded. We identified 350, 349 and 352 SSRs with a length of at least 10 bp in G. glabra, G. uralensis and G. glabra × G. uralensis, respectively (Fig. 4). Among the SSRs, pentanucleotide SSRs were the most commonly detected in the CP genomes, accounting for 84% of total SSRs. Di-, tri- and tetranucleotide repeats were composed mainly of A or T, which reflects the A-T richness of the CP genomes (Zhang et al., 2011; Yi and Kim, 2012). These SSRs may further serve as genetic markers for phylogenetic and medicinal plant authentication studies (Zhang et al., 2016).

We detected 160 and 10 SNPs in the Glycyrrhiza CP genomes and nrDNA, respectively (Supplementary Tables S3 and S4). Like SSRs, most SNPs in the chloroplast were identified in non-coding regions, whereas SNPs in nrDNA were detected in ITS1, ITS2 and 26S. Furthermore, we identified 83 InDels in the Glycyrrhiza CP genomes. PCR primers were designed based on InDels and specific sequence regions (Supplementary Table S1). We successfully amplified four PCR products that can distinguish between G. glabra and G. uralensis (Fig. 5). The primer pairs ycf3F01/ycf3R01, atpHF01/atpHR01 and ycf2F01/ycf2R01 amplified PCR products from the Glycyrrhiza CP genomes. On the other hand, the 5.8SF01/5.8SR01 primer pair amplified a PCR product from nrDNA only in G. glabra and G. glabra × G. uralensis. These primers can be used as Glycyrrhiza authentication markers.
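The way the four markers could be combined for authentication is sketched below. The expected product sizes are taken from Supplementary Table S1, assuming that its three size columns are ordered G. glabra, G. uralensis and G. glabra × G. uralensis. The 10-bp tolerance for reading band sizes from a gel, the dictionary layout and the function name are all hypothetical choices made for illustration.

```python
# Expected PCR product sizes in bp (None = no product), read from Supplementary Table S1.
EXPECTED = {
    "G. glabra": {"ycf3": 300, "atpH": 445, "ycf2": 331, "5.8S": 227},
    "G. uralensis": {"ycf3": 251, "atpH": 464, "ycf2": 382, "5.8S": None},
    "G. glabra x G. uralensis": {"ycf3": 300, "atpH": 440, "ycf2": 358, "5.8S": 227},
}


def classify(observed: dict, tol: int = 10) -> list[str]:
    """Return the taxa whose expected band pattern is compatible with the observed sizes."""
    compatible = []
    for taxon, expected in EXPECTED.items():
        ok = True
        for marker, size in observed.items():
            exp = expected[marker]
            if (exp is None) != (size is None):
                ok = False  # band present where none is expected, or the reverse
                break
            if exp is not None and abs(exp - size) > tol:
                ok = False  # band size outside the assumed gel-reading tolerance
                break
        if ok:
            compatible.append(taxon)
    return compatible
```

With this tolerance, the ycf3 and 5.8S markers alone would separate G. uralensis from the other two samples, while the ycf2 intron marker would be needed to separate G. glabra from the hybrid.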

In this study, the complete Glycyrrhiza CP genomes and nrDNA have been sequenced. These genomes belong to the IRLC of papilionoid legumes, which is characterized by the loss of one copy of the IR. The complete CP genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp in size, respectively. The nrDNA sequences were 5,947 bp or 5,948 bp in length. G. glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type of nrDNA. We developed four reliable markers for the analysis of Glycyrrhiza diversity and for authentication. This study will open up further avenues of research toward a better understanding of the molecular ecology and molecular phylogeny of Glycyrrhiza species.

ACKNOWLEDGEMENTS

The authors thank the National Institute of Agricultural Sciences (NAS) Genome Sequencing Core facility for their services. This work was carried out with the support of the National Institute of Agricultural Sciences (Project No. PJ010889), Republic of Korea.

REFERENCES

Allen, G.C., Flores-Vergara, M.A., Krasynanski, S., Kumar, S., and Thompson, W.F. (2006) A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nat. Protoc. 1, 2320-2325.

Álvarez, I., and Wendel, J.F. (2003) Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogenet. Evol. 29, 417-434.

Benson, G. (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573-580.

Chen, S., Pang, X., Song, J., Shi, L., Yao, H., Han, J., and Leon, C. (2014) A renaissance in herbal medicine identification: from morphology to DNA. Biotechnol. Adv. 32, 1237-1244.

Doyle, J.J., Doyle, J.L., and Palmer, J.D. (1995) Multiple independent losses of two genes and one intron from legume chloroplast genomes. Syst. Bot. 20, 272-294.

Finney, R.S.H., and Somers, G.F. (1958) The anti-inflammatory activity of glycyrrhetinic acid and derivatives. J. Pharm. Pharmacol. 10, 613-620.

Fiore, C., Eisenhut, M., Krausse, R., Ragazzi, E., Pellati, D., Armanini, D., and Bielenberg, J. (2008) Antiviral effects of Glycyrrhiza species. Phytother. Res. 22, 141-148.

Frazer, K.A., Pachter, L., Poliakov, A., Rubin, E.M., and Dubchak, I. (2004) VISTA: computational tools for comparative genomics. Nucleic Acids Res. 32, W273-W279.

Gantt, J.S., Baldauf, S.L., Calie, P.J., Weeden, N.F., and Palmer, J.D. (1991) Transfer of rpl22 to the nucleus greatly preceded its loss from the chloroplast and involved the gain of an intron. EMBO J. 10, 3073-3078.

Gibson, M.R. (1978) Glycyrrhiza in old and new perspectives. Lloydia 41, 348-354.

Gray, M.W., Sankoff, D., and Cedergren, R.J. (1984) On the evolutionary descent of organisms and organelles: a global phylogeny based on a highly conserved structural core in small subunit ribosomal RNA. Nucleic Acids Res. 12, 5837-5852.

Hagemann, R. (2004) The sexual inheritance of plant organelles. In Molecular Biology and Biotechnology of Plant Organelles. (eds.: Daniell, H., and Chase, C.), pp. 99-113. Springer, Heidelberg.

He, J.X., Akao, T., Nishino, T., and Tani, T. (2001) The influence of commonly prescribed synthetic drugs for peptic ulcer on the pharmacokinetic fate of glycyrrhizin from Shaoyao-Gancao-tang. Biol. Pharm. Bull. 24, 1395-1399.

Kim, K., Lee, S.C., Lee, J., Yu, Y., Yang, K., Choi, B.S., Koh, H.J., Waminal, N.E., Choi, H.I., Kim, N.H., et al. (2015) Complete chloroplast and ribosomal sequences for 30 accessions elucidate evolution of Oryza AA genome species. Sci. Rep. 28, 15655.

Kitagawa, I. (2002) Licorice root. A natural sweetener and an important ingredient in Chinese medicine. Pure Appl. Chem. 74, 1189-1198.

Kroes, B.H., Beukelman, C.J., van den Berg, A.J.J., Wolbink, G.J., van Dijk, H., and Labadie, R.P. (1997) Inhibition of human complement by β-glycyrrhetinic acid. Immunology 90, 115-120.

Leseberg, C.H., and Duvall, M.R. (2009) The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals. J. Mol. Evol. 69, 311-318.

Lohse, M., Drechsel, O., and Bock, R. (2007) OrganellarGenomeDRAW (OGDRAW): a tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr. Genet. 52, 267-274.

Mishra, P., Kumar, A., Nagireddy, A., Mani, D.N., Shukla, A.K., Tiwari, R., and Sundaresan, V. (2016) DNA barcoding: an efficient tool to overcome authentication challenges in the herbal market. Plant Biotechnol. J. 14, 8-21.

Park, H.Y., Park, S.H., Yoon, H.K., Han, M.J., and Kim, D.H. (2004) Anti-allergic activity of 18β-glycyrrhetinic acid-3-O-β-D-glucuronide. Arch. Pharm. Res. 27, 57-60.

Parker, P.M. (2006) The World Market for Licorice Roots: A 2007 Global Trade Perspective. ICON Group International Inc, San Diego.

Sabir, J., Schwarz, E., Ellison, N., Zhang, J., Baeshen, N.A., Mutwakil, M., Jansen, R., and Ruhlman, T. (2014) Evolutionary and biotechnology implications of plastid genome variation in the inverted-repeat-lacking clade of legumes. Plant Biotechnol. J. 12, 743-754.

Schattner, P., Brooks, A.N., and Lowe, T.M. (2005) The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 33, W686-W689.

Sherman-Broyles, S., Bombarely, A., Grimwood, J., Schmutz, J., and Doyle, J. (2014) Complete plastome sequences from Glycine syndetika and six additional perennial wild relatives of soybean. G3 (Bethesda) 4, 2023-2033.

Shin, D.H., Lee, J.H., Kang, S.H., Ahn, B.O., and Kim, C.K. (2016) The complete chloroplast genome of the Hare’s root, Bupleurum falcatum: its molecular features. Genes (Basel) 7, pii E20 doi:10.3390/genes7050020.

Shinozaki, K., Ohme, M., Tanaka, M., Wakasugi, T., Hayashida, N., Matsubayashi, T., Zaita, N., Chunwongse, J., Obokata, J., Yamaguchi-Shinozaki, K., et al. (1986) The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J. 5, 2043-2049.

Techen, N., Parveen, I., Pan, Z., and Khan, I.A. (2014) DNA barcoding of medicinal plant material for identification. Curr. Opin. Biotechnol. 25, 103-110.

Ueda, M., Nishikawa, T., Fujimoto, M., Takanashi, H., Arimura, S., Tsutsumi, N., and Kadowaki, K. (2008) Substitution of the gene for chloroplast RPS16 was assisted by generation of a dual targeting signal. Mol. Biol. Evol. 25, 1566-1575.

Wojciechowski, M.F., Lavin, M., and Sanderson, M.J. (2004) A phylogeny of legumes (Leguminosae) based on analysis of the plastid matK gene resolves many well-supported subclades within the family. Am. J. Bot. 91, 1846-1862.

Wyman, S.K., Jansen, R.K., and Boore, J.L. (2004) Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20, 3252-3255.

Yao, H., Song, J., Liu, C., Luo, K., Han, J., Li, Y., Pang, X., Xu, H., Zhu, Y., Xiao, P., et al. (2010) Use of ITS2 region as the universal DNA barcode for plants and animals. PLoS One 5, e13102.

Yi, D.K., and Kim, K.J. (2012) Complete chloroplast genome sequences of important oilseed crop Sesamum indicum L. PLoS One 7, e35872.

Zhang, Y., Du, L., Liu, A., Chen, J., Wu, L., Hu, W., Zhang, W., Kim, K., Lee, S.C., Yang, T.J., et al. (2016) The complete chloroplast genome sequences of five Epimedium species: lights into phylogenetic and taxonomic analyses. Front. Plant Sci. 7, doi:10.3389/fpls.2016.00306.

Zhang, Y.J., Ma, P.F., and Li, D.Z. (2011) High-throughput sequencing of six bamboo chloroplast genomes: phylogenetic implications for temperate woody bamboos (Poaceae: Bambusoideae). PLoS One 6, e20596.

Table 1. Statistics of WGS and assembly summary of three Glycyrrhiza species

CP genome
Scientific name             Amount (Mbp)   Length (bp)   Coverage (X)   GenBank Acc. No.
G. glabra                   741.68         127,895       367.81         KU891817
G. uralensis                721.47         127,716       225.95         KU862308
G. glabra × G. uralensis    587.42         127,939       399.91         KU862307

nrDNA
Scientific name             Type     Length (bp)   Coverage (X)   GenBank Acc. No.
G. glabra                   Type 1   5,947         616.43         KX530462
G. glabra                   Type 2   5,947         600.79         KX530463
G. uralensis                Type 1   5,948         1259.83        KX530461
G. glabra × G. uralensis    Type 1   5,948         739.21         KX530459
G. glabra × G. uralensis    Type 2   5,947         684.44         KX530460

Table 2. Gene composition in Glycyrrhiza CP genome

Category of genes: Self replication
  Ribosomal RNAs: 16S (rrn16), 23S (rrn23), 4.5S (rrn4.5), 5S (rrn5)
  Transfer RNAs: trnA-UGC †, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnG-GCC, trnG-UCC †, trnH-GUG, trnI-CAU, trnI-GAU †, trnK-UUU †, trnL-UAA †, trnL-UAG, trnL-CAA, trnM-CAU, trnN-GUU, trnP-UGG, trnQ-UUG, trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-UAC †, trnV-GAC, trnW-CCA, trnY-GUA
  Small subunit of ribosome: rps2, rps3, rps4, rps7, rps8, rps11, rps12 †, rps14, rps15, rps18, rps19
  Large subunit of ribosome: rpl2 †, rpl14, rpl16 †, rpl20, rpl23, rpl32, rpl33, rpl36
  RNA polymerase: rpoA, rpoB, rpoC1 †, rpoC2

Category of genes: Photosynthesis
  NADH-dehydrogenase: ndhA †, ndhB †, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
  Photosystem I: psaA, psaB, psaC, psaI, psaJ, ycf3 #
  Photosystem II: psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
  Cytochrome b6/f: petA, petB †, petD †, petG, petL, petN
  ATP synthase: atpA, atpB, atpE, atpF †, atpH, atpI
  Rubisco: rbcL

Category of genes: Other genes
  accD, ccsA, cemA, clpP, matK

Category of genes: Unknown function
  ORFs ¥: ycf1, ycf2, ycf4

† indicates the existence of a single intron in the corresponding genes;
# indicates the existence of two introns in the corresponding genes;
¥ indicates open reading frames.

Figure Legends

Fig. 1. The map of the CP genome of the Glycyrrhiza species. Genes shown outside the circle are transcribed clockwise, while those drawn inside the circle are transcribed counterclockwise. Functionally annotated genes are shown in color. The darker gray area in the inner circle shows the GC content.

Fig. 2. Comparison of the CP genomes of G. glabra, G. uralensis and G. glabra × G. uralensis using G. glabra (KF201590) as a reference sequence. The top line shows the order of genes (transcriptional direction indicated by arrows). Genome regions are color coded as follows: conserved gene = blue, tRNA and rRNA = sky blue and intergenic region = red.

Fig. 3. Schematic diagram of the nrDNA cistron unit of five Glycyrrhiza sequences. (A) Mapped read depth of the nrDNA cistron unit sequences. (B) GC content plot drawn with a window size of 40 nucleotides using the UGENE program.

Fig. 4. Number of simple sequence repeats in the Glycyrrhiza CP genomes. Classification of SSRs by repeat types in G. glabra (A), G. uralensis (B) and G. glabra × G. uralensis (C).

Fig. 5. Validation of InDel and sequence-specific polymorphic sites. PCR analysis of InDel regions from the CP genome and sequence-specific regions from nrDNA. M indicates the 100-bp size marker; GG, GU and F1 correspond to G. glabra, G. uralensis and G. glabra × G. uralensis, respectively. Lanes 1-4 represent the ycf3F01-ycf3R01, atpHF01-atpHR01, ycf2F01-ycf2R01 and 5.8SF01-5.8SR01 primer pairs, respectively.

Supplementary Table S1. Primers that detect polymorphism among Glycyrrhiza species

                                                                 Product size (bp)
Primer name    Primer sequence              Location      G. glabra   G. uralensis   G. glabra × G. uralensis
ycf3F01 (a)    GGGCGTTTTGAATAAGAACA         ycf3 ~ psaA   300         251            300
ycf3R01 (a)    TGACTGATGGGGACAACAAA
atpHF01 (a)    TCAATTGACTAACCAATTCAAACAA    atpH ~ atpF   445         464            440
atpHR01 (a)    AACTCGCACACACTCCCTTT
ycf2F01 (a)    GTATCGAAAGGCCCAATGAA         ycf2 intron   331         382            358
ycf2R01 (a)    GTTCCACCCTGCAAGAACTC
5.8SF01 (b)    CAGACCGTTGCCCGACAA           5.8S ~ ITS2   227         No product     227
5.8SR01 (b)    GTCTCATCACGAGCGTTCAA

(a), (b): PCR products are derived from CP genome-based and nrDNA-based markers, respectively.

Supplementary Table S2. Tandem repeat analysis in the Glycyrrhiza species used in this study

                                                       Unit      Copy number
No.  Tandem repeat unit sequence                       length    G. glabra   G. uralensis   G. glabra × G. uralensis   Position
1    GCTATTAATTAATTT                                   15        1.9         1.9            1.9                        trnK-UUU ~ rbcL
2    AATTAAATTCAATAT                                   15        2.1         2.1            2.1                        trnK-UUU ~ rbcL
3    AAAAGAATATTAAT                                    14        2.1         2.1            2.1                        trnL-UAA ~ trnT-UGU
4    AAAATATTATTAA                                     13        2.1         2.1            2.1                        trnL-UAA ~ trnT-UGU
5    ATATCAAAATAGATGAAG                                18        2.1         3.1            2.1                        trnL-UAA ~ trnT-UGU
6    AATATCAAATAAAT                                    14        2.1         2.1            2.1                        trnL-UAA ~ trnT-UGU
7    TCTGATTTCTAGTATAAT                                18        -           2.1            -                          petN ~ trnC-GCA
8    TTGAATATAATTCAAAATATTAA                           23        2           2              2                          atpA ~ trnR-UCU
9    TAGAAGATATAAT                                     13        -           2              -                          trnQ-UUG ~ accD
10   ACATATATAGTG                                      12        2.1         2.1            2.1                        trnQ-UUG ~ accD
11   AAATAGAAGATTTAAGTGAATCAAAAAACC                    30        2.1         2.1            2.1                        psaJ ~ rpl33
12   TCTTTTAATTCTGGTCATTG                              20        2           2              2                          rpl20
13   TAGAAATATTCTATTAAA                                18        2.1         2.1            3.1                        rps12 ~ clpP
14   TTATATTGTAACTATAATCACTA                           23        2           2              2                          rps12 ~ clpP
15   ACTATTTTCTAAC                                     13        2           2              2                          rps11 ~ rpl36
16   AGAATTAATAT                                       11        2.5         2.5            2.5                        rps11 ~ rpl36
17   AATAATAAATAATCAAATCATTATAA                        26        1.9         1.9            1.9                        trnN-GUU ~ ycf1
18   ATATATTTAAATAT                                    14        3           3              3                          ycf1 ~ rps15
19   ATAATTTTATATTAACGATAAGTATAGTAATTGATTATT           39        -           2.4            -                          trnL-UAG ~ rpl32
20   TATCAAAATAAACGAATG                                18        2.1         2.1            2.1                        rpl32 ~ ndhF

Supplementary Table S3. Summary of nucleotide polymorphisms in Glycyrrhiza CP genomes No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48

Site 165 211 1762 3841 4575 5046 5316 6126 6156 9685 9823 10738 10824 13847 14478 14545 14798 15046 15171 17174 17244 18076 18143 18770 20425 20749 20848 22426 23510 24218 24695 25760 25897 27109 27110 27112 27113 27114 27115 27117 27118 27556 28012 29387 30206 30706 30707 30971

G. glabra A G A C A C C A T G T T T T C T G A A A T T T T A C G C T G T A T A G G G T G T A A T T C G C T

G. uralensis G T G C C A C T C C C A T A T G A T T T C A G G G G A G C T C T G T A C A C C C T G C C T C T G

G. glabra x G. uralensis A G A G A C T A T G T T G T T T G A A A C T T T A C G C T G T T T A G G G T G T A A T T C G T T

Location trnH-GUG~psbA trnH-GUG~psbA psbA~trnK-UUU matK~trnK-UUU trnK-UUU~rbcL trnK-UUU~rbcL trnK-UUU~rbcL rbcL rbcL atpE~trnM-CAU trnM-CAU~trnV-UAC trnV-UAC~ndhC trnV-UAC~ndhC trnF-GAA~trnL-UAA trnL-UAA~trnT-UGU trnL-UAA~trnT-UGU trnL-UAA~trnT-UGU trnL-UAA~trnT-UGU trnL-UAA~trnT-UGU rps4~trnS-GGA rps4~trnS-GGA ycf3~ycf3 ycf3~ycf3 ycf3~ycf3 ycf3~psaA psaA psaA psaA psaB psaB psaB trnfM-CAU~trnG-GCC trnG-GCC~psbZ trnS-UGA~psbC trnS-UGA~psbC trnS-UGA~psbC trnS-UGA~psbC trnS-UGA~psbC trnS-UGA~psbC trnS-UGA~psbC trnS-UGA~psbC psbC psbC psbC~psbD psbD~trnT-GGU psbD~trnT-GGU psbD~trnT-GGU psbD~trnT-GGU

49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100

32530 34271 34566 34749 34958 35046 35370 35601 35762 35769 35787 35828 36164 36411 36892 40421 44199 44603 44690 45617 46647 48795 48799 49195 49338 50047 50293 50422 50512 52258 52435 52466 52486 53775 53884 53971 55213 56595 56756 57159 57205 57404 57489 57665 57927 58078 58773 59465 60303 60318 60566 60567

G A T G T T T A A G T C G C A A A C C G C A T C C C G T G T C T C A A G G T T C C A C C A A A G G A G A

A T G A C T C G C A C C G A G G G A T A A G G T T A A G A T T C T T G A T G A T T T G T G G G A G C G A

G A T G T C T A A G T A T C A A A A C A C A G C C C G G G C C T C A A G T T T C T A C C A A A G A A T C

trnY-GUA~trnD-GUC petN~trnC-GCA petN~trnC-GCA petN~trnC-GCA petN~trnC-GCA petN~trnC-GCA trnC-GCA~rpoB trnC-GCA~rpoB trnC-GCA~rpoB trnC-GCA~rpoB trnC-GCA~rpoB trnC-GCA~rpoB trnC-GCA~rpoB rpoB rpoB rpoC1~rpoC1 rpoC2 rpoC2 rpoC2 rpoC2 rpoC2 atpI~atpH atpI~atpH atpI~atpH atpI~atpH atpH~atpF atpH~atpF atpF atpF atpA atpA atpA atpA trnR-UCU~trnG-UCC trnR-UCU~trnG-UCC trnG-UCC~trnG-UCC trnG-UCC~trnS-GCU psbK~trnQ-UUG psbK~trnQ-UUG trnQ-UUG~accD trnQ-UUG~accD trnQ-UUG~accD trnQ-UUG~accD trnQ-UUG~accD trnQ-UUG~accD trnQ-UUG~accD accD accD accD~psaI psaI psaI~ycf4 psaI~ycf4

101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152

62406 63387 65701 65812 65864 66231 67640 68179 69504 69631 69818 70097 72170 72445 72471 72705 79885 81141 81262 83184 83437 83460 87766 88758 88949 89050 89698 90420 90925 98588 98724 101638 101963 103516 107576 107877 108127 109159 110096 110196 110273 111543 112561 113959 114098 114217 117293 117370 117372 117373 117376 117377

A A G C G G A A C G A G T A G T G T G G A T A T C C T G G T C G G T T C T A G G T G A G A T G G A G A G

A T T A A A C C T C C T G G A C T A A A G G G G T A C A T C T A A G G C C T T T A C C C T G G A C T C T

T A G G G C C C G C G T A G T G T G G A T A G C C T G G T C G G T T G T A T G T G C G A T A G A G A G

petA petA~psbJ psbE~petL psbE~petL psbE~petL psbE~petL trnP-UGG~psaJ psaJ~rpl33 rps18~rpl20 rpl20 rpl20 rpl20~rps12 clpP~psbB psbB psbB psbB rpl36 rps8~rpl14 rpl14 rpl16~rpl16 rps3 rps3 ycf2 ycf2 ycf2 ycf2 ycf2 ycf2 ycf2 rps12~trnV-GAC rps12~trnV-GAC trnI-GAU trnI-GAU~trnI-GAU trnA-UGC~rrn23 trnR-ACG~trnN-GUU trnN-GUU trnN-GUU~ycf1 ycf1 ycf1 ycf1 ycf1 ycf1 ycf1 ycf1~rps15 ycf1~rps15 ycf1~rps15 ndhA~ndhA ndhA~ndhA ndhA~ndhA ndhA~ndhA ndhA~ndhA ndhA~ndhA

153 154 155 156 157 158 159 160

117379 117416 118533 119378 121060 122114 122429 127983

T A T A T T C T

C G C C C C T C

T A T A T T C T

ndhA~ndhA ndhA~ndhA ndhI ndhG ndhD ndhD ndhD ndhF

Supplementary Table S4. Summary of nucleotide polymorphisms in nrDNA

No.   Site   Location   G. glabra × G. uralensis   G. glabra × G. uralensis   G. uralensis   G. glabra    G. glabra
                        major type                 minor type                                major type   minor type
1     1881   ITS1       T                          T                          T              T            G
2     1995   ITS1       C                          T                          C              T            T
3     2219   ITS2       T                          C                          T              C            C
4     2220   ITS2       G                          A                          G              A            A
5     2221   ITS2       C                          A                          C              A            A
6     2235   ITS2       A                          A                          A              A            C
7     3837   26S        G                          A                          G              A            A
8     5649   26S        T                          T                          T              T            C
9     5661   26S        T                          C                          T              C            C
10    5705   26S        T                          C                          T              C            C
