GENE-39922; No. of pages: 8; 4C: Gene xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Characterization of the global transcriptome for cotton (Gossypium hirsutum L.) anther and development of SSR marker

Xianwen Zhang a, Zhenwei Ye a, Tiankang Wang b, Hairong Xiong c, Xiaoling Yuan a, Zhigang Zhang d, Youlu Yuan b,⁎, Zhi Liu a,⁎

a College of Bioscience and Biotechnology, Hunan Agricultural University, Changsha 410128, China
b State Key Laboratory of Cotton Biology, Key Laboratory of Biological and Genetic Breeding of Cotton, The Ministry of Agriculture, Institute of Cotton Research, The Chinese Academy of Agricultural Sciences, Anyang 455004, China
c Key Laboratory for Crop Germplasm Innovation and Utilization of Hunan Province, Hunan Agricultural University, Changsha 410128, China
d Cotton Sciences Research Institute of Hunan Province, Changde 415101, China

Article info

Article history: Received 20 May 2014; received in revised form 26 August 2014; accepted 29 August 2014; available online xxxx

Keywords: Cotton; Transcriptome; Transcription factor; SSR; Polymorphism

Abstract

Cotton is an important fiber plant, and elucidating the molecular mechanism of anther development is of particular interest because anther fertility is closely related to boll setting and therefore to fiber yield. In the present study, 47.2 million paired-end reads with an average length of 82.87 bp were generated by transcriptome sequencing of anthers from TM-1 (Gossypium hirsutum L.), a genetic standard line, and 210,965 unigenes longer than 100 bp were obtained. BLAST, KEGG, COG, and GO analyses showed that the genes were enriched in the processes of transcription, translation, and post-translation as well as hormone signal transduction, transcription factor families, and cell wall-related genes mainly participating in cell expansion and carbohydrate metabolism. Further analysis identified 11,153 potential SSRs. A set of 5122 primer pairs was designed, and 82 of 300 randomly selected primer pairs produced reproducible amplicons that were polymorphic among 22 cotton accessions from G. hirsutum, Gossypium barbadense and Gossypium arboreum. UPGMA clustering analysis further confirmed the high quality and effectiveness of these novel SSR markers. The present study provides insights into the transcriptome profile of the cotton anther and establishes a public information platform for functional genomics and molecular breeding. © 2014 Published by Elsevier B.V.

1. Introduction

Cotton (Gossypium spp.) is one of the most economically important crops because its fiber is the principal natural raw material for the textile industry worldwide. The Gossypium genus contains 5 tetraploid (AD1 to AD5, 2n = 4×) and over 45 diploid (2n = 2×) species (where n is the number of chromosomes in the gamete of an individual). There are four cultivated cotton species: two diploids from Africa–Asia, Gossypium herbaceum L. (Gher, A1 genome) and Gossypium arboreum L. (Ga, A2 genome), and two tetraploids from the Americas, Gossypium hirsutum L. (Gh, AD1 genome) and Gossypium barbadense L. (Gb, AD2 genome). At present, G. hirsutum is the most widely cultivated cotton species, accounting for more than 95% of world cotton production (National Cotton Council, 2012, http://www.cotton.org/econ/cropinfo/index.cfm). Apart from its economic value, cotton is also an excellent model system for studying polyploidization, cell elongation and cell wall biosynthesis (Al-Ghazi et al., 2009; Paterson et al., 2012; Qin and Zhu, 2011). Recently, a number of genome resources have been developed for the genus Gossypium, including high-density genetic linkage maps of tetraploid cotton (Guo et al., 2008; Yu et al., 2011). Notably, a draft genome of the diploid cotton G. raimondii, a putative D-genome parent, was produced using a whole-genome shotgun strategy (K. Wang et al., 2012). However, more genome-wide information about tetraploid cotton is still urgently needed.

The development of functional pollen and its release at the appropriate stage to maximize pollination and fertilization are critical for plant reproduction, the creation of genetic diversity and biological productivity. These processes require cooperative interactions between gametophytic and sporophytic tissues within the anther (Wilson and Zhang, 2009; Zhang and Wilson, 2009). Based on molecular studies, large numbers of genes related to pollen and anther development have been identified, and the anther and pollen development pathway has been elucidated, especially in Arabidopsis and rice (Wilson and Zhang, 2009; Wilson et al., 2011). Stamen initiation is controlled by the homeotic genes APETALA3 (AP3), PISTILLATA (PI), and AGAMOUS

Abbreviations: BLAST, Basic Local Alignment Search Tool; KEGG, Kyoto Encyclopedia of Genes and Genomes database; COG, Clusters of Orthologous Groups; GO, gene ontology; PCR, Polymerase Chain Reaction; SSR, Simple Sequence Repeat; TF, transcription factor; EST, Expressed Sequence Tag; UPGMA, Unweighted Pair Group Method with Arithmetic Mean.
⁎ Corresponding authors. E-mail addresses: [email protected] (Y. Yuan), [email protected] (Z. Liu).

http://dx.doi.org/10.1016/j.gene.2014.08.058 0378-1119/© 2014 Published by Elsevier B.V.

Please cite this article as: Zhang, X., et al., Characterization of the global transcriptome for cotton (Gossypium hirsutum L.) anther and development of SSR marker, Gene (2014), http://dx.doi.org/10.1016/j.gene.2014.08.058

2. Materials and Methods

2.1. Plant Materials

Plants of the upland cotton genetic standard line TM-1 (G. hirsutum L.) were grown in an experimental field under standard field conditions in 2012. Anthers at the stages of sporogenous cell division (bud length < 3.0 mm), pollen mother cell meiosis (bud length = 3.1–4.5 mm), uninucleate microspore (bud length = 4.6–12 mm) and mature pollen (bud length > 12 mm) were harvested according to Deng et al. (2010), frozen immediately in liquid nitrogen, and stored at −80 °C until use.

2.2. RNA Isolation and Sequencing

For Illumina sequencing, total RNA of each sample was isolated using Trizol (Invitrogen, Carlsbad, CA) and further purified with the RNeasy Plant Mini Kit (Qiagen, Valencia, CA). RNA quality was verified using a 2100 Bioanalyzer RNA Nanochip (Agilent, Santa Clara, CA), and all samples had an RNA Integrity Number (RIN) above 8.5. RNA was then quantified using a NanoDrop ND-1000 Spectrophotometer (Nano-Drop, Wilmington, DE). RNA from the four samples was mixed in equal amounts, and Illumina sequencing on the Solexa platform was performed according to the manufacturer's instructions (Illumina, San Diego, CA) at Shanghai Quanmai Bio-Technology Co., Ltd. The raw sequencing data were then stringently filtered, with a quality threshold of 20 (error rate 1%) to remove low-quality data and a length threshold of 35 bp to rule out ambiguous N-containing sequences. This yielded 47.2 million sequencing reads of 82-mer length. The sequencing data are deposited in the NCBI Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/Traces/sra) under accession number SRP041153.

2.3. De Novo Assembly and Analysis of Illumina Reads

The reads were assembled with SOAPdenovo (Li et al., 2010); they were first combined to form longer fragments, i.e., contigs. EST sequences were downloaded from the cotton database (http://gossypium.info/, cotton46a), and further mapping was performed using Bowtie and Velvet to obtain non-redundant unigenes that were as long as possible. The resulting unigenes were annotated by BLASTX (V2.2.14) (Altschul et al., 1997) searches against protein databases, in the priority order NR (non-redundant protein sequences in NCBI), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes database (KEGG) (V56.0, Oct. 1, 2010) (Kanehisa et al., 2008), and COG, with an E-value cutoff of ≤ 1e−3. Based on the protein database annotation, Blast2GO (Conesa et al., 2005) was employed to assign Gene Ontology (GO) terms to the unigenes, and WEGO software (Ye et al., 2006) was used to perform the GO functional classification for all unigenes. Further BLASTx searches were performed against the Arabidopsis thaliana transcription factor database (PlantTFDB), the Cell Wall Navigator (CWN) protein database and the MAIZEWALL database, with the expectation (E)-value cutoff set at 1e−5.
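As a rough illustration of the read-filtering criteria stated in Section 2.2, the sketch below keeps a read only if it is at least 35 bp long, contains no ambiguous N bases, and has a mean Phred quality of at least 20. The text does not say whether the quality threshold was applied per base, per window, or as a mean, so the mean-quality rule, the Phred+33 encoding and all function names here are assumptions made for illustration only.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(phred):
    """Convert a Phred score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def passes_filter(sequence, quality_string, min_quality=20, min_length=35):
    """Return True if the read meets the length, N-content and quality criteria.

    Assumption: the quality criterion is applied to the mean Phred score.
    """
    if len(sequence) < min_length:
        return False
    if "N" in sequence.upper():
        return False
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores) >= min_quality
```

A Phred score of 20 corresponds to an error probability of 0.01, which matches the "error rate 1%" quoted for the quality threshold.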

2.4. SSR Mining and Confirmation

MISA (http://pgrc.ipk-gatersleben.de/misa/), a Perl script, was used to identify microsatellites (simple sequence repeats, SSRs) in the unigenes identified in this study. The SSR search parameters were defined as follows: motif sizes of two to six nucleotides, with a minimum of six repeat units for dinucleotides and five for trinucleotides to hexanucleotides. Primer Premier 6.0 (PREMIER Biosoft International, Palo Alto, CA) was used to design PCR primers in the flanking regions of the SSRs. The primer design criteria were as follows: primer length of 18–24 bp, GC content of 40–65%, and melting temperature of 50–65 °C. The expected product size was between 100 bp and 350 bp, with no secondary structures. Redundancy analysis of the primers for all SSRs obtained in this study was carried out using SSRD software (W. Wang et al., 2012) by comparison with primers from CMD (http://www.cottonmarker.org/).

Among all the non-redundant designed primers, 300 primer pairs were randomly selected to evaluate their applicability and polymorphism in 22 cotton accessions: TM-1, CCRI36, zhong221, zhongR014121, xinluzao24, CCRI19, CCRI35, Acala3080, NM970513, 7235, TAM91D-3, CCRI60, 0–153, 9708, 177 and 48 from G. hirsutum; hai1, 7124, P62ne10, Pima_S6 and 3–79 from G. barbadense; and shixiya1 from G. arboreum.

For SSR confirmation by PCR, genomic DNA was extracted from 1 g of young cotton leaf according to a previously described method (Paterson et al., 1993). Each 10 μL PCR reaction mixture contained 1 μL 10× PCR buffer, 0.5 μL 10 mM dNTP, 0.5 μL forward primer (10 μM), 0.5 μL reverse primer (10 μM), 1.2 μL DNA (30 ng/μL), 0.1 μL Taq DNA polymerase (5 U/μL) and 6.2 μL ddH2O. The amplification program consisted of initial denaturation at 94 °C for 5 min, followed by 30 cycles of 94 °C (30 s)/58 °C (45 s)/72 °C (45 s) and a final extension at 72 °C for 2 min.
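The SSR search criteria given at the start of this section (motifs of two to six nucleotides, at least six repeats for dinucleotides and five for longer motifs) can be sketched with regular-expression backreferences. This is not the MISA script itself, only a minimal stand-in that applies the same thresholds to perfect repeats; the function name and the choice to skip motifs that are multiples of a shorter unit are assumptions.

```python
import re

# Minimum number of repeat units per motif length, following Section 2.4:
# six for dinucleotides, five for tri- to hexanucleotides.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence):
    """Return (start, motif, repeat_count) tuples for perfect SSRs (0-based start)."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in sorted(MIN_REPEATS.items()):
        # ([ACGT]{k}) captures one motif; \1{n,} requires at least n further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example "ATAT" or "AA"), so each SSR is reported once.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


# Toy sequence (not from the study): (AT)6 at position 3 and (TTC)5 at position 18.
toy = "GGC" + "AT" * 6 + "CCG" + "TTC" * 5 + "A"
toy_hits = find_ssrs(toy)
```

On the toy sequence, the sketch reports one dinucleotide and one trinucleotide SSR; a run of only five AT units falls below the dinucleotide threshold and is ignored.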
Each PCR product was analyzed by 8% native polyacrylamide gel electrophoresis (PAGE), and SSR marker analysis was performed according to previous reports (Zhang et al., 2000, 2002). Similarity and clustering analysis of the 22 cotton accessions was performed using NTSYS-pc 2.20 with UPGMA (Unweighted Pair Group Method with Arithmetic Mean) (Rohlf, 2005), and the final tree was drawn with MEGA4 software (Tamura et al., 2007). The number of alleles (Na), expected

(AG), with the primordia forming as a tetrad of archesporial cells. AG then induces microsporogenesis via activation of NOZZLE/SPOROCYTELESS (NZZ/SPL) (Ito et al., 2004), and regulates stamen development at least in part by controlling the transcription of DEFECTIVE IN ANTHER DEHISCENCE1, which encodes a catalytic enzyme in the biosynthesis of the lipid-derived phytohormone jasmonic acid (JA) (Ito et al., 2007). The transcription factors JAGGED (JAG) and NUBBIN (NUB) are also involved in defining stamen structure (Dinneny et al., 2006). A transcriptome analysis in rice identified, among hundreds of transcripts, several that encode potential proteins for lipid exine synthesis during early anther development (Huang et al., 2009). A recent study of cotton (G. hirsutum) anthers from the wild type (WT) and a genetic male sterility (GMS) mutant (in the WT background) at the meiosis, tetrad, and uninucleate microspore stages, using the digital gene expression (DGE) method, identified many genes specific to anther development (Wei et al., 2013). The generation of ESTs from shoot apexes, squares, and flowers in upland cotton (G. hirsutum) provides a valuable foundation for gene expression profiling, functional analysis of newly discovered genes, genetic linkage analysis, and quantitative trait locus analysis (Lai et al., 2011).

Although the basic mechanisms of pollen and anther development can be cross-referenced between species, each species has its own peculiarities. Furthermore, most of these studies were carried out on self-pollinated and cross-pollinated plants. Pollen and anther development in upland cotton, an often cross-pollinated crop, may differ somewhat from that in these other species. In the present study, we generated a transcriptome profile of anthers from the upland cotton genetic standard line TM-1 (G. hirsutum L.) and identified transcription factor genes and genes involved in cell wall formation. A total of 4376 potential SSRs were characterized, and PCR verification confirmed significant polymorphism at 82 SSR loci. The objective of the present study is to gain a closer understanding of the molecular mechanism of cotton anther development, and to establish a sound foundation for functional genomics, comparative genomics, molecular breeding and new gene cloning in cotton based on high-throughput sequencing.


X. Zhang et al. / Gene xxx (2014) xxx–xxx


3. Results and Discussion

3.1. Sequencing, De Novo Assembly and Characterization of the Unigenes

We generated 47.2 million paired-end reads with an average length of 82.87 bp from the developing anthers of the upland cotton genetic standard line TM-1, encompassing 11 Gb of sequence data with a valid data ratio of 87.43%. The GC content was 45.64% (Fig. 1). Assembling these reads produced 210,965 unigenes longer than 100 bp, 19,090 unigenes longer than 500 bp, and 3873 unigenes longer than 1000 bp based on single mapping to the cotton EST database (http://gossypium.info/) (Fig. 2), and the final sequence data totaled 59 Mb. The N50 value for these unigenes was 285 bp. Only unigenes longer than 100 bp were analyzed further (Table S1). BLAST analysis showed that the proportion of valid reads was 74.73% (35,307,445) and the proportion of valid unigenes was 99.96% (210,885). The majority of the unigenes (90.38%) were 101–500 bp in length, and 3867 unigenes (1.83% of all unigenes) were longer than 1 kb (Table S2). Among the 210,965 unigenes, sequence similarity was determined using BLASTx against the NCBI non-redundant (NR), SWISS-PROT, CDD, PFAM and TrEMBL databases with a similarity of more than 30% and an E-value cutoff of 1e−3. The annotation ratios of these unigenes in the NR, SWISS-PROT, CDD, PFAM and TrEMBL databases were 44%, 29%, 25%, 36% and 51%, respectively (Table S3). Venn diagram analysis of the BLASTx results showed that 38,972 unigenes were shared among NR, CDD and SWISS-PROT. These results provide a sequence basis for future studies such as gene cloning and transgenic studies in cotton.

3.2. Functional Classification of the Cotton Unigenes

Of the 210,885 valid unigenes, only 31,119 could be assigned in KEGG and only 10,589 in COG (Table S4), while 84,644 unigenes were assigned at least one GO term (Table S5). The top 5 pathways in the KEGG analysis were plant hormone signal transduction (ko04075), protein processing in endoplasmic reticulum (ko04141), ribosome (ko03010), RNA transport (ko03013), and starch and sucrose metabolism (ko00500). The COG analysis revealed that the top 10 functional categories comprised posttranslational activity (12.50%), translation, ribosomal structure and biogenesis (11.55%), general prediction function (10.74%), amino acid transport and metabolism (7.00%), transcription (6.05%), energy production and conversion (6.02%), DNA replication, recombination, and repair (5.95%), carbohydrate transport and metabolism (4.91%), cell division and chromosome partitioning (4.10%), and unknown function (4.02%) (Fig. 3). The GO analysis showed that the significantly enriched functional groups are related to transporter, primary metabolism such as protein and carbohydrate metabolism, and transcription (Fig. 4). These data indicate that, in the developing anthers, transcription, translation, and post-translational processes as well as hormone signal transduction are active, and that these activities require an abundant supply of amino acids, carbohydrates and energy, which in turn supports active cell division and related molecular events that facilitate anther development.

3.3. Transcription Factor Genes

Transcription factors (TFs) are key regulators at the transcriptional level in biological processes (Riechmann et al., 2000). All isogenes of G. hirsutum were searched by BLASTx against PlantTFDB to identify putative transcription factors expressed in the anther of G. hirsutum. A total of 6708 isogenes had matches in PlantTFDB (Table S5); these fell into 54 families, representing 84.4% of the 64 TF families in the A. thaliana genome. The most abundant TF family was WRKY, representing 13.9% of the 6708 isogenes, followed by bHLH (6.9%), C3H (6.8%), NAC (6.5%), MYB (5.1%) and C2H2 (5.1%). Similar to the TFs identified in ESTs from six developing xylem libraries in radiata pine (Li et al., 2009) and from leaves of sabaigrass (Eulaliopsis binata) (Zou et al., 2013), a high abundance of WRKY, C3H, C2H2, and MYB was also observed in G. hirsutum. Recent studies revealed that NAC, MYB, AUX/IAA and HD are important in regulating secondary wall biosynthesis (Du and Groover, 2010; Groover and Robischon, 2006; Zhong and Ye, 2007). In this study, 436 and 345 isogenes matched NAC and MYB TFs, respectively, some of which may act as master switches regulating secondary wall biosynthesis. The TFs identified in our isogene data could be important resources for studying the regulation of cell wall biosynthesis.

3.4. Genes Putatively Related to Cell Wall Formation

Plant cell wall formation is a very important process during cell division and anther development, and it has been well illustrated that many genes involved in exine formation are vital to anther development

(He) heterozygosities and observed (Ho) heterozygosities were calculated using POPGEN1.32 (Yeh and Boyle, 1997).
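For readers unfamiliar with these statistics, the sketch below shows one common textbook way to compute them for a single locus from diploid genotype calls: Ho is the fraction of heterozygous individuals, and He is 1 − Σ p_i² over the allele frequencies p_i. This is not the POPGEN1.32 implementation (which may apply a sample-size correction); the function name and the example genotypes are hypothetical.

```python
from collections import Counter


def locus_statistics(genotypes):
    """Compute (Na, Ho, He) for one locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    He is the uncorrected expected heterozygosity, 1 - sum(p_i^2).
    """
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = sum(allele_counts.values())
    na = len(allele_counts)
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    he = 1.0 - sum((n / total_alleles) ** 2 for n in allele_counts.values())
    return na, ho, he


# Hypothetical genotypes (amplicon sizes in bp) for four accessions.
example = [(150, 150), (150, 156), (156, 156), (150, 162)]
na, ho, he = locus_statistics(example)
```

In this toy case three alleles are observed, two of the four individuals are heterozygous, and He follows from allele frequencies of 4/8, 3/8 and 1/8.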


Fig. 1. GC content of transcriptome sequence. The analysis on sequencing data showed that GC content in cotton is about 45.64%.


Fig. 2. Assessment of unigene assembly quality. Assembling all reads from cotton EST sequencing produced 210,965 unigenes of more than 100 bp, 19,090 unigenes of more than 500 bp, and 3873 unigenes of more than 1000 bp based on the single mapping to the cotton EST database (http://gossypium.info/).
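The N50 value quoted in Section 3.1 (285 bp) summarizes assembly contiguity: it is the length L such that unigenes of length L or longer together contain at least half of all assembled bases. A minimal sketch of that calculation is given below; the function name and the example lengths are illustrative and are not data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length of
    the sequence at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical unigene lengths in bp.
example_lengths = [1200, 800, 500, 300, 285, 250, 150, 120, 101]
example_n50 = n50(example_lengths)
```

Because a few long sequences can account for half of the bases, N50 can sit well above the median length, as it does in this toy example.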


(Chen et al., 2011; Liu et al., 2014). Therefore, understanding cell wall-related genes is valuable for research on cotton anther development. In the present study, BLASTx searches revealed 722 and 4614 isogene matches in the MAIZEWALL and Cell Wall Navigator (CWN) databases, respectively (Table S7), which is largely consistent with the result from sabaigrass (E. binata) (986 and 9279 isogenes) (Zou et al., 2013) and opposite to that in pine (1070 and 405 ESTs) (Li et al., 2009) and flax (110 and 50 ESTs) (Long et al., 2012).

MAIZEWALL is a bioinformatics analysis platform for cell wall biosynthesis in maize. Among the 722 isogenes identified in MAIZEWALL,

Fig. 3. COG classification on all unigenes. The COG analysis on all unigenes revealed that the top 10 functional categories comprise posttranslational activity (12.50%), translation, ribosomal structure and biogenesis (11.55%), general prediction function (10.74%), amino acid transport and metabolism (7.00%), transcription (6.05%), energy production and conversion (6.02%), DNA replication, recombination, and repair (5.95%), carbohydrate transport and metabolism (4.91%), cell division and chromosome partitioning (4.10%), and unknown function (4.02%).


Fig. 4. GO analysis of unigenes from cotton EST sequences. The GO analysis showed that the significantly enriched functional groups are related to transporter, primary metabolism such as protein and carbohydrate metabolism, and transcription.

the top 7 most abundant BLASTx matches were expansin (29 isogenes), callose synthase (22 isogenes), glucosidase (18 isogenes), cellulose synthase (15 isogenes), glycosyltransferase (14 isogenes), sucrose synthase (14 isogenes) and chitinase (12 isogenes). The present transcriptome results reveal the importance of expansin, callose synthase and cellulose synthase in cell wall metabolism and fiber development, which is consistent with the findings in sabaigrass (E. binata) (Zou et al., 2013).

Cell Wall Navigator is an integrated database and mining tool widely used for protein families involved in plant cell wall metabolism. The 4614 isogene matches in the CWN database were classified into 35 families, representing all families in the database (Table S7). GH17, LRX, NSI, PME and CSL were the 5 most abundant families, followed by EXP, GH28, BGAL, GH9, GMP, PL1 and other families. GH17 family proteins function as glycosyl hydrolases, and LRX mainly consists of extensin-like proteins, leucine-rich repeat proteins and transposon proteins. These data suggest that glycosyl hydrolysis and extensins are very important in cotton anther development.

3.5. Development of SSR Markers

To develop new molecular markers for cotton, all of the 210,965 unigenes generated in this study were used to mine potential microsatellites, defined as mononucleotide to hexanucleotide SSRs with a minimum of five repetitions for all motifs. Using the MISA Perl script, a total of 11,153 potential SSRs were identified in all unigenes (Table S7). Of these, 4376 had motifs of two or more nucleotides; 879 sequences contained more than one SSR, and 1794 SSRs were present in compound form (Table S9). Considering that approximately 59,000 kb was analyzed, we detected a frequency of at least one SSR per 13.48 kb (59,000 kb / 4376 SSRs) in the expressed fraction of the G. hirsutum genome.

Table 1 Counting of SSR motifs.

SSR motif                 Number    Ratio
Dinucleotide (p2)         1520      0.34735
Trinucleotide (p3)        1978      0.45201
Tetranucleotide (p4)      166       0.03793
Pentanucleotide (p5)      44        0.01005
Hexanucleotide (p6)       79        0.01805
Compound SSR (c)          589       0.1346

Table 2 The 22 cotton accessions used in SSR marker testing.

Species             Accessions
G. hirsutum L.      TM-1, CCRI36, zhong221, zhongR014121, xinluzao24, CCRI19, CCRI35, Acala3080, NM970513, 7235, TAM91D-3, CCRI60, 0–153, 9708, 177, 48
G. barbadense L.    hai1, 7124, P62ne10, Pima_S6, 3–79
G. arboreum L.      shixiya1
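The ratios in Table 1 and the SSR density reported in Section 3.5 follow directly from the motif counts, as the short check below illustrates. The counts are copied from Table 1 and the 59,000 kb figure from the text; the variable and function names are only illustrative.

```python
# Counts per motif class, copied from Table 1.
MOTIF_COUNTS = {
    "dinucleotide": 1520,
    "trinucleotide": 1978,
    "tetranucleotide": 166,
    "pentanucleotide": 44,
    "hexanucleotide": 79,
    "compound": 589,
}

# Approximate total length of the analyzed unigenes, as stated in the text.
ANALYZED_KB = 59_000


def motif_ratios(counts):
    """Return each motif class as a fraction of all SSRs counted."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total_ssrs = sum(MOTIF_COUNTS.values())   # sums to the 4376 SSRs in the text
ratios = motif_ratios(MOTIF_COUNTS)       # fractions as listed in Table 1
kb_per_ssr = ANALYZED_KB / total_ssrs     # average spacing between SSRs in kb
```

Note that the density is computed against the 4376 SSRs with motifs of two or more nucleotides, not against all 11,153 potential SSRs.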


The 4376 SSRs comprised six types of dinucleotide SSRs, thirty types of trinucleotide SSRs, and 41 types of tetranucleotide SSRs. Among all the SSRs, trinucleotides were the most common type, accounting for 45.2%. The second most common type was dinucleotide, accounting for 34.73%, and the third was compound SSRs, accounting for 13.46%. Tetranucleotides, pentanucleotides, and hexanucleotides were not common (Tables 1, S8). Of the trinucleotide repeats in the SSRs of G. hirsutum, TTC/GAA was the most common, accounting for 13.25%, and the most common number of repeats was 15. The other major trinucleotide repeats included CTT/AAG, TGA/TCA, AGA/TCT, ATC/GAT, CAT/ATG, and ATT/AAT, each accounting for 5% to 9%. The most common dinucleotide repeat was AT/AT, with 29.6% of all dinucleotide repeats found in G. hirsutum unigenes, followed by TA/TA with 22.3%, AG/CT with 18% and TC/GA with 17.2% (Tables S8, S9).

Altogether, 5122 primer pairs for the SSRs found in the present study were designed (Table S8). Redundancy analysis using SSRD software showed that 5019 SSR primer pairs from our EST transcriptome data were non-redundant after comparison with 17,449 SSR primers from CMD (http://www.cottonmarker.org/).

It has been widely reported that SSRs with trinucleotide repeats, which constitute 30% to 78% of plant SSR motifs (Varshney et al., 2005), are generally more robust because they tend to give fewer “stutter bands” than those based on dinucleotide and other repeats (Hearne et al., 1992; Yates et al., 2012). The results from G. hirsutum are in agreement with these earlier studies. In this work, of the 4376 SSRs, 1978 (45.2%) were trinucleotide repeats. The present data showed that TTC/GAA was the most common trinucleotide repeat, accounting for 13.25%, which coincides with the findings of An et al. (2009) but differs from another study that identified GCC/CCG as the most common trinucleotide repeat (Xie et al., 2013).

3.6. SSR Confirmation by PCR

Among the 5122 primer pairs, 300 (Table S10) were randomly selected to evaluate their applicability across 22 cotton accessions from G. hirsutum, G. barbadense and G. arboreum (Table 2). A total of 143 (47.7%) primer pairs gave successful PCR amplification with clear bands, and 540 allelic loci, including 155 polymorphic loci, and 82 polymorphic SSR markers were detected. The number of alleles per locus ranged from 1 to 15, with an average of 1.9 per primer pair. The HNAU185 locus had the most alleles (15) (Table S11). Statistical analysis identified 27 SSRs with the highest gene diversity in the cotton plants tested. Similarity analysis across the 22 cotton accessions revealed that 44.59% of the pairwise similarity coefficients were above 0.800 and 10.39% were between 0.700 and 0.800, which means that 54.98% of the pairwise comparisons showed high genetic similarity. About 0.87% were between 0.600 and 0.700, and 43.72% were below 0.400 (Table 3). Further clustering of the SSR verification data using UPGMA showed that the genetic similarity ranged from 0.19 to 0.98. The 22 cotton accessions were divided into three classes at a genetic similarity coefficient of 0.19: Class I includes shixiya1, Class II includes hai1, 7124, P62ne10, Pima_S6 and 3–79, and the other 16 belong to Class III (Fig. 5), which is highly consistent with the classic taxonomy (Table 2).

Table 3 Distribution of pair similarity coefficient.

Range             Ratio (%)    Accumulated percentage (%)
(0.800, 1)        44.59        44.59
(0.700, 0.800)    10.39        54.98
(0.600, 0.700)    0.87         55.85
(0.500, 0.600)    0.43         56.28
(0.400, 0.500)    0.00         56.28
(0, 0.400)        43.72        100.00

SSRs have become important molecular markers for applications in genome mapping and characterization, phenotype mapping, marker-assisted selection of crop plants and a range of molecular ecology and diversity studies (Ellis and Burke, 2007). SSRs based on genome sequencing were further developed. Zou et al. isolated 6681 SSRs from


Fig. 5. UPGMA clustering analysis of 22 cotton accessions using SSR primers. The UPGMA clustering of SSR verification data showed that the genetic similarity ranged from 0.19 to 0.98. The 22 cotton accessions were divided into three classes at a genetic similarity coefficient of 0.19: Class I includes shixiya1, Class II includes hai1, 7124, P62ne10, Pima_S6 and 3–79, and the other 16 belong to Class III.
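To make the clustering behind Fig. 5 more concrete, the sketch below implements a bare-bones UPGMA on pairwise similarity coefficients: at each step the two closest clusters are merged, and distances to the new cluster are the size-weighted mean of the distances to its two parts. This is a generic illustration, not the NTSYS-pc procedure used in the study; the accession labels and similarity values are hypothetical, and ties are broken by iteration order.

```python
from itertools import combinations


def upgma(similarity):
    """Cluster items by UPGMA.

    similarity: dict mapping frozenset({a, b}) -> similarity in [0, 1].
    Returns a list of merges (cluster_a, cluster_b, similarity_at_merge),
    where each cluster is a tuple of item names.
    """
    items = sorted({name for pair in similarity for name in pair})
    clusters = [(name,) for name in items]
    # Work with distances (1 - similarity) between clusters.
    dist = {}
    for pair, s in similarity.items():
        a, b = tuple(pair)
        dist[frozenset(((a,), (b,)))] = 1.0 - s
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        merged = a + b
        merges.append((a, b, 1.0 - dist[frozenset((a, b))]))
        for c in clusters:
            if c not in (a, b):
                # Size-weighted mean equals the mean over all member pairs.
                dist[frozenset((merged, c))] = (
                    len(a) * dist[frozenset((a, c))]
                    + len(b) * dist[frozenset((b, c))]
                ) / (len(a) + len(b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return merges


# Hypothetical similarity coefficients for four made-up accessions.
toy_similarity = {
    frozenset(("Gh1", "Gh2")): 0.875,
    frozenset(("Gh1", "Gb1")): 0.5,
    frozenset(("Gh2", "Gb1")): 0.5,
    frozenset(("Gh1", "Ga1")): 0.25,
    frozenset(("Gh2", "Ga1")): 0.25,
    frozenset(("Gb1", "Ga1")): 0.25,
}
toy_merges = upgma(toy_similarity)
```

In the toy data the two most similar items join first and the most divergent item joins last, which mirrors how the single G. arboreum accession separates from the tetraploid accessions at the lowest similarity level in Fig. 5.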

Please cite this article as: Zhang, X., et al., Characterization of the global transcriptome for cotton (Gossypium hirsutum L.) anther and development of SSR marker, Gene (2014), http://dx.doi.org/10.1016/j.gene.2014.08.058


Competing interests

392 393 394 395 396 397 398 399 400 401

404

C

390 391

The authors declare that they have no competing interests.

E

388 389

Acknowledgments

406 407

410 411

This work was supported by the National High Technology Research and Development Program of China (2012AA101108), Hunan Provincial Natural Science Foundation of China (13JJ8024), the National Natural Science Foundation of China (Nos. 30900909 and 31000125), and the Youth Foundation from Department of Education of Hunan Province (No. 11B060).

412

References

413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436

Al-Ghazi, Y., Bourot, S., Arioli, T., Dennis, E.S., Llewellyn, D.J., 2009. Transcript profiling during fiber development identifies pathways in secondary metabolism and cell wall structure that may contribute to cotton fiber quality. Plant Cell Physiol. 50, 1364–1381. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402. An, Z.W., Zhao, Y.H., Cheng, H., 2009. Development and application of EST-SSR markers in Hevea brasiliensis Muell. Arg. Yi Chuan 31, 311–319. Chen, W., Yu, X.-H., Zhang, K., Shi, J., De Oliveira, S., Schreiber, L., Shanklin, J., Zhang, D., 2011. Male Sterile2 encodes a plastid-localized fatty acyl carrier protein reductase required for pollen exine development in Arabidopsis. Plant Physiol. 157, 842–853. Conesa, A., Gotz, S., Garcia-Gomez, J.M., Terol, J., Talon, M., Robles, M., 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676. Deng, J., Xiong, G., Yuan, X., Jia, F., Liu, Z., 2010. Differences in SOD, POD, CAT activities and MDA content and their responses to high temperature stress at peak flowering stage in cotton lines with different tolerance to high temperature. Cotton Sci. 22, 6. Dinneny, J.R., Weigel, D., Yanofsky, M.F., 2006. NUBBIN and JAGGED define stamen and carpel shape in Arabidopsis. Development 133, 1645–1655. Du, J., Groover, A., 2010. Transcriptional regulation of secondary growth and wood formation. J. Integr. Plant Biol. 52, 17–27. Ellis, J.R., Burke, J.M., 2007. EST-SSRs as a resource for population genetic analyses. Heredity (Edinb) 99, 125–132.


4. Conclusions

Transcriptome analysis of different tissues and organs provides the basis for functional genomics research. In the present work, cDNA libraries were constructed from G. hirsutum anthers and sequenced, and 210,965 unigenes were obtained. A systematic analysis was then carried out, including BLAST searches against nucleotide and protein databases and KEGG, COG, and GO analyses, which characterized specific transcription factors and cell wall-related genes involved in anther development. Based on the transcriptome sequences, a total of 11,153 potential SSRs were identified and 5019 non-redundant SSR primer pairs were developed. PCR verification with 300 SSR primer pairs across 22 cotton accessions from G. hirsutum, G. barbadense and G. arboreum confirmed that 82 SSR loci were polymorphic, suggesting that the resources generated in the present study are potentially valuable for novel gene discovery, phenotype improvement and other research in cotton.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2014.08.058.

Groover, A., Robischon, M., 2006. Developmental mechanisms regulating secondary growth in woody plants. Curr. Opin. Plant Biol. 9, 55–58.
Guo, W., Cai, C., Wang, C., Zhao, L., Wang, L., Zhang, T., 2008. A preliminary analysis of genome structure and composition in Gossypium hirsutum. BMC Genomics 9, 314.
Hearne, C.M., Ghosh, S., Todd, J.A., 1992. Microsatellites for linkage analysis of genetic traits. Trends Genet. 8, 288–294.
Huang, M.D., Wei, F.J., Wu, C.C., Hsing, Y.I., Huang, A.H., 2009. Analyses of advanced rice anther transcriptomes reveal global tapetum secretory functions and potential proteins for lipid exine formation. Plant Physiol. 149, 694–707.
Ito, T., Wellmer, F., Yu, H., Das, P., Ito, N., Alves-Ferreira, M., Riechmann, J.L., Meyerowitz, E.M., 2004. The homeotic protein AGAMOUS controls microsporogenesis by regulation of SPOROCYTELESS. Nature 430, 356–360.
Ito, T., Ng, K.H., Lim, T.S., Yu, H., Meyerowitz, E.M., 2007. The homeotic protein AGAMOUS controls late stamen development by regulating a jasmonate biosynthetic gene in Arabidopsis. Plant Cell 19, 3516–3529.
Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T., Yamanishi, Y., 2008. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 36, D480–D484.
Lai, D., Li, H., Fan, S., Song, M., Pang, C., Wei, H., Liu, J., Wu, D., Gong, W., Yu, S., 2011. Generation of ESTs for flowering gene discovery and SSR marker development in upland cotton. PLoS One 6, e28676.
Li, X., Wu, H.X., Dillon, S.K., Southerton, S.G., 2009. Generation and analysis of expressed sequence tags from six developing xylem libraries in Pinus radiata D. Don. BMC Genomics 10, 41.
Li, R., Zhu, H., Ruan, J., Qian, W., Fang, X., Shi, Z., Li, Y., Li, S., Shan, G., Kristiansen, K., Yang, H., Wang, J., 2010. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272.
Liu, M.-C., Yang, C.-S., Yeh, F.-L., Wei, C.-H., Jane, W.-N., Chung, M.-C., Wang, C.-S., 2014. A novel lily anther-specific gene encodes adhesin-like proteins associated with exine formation during anther development. J. Exp. Bot.
Long, S.H., Deng, X., Wang, Y.F., Li, X., Qiao, R.Q., Qiu, C.S., Guo, Y., Hao, D.M., Jia, W.Q., Chen, X.B., 2012. Analysis of 2,297 expressed sequence tags (ESTs) from a cDNA library of flax (Linum usitatissimum L.) bark tissue. Mol. Biol. Rep. 39, 6289–6296.
Paterson, A., Brubaker, C., Wendel, J., 1993. A rapid method for extraction of cotton (Gossypium spp.) genomic DNA suitable for RFLP or PCR analysis. Plant Mol. Biol. Report. 6.
Paterson, A.H., Wendel, J.F., Gundlach, H., Guo, H., Jenkins, J., Jin, D., Llewellyn, D., Showmaker, K.C., Shu, S., Udall, J., Yoo, M.J., Byers, R., Chen, W., Doron-Faigenboim, A., Duke, M.V., Gong, L., Grimwood, J., Grover, C., Grupp, K., Hu, G., Lee, T.H., Li, J., Lin, L., Liu, T., Marler, B.S., Page, J.T., Roberts, A.W., Romanel, E., Sanders, W.S., Szadkowski, E., Tan, X., Tang, H., Xu, C., Wang, J., Wang, Z., Zhang, D., Zhang, L., Ashrafi, H., Bedon, F., Bowers, J.E., Brubaker, C.L., Chee, P.W., Das, S., Gingle, A.R., Haigler, C.H., Harker, D., Hoffmann, L.V., Hovav, R., Jones, D.C., Lemke, C., Mansoor, S., ur Rahman, M., Rainville, L.N., Rambani, A., Reddy, U.K., Rong, J.K., Saranga, Y., Scheffler, B.E., Scheffler, J.A., Stelly, D.M., Triplett, B.A., Van Deynze, A., Vaslin, M.F., Waghmare, V.N., Walford, S.A., Wright, R.J., Zaki, E.A., Zhang, T., Dennis, E.S., Mayer, K.F., Peterson, D.G., Rokhsar, D.S., Wang, X., Schmutz, J., 2012. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427.
Qin, Y.M., Zhu, Y.X., 2011. How cotton fibers elongate: a tale of linear cell-growth mode. Curr. Opin. Plant Biol. 14, 106–111.
Riechmann, J.L., Heard, J., Martin, G., Reuber, L., Jiang, C., Keddie, J., Adam, L., Pineda, O., Ratcliffe, O.J., Samaha, R.R., Creelman, R., Pilgrim, M., Broun, P., Zhang, J.Z., Ghandehari, D., Sherman, B.K., Yu, G., 2000. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290, 2105–2110.
Rohlf, F., 2005. NTSYS-pc: numerical taxonomy and multivariate analysis system. Applied Biostatistics (New York).
Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 4.
Varshney, R.K., Graner, A., Sorrells, M.E., 2005. Genic microsatellite markers in plants: features and applications. Trends Biotechnol. 23, 48–55.
Wang, K., Wang, Z., Li, F., Ye, W., Wang, J., Song, G., Yue, Z., Cong, L., Shang, H., Zhu, S., Zou, C., Li, Q., Yuan, Y., Lu, C., Wei, H., Gou, C., Zheng, Z., Yin, Y., Zhang, X., Liu, K., Wang, B., Song, C., Shi, N., Kohel, R.J., Percy, R.G., Yu, J.Z., Zhu, Y.X., Yu, S., 2012a. The draft genome of a diploid cotton Gossypium raimondii. Nat. Genet. 44, 1098–1103.
Wang, W., Wang, C., Liu, F., Chen, H., Wang, L., Wang, C., Zhang, X., Wang, Y., Wang, K., 2012b. Development and evaluation of new non-redundant EST-SSR markers from Gossypium. Acta Agron. Sin. 38, 9.
Wei, M., Song, M., Fan, S., Yu, S., 2013. Transcriptomic analysis of differentially expressed genes during anther development in genetic male sterile and wild type cotton by digital gene-expression profiling. BMC Genomics 14, 97.
Wilson, Z.A., Zhang, D.B., 2009. From Arabidopsis to rice: pathways in pollen development. J. Exp. Bot. 60, 1479–1492.
Wilson, Z.A., Song, J., Taylor, B., Yang, C., 2011. The final split: the regulation of anther dehiscence. J. Exp. Bot. 62, 1633–1649.
Xie, C., Li, B., Xu, Y., Ji, D., Chen, C., 2013. Characterization of the global transcriptome for Pyropia haitanensis (Bangiales, Rhodophyta) and development of cSSR markers. BMC Genomics 14, 107.
Yates, J.L., Boerma, H.R., Fasoula, V.A., 2012. SSR-marker analysis of the intracultivar phenotypic variation discovered within 3 soybean cultivars. J. Hered. 103, 570–578.
Ye, J., Fang, L., Zheng, H., Zhang, Y., Chen, J., Zhang, Z., Wang, J., Li, S., Li, R., Bolund, L., 2006. WEGO: a web tool for plotting GO annotations. Nucleic Acids Res. 34, W293–W297.
Yeh, F., Boyle, T., 1997. Population genetic analysis of co-dominant and dominant markers and quantitative traits. Belg. J. Bot. 129, 1.
Yu, Y., Yuan, D., Liang, S., Li, X., Wang, X., Lin, Z., Zhang, X., 2011. Genome structure of cotton revealed by a genome-wide SSR genetic map constructed from a BC1 population between Gossypium hirsutum and G. barbadense. BMC Genomics 12, 15.


sabaigrass (E. binata) transcriptome sequencing data (Zou et al., 2013). Zhang et al. (2012) developed 3919 SSRs from the unigene library of peanut (Arachis hypogaea L.). Although research on SSR markers in cotton has made significant progress in recent years (http://www.cottonmarker.org/), few of these markers have been verified and applied in practice. In the present study, 9861 primer pairs for novel SSRs were designed, 5019 non-redundant SSRs were identified by comparison with the cotton SSR database, and 300 randomly selected primer pairs were evaluated across 22 cotton accessions. PCR amplification confirmed 82 polymorphic SSR markers.
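As an illustration of how potential SSRs can be located in unigene sequences, the following is a minimal Python sketch of a perfect-repeat scan. It is not the pipeline used in this study: the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are hypothetical, chosen only for demonstration, and mononucleotide and compound repeats are ignored.

```python
import re

# Hypothetical minimum repeat counts (motif length -> minimum tandem copies).
# Illustrative assumptions only; not the search criteria used in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit, min_n in sorted(min_repeats.items()):
        # Capture one motif of length `unit`, then require at least
        # (min_n - 1) further tandem copies via the backreference \1.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "ATAT".
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence: (AG)7 and (CTT)5 embedded in short flanking sequence.
    example = "GGC" + "AG" * 7 + "TTCA" + "CTT" * 5 + "G"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

In practice, the flanking sequence around each reported position would be passed to a primer-design tool, and candidate primer pairs would then be screened experimentally, as was done here with 300 primer pairs across 22 accessions.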


Please cite this article as: Zhang, X., et al., Characterization of the global transcriptome for cotton (Gossypium hirsutum L.) anther and development of SSR marker, Gene (2014), http://dx.doi.org/10.1016/j.gene.2014.08.058


Zhang, D.B., Wilson, Z.A., 2009. Stamen specification and anther development in rice. Chin. Sci. Bull. 54, 2342–2353.
Zhang, J., Wu, Y., Guo, W., Zhang, T., 2000. Fast screening of microsatellite markers in cotton with PAGE/silver staining. Cotton Sci. 12, 3.
Zhang, J., Guo, W., Zhang, T., 2002. Molecular linkage map of allotetraploid cotton (Gossypium hirsutum L. × Gossypium barbadense L.) with a haploid population. Theor. Appl. Genet. 105, 9.
Zhang, J., Liang, S., Duan, J., Wang, J., Chen, S., Cheng, Z., Zhang, Q., Liang, X., Li, Y., 2012. De novo assembly and characterisation of the transcriptome during seed development, and generation of genic-SSR markers in peanut (Arachis hypogaea L.). BMC Genomics 13, 90.
Zhong, R., Ye, Z.H., 2007. Regulation of cell wall biosynthesis. Curr. Opin. Plant Biol. 10, 564–572.
Zou, D., Chen, X., Zou, D., 2013. Sequencing, de novo assembly, annotation and SSR and SNP detection of sabaigrass (Eulaliopsis binata) transcriptome. Genomics 102, 57–62.
