Planta DOI 10.1007/s00425-014-2215-y

ORIGINAL ARTICLE

Identification and phylogenetic analysis of late embryogenesis abundant proteins family in tomato (Solanum lycopersicum)

Jun Cao • Xiang Li

Received: 23 September 2014 / Accepted: 25 November 2014 © Springer-Verlag Berlin Heidelberg 2014

Abstract Main conclusion This study provides a comparative genomic analysis of the LEA gene family in tomato; the results may provide valuable information for future functional investigations. Late embryogenesis abundant (LEA) proteins are a group of proteins that accumulate in response to cellular dehydration in many organisms. Here, we identified 27 LEA genes in tomato. A strong correlation between phylogeny, gene structure, and motif composition was found. The predicted SlLEA genes were non-randomly distributed among the chromosomes, and segmental and tandem duplications were probably important for their expansion. Many cis-elements potentially mediating transcription in response to abiotic stress were also found in the 1,000 bp promoter region upstream of the transcription start site. In addition, intragenic recombination played an important role in the evolution of SlLEA genes. Selection analysis also identified some significant site-specific constraints that acted on the evolution of most LEA paralogs. Expression analysis using both microarray data and quantitative real-time PCR indicated that SlLEA genes were widely expressed in various tissues, and that a few members responded to some abiotic stresses. Our study provides useful information on the LEA genes in tomato and will facilitate their further characterization to better understand their functions. Keywords Expression profile · LEA · Phylogenetic analysis · Recombination · Selective pressure · Tomato

J. Cao (✉) · X. Li Institute of Life Science, Jiangsu University, Zhenjiang 212013, Jiangsu, People’s Republic of China e-mail: [email protected]; [email protected]

Abbreviations
LEA Late embryogenesis abundant
MEME Multiple EM for Motif Elicitation
MRCA The most recent common ancestor
NJ Neighbor joining
qRT-PCR Quantitative real-time polymerase chain reaction
TSS Transcription start site

Introduction

Late embryogenesis abundant (LEA) proteins are a large and highly diverse group of polypeptides, which were originally discovered in cotton (Gossypium hirsutum) seeds (Dure et al. 1981). Subsequently, LEAs were also identified in plant vegetative tissues under stress conditions, in some micro-organisms in response to water limitation and in anhydrobiotic invertebrates (Tunnacliffe and Wise 2007; Hand et al. 2011). According to sequence similarity and the presence of particular motifs, LEA proteins are separated into different groups. The typical LEA proteins are highly hydrophilic and intrinsically unstructured, containing a high percentage of charged amino acid residues, as well as glycine or other small amino acids such as alanine, serine and threonine, while lacking or containing small amounts of tryptophan and cysteine residues (Garay-Arroyo et al. 2000). These biochemical properties contribute to the LEA proteins' capacity to withstand heat and acid (Oliveira et al. 2007). Although LEA proteins are intrinsically disordered and unstructured in aqueous solutions, some members can adopt some degree of three-dimensional structure during drying (Goyal et al. 2003).


LEA proteins are involved in many physiological processes related to plant development and stress responses (Ingram and Bartels 1996; Olvera-Carrillo et al. 2011; Battaglia and Covarrubias 2013). Overexpression of LEA genes can improve stress tolerance in transgenic plants, yeast and bacteria (Swire-Clark and Marcotte 1999; Honjoh et al. 1999; Puhakainen et al. 2004; Houde et al. 2004; Liu and Zheng 2005; Liu et al. 2010; Duan and Cai 2012). Mutant analysis showed that the deficiency of one of the three LEA4 proteins of Arabidopsis causes water deficit susceptibility (Olvera-Carrillo et al. 2010) and that one LEA protein (EM6) is required for normal seed development in Arabidopsis (Manfre et al. 2006). LEA proteins have been considered to act as protectors of enzyme activities. In vitro experiments demonstrated that they can prevent the inactivation of enzymes such as lactate dehydrogenase or malate dehydrogenase upon water deficit (Goyal et al. 2005; Reyes et al. 2008). This protective property of LEA proteins also extends to catalase (Hara et al. 2001), citrate synthase (Goyal et al. 2005), and some mitochondrial enzymes such as rhodanese and fumarase (Grelet et al. 2005). Some LEA proteins have been shown to function by stabilizing membranes: they were found associated with anionic phospholipid vesicles, stabilizing model membranes in the dry state or at freezing temperatures (Kosová et al. 2007; Tolleter et al. 2010). Other roles for LEA proteins, including membrane maintenance in combination with sugars and ion sequestration, have also been reported. It has been suggested that the presence of sugars can enhance the protective effect of LEA proteins under dehydration (Wolkers et al. 2001; Liu et al. 2010), and that binding divalent cations can endow LEA proteins with an oxidant scavenger activity induced by abiotic stress in plants (Hara et al. 2005; Liu et al. 2011). In addition, an antibacterial activity of some LEA proteins has been reported recently (Liu et al. 2013).

Although quite a few LEAs have been functionally characterized in the model plant Arabidopsis, rice and others, the functions of most members of the LEA family remain unknown. This is especially true in tomato (Solanum lycopersicum), a model plant for the Solanaceae family, for which there are very few reports on the characterization of any LEA genes. Completion of the tomato genome sequencing effort greatly facilitated the identification of gene families at the whole-genome level (Tomato Genome Consortium 2012). In the present study, we performed a detailed analysis of the sequence identification and characteristics, gene structure and conserved motifs, duplication status, cis-elements, recombination, selective pressure, and expression profile of S. lycopersicum LEA genes. These data will provide a basis for further evolutionary and functional characterizations of the LEA gene family in tomato.


Materials and methods

Sequence retrieval and characterization analysis of tomato LEAs

We performed multiple database searches to identify potential members of the LEA gene family in tomato. Arabidopsis LEA sequences (Hundertmark and Hincha 2008) were retrieved and used as queries in BLAST searches against the tomato database (Tomato Genome Consortium 2012) on the Phytozome server (http://www.phytozome.net). The ProtParam tool (http://web.expasy.org/protparam) was used to analyze the physicochemical parameters of SlLEA proteins. Intrinsically disordered regions of the proteins were predicted with the IUPred Server (http://iupred.enzim.hu/) (Dosztányi et al. 2005). Subcellular localization prediction of each SlLEA was carried out using the CELLO v2.5 server (http://cello.life.nctu.edu.tw) (Yu et al. 2004) and PSORT (http://psort.hgc.jp/form.html).

Phylogenetic analyses of the LEA gene family

We used MUSCLE 3.52 (Edgar 2004) to perform multiple sequence alignments of full-length protein sequences, followed by manual comparisons and refinement. Phylogenetic analyses of the LEA proteins based on amino acid sequences were carried out using the neighbor joining (NJ) method in MEGA v5 (Tamura et al. 2011). NJ analyses were done using p-distance methods, pairwise deletion of gaps, and default assumptions. Support for each node was tested with 1,000 bootstrap replicates.

Estimation of the maximum number of gained and lost LEAs

To determine the degrees of gene family expansion in the analyzed plant lineages, we divided the phylogeny into different clades. Nodes basal to the split among lineages denoted the most recent common ancestor (MRCA) and were labeled as V: Viridiplantae; E: Embryophyte; T: Tracheophyte; A: Angiosperm; Eu: eudicots; R: Rosid; B: Brassicaceae; P: Papilionoideae. Gene duplication and loss events were inferred by reconciling the gene tree for each cluster/subcluster with the species tree using Notung v2.6 (Chen et al. 2000).
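The p-distance with pairwise deletion of gaps mentioned in the phylogenetic methods can be illustrated with a minimal Python sketch. This is not the MEGA implementation; the function name `p_distance` and the toy sequences are hypothetical and serve only to show the arithmetic: sites where either sequence has a gap are skipped for that pair, and the distance is the proportion of the remaining sites that differ.

```python
def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence carries a gap are ignored for this pair
    only (pairwise deletion). Illustrative sketch, not MEGA's code.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # pairwise deletion: skip gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict of {name: aligned sequence}."""
    names = list(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical toy alignment (not real SlLEA sequences).
toy = {"seqA": "MKT-A", "seqB": "MRTGA", "seqC": "MKTGA"}
matrix = distance_matrix(toy)
```

A matrix of this kind is the input that a neighbor-joining algorithm clusters; the bootstrap step would repeat the calculation on resampled alignment columns.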
Chromosomal location and gene structure of the SlLEA genes

The chromosomal locations of the SlLEA genes were determined using their annotation information in the tomato database. Gene exon–intron structure information was also collected from the tomato genome annotation in Phytozome (http://www.phytozome.net).

Inference of duplication time

We first determined paralogous gene pairs by protein phylogeny, and used them as references for pairwise alignments of the DNA coding sequences with the ClustalW software embedded in MEGA v5 (Tamura et al. 2011). These alignments were fed into K-Estimator 6.0 (Comeron 1999) for Ka and Ks estimation. The Ks value was then used to calculate the approximate date of the duplication event (T = Ks/2k), assuming a clock-like rate (k) of 1.5 × 10⁻⁸ synonymous substitutions per site per year for tomato (Blanc and Wolfe 2004).

Upstream sequence analysis of SlLEA genes

We used expressed sequence tag (EST) information to define the transcription start site (TSS) of each SlLEA gene. Once the TSS positions of the SlLEA genes were determined in the tomato genome, the 1,000 bp genomic sequences upstream of the TSS were retrieved from the tomato genomic database. PLACE (http://www.dna.affrc.go.jp/PLACE/signalscan.html) (Higo et al. 1999), a database of plant cis-acting regulatory DNA elements, was used to search for stress-responsive elements in the promoter regions of the SlLEA genes.

Detection of recombination events

Alignments of the CDS within the different LEA groups were analyzed. Potential recombination events between divergent nucleotide sequences were explored with the recombination detection program RDP v3.44 (Martin et al. 2010), which embeds different methods for detecting recombination signals. In this study, we used the RDP (Martin and Rybicki 2000), Geneconv (Padidam et al. 1999), and MaxChi (Posada and Crandall 2001) methods to detect signals. The highest acceptable P value cutoff was set to 0.05. Significance was evaluated with 100 permutation tests.

Site-specific selection assessment and testing

We used a Bayesian inference approach [Selecton Server (http://selecton.tau.ac.il/)] (Stern et al. 2007) for site-specific positive and purifying selection calculations. Here, Ka/Ks values are used to estimate two types of substitution events by calculating the synonymous rate (Ks) and the non-synonymous rate (Ka) at each codon. In this study, four evolutionary models [M8 (ωs ≥ 1), M8a (ωs = 1), M7 (beta) and M5 (gamma)] were used to describe, in probabilistic terms, how the characters evolve. Each of the models uses different biological assumptions, and the model that best fits the data was selected. These models all assume a statistical distribution to account for heterogeneous Ka/Ks values among sites. The distributions are approximated using eight discrete categories, and the Ka/Ks values are computed by calculating the expectation of a posterior distribution (Stern et al. 2007).

Microarray-based expression analysis

Genome-wide microarray data of tomato published by the Tomato Genome Consortium (Tomato Genome Consortium 2012) were obtained from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) under accession number GSE33507. Expression data were gene-wise normalized and hierarchically clustered based on Pearson coefficients with average linkage in the Genesis (v 1.7.6) program (Sturn et al. 2002).

Plant materials and treatments

In this study, we used one-week-old tomato seedlings (Solanum lycopersicum L. cv Hezuo 908) from Shanghai Changzhong Tomato Seed Industry Co., Ltd., to examine the expression patterns of LEA genes under different stress treatments. Plants were grown in a plant growth chamber at 23 ± 1 °C with a 14 h light/10 h dark photoperiod. Tomato seedlings were kept at 4 ± 1 °C for 3 h for cold treatment. For drought treatment, the seedlings were dried between folds of tissue paper at 23 ± 1 °C for 3 h. For salt stress, the roots of the seedlings were put in 150 mM NaCl for 24 h. Control (CK) seedlings were grown at 23 ± 1 °C with normal irrigation. Three replicates were performed for each sample.

RNA isolation and quantitative real-time PCR (qRT-PCR) analysis

We used the Trizol total RNA extraction kit (Sangon, Shanghai, China, SK1321) to extract total RNA, which was then treated with DNase I (TaKaRa, Dalian, China), and used M-MLV (TaKaRa) to perform reverse transcription with random primers and 2 μg of total RNA. Triplicate quantitative assays were performed on about 8 ng of each cDNA dilution using SYBR Green Master Mix (TaKaRa) with an ABI 7500 sequence detection system, according to the manufacturer’s protocol. The gene-specific primers were synthesized by Sangon. Eight SlLEA genes were randomly selected for qRT-PCR analysis from different major branches of the phylogenetic tree. The expression level of the tomato Actin (Solyc01g104770.2) gene was used as the

endogenous control. The relative expression level was calculated using the 2^-ΔΔCT method (Livak and Schmittgen 2001). LEA gene-specific primers are shown in Table 5.

Table 1 Characteristics of the LEA proteins in S. lycopersicum

| Name | Group | ID | Length (aa) | GRAVY | MW (Da) | pI | Instability index | Disorder tendency | CELLO localization (reliability) | PSORT prediction (affirmative) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SlLEA8 | Dehydrin | Solyc02g084850.2 | 130 | -1.368 | 13948.2 | 6.06 | 32.99 (stable) | 0.8337 | Nuclear (2.664) | Peroxisome (0.364) |
| SlLEA10 | Dehydrin | Solyc04g082200.2 | 206 | -1.488 | 23111.4 | 5.12 | 57.55 (unstable) | 0.7535 | Nuclear (3.024) | Nuclear (0.700) |
| SlLEA6 | Dehydrin | Solyc02g062390.2 | 133 | -0.984 | 14432.2 | 9.02 | 31.45 (stable) | 0.7823 | Nuclear (3.412) | Nuclear (0.700) |
| SlLEA7 | Dehydrin | Solyc02g084840.2 | 157 | -1.261 | 16660 | 7.23 | 22.40 (stable) | 0.8391 | Nuclear (2.881) | Nuclear (0.600) |
| SlLEA1 | Dehydrin | Solyc01g065820.1 | 78 | -1.636 | 8871 | 9.82 | 35.35 (stable) | 0.8925 | Nuclear (2.431) | Nuclear (0.945) |
| SlLEA5 | Dehydrin | Solyc01g109920.2 | 249 | -0.838 | 25363.8 | 7.09 | 25.61 (stable) | 0.8177 | Nuclear (2.848) | Nuclear (0.600) |
| SlLEA24 | LEA_1 | Solyc10g078770.1 | 88 | -1.065 | 9258.3 | 9.48 | 8.28 (stable) | 0.7921 | Nuclear (2.282) | Nuclear (0.760) |
| SlLEA23 | LEA_1 | Solyc10g078760.1 | 93 | -1.165 | 10032.1 | 9.3 | 23.90 (stable) | 0.8261 | Nuclear (2.817) | Cytoplasmic (0.650) |
| SlLEA25 | LEA_1 | Solyc10g078780.1 | 221 | -0.659 | 21881.6 | 7.37 | 6.68 (stable) | 0.825 | Nuclear (1.475) | Peroxisome (0.640); cytoplasmic (0.450) |
| SlLEA13 | LEA_1 | Solyc06g061110.1 | 126 | -0.816 | 13557.2 | 9.56 | 48.37 (unstable) | 0.7937 | Nuclear (2.644) | Cytoplasmic (0.450) |
| SlLEA17 | LEA_1 | Solyc09g008770.2 | 435 | -1.149 | 46663.1 | 5.51 | 28.99 (stable) | 0.6766 | Nuclear (1.991); cytoplasmic (1.542) | Cytoplasmic (0.450); mitochondrial (0.424) |
| SlLEA15 | LEA_2 | Solyc08g015690.2 | 317 | -0.372 | 35247.9 | 4.65 | 23.67 (stable) | 0.2809 | Cytoplasmic (3.751) | Cytoplasmic (0.650) |
| SlLEA27 | LEA_2 | Solyc12g017520.1 | 73 | -1.023 | 8076.9 | 4.78 | 21.95 (stable) | 0.5973 | Nuclear (1.657); mitochondrial (1.119); cytoplasmic (1.032) | Peroxisome (0.463); cytoplasmic (0.450) |
| SlLEA22 | LEA_2 | Solyc10g011900.2 | 185 | -0.582 | 20981.8 | 4.84 | 26.28 (stable) | 0.328 | Cytoplasmic (2.904) | Nuclear (0.820) |
| SlLEA2 | LEA_2 | Solyc01g095140.2 | 160 | -0.143 | 17635.2 | 4.64 | 22.18 (stable) | 0.2399 | Cytoplasmic (2.183) | Cytoplasmic (0.650) |
| SlLEA3 | LEA_2 | Solyc01g095150.2 | 174 | -0.157 | 19122.9 | 4.6 | 26.47 (stable) | 0.2662 | Cytoplasmic (3.429) | Plasma membrane (0.700) |
| SlLEA16 | LEA_3 | Solyc08g074720.2 | 101 | -0.487 | 11303 | 9.72 | 34.79 (stable) | 0.3691 | Mitochondrial (1.854); nuclear (1.306) | Mitochondrial (0.811) |
| SlLEA11 | LEA_3 | Solyc06g009140.2 | 94 | -0.46 | 10266.6 | 9.84 | 39.73 (stable) | 0.3863 | Mitochondrial (1.493); nuclear (1.403) | Chloroplast (0.625); mitochondrial (0.581) |
| SlLEA20 | LEA_3 | Solyc09g075210.2 | 97 | -0.441 | 10385.6 | 9.89 | 48.42 (unstable) | 0.4409 | Chloroplast (1.912); mitochondrial (1.355) | Chloroplast (0.903); mitochondrial (0.759) |
| SlLEA9 | LEA_4 | Solyc02g085150.2 | 408 | -0.915 | 43557.4 | 7.66 | 27.64 (stable) | 0.5303 | Nuclear (1.674); extracellular (1.122) | Vacuole (0.879); outside (0.820) |
| SlLEA14 | LEA_4 | Solyc07g053360.2 | 532 | -0.826 | 56415.2 | 5.43 | 27.09 (stable) | 0.6225 | Cytoplasmic (1.886); nuclear (1.853) | Cytoplasmic (0.650) |
| SlLEA26 | LEA_4 | Solyc12g010820.1 | 428 | -0.742 | 44071.2 | 6.39 | 32.68 (stable) | 0.6044 | Nuclear (1.685); cytoplasmic (1.290) | Cytoplasmic (0.650) |
| SlLEA12 | LEA_5 | Solyc06g048840.2 | 93 | -1.439 | 10035.8 | 5.47 | 53.01 (unstable) | 0.8423 | Nuclear (2.594) | Cytoplasmic (0.650) |
| SlLEA18 | LEA_5 | Solyc09g014750.1 | 101 | -1.406 | 10874.8 | 9.05 | 47.20 (unstable) | 0.8219 | Nuclear (2.595) | Cytoplasmic (0.650) |
| SlLEA19 | SMP | Solyc09g065360.2 | 167 | -0.156 | 17299.1 | 4.5 | 35.87 (stable) | 0.4667 | Chloroplast (1.618); cytoplasmic (1.471) | Outside (0.767) |
| SlLEA21 | SMP | Solyc09g082110.2 | 200 | -0.632 | 21038.3 | 8.66 | 50.97 (unstable) | 0.6166 | Nuclear (2.242) | Cytoplasmic (0.450) |
| SlLEA4 | SMP | Solyc01g097960.2 | 258 | -0.272 | 26140 | 4.7 | 36.69 (stable) | 0.5749 | Cytoplasmic (1.717); chloroplast (1.107) | Cytoplasmic (0.450) |
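The 2^-ΔΔCT relative quantification (Livak and Schmittgen 2001) named in the qRT-PCR methods can be sketched in a few lines of Python. The function name and the Ct values below are invented for illustration and are not data from this study; the sketch assumes that triplicate Ct values are averaged before normalization to the reference gene (here, Actin).

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    Each argument is a list of replicate Ct values. The reference gene
    normalizes each sample; the control sample calibrates the treatment.
    """
    delta_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    delta_delta = delta_treated - delta_control
    return 2 ** (-delta_delta)


# Hypothetical triplicate Ct values (illustration only).
fold_change = relative_expression(
    ct_target_treated=[22.0, 22.0, 22.0],
    ct_ref_treated=[18.0, 18.0, 18.0],
    ct_target_control=[25.0, 25.0, 25.0],
    ct_ref_control=[18.0, 18.0, 18.0],
)
```

With these invented values the target crosses threshold three cycles earlier under treatment after normalization, which corresponds to an eight-fold induction.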

Results

Identification, annotation and characteristics of the SlLEA family genes

To identify LEA family genes in tomato, we used Arabidopsis LEA proteins (Hundertmark and Hincha 2008) as query sequences to perform BLAST searches against the tomato database (Tomato Genome Consortium 2012) in Phytozome (http://www.phytozome.net/). The tomato sequences returned from these searches were confirmed as encoding LEA proteins using the Pfam program (Punta et al. 2012) to search for the LEA, SMP (seed maturation protein), dehydrin, and small hydrophilic plant seed protein signatures conserved in other LEA family proteins (Hundertmark and Hincha 2008). As a result, 27 LEA genes were identified in tomato (Table 1). All these SlLEAs were named according to their position from the top to the bottom of tomato chromosomes 1–12. SlLEA genes encode polypeptides of 73–532 amino acids in length, with a predicted molecular mass range of 8.07–56.4 kDa (Table 1). According to the Pfam signatures and sequence similarities to the Arabidopsis LEAs (Hundertmark and Hincha 2008), the SlLEA genes could be divided into seven groups: LEA_1, LEA_2, LEA_3, LEA_4, LEA_5, dehydrin and SMP. Physicochemical analysis revealed that 14 members (51.8 %) were of a basic nature (pI > 7), while 13 proteins (48.2 %) were considered to be acidic (pI < 7) (Table 1). This latter group included the LEA_2 and LEA_4 proteins. The most acidic protein was found in the SMP group (SlLEA19), while the most basic protein was found in the LEA_3 group (SlLEA20). About 77.8 % of SlLEA proteins have a low instability index (less than 40), indicating that most SlLEA proteins are stable in a test tube (Table 1). One feature of LEA proteins is their natively unfolded nature. To identify which SlLEAs are intrinsically disordered proteins (IDPs), we predicted the intrinsically unstructured regions of the proteins with the IUPred Server (http://iupred.enzim.hu/) (Dosztányi et al. 2005). The result indicated that over 70 % of SlLEA proteins are predicted to be IDPs (Table 1). Interestingly, we also found that all members of the dehydrin, LEA_1, LEA_4, and LEA_5 groups are IDPs, and that all members of the LEA_3 group and most proteins of the LEA_2 group are ordered (or folded) proteins. We further predicted the probable protein localization for each of the candidate SlLEAs, and found that most of the tomato LEA proteins target to the nucleus or cytoplasm. Some other members were predicted to localize to the mitochondria, the chloroplast, or the extracellular matrix. The grand average of hydropathy (GRAVY) index showed that all SlLEA proteins were hydrophilic, and the LEA_5 and dehydrin proteins were particularly hydrophilic, with average GRAVY values of -1.42 and -1.26, respectively.

Estimation of the maximum number of gained and lost LEA genes in 11 species

To better understand how LEA genes have evolved in the Plantae, we also estimated the maximum number of gained and lost LEA genes in 11 plant species. Our search for LEAs in Chlamydomonas reinhardtii found only one member. The non-vascular moss Physcomitrella patens has eighteen LEA genes. The expansion of the LEA gene family continued, with Brassica rapa exhibiting 53 paralogous gene sequences. Reconciliation of gene trees with the species phylogeny suggested that there were seven ancestral LEA genes in the most recent common ancestor (MRCA) of Viridiplantae (Fig. 1). Furthermore, we identified 29 orthologous genes in the Embryophyte MRCA and 32 in the MRCA of Tracheophyte (Fig. 1). We also found that the number of LEAs remained relatively high through evolutionary history from the land plants (P. patens) to the angiosperms. We identified about fifty-six ancestral LEA genes in the MRCA of eudicots. After that, many LEA genes were lost in these species. It appears that the LEA family has been reduced in all the eudicot species analyzed compared with the number in the MRCA of the eudicots. For example, the number of LEAs has decreased by approximately 52 and 60 % in tomato and Medicago truncatula, respectively, since their divergence from the eudicot MRCA. In contrast, when the number of ancestral genes was compared with that in Chlamydomonas, the LEA family appeared to have expanded in all the other species analyzed. In addition, the expansion was uneven among these plant species. For instance, there are 48, 25, 40 and 15 genes in soybean, Solanum pennellii, Brachypodium distachyon, and Selaginella moellendorffii, respectively, while the estimated number of genes in the MRCA of Viridiplantae is seven. Therefore, soybean, S. pennellii, B. distachyon, and S. moellendorffii have a net gain of 41, 18, 33 and 8 genes, respectively, since diverging from the Viridiplantae MRCA. The number of genes gained in the B. rapa lineage is much greater than in the other lineages.

Fig. 1 Gene gain and loss of LEAs in the evolution of plants. The names of internal nodes are abbreviated (V Viridiplantae, E Embryophyte, T Tracheophyte, A Angiosperm, Eu eudicots, R Rosid, B Brassicaceae, P Papilionoideae). The numbers of common ancestors at the seven internal nodes (V, E, T, A, G, Eu and R) are shown in the rectangles. Numbers after the plus signs indicate the numbers of gene gain events, whereas numbers after the minus signs indicate gene loss events

[Figure: species tree. Ancestral LEA gene numbers at internal nodes: V 7, E 29, T 32, A 51, Eu 56, R 52, B 55. LEA genes per species: M. truncatula 22, G. max 48, A. thaliana 51, B. rapa 53, S. lycopersicum 27, S. pennellii 25, B. distachyon 40, O. sativa 32, S. moellendorffii 15, P. patens 18, C. reinhardtii 1.]
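The percentage reductions and net gains quoted for the gain and loss analysis follow from simple arithmetic on the gene counts of Fig. 1. A minimal Python sketch, using only numbers stated in the text and figure (56 genes in the eudicot MRCA, seven in the Viridiplantae MRCA, and the extant gene counts); the helper names are hypothetical:

```python
EUDICOT_MRCA = 56        # ancestral LEA genes at node Eu (Fig. 1)
VIRIDIPLANTAE_MRCA = 7   # ancestral LEA genes at node V (Fig. 1)

# Extant LEA gene counts taken from Fig. 1.
extant = {
    "S. lycopersicum": 27,
    "M. truncatula": 22,
    "G. max": 48,
    "S. pennellii": 25,
    "B. distachyon": 40,
    "S. moellendorffii": 15,
}


def percent_reduction(ancestral, current):
    """Net decrease relative to the ancestral gene number, in percent."""
    return 100.0 * (ancestral - current) / ancestral


def net_gain(ancestral, current):
    """Net change in gene number since the ancestral node."""
    return current - ancestral


tomato_loss = percent_reduction(EUDICOT_MRCA, extant["S. lycopersicum"])
medicago_loss = percent_reduction(EUDICOT_MRCA, extant["M. truncatula"])
gains = {sp: net_gain(VIRIDIPLANTAE_MRCA, n) for sp, n in extant.items()}
```

These are net differences between node counts; the individual gain and loss events on each branch (the +/- values in Fig. 1) come from the tree reconciliation and cannot be recovered from the counts alone.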

Gene structure and motif composition in Arabidopsis and tomato LEAs

Analysis of gene structures and conserved motif compositions provides additional clues about the evolutionary relationships of multigene families (Cao et al. 2012; Cao and Shi 2012). To gain further insight into the organizational diversity of LEA genes, we constructed an unrooted tree to examine their phylogenetic relationships, and then compared the exon–intron organization of the coding sequences of individual LEA genes in Arabidopsis and tomato. A detailed illustration of the exon–intron structures is shown in Fig. 2. Our results indicated a strong correlation between LEA protein phylogeny and LEA gene exon–intron structure. In other words, the most closely related members in the same clade generally showed a very similar exon–intron pattern. We further assessed the conserved motifs in the tomato LEA family using the MEME program (http://meme.sdsc.edu) (Bailey et al. 2006). As a result, 25 distinct motifs were identified in these proteins (Fig. 2). Motifs 8, 7 and 3 constitute the dehydrin domain. Motifs 24 and 10 make up the LEA_1 and LEA_5 domains, respectively. The LEA_2 and LEA_3 domains include motifs 6 and 12, and motifs 19 and 4, respectively. We also found that motifs 16 and 9 are part of the LEA_4 domain, and motifs 22 and 5 represent the potential SMP domain.

Fig. 2 Phylogenetic relationships, gene structure and motif composition of LEA proteins in Arabidopsis and tomato. The phylogenetic tree of Arabidopsis and tomato LEA proteins (left panel) was constructed from a complete alignment of 78 LEA proteins using MEGA 5.2 by the NJ method with 1,000 bootstrap replicates. The major groups are marked with different color backgrounds. Conservation and variability of intron positions and phases of the LEA genes are shown in the middle panel. The 0, 1, and 2 phase introns are marked with triangles, diamonds, and rectangles, respectively. A schematic representation of conserved motifs obtained using MEME in LEA proteins is displayed in the panel on the right. Different motifs are represented by different colored boxes

Chromosomal location and duplication of tomato LEA genes

To further investigate the relationship between the genetic divergence within the LEA family and gene duplication and loss in tomato, we determined the chromosomal location of each LEA gene. The results demonstrated that the 27 SlLEA genes are located on nine different tomato chromosomes, with chromosomes 3, 5, and 11 being devoid of LEA genes (Fig. 3). Five SlLEA genes are present on each of chromosomes 1 and 9; four on each of chromosomes 2 and 10; three on chromosome 6; two on each of chromosomes 8 and 12; and one on each of chromosomes 4 and 7. Segmental duplication, tandem duplication and transposition events are the main reasons for gene family expansion (Cao 2012; Chen and Cao 2014). To analyze potential duplications and the evolutionary patterns of LEA genes in tomato, we detected 9 pairs of paralogous SlLEA genes based on phylogenetic analysis. Among them, seven pairs of paralogous genes were putative segmental duplication events (Fig. 3). We further determined tandem duplications of SlLEA genes among the tomato chromosomes. As indicated in Fig. 3, three SlLEA gene clusters (SlLEA2-3 cluster; SlLEA7-8-9 cluster; SlLEA23-24-25 cluster) containing 8 tandemly duplicated genes were identified on chromosomes 1, 2, and 10, respectively. To trace the dates of the duplication blocks, we estimated evolutionary dates of duplicated SlLEA genes using Ks as the proxy for time (Table 2). Segmental duplications of LEA genes in tomato originated from 3.4 Mya (Ks = 0.10123) to 71.8 Mya (Ks = 2.15522), while tandem duplication events occurred between 22.6 and 57.9 Mya. The oldest duplication of LEA genes (SlLEA19/SlLEA21) occurred approximately 71.8 Mya. We also found that duplication events in 5 of 9 pairs of paralogous SlLEA genes occurred between 21.2 and 30.9 Mya. In addition, we estimated that the duplication event of the SlLEA1/SlLEA5 gene pair occurred about 3.4 Mya, after the tomato–potato split (Wang et al. 2008a).

Fig. 3 Positions of the LEA family genes on the tomato chromosomes. Scale represents a 10 Mb chromosomal distance. Segmental duplicate genes are linked by a black line

Fig. 4 Distribution of major abiotic stress-responsive cis-elements in the promoter sequences of the 27 SlLEA genes. Three putative cis-elements (LTRE, ABRE and MYC) are represented by different symbols as indicated. The major groups are marked with different color backgrounds according to Hundertmark and Hincha (2008)

Table 2 Inference of duplication time of LEA paralogous pairs in tomato

| Paralogous pairs | Ka | Ks | Ka/Ks | Date (Mya) |
| --- | --- | --- | --- | --- |
| SlLEA24/SlLEA25 | 0.28694 | 1.73676 | 0.16522 | 57.9 |
| SlLEA19/SlLEA21 | 0.41715 | 2.15522 | 0.19355 | 71.8 |
| SlLEA9/SlLEA17 | 1.04873 | 0.78552 | 1.33508 | 26.2 |
| SlLEA11/SlLEA20 | 0.23978 | 0.63548 | 0.37732 | 21.2 |
| SlLEA12/SlLEA18 | 0.29023 | 0.84732 | 0.34253 | 28.2 |
| SlLEA14/SlLEA26 | 0.56260 | 0.92905 | 0.60556 | 30.9 |
| SlLEA2/SlLEA3 | 0.20372 | 0.67674 | 0.30103 | 22.6 |
| SlLEA15/SlLEA27 | 0.05066 | 1.35531 | 0.03738 | 45.2 |
| SlLEA1/SlLEA5 | 0.06774 | 0.10123 | 0.66917 | 3.4 |
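The duplication dates reported for the paralogous pairs follow from the relation T = Ks/2k given in the methods, with k = 1.5 × 10⁻⁸ synonymous substitutions per site per year. A minimal Python sketch of that conversion, using Ks values quoted in the text and Table 2; the function and constant names are hypothetical:

```python
SUBSTITUTION_RATE = 1.5e-8  # k: synonymous substitutions per site per year


def duplication_time_mya(ks, rate=SUBSTITUTION_RATE):
    """Approximate age of a duplication, T = Ks / (2k), in million years."""
    years = ks / (2.0 * rate)
    return years / 1e6


# Ks values for three paralogous pairs, as reported in Table 2.
ks_values = {
    "SlLEA19/SlLEA21": 2.15522,
    "SlLEA24/SlLEA25": 1.73676,
    "SlLEA1/SlLEA5": 0.10123,
}

dates = {pair: duplication_time_mya(ks) for pair, ks in ks_values.items()}
```

The factor of two reflects that both copies accumulate substitutions independently after the duplication, so the observed Ks is spread over two lineages. Large Ks values (above about 1) are close to saturation, so the oldest dates are best read as rough estimates.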

Table 3 The predicted recombination events for the LEA gene family in tomato

Cis-regulatory element analysis in the promoters of SlLEA genes To obtain hints for how expression of the SlLEA genes may be regulated, potential cis-elements in the 1,000 bp

Groups

Recombination Methods

Genes undergone recombination events

RDP

GENECONV

MaxChi

Dehydrin

0

2

1

SlLEA1, SlLEA5, SlLEA6, SlLEA8

LEA_1

0

0

1

SlLEA23, SlLEA25

LEA_2

1

0

1

SlLEA2, SlLEA22

LEA_3

0

0

0

Not detected

LEA_4

4

1

2

SlLEA9, SlLEA14, SlLEA26

LEA_5









SMP

1

0

1

SlLEA19, SlLEA21

123

Planta Fig. 5 Identification of recombination events between SlLEA6 and SlLEA8 genes. The plot display of recombination events was detected by the MaxChi method

SlLEA6 SlLEA8

-Log[Chi2 P-Val]

3.30270

2.47700

1.65130

0.825690

0.00000 1

203

407

611

815

Position in alignment SlLEA8 SlLEA6

SlLEA6

-Log[Chi2 P-Val]

4.72640 Sites excluded from analysis Tract of sequence with a recombinant origin SlLEA8 - SlLEA6 Bonferroni corrected P-Value = 0.05

3.54480

2.36320

1.18160

0.00000 1

203

407

611

815

Position in alignment

promoter regions upstream of the transcription start site (TSS) of the SlLEA genes were identified by searching the PLACE database (Higo et al. 1999). The results revealed that the upstream sequences of the SlLEA genes carry a variety of potential cis-regulatory elements, including low-temperature-responsive elements (LTRE), ABA-responsive elements (ABRE), and myelocytomatosis regulatory elements (MYC). The patterns of cis-elements differed significantly among the SlLEA genes. The SlLEA10 promoter contained 10 elements, whereas no elements could be identified in the SlLEA16 and SlLEA19 promoters with the software and procedure noted above. At least one ABRE, LTRE, and MYC element was found in 81.48, 48.15, and 37.04 % of the SlLEA promoters, respectively (Fig. 4), suggesting that SlLEA genes may play critical roles in abiotic stress tolerance. A comparison of the distribution of the three regulatory elements in the promoter regions showed that all sister pairs of SlLEA genes exhibit significant differences in their promoter sequences, indicating expression divergence between duplicated genes, which might represent an important evolutionary mechanism of neofunctionalization or subfunctionalization (Prince and Pickett 2002).

Recombination analysis within SlLEA genes

Recombination signals of SlLEA genes were investigated with the RDP (Martin and Rybicki 2000), GENECONV (Padidam et al. 1999), and MaxChi (Posada and Crandall 2001) methods embedded in the program RDP v3.44 (Martin et al. 2010). Five groups were found to contain similar mosaic segments, demonstrating that intragenic recombination had occurred. As summarized in Table 3, thirteen SlLEA genes in these groups exhibited evidence of intragenic recombination (P < 0.05 based on 100 permutations). As an example, we present the SlLEA6–SlLEA8 recombination event detected by the MaxChi method (Fig. 5). A significant recombination event occurred between the 5′- and 3′-ends of SlLEA8 and the intermediate portion of SlLEA6. Our results indicate that SlLEA genes underwent frequent recombination events.
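The maximum chi-square idea behind the MaxChi test can be illustrated with a deliberately simplified sketch for a single pair of aligned sequences. Every candidate breakpoint splits the alignment into a left and a right part, a 2 x 2 table counts differing and identical sites on each side, and the breakpoint with the largest chi-square value is then assessed by shuffling alignment columns. This is not the RDP v3.44 implementation, whose scanning scheme and Bonferroni correction (Fig. 5) are not reproduced here; the function names and the toy sequences are hypothetical, and the default of 100 permutations is simply taken from the text.

```python
import random


def chi2_2x2(a, b, c, d):
    """Chi-square statistic of the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_chi2(seq_a, seq_b):
    """Return (largest chi-square, breakpoint) over all candidate breakpoints."""
    diffs = [x != y for x, y in zip(seq_a, seq_b)]
    n, total = len(diffs), sum(diffs)
    best, best_k, left = 0.0, None, 0
    for k in range(1, n):
        left += diffs[k - 1]      # differing sites to the left of breakpoint k
        right = total - left      # differing sites to the right of it
        stat = chi2_2x2(left, k - left, right, (n - k) - right)
        if stat > best:
            best, best_k = stat, k
    return best, best_k


def permutation_p(seq_a, seq_b, n_perm=100, seed=0):
    """Permutation P value: shuffle alignment columns and recompute the maximum."""
    rng = random.Random(seed)
    observed, _ = max_chi2(seq_a, seq_b)
    cols = list(zip(seq_a, seq_b))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(cols)
        a, b = zip(*cols)
        if max_chi2(a, b)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy alignment (hypothetical): identical first half, fully divergent second half.
toy_a = "A" * 40 + "C" * 40
toy_b = "A" * 40 + "G" * 40
stat, breakpoint = max_chi2(toy_a, toy_b)
print(stat, breakpoint, permutation_p(toy_a, toy_b))
```

With only 100 permutations the smallest attainable P value is 1/101, so a threshold of 0.05 can be resolved but much finer distinctions cannot.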


Selective pressure at amino acid sites in the SlLEA family members

To test for the presence of positive or negative selection at individual amino acids, the Ka/Ks ratios were calculated with the Selecton Server (http://selecton.tau.ac.il) (Stern et al. 2007). We used four evolutionary models [M8 (ωs ≥ 1), M8a (ωs = 1), M7 (beta) and M5 (gamma)] implemented in this server to perform the tests. The results show that the Ka/Ks ratios of the sequences from different SlLEA groups are significantly different (Table 4). For example, higher Ka/Ks ratios were found in the LEA_1 and LEA_4 groups, indicating a higher evolutionary rate or site-specific selective relaxation within members of the same group. Despite the differences in Ka/Ks values, all the estimated Ka/Ks values are lower than 1, suggesting that the SlLEA sequences within each of the groups are under purifying selection pressure. The selection models M8, M8a and M7 do not indicate the presence of positively selected sites, whereas the M5 model does in the dehydrin, LEA_1, LEA_3 and LEA_4 groups (Table 4).

Table 4 Likelihood values and parameter estimates of the selection pressure for SlLEA proteins

Gene branches   Selection models   Ka/Ks    Log-likelihood   Number of PSTs   Ratio of PSTs versus all residues
Dehydrin        M8 (ωs ≥ 1)        0.454    -3122.35         0                0
                M8a (ωs = 1)       0.4587   -3121.05         0                0
                M7 (beta)          0.4528   -3122.79         0                0
                M5 (gamma)         0.5457   -3121.52         21               0.1019
LEA_1           M8 (ωs ≥ 1)        0.6690   -3453.58         0                0
                M8a (ωs = 1)       0.7165   -3453.8          0                0
                M7 (beta)          0.5746   -3454.06         0                0
                M5 (gamma)         0.6725   -3455.69         25               0.0575
LEA_2           M8 (ωs ≥ 1)        0.3091   -2814.15         0                0
                M8a (ωs = 1)       0.3097   -2813.81         0                0
                M7 (beta)          0.2987   -2813.06         0                0
                M5 (gamma)         0.3625   -2821.7          0                0
LEA_3           M8 (ωs ≥ 1)        0.5058   -938.76          0                0
                M8a (ωs = 1)       0.4908   -938.61          0                0
                M7 (beta)          0.3397   -939.23          0                0
                M5 (gamma)         0.4897   -938.786         14               0.1386
LEA_4           M8 (ωs ≥ 1)        0.7084   -4930.25         0                0
                M8a (ωs = 1)       0.6999   -4930.27         0                0
                M7 (beta)          0.6109   -4931.3          0                0
                M5 (gamma)         0.7924   -4930.94         79               0.1936
SMP             M8 (ωs ≥ 1)        0.2339   -2112.87         0                0
                M8a (ωs = 1)       0.2342   -2112.69         0                0
                M7 (beta)          0.2782   -2115.89         0                0
                M5 (gamma)         0.2951   -2113.55         0                0

PST positive-selection site

Expression pattern of SlLEA genes in vegetative and reproductive tissues or stages

Analyzing the expression of the members of a gene family can provide insight into their possible roles (Cao et al. 2011; Chen et al. 2014). To investigate the LEA gene expression patterns in tomato, systematic analyses of microarray data were carried out. Expression patterns of SlLEA genes in unopened flower buds, fully opened flowers, leaves, roots, and six additional fruit developmental periods are shown in Fig. 6. We found that all SlLEA genes were expressed in at least one of the tissues tested, and that these genes displayed various expression levels. Several genes exhibited unique expression profiles. For example, SlLEA11, SlLEA15 and SlLEA21 were highly expressed in all of these tissues with minimal variation in expression levels. SlLEA9 and SlLEA23 might play an important role in flower bud development, given their unusually high expression levels in this organ.
SlLEA6, SlLEA10, and SlLEA16 showed higher expression levels in breaker fruit, indicating that they might be involved in the growth and development of tomato fruit at the breaker stage.

Fig. 6 Expression profiles of the tomato LEA genes. Dynamic expression profiles of SlLEA genes in 10 different developmental tissues or organ systems, based on publicly available microarray data. Row labels in the figure, in order: SlLEA9, SlLEA17, SlLEA14, SlLEA26, SlLEA19, SlLEA4, SlLEA21, SlLEA24, SlLEA23, SlLEA25, SlLEA8, SlLEA10, SlLEA6, SlLEA7, SlLEA1, SlLEA5, SlLEA15, SlLEA27, SlLEA22, SlLEA2, SlLEA3, SlLEA13, SlLEA12, SlLEA18, SlLEA16, SlLEA11, SlLEA20

Expression pattern of SlLEA genes in response to some abiotic stresses

To gain insight into the roles of SlLEA family members in stress responses, we used quantitative real-time PCR to investigate the expression patterns of eight SlLEA genes in tomato seedlings subjected to salt, drought and cold treatments. Gene-specific primers are listed in Table 5. The analysis revealed that these genes are differentially expressed under these stress conditions (Fig. 7). Among the eight SlLEA genes, expression of five members (SlLEA3, SlLEA10, SlLEA13, SlLEA15 and SlLEA26) was upregulated with salt or drought treatments (Fig. 7), suggesting that they play specific roles under these stress conditions. Under cold treatment, the eight SlLEA genes tested exhibited different responses. Expression of SlLEA3, SlLEA10 and SlLEA22 was downregulated, whereas SlLEA9, SlLEA13 and SlLEA26 expression was upregulated. The expression data also showed that SlLEA18 had similar expression patterns under salt, drought and cold treatments relative to the unstressed control (Fig. 7). In general, SlLEA expression was sensitive to salt, drought and cold treatments.
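Relative mRNA levels of the kind plotted in Fig. 7 are commonly derived from qRT-PCR threshold cycles with the 2^(-ΔΔCT) method of Livak and Schmittgen (2001), which appears in the reference list. The sketch below is a minimal illustration of that calculation, assuming that actin (Table 5) serves as the reference gene and the untreated control (CK) as the calibrator. The function name and all Ct values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def fold_change(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of one replicate by the 2^(-ddCt) method."""
    d_ct_treated = ct_target - ct_ref            # normalise to the reference gene
    d_ct_control = ct_target_ctrl - ct_ref_ctrl  # same normalisation for the control
    return 2.0 ** -(d_ct_treated - d_ct_control)


# Hypothetical Ct values (target gene, reference gene) for three replicates.
control = [(26.0, 18.0), (26.2, 18.1), (25.9, 18.0)]
salt = [(24.0, 18.0), (24.3, 18.2), (23.8, 17.9)]

# Calibrate every treated replicate against the mean Ct values of the control.
mean_ctrl_target = mean(t for t, _ in control)
mean_ctrl_ref = mean(r for _, r in control)
folds = [fold_change(t, r, mean_ctrl_target, mean_ctrl_ref) for t, r in salt]

print(f"mean fold change = {mean(folds):.2f}, SD = {stdev(folds):.2f}")
```

A value above 1 corresponds to upregulation relative to the control and a value below 1 to downregulation; the standard deviation across the three replicates corresponds to the error bars described in the caption of Fig. 7.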

Table 5 Primers used in this study

Primer names   Primer sequences (5′–3′)
SlLEA3-F       TATTCCGGTCATTGGCAACA
SlLEA3-R       AGGAAGCTTATACTCGCCGCTAT
SlLEA9-F       GGCTGCGGGAGGGATAA
SlLEA9-R       GACCCAATCAGTCCAAGAATCAC
SlLEA10-F      GCATCCGTTGAAGAGACTGTTG
SlLEA10-R      TGAAATCAAACAAACCACGATCA
SlLEA13-F      ACGGCCACCACCATCATG
SlLEA13-R      ACGGTACTACCCACCACTGGAT
SlLEA15-F      GGGCTCAATAGCCTTGATTATGA
SlLEA15-R      CAAGCTCTGCACCACCAACA
SlLEA18-F      TCAACTTGATGCTAGGGCTAGACA
SlLEA18-R      TGAGCTTCAAGGCTTTTTCCA
SlLEA22-F      GGCGAAAACCACAACCCTAGT
SlLEA22-R      AAGATCGCGCTGTGTGGAA
SlLEA26-F      TGCGGACAAGTCTAGAGTTGCA
SlLEA26-R      CACAAACTTGCTGGCCTTTCT
Actin-F        TCCCAGCCATGTATGTTGCTAT
Actin-R        TGTACGGCCACTGGCGTATA

Fig. 7 Quantitative RT-PCR analysis of eight selected SlLEA genes under salt, drought, and low-temperature treatments. The relative expression level of each transcript is shown. Error bars indicate standard deviations (SD, n = 3) of independent biological replicates. Panels: SlLEA3, SlLEA9, SlLEA10, SlLEA13, SlLEA15, SlLEA18, SlLEA22 and SlLEA26; y-axis: relative mRNA level; x-axis: CK, Salt, Drought, 4

Discussion

LEA proteins are a family of proteins involved in the response to cellular dehydration in many organisms. To date, LEA genes have been identified and characterized in several species. Nevertheless, little is known about the tomato LEA family. In this study, a total of 27 LEA family members were identified in tomato and divided into 7 groups based on sequence similarity to the Arabidopsis LEAs (Hundertmark and Hincha 2008). Comparative genomic analysis is an effective method for studying gene structures (Guo et al. 2013). In this study, we assessed the conserved motifs of the predicted LEA proteins in Arabidopsis and tomato with the MEME program. The majority of the LEA proteins in the same clade (or group) shared similar motifs, suggesting that these conserved motifs may play crucial roles in clade-specific functions. However, high divergence was found in the structures between the different clades. For example, LEA_3 contains motifs 4 and 19, whereas LEA_5 has motif 10, etc., suggesting the complex nature of the function of LEA proteins in Arabidopsis and tomato. The motif distribution indicated that genes containing the same motifs were likely produced by gene expansion within the same clades or groups. On the other hand, the divergence in motif composition among different clades suggested that they may have evolved from different ancestors with various motif structures.

Segmental duplication, tandem duplication and transposition events are the main causes of gene family expansion (Kong et al. 2007). In our analysis, we found that seven pairs of LEA paralogs in tomato are involved in regional duplication (Fig. 3), and that eight SlLEA genes (SlLEA2-2 cluster; SlLEA7-8-9 cluster; SlLEA23-24-25 cluster) are tandemly duplicated. These data indicate that segmental and tandem duplications are the predominant mechanisms responsible for the expansion of SlLEA gene number. There were at least two large-scale segmental duplication events in the evolutionary process of tomato. One was a chromosomal block duplication that occurred after the divergence of monocots–dicots around 170–235 Mya (Blanc and Wolfe 2004). The other was the recent polyploidy duplication that occurred after the Arabidopsis–tomato split around 90 Mya (Ku et al. 2000). The estimated evolutionary dates of duplicated SlLEA genes (Table 2) showed that the duplication events in 5 of 9 pairs of paralogous SlLEA genes occurred between 21.2 and 30.9 Mya. These results are supported by Blanc and Wolfe (2004), who estimated that secondary large-scale duplications in Plantae occurred within the past 30 million years. The implication is that the expansion of the LEA gene family in tomato might be a consequence of these large-scale genome duplications.

Recombination plays a key role in the generation of genetic diversity. To determine whether homologous recombination shaped the evolution of SlLEA genes, we analyzed 25 SlLEA CDS segments to test whether some of them underwent an intragenic recombination event. We found that thirteen SlLEA genes exhibited evidence of intragenic recombination (P < 0.05 based on 100 permutations). The results indicate that SlLEA genes underwent frequent recombination events; that is, intragenic recombination plays an important role in the evolution of SlLEA genes. Further studies will be required to evaluate the influence of recombination on function and to study the mechanisms underlying SlLEA recombination.

The Ka/Ks ratio measures selection pressure on amino acid substitutions. A Ka/Ks ratio greater than 1 suggests positive selection and a ratio less than 1 suggests purifying selection (Hurst 2002). Amino acids in a protein are usually expected to be under different selective pressures and to have different Ka/Ks ratios (Yang et al. 2000). We also tested

for the presence of positive or negative selection at individual amino acids using the Selecton Server (Stern et al. 2007). The results show that the Ka/Ks ratios of the sequences from different SlLEA groups are significantly different (Table 4), and some positively selected sites are also predicted. These observations suggest that selection promotes amino acid diversity at some residues, whereas other residues are evolving under purifying or neutral selection. These positively selected residues might have changed the protein structure and thus accelerated functional divergence over long periods of evolution.

Since gene expression patterns can provide important clues to gene function, we first examined the expression of SlLEA genes in different tissues and developmental periods using microarray data. The expression profiles reveal spatial variation in the expression of SlLEAs in different organs. LEA genes that are highly expressed in specific organs are likely to play key roles in plant development. SlLEA9 and SlLEA23 were more highly expressed in flower buds, suggesting important roles for these genes in the development of this organ. SlLEA6, SlLEA10, and SlLEA16 showed higher expression levels in breaker fruit, indicating that they might be involved in the growth and development of tomato fruit at the breaker stage. SlLEA4, which has maximum similarity to At5g27980 (Fig. 2), had an expression pattern similar to that of At5g27980. At5g27980 may regulate pollen germination and tube growth because it is highly expressed in mature pollen (Schmid et al. 2005; Wang et al. 2008b). Overall, the tissue-preferential expression displayed by some SlLEA genes could be indicative of their involvement in specific plant tissues and developmental processes.

In addition, expression profiles of the SlLEA family genes under different stress conditions (i.e. salt, drought, and cold) were examined by qRT-PCR in our study. Detailed expression profiles of the SlLEA genes under these stresses are shown in Fig. 7. The data revealed that 5, 6, and 3 SlLEA genes were upregulated under salt, drought, and cold stress conditions, respectively. Moreover, of the eight SlLEA genes examined, five were upregulated under more than one stress condition. For example, SlLEA3, SlLEA10, SlLEA13, SlLEA15, and SlLEA26 were upregulated in both salt and drought treatments. SlLEA gene expression is thus influenced by a broad range of abiotic stresses, indicating that these genes may play different roles in regulating plant responses to various abiotic stresses. The highly or differentially expressed SlLEA genes reported in this study may therefore play a regulatory role in tomato development or stress resistance. More research is needed to determine the functions of the SlLEA genes.

In conclusion, this study provided a comparative genomic analysis addressing phylogeny, chromosomal location, cis-element analysis, selective pressures, and expression profiling of the LEA gene family in tomato. The exon–intron organization and motif compositions of the LEAs are highly conserved in each clade, indicative of their functional conservation. LEA genes were non-randomly distributed across the tomato chromosomes, and a high proportion of the LEA genes might be derived from segmental and tandem duplications. Some abiotic stress-responsive cis-elements were also found in the upstream sequence of most of the SlLEAs. In addition, intragenic recombination played an important role in the evolution of SlLEA genes. Selection analyses suggested that site-specific selection may play an important role in LEA multi-functionalization. Furthermore, comprehensive analysis of the expression profiles provided insights into possible functional divergence among members of the LEA gene family. These data may provide valuable information for future functional investigations of this gene family.

Author contribution JC carried out the gene structure, estimation of gained and lost LEA genes, location and duplication, recombination analysis, site-specific selective pressure analyses, qRT-PCR, and drafted the manuscript. XL identified the LEA gene family and participated in the phylogenetic and cis-regulatory element analysis. JC and XL conceived the study, participated in its design and coordination. All authors read and approved the final manuscript.

Acknowledgments This project is supported by grants from the National Science Foundation of China (No. 31100923), the National Science Foundation of Jiangsu Province (BK2011467), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and Jiangsu University "Youth Backbone Teacher Training Project" (2012-2016) to JC.

Conflict of interest The authors declare that they have no competing interests.

References

Bailey TL, Williams N, Misleh C, Li WW (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34:W369–W373
Battaglia M, Covarrubias AA (2013) Late Embryogenesis Abundant (LEA) proteins in legumes. Front Plant Sci 4:190
Blanc G, Wolfe KH (2004) Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16:1667–1678
Cao J (2012) The pectin lyases in Arabidopsis thaliana: evolution, selection and expression profiles. PLoS One 7:e46944
Cao J, Shi F (2012) Evolution of the RALF gene family in plants: gene duplication and selection patterns. Evol Bioinform Online 8:271–292
Cao J, Huang J, Yang Y, Hu X (2011) Analyses of the oligopeptide transporter gene family in poplar and grape. BMC Genom 12:465
Cao J, Shi F, Liu X, Huang G, Zhou M (2012) Phylogenetic analysis and evolution of aromatic amino acid hydroxylase. FEBS Lett 584:4775–4782
Chen Y, Cao J (2014) Comparative genomic analysis of the Sm gene family in rice and maize. Gene 539:238–249
Chen K, Durand D, Farach-Colton M (2000) NOTUNG: a program for dating gene duplications and optimizing gene family trees. J Comput Biol 7:429–447
Chen Y, Hao X, Cao J (2014) Small auxin upregulated RNA (SAUR) gene family in maize: identification, evolution, and its phylogenetic comparison with Arabidopsis, rice and sorghum. J Integr Plant Biol 56:133–150
Comeron JM (1999) K-Estimator: calculation of the number of nucleotide substitutions per site and the confidence intervals. Bioinformatics 15:763–764
Dosztányi Z, Csizmok V, Tompa P, Simon I (2005) IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21:3433–3434
Duan J, Cai W (2012) OsLEA3-2, an abiotic stress induced gene of rice plays a key role in salt and drought tolerance. PLoS One 7:e45117
Dure L III, Greenway SC, Galau GA (1981) Developmental biochemistry of cottonseed embryogenesis and germination: changing messenger ribonucleic acid populations as shown by in vitro and in vivo protein synthesis. Biochemistry 20:4162–4168
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797
Garay-Arroyo A, Colmenero-Flores JM, Garciarrubio A, Covarrubias AA (2000) Highly hydrophilic proteins in prokaryotes and eukaryotes are common during conditions of water deficit. J Biol Chem 275:5668–5674
Goyal K, Tisi L, Basran A, Browne J, Burnell A, Zurdo J, Tunnacliffe A (2003) Transition from natively unfolded to folded state induced by desiccation in an anhydrobiotic nematode protein. J Biol Chem 278:12977–12984
Goyal K, Walton LJ, Tunnacliffe A (2005) LEA proteins prevent protein aggregation due to water stress. Biochem J 388:151–157
Grelet J, Benamar A, Teyssier E, Avelange-Macherel M-H, Grunwald D, Macherel D (2005) Identification in pea seed mitochondria of a late-embryogenesis abundant protein able to protect enzymes from drying. Plant Physiol 137:157–167
Guo R, Xu X, Carole B, Li X, Gao M, Zheng Y, Wang X (2013) Genome-wide identification, evolutionary and expression analysis of the aspartic protease gene superfamily in grape. BMC Genom 14:554
Hand SC, Menze MA, Toner M, Boswell L, Moore D (2011) LEA proteins during water stress: not just for plants anymore. Annu Rev Physiol 73:115–134
Hara M, Terashima S, Kuboi T (2001) Characterization and cryoprotective activity of cold-responsive dehydrin from Citrus unshiu. J Plant Physiol 158:1333–1339
Hara M, Fujinaga M, Kuboi T (2005) Metal binding by citrus dehydrin with histidine-rich domains. J Exp Bot 56:2695–2703
Higo K, Ugawa Y, Iwamoto M, Korenaga T (1999) Plant cis-acting regulatory DNA elements (PLACE) database. Nucleic Acids Res 27:297–300
Honjoh K, Oda Y, Takata R, Miyamoto T, Hatano S (1999) Introduction of the hiC6 gene, which encodes a homologue of a late embryogenesis abundant (LEA) protein, enhances freezing tolerance of yeast. J Plant Physiol 155:509–512
Houde M, Dallaire S, N’Dong D, Sarhan F (2004) Overexpression of the acidic dehydrin WCOR410 improves freezing tolerance in transgenic strawberry leaves. Plant Biotechnol J 2:381–387
Hundertmark M, Hincha DK (2008) LEA (late embryogenesis abundant) proteins and their encoding genes in Arabidopsis thaliana. BMC Genom 9:118
Hurst LD (2002) The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet 18:486–487
Ingram J, Bartels D (1996) The molecular basis of dehydration tolerance in plants. Annu Rev Plant Physiol Plant Mol Biol 47:377–403
Kong H, Landherr LL, Frohlich MW, Leebens-Mack J, Ma H, dePamphilis CW (2007) Patterns of gene duplication in the plant SKP1 gene family in angiosperms: evidence for multiple mechanisms of rapid gene birth. Plant J 50:873–885
Kosová K, Vítámvás P, Prášil IT (2007) The role of dehydrins in plant response to cold. Biol Plant 51:601–617
Ku HM, Vision T, Liu J, Tanksley SD (2000) Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny. Proc Natl Acad Sci USA 97:9121–9126
Liu Y, Zheng Y (2005) PM2, a group 3 LEA protein from soybean, and its 22-mer repeating region confer salt tolerance in Escherichia coli. Biochem Biophys Res Commun 331:325–332
Liu Y, Zheng Y, Zhang Y, Wang W, Li R (2010) Soybean PM2 protein (LEA3) confers the tolerance of Escherichia coli and stabilization of enzyme activity under diverse stresses. Curr Microbiol 60:373–378
Liu G, Xu H, Zhang L, Zheng Y (2011) Fe binding properties of two soybean (Glycine max L.) LEA4 proteins associated with antioxidant activity. Plant Cell Physiol 52:994–1002
Liu Y, Wang L, Xing X, Sun L, Pan J, Kong X, Zhang M, Li D (2013) ZmLEA3, a multifunctional group 3 LEA protein from maize (Zea mays L.), is involved in biotic and abiotic stresses. Plant Cell Physiol 54:944–959
Livak KJ, Schmittgen TD (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2^(-ΔΔCT) method. Methods 25:402–408
Manfre AJ, Lanni LM, Marcotte WRJ (2006) The Arabidopsis group 1 LATE EMBRYOGENESIS ABUNDANT protein ATEM6 is required for normal seed development. Plant Physiol 140:140–149
Martin D, Rybicki E (2000) RDP: detection of recombination amongst aligned sequences. Bioinformatics 16:562–563
Martin DP, Lemey P, Lott M, Moulton V, Posada D, Lefeuvre P (2010) RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics 26:2462–2463
Oliveira E, Amara I, Bellido D, Odena MA, Dominguez E, Pagès M, Goday A (2007) LC-MSMS identification of Arabidopsis thaliana heat-stable seed proteins: enriching for LEA-type proteins by acid treatment. J Mass Spectrom 42:1485–1495
Olvera-Carrillo Y, Campos F, Reyes JL, Garciarrubio A, Covarrubias AA (2010) Functional analysis of the group 4 late embryogenesis abundant proteins reveals their relevance in the adaptive response during water deficit in Arabidopsis. Plant Physiol 154:373–390
Olvera-Carrillo Y, Luis Reyes J, Covarrubias AA (2011) Late embryogenesis abundant proteins: versatile players in the plant adaptation to water limiting environments. Plant Signal Behav 6:586–589
Padidam M, Sawyer S, Fauquet CM (1999) Possible emergence of new geminiviruses by frequent recombination. Virology 265:218–225
Posada D, Crandall KA (2001) Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci USA 98:13757–13762
Prince VE, Pickett FB (2002) Splitting pairs: the diverging fates of duplicated genes. Nat Rev Genet 3:827–837
Puhakainen T, Hess MW, Mäkelä P, Svensson J, Heino P, Palva ET (2004) Overexpression of multiple dehydrin genes enhances tolerance to freezing stress in Arabidopsis. Plant Mol Biol 54:743–753
Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer EL, Eddy SR, Bateman A, Finn RD (2012) The Pfam protein families database. Nucleic Acids Res 40:D290–D301
Reyes JL, Campos F, Wei H, Arora R, Yang Y, Karlson DT, Covarrubias AA (2008) Functional dissection of hydrophilins during in vitro freeze protection. Plant Cell Environ 31:1781–1790
Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Schölkopf B, Weigel D, Lohmann JU (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37:501–506
Stern A, Doron-Faigenboim A, Erez E, Martz E, Bacharach E, Pupko T (2007) Selecton 2007: advanced models for detecting positive and purifying selection using a Bayesian inference approach. Nucleic Acids Res 35:W506–W511
Sturn A, Quackenbush J, Trajanoski Z (2002) Genesis: cluster analysis of microarray data. Bioinformatics 18:207–208
Swire-Clark GA, Marcotte WR (1999) The wheat LEA protein Em functions as an osmoprotective molecule in Saccharomyces cerevisiae. Plant Mol Biol 39:117–128
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28:2731–2739
Tolleter D, Hincha DK, Macherel D (2010) A mitochondrial late embryogenesis abundant protein stabilizes model membranes in the dry state. Biochim Biophys Acta 1798:1926–1933
Tomato Genome Consortium (2012) The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485:635–641
Tunnacliffe A, Wise MJ (2007) The continuing conundrum of LEA proteins. Naturwissenschaften 94:791–812
Wang Y, Diehl A, Wu F, Vrebalov J, Giovannoni J, Siepel A, Tanksley SD (2008a) Sequencing and comparative analysis of a conserved syntenic segment in the Solanaceae. Genetics 180:391–408
Wang Y, Zhang WZ, Song LF, Zou JJ, Su Z, Wu WH (2008b) Transcriptome analyses show changes in gene expression to accompany pollen germination and tube growth in Arabidopsis. Plant Physiol 148:1201–1211
Wolkers WF, McCready S, Brandt WF, Lindsey GG, Hoekstra FA (2001) Isolation and characterization of a D-7 LEA protein from pollen that stabilizes glasses in vitro. Biochim Biophys Acta 1544:196–206
Yang Z, Nielsen R, Goldman N, Pedersen AM (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431–449
Yu CS, Lin CJ, Hwang JK (2004) Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Sci 13:1402–1406
