Gene 551 (2014) 92–102

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Sequence determinants of prokaryotic gene expression level under heat stress Heng Xiong, Yi Yang, Xiao-Pan Hu, Yi-Ming He, Bin-Guang Ma ⁎ Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China

a r t i c l e

i n f o

Article history: Received 19 April 2014 Accepted 25 August 2014 Available online 26 August 2014 Keywords: Gene regulation Temperature adaptation Sequence analysis Transcriptome Proteome

a b s t r a c t Prokaryotic gene expression is environment-dependent and temperature plays an important role in shaping the gene expression profile. Revealing the regulation mechanisms of gene expression pertaining to temperature has attracted tremendous efforts in recent years particularly owning to the yielding of transcriptome and proteome data by high-throughput techniques. However, most of the previous works concentrated on the characterization of the gene expression profile of individual organism and little effort has been made to disclose the commonality among organisms, especially for the gene sequence features. In this report, we collected the transcriptome and proteome data measured under heat stress condition from recently published literature and studied the sequence determinants for the expression level of heat-responsive genes on multiple layers. Our results showed that there indeed exist commonness and consistent patterns of the sequence features among organisms for the differentially expressed genes under heat stress condition. Some features are attributed to the requirement of thermostability while some are dominated by gene function. The revealed sequence determinants of bacterial gene expression level under heat stress complement the knowledge about the regulation factors of prokaryotic gene expression responding to the change of environmental conditions. Furthermore, comparisons to thermophilic adaption have been performed to reveal the similarity and dissimilarity of the sequence determinants for the response to heat stress and for the adaption to high habitat temperature, which elucidates the complex landscape of gene expression related to the same physical factor of temperature. © 2014 Elsevier B.V. All rights reserved.

1. Introduction Prokaryotes are widely distributed on the earth due to their prominent adaptation capability to environments. Microbes can be found in cold (temperature b 10 °C) and very hot (temperature N 100 °C) environments that are not suitable or even lethal for human and other eukaryotes to live. The molecular mechanisms for this superior capability have attracted intensive research efforts in recent years (Berezovsky and Shakhnovich, 2005; Hickey and Singer, 2004; Kumar and Nussinov, 2001; Ma et al., 2010; Mallik and Kundu, 2013; Mamonova et al., 2013; Sabath et al., 2013; Sterner and Liebl, 2001; Taylor and Vaisman, 2010; van Wolferen et al., 2013; Zeldovich et al., 2007). One important aspect of the prokaryotic molecular adaptation is about gene expression. Because gene expression is environment-dependent, how the gene expression level is adapted to temperature is intriguing. The prokaryotes living under moderate temperature condition are constantly used in the studies of temperature adaptation by applying an environmental stress and checking the gene expression profiles.

Abbreviations: DEGs, differentially expressed genes; CAI, Codon Adaptation Index; OGT, Optimal Growth Temperature; COG, Clusters Of Orthologous Groups of proteins. ⁎ Corresponding author. E-mail address: [email protected] (B.-G. Ma).

http://dx.doi.org/10.1016/j.gene.2014.08.049 0378-1119/© 2014 Elsevier B.V. All rights reserved.

Gene expression is a fairly complex process that is regulated at multiple levels by different mechanisms. According to the central dogma of molecular biology (Crick, 1970; Koonin, 2012), gene expression is a flow of information from genetically encoded DNA to the messenger RNA to polypeptide and finally to the folded protein. Both the transcription and translation processes are involved in the regulation of gene expression. Affected by the external environments, the expression patterns of mRNA and proteins are determined by the intrinsic genetic information encoded in the DNA sequences of a bacterium genome. The sequencing technology has helped in accumulation of thousands prokaryotic genomes and the gradual maturity and declined cost of high-throughput approaches for the detection of gene transcription and translation levels have resulted in numerous transcriptomes and proteomes of bacteria under heat stress conditions (Barreiro et al., 2009; Chhabra et al., 2006; Fleury et al., 2009; Gunasekera et al., 2008; Han et al., 2007; Jain et al., 2011; Koide et al., 2006; Lenco et al., 2009; Luders et al., 2009; Ribeiro et al., 2011; Ternan et al., 2012; van der Veen et al., 2007). Although the host organisms of these transcriptomes and proteomes are diverse in life styles, the common building blocks for the macromolecules (namely nucleotides and amino acids) and the same physical factor of temperature may confer commonality of the gene regulation mechanisms for adaptation to heat stress. The previous studies of the adaptation and regulation mechanisms of gene expression were concentrated on the

H. Xiong et al. / Gene 551 (2014) 92–102

93

regulated genes. Codon usage and Codon Adaptation Index (CAI) values were calculated by using the ‘cusp’ and ‘cai’ program in the EMBOSS software package with ribosomal proteins as the reference of highly expressed proteins following the standard CAI procedure (Sharp and Li, 1987). The protein secondary structures were predicted by locally running the PSIpred program (version 3.3) (McGuffin et al., 2000) on a Linux platform with default parameters. In addition to the differentially expressed proteins, the same amount of proteins whose expression levels do not vary in the heat stress condition were randomly sampled for each proteome and used as contrast. The protein secondary structure contents were calculated as the fractions of C (coil), H (helix) and E (strand). The protein folding rates were predicted by the seqRate server (Lin et al., 2010) with “Fold Type” set as ‘Unknown’. The protein aggregation propensity was predicted by the Zyggregator online server (Tartaglia and Vendruscolo, 2010). In addition to the differentially expressed proteins, the same amount of proteins whose expression levels do not vary under the heat stress condition were randomly sampled from each proteome and used as contrast.

characterization of the expression profiles of individual organisms and little effort has been devoted to the investigation of the commonness of the differentially expressed genes and their regulation mechanisms from the viewpoint of intrinsic sequence features with respect to the consequences of high temperature. In this work, we try to answer the following questions pertaining to heat stress condition. Do the differentially expressed genes (DEGs) have peculiarity in their nucleotide and protein sequence features? Do upregulated and down-regulated genes under heat stress have different sequence features? Are these features common or conserved among different organisms? Meanwhile, we want to know the similarity and difference of the sequence features of these DEGs in response to heat stress and those of the genes in thermophilic organisms who live under high temperature environment perennially. We gathered the transcription and translation profiles under heat stress conditions reported in recent years and examined the sequence features along the gene expression information flow to reveal how the genotypic sequence determinants are interpreted into phenotypic responses to the elevated living temperatures. 2. Materials and Methods 2.1. Transcriptome and Proteome Data Under Heat Stress

2.4. Comparison and Statistics

The transcriptome data for 7 bacteria were collected from the literature and the basic information for this data set was listed in Table 1. The proteome data for 6 bacteria were collected from the literature and the basic information for this data set was presented in Table 2. The DNA and protein sequences were taken from NCBI (RefSeq).

For the examined features in the 7 transcriptomes and 6 proteomes, comparisons were performed between “up-regulated” and “downregulated” (up-down), “up-regulated” and “overall” (up-all), “downregulated” and “overall” (down-all) by using Wilcoxon rank sum tests, respectively, and the corresponding p-values were listed in Tables S1 and S2. Similar statistical comparisons were also made for the content sum of hydrophobic amino acids in the 6 thermophilic bacteria (Fig. S4) and the corresponding p-values were listed in Table S4. The significance of the difference was considered at three levels of p b 0.1, p b 0.05 and p b 0.001 that were marked on the figures with “*”, “**” and “***”, respectively.

2.2. Distribution of Differentially Expressed Genes in Operons The locations of the differentially expressed genes in the genome were determined according to the RefSeq annotation (Pruitt et al., 2012). The operon information was taken from the DOOR database (Mao et al., 2009). The distance between the starting position of a gene and the middle point of its corresponding operon was used to indicate the location: d = Si − M, where Si is the coordinate of the start of the considered gene and M is the coordinate of the middle point of the operon where the gene i is located. If d is negative, then the gene i is located at the first-half in the operon; otherwise, the second-half.

3. Results and Discussion By mining the literature, we have collected the expression data of 7 transcriptomes (Table 1) and 6 proteomes (Table 2) for bacteria under heat stress condition. The sequence features of the DEGs were investigated at multiple levels (Fig. 1) including gene length and distribution in operons, the nucleotide composition in mRNA sequence, the features of mRNA secondary structure, the codon usage in the coding sequences, the amino acid composition in the protein sequences, the contents of protein secondary structures and the predicted folding rates and aggregation propensity of the protein sequences. We examined the differences between the up-regulated and down-regulated genes as well as the differences between the DEGs and the overall genes of the whole genome by using Wilcoxon rank sum tests (p-values listed in Tables S1, S2). Comparison to thermophilic adaption was performed to reveal the similarity and dissimilarity of the sequence determinants for the response to heat stress and for the adaption to high habitat temperature.

2.3. Sequence Feature Calculation and Prediction The mRNA secondary structures were predicted by the RNAfold program in the Vienna RNA secondary structure package (version 1.8.5) (Arsene et al., 2000) with the folding temperature as the temperature of heat stress reported in the original literature and other parameters as the default. Codon frequencies in the mRNA sequence were counted and compared with the genomic frequencies taken from genomic tRNA database (Lowe and Eddy, 1997) to identify the deviation from the global genomic frequencies and the difference between up-regulated and downTable 1 Basic information for the 7 transcriptomes collected from literature. Species

Abbr.

Stress

Clostridium difficile 630 Escherichia coli K-12 Corynebacterium glutamicum ATCC 13032 Staphylococcus aureus ISP794 (NCTC8325) Listeria monocytogenes EGD-e Yersinia pestis strain 201 Xylella fastidiosa 9a5c

CD EC CG SA LM YP XF

Heat (41 Heat (43 Heat (40 Heat (43 Heat (48 Heat (45 Heat (40

a

The number of differentially expressed genes reported in the original literature.

°C) °C) °C) °C) °C) °C) °C)

Control

DEG no.a

Reference

37 30 30 37 37 37 29

337 221 778 89 147 573 438

Ternan et al. (2012) Gunasekera et al. (2008) Barreiro et al. (2009) Fleury et al. (2009) van der Veen et al. (2007) Han et al. (2007) Koide et al. (2006)

°C °C °C °C °C °C °C

94

H. Xiong et al. / Gene 551 (2014) 92–102

Table 2 Basic information for the 6 proteomes from literature. Species

Abbr.

Stress

Control

DEG No.a

Reference

Clostridium difficile 630 Escherichia coli K-12 Francisella tularensis ssp. holarctica live vaccine Francisella tularensis ssp. tularensis SCHU S4 Acidithiobacillus ferrooxidans Desulfovibrio vulgaris Hildenborough

CD EC FL FT AF DH

Heat (41 °C) Heat (47.5 °C) Heat (42 °C) Heat (42 °C) Heat (40 °C) Heat (50 °C)

37 37 37 37 30 37

49 28 72 32 30 32

Jain et al. (2011) Luders et al. (2009) Lenco et al. (2009) Lenco et al. (2009) Ribeiro et al. (2011) Chhabra et al. (2006)

a

°C °C °C °C °C °C

The number of differentially expressed genes (proteins) reported in the original literature.

3.1. Gene Length and Location in Operons For the DEGs under heat stress condition, we firstly checked their sequence length and made comparison between up-regulated and down regulated genes. As shown in Fig. S1, no consistent trend can be found among organisms at either transcription level or translation level. Meanwhile, we noticed that the sequence lengths of the DEGs seem longer than the genomic overall, especially for the differentially expressed proteins in the 6 proteomes (Fig. S1B). According to previous finding, the genes are shorter in (hyper)thermophiles than mesophiles (Bolshoy and Tatarinova, 2012; Das and Gerstein, 2000) and the evolution pressure tends to make highly expressed genes shorter (Moriyama and Powell, 1998), but the up-regulated genes are not shorter than down-regulated genes under heat stress condition and actually the differentially expressed proteins are longer than the proteomic average. For the DEGs in the 7 transcriptomes, we investigated their positions relative to the corresponding operons. We found that the up-regulated genes within long operons tend to appear at the first-half of their corresponding operons (Fig. 2), while for the down-regulated genes, it seems opposite but not as consistent (Fig. S2). It has been noticed recently that in Mycoplasma pneumoniae about half of the operons consisting of multiple genes show staircase-like decaying expression along them (Guell et al., 2009) and that the phototrophic bacterium Rhodobacter sphaeroides show polar expression patterns in response to singlet oxygen stress (Berghoff et al.,

2013). Our results confirmed this regulation mechanism in terms of operon direction and extended to more organisms under heat stress condition. Our results also showed that this effect is pronounced in the long operons, which is consistent with a recent report on the motility development in Bacillus subtilis where the expression was enhanced when a sigD gene is moved forward in the fla/che operon (Cozy and Kearns, 2010). These findings might mean that the genes located on the 5' end of the operon are of advantage to be transcribed probably due to their proximity to the promoters.

3.2. mRNA Sequence Features Inspired by a previous work where it was found that the purine content and successive purine content in the thermophilic bacteria are higher than those in the mesophilic bacteria (Zeldovich et al., 2007), we examined the purine contents (A% + G%) and successive purine contents (AG% + GA%) for the 7 transcriptomes. As shown in Fig. 3, the average purine contents and successive purine contents in the transcriptionally up-regulated genes are higher than those in the downregulated genes and for most of the transcriptomes the genomic (successive) purine contents are in the middle (Fig. 3A, B). Similarly, for the 6 proteomes, purine and successive purine contents are high in the mRNA of the up-regulated proteins and for most of the proteomes,

Fig. 1. Schematic map for this work. Both transcriptomes and proteomes under heat stress condition were used in the study. Multiple features were examined for the DNA, mRNA and protein sequences and comparisons were made between the up- and down-regulated genes, the differentially expressed genes and the genomic average, as well as the similarity and difference of sequence features between heat stress response and thermophilic adaptation.

H. Xiong et al. / Gene 551 (2014) 92–102

95

Fig. 2. Illustration for the distribution of up-regulated genes in operons. Each point in the graph represents a gene; the abscissa axis is the operon length; the vertical axis is the distance from the starting point of a gene and the middle point of its operon. Sub-graphs (A), (B), (C), (D), (E), (F), and (G) correspond to the 7 bacterial transcriptomes, respectively. The sub-graph (H) is the comparison of the number of the up-regulated genes distributed in the first-half and second-half positions in the long operons (length N genomic average). It can be found that the up-regulated genes are preferentially located in the first-half of the operon.

the average purine and successive purine contents of all the CDS regions are at the lowest level (Fig. 3C, D). Since mRNA is transcribed from DNA, the mRNA sequence features reflect on one hand the features of DNA, and on the other hand, the nucleotide composition also affects the stability of mRNA itself. As reported previously, the fact that thermophilic prokaryotes have enriched purine contents in the mRNA sequences can be attributed to the advantages of the speedy transcription rate, the lessening of undesired RNA– RNA interactions, and the stability of mRNA against spontaneous hydrolysis (Paz et al., 2004). The enriched successive purine contents can stabilize the mRNA spatial structure by stacking effects (Friedman and Honig, 1995). Our results showed the similarity of up-regulated genes at both transcription and translation level to the genes sourcing from thermophilic bacteria, implying that temperature is a unique physical factor that can have impact on both the genes natively living at high temperature and the genes from mesophilic organisms that are upregulated under heat stress condition. The similarity of the mRNA sequence features in the thermophilic genes and the genes up-regulated at heat stress condition suggests that the purine content gives the same advantage to the mRNA structure to adapt to the same physical factor temperature for keeping molecular stability and function. 3.3. Features of the Predicted mRNA Secondary Structures After transcribing from DNA, the mRNA molecules exist in particular secondary structures whose features such as the fraction of stem region, the G + C content in the stem region, the purine content in the loop region and the overall folding free energy could affect their structural thermostability and the easiness of opening the stem regions for protein translation (Katz and Burge, 2003). Firstly, we checked the mRNA secondary structure features for the 7 transcriptomes. As shown in Fig. 4, the percentage of stem region is higher in down-regulated genes (Fig. 4A) and the C + G content in the stem region is higher in the

down-regulated genes for most (5/7) of the studied organisms (Fig. 4B). The higher percentage of stem region and the higher G + C content in the stem region of the down-regulated genes imply that the transcriptionally down-regulated genes are more thermostable than up-regulated ones. The comparison of folding free energy (normalized by sequence length) indeed showed that the down-regulated genes have a lower folding energy (Fig. 4D), meaning a better thermostability in their secondary structures that may make it difficult to open the stem region for protein translation and consistent with the relationship between a protein's translation level and the difficulty of opening its corresponding mRNA structure reported in the previous work (Katz and Burge, 2003). However, the A + G content in the loop regions is higher in the up-regulated genes (Fig. 4C), indicating that the secondary structure of the up-regulated genes might be stabilized by the purine-stacking effects. For the protein translation level, the results were shown in Fig. 5. In contrast to the situation at transcription level, few significant differences could be found for the percentage of the stem region and the G + C content in the stem region between the up-regulated proteins and the down-regulated proteins (Fig. 5A, B). The folding free energy also shows no consistent difference between the up- and down-regulated proteins (Fig. 5D). Comparison of the A + G content in the loop region showed that the up-regulated proteins have higher values than genomic average for most organisms (Fig. 5C). This result indicates that the secondary structures of mRNA molecules of the upregulated proteins are mainly stabilized by the purine stacking effects in the single-strand loop region, which may decrease its degradation speed and increase its half-life, so as to benefit the translation and accumulation of the heat-responsive proteins and thus benefit the host for adaption to high temperature stress. Specially, the bacteria Clostridium difficile 630 has both transcription and translation data measured at the same heat stress condition (Ternan et al., 2012). The analysis for its expression showed that

96

H. Xiong et al. / Gene 551 (2014) 92–102

Fig. 3. The purine content (A% + G%) and successive purine content (ApG% + GpA%). (A), (B) for the 7 transcriptomes; (C), (D) for the 6 proteomes. Orange, blue and gray colors indicate the up-regulated, down-regulated and the genomic average, respectively. The significance levels of the differences were marked by “*” for p b 0.1, “**” for p b 0.05 and “***” for p b 0.001, respectively.

the genes up-regulated at transcription level do not have corresponding up-regulation at translation level, and even have opposite direction of regulation for some proteins. A certain degree of inconsistency of gene expression alterations between transcription level and translation level is a well-known phenomenon (Vogel and Marcotte, 2012) and is due to on one hand the experimental technical limitations and on the other hand the biological consequences. Except for the reported factors of post-transcriptional (and post-translational) modifications (Bevilacqua et al., 2003; Fukuda et al., 2006), it could also have contribution from the mRNA secondary structures. As we observed, for some genes (like CD1056, CD1054), though their transcription levels are up-regulated under heat stress condition, the stability of their mRNA structures is declined and thus the degradation is accelerated, resulting in lower translation levels. On the contrary, for some genes (like CD0900, CD2120), although their transcription level is down-regulated under heat stress condition, their mRNA structures are stabilized, which reduces the degradation rate and elongates the mRNA half-life and thus results in a constant or even higher level of translation. 3.4. The Codon Usage and Gene Expression Level Indicator We examined the codon usage in the up- and down-regulated genes under heat stress condition for the 6 proteomes and made comparison between the two classes of genes as well as between the DEGs with the genomic overall. We found that the codon usage in the DEGs indeed have different patterns compared with the genomic average (Table S3).

Analysis showed that the DEGs apt to use codons that have high abundance of the cognate tRNA, especially for the amino acids who have only two alternative codons, similar to the codon usage patterns of the ribosome proteins. The codon usage patterns of the differentially expressed proteins are characterized in two aspects: (i) they apt to adopt the codons able to rigorously pair with the cognate tRNA, especially for the amino acids with two or three alternative codons and more pronounced in the up-regulated proteins; (ii) the third codon position is more frequently cytosine if the first two codon positions are successive purines (Table S3). As previous works have reported that there is positive correlation between the proportion of highly expressed proteins and the Optimal Growth Temperature (OGT) of a bacterium and that the proteins with better stability incline to be highly expressed (Wang et al., 2008; Xu and Ma, 2007), we speculated that those up-regulated genes under heat stress condition might have a higher level of expression than the overall gene expression level of the whole genome. By calculating CAI (Codon Adaptation Index), an indicator of gene expression level, we found that the up-regulated genes under heat stress indeed have a higher average CAI value than the genomic overall (Fig. 6). Since the CAI values reflect the protein translation efficiency (Tuller et al., 2010), the higher CAI values of the differentially expressed proteins indicate the higher level of expression, especially for the up-regulated proteins. The fast translation of the up-regulated proteins will benefit the microbes to respond rapidly to the elevation of the environmental temperature under heat stress.

H. Xiong et al. / Gene 551 (2014) 92–102

97

Fig. 4. The mRNA secondary structure features for the 7 transcriptomes. (A) The percentage of the stem region; (B) the G + C content in the stem region; (C) the A + G content in the loop region; (D) the folding energy normalized by sequence length. Orange, blue and gray colors indicate the up-regulated, down-regulated and the genomic average, respectively. The significance levels of the differences were marked by “*” for p b 0.1, “**” for p b 0.05 and “***” for p b 0.001, respectively.

An interesting comparison was made between the DEGs and the default highly expressed ribosomal genes. On one hand, the codon usage pattern of the DEGs is similar to the ribosomal proteins: the Pearson correlation coefficients (R) between the codon usage pattern (indicated by a 59 dimensional vector, see Table S3) of the DEGs (both up- and downregulated) and that of the ribosomal genes are larger than those between the genomic overall and ribosomal genes (Table 3), which make the DEGs have high translation efficiency; on the other hand, the average mRNA folding energy of the DEGs are smaller than that of the ribosomal genes (Table 3), which is partially due to the length difference (data not shown), meaning that the mRNA secondary structures of the DEGs are more thermostable than the ribosomal genes. The similarity of codon usage pattern between DEGs and the ribosome genes offers DEGs higher translation efficiency while the difference in thermostability makes DEGs have longer half-lives so as to benefit protein accumulation. These two aspects help DEGs better fulfill the regulation function under a heat stress condition. 3.5. Amino Acid Sequence Features Amino acids themselves have different thermostability and previous works have given the following descending order Gly ~ Ala N Val ~ Leu N Phe N Ile N Tyr N Lys N His N Met N Thr N Ser N Trp N Glu N Asp N Gln ~ Asn ~ Arg ~ Cys (Wang et al., 2012). We checked the contents of the 2 most thermostable residues A, G and the 4 least thermostable residues N, C, Q and R in the up-regulated and down-regulated

proteins. As shown in Fig. 7, it can be found that the 2 most thermostable amino acids are enriched in the up-regulated proteins for most (5/6) of the studied organisms compared with the down-regulated proteins (Fig. 7A) (though the differences are not quite significant due to the small dataset), while for the 4 least thermostable amino acids (Fig. 7B) and the other 14 amino acids with middle-level thermostability (Fig. 7C), no universal trend can be found. Interestingly, the thermostable amino acids A and G show a trend of over-representation in both the up-regulated and down-regulated proteins compared with the proteomic average. For the up-regulated proteins, the increase of the contents of thermostable residues will stabilize the protein structures under heat stress so as to keep the protein activities at an elevated temperature. With reference to the Kyte and Doolittle's hydropathy scale (Kyte and Doolittle, 1982), the amino acids were classified into charged (D, E, K, R), hydrophobic (C, F, I, L, M, P, V, W) and polar (A, G, H, N, Q, S, T, Y) ones, and the contents of the amino acids in each class were examined. As shown in Fig. 7, the content sum of the charged residues is higher in the up-regulated proteins for most (5/6) of the studied organisms than the down-regulated proteins (Fig. 7D), while the content sum of the hydrophobic residues is higher in down-regulated proteins than the up-regulated proteins (Fig. 7E); in comparison with the proteomic average, both the up-regulated and down-regulated proteins have over-represented charged residues and under-represented hydrophobic residues (Fig. 7D, E); the polar residues show no consistent trend (Fig. 7F). The about results implicate the adaptation strategies to high

98

H. Xiong et al. / Gene 551 (2014) 92–102

Fig. 5. The mRNA secondary structure features for the 6 proteomes. (A) The percentage of the stem region; (B) the GC content in the stem region; (C) the A + G content in the loop region; (D) the folding energy normalized by sequence length. Orange, blue and gray colors indicate the up-regulated, down-regulated and the genomic average, respectively. The significance levels of the differences were marked by “*” for p b 0.1, “**” for p b 0.05 and “***” for p b 0.001, respectively.

Fig. 6. Comparison of CAI (Codon Adaptation Index) values for the genes from the 6 proteomes. Orange, blue and gray colors indicate the up-regulated, down-regulated and genomic average, respectively. The significance levels of the differences were marked by “**” for p b 0.05 and “***” for p b 0.001, respectively.

temperature by the adjustment of amino acid contents. Under heat stress condition, the higher contents of charged and thermostable residues will increase the thermostability of the up-regulated proteins and help in keeping their activities in front of high temperature for better regulation function, which could be regarded as a physical mechanism for gene expression regulation. It is interesting to compare the residue compositions in thermophilic proteins and the heat stress responsive proteins since they are all related to temperature adaptation. Previous work has revealed that thermophiles apt to use charged and hydrophobic residues to stabilize their protein structures by the formation of ion pairs (salt bridge) and higher packing density (Taylor and Vaisman, 2010; Zeldovich et al., 2007). In our work, we found that the differentially expressed proteins (both up- and down-regulated) have higher charged residues than the proteomic average and that the up-regulated proteins have higher charged residues than the down-regulated proteins (Fig. 7D), which shows similar trend to the situation in thermophilic proteins compared with mesophilic ones in the usage of charged residues for temperature adaptation. However, there are also differences between the heat stress responsive proteins and the thermophilic proteins in the usage of hydrophobic residues: the differentially expressed proteins (both upand down-regulated) have less hydrophobic residues than the proteomic average and the up-regulated proteins have less hydrophobic residues than the down-regulated proteins (Fig. 7E) while in contrast

H. Xiong et al. / Gene 551 (2014) 92–102 Table 3 The correlation coefficient (R) and mRNA folding energy (E) of the DEGs in the 6 proteomes. Species

EC CD FL FT AF DH

Correlation (R)

a

Energy (E)

b

Up

Down

Overall

Up

Down

Ribosome

0.89 0.85 0.90 0.94 0.91 0.94

0.91 0.80 0.89 0.86 0.85 0.95

0.56 0.72 0.87 0.87 0.84 0.88

−311.08 −250.94 −248.84 −244.39 −493.57 −332.94

−384.19 −200.48 −264.54 −198.85 −280.36 −370.95

−97.37 −68.13 −82.69 −84.00 −141.09 −114.67

a The Pearson correlation coefficients (R) of codon usage frequencies between the upregulated, down-regulated and genomic overall (genes) and the ribosomal genes from the 6 studied proteomes, respectively. b The predicted mRNA folding energy (average of the minimum) for the up-regulated, down-regulated and the ribosomal genes from the 6 studied proteomes, respectively.

thermophilic proteins have more hydrophobic residues compared with mesophilic proteins (Berezovsky et al., 2007; Ma et al., 2010; Zeldovich et al., 2007). How to explain these differences? We should notice that the comparisons of residue composition between thermophilic and mesophilic proteins in previous reports were performed on the whole proteome level while the residue compositions

99

under the heat stress condition were only for a special subset of proteins differentially expressed in response to temperature elevation that could have particular functions. Could the function requirements account for the different usage of hydrophobic residues? We checked the protein functions for the differentially expressed proteins and found that the functions are indeed skew to some particular COG functional groups, mainly O (posttranslational modification, protein turnover, chaperones), E (amino acid transport and metabolism) and C (energy production and conversion) (Fig. S3), which implies that the peculiarity of these functions might need lower contents of hydrophobic residues than the proteomic average rather than the requirement for high temperature adaptation. To see if this is the case, we examined the residue compositions for the homologues of these differentially expressed proteins taken from 6 thermophilic bacteria with OGT above 70 °C and found that indeed the hydrophobic residues are scarcer in these homologues than the proteomic average in the 6 bacteria (Fig. S4, Table S4). Therefore, we conclude that the lower contents of hydrophobic residues in the differentially expressed proteins are mainly determined by the protein function. As for why the up-regulated proteins have lower contents of hydrophobic residues than the down-regulated proteins, it is still not clear, but we noticed that more than half (57%) of the differentially expressed proteins are up-regulated ones and thus the up-regulated proteins

Fig. 7. Comparison of amino acid contents for the genes from the 6 proteomes. (A) The content sum of the 2 most thermostable amino acids alanine and glycine; (B) the content sum of the 4 least thermostable amino acids asparagine, cysteine, glutamine, and arginine; (C) the content sum of the other amino acids with middle-level thermostability; (D) the content sum of the charged amino acids; (E) the content sum of the hydrophobic amino acids; (F) the content sum of the polar residues. Orange, blue and gray colors represent the up-regulated, downregulated and proteomic overall, respectively. The significance levels of the differences were marked by “*” for p b 0.1, “**” for p b 0.05 and “***” for p b 0.001, respectively.

100

H. Xiong et al. / Gene 551 (2014) 92–102

3.6. The Predicted Protein Secondary Structures The secondary structure contents (fractions of helix, strand and coil) of proteins were examined for the 6 proteomes. As can be seen in Fig. 8, the differentially expressed proteins have a lower content of alpha-helix (H) while higher contents of beta-strand (E) and coil (C) than the proteomic average for most (5/6) of the organisms, but the differences in some organisms cannot pass the statistical tests due to the small dataset (Table S2). The comparison between up- and down-regulated proteins showed that the up-regulated proteins seem to have higher content of coil and lower content of helix than down-regulated proteins but the trend is not quite strong. This is not similar to the trend observed in the comparison of thermophilic and mesophilic proteins where higher contents of helix and strand were found in thermophiles (Szilagyi and Zavodszky, 2000; Taylor and Vaisman, 2010).

3.7. The Predicted Protein Folding Rate and Aggregation Propensity

contribute the larger component to the difference between the differentially expressed proteins and the proteomic average. When facing the same factor of temperature elevation, the temporarily up-regulated proteins under heat stress condition apt to use charged residues instead of polar and hydrophobic residues for structure stabilization while the thermophilic proteins permanently presenting at high temperature environment instead usually adopt both charged and hydrophobic residues for thermal adaptation; this difference in the adaptation mechanism to high temperature may reflect a kind of optimization in terms of physics and evolution for the smaller cost to keep thermostability and proper function that remains to be disclosed.

As what can be seen in Fig. 9A, the folding rates of the up-regulated protein are lower than those of the down-regulated proteins and the proteomic average for most of the proteomes, thought the differences cannot pass the statistical tests due to the small dataset. The slower folding of the up-regulated proteins is not a simple consequence of sequence length since there is no consistent and significant length difference between the up- and down-regulated proteins (Fig. S1), but might be partially attributed to the higher helix content in the up-regulated proteins (Fig. 8) because previous work has revealed a significant negative correlation between helix content and folding rate for multi-state folders (Huang et al., 2007) and indeed the up-regulated proteins are most (88%) multi-state folders. Another noticeable point is that the average folding rates of the differentially expressed proteins seem smaller than the proteomic average except for Francisella tularensis ssp. tularensis SCHU S4 (FT) (Fig. 9A), which might be attributed to the sequence length difference because the differentially expressed proteins are longer than the proteomic average (Fig. S1B) and protein length is a major determinant (negative correlation) of folding rate especially for multi-state folders (Galzitskaya et al., 2003; Ma et al., 2006, 2007) and indeed 82% of the differentially expressed proteins are multi-state folders. Meanwhile, the prediction of the aggregation propensity showed that the differentially expressed proteins have lower

Fig. 9. The predicted folding rate (A) and aggregation propensity (B) for the proteins in the 6 studied proteomes. The significance levels of the differences were marked by “*” for p b 0.1, and “**” for p b 0.05, respectively.

Fig. 10. The regulation mechanisms for the prokaryotic gene expression under heat stress. Heat stress from environment stimulates cellular response. Most known studies concentrated on the signal induced regulatory elements in individual organisms; the present work explored the roles of the gene sequence features in the determination of the gene expression level common among organisms. The findings obtained in this work revealed new determinants of bacterial gene expression level and complement the picture of prokaryotic gene regulation landscape.

Fig. 8. The protein secondary structure contents for the proteins from the 6 proteomes. C, H and E represent the contents of coil (C), helix (H) and beta strand (E), respectively; orange, blue and gray colors represent the up-regulated, down-regulated and genomic overall, respectively. The significance levels of the differences were marked by “*” for p b 0.1, “**” for p b 0.05 and “***” for p b 0.001, respectively.

H. Xiong et al. / Gene 551 (2014) 92–102

aggregation propensity than the proteomic average and that the upregulated proteins have lower aggregation propensity than the downregulated proteins (Fig. 9B). To summarize, on one hand, though the up-regulated proteins are slow folders, they can perform accurate folding with the assistance from the similarly up-regulated molecular chaperons at heat stress condition; on the other hand, the lower aggregation propensity of the up-regulated proteins could also compensate the disadvantage of slower folding to avoid misassembly and ensure correct folding into native conformations. 3.8. Regulation Mechanisms of Heat-stress Responsive Gene Expression Levels Bacteria live in an ever changing environment and the physiological factors like temperature, PH, oxygen concentration, osmotic pressure and availability of nutrients are all subject to change during their life span. Among these factors, temperature is the most significant and extensively studied one. Previous investigations have revealed substantial knowledge about how the bacteria respond to the elevated environmental temperature to keep a physiological steady state to survive heat stress. Numerous experiments have verified that the gene expression levels, namely, the abundance of mRNA and proteins, alter in response to temperature adjustment. To know what determine the up- and down-regulation of the responsive genes is indispensable to understanding the regulation mechanisms of prokaryotic gene expression levels under heat stress condition. Some aspects of the acquired regulation mechanisms are illustrated in Fig. 10 and summarized as follows: (I) positive regulation dependent on sigma factors, such as б32 in the Gram-negative bacterium Escherichia coli (Arsene et al., 2000; Yura and Nakahigashi, 1999) and бB in the Gram-positive bacterium Bacillus subtilis (Hecker et al., 1996), where the expression levels of the sigma factors increase with an elevated temperature due to a decreased degradation rate; (II) negative regulation dependent on CIRCE/HrcA alike regulation systems, such as the CIRCE/HrcA system in Bacillus subtilis (Zuber and Schumann, 1994) and the HAIR/HspR system in Streptomyces albus (Narberhaus, 1999), where the HrcA/HspR encoded proteins bind to a conserved part of the DNA sequences of the regulated genes to control the expression of these genes; (III) regulation dependent on the 5′ UTR region of mRNA sequence where the SD sequence is covered in the stem region of the mRNA secondary structure at low temperature and exposed to ribosome when temperature is high (Storz, 1999); (IV) regulation dependent on DNA topology because temperature can change the DNA supercoil structure to affect gene transcription efficiency (Lopez-Garcia and Forterre, 2000; Pruss and Drlica, 1989). Most of the previous studies investigated the features outside the gene sequences themselves. In our work, we focused on the sequence feature of the differentially expressed genes to see to what degree the gene sequence itself can influence the gene expression level and if the patterns of this influence are common among organisms. We found that the traits of the differentially expressed genes do have significant influence on their expression levels reflected at both transcription and translation layers and some of them are conserved in multiple organisms. Our results partly include the following: (I) the up-regulated genes tend to present in the first-half of the long operons; (II) the mRNA sequences have higher purine contents in the up-regulated genes than the down-regulated genes; (III) the mRNA secondary structures of the transcriptionally up-regulated genes have higher folding free energy while those of the translationally up-regulated genes have lower folding free energy; (IV) the differentially expressed genes under heat stress have similar codon usage pattern to the ribosomal genes but obviously longer sequence length; (V) the up-regulated proteins have enriched contents of thermostable amino acids than the downregulated proteins and the differentially expressed proteins have higher content of charged residues and lower content of hydrophobic residues than the proteomic average; (VI) the secondary structures of differentially expressed proteins show lower content of alpha-helix while higher

101

contents of beta-strand and coil; and (VII) the up-regulated proteins have smaller folding rates and lower aggregation propensity where the benefit of the latter can compensate the disadvantage of the former. These findings revealed new determinants of bacterial gene expression level under heat stress and complement the knowledge about the regulation mechanisms of prokaryotic gene expression in response to the change of environmental condition. Moreover, we compared the sequence determinants of heat induced differential gene expression to those of thermophilic adaptation and the similarity and dissimilarity found in the sequence patterns elucidated the complex landscape of gene expression related to the same physical factor of temperature. Supplementary data to this article can be found online at http://dx. doi.org/10.1016/j.gene.2014.08.049. Conflict of Interest The authors declare that there is no conflict of interest. Acknowledgments Helpful discussion with Bao-Hai Hao and Hong-Rui Xu is highly appreciated.This work was supported by the National Natural Science Foundation of China (Grant 31100602), the National Basic Research Program of China (973 project, Grant 2013CB127103), Project J1103510 supported by NSFC, Projects 2010QC016, 2011PY070 and 2013JC009 supported by the Fundamental Research Funds for the Central Universities, and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry of China. References Arsene, F., Tomoyasu, T., Bukau, B., 2000. The heat shock response of Escherichia coli. Int. J. Food Microbiol. 55, 3–9. Barreiro, C., Nakunst, D., Huser, A.T., de Paz, H.D., Kalinowski, J., et al., 2009. Microarray studies reveal a ‘differential response’ to moderate or severe heat shock of the HrcA- and HspR-dependent systems in Corynebacterium glutamicum. Microbiology 155, 359–372. Berezovsky, I.N., Shakhnovich, E.I., 2005. Physics and evolution of thermophilic adaptation. Proc. Natl. Acad. Sci. U. S. A. 102, 12742–12747. Berezovsky, I.N., Zeldovich, K.B., Shakhnovich, E.I., 2007. Positive and negative design in stability and thermal adaptation of natural proteins. PLoS Comput. Biol. 3, e52. Berghoff, B.A., Konzer, A., Mank, N.N., Looso, M., Rische, T., et al., 2013. Integrative “omics”-approach discovers dynamic and regulatory features of bacterial stress responses. PLoS Genet. 9, e1003576. Bevilacqua, A., Ceriani, M.C., Capaccioli, S., Nicolin, A., 2003. Post-transcriptional regulation of gene expression by degradation of messenger RNAs. J. Cell. Physiol. 195, 356–372. Bolshoy, A., Tatarinova, T., 2012. Methods of combinatorial optimization to reveal factors affecting gene length. Bioinform. Biol. Insights 6, 317–327. Chhabra, S.R., He, Q., Huang, K.H., Gaucher, S.P., Alm, E.J., et al., 2006. Global analysis of heat shock response in Desulfovibrio vulgaris Hildenborough. J. Bacteriol. 188, 1817–1828. Cozy, L.M., Kearns, D.B., 2010. Gene position in a long operon governs motility development in Bacillus subtilis. Mol. Microbiol. 76, 273–285. Crick, F., 1970. Central dogma of molecular biology. Nature 227, 561–563. Das, R., Gerstein, M., 2000. The stability of thermophilic proteins: a study based on comprehensive genome comparison. Funct. Integr. Genomics 1, 76–88. Fleury, B., Kelley, W.L., Lew, D., Gotz, F., Proctor, R.A., et al., 2009. Transcriptomic and metabolic responses of Staphylococcus aureus exposed to supra-physiological temperatures. BMC Microbiol. 9, 76. Friedman, R.A., Honig, B., 1995. A free energy analysis of nucleic acid base stacking in aqueous solution. Biophys. J. 69, 1528–1535. Fukuda, H., Sano, N., Muto, S., Horikoshi, M., 2006. Simple histone acetylation plays a complex role in the regulation of gene expression. Brief. Funct. Genomics Proteomics 5, 190–208. Galzitskaya, O.V., Garbuzynskiy, S.O., Ivankov, D.N., Finkelstein, A.V., 2003. Chain length is the main determinant of the folding rate for proteins with three-state folding kinetics. Proteins 51, 162–166. Guell, M., van Noort, V., Yus, E., Chen, W.H., Leigh-Bell, J., et al., 2009. Transcriptome complexity in a genome-reduced bacterium. Science 326, 1268–1271. Gunasekera, T.S., Csonka, L.N., Paliy, O., 2008. Genome-wide transcriptional responses of Escherichia coli K-12 to continuous osmotic and heat stresses. J. Bacteriol. 190, 3712–3720. Han, Y., Qiu, J., Guo, Z., Gao, H., Song, Y., et al., 2007. Comparative transcriptomics in Yersinia pestis: a global view of environmental modulation of gene expression. BMC Microbiol. 7, 96.

102

H. Xiong et al. / Gene 551 (2014) 92–102

Hecker, M., Schumann, W., Volker, U., 1996. Heat-shock and general stress response in Bacillus subtilis. Mol. Microbiol. 19, 417–428. Hickey, D.A., Singer, G.A., 2004. Genomic and proteomic adaptations to growth at high temperature. Genome Biol. 5, 117. Huang, J.T., Cheng, J.P., Chen, H., 2007. Secondary structure length as a determinant of folding rate of proteins with two- and three-state kinetics. Proteins 67, 12–17. Jain, S., Graham, C., Graham, R.L., McMullan, G., Ternan, N.G., 2011. Quantitative proteomic analysis of the heat stress response in Clostridium difficile strain 630. J. Proteome Res. 10, 3880–3890. Katz, L., Burge, C.B., 2003. Widespread selection for local RNA secondary structure in coding regions of bacterial genes. Genome Res. 13, 2042–2051. Koide, T., Vencio, R.Z., Gomes, S.L., 2006. Global gene expression analysis of the heat shock response in the phytopathogen Xylella fastidiosa. J. Bacteriol. 188, 5821–5830. Koonin, E.V., 2012. Does the central dogma still stand? Biol. Direct 7, 27. Kumar, S., Nussinov, R., 2001. How do thermophilic proteins deal with heat? Cell. Mol. Life Sci. 58, 1216–1233. Kyte, J., Doolittle, R.F., 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132. Lenco, J., Link, M., Tambor, V., Zakova, J., Cerveny, L., et al., 2009. iTRAQ quantitative analysis of Francisella tularensis ssp. holarctica live vaccine strain and Francisella tularensis ssp. tularensis SCHU S4 response to different temperatures and stationary phases of growth. Proteomics 9, 2875–2882. Lin, G.N., Wang, Z., Xu, D., Cheng, J., 2010. SeqRate: sequence-based protein folding type classification and rates prediction. BMC Bioinformatics 11 (Suppl. 3), S1. Lopez-Garcia, P., Forterre, P., 2000. DNA topology and the thermal stress response, a tale from mesophiles and hyperthermophiles. Bioessays 22, 738–746. Lowe, T.M., Eddy, S.R., 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964. Luders, S., Fallet, C., Franco-Lara, E., 2009. Proteome analysis of the Escherichia coli heat shock response under steady-state conditions. Proteome Sci. 7, 36. Ma, B.G., Guo, J.X., Zhang, H.Y., 2006. Direct correlation between proteins' folding rates and their amino acid compositions: an ab initio folding rate prediction. Proteins 65, 362–372. Ma, B.G., Chen, L.L., Zhang, H.Y., 2007. What determines protein folding type? An investigation of intrinsic structural properties and its implications for understanding folding mechanisms. J. Mol. Biol. 370, 439–448. Ma, B.G., Goncearenco, A., Berezovsky, I.N., 2010. Thermophilic adaptation of protein complexes inferred from proteomic homology modeling. Structure 18, 819–828. Mallik, S., Kundu, S., 2013. A comparison of structural and evolutionary attributes of Escherichia coli and Thermus thermophilus small ribosomal subunits: signatures of thermal adaptation. PLoS One 8, e69898. Mamonova, T.B., Glyakina, A.V., Galzitskaya, O.V., Kurnikova, M.G., 2013. Stability and rigidity/flexibility-two sides of the same coin? Biochim. Biophys. Acta 1834, 854–866. Mao, F., Dam, P., Chou, J., Olman, V., Xu, Y., 2009. DOOR: a database for prokaryotic operons. Nucleic Acids Res. 37, D459–D463. McGuffin, L.J., Bryson, K., Jones, D.T., 2000. The PSIPRED protein structure prediction server. Bioinformatics 16, 404–405. Moriyama, E.N., Powell, J.R., 1998. Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli. Nucleic Acids Res. 26, 3188–3193. Narberhaus, F., 1999. Negative regulation of bacterial heat shock genes. Mol. Microbiol. 31, 1–8. Paz, A., Mester, D., Baca, I., Nevo, E., Korol, A., 2004. Adaptive role of increased frequency of polypurine tracts in mRNA sequences of thermophilic prokaryotes. Proc. Natl. Acad. Sci. U. S. A. 101, 2951–2956.

Pruitt, K.D., Tatusova, T., Brown, G.R., Maglott, D.R., 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 40, D130–D135. Pruss, G.J., Drlica, K., 1989. DNA supercoiling and prokaryotic transcription. Cell 56, 521–523. Ribeiro, D., Maretto, D., Nogueira, F.S., Silva, M., Campos, F.P., et al., 2011. Heat and phosphate starvation effects on the proteome, morphology and chemical composition of the biomining bacteria Acidithiobacillus ferrooxidans. World J. Microbiol. Biotechnol. 27, 1469–1479. Sabath, N., Ferrada, E., Barve, A., Wagner, A., 2013. Growth temperature and genome size in bacteria are negatively correlated, suggesting genomic streamlining during thermal adaptation. Genome Biol. Evol. 5, 966–977. Sharp, P.M., Li, W.H., 1987. The codon Adaptation Index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295. Sterner, R., Liebl, W., 2001. Thermophilic adaptation of proteins. Crit. Rev. Biochem. Mol. Biol. 36, 39–106. Storz, G., 1999. An RNA thermometer. Genes Dev. 13, 633–636. Szilagyi, A., Zavodszky, P., 2000. Structural differences between mesophilic, moderately thermophilic and extremely thermophilic protein subunits: results of a comprehensive survey. Structure 8, 493–504. Tartaglia, G.G., Vendruscolo, M., 2010. Proteome-level interplay between folding and aggregation propensities of proteins. J. Mol. Biol. 402, 919–928. Taylor, T.J., Vaisman, I.I., 2010. Discrimination of thermophilic and mesophilic proteins. BMC Struct. Biol. 10 (Suppl. 1), S5. Ternan, N.G., Jain, S., Srivastava, M., McMullan, G., 2012. Comparative transcriptional analysis of clinically relevant heat stress response in Clostridium difficile strain 630. PLoS One 7, e42410. Tuller, T., Waldman, Y.Y., Kupiec, M., Ruppin, E., 2010. Translation efficiency is determined by both codon bias and folding energy. Proc. Natl. Acad. Sci. U. S. A. 107, 3645–3650. van der Veen, S., Hain, T., Wouters, J.A., Hossain, H., de Vos, W.M., et al., 2007. The heat-shock response of Listeria monocytogenes comprises genes involved in heat shock, cell division, cell wall synthesis, and the SOS response. Microbiology 153, 3593–3607. van Wolferen, M., Ajon, M., Driessen, A.J., Albers, S.V., 2013. How hyperthermophiles adapt to change their lives: DNA exchange in extreme conditions. Extremophiles 17, 545–563. Vogel, C., Marcotte, E.M., 2012. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat. Rev. Genet. 13, 227–232. Wang, J., Ma, B.G., Zhang, H.Y., Chen, L.L., Zhang, S.C., 2008. How does gene expression level contribute to thermophilic adaptation of prokaryotes? An exploration based on predictors. Gene 421, 32–36. Wang, S.Y., Cappellini, E., Zhang, H.Y., 2012. Why collagens best survived in fossils? Clues from amino acid thermal stability. Biochem. Biophys. Res. Commun. 422, 5–7. Xu, K., Ma, B.G., 2007. Comparative analysis of predicted gene expression among deep-sea genomes. Gene 397, 136–142. Yura, T., Nakahigashi, K., 1999. Regulation of the heat-shock response. Curr. Opin. Microbiol. 2, 153–158. Zeldovich, K.B., Berezovsky, I.N., Shakhnovich, E.I., 2007. Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput. Biol. 3, e5. Zuber, U., Schumann, W., 1994. CIRCE, a novel heat shock element involved in regulation of heat shock operon dnaK of Bacillus subtilis. J. Bacteriol. 176, 1359–1363.

Sequence determinants of prokaryotic gene expression level under heat stress.

Prokaryotic gene expression is environment-dependent and temperature plays an important role in shaping the gene expression profile. Revealing the reg...
2MB Sizes 0 Downloads 3 Views