Computational Biology and Chemistry 58 (2015) 93–103


Research article

The functional landscape bound to the transcription factors of Escherichia coli K-12

Ernesto Pérez-Rueda a,b,*, Silvia Tenorio-Salgado a, Alejandro Huerta-Saquero c, Yalbi I. Balderas-Martínez d, Gabriel Moreno-Hagelsieb e

a Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos 62100, Mexico
b Unidad Multidisciplinaria de Docencia e Investigación, Sisal, Facultad de Ciencias, Universidad Nacional Autónoma de México, Sisal, Yucatán, Mexico
c Departamento de Bionanotecnología, Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México, Ensenada, Baja California, Mexico
d Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
e Department of Biology, Wilfrid Laurier University, 75 University Ave. W., Waterloo, ON N2L 3C5, Canada

ARTICLE INFO

Article history: Received 13 August 2014; received in revised form 31 May 2015; accepted 3 June 2015; available online 6 June 2015.

Keywords: K-12; Transcription factors; Bacteria; Evolution; Genomics

ABSTRACT

Motivated by the experimental evidence accumulated over the last ten years, and based on information deposited in RegulonDB, literature searches, and sequence analysis, we analyzed the repertoire of 304 DNA-binding transcription factors (TFs) in Escherichia coli K-12. These regulators were grouped into 78 evolutionary families and regulate almost half of the genes in this bacterium. In structural terms, 60% of the TFs are composed of two domains, 30% are monodomain proteins, and 10% have three or four structural domains. As previously noted, the most abundant DNA-binding domain is the winged helix-turn-helix, with few alternative DNA-binding structures, which is consistent with the hypothesis that successful protein structures predominate while new ones emerge at low frequency. In summary, we identified and described the characteristics associated with the DNA-binding TFs of E. coli K-12. We also identified twelve functional modules based on a co-regulated gene matrix. Finally, diverse regulons were predicted based on direct associations between the TFs and potentially regulated genes. This analysis should increase our knowledge of gene regulation in E. coli K-12 and provide additional clues for comprehensive modelling of transcriptional regulatory networks in other bacteria. © 2015 Elsevier Ltd. All rights reserved.

* Corresponding author at: Universidad Nacional Autónoma de México, Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Cuernavaca, Morelos 62210, Mexico. E-mail addresses: [email protected] (E. Pérez-Rueda), [email protected] (G. Moreno-Hagelsieb).
http://dx.doi.org/10.1016/j.compbiolchem.2015.06.002
1476-9271/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Escherichia coli K-12 substr. MG1655 is one of the most important model organisms in biology. Its chromosome is a 4.6 Mb circular, negatively supercoiled DNA molecule that contains 4319 genes. DNA-binding transcription factors (TFs) are an important element of gene expression in this bacterium. These proteins allow E. coli to cope with environmental changes by blocking (negative regulation) or allowing (positive regulation) the access of RNA polymerase (RNAP) to promoters, depending on the operator context and ligand-binding status (Martinez-Antonio et al., 2006; Miroslavova and Busby, 2006; Wall et al., 2004). Previous analyses identified around 300 TFs that could regulate gene expression in this bacterium (Perez-Rueda and Collado-Vides, 2000; Madan Babu and Teichmann, 2003). More recently, around 180 experimentally well-characterized TFs have been deposited in RegulonDB, a specialized database devoted to gene regulation in E. coli K-12 (Gama-Castro et al., 2008, 2011). Several questions about the repertoire of TFs therefore arise: How many new TFs can be identified by computational methods, ten years after the first attempt at describing the complete set of TFs in this bacterium? How many genes are regulated by these TFs? And how many genes could be associated with alternative regulatory mechanisms?

In order to identify new regulatory proteins in E. coli K-12 and to elucidate the regulatory functions associated with hypothetical TFs, we performed structural and functional analyses in this organism. We found a total of 304 TFs that could be regulating around 50% of the genes in this bacterium. We identified eight functional modules and new putative regulons; finally, we describe a global perspective of the functional and evolutionary role of these TFs.

2. Material and methods

2.1. Identification of DNA-binding TFs

A total of 184 experimentally characterized TFs deposited in RegulonDB (Gama-Castro et al., 2011) were used as seeds in BLASTP searches against the complete proteome of E. coli. Hits with E-values ≤ 1e-6 and a coverage of 70% were considered. In addition, TFs specifically associated with E. coli K-12 and deposited in the DBD, HAMAP (Lima et al., 2009), Superfamily (Wilson et al., 2009) and PFAM (Punta et al., 2011) databases were retrieved.

2.2. Classification of TFs based on the DNA-binding domain

Superfamily and family assignments were based on SUPERFAMILY (Wilson et al., 2009), PFAM (Punta et al., 2011) and Conserved Domain Database (CDD) annotations (Wilson et al., 2009; Punta et al., 2011; Gough and Chothia, 2002; Marchler-Bauer et al., 2007).

2.3. Paralogous genes

In the total set of TFs, 42 groups of paralogs were identified by BLASTP (Altschul et al., 1990) comparisons with E-values ≤ 1e-6 and a coverage of at least 50% of any of the proteins in the alignment. See Supplementary Table S4.

2.4. Functional classes of the regulated genes

Regulated genes were classified according to their functional class based on COGs, UniProt and genome annotations.

2.5. TF functional interactions

Functional interactions among E. coli TFs and other genes were predicted using the Nebulon system (Janga et al., 2005). In brief, Nebulon integrates four genomic context methods: (a) gene fusion, i.e., functional linkages among genes that are fused into a single open reading frame in at least one other genome; (b) phylogenetic profiles, i.e., mutual information measuring the coordinated presence or absence of pairs of genes across a set of non-redundant genomes; and the natural chromosomal association of bacterial genes in operons, as detected by two alternative methods, namely (c) the tendency of genes forming operons to show small intergenic distances (Moreno-Hagelsieb and Collado-Vides, 2002; Salgado et al., 2000), and (d) conservation of gene order, in which a confidence value for each pair of adjacent genes in the same strand is used as an indicator that those genes likely form an operon, as compared with the conservation of adjacent genes in opposite strands (Ermolaeva et al., 2001; Janga and Moreno-Hagelsieb, 2004).
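The BLASTP acceptance criteria stated in Sections 2.1 and 2.3 (an E-value cut-off of 1e-6 combined with a minimum alignment coverage) can be illustrated with a minimal Python sketch. The `Hit` record, the function names and the example hits are hypothetical and are not taken from the original pipeline. The sketch also assumes that the 70% coverage of Section 2.1 refers to the query (seed) sequence, which the text does not specify, and that "any of the proteins" in Section 2.3 means that either the query or the subject may reach 50%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise BLASTP alignment (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    q_start: int  # 1-based, inclusive alignment coordinates on the query
    q_end: int
    q_len: int    # full length of the query protein
    s_start: int  # same for the subject protein
    s_end: int
    s_len: int


def coverage(start: int, end: int, length: int) -> float:
    """Fraction of a protein spanned by the alignment."""
    return (abs(end - start) + 1) / length


def passes_seed_filter(hit: Hit, max_evalue: float = 1e-6,
                       min_cov: float = 0.70) -> bool:
    """Section 2.1 criteria; query coverage is an assumption of this sketch."""
    return (hit.evalue <= max_evalue
            and coverage(hit.q_start, hit.q_end, hit.q_len) >= min_cov)


def passes_paralog_filter(hit: Hit, max_evalue: float = 1e-6,
                          min_cov: float = 0.50) -> bool:
    """Section 2.3 criteria; coverage of either protein may satisfy the cut-off."""
    best = max(coverage(hit.q_start, hit.q_end, hit.q_len),
               coverage(hit.s_start, hit.s_end, hit.s_len))
    return hit.evalue <= max_evalue and best >= min_cov


# Made-up hits for illustration only (not real BLAST results).
hits = [
    Hit("seedTF", "protA", 1e-40, 1, 300, 360, 5, 310, 343),
    Hit("seedTF", "protB", 1e-8, 60, 240, 360, 30, 210, 296),
    Hit("seedTF", "protC", 1e-3, 1, 360, 360, 1, 360, 360),
]
candidate_tfs = [h.subject for h in hits if passes_seed_filter(h)]
candidate_paralogs = [h.subject for h in hits if passes_paralog_filter(h)]
```

In this toy input, only protA satisfies the stricter seed filter, whereas protA and protB both satisfy the paralog filter; protC is rejected in both cases because of its E-value.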

3. Identification of functional modules

In order to identify clusters of functionally related TFs, we constructed a co-regulation matrix based on experimental data deposited in RegulonDB and on interactions obtained from Nebulon. In brief, the interaction profile of the ith TF is represented by a vector x_i, consisting of all the genes regulated by that transcriptional regulator or all the interacting genes identified by Nebulon. In a second step, this interaction matrix was converted into a symmetric matrix, in which Pearson correlation coefficients (r_st) were estimated for all pairs of vectors x_s and x_t (s = 1, 2, ..., M; t = 1, 2, ..., M). Subsequently, two TFs were assigned to a candidate group if they clustered together in a hierarchical clustering analysis using the average linkage method, optimized leaf order, and Kendall's tau metric. The analysis was performed using the program MeV (http://www.tm4.org/mev). A cluster was considered significant at a distance threshold value of 0.7. This criterion allowed us to identify twelve clusters. Finally, all members of each group were compared and analyzed in terms of their target genes.

3.1. Identification of regulons

Nebulon was used to identify potential regulons. In brief, this method integrates the approaches described above, and its thresholds can be adjusted to improve the confidence of the prediction. We therefore applied a confidence threshold of 0.85 between TFs and other genes, which might be target genes. Because we are interested in the identification of potential regulons, the Markov Cluster Algorithm (MCL) program was used with an inflation value of 1.8 to discriminate between direct and indirect interactions, i.e., to separate those genes directly linked to the TFs from those associated through another gene, as in a regulatory cascade. Only direct interactions were considered in this work.

Fig. 1. Distribution of genes regulated by (a) negatively autoregulated TFs; (b) positively autoregulated TFs; and (c) dual autoregulated TFs.

4. Results

4.1. Almost half of all genes in the bacterium E. coli K-12 are regulated by TFs

Transcription factors (TFs) were defined as DNA-binding proteins needed to activate or repress the transcription of a gene, but which are themselves part of neither the RNAP core nor the holoenzyme (Zhou and Yang, 2006).
Therefore, sigma factors were not considered as TFs in this study. Based on the information deposited in RegulonDB, literature searches and sequence analysis, 184 experimentally characterized TFs and 120 predicted TFs were collected (Table S1). The 184 experimentally characterized TFs regulate a total of 1579 genes, which correspond to 36% of the total genes in E. coli K-12. This number of TFs does not differ greatly from the original set of 159 TFs regulating 933 genes described more than ten years ago (Perez-Rueda and Collado-Vides, 2000). Indeed, in the last decade only 25 new TFs have been characterized, such as NikR (GI:16131353) (Chivers and Sauer, 2000), YdeO (GI:16129458) (Ma et al., 2004) and YqhC (GI:90111526) (Lee et al., 2010). Thus, although more genes now have a described regulatory mechanism (1579 against 933 in the first version), there are no dramatic changes in the number of new TFs and target genes. Another interesting observation associated with the experimentally characterized TFs is the increase in the proportion of TFs that can both activate and repress gene expression, 39% against 22% in the first version, and a slight decrease in the proportion of activators (28% versus 35%) and repressors (33% versus 43%). This finding suggests that, as experimental evidence accumulates, the number of dual regulators increases as a consequence of the new DNA-binding sites and activities associated with them. In this regard, we found evidence of autoregulation for 112 out of 184 TFs; of these, 4% are both positively and negatively autoregulated, 26% are only autoactivated and 70% are exclusively negatively autoregulated (Fig. 1). Two main conclusions are evident from these results: (a) almost 81% of the genes regulated in E. coli K-12 are regulated by TFs with an autoregulation mechanism, and (b) negative autoregulation is the predominant regulatory mechanism associated with TFs (McAdams and Arkin, 1997). In addition, we found a large proportion of TFs regulating other TFs, such as CRP, which regulates the transcription of 52 regulators, reinforcing its role as a global regulator. It is important to mention that the proportion of self-regulating TFs has not changed dramatically from previously reported data. These data indicate that autoregulation is a conserved property that excludes cross-transcriptional regulation and protects bacteria from over-expression of regulatory genes and the associated possibility of toxic effects, as suggested by Thieffry et al. (1998). Finally, it has been estimated that around 400 proteins is the optimum number of TFs in the E. coli genome (Thieffry et al., 1998). However, our current set of 304 TFs suggests that we have identified a similar proportion of TFs as previous works (Perez-Rueda and Collado-Vides, 2000; Madan Babu and Teichmann, 2003), and that we are close to the total number of TFs in this bacterium. Therefore, alternative regulatory mechanisms already described in this organism, such as riboswitches, DNA curvature or attenuation, could influence gene expression where there is no evidence of TF-mediated regulation.

4.2. TFs can be classified into families with similar evolutionary histories and common regulatory processes

In general, the DNA-binding domains associated with TFs are ancient in evolution, and they have been proposed to derive from a relatively small set of folds (Aravind and Koonin, 1999). These domains have been used to classify TFs into families (Perez-Rueda et al., 2004).
In this regard, the complete collection of TFs can be classified into 78 different evolutionary families based on PFAM, CDD, and Superfamily annotations (Wilson et al., 2009; Punta et al., 2011; Gough and Chothia, 2002; Marchler-Bauer et al., 2007). These families vary not only in their number of members but also in the functions they regulate. For instance, the most populated family of TFs identified in E. coli K-12 is LysR, a family in which a large number of local regulators associated with amino acid biosynthesis have been described. This family contributes 46 proteins (15%) to the total number of TFs. However, almost 50% of them lack experimental evidence. Recently, it has been proposed that the success of this family is a consequence of large duplication events and lateral gene transfer (Maddocks and Oyston, 2008). A similar abundance of this family has been reported in other bacterial genomes, where it contributes around 24% of the TFs identified so far (Perez-Rueda et al., 2009). Two additional large families were also identified in E. coli K-12, AraC/XylS (26 proteins) and GntR (20 proteins). These families have been associated with a large diversity of regulated functions, such as carbon source assimilation, stress responses and nitrogen assimilation, among others (Ibarra et al., 2008). Additional large families (more than 10 members per family) include LuxR, OmpR, LacI, EBP, TetR, DeoR and HTH_3. In contrast, the rest of the repertoire consists of small families with fewer than nine members each, representing almost 40% of the collection, many of which have only one member, such as the BirA, ArgR, and LexA families (Table S1). Interestingly, only two global regulators, Fis and ArcA, from the EBP and OmpR families respectively, belong to large families, whereas the other four global regulators were identified in small families (fewer than three members each).
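As a small worked check of the family proportions quoted above, the following Python sketch recomputes the share of the 304 TFs contributed by the three largest families. Only the member counts and the total come from the text; the variable and function names are illustrative assumptions.

```python
TOTAL_TFS = 304  # size of the TF repertoire reported in this work

# Member counts for the three largest families, as given in the text.
family_sizes = {"LysR": 46, "AraC/XylS": 26, "GntR": 20}


def share(members: int, total: int = TOTAL_TFS) -> float:
    """Percentage of the TF repertoire contributed by one family."""
    return 100.0 * members / total


for family, members in family_sizes.items():
    print(f"{family}: {members} TFs, {share(members):.1f}% of the repertoire")
```

The LysR value rounds to the 15% stated in the text.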
In addition, we asked whether the families identified here regulate genes involved in related metabolic functions. Based on the functional categories assigned by the Clusters of Orthologous Groups (COGs) to all genes in E. coli, we calculated, for each family, the percentage of members devoted to the regulation of one particular function (Fig. S1). From this analysis, we found that regulators from the LysR family regulate two main functional categories, inorganic ion transport and metabolism and amino acid transport, as previously suggested (Perez-Rueda and Collado-Vides, 2001). The LuxR, AraC, GntR, EBP and OmpR families are mainly associated with the regulation of genes related to energy production and conversion, whereas the IclR, LacI, DeoR, and MarR families regulate genes related to carbohydrate transport and metabolism; finally, the TetR family is associated with genes classified in the “transcription” category. Taken together, this functional distribution suggests that TF families regulate similar processes in the genome of E. coli, reinforcing the notion that protein families have similar evolutionary histories and are functionally consistent.

4.3. TF structural characterization shows a predominance of two-domain proteins and the helix-turn-helix as the most abundant DNA-binding domain

In order to evaluate the diversity of structural domains associated with the collection of TFs, the complete set of proteins was analyzed in terms of domain organization based on Superfamily and PFAM assignments. We found that 80 out of 304 (26%) TFs are monodomain proteins, in which the DNA-binding domain (DBD) covers almost the whole length of the sequence, as occurs in the tryptophan repressor and in members of the cold-shock, H-NS, IHF and Fur families.
In these TFs, the DBD includes a specific motif for ligand recognition, as occurs in members of the Fur and TrpR families, and/or the regulatory proteins do not require a ligand to bind DNA, as occurs in members of the H-NS and IHF families. A total of 169 TFs (55%) are two-domain proteins. These proteins exhibit a DBD and a partner domain, where the ligand-binding and protein–protein interaction sites are located. Twenty-nine TFs (9.5%) exhibit three structural domains; these belong to two families, EBP and AraC/XylS. Eight TFs (2.6%) exhibit four structural domains and also belong to the EBP and AraC/XylS families; finally, 17 TFs (5.6%) have no domain assignment. From these data, it is evident that two-domain proteins are the most abundant TFs, whereas three- and four-domain proteins are the least abundant and belong to two main families, AraC/XylS and EBP. For a complete description of the protein domains see Supplementary Table S3.

In order to evaluate the diversity of DNA-binding domains, all TFs were evaluated considering these structures. We found that the most abundant DNA-binding domain is the helix-turn-helix (HTH), identified in more than 80% of TFs, as previously reported (Perez-Rueda and Collado-Vides, 2000, 2001) (Fig. 2 and Tables S2 and S3). In low proportions, other DNA-binding structures, such as IHF-like DNA-binding proteins, ribbon–helix–helix and the flagellar transcriptional activator FlhD structures, were identified. An intriguing observation is the emergence of proteins traditionally not associated with DNA binding, such as “aminopeptidase A/I”, involved in the synthesis of carbamoyl phosphate, an intermediate in the pyrimidine nucleotide pathway (Minh et al., 2009), and the RelE-like domain identified in the regulator RelE, associated with toxin–antitoxin systems (Gotfredsen and Gerdes, 1998).
In this regard, a hypothetical limit on the number and diversity of DNA-binding TFs can be proposed based on the number of TFs identified and the diversity of structural domains associated with DNA binding, whereas new DNA-binding structures associated with TFs could emerge in small proportions (Itzkovitz et al., 2006). Altogether, the diversity of DNA-binding domains, the number of TFs and the number of regulated genes suggest that alternative regulatory mechanisms contribute to defining gene expression in E. coli, and that their combination with TF-mediated regulatory mechanisms would increase the genetic versatility needed to cope with diverse environmental challenges. In this sense, supercoiling can be considered another important source of regulation in E. coli, because it adjusts the basal expression levels of a large diversity of genes directly (Hatfield and Benham, 2002) or indirectly, acting in combination with some TFs and sigma factors. In E. coli, nucleoid-associated proteins such as IHF, Fis, HU, and H-NS play an important role in DNA supercoiling, expanding the regulatory response in a hierarchical manner. Indeed, 17% and 11% of the complete E. coli genome falls within the binding regions of H-NS and Fis, respectively (Kahramanoglou et al., 2010).

Fig. 2. Distribution of DNA-binding domains in the repertoire of TFs. The winged HTH represents 36% of the total repertoire of DNA-binding domains and is the most abundant structure. The homeodomain has the second highest abundance, with 18% of the repertoire. Alternative DNA-binding domains occur in minor proportions, such as the C-terminal effector domain of the bipartite response regulators (9.8%), the lambda-repressor DNA-binding domains (9.48%), the nucleic acid-binding proteins (2.9%) and the putative DNA-binding domain (1.9%).

4.4. Gene duplication contributes to the diversity of transcription factors and their regulated genes

In this section we evaluate the role of duplicated genes, or paralogs, in the repertoire of TFs. Teichmann and Babu (2004) previously proposed that the loss and gain of regulatory interactions may occur following the duplication of a TF, of a target gene, or of both a TF and a target gene. In our analysis, 42 groups of paralogs were identified in the complete repertoire of transcriptional regulators (Table 1).
From these data, diverse and interesting groups were found, such as the GalR/LacI family group, which includes proteins not defined as TFs, such as RbsB (GI:1790192), a protein described as a D-ribose transporter, and the D-allose transporter subunit AlsB (GI:85676841), both of which lack the DNA-binding domain; the NagC family group, which contains two kinase proteins lacking the HTH; and LexA (GI:16131869), which is paralogous to UmuD (GI:1787431), which encodes DNA polymerase V, subunit D. These groups of paralogs show that TFs have been recruited from pre-existing proteins with alternative functions, as observed for the ribose and allose transport, kinase and DNA polymerase proteins (Table S4).
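To illustrate how pairwise BLASTP relationships can be collapsed into groups of paralogs, the following Python sketch joins proteins connected by accepted hits into connected components (single linkage) with a union-find structure. This grouping rule is an assumption made for illustration; the text does not state how the 42 groups were assembled from the pairwise comparisons. The example pairs are also illustrative: the resulting groups agree with relationships mentioned in this work (LexA with UmuD; GadX, GadW and AdiY), but the individual pairwise hits are not taken from the original data.

```python
def group_paralogs(pairs):
    """Group proteins linked by accepted pairwise hits (single linkage).

    `pairs` is an iterable of (protein_a, protein_b) tuples that already
    passed the E-value and coverage filters. Returns a sorted list of
    sorted groups, each with at least two members.
    """
    parent = {}

    def find(x):
        # Locate the representative of x, compressing the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for protein in list(parent):
        groups.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Illustrative pairs only; not actual BLASTP output.
example_pairs = [("GadX", "GadW"), ("GadW", "AdiY"), ("LexA", "UmuD")]
paralog_groups = group_paralogs(example_pairs)
```

With single linkage, GadX and AdiY end up in the same group through GadW even if they share no direct hit; a stricter rule (for example, requiring all-against-all hits) would yield smaller groups.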

Table 1. Paralogs identified in the collection of TFs and their regulatory mechanisms. Columns are as follows: paralogous proteins, functional description, TFs regulating the paralogous gene (positively, +, or negatively, −), associated RNA elements, sigma factors, and associated supercoiling mechanism.

Gene | Function | TF (+/−) | RNA elements | Sigma factors | Supercoiling

appY envY

DpiA( ); HNS( ) NA

NA NA

NA NA

NA NA

ydeO

Transcriptional activator Transcriptional activator of porin biosynthesis Transcriptional activator

EvgA(+)

NA

NA

adiY gadW

Transcriptional activator Transcriptional activator

NA NA

gadX

Transcriptional dual regulator

NA FNR( ); GadE(+); GadW( ); GadX( ); HNS ( ); PhoP(+);RutR( ); SdiA(+) FNR( ); GadE(+); GadW( ); HNS( ); RutR( ); GadX (+)

Sigma70; Sigma32 NA Sigma38

Antizyme RNA frameshifting stimulation element ( ), gadY

Sigma70; Sigma38

NA

acrB mdtB mdtC acrD acrF mdtF

multidrug efflux system protein multidrug efflux system, subunit B multidrug efflux system, subunit C aminoglycoside/multidrug efflux system multidrug efflux system protein multidrug transporter, RpoS-dependent

AcrR( ); MarA(+); PhoP( ); Rob(+); SoxS(+) BaeR(+); CpxR(+) BaeR(+); CpxR(+) BaeR(+); CpxR(+); EvgA(+) FNR(+) CRP( ); EvgA(+); GadE(+); GadW(+); GadX(+); YdeO(+)

NA NA NA NA NA NA

Sigma70 NA NA NA NA Sigma38

NA NA NA NA NA NA

acrA

NA

NA

Sigma70

NA

acrE mdtE

multidrug efflux system AcrR( ); MarA (+); PhoP( ); Rob(+); SoxS(+) cytoplasmic membrane lipoprotein multidrug resistance efflux transporter

NA NA

NA NA

NA NA

mdtA

multidrug efflux system, subunit A

FNR(+) CRP( ); EvgA(+); GadE(+); GadW(+); GadX(+); YdeO(+) BaeR(+); CpxR(+)

NA

NA

NA

yeaY slp ycdY dmsD

predicted lipoprotein outer membrane lipoprotein conserved protein twin-argninine leader-binding protein for DmsA and TorA

NA MarA( ) NA FNR(+)

NA NA NA

NA Sigma70 NA NA

NA NA NA NA

gadA

glutamate decarboxylase A, PLPdependent glutamate decarboxylase B, PLPdependent

ArcA(+); CRP( ); FNR( ); Fis( ); GadE(+); GadW ( ); GadX(+); HNS( ); TorR( ); AdiY (?); CRP( ); Fis( ); GadE(+); GadW( ); GadX(+)

NA

NA

NA

oxyS

Sigma70; Sigma38

Supercoiling

predicted DNA-binding transcriptional regulator (rscB)

MarA( );HNS( );GadW(+); GadX(+);GadE(+)

NA

NA

NA

DNA-binding transcriptional activator

HNS( ); PhoP (+), CRP( ); EvgA (+); GadE(+); GadW(+); GadX(+); YdeO(+)

NA

Sigma38

gadB

dctR gadE

NA

NA

In addition, we identified regulatory systems with a large number of duplications of both TFs and regulated genes, such as the gadXW system (Table 1 and Fig. 3). This regulatory system is associated with acid resistance and is located in the “Acid Fitness Island” (AFI) (Tramonti et al., 2008). The AFI contains 13 operons, among them slp-yhiF, hdeAB-yhiD, gadE-mdtEF, gadXW and gadAX (Tramonti et al., 2008; Kobayashi et al., 2006; Tucker et al., 2003). GadE, the master regulator, belongs to the LuxR-like family (Tucker et al., 2003; Masuda and Church, 2003), and at least nine regulators appear to converge on the gadE promoter to either activate or repress its transcription in response to different environmental conditions (Foster, 2004). In general, these findings highlight the evolutionary plasticity of regulatory networks (Moreno-Hagelsieb and Jokic, 2012), not only as a result of the duplication of TF interactions in the regulatory network (Teichmann and Babu, 2004), but also as a result of the divergent effect of the TF interactions, which activate or repress the transcription of paralogous genes. Divergence of paralogous genes may include the modification or acquisition of new regulatory mechanisms, changes in gene dosage (Gu et al., 2002), subdivision of ancestral functions and the evolution of new functions (Conant and Wolfe, 2008).
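The module-identification procedure of Section 3, whose results are reported in the next section, can be sketched in Python as follows. The sketch builds a binary TF-by-gene profile for each TF, computes Pearson correlation coefficients between profiles, and merges TFs by average-linkage agglomerative clustering until the closest pair of clusters exceeds a distance cut-off of 0.7. Several points are assumptions of this sketch rather than details confirmed by the text: the distance is taken as 1 − r, whereas the original analysis was run in MeV with Kendall's tau metric and leaf-order optimization; the 0.7 threshold is interpreted as a maximum merge distance; and the TF names and target sets are toy data, not RegulonDB or Nebulon records.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def profiles(targets):
    """Binary interaction profile of each TF over the union of all genes."""
    genes = sorted(set().union(*targets.values()))
    return {tf: [1 if g in regulated else 0 for g in genes]
            for tf, regulated in targets.items()}


def average_linkage(targets, max_dist=0.7):
    """Average-linkage clustering of TFs with distance 1 - r (assumption)."""
    vec = profiles(targets)
    dist = {frozenset(pair): 1.0 - pearson(vec[pair[0]], vec[pair[1]])
            for pair in combinations(vec, 2)}
    clusters = [frozenset([tf]) for tf in vec]

    def cluster_dist(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((s, t))] for s in a for t in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_dist(*p))
        if cluster_dist(a, b) > max_dist:
            break  # the closest pair is too distant: stop merging
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


# Toy target sets (hypothetical TFs and genes).
toy_targets = {
    "TF_A": {"g1", "g2", "g3"},
    "TF_B": {"g1", "g2", "g3", "g4"},
    "TF_C": {"g5", "g6"},
    "TF_D": {"g5", "g6", "g7"},
}
modules = average_linkage(toy_targets)
```

In this toy example TF_A and TF_B share most of their targets, as do TF_C and TF_D, so two modules are recovered, whereas the two pairs are negatively correlated with each other and are never merged.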

4.5. Identification of functional clusters based on co-regulated genes

Previous analyses have described co-regulation as an important feature of the regulatory network of E. coli K-12 (Martinez-Antonio et al., 2003), showing that the interplay of TFs in a regulatory region determines expression. In this regard, we asked whether functional clusters can be identified using a co-regulation matrix that includes both the well-known and the predicted regulatory proteins. Based on this approach, we identified 82 out of 304 (22%) TFs clustered in 12 relevant functional modules (Fig. S2). These clusters include TFs that belong to the same evolutionary families and regulate genes involved in similar physiological functions, suggesting that the clusters are robust in functional and evolutionary terms. In what follows, we describe the most relevant clusters identified. For a complete description of all clusters see Supplementary material, Table S5.

Fig. 3. The GadXW system. This system represents one of the most striking systems of duplicated paralogous genes, including the TFs GadX, GadW and AdiY (AraC/XylS family); TorR, RcsB, PhoP and ArcA (OmpR family); GadE and DctR; and GadA and GadB. The system involves 14 TFs, among which five global regulators described in E. coli, namely ArcA, Fis, Crp, Fnr and H-NS, finely define the expression of GadX and GadW. Finally, GadX is also regulated by gadY, an antizyme RNA frameshifting stimulation element. Nomenclature is as follows: global regulators (ArcA, Fis, Crp, Fnr, H-NS) are indicated in the upper section. Rounded rectangles show four groups of paralogous proteins, where GadX, GadW and AdiY are included as a single group. Negative regulation is represented in black, positive regulation in green and dual regulation in blue. For instance, gadA is positively regulated by GadX (green) but negatively regulated by GadW (black). Loops show negative autoregulation for GadW and positive autoregulation for GadX. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

4.6. Protection against organic solvents and antibiotics

In this cluster eight experimentally described TFs were identified: SoxR, AcrR and EnvR; MprA and MarR; and MarA, Rob and SoxS (Fig. 4A). It is important to note that MarA, SoxS, SoxR, and Rob comprise the regulon involved in antibiotic and superoxide resistance, in response to nitric oxide and antibiotics (Fabrega et al., 2009). SoxR induces soxS expression, and SoxS, in turn, activates transcription of diverse genes in the regulon (Demple, 1996). In addition, MarR plays an important role in the control of resistance to multiple antibiotics, organic solvents, household disinfectants, and oxidative stress agents, among others. MarR is part of the marRAB operon and negatively autoregulates its own expression. The marA gene encodes a transcriptional activator that activates expression of the marRAB operon and regulates the expression of a global network of at least 40 chromosomal genes (Martin and Rosner, 2002). Finally, EnvR represses the transcription of genes encoding a drug efflux pump that has a role in resistance to antibiotics (Hirakawa et al., 2008), whereas the “acriflavine resistance regulator”, AcrR, regulates the expression of genes involved in multidrug transport (Su et al., 2007). In summary, this cluster contains TFs involved in resistance against diverse external conditions and compounds, such as oxidative stress, heavy metals and antibiotics, suggesting that they coordinate common regulatory processes.

4.7. Global regulators are clustered together

This cluster includes most of the global regulators described in E. coli, such as Fnr, Crp, ArcA, Fis, IHF, and H-NS (Martinez-Antonio et al., 2003), as well as other TFs described as potential global regulators, such as FruR (Ow et al., 2007), Fur, and PdhR (Fig. 4B). In this regard, FruR, which regulates around 67 genes in E. coli, has been considered a global regulator that modulates the direction of carbon flow through the different pathways of energy metabolism, such as those genes encoding

gluconeogenic and glyoxylate shunt enzymes, but independently of CRP (Shimada et al., 2010). Fur has also been considered a global regulator associated with iron assimilation (Lee and Helmann, 2007). This regulator is involved in the regulation of around 100 different genes in E. coli (Gama-Castro et al., 2008, 2011). Finally, PdhR, the “Pyruvate Dehydrogenase complex Regulator”, regulates around 40 genes involved in the pyruvate dehydrogenase complex. This protein has been proposed as a master regulator for the formation of the PDH complex and the respiratory electron transport system (Gohler et al., 2011; Ogasawara et al., 2007a). Therefore, this cluster includes well-defined global regulators and some TFs that could also be considered master regulators; indeed, these global TFs regulate around 60% of the genes in the known regulatory network of this bacterium.

Fig. 4. Hierarchical clustering analysis performed with the average linkage method, optimized leaf order, and Kendall’s tau metric. Four clusters are shown. For a complete description of additional clusters see Supplementary material.

4.8. TFs regulating sugar carbon catabolism related genes are clustered together

In this cluster, four TFs belonging to the GalR/LacI family were included: YcjW, which was identified in a screen for genes that reduce the lethal effects of stress (Han et al., 2010); GalS and GalR, which repress transcription of operons involved in transport and catabolism of D-galactose (Semsey et al., 2007); and EbgR, which represses genes of the beta-galactosidase system that constitutes an alternative for lactose utilization in cells with mutations in the lacZ gene (Swint-Kruse and Matthews, 2009). Another TF included in this cluster is the maltose regulator MalT, which belongs to the LuxR family and activates transcription of several genes and operons involved in maltose catabolism and transport (Boos and Bohm, 2000), whereas the HU protein (for Heat Unstable protein) is a small DNA-binding protein considered a global regulatory protein, which plays an important role in nucleoid organization and regulation (Oberto et al., 2009). An intriguing result was the inclusion of PhoB (OmpR family) in this cluster. This regulatory protein is involved in phosphorus uptake and metabolism (Baek and Lee, 2007) and glycerol phosphate metabolism (Baek and Lee, 2006). Finally, two hypothetical proteins, YagI and YgbI, from the IclR and DeoR families respectively, were included in this cluster; however, no clear functions have been identified for them so far (Fig. 4C).

4.9. TFs regulating curli assembly and biofilm formation components were clustered together

The TFs RstA and CpxR from the OmpR family, MlrA (MerR), and CsgD (LuxR/UhpA) are included in this cluster (Fig. 4D). These four regulatory proteins are associated with common physiological processes, such as the formation of cell surface-associated structures in the case of CsgD and MlrA. Indeed, CsgD participates in the control of biofilm formation in E. coli by controlling the production of curli fimbriae and other biofilm components (Brown et al., 2001). In concert with its regulatory role, the promoter of the csgD operon is under the

control of more than 10 different TFs, each monitoring a different condition in stressful environments, such as RstA, OmpR and CpxR, included in this group, and recently MlrA (Ogasawara et al., 2010), that has been suggested to participate in control of curli formation. Therefore, the possible interplay between these activators allows E. coli to respond in a more efficient fashion to diverse stressful conditions. In addition, RstA also control genes involved in acid tolerance, and anaerobic respiration, among other processes (Ogasawara et al., 2007b). CpxR prevents a variety of extracytoplasmic protein-mediated toxicities (Batchelor et al., 2005). Finally, MlrA is associated to control curli formation (Ogasawara et al., 2010). In summary, this cluster reflects functional commonalities among the TFs identified, in particular regulation of biofilm components production. 4.10. Regulons inferred In this section we discussed some of the putative regulons deduced by predicted associations between TFs and other genes, which might be target genes, using a clustering program (MCL) with a threshold of 0.85 of confidence value (Hu et al., 2009; Janga et al., 2010). We only considered those direct interactions as

100

E. Pérez-Rueda et al. / Computational Biology and Chemistry 58 (2015) 93–103

described in methods. Predicted interactions based on the total TFs involved 2404 genes, whereas the experimentally-based network comprises 1578 genes. The intersection corresponds to 1062 genes or 67% of the known network deposited in RegulonDB. Although we found a good consistency between the number of associated predictions and known regulons, further analyses are required to corroborate some of these associations. In the follow section, we describe some of the most representative groups of regulons identified in the predicted TFs. A complete list of TFs and their regulons can be accessed as supplementary material (Table S6).
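The filtering and overlap steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: it does not run MCL, the helper names (`predicted_regulons`, `coverage`) are hypothetical, and the scored edges are made-up toy values. Only the 0.85 cutoff and the gene counts (2404, 1578, 1062) come from the text.

```python
# Minimal sketch: keep direct TF-gene associations above a confidence cutoff
# and measure how much of a known network the predictions recover.
# All edge scores below are hypothetical toy values, not data from the paper.

THRESHOLD = 0.85  # confidence cutoff stated in the text


def predicted_regulons(scored_edges, tfs, threshold=THRESHOLD):
    """Map each TF to the partners it is directly linked to with score >= threshold."""
    regulons = {}
    for a, b, score in scored_edges:
        if score < threshold:
            continue  # discard low-confidence associations
        # An edge may list the TF on either side; edges without a TF are ignored.
        for tf, partner in ((a, b), (b, a)):
            if tf in tfs and partner != tf:
                regulons.setdefault(tf, set()).add(partner)
    return regulons


def coverage(predicted_genes, known_genes):
    """Fraction of the known gene set that is also present in the predictions."""
    known = set(known_genes)
    return len(set(predicted_genes) & known) / len(known)


# Toy input: (gene_a, gene_b, confidence); scores are invented for illustration.
toy_edges = [
    ("eutR", "eutB", 0.93),
    ("eutR", "eutC", 0.90),
    ("eutR", "puuP", 0.86),
    ("eutR", "lacZ", 0.40),  # below the cutoff, dropped
    ("eutB", "eutC", 0.95),  # no TF involved, not a direct TF-gene link
]
toy_regulons = predicted_regulons(toy_edges, tfs={"eutR"})
toy_coverage = coverage(toy_regulons["eutR"], {"eutB", "eutC", "eutL", "eutK"})

# Arithmetic behind the percentage reported in the text.
share_of_known = 1062 / 1578      # intersection relative to the known network
share_of_predicted = 1062 / 2404  # intersection relative to all predicted genes
print(f"{share_of_known:.1%} of known, {share_of_predicted:.1%} of predicted")
```

The second ratio is not reported in the text; it is included only to show that the same intersection covers a smaller share of the predicted set than of the known network.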

4.11. Ethanolamine utilization system (Eut)

In general, many bacteria use diols, such as 1,2-propanediol, or their substituted analogs, such as ethanolamine, as carbon and/or nitrogen sources. In this work, we identified the eut regulon in E. coli K-12 (Table 2 and Fig. 5A and B). In E. coli, EutR was identified based on sequence analysis, and 17 genes associated with ethanolamine metabolism, except for one (puuP) that is associated with putrescine metabolism (Tsoy et al., 2009), were identified as members of this system (Table 2). EutR could also respond to cobalamin and ethanolamine, based on sequence comparison against EutR of Salmonella typhimurium. It is plausible that E. coli uses a conserved upstream region, previously identified as a control element, to coordinate production of the eutBC system and simultaneous synthesis or import of its cobalamin cofactor. Such coregulation may be achieved if EutR indeed serves as a sensor of both compounds and if its activated form upregulates the expression of both eut operons.

4.12. PerR regulon

PerR, a member of the LysR family, has been described as responsible for the oxidative stress response in diverse bacteria, such as Staphylococcus, Cyanobacteria, and Bacillales. We found that PerR is closely associated with the hypothetical regulator YbiH from the TetR/AcrR family, i.e., they could be co-regulators of the oxidative stress response system (Table 3 and Fig. 5C and D). In addition, seven proteins associated with transport were identified as related to the system. YbhF, YbhR and YbhS are members of the ABC superfamily of transporters (Saurin et al., 1994). CusC, an outer membrane factor, is involved in copper and silver ion detoxification in E. coli K-12 as part of the cusCFBA copper/silver efflux system. AcrE is a periplasmic lipoprotein associated with the drug efflux system, together with AcrF and the TolC outer membrane protein (Elkins and Nikaido, 2003). Another protein included here is AcrB, a proton–substrate antiporter involved in the AcrAB/TolC multidrug efflux complex, which links electrochemical-gradient energy to the efflux of drugs from the cytoplasm. Finally, YbhG, described as a hypothetical protein, was included in this regulon. In summary, it is probable that PerR is involved in transport mechanisms as a consequence of detoxification processes.

5. Discussion

The compilation and analysis of regulatory elements in E. coli led us to understand the organization of the regulatory network of this bacterium. Although TFs are the most extensively used element in regulatory networks, the extended repertoire of other regulatory mechanisms has resulted in a significant increase in the versatility of the network, accurately modulating the organism’s gene expression. It is interesting to note that the repertoire of TFs in this bacterium has increased only slightly in the last decade, suggesting that we are close to the probable limit of regulation associated with this class of proteins. We also found that diverse TFs and regulated genes are duplicated, a recurrent process in the growth of regulatory networks, with subsequent divergence in the effect of the TF interactions. An intriguing observation is that the distribution of TF orthologs shows that predicted TFs are restricted to a few organisms, whereas known TFs exhibit a broad taxonomic distribution (see Supplementary Fig. S3). In addition, we found a large proportion of TFs co-regulating similar functions, and when they are analyzed together with the uncharacterized TFs, their couplings are consistent, forming robust clusters of similar functions. Finally, global regulators were found clustered together, suggesting functional and evolutionary relationships. Altogether, this analysis provides new clues about the E. coli genetic regulatory network that can be extended to other organisms.

Table 2
Genes associated with the eut regulon. Gene name, functional description, COG number and COG description.

Gene | Function | COG number | COG description
eutB | ethanolamine ammonia-lyase, large subunit, heavy chain | COG4303 | Amino acid transport and metabolism
eutC | ethanolamine ammonia-lyase, small subunit (light chain) | COG4302 | Amino acid transport and metabolism
eutL | Carboxysome structural protein with predicted role in ethanolamine utilization | COG4816 | Amino acid transport and metabolism
eutK | Carbon dioxide concentrating mechanism/carboxysome shell protein | COG4577 | Secondary metabolites biosynthesis, transport and catabolism
eutR | AraC-type DNA-binding transcriptional regulator | COG2207 | Transcription
eutS | predicted carboxysome structural protein with predicted role in ethanol utilization | COG4810 | Amino acid transport and metabolism
eutP | conserved protein with nucleoside triphosphate hydrolase domain | COG4917 | Amino acid transport and metabolism
eutQ | Ethanolamine utilization protein | COG4766 | Amino acid transport and metabolism
eutT | predicted cobalamin adenosyltransferase in ethanolamine utilization | COG4812 | Amino acid transport and metabolism
eutD | predicted phosphotransacetylase subunit | COG0280 | Energy production and conversion
eutM | Carbon dioxide concentrating mechanism/carboxysome shell protein | COG4577 | Secondary metabolites biosynthesis, transport and catabolism
eutN | Carbon dioxide concentrating mechanism/carboxysome shell protein | COG4576 | Secondary metabolites biosynthesis, transport and catabolism
eutE | NAD-dependent aldehyde dehydrogenase, ethanolamine utilization protein | COG1012 | Amino acid transport and metabolism
eutJ | Predicted chaperonin, ethanolamine utilization protein | COG4820 | Amino acid transport and metabolism
eutG | Alcohol dehydrogenase class IV. Ethanolamine utilization | COG1454 | Energy production and conversion
eutH | Predicted inner membrane protein. Ethanolamine utilization | COG3192 | Amino acid transport and metabolism
eutA | Ethanolamine utilization protein, possible chaperonin protecting lyase from inhibition | COG4819 | Amino acid transport and metabolism
puuP | Amino acid transporters. Putrescine importer | COG0531 | Amino acid transport and metabolism


Fig. 5. (A) EutR regulon based on genomic context methods and represented as a network. (B) All genes are organized in five different operons and colored according to their COG functional category. (C) PerR regulon. (D) Genes are clustered in five operons.

Table 3
Genes associated with the perR regulon. Gene name, functional description, COG number and COG description.

Gene | Function | COG number | COG description
perR | CP4-6 prophage; predicted DNA-binding transcriptional regulator | COG0583 | Transcription
ybiH | DNA-binding transcriptional regulator | COG1309 | Transcription
ybhF | Transporter subunits of ABC Superfamily: ATP-binding components | COG1131 | Defense mechanisms
ybhG | membrane fusion protein (MFP) component of efflux pump, membrane anchor | COG0845 | Cell/wall/membrane/envelope biogenesis
ybhS | transporter subunit: membrane component of ABC Superfamily | COG0842 | Defense mechanisms
ybhR | transporter subunit: membrane component of ABC Superfamily | COG0842 | Defense mechanisms
cusC | copper/silver efflux system, outer membrane component | COG1538 | Cell/wall/membrane/envelope biogenesis
acrE | cytoplasmic membrane lipoprotein | COG0845 | Cell/wall/membrane/envelope biogenesis
acrF | Multidrug efflux system protein | COG0841 | Defense mechanisms
acrA (a) | Multidrug efflux system protein | COG0845 | Defense mechanisms
acrB | Multidrug efflux system protein | COG0841 | Defense mechanisms

(a) acrA was associated with the regulator YcFQ.

Acknowledgments

Work reported here was completed during a sabbatical leave of absence supported by CONACYT Fellowship (165772) to E.P-R. Support from DGAPA-UNAM (IN-204714) and CONACYT (155116) is gratefully acknowledged.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.compbiolchem.2015.06.002.

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Aravind, L., Koonin, E.V., 1999. DNA-binding proteins and evolution of transcription regulation in the archaea. Nucleic Acids Res. 27, 4658–4670.
Baek, J.H., Lee, S.Y., 2006. Novel gene members in the Pho regulon of Escherichia coli. FEMS Microbiol. Lett. 264, 104–109.
Baek, J.H., Lee, S.Y., 2007. Transcriptome analysis of phosphate starvation response in Escherichia coli. J. Microbiol. Biotechnol. 17, 244–252.
Batchelor, E., Walthers, D., Kenney, L.J., Goulian, M., 2005. The Escherichia coli CpxA–CpxR envelope stress response system regulates expression of the porins ompF and ompC. J. Bacteriol. 187, 5723–5731.
Boos, W., Bohm, A., 2000. Learning new tricks from an old dog: MalT of the Escherichia coli maltose system is part of a complex regulatory network. Trends Genet. 16, 404–409.


Brown, P.K., Dozois, C.M., Nickerson, C.A., Zuppardo, A., Terlonge, J., et al., 2001. MlrA, a novel regulator of curli (AgF) and extracellular matrix synthesis by Escherichia coli and Salmonella enterica serovar Typhimurium. Mol. Microbiol. 41, 349–363.
Chivers, P.T., Sauer, R.T., 2000. Regulation of high affinity nickel uptake in bacteria. Ni2+-dependent interaction of NikR with wild-type and mutant operator sites. J. Biol. Chem. 275, 19735–19741.
Conant, G.C., Wolfe, K.H., 2008. Turning a hobby into a job: how duplicated genes find new functions. Nat. Rev. Genet. 9, 938–950.
Demple, B., 1996. Redox signaling and gene control in the Escherichia coli soxRS oxidative stress regulon–a review. Gene 179, 53–57.
Elkins, C.A., Nikaido, H., 2003. Chimeric analysis of AcrA function reveals the importance of its C-terminal domain in its interaction with the AcrB multidrug efflux pump. J. Bacteriol. 185, 5349–5356.
Ermolaeva, M.D., White, O., Salzberg, S.L., 2001. Prediction of operons in microbial genomes. Nucleic Acids Res. 29, 1216–1221.
Fabrega, A., Martin, R.G., Rosner, J.L., Tavio, M.M., Vila, J., 2009. Constitutive SoxS expression in a fluoroquinolone-resistant strain with a truncated SoxR protein and identification of a new member of the marA-soxS-rob regulon, mdtG. Antimicrob. Agents Chemother. 54, 1218–1225.
Foster, J.W., 2004. Escherichia coli acid resistance: tales of an amateur acidophile. Nat. Rev. Microbiol. 2, 898–907.
Gama-Castro, S., Jimenez-Jacinto, V., Peralta-Gil, M., Santos-Zavaleta, A., Penaloza-Spinola, M.I., et al., 2008. RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation. Nucleic Acids Res. 36, D120–D124.
Gama-Castro, S., Salgado, H., Peralta-Gil, M., Santos-Zavaleta, A., Muniz-Rascado, L., et al., 2011. RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic Acids Res. 39, D98–105.
Gohler, A.K., Kokpinar, O., Schmidt-Heck, W., Geffers, R., Guthke, R., et al., 2011. More than just a metabolic regulator–elucidation and validation of new targets of PdhR in Escherichia coli. BMC Syst. Biol. 5, 197.
Gotfredsen, M., Gerdes, K., 1998. The Escherichia coli relBE genes belong to a new toxin–antitoxin gene family. Mol. Microbiol. 29, 1065–1076.
Gough, J., Chothia, C., 2002. SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Res. 30, 268–272.
Gu, Z., Cavalcanti, A., Chen, F.C., Bouman, P., Li, W.H., 2002. Extent of gene duplication in the genomes of Drosophila, nematode, and yeast. Mol. Biol. Evol. 19, 256–262.
Han, X., Dorsey-Oresto, A., Malik, M., Wang, J.Y., Drlica, K., et al., 2010. Escherichia coli genes that reduce the lethal effects of stress. BMC Microbiol. 10, 35.
Hatfield, G.W., Benham, C.J., 2002. DNA topology-mediated control of global gene expression in Escherichia coli. Annu. Rev. Genet. 36, 175–203.
Hirakawa, H., Takumi-Kobayashi, A., Theisen, U., Hirata, T., Nishino, K., et al., 2008. AcrS/EnvR represses expression of the acrAB multidrug efflux genes in E. coli. J. Bacteriol. 190, 6276–6279.
Hu, P., Janga, S.C., Babu, M., Diaz-Mejia, J.J., Butland, G., et al., 2009. Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS Biol. 7, e96.
Ibarra, J.A., Perez-Rueda, E., Segovia, L., Puente, J.L., 2008. The DNA-binding domain as a functional indicator: the case of the AraC/XylS family of transcription factors. Genetica 133, 65–76.
Itzkovitz, S., Tlusty, T., Alon, U., 2006. Coding limits on the number of transcription factors. BMC Genomics 7, 239.
Janga, S.C., Moreno-Hagelsieb, G., 2004. Conservation of adjacency as evidence of paralogous operons. Nucleic Acids Res. 32, 5392–5397.
Janga, S.C., Collado-Vides, J., Moreno-Hagelsieb, G., 2005. Nebulon: a system for the inference of functional relationships of gene products from the rearrangement of predicted operons. Nucleic Acids Res. 33, 2521–2530.
Janga, S.C., Diaz-Mejia, J.J., Moreno-Hagelsieb, G., 2010. Network-based function prediction and interactomics: the case for metabolic enzymes. Metab. Eng. 13, 1–10.
Kahramanoglou, C., Seshasayee, A.S., Prieto, A.I., Ibberson, D., Schmidt, S., et al., 2010. Direct and indirect effects of H-NS and Fis on global gene expression control in Escherichia coli. Nucleic Acids Res. 39, 2073–2091.
Kobayashi, A., Hirakawa, H., Hirata, T., Nishino, K., Yamaguchi, A., 2006. Growth phase-dependent expression of drug exporters in Escherichia coli and its contribution to drug tolerance. J. Bacteriol. 188, 5693–5703.
Lee, J.W., Helmann, J.D., 2007. Functional specialization within the Fur family of metalloregulators. Biometals 20, 485–499.
Lee, C., Kim, I., Lee, J., Lee, K.L., Min, B., et al., 2010. Transcriptional activation of the aldehyde reductase YqhD by YqhC and its implication in glyoxal metabolism of Escherichia coli K-12. J. Bacteriol. 192, 4205–4214.
Lima, T., Auchincloss, A.H., Coudert, E., Keller, G., Michoud, K., et al., 2009. HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot. Nucleic Acids Res. 37, D471–D478.
Ma, Z., Masuda, N., Foster, J.W., 2004. Characterization of EvgAS-YdeO-GadE branched regulatory circuit governing glutamate-dependent acid resistance in Escherichia coli. J. Bacteriol. 186, 7378–7389.
Madan Babu, M., Teichmann, S.A., 2003. Evolution of transcription factors and the gene regulatory network in Escherichia coli. Nucleic Acids Res. 31, 1234–1244.
Maddocks, S.E., Oyston, P.C., 2008. Structure and function of the LysR-type transcriptional regulator (LTTR) family proteins. Microbiology 154, 3609–3623.

Marchler-Bauer, A., Anderson, J.B., Derbyshire, M.K., DeWeese-Scott, C., Gonzales, N.R., et al., 2007. CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res. 35, D237–D240.
Martin, R.G., Rosner, J.L., 2002. Genomics of the marA/soxS/rob regulon of Escherichia coli: identification of directly activated promoters by application of molecular genetics and informatics to microarray data. Mol. Microbiol. 44, 1611–1624.
Martinez-Antonio, A., Salgado, H., Gama-Castro, S., Gutierrez-Rios, R.M., Jimenez-Jacinto, V., et al., 2003. Environmental conditions and transcriptional regulation in Escherichia coli: a physiological integrative approach. Biotechnol. Bioeng. 84, 743–749.
Martinez-Antonio, A., Janga, S.C., Salgado, H., Collado-Vides, J., 2006. Internal-sensing machinery directs the activity of the regulatory network in Escherichia coli. Trends Microbiol. 14, 22–27.
Masuda, N., Church, G.M., 2003. Regulatory network of acid resistance genes in Escherichia coli. Mol. Microbiol. 48, 699–712.
McAdams, H.H., Arkin, A., 1997. Stochastic mechanisms in gene expression. Proc. Natl. Acad. Sci. U. S. A. 94, 814–819.
Minh, P.N., Devroede, N., Massant, J., Maes, D., Charlier, D., 2009. Insights into the architecture and stoichiometry of Escherichia coli PepA*DNA complexes involved in transcriptional control and site-specific DNA recombination by atomic force microscopy. Nucleic Acids Res. 37, 1463–1476.
Miroslavova, N.S., Busby, S.J., 2006. Investigations of the modular structure of bacterial promoters. Biochem. Soc. Symp. 1–10.
Moreno-Hagelsieb, G., Collado-Vides, J., 2002. A powerful non-homology method for the prediction of operons in prokaryotes. Bioinformatics 18 (Suppl. 1), S329–S336.
Moreno-Hagelsieb, G., Jokic, P., 2012. The evolutionary dynamics of functional modules and the extraordinary plasticity of regulons: the Escherichia coli perspective. Nucleic Acids Res.
Oberto, J., Nabti, S., Jooste, V., Mignot, H., Rouviere-Yaniv, J., 2009. The HU regulon is composed of genes responding to anaerobiosis, acid stress, high osmolarity and SOS induction. PLoS One 4, e4367.
Ogasawara, H., Ishida, Y., Yamada, K., Yamamoto, K., Ishihama, A., 2007a. PdhR (pyruvate dehydrogenase complex regulator) controls the respiratory electron transport system in Escherichia coli. J. Bacteriol. 189, 5534–5541.
Ogasawara, H., Hasegawa, A., Kanda, E., Miki, T., Yamamoto, K., et al., 2007b. Genomic SELEX search for target promoters under the control of the PhoQP-RstBA signal relay cascade. J. Bacteriol. 189, 4791–4799.
Ogasawara, H., Yamamoto, K., Ishihama, A., 2010. Regulatory role of MlrA in transcription activation of csgD, the master regulator of biofilm formation in Escherichia coli. FEMS Microbiol. Lett. 312, 160–168.
Ow, D.S., Lee, R.M., Nissom, P.M., Philp, R., Oh, S.K., et al., 2007. Inactivating FruR global regulator in plasmid-bearing Escherichia coli alters metabolic gene expression and improves growth rate. J. Biotechnol. 131, 261–269.
Perez-Rueda, E., Collado-Vides, J., 2000. The repertoire of DNA-binding transcriptional regulators in E. coli K-12. Nucleic Acids Res. 28, 1838–1847.
Perez-Rueda, E., Collado-Vides, J., 2001. Common history at the origin of the position-function correlation in transcriptional regulators in archaea and bacteria. J. Mol. Evol. 53, 172–179.
Perez-Rueda, E., Collado-Vides, J., Segovia, L., 2004. Phylogenetic distribution of DNA-binding transcription factors in bacteria and archaea. Comput. Biol. Chem. 28, 341–350.
Perez-Rueda, E., Janga, S.C., Martinez-Antonio, A., 2009. Scaling relationship in the gene content of transcriptional machinery in bacteria. Mol. Biosyst. 5, 1494–1501.
Punta, M., Coggill, P.C., Eberhardt, R.Y., Mistry, J., Tate, J., et al., 2011. The Pfam protein families database. Nucleic Acids Res. 40, D290–D301.
Salgado, H., Moreno-Hagelsieb, G., Smith, T.F., Collado-Vides, J., 2000. Operons in Escherichia coli: genomic analyses and predictions. Proc. Natl. Acad. Sci. U. S. A. 97, 6652–6657.
Saurin, W., Koster, W., Dassa, E., 1994. Bacterial binding protein-dependent permeases: characterization of distinctive signatures for functionally related integral cytoplasmic membrane proteins. Mol. Microbiol. 12, 993–1004.
Semsey, S., Krishna, S., Sneppen, K., Adhya, S., 2007. Signal integration in the galactose network of Escherichia coli. Mol. Microbiol. 65, 465–476.
Shimada, T., Yamamoto, K., Ishihama, A., 2010. Novel members of the Cra regulon involved in carbon metabolism in Escherichia coli. J. Bacteriol. 193, 649–659.
Su, C.C., Rutherford, D.J., Yu, E.W., 2007. Characterization of the multidrug efflux regulator AcrR from Escherichia coli. Biochem. Biophys. Res. Commun. 361, 85–90.
Swint-Kruse, L., Matthews, K.S., 2009. Allostery in the LacI/GalR family: variations on a theme. Curr. Opin. Microbiol. 12, 129–137.
Teichmann, S.A., Babu, M.M., 2004. Gene regulatory network growth by duplication. Nat. Genet. 36, 492–496.
Thieffry, D., Huerta, A.M., Perez-Rueda, E., Collado-Vides, J., 1998. From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli. Bioessays 20, 433–440.
Tramonti, A., De Canio, M., De Biase, D., 2008. GadX/GadW-dependent regulation of the Escherichia coli acid fitness island: transcriptional control at the gadY–gadW divergent promoters and identification of four novel 42bp GadX/GadW-specific binding sites. Mol. Microbiol. 70, 965–982.
Tsoy, O., Ravcheev, D., Mushegian, A., 2009. Comparative genomics of ethanolamine utilization. J. Bacteriol. 191, 7157–7164.
Tucker, D.L., Tucker, N., Ma, Z., Foster, J.W., Miranda, R.L., et al., 2003. Genes of the GadX–GadW regulon in Escherichia coli. J. Bacteriol. 185, 3190–3201.

Wall, M.E., Hlavacek, W.S., Savageau, M.A., 2004. Design of gene circuits: lessons from bacteria. Nat. Rev. Genet. 5, 34–42.
Wilson, D., Pethica, R., Zhou, Y., Talbot, C., Vogel, C., et al., 2009. SUPERFAMILY–sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 37, D380–D386.
Zhou, D., Yang, R., 2006. Global analysis of gene transcription regulation in prokaryotes. Cell. Mol. Life Sci. 63, 2260–2290.
