Mutation Research 769 (2014) 69–79


Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis journal homepage: www.elsevier.com/locate/molmut Community address: www.elsevier.com/locate/mutres

Conserved motifs of MutL proteins

Michał Banasik, Paweł Sachadyn ∗

Gdańsk University of Technology, Microbiology Department, Gdańsk, Poland

Article info

Article history: Received 14 April 2014; received in revised form 16 July 2014; accepted 22 July 2014; available online 1 August 2014

Keywords: MutL; Multiple sequence alignment; MMR; DNA repair; MLH1; PMS2

Abstract

The MutL protein is best known for its function in DNA mismatch repair (MMR). However, there is evidence to suggest that MutL is not only the linker connecting the functions of MutS and MutH in MMR, but that it also participates in other repair systems, such as Very Short Patch (VSP), Base Excision (BER) and Nucleotide Excision Repair (NER). This study set out to identify the most highly conserved amino acid sequence motifs in MutL proteins. We analyzed 208 MutL amino acid sequences of 199 representative prokaryotic species belonging to 28 classes of bacteria and archaea. The analysis revealed 16 conserved motifs situated in the ATPase and endonuclease domains, as well as within the disordered loop, and in the MutL regions interacting with the β clamp of DNA polymerase III. The conserved sequence motifs thus determined constitute a structural definition of MutL and may be used in site-directed mutagenesis studies. We found conserved residues within the potential regions where binding with MutS occurs. However, the existing data do not provide clues as to the possible sites of MutL interactions with the proteins involved in other DNA repair systems such as NER, BER and VSP. We determined the 57 most highly conserved amino acid residues, including 43 which were identical in all the sequences analyzed. Most of these highly conserved amino acid residues in MutL are identical to the corresponding residues reported as mutational hot-spots in one of its human homologues, MLH1, but not in the other, PMS2. This is the first study to present the conserved sequence motifs of MutL widespread in bacteria and archaea and the classification of MutLs into five groups distinguished on the basis of differences in the C-terminal region. Our analysis is of use in better understanding MutL functions. © 2014 Published by Elsevier B.V.

1. Introduction

Along with horizontal gene transfer and DNA recombination, DNA mutations are the main driving molecular mechanisms of evolution. However, their occurrence is often deleterious to the cell. Living organisms have thus developed a panel of DNA repair systems in order to maintain genome stability. The mismatch repair (MMR), nucleotide excision repair (NER), base excision repair (BER) and very short patch repair (VSP) systems play crucial roles in DNA repair in prokaryotic cells. Each of these systems has a different substrate specificity and different protein constituents; however, all of them share one common feature, the involvement of MutL [1]. They are all present in Escherichia coli, but not in all prokaryotic cells.

The MMR system is directed toward mispaired and unpaired DNA bases. It is triggered by the recognition of a DNA mismatch by MutS, followed by the joining of MutL. The resulting ternary

∗ Corresponding author. Tel.: +48 58 3471605; fax: +48 58 3471822. E-mail address: [email protected] (P. Sachadyn). http://dx.doi.org/10.1016/j.mrfmmm.2014.07.006 0027-5107/© 2014 Published by Elsevier B.V.

complex, mismatched DNA–MutS–MutL, is capable of interacting with the MutH endonuclease, which recognizes hemi-methylated dGATC DNA sequences. The interaction between the ternary complex and MutH activates the latent endonuclease activity of the latter. The unmethylated, newly synthesized DNA strand is then incised by MutH on the 5′ side of the nearest hemi-methylated dGATC sequence. The incision site is a substrate for the UvrD helicase, which unwinds the DNA helix. The nascent DNA strand separated from the parent one by UvrD is quickly digested by exonucleases. In E. coli, four enzymes participating in this step have been identified so far: the 3′–5′ exonucleases ExoI and ExoX, and the 5′–3′ exonucleases RecJ and ExoVII [2]. The gap resulting from the exonucleolytic digestion is filled by DNA polymerase III and the remaining nick is sealed by DNA ligase [3]. In this DNA repair system, MutL is the factor connecting the activities of MutS and MutH, which triggers the sequence of MMR steps. If MutH is absent, as it is in the majority of organisms, its function is fulfilled by the endonuclease domain of MutL [4].

In addition to MMR, there are three other DNA repair systems: BER, NER and VSP. BER is responsible for the removal of oxidative DNA damage, while the role of NER is to remove bulky DNA


M. Banasik, P. Sachadyn / Mutation Research 769 (2014) 69–79

Fig. 1. The interactions of MutL with the proteins engaged in the MMR, BER, NER and VSP DNA repair systems.

lesions distorting DNA structure. VSP repair is directed toward the T:G mismatches resulting from the hydrolytic deamination of 5-methylcytosine. The main agent of the VSP repair system is the Vsr endonuclease, which recognizes and excises the mismatched thymidine within the hemi-methylated 5′-dCTWGG/5′-dCCWGG DNA sequence.

Although MutL is primarily known for its role in MMR, there is evidence to suggest that it also participates in BER, NER and VSP repair (Fig. 1). In the case of the VSP system, it is possible that MutL recruits the Vsr endonuclease to its target on the DNA. The structural interaction between MutL and the proteins belonging to the VSP system was confirmed through the use of the yeast two-hybrid assay [5] and analytical centrifugation [6]. The functional interaction between these two proteins was demonstrated in experiments in which MutL enhanced the Vsr activity toward a T/G mismatch located within a covalently closed DNA substrate [7]. Moreover, the two systems are clearly interrelated: overproduction of the Vsr endonuclease in vivo has been shown to increase the frequency of T→C mutations within the CTWGG sequence, owing to the competition of VSP with the MMR system proteins for the repair of the T/G mismatches which result from incorrect replication of the CTWGG sites [8]. The simultaneous overproduction of MutL (or MutS) alleviates this phenotype [9].

In the case of BER, it has been suggested that MutL recruits the MutY protein to those mismatches which can be recognized by MutS but cannot be repaired by the MMR system [10–13]. The sole evidence of the interaction between MutL and the BER proteins comes from co-immunoprecipitation experiments [11].

The NER system is responsible for the removal of more extensive DNA damage, such as thymine dimers. There are three proteins involved in the NER system: UvrA, UvrB and UvrC. UvrB interacts with UvrC, which, in turn, incises the damaged DNA strand. It has been shown that UvrB interacts with MutL in the presence of exogenous DNA-damaging chemical agents such as 5-azacytidine. One of the hypotheses regarding the interaction between MMR and NER is based on the assumption that DNA damage which is well recognized by MutS but very poorly repaired by MMR may be corrected through the recruitment of NER proteins by MutS via MutL [1].

Prokaryotic MutL proteins act as homodimers, whereas, in Eukaryotes, there are up to four MutL homologues which form three heterodimers, namely, MutLα (MLH1–PMS2; MLH1–PMS1 heterodimer in yeasts), which participates in MMR; MutLγ (MLH1–MLH3 heterodimer), which is involved in meiotic recombination; and MutLβ (MLH1–PMS1 heterodimer), the role of which is still not understood [1]. Although different from its prokaryotic counterpart at the detailed level, the eukaryotic MMR system retains the same basic principles.

Each monomer of the prokaryotic MutL dimer consists of two domains separated by a flexible, disordered linker that does not adopt any secondary structure. While the MutL protein exhibits a high level of conservation within its N-terminal domain, no conservation has been reported for the C-terminal domain [14,15]. The amino acid motifs responsible for the endonuclease activity

are located in the C-terminal domain of MutL [16,17]. To date, all efforts to obtain the crystal structure of the complete MutL protein have failed. The primary cause of the problem is the presence of the flexible, disordered linker, which comprises 100 amino acid residues in the E. coli MutL [18], between the N- and C-terminal domains of the protein. The crystal structures of the N- and C-terminal domains of E. coli MutL have been solved [14,18]. The N-terminal domain of MutL contains ATPase subdomains, as well as the amino acid sequence motifs responsible for DNA binding, while the C-terminal domain includes the motifs participating in MutL dimer formation. MutL is able to bind both dsDNA and ssDNA, but shows no sequence specificity [19].

In the apo state, that is, without ATP bound within the N-terminal domain, only the C-terminal domains of MutL dimerize. After ATP binding, the N-terminal domains also dimerize, leading to the formation of the most compact structure assumed by MutL. After ATP hydrolysis, the interactions between the N-terminal domains of the dimer are partially disrupted, giving rise to a less compact structure. The subsequent release of ADP terminates the hydrolysis cycle and the N-terminal domain returns to its monomeric conformation. The significance of this ATP hydrolysis cycle is confirmed by the observation that MutL mutants which are unable to bind and/or hydrolyze ATP are defective in the MMR system [20,21].

It seems that the MMR protein which attracts the most attention is MutS. However, among the mutations causing hereditary nonpolyposis colorectal cancer (HNPCC), 50% have been found within the MLH1 gene, the human homologue of mutL, and 39% in the MSH2 gene, the human homologue of mutS [22]. This observation indicates the relevance of MLH1 mutations in carcinogenesis and, indirectly, the importance of MutL in DNA repair [22].

The main objective of this study was to determine the regions conserved among bacterial and archaeal MutL proteins. The primary structure of the MutL N-terminal domain is known to be highly conserved, unlike that of the C-terminal domain. Our multiple sequence alignment analyses demonstrate that this is only partially true. We found a high conservation of amino acid sequence motifs located within the C-terminal endonuclease domain. Moreover, we have shown that the majority of the MutLs contain highly conserved sequence motifs which are the sites interacting with other proteins such as the β subunit of DNA polymerase III. The precise definition of these regions will help to understand the role of the MutL protein not only in MMR, but also in other DNA repair systems.

2. Methods

The amino acid sequences of MutL proteins were downloaded from the NCBI database (http://www.ncbi.nlm.nih.gov/) and compared using the default settings of the VNTI 9.0.0 AlignX software (Invitrogen). Protein BLAST searches (http://blast.ncbi.nlm.nih.gov/Blast.cgi) restricted to taxonomic groups were carried out in order to confirm the results of multiple sequence alignments which were relevant to microbial classes. The phylogenetic analyses were performed by means of multiple sequence alignments employing ClustalX 2.0 with the default settings and visualized as phylograms generated using Dendroscope 2.4 [23,24].

3. Results and discussion

The analysis of conserved sequence motifs presented in this study was based on the multiple sequence alignment of 208 representative amino acid sequences of prokaryotic MutL homologues selected from a broad range of taxonomic groups including 28 classes of bacteria and archaea (Table 1, Supplementary Data Table S1). We identified 16 sequence motifs comprising

Table 1 The taxonomic groups included in the analysis of amino acid sequences of MutL proteins.

Phylum | Class
Acidobacteria | Acidobacteria
Actinobacteria | Actinobacteria
Aquificae | Aquificae
Bacteroidetes | Bacteroidia
Chlamydiae | Chlamydiales
Tenericutes | Mollicutes
Chloroflexi | Chloroflexi
Cyanobacteria | Gloeobacteria, Oscillatoriophycideae
Deinococcus-Thermus | Deinococci
Dictyoglomi | Dictyoglomia
Elusimicrobia | Elusimicrobia
Euryarchaeota | Halobacteria, Methanomicrobia, Methanopyrales, Methanococci, Archaeoglobi, Thermococci, Thermoplasmatales
Crenarchaeota |
Nanoarchaeota |
Fibrobacteres | Fibrobacteres
Firmicutes | Bacilli, Clostridia
Fusobacteria | Fusobacteria
Gemmatimonadetes | Gemmatimonadetes
Nitrospirae | Nitrospira
Planctomycetes | Planctomycetacia
Proteobacteria | Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria
Spirochaetes | Spirochaetes
Thermotogae | Thermotogae

57 highly conserved amino acid residues, including 43 identical in all the MutL proteins analyzed (Table 2).

3.1. The distribution of the MutL homologues in Prokaryotes

Though widespread in Prokaryotes, MutL is absent in organisms lacking MMR. Our protein BLAST searches found no MutL homologues in 6 classes of Prokaryotes (Table 1), including all Epsilonproteobacteria and the majority of classes of Archaea. The search showed no MutL homologue in the Crenarchaeota, Nanoarchaeota, Archaeoglobales, Methanococcales or Methanopyrales. In the Actinobacteria, only one species out of the 751 with genomes deposited in GenBank, Rubrobacter xylanophilus, has a MutL homologue (GenBank accession no. YP_642827.1). No MutS homologue has been found in these organisms; nor are there any MutL homologues among the Methanobacteriales, Thermococcales and Thermoplasmatales archaeal orders, where MutS homologues other than MutS1 occur [25]. However, in the case of Archaea from the Methanomicrobia and Halobacteria, both MutS1 and MutL are present. It is worth emphasizing that two MutS1 and two MutL homologues coexist in a single organism in species of the Halobacteriales [25]. It has previously been reported that MutS and MutL homologues are absent within the Mycoplasma genus, which belongs to the Mollicutes [25]. This study has confirmed the absence of MutL and MutS homologues in the Mollicutes for all available complete genome sequences of this class, with the exception of the


Mycoplasma and Acholeplasma species reported in 2013, namely Mycoplasma sp. CAG:877, CAG:776, CAG:472, CAG:956, Acholeplasma sp. CAG:878, A. brassicae, A. palmae J233, and A. laidlawii PG-8A, which possess the MutS1 and MutL coding genes. Polynucleobacter necessarius lacks both MutS [25] and MutL, which makes it an exception among the Betaproteobacteria. Most of the prokaryotic species analyzed have one MutL homologue accompanying a single MutS1 homologue, with the exception of the Epsilonproteobacteria, in which MutL is absent, even in the species possessing MutS1 [25]. In Table 1, the taxa where no MutL homologue has been found are marked red and those where MutL has been found in selected species only are marked green. In Actinobacteria, the genome of R. xylanophilus is the only one of the 751 complete genome sequences available that possesses a MutL coding gene.

3.2. The conserved motifs of MutL homologues

The multiple sequence alignment of the 208 representative MutL amino acid sequences analyzed in this study reveals the existence of five groups of conserved motifs, as follows: (i) those responsible for ATP binding/hydrolysis, (ii) those within the endonuclease domain, (iii) the amino acid residues crucial for the interaction between MutS and MutL, (iv) the motifs responsible for MutL interactions with the β subunit of DNA polymerase III, and (v) the motifs involved in the interactions between MutL and MutH, in the species where the MutL protein is accompanied by MutH. These motifs are discussed in the following sections (Table 2, Fig. 2).

3.3. Motifs responsible for ATP-binding

Five ATP-binding/hydrolysis motifs are situated in the N-terminal domain of MutL (Fig. 2). ATP binding by MutL, but not hydrolysis, is necessary for the activation of MutH and UvrD [26]. Hydrolysis of ATP by MutL is most probably necessary for the termination of the unwinding of the DNA helix by UvrD [27].

Motif 1, VVKELVENALDAGA, where the residues identical in all the MutL amino acid sequences in the analysis are underlined, contains the conserved amino acid residues which participate in the direct interaction with the bound ATP molecule [18]. The conserved Glu residue, that is, the Glu in KEL, activates a water molecule, thus enabling ATP hydrolysis. The substitution of this Glu with Ala results in the total inactivation of MutL ATPase activity; however, the mutant protein is still able to bind the adenine nucleotide [18]. The conserved Asp residue is responsible for Mg2+ ion binding [28]. The consensus amino acid sequence for motif 2 is DNGXGM. All the amino acid residues within this motif participate in the direct interaction with ATP [18]. The consensus amino acid sequence for the third ATP-binding/hydrolysis motif is TLGFRGEALXS. The underlined residues in the motif most probably form hydrogen bonds with the γ-phosphate group of ATP. In the absence of the ligand, the motif is partly disordered. The conserved Arg residue within the motif blocks ATP binding by occupying the adenine binding pocket [18]. The consensus sequence of motif 4 is GT. The conserved motif 5 involved in ATP binding is NGR, where the first amino acid residue stabilizes the γ-phosphate of ATP [18].

The conserved amino acid residues in the consensus motifs and the MutL sequence of Bacillus subtilis (GenBank accession no. NP_389587.1) are colored according to the degree of conservation, as follows: red indicates residues identical in all the amino acid sequences of MutL proteins analyzed; blue marks those exhibiting 100% conservation but not identical in all the sequences in question; green highlights those conserved in more than 90% of the sequences and orange shows those conserved at 60 to 90%. For


Table 2 The consensus sequences of the conserved motifs found in MutL proteins.

Consensus sequence of the motif | Corresponding amino acid sequence in Bacillus subtilis MutL, human MLH1 and human PMS2 | Function
XXXIXILXDXLXNXIAAGEVV | MutL: 1 MAKVIQLSDELSNKIAAGEVV 21; MLH1: 1 MAKVIQLSDELSNKIAAGEVV 21; PMS2: 1 MAKVIQLSDELSNKIAAGEVV 21 | Disordered loop L1 participating in ATP binding
VVKELVENALDAGA | MutL: 27 VVKELVENAIDADS 40; MLH1: 27 VVKELVENAIDADS 40; PMS2: 27 VVKELVENAIDADS 40 | Conserved ATP-binding motif
DNGXGM | MutL: 59 DNGEGM 64; MLH1: 59 DNGEGM 64; PMS2: 59 DNGEGM 64 | Conserved ATP-binding motif
RHATSKI | MutL: 75 RHATSKI 81; MLH1: 75 RHATSKI 81; PMS2: 75 RHATSKI 81 | Disordered loop L2 participating in ATP binding
TLGFRGEALXS | MutL: 92 TLGFRGEALPS 102; MLH1: 92 TLGFRGEALPS 102; PMS2: 92 TLGFRGEALPS 102 | Conserved ATP-binding motif
(K/H/R/E)Xn(K/H/R/E)(I/L)X(I/L/V) | (Bs): 108 HLEITTSTGEGAGTKLVL 125 | MutS interactions
GT | MutL: 141 GT 142; MLH1: 141 GT 142; PMS2: 141 GT 142 | Conserved ATP-binding motif
LFFNTPARRKFLK | MutL: 149 LFFNTPARLKYMK 161; MLH1: 149 LFFNTPARLKYMK 161; PMS2: 149 LFFNTPARLKYMK 161 | β clamp binding motif
NGR | MutL: 254 NGR 256; MLH1: 254 NGR 256; PMS2: 254 NGR 256 | Conserved ATP-binding motif
VDVNVHPXKXEVRFXX | MutL: 294 VDVNVHPSKLEVRLSK 309; MLH1: 294 VDVNVHPSKLEVRLSK 309; PMS2: 294 VDVNVHPSKLEVRLSK 309 | Disordered loop L3 participating in ATP binding
GQ | MutL: 443 GQ 444 | Endonuclease motif
DQHAAHERILYE | MutL: 462 DQHAAQERIKYE 473; PMS2: 462 DQHAAQERIKYE 473 | Endonuclease motif
QXLLIP | MutL: 473 QEMIVP 492 | β clamp binding motif
ACK | MutL: 572 SCK 574; PMS2: 572 SCK 574 | Endonuclease motif
CPHGRP | MutL: 604 CPHGRP 609; PMS2: 604 CPHGRP 609 | Endonuclease motif
FXR | MutL: 623 FKR 625 | Endonuclease motif
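The MutS-interaction consensus in Table 2, (K/H/R/E)Xn(K/H/R/E)(I/L)X(I/L/V), can be read as a pattern and checked against a sequence. The following is a minimal sketch in Python rather than the method used in this study; it assumes that X stands for any residue and that n is at least 1, since the text only states that n is variable. The function name `find_muts_site` is hypothetical.

```python
import re

# Assumptions (not stated in the source): X = any residue, n >= 1.
# Lazy matching (+?) keeps the Xn stretch as short as possible.
MUTS_SITE = re.compile(r"[KHRE][A-Z]+?[KHRE][IL][A-Z][ILV]")


def find_muts_site(sequence):
    """Return the first regex match for the MutS-interaction consensus, or None."""
    return MUTS_SITE.search(sequence.upper())


# B. subtilis MutL residues 108-125, as listed in Table 2.
bs_segment = "HLEITTSTGEGAGTKLVL"
hit = find_muts_site(bs_segment)
if hit is not None:
    print(f"match at {hit.start() + 1}-{hit.end()} of segment: {hit.group(0)}")
```

A strict pattern of this kind only captures the specific consensus; the more general [Basic/Acidic]/[Hydrophobic] form discussed in the text would need a broader residue alphabet.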


Fig. 2. The conserved sequence motifs of MutL.

the amino acid residues in the corresponding sequences of human MLH1 (GenBank accession no. NP_000240.1) and PMS2 (GenBank accession no. NP_000526.1), those marked turquoise have been reported as mutational hot-spots associated with HNPCC, while those shown in pink have not. The conserved amino acid residues within the endonuclease domain were determined for the 192 MutL proteins which exist in species lacking MutH and possess an endonuclease domain. The similarity criteria for the conserved amino acid residues are given in Table S2.

The conserved sequence motifs determined by multiple sequence alignment are marked in the MutL amino acid sequence of B. subtilis (GenBank accession no. AAB19236.1). They are shown in colored boxes, as follows: ATPase, in red; disordered loop, in orange; endonuclease, in blue; MutL-MutS interactions, in green; and the binding of the β subunit of DNA polymerase III, in violet. The B. subtilis MutL is not accompanied by MutH; therefore no motifs participating in the interaction between MutL and MutH are shown. The amino acid residues are colored according to the degree of conservation, as indicated in the legend to Table 2. The predicted secondary structures generated using PsiPred [29] are shown above the amino acid sequence as pink cylinders, in the case of α-helices, and yellow arrows, in that of β-sheets. The blue rectangles reflect the level of confidence in the prediction.
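The conservation classes used for coloring (identical in all sequences, more than 90%, 60 to 90%) can be illustrated by scoring each column of an alignment. The sketch below is only an illustration under stated assumptions: it scores plain identity (the frequency of the most common residue), so it does not reproduce the "100% conserved but not identical" class, which depends on the similarity criteria of Table S2. The toy alignment and the function names are hypothetical and are not data from this study.

```python
from collections import Counter


def classify_column(residues):
    """Label one alignment column by the frequency of its most common residue."""
    top_count = Counter(residues).most_common(1)[0][1]
    fraction = top_count / len(residues)
    if fraction == 1.0:
        return "identical"
    if fraction > 0.9:
        return ">90%"
    if fraction >= 0.6:
        return "60-90%"
    return "variable"


def classify_alignment(aligned_sequences):
    """Classify every column of equal-length, pre-aligned sequences."""
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*rows) transposes the alignment from rows into columns.
    return [classify_column(column) for column in zip(*aligned_sequences)]


# Hypothetical toy alignment built around the DNGXGM motif.
toy_alignment = ["DNGEGM", "DNGCGM", "DNGSGM", "DNGEGI", "DNGEGM"]
labels = classify_alignment(toy_alignment)
print(list(zip("DNGEGM", labels)))
```

Gap characters, if present, are counted here like any other symbol; a real analysis would need an explicit rule for them.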

3.4. Conserved motifs of disordered loops

There are three conserved motifs in the disordered loop of MutL which participate in the conformational switches of the protein following ATP binding [18] (Fig. 2). The consensus amino acid sequence of loop 1 determined in this analysis is XXXIXILXDXLXNXIAAGEVV. Loop 1 directly influences ATP binding by the MutL dimer and it therefore plays a significant role in the dimerization of the MutL N-terminal regions. This motif is also crucial for the MutH endonuclease activation [18]. The consensus sequences of loops 2 and 3 are RHATSKI and VDVNVHPXKXEVRFXX, respectively. Along with loop 1, loop 2 creates an interface for the interaction of MutL with other proteins after the initial binding of ATP and stabilization of the protein structure. The disordered loop 3 interacts directly with ATP [18].
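A consensus with X wildcards can be compared position by position with an individual sequence. The sketch below, which is an illustration and not part of the original analysis, counts how many of the defined (non-X) consensus positions agree with the B. subtilis MutL segments listed in Table 2. The function name `consensus_agreement` is hypothetical.

```python
def consensus_agreement(consensus, sequence):
    """Return (matches, defined_positions), ignoring X wildcards in the consensus."""
    if len(consensus) != len(sequence):
        raise ValueError("consensus and sequence must have the same length")
    defined = [(c, s) for c, s in zip(consensus, sequence) if c != "X"]
    matches = sum(1 for c, s in defined if c == s)
    return matches, len(defined)


# Consensus sequences from this section paired with the B. subtilis MutL
# segments given in Table 2.
loops = {
    "loop 1": ("XXXIXILXDXLXNXIAAGEVV", "MAKVIQLSDELSNKIAAGEVV"),
    "loop 2": ("RHATSKI", "RHATSKI"),
    "loop 3": ("VDVNVHPXKXEVRFXX", "VDVNVHPSKLEVRLSK"),
}

for name, (consensus, segment) in loops.items():
    matches, defined = consensus_agreement(consensus, segment)
    print(f"{name}: {matches}/{defined} defined positions agree")
```

A consensus residue need not occur in every sequence, so an individual protein can differ from the consensus at a defined position, as the B. subtilis segments do for loops 1 and 3.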


Fig. 3. The most highly conserved amino acid residues in prokaryotic MutL proteins and the mutational hot-spots in the human MLH1 protein.

3.5. Motifs conferring the endonuclease activity of MutL

MutH endonucleases do not exist in the majority of prokaryotic taxa [4]. Our analysis showed that only the selected orders of Gammaproteobacteria listed in Table 3 possess MutH; however, selected species within these taxa do not. For example, Francisella tularensis (ID: 263) and Methylophaga thiooxidans (ID: 637616) belong to the same Thiotrichales order but, unlike the former, the latter lacks MutH. Moreover, F. tularensis MutL contains the conserved DMHAAHERILYE motif, which is required for endonuclease activity [30], while the MutL homologue from M. thiooxidans does not. Five amino acid sequence motifs associated with MutL endonuclease activity have been identified in B. subtilis on the basis of crystal structure studies [31]. In our analysis, 192 out of the 208 representative amino acid sequences of MutL contain the motifs conferring endonuclease activity, and these MutLs are not accompanied by MutH. The multiple sequence alignment of the 192 amino acid sequences in question revealed the consensus sequences of these motifs and the most conserved residues within the motifs (underlined), as follows: GQ (motif 1), which contributes indirectly to the overall stability of the endonuclease active site [31]; DQHAAHERILYE (motif 2), which is responsible for the metal binding/endonucleolytic activity of MutL [32]; ACK (motif 3); CPHGRP (motif 4), where C604 and H606, together with E468, form the Zn2+-binding site in the B. subtilis MutL protein [31]; and FXR (motif 5). In the course of this study, only the DQHAAHERILYE motif was found in all 192 MutL amino acid sequences. The remaining endonuclease motifs were absent in selected classes. Motifs 1,


Fig. 4. The most highly conserved amino acid residues in prokaryotic MutL and the mutational hot-spots in the MLH1 and PMS2, the human homologues of MutL.

3, 4 and 5 were not found in the MutLs from Chlamydiales and Bacteroidia. In addition, the search for motif 1 was unsuccessful in the MutLs of Thermotogae and Gemmatimonadetes, as it was for motifs 3 and 4 in those of Fibrobacteres, and motif 4 in those of Gemmatimonadetes. The absence of the given motifs within the MutL proteins from the selected classes was confirmed through the use of protein BLAST.

3.6. Motifs responsible for MutL interactions with other proteins

Prokaryotic MutL proteins interact with several other proteins, including the MutS protein, the MutH endonuclease, the DNA β-clamp, UvrD (DNA helicase II) and exonuclease X [33,34]. Although there is abundant literature on MutL and its homologues, and on MMR, the exact sites of MutL interactions with other proteins involved in DNA repair have not been precisely determined.

3.7. Interactions with MutS

It has been shown that the N-terminal, or connector, domain of the MutS protein participates in the interaction with MutL [35]. The N-terminal region of the MLH1 subunit is responsible for the formation of the complex between the human MutLα (MLH1-PMS2 heterodimer) and MutSα [36]. Four amino acid residues have been found to be crucial for the proper interactions between these two MMR proteins: H112, R127, A128 and Y130. We identified the counterparts of these amino acid residues in the prokaryotic homologues of MLH1. On the basis of the analysis, which is presented in detail in the Supplementary Data section, Table S3, we have concluded that, in prokaryotic organisms, the consensus sequence of the region containing the amino acid residues analyzed in this study is (K/H/R/E)Xn(K/H/R/E)(I/L)X(I/L/V), where n is a variable number of amino acid residues depending on the species and class of the organism in question. If presented in a more general form, as [Basic/Acidic]Xn[Basic/Acidic][Hydrophobic]X[Hydrophobic], this consensus sequence is consistent with the sequence of this region in the human MLH1 protein, HVTITTKTADGKCAYRASY. As the MutL amino acid residues responsible for interactions with MutS have not yet been identified, follow-up site-directed mutagenesis studies might confirm this in silico finding.

3.8. Interactions with MutH

The literature data on MutL interactions with MutH are scarce. According to the analyses performed by Hall and Matson [37], the regions of the E. coli MutL protein participating in the interaction with MutH are located between amino acid residues 399–439 (region 1) and 556–615 (region 2). Unfortunately, owing to the low amino acid conservation of these regions, the multiple sequence alignment results

we obtained were inconclusive. Another motif which might potentially be engaged in this protein–protein interaction is bracketed by the residues A525 and E541 in E. coli [38]. In order to search for possible sites of MutL interactions with MutH, we carried out a multiple sequence alignment of the amino acid sequences corresponding to this motif in MutLs from 22 bacteria with MutH (Supplementary data Fig. S1). The consensus sequence we obtained is XVPAXLRXXXLXXLIXX, which corresponds to the sequence motif AVPLPLRQQNLQILIPE of E. coli.

3.9. Interactions with the β subunit of DNA polymerase III

The β subunit of DNA polymerase III and its eukaryotic counterpart, proliferating cell nuclear antigen (PCNA), are collectively referred to as sliding clamps. These ring-shaped proteins, the role of which is to tether the core of DNA polymerase to the DNA, ensure the high efficiency of DNA synthesis during replication. The precise role of the sliding clamps in MMR is not fully understood, but it has been speculated that these subunits of the polymerases might participate in targeting the MMR proteins to the replication forks, where the probability of mismatch occurrence is the highest. Another possibility is that sliding clamps enable the MMR components to discriminate between the newly synthesized and the parental DNA strand, since the sliding clamps assume strictly defined orientations on DNA [39]. Lopez de Saro et al. identified a conserved loop within the E. coli MutL protein which is able to bind the β subunit of DNA polymerase III [40]. The consensus sequence of this motif for the 208 prokaryotic MutL sequences in our analysis is LFFNTPARRKFLK. In addition to this β-subunit binding motif, located in the N-terminal part of MutL, there is another in the C-terminal domain [16]. The consensus sequence of this motif, as determined in this study, is QXLLIP. The role of the Pro residue in this sequence is most probably to ensure the appropriate orientation of the motif, thus facilitating the contact with the β subunit of DNA polymerase III [16]. No similar motif was found among MutL proteins from the Aquificae class.

3.10. MutL conserved motifs and the mutation hot-spots of human MLH1 and PMS2

Mutations in the human MLH1 are one of the primary causes of hereditary non-polyposis colorectal cancer. We aligned the mutation hot-spots in the MLH1 protein from H. sapiens with the most highly conserved amino acid residues in the prokaryotic MutL proteins marked within the E. coli MutL sequence (Fig. 3). We found that 25 out of the 43 most highly conserved amino acid residues which were identical in all the MutL sequences (Table 2, Fig. 4) overlap with the mutational hot-spots in MLH1 leading to colorectal cancers. It is interesting to note that, though 8 of those 43 amino acid residues correspond to identical amino acid residues in the human MLH1, they have not been reported


Table 3 The division of MutL proteins based on the presence/absence and primary structure of an endonuclease domain.

Group of MutL homologues | ATP binding/hydrolysis domain | Endonuclease domain | Organisms | Distinguishing factor
Group 1 | + | + | Most Prokaryotes | Endonuclease domain present; C-terminal region 250–300 aa; average protein size 600 aa
Group 2 | + | − | Part of Gammaproteobacteria including Pasteurellales, Aeromonadales, Alteromonadales, Enterobacteriales, Legionellales, Thiotrichales, Chromatiales, Vibrionales | Endonuclease domain absent; average protein size 600 aa
Group 3 | + | + | Selected Halobacteria (a class of Archaea) | Endonuclease domain present; long C-terminal region >400 aa; average protein size >700 aa
Group 4 | + | + | |
Group 5 | + | + | |

as mutation hot-spots in human MLH1 in either the literature or the databases we searched (HGMD, http://www.hgmd.org; InSiGHT, http://www.insight-group.org; Woods MMR Databases, http://www.med.mun.ca/mmrvariants). The majority of the mutational hot-spots in the human MLH1 are located in the conserved motifs responsible for ATP binding/hydrolysis. An analogous analysis carried out for the human protein PMS2 gave quite the opposite result. Only three out of the 43 most highly conserved amino acid residues of MutL correspond to identical mutational hot-spots in the human PMS2, while a further 36 have not been reported as mutational hot-spots, although they correspond to identical amino acid residues in PMS2 (Fig. 4, Supplementary data, Fig. S2). This observation is consistent with the finding that Pms2-deficient mice, unlike Mlh1-deficient ones, do not develop intestinal tumors and exhibit a dramatically lower frequency of mononucleotide repeat mutations [41]. As PMS2 possesses an endonuclease domain like the B. subtilis MutL, the amino acid sequence of the latter was used in this comparison.
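The contrast between MLH1 and PMS2 is easier to see when the counts reported above are expressed as fractions of the 43 residues identical in all MutL sequences. The short Python sketch below only restates these counts; the function name `summarize` is hypothetical, and the "remaining" value is simply what is left of the 43 residues after the two reported counts, which the text does not characterize further.

```python
TOTAL_IDENTICAL = 43  # residues identical in all MutL sequences analyzed


def summarize(hot_spots, identical_not_reported, total=TOTAL_IDENTICAL):
    """Express the reported residue counts as fractions of the total."""
    return {
        "hot_spot_fraction": hot_spots / total,
        "identical_not_reported_fraction": identical_not_reported / total,
        # Residues covered by neither count; not described in the text.
        "remaining": total - hot_spots - identical_not_reported,
    }


# Counts taken from this section: MLH1 (25 hot-spots, 8 identical but not
# reported) and PMS2 (3 hot-spots, 36 identical but not reported).
mlh1 = summarize(25, 8)
pms2 = summarize(3, 36)

for name, stats in (("MLH1", mlh1), ("PMS2", pms2)):
    print(f"{name}: {stats['hot_spot_fraction']:.1%} hot-spots, "
          f"{stats['identical_not_reported_fraction']:.1%} identical but not reported, "
          f"{stats['remaining']} remaining")
```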

3.11. MutL homologues in Halobacteria

We found that selected species from the Halobacteria class of Archaea possess two genes encoding MutL homologues of different lengths and amino acid sequences within one genome. We compared the amino acid sequences of the shorter and the longer MutL homologues from ten microorganisms belonging to the Halobacteria class (Supplementary data Table S1). The sizes of the shorter MutL homologues of Halobacteria range from 550 to 593, with an average of 568 amino acid residues, and they are smaller than other prokaryotic MutL proteins, whose sizes range from 425 to 762, with an average of 608 amino acid residues. In turn, the sizes of the longer homologues range from 712 to 787, with an average of 751 amino acid residues, thus exceeding the sizes of other prokaryotic MutLs. Both the short and long homologues of the MutL protein from the Halobacteria class possess all the motifs characteristic of the majority of prokaryotic MutL homologues. There are no data as to whether these two homologues act as two homodimers or as a heterodimer. As each of the two MutL homologues from Halobacteria possesses all the MutL conserved motifs, the formation of a heterodimer is not necessary in order to combine the motifs required for MutL functions. This is unlike the eukaryotic homologues,

Endonuclease domain present; average protein size –
