YEAST VOL. 7: 981-988 (1991)

Yeast Sequencing Reports

The Sequence of a 6.3 kb Segment of Yeast Chromosome III Reveals an Open Reading Frame Coding for a Putative Mismatch Binding Protein

GIORGIO VALLE*, ELISABETTA BERGANTINO†, GEROLAMO LANFRANCHI* AND GIOVANNA CARIGNANI†

*Dipartimento di Biologia and †Dipartimento di Chimica Biologica, Università degli Studi di Padova, via Trieste 75, 35121 Padova, Italy

Received 12 August 1991; accepted 3 September 1991

We report the sequence of a 6.3 kb segment of DNA mapping near the end of the right arm of chromosome III of Saccharomyces cerevisiae. The sequence reveals a major open reading frame coding for a putative protein of 1047 amino acids with a striking similarity to the bacterial proteins involved in recognition of mismatched DNA base pairs. This is particularly interesting as the existence of a yeast mismatch repair system similar to that of bacteria has been postulated for some years, but a yeast protein homologous to the bacterial mismatch binding protein had not been identified. The results of a comparison of the putative yeast mismatch binding protein with the bacterial mismatch binding proteins and with two cognate mammalian sequences support the idea that a similar mismatch repair system may also be present in mammalian cells. The possibility that all of these proteins may have evolved from a common ancestral gene is also discussed.

KEY WORDS - Chromosome III; genome sequencing; mismatch repair; post-meiotic segregation.

INTRODUCTION

The work presented in this paper is part of the European project for sequencing chromosome III of Saccharomyces cerevisiae. The segment of DNA assigned to our group maps near the end of the right arm of the chromosome, between the cdc39 and the HMR loci (Mortimer et al., 1989). Neither functions nor genetic markers had previously been associated with this segment of DNA.

0749-503X/91/090981-08 $05.00 © 1991 by John Wiley & Sons Ltd

MATERIALS AND METHODS

Vectors and strains
The segment of DNA assigned to our group was part of the insert of the lambda clone PM3712 constructed by M. Olson. Preparations of PM3712 DNA were kindly supplied by M. Gent and S. Oliver. The Escherichia coli strains used for cloning were: JM101 (F' traD36 lacIq Δ(lacZ)M15 proAB/supE thi Δ(lac-proAB)) and XL1-Blue (F'::Tn10 proA+B+ lacIq Δ(lacZ)M15/recA1 endA1 gyrA96 (Nalr) thi hsdR17 (rk-mk+) supE44 relA1 lac). Single-stranded DNA was produced in the M13mp18 vector supplied by Boehringer. DNA subcloning and double-stranded DNA manipulations were performed in pGEM vectors supplied by Promega.

Sequencing strategies
Methods for cloning, restriction analysis and DNA preparations were as described by Maniatis et al. (1982). The sequencing strategy made use of random sequencing of fragments generated by sonication (Bankier et al., 1987) as well as sequencing of fragments of DNA progressively deleted by exonuclease III (Henikoff, 1987). The remaining

Figure 1. Restriction map and general sequencing information. The figure displays a restriction map of the fragment of DNA assigned to our group. At the top are indicated the positions of the following restriction sites: SmaI (S), HindIII (H), XhoI (X), BamHI (B), EcoRI (E). The DNA sequence presented in this paper covers the HindIII fragment between positions 4155 and 10430. The arrows in the central part of the figure show the extension of the readings from each sequencing gel, while the direction of the arrows indicates the forward or reverse DNA strand. The bottom part of the figure shows the number of readings per base.

gaps were completed by oligonucleotide-directed sequencing on double-stranded DNA. DNA sequencing was performed by the enzymatic chain-termination method (Sanger et al., 1977). About half of the sequences were analysed by an Applied Biosystems model 370 DNA sequencer, using fluorescent primers as tracers. The remaining sequences were analysed by the standard procedure, using deoxyadenosine 5'-[α-35S]thiotriphosphate as tracer, followed by autoradiography. Either T7 polymerase or Taq polymerase was used for the elongation reaction.

RESULTS

The top part of Figure 1 shows a map displaying the recognition sites of five restriction endonucleases, covering the whole fragment of 11 345 bases assigned to our group. The clone PM3712 begins at the SmaI site (base 1 at the top of Figure 1) and

continues approximately 10 kb after the EcoRI site at position 11 345. As already mentioned, the fragment belongs to the right arm of chromosome III and its orientation is such that the left side of Figure 1 is facing the telomere, while the right side is facing the centromere. Figure 1 also shows the location of the 6.3 kb fragment, and other data related to the sequencing procedure. The nucleotide sequence of the 6.3 kb segment is presented in Figure 2. An overall view of the sequence reveals a compositional bias, with 36% GC pairs. The results of a search for the presence of open reading frames (ORFs) are presented in Figure 3: a major ORF (YCR1152) with forward orientation can be observed in the central position, flanked on either side by ORFs with reverse orientation. The ORF on the right end of Figure 3 (YCR1153) extends over the limits of this fragment of DNA into the part of the clone PM3712 sequenced by L. Frontali's group, and is described in a separate paper (Wilson et al., 1991). The ORF

Figure 2. DNA sequence of the 6.3 kb fragment and translation of the YCR1152 ORF. In the first 2000 bp, TATA boxes are underlined with a double line and CAP sites with a single line. In the amino acid sequence, the ATP/GTP binding site is underlined, starting at position 820.

AAGCTTCATTGAAAGGAAGTT~AATGACCGTATCTAATCTTCCCGTAT 51 174 CTAACAACGATTTTAAAAAGTTGGCAATAGGTGCGTCATCCWIGTTGATTATCATTTTTCCTGCATCTACGATTTTAGAATCATCGGTTATTGTAGAAATAACGTCCTTGAACGATGACGGTG A T T ~ T G C A C T T G A A T C A A ~ G T A A ~ T A T T T C G T C T A T G A C T G T C C T G T T T T C T A T C A A T A A A A C G A A C C G C T T C A T T T C T T T G A C T A A T G C T A A A A ~ A T A T T C A G G G G297 CGT AGTTGAAATCTTCAAAGAAGCACTCTTGAAAATAAGCTAGCATAGGACCATCAGACAATTCCTCCCTTTCTAGGAACAATTCTAAATTCAATGATGCAATGGATAATAAGTAGAGTAAAGATT 420 GCTTTGTGTTTCGGATGCTTTCA~CTTTCCTGGTACCTTCTGCATTTAATAAGTCAAAGCAACCATTCTGTACCGACCACTTATGAGTTGAATGGCCAAATCTAGTTTGAAAATAACTT 543 TCCAGTCGCAACTGAAAAATTCATCAATAACTGGACCATCGTGCAATGCTGCAAA~CAACTCAAAGATGCTGTAGTAGGCATCAATGGTGCGCT~AAATACTTTGTTGACATTAAAT 6-55 TAAACACACGATTCCAATTTATCTGGTTGGCTTGCAGAGAACTTATAGATCTTGC~CGTTCCCACGGTTCGTTGCGTCTAAAACCATCGGAGGTGTCAACCAACTCCGTGTGGGATCAT 789 TTTGTAAATTCTGAGACCCAGGTGAAAGTATTTCGGACAGTAAAAGAGCTAA~TCATTTAGACTTTCACCTTCCAAGGAATGCAA~TCGCGTAGATTGGACTCTAGTCTTTCTGGTG912 TGGTTTGAAGAAAAAAATTCCTTAAATAATTTGTATGATTCATCTTAATTCTCTTGAAATCGTTAATCTGAGAGGAGCGCT~TGACTC~TGTCCTTGTTTGCAGATTTATCAAATA 1035 1158 GATCAATGAAAAGTGATAATGTMTTCCTGTTTCTTTTTGAAACTCCTGGTTATTGAACGTTCGTTCTTTCAAAAGCTCAATAAGGTCTGTGCTTTTCTTCGGTAAATCCTTCGCAAATTCTA ATAGTAGTCTCTGTACTAGATTTTCTTGAAGTAAAAATTTTCCCGCTTGTAAAATATCCCGAGAACTTAAGGTTAATAATCGTTCCCAAACTTTTATGTAAATATCTACGWICGACCTGTCTA 1281 AAATATGTCTAATTTCTCTTTCCACGGATTCATTGTCGTTGTTAAGAGTCGTGAATAATAAACTAATTTGTGCAATGACGATTTGAGCGGCCTGTTTTTCCTTTGATGTTTCTAAATTAG 1404 1527 ATGCTGTCTTCAAATCACGGTATGTGGCCGATAGCATGTAATCAATTGTCCCTGAGTGACGTTTCTCAATTGCTTATCTTTTACCAATAGTCTTTCCCCGACTCACCTTTGCCCCTGTTTATC 1650 TTATTCGAATTCGATCGTTTGACTTTCGTTGCCACCGAAAAAAAAGAAAAATTTTCGTGTATTTATCACTCCTWICGGAATATTGCGATCACGTGAATTTTCAATWITAAA~CTGCAACA A T G G T W I T A G G T M T G A A C C T A A A C T G G T A C T T T T W I G A G C C A A A A G C A G T G C A A A T A G A T T T A T T T T G T T G A A T C ~ A A C A A T A A T G G C G G G A C A A C C C A C A A ~ C A G G T T 
T T T1773 ~ METValIleGlyAsffiluProLysLeuValLeuLeuAr~laLysSerSerAlaAsnArgPhelleLeuLeuAsnLeuLeuThrIl~ETAlaGlyClnProThr1leSerArgPhePheLys41

AACGCGGTAAAATCAGAGCTGACGCATAAGCAAGAACAAGAAGTTGCGGTTGGAAATGGCGCTGGTAGCGAATCCATCTGCCTTGACACTGATGAAGAGGACAATTTATCTTCTGTTGCAAGC1896 LysAlaVelLysSerCluLeuThrHisLysGlffil~lffiluValAlaValGlyAsnGlyAlaGlrjerGluSerlleCysLeuAspThrAspGlffiluAspAsnLeuSerSerValAlaSer 82 ACAACAGTAACTAATWITAGCTTTCCACTCAAAGGCAGTGTTTCTTC~AATTCGAAAAATTCAGAAAAGACTAGTGGTACTTCGACAACATTTAATGATATTWICTTTGCTAAGAAATTG 2019 123 ThrThrValThrAsnAspSerPheProLeuLysGlrjerValSerSerLysAsnSerLysAsnSerGluLysThrSerGlyThrSerThrThrPheAsnAsplleAs~heAlaLysLysLeu GATAGGATTATW\AAAGACGAAGTGATGAAAATGTTGAGGCTGAAGATCATGAGGAAGAGGGTGACGAACATTTCGTAAAAAAAAAAGCCAGAGTCCCCTACAGCGAAACTTACTCCCTTG 2142 164 AspArgll~ETLySArgArgSerAspGluAsnValGluAlaGluAs~spGluGluGluGlyCluGluAspPheValLysLysLysAlaArgLysSerProThrAlaLysLeuThrProLeu

GACAAACAGGTGAAGGACCTGAAAATGCATCATAGACATAGTGCTTGTTATTAGAGTAGGCTACAAGTACATGTTTTGCAGAGGATCCAGTAACCGTTAGCAGAATACTTCACATCAAA2265 205 ASpLySGlnValLysAspLeuLysMetHisHisArgAspLysValLeuVal~leArgValGlyTyrLysTyrLysCysPheAlaGl~s~laVa~ThrValSerArg~~eLeuHis~~eLys CTTGTGCCTGCAAAATTGACTATCGATCAGTCTMTCCTCAAGATTGCAATCATAGGCACTTTGCGTACTGTTCTTTCCCGGATGTCAGATTAAACGTTCACCTAGACAGACTTGTGCATCAT 2388 LeuValProClyLysLeuThrIleAspGluSerAsnProGlnAs~ysAsnHisArgGlnPheAlaTyrCysSerPheProAspValArgL~snValHisLeuGluArgLeuValHisHis 246 AATTTAAAGCTTGCCGTGGTAGAGCAAGCAGAAACAAGCGCTATTAAGAAGCATGATCCAGGTGCCAGCAAATCAAGCGTTTTTGAAAGAAAGATTTCAAATGTCTTTACCAAAGCTACATTT 2511

AsnLeuLysValAlaValValGlffilnAlaGluThrSerAlalleLysLysHisAspProGlyAlaSerLysSerSerValPheGluArgLys~leSerAsnValPheThrLysAlaThrPhe 287 GGTGTTAATTCCACCTTTGTCCTTAGGGGGAAACGTATTCTCGGTGATACAAACAGTATATGCGCTTTGTCCCGTGACGTACATCAGGGAAAGGTGGCTAAATATTCCTTAATTTCTGTCAAT 2634 GlyValAsnSerThrPheValLeuArgGlyLysArgIleLeuGlyAspThrAsnSertleTrpAlaLeuSerArgAspValHisGlnG~yLysValAlaLysTyrSerLeu~leSerValAsn 328

TTAAATAACCGGGAAGTCGTGTATGATGAATTTGAAGAGCCTAATCTTGCTGATGAGAAACTACAGATACGAATCAAATATTTACAGCCCATAGAAGTACTGGTAAATACAGATGATCTTCCA 2757 369 LeuAsnAsnGlyCluVelValTyrAspGl~heGluGluProAsnLeuAlaAspGluLysLeuGlnlleArglleLysTyrLeuGlnProlleGluValLeuVa~AsnThrAspAspLeuPro TTACATCTAGCGAAATTTTTCAAAGATATTTCATGTCCTTTAATACACAAGCAGGAGTATGATTTGGAAGATCATGTAGTTCAGGCAATAAAAGTAATGAATGAGAAAATTCAACTCTCGCCG 2880

LeuHisValAlaLysPhePheLysAsplleSerCysProLeulleHisLysGlnGluTyrAspLeuGluAspHisValValGlnAla~leLysValMetAsnGluLyslleGlnLeuSerPro

410

TCTCTCATACGCTTAGTTTCTMGTTATATTCGCATATGGTTGACTACAATAATGAGCAGGTGATGTTGATTCCTTCTATCTATTCGCCCTTCGCATCAAAAATACATATGTTACTTGATCCT3003

SerLeuIleArgLeuValSerLysLeuTyrSerHisMetValGluTyrAsnAsnGluGlnValMetLeulleProSerlleTyrSerProPheAlaSerLyslleHisMetLeuLeuAspPro

451

3126 AsnSerLeuGlnSerLeuAspllePheThrHisAspGlyClyLysGlrjerLeuPheTrpLeuLeuAspHisThrArgThrSerPheGlyLeuArgMetLeuArgGluTrplleLeuLysPro 492 MCTCCCTGCAAACTTTGGACATTTTTACCCATGATGGTGGTAAAGGTTCTTTGTTTTGGTTATTGGACCATACAAGGACATCGTTTGCATTAAGAATGTTGAGAGAATCGATTCTCAAACCT

TTGGTTGATGTACACCAAATTCAAGAGCGGCTTGATGCCATTGAGTGCATTACATCCGAAATCAACAACAGTATATTTTTTGAATCGTTGAATCAAATGTTGAATCATACCCCTGACTTATTA 3249 533 LeuValAspValHisGlnIleGlffiluArgLeuAspAla~leGluCyslleThrSerGlu~leAs~snSer~lePhePheGluSerLeuAsnGl~etLeuAsnHisThrProAspLeuLeu

AGAACTTTAAATCGCATAATGTATGCTACMCTTCTAGAAAAGAAGTCTATTTCTATTTAAAGCAAATAACTTCTTTCGTTGATCACTTCAAGATGCATCMTCTTACCTGTCAGAACATTTC3372

ArgThrLeuAsnArgll~etTyrGlyThrThrSerArgLysGluValTyrPheTyrLeuLysGlnlleThrSerPheValAspHisPheLysMetHisGlnSerTyrLeuSerGluHisPhe

574

3495 LysSerSerAspGlyArgIleGlyLysGlnSerProLeuLeuPheArgLeuPheSerGluLeuAsffiluLeuLeuSerThrThrGlnLeuProHisPheLeuThrMet~leAsnValSerAla 615

AAGTCATCAGATGGMGGATAGGCAAACAATCTCCTTTACTTTTTAGACTATTTAGTGAATTGAATGAACTACTTTCTACCACTCAGTTGCCTCATTTTTTGACCATGATCAACGTTTCTGCG

GTAATGGAAAAAAATTCAGATAAGCAAGTAATGGATTTTTTTAATTTAAATAACTATGATTGTTCAGAGGGTATAATAAAAATTCAAAGGGAAAGCGAATCAGTACGGTCACAGTTAAAGGAA 3618 ValMetGluLySASnSerASpLysGlnValMetAspPhePheAsnLeuAs~snTyrAs~ysSerGluGlyllelleLyslleClnArgC~uSerCluSerValArgSerGlnLeuLysGlu 656 GAATTGGCAGAAATACGAAAATATCTCAAACCTCCATATCTATTTTAGAGATGAAGTTGATTACTTAATCGAAGTGACTCGCAAATTAAGGACTTGCCAGATGATTGGATAAAACTT

3741

GluLeuAlaGluIleArgLysTyrLeuLysArgProTyrLeuAsnPheArgAspGluValAspTyrLeulleCluValLysAsnSerClnlleLysAspLeuProAspAspTrpIleLysVal 697 AACAATACGAAGATGGTCAGTAGATTTACCACTCCCAGAACCCAGACTGACTCAAAAGCTAGAATATTACAAGGACTTATTAATTCGCGAATCTGAACTACAGTATAAAGAATTCTTGAAC

3864

AsnAsnThrLysMetValSerArgPheThrThrProArgThrGlnLysLeuThrGlnLysLeuGluTyrTyrLysAspLeuLeulleArgGluSerGluLeuGlnTyrLysGluPheLeuAsn 738 AAAATTACGGCAGMTATACAGAGCTCCGTAAAATTACACTCAATTTGGCGCAGTATGACTGTATTTTGTCGTTACCAGCCACATCATGCAACGTAAATTATGTTAGACCAACTTTTGTGAAT

LyslleThrAlaGluTyrThrGluLeuArgLyslLeThrLeuAsnLeuAlaClnTyrAspCyslleLeuSerLeuAlaAlaThrSerCysAsnValAsnTyrValArgProThrPheValAsn

3987 779

CGTCAACAAGCCATMTCGCATGCAAGAAATCCAATTATCGAGTCGCTGGATGTTCATTATGTACCAAATGATATCATGATGTCCCCAGAAAACGGTAAAATCAATATTATAACGGGG 4110 G l y C l f f i l ~ l a l l e ~ l e A l a L y s A s n A l ~ r g A s ~ r o l l e ~ l e G l u S e r L e u A s p V a l H i s T y r V a l P r o A s ~ s p ~ l ~ e t M e t S e r P r o G l u A s n G l y L y s l ~ e A s n l l e820 lleThr~

CCGAATATGGGTGGGAAATCATCTTATATTAGACAAGTGGCACTGCTTACTATAATGGCACAGATCGGCTCATTTGTCCCCGCAGAAGAGATCAGATTAAGCATATTTGAAAACGTACTCACT 4233

861 ProAS~etClyGlvLvSSerSerTyrlleArgGlnValAlaLeuLeuThrll~etAlaGln~leGlySerPheValProA~aGluGlulleArgLeuSer~lePheGluAsnValLeuThr CGAATCCGTCCGCACWITGATATTATAAACGGTGATTCTACTTTTAAAGTGGAAATGCTTGATATCCTACACATCTTG~TTGCAATAAACGGTCTTTACTATTATTAGACGAAGTGGGA 4356 902 ArglleGlyAlaHisAspAspllelleAsffilyAspSerThrPheLysValGl~etLeuAsplleLeuHis~leLeuLysAsnCysAsnLysArgSerLeuLeuLeuLe~spGluValGly AGAGGTACTGCCACGCACGATGGTATAGCAATTTCTTATGCTTTAATAAAGTATTTTTCTGAGTTAAGTGACTCCCCCTTGATATTATTTACTACCCATTTTCCCATGCTGGGAGAAATCAAA 4479 ArgGlyThrClyThrHisAspGlylleAlalleSerTyrAlaLeulleLysTyrPheSerGluLeuSerAs~ysProLeulleLe~heThrThrHisPhePr~etLeuGlyClulleLys943

4602 SerProLeulleArgAsnTyrHisMetAspTyrValGl~GluGlnLysThrGlyCluAspTr~etSerValllePheLeuTyrLysLeuLysLysGlyLeuThrTyrAsnSerTyrGl~et 984 4725 AATGTGGC~TTGGCACGCCTGGACAAAGATATTATAAATCGGGCATTCACTATTTCAGAAGAATTGCGGAAGGAATCCATTAACGAAGACGCGTTGAAATTATTCAGCTCTTTGAAAAGA 1025 AsnValAlaLysLeuAlaArgLeuAs~ysAsplleIleAsnArgAlaPheSerlleSerCluC~uLeuArgLysGluSerlleAsnGluAspAlaLeuLysLeuPheSerSerLeuLysArg TCTCCGTTAATAAGGAATTATCATATGGATTACGTGGAGAACACTGGCGAGGACTGGATGAGTGTAATTTTTCTATATAAGTTAAAAAAGGGATTGACTTATAATAGTTATGGGATC

ATATTAAAAAGTGATMTATAACAGCAACGGATAAACTCGCGAAATTACTATCATTGGATATCCACTGATGCGGATGGTACGTATCTTCTAAATGCAGCATTATCAGACAATAAATGATATTA lleLeuLysSerAspAsnIleThrAlaThrAs~ysLeuAlaLysLeuLeuSerLeuAsplleHis***

4848 1047

TATTTTTCATTCACTTTCTCGTTTTTGMTTCTTCCTTTGTTTATTTGTTTACTTTTAAATAATTTTTTACATTTTTATCATAAGATAAATTGAGAGACAATACAGAAATTTGGCCATGGG 4971 TTTATCTTTCTGTAAACATGTTATAGTTCAGGTATWITGTCTTTAAAGAAACTTCTATGTGAGCTCTTATGCCTTGGCCTTGTACAAGTTCCTTTCTATAGAATTTTTCCGTAGTATAATTTT 5094 CACAGTAGGTTMGATATTGTGATCTTTCTTCGTTAAACTCATAGATTAAAATCATGGAATGGATCAGCCTCATCAATTTCATCATCCTTTGCAACTTTCTCACAGAACATTTTGGTTTCCT 5217 CTTCACTCACTGAGTTCCTTTTCCGGTGTTTATTCCAAGACMTATAAAAGGAAGTTCGCAGCCGTTATCATTTAATGCAGGTATTAGAGGGGGGTCTTGGTTTCTTAAGAACGACCACTGAA 5340 CTTTTTTGAACAAGCGATGTCTCTTTATGTCTGCAGCTCCTGATTTGGAACCAAGCCTTTTTGCCTCGTTTTTGTTTAGTAGTTTCTTTATCAGGTCTTTACAATTCTTCGAAACTTCCTTAT 5463 5586 CATGTGGAAATTTGACGTCCTTGGTTAAAATGTTAGAGAATGTTTCATTTGAATTATCTCCTTTAAATGGAGTACAGCCAAATAGCATCTCGTAAATCAATATTCCTAAAGTCCACCAGTCTA 5709 CTGCTGCAGTGTGGCCATTCCCTCTGATTACTTCTGGAGCTAAATACTCTTCAGTACCAACAAAGCAATTAGTTCTGAATCCATCTGAACAAATTTTTGTATCTAAATACGTAGAGTCTTTCA TGGTGGGTTTTTTTGATCCCGTTGCTTGGATGWITAAATCAGTCACAAAGCATGACATGACCAGATTGATGCAGTMTATGTTTTCCGGTTTCAAATCTCTGTATATGAAGCCCAGTAGGT 5832 GTAAATATTCCAAAGCTGCTACTACTTCACTCGCGTAAAACTTCGCATCTTCTTCTGCAATGCATTTACTTTTTCTTGTTTGTAAGCCTCTAAAGAATTCCCCTCCCATGCAGTATTCCATAC 5955 6078 ACAGATACAAATAGTCTTTGCTTTGAAAGGAATWITACAGTGTCACAATAAATGGATGATCACTTGTCGCGAGAATTTCCTGTTCAGTGAGTACTCGTTTAATTTTCTTCCTCTTGATCATCT CATGTTTATTCAAAACTTTCAGGGCCAATATCTGGTTGGTATCGCGTTCCCTCACTAAATACACTTTACCTACGTCACCTTGGCCAAGTAGTCTAATTTTTTCGAAGGATTGAGGTTCGACAG 6201 6281 TAATGTCTTGGAACTTGTTACCGAATGACTTGGTTCTTAAACGACGAGATCTCCTCGGTTCTACAATATCTCCCAAGCTT

Figure 3. Map of stop and initiation codons within the 6.3 kb DNA segment. The full vertical bars show the position of the stop codons; short vertical lines represent the initiation codons. The three top rows correspond to the three reading frames with forward orientation, while the three bottom rows correspond to the three reading frames with reverse orientation. The three major ORFs are indicated following the nomenclature recommended by the BAP program.
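A map of the kind shown in Figure 3 can be sketched in a few lines. The code below is only an illustration under stated assumptions: the function names (`codon_positions`, `longest_orf`, `six_frame_map`) are hypothetical, ATG is taken as the only initiation codon, the standard stop codons TAA, TAG and TGA are used, and positions on the reverse strand are reported in reverse-complement coordinates. It is not the program used to produce the figure.

```python
# Minimal sketch of a six-frame map of initiation and stop codons.
# All names are hypothetical; this is not the software used for Figure 3.

STOPS = {"TAA", "TAG", "TGA"}
START = "ATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def codon_positions(seq, frame):
    """Return (starts, stops): 0-based offsets of ATG and stop codons in one frame."""
    starts, stops = [], []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == START:
            starts.append(i)
        elif codon in STOPS:
            stops.append(i)
    return starts, stops


def longest_orf(seq, frame):
    """Longest ATG...stop stretch in one frame as (start, end_exclusive), or None."""
    best = None
    open_start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == START and open_start is None:
            open_start = i  # first ATG after the previous stop
        elif codon in STOPS and open_start is not None:
            if best is None or (i + 3 - open_start) > (best[1] - best[0]):
                best = (open_start, i + 3)
            open_start = None
    return best


def six_frame_map(seq):
    """Map (strand, frame) -> (starts, stops); '-' uses reverse-strand coordinates."""
    seq = seq.upper()
    result = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            result[(strand, frame)] = codon_positions(s, frame)
    return result
```

Applied to the 6.3 kb sequence, a long stretch without stop codons in one forward frame would correspond to the major ORF described in the text.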

Figure 4. Results of the database search. The search for similarity was done by the FASTA program (Pearson and Lipman, 1988) on the SwissProt database. As a query sequence, the putative protein of 1047 amino acids corresponding to the ORF YCR1152 was used.

on the left end of Figure 3 (YCR1151) extends over the limits of the clone PM3712 and is currently the object of further investigation. In this paper we focus our attention on the YCR1152 ORF, whose amino acid sequence is presented in Figure 2. The region at the 5' side of this ORF was searched for the presence of basic initiation signatures. Possible TATA boxes (5'TAT(T/A)A3'; Hahn et al., 1985) and CAP sites (5'YAAG3'; Dobson et al., 1982) are underlined in Figure 2. The putative protein sequence encoded by YCR1152 was used to search the PROSITE database (Bairoch, 1990) in order to find known protein signatures. The most relevant feature that emerged is the presence of a sequence denoting an ATP/GTP binding site between amino acids 820 and 827. This is also underlined in Figure 2. The calculation of the codon adaptation index (Sharp and Li, 1987) gave a value of 0.12, indicating

that the sequence corresponds to a low-expression gene. The FASTA program (Pearson and Lipman, 1988) was used to search the SwissProt database for sequence homologies. Four proteins showing a high level of similarity with the YCR1152 product were thus identified, as shown in Figure 4. The two bacterial proteins, MutS from Salmonella typhimurium and HexA from Streptococcus pneumoniae, have a known function in base mismatch repair (Haber et al., 1988; Priebe et al., 1988). The function of the two mammalian sequences, DUP from human cells and REP1 from mouse, is uncertain (Linton et al., 1989; Fujii and Shimada, 1989); however, the name REPair1 was given to the mouse sequence on the basis of its similarity to the two bacterial genes. A common feature of the two mammalian sequences is their location next to the dihydrofolate reductase gene (dhfr) with which they share a

Figure 5. Pairwise alignment. The yeast mismatch binding protein (YMBP) is compared to human (HSDUP), mouse (REP1), Salmonella typhimurium (MUTS) and Streptococcus pneumoniae (HEXA) proteins. Diagonals are drawn where a local similarity is found with a score higher than the threshold. Scores are calculated on a window of 16 amino acids, adding up the values deduced from the altprot scoring matrix of FASTA; these values are in the range -7 to 12, depending on the pair of amino acids compared. The total value of the threshold for the window was set to 28.
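The windowed comparison described in the caption of Figure 5 can be sketched as follows. The window length (16) and the threshold (28) are taken from the caption; everything else is an assumption made for illustration. In particular, `pair_score` is a hypothetical placeholder (5 for identical residues, -1 otherwise) and is not the altprot matrix of FASTA, so the hits it produces would differ from those in the figure.

```python
# Minimal sketch of a window-based dot plot for two protein sequences.
# pair_score is a placeholder, NOT the FASTA altprot matrix used in the paper.

def pair_score(a, b):
    """Hypothetical scoring: +5 for identical residues, -1 otherwise."""
    return 5 if a == b else -1


def dot_plot(seq_a, seq_b, window=16, threshold=28, score=pair_score):
    """Return (i, j) pairs whose ungapped window scores strictly above threshold.

    i and j are 0-based start positions of the window in seq_a and seq_b.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
            if total > threshold:
                hits.append((i, j))
    return hits
```

Runs of hits with constant j - i correspond to the diagonals drawn in the figure. The triple loop is quadratic in sequence length and is written for clarity rather than speed.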

bidirectional promoter. This seems not to be the case in S. cerevisiae, where the dhfr gene (dfr1) maps on chromosome XV (Barclay et al., 1988). Moreover, we did not find any significant similarity between the dfr1 gene of S. cerevisiae and the first 2500 bp of the 6.3 kb fragment.

DISCUSSION

Mismatch repair
Mismatches can be generated in double-stranded DNA by several events such as genetic recombination or biosynthetic errors during DNA replication. Gram-negative enterobacteria like E. coli and S. typhimurium have been extensively studied in this respect. At least four genes have been identified which are essential for efficient repair: MutS, MutH, MutL and UvrD (for a review see Modrich, 1987). In particular, the product of the MutS gene is

responsible for the recognition of, and binding to, mismatched base pairs (Su and Modrich, 1986). In these bacteria the choice of the strand which must be taken as 'good' is determined by the level of dam methylation. Since methylation is delayed with respect to replication, this system allows the discrimination of the newly synthesized strand, which can then be corrected. Foreign DNA lacking dam methylation can also be corrected in recombinant heteroduplex strands. Mismatch repair has also been studied in Gram-positive bacteria, particularly in S. pneumoniae, where the target strand is recognized simply by the presence of single-strand breaks (Claverys and Lacks, 1986). In S. pneumoniae there are at least two loci, HexA and HexB, which are involved in mismatch repair. Sequence analyses of the two corresponding proteins show significant homology of HexA with MutS (Priebe et al., 1988; Haber et al.,

Figure 6. Multiple alignment. The multiple alignment was performed using CLUSTAL (Higgins and Sharp, 1988). Sequences are: Sc (YMBP, Saccharomyces cerevisiae); Hs (DUP, Homo sapiens); Mm (REP1, Mus musculus); St (MutS, Salmonella typhimurium); Sp (HexA, Streptococcus pneumoniae). Amino acids are boxed when a match with the yeast sequence is found. The double-line mark starting at position 820 of the yeast sequence shows the ATP/GTP binding site (see note added in proof).
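The ATP/GTP binding site marked in Figures 2 and 6 corresponds to the PROSITE motif A (P-loop) pattern [AG]-x(4)-G-K-[ST]. A search for this pattern can be sketched as below. The function name `find_p_loop` is hypothetical, and the fragment is a short stretch read off the translation in Figure 2; its numbering (first residue taken as 812) is an assumption chosen so that the match falls at positions 820 to 827 as stated in the text. The example illustrates the pattern only and does not query the PROSITE database.

```python
import re

# PROSITE ATP/GTP-binding site motif A (P-loop): [AG]-x(4)-G-K-[ST].
# The lookahead lets overlapping occurrences be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loop(protein, first_residue=1):
    """Return (position, matched_residues) pairs, numbered from first_residue."""
    return [
        (m.start() + first_residue, m.group(1))
        for m in P_LOOP.finditer(protein.upper())
    ]


# Fragment read from the Figure 2 translation; the offset 812 is an assumption.
FRAGMENT = "NGKINIITGPNMGGKSSYIRQV"
print(find_p_loop(FRAGMENT, first_residue=812))  # [(820, 'GPNMGGKS')]
```

The single match, GPNMGGKS, spans eight residues, in agreement with the site reported between amino acids 820 and 827.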

Figure 7. Phylogenetic tree. The topology of the tree and the values of the branching points (indicated on the scale at the top) were deduced from the dendrogram file produced by CLUSTAL (Higgins and Sharp, 1988). The values at the top are indicative of the similarity between sequences. Thus, human and mouse sequences are very similar, with a score approaching 700. These two sequences are more similar to the yeast sequence (score 214) than to the bacterial sequences (score 150). The similarity between the two bacterial sequences gave a score of 280.

1988) and HexB with MutL (Mankovich et al., 1989; Prudhomme et al., 1989). In yeast, the existence of a mismatch repair system has long been postulated to explain postmeiotic segregation (PMS) (see Meselson and Radding, 1975). Since then, mismatch correction has been reproduced in cell-free extracts of S. cerevisiae (Muster-Nassal and Kolodner, 1986) and a number of PMS mutants have been shown to be deficient in mismatch correction (Williamson et al., 1985; Kramer et al., 1989a). In particular, the product of the pms1 gene has been found to share homology with MutL of S. typhimurium and HexB of S. pneumoniae (Kramer et al., 1989b), thus indicating that the mismatch repair systems in bacteria and yeast could be similar. This idea is further supported by the present report of a yeast sequence similar to MutS and HexA. On the basis of these considerations we propose the name of yeast mismatch binding protein (YMBP) for the protein coded by the YCR1152 ORF.

Sequence alignment

Figure 5 shows a graphical representation of the alignments of YMBP with the four similar proteins found by FASTA. In all four cases the major local homologies can be clustered in a main diagonal which covers most of the sequence, meaning that the five proteins share global similarity. However, the first 150 amino acids of YMBP do not share homology with the other sequences. In particular, from the two panels of Figure 5 referring to the bacterial sequences HexA and MutS, it could be argued that the first part of YMBP is specific to yeast, as it is absent in bacteria. Another possibility for explaining the longer NH2 terminus of YMBP is that translation actually starts at a subsequent AUG. The latter hypothesis would be justified by the positions of TATA boxes and CAP sites (see Figure 2).

Multiple alignment
Figure 6 shows the multiple alignment of YMBP with the four sequences found by FASTA. The alignment was done using the CLUSTAL program produced by Higgins and Sharp (1988). Amino acids were boxed when an exact match with the yeast sequence was found. Although homologies are scattered throughout the entire sequence, several conserved clusters can be identified. The most conserved region seems to be localized towards the end of the yeast sequence, between amino acids 750 and 910. In this region the ATP/GTP binding domain is marked by a double line. It can be observed that only the mouse sequence is different in this region (see note added in proof). CLUSTAL also produces a 'dendrogram' file which is used by the program to establish the priority of pairing during the final alignment stage. Although the dendrogram file is not intended to be used as a phylogenetic tree, it has been shown to yield a satisfactory branching topology in a wide variety of situations (for a detailed discussion see Higgins and Sharp, 1988). Figure 7 shows the phylogenetic tree corresponding to the dendrogram file used for the multiple alignment of Figure 6. The observation that this phylogenetic tree reproduces quite well the generally accepted phylogenetic branching of the five species considered is consistent with the hypothesis that these proteins evolved from a common ancestral precursor.
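How a branching order follows from similarity scores can be illustrated with a small sketch. The code below uses a greedy average-linkage joining as a simple stand-in; it is not claimed to reproduce CLUSTAL's own procedure. The scores are the branch-point values quoted in the caption of Figure 7, with 700 standing for "approaching 700", and assigning them to individual sequence pairs is an assumption made for illustration. The names `cluster`, `leaves`, `average_similarity`, `NAMES` and `SIM` are hypothetical.

```python
from itertools import combinations

# Illustrative pairwise similarities, derived from the branch-point values in
# the Figure 7 caption (an assumption; the true pairwise scores are not given).
NAMES = ["Hs", "Mm", "Sc", "St", "Sp"]
SIM = {frozenset(p): 150 for p in combinations(NAMES, 2)}
SIM[frozenset(("Hs", "Mm"))] = 700  # human vs mouse
SIM[frozenset(("Hs", "Sc"))] = 214  # mammals vs yeast
SIM[frozenset(("Mm", "Sc"))] = 214
SIM[frozenset(("St", "Sp"))] = 280  # the two bacterial sequences


def leaves(node):
    """Flatten a nested-tuple tree into the list of its leaf names."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def average_similarity(a, b, sim):
    """Mean pairwise similarity between the leaves of two clusters."""
    pairs = [(x, y) for x in leaves(a) for y in leaves(b)]
    return sum(sim[frozenset(p)] for p in pairs) / len(pairs)


def cluster(names, sim):
    """Greedy average linkage: repeatedly join the two most similar clusters."""
    nodes = list(names)
    joins = []
    while len(nodes) > 1:
        a, b = max(
            combinations(nodes, 2),
            key=lambda p: average_similarity(p[0], p[1], sim),
        )
        score = average_similarity(a, b, sim)
        nodes.remove(a)
        nodes.remove(b)
        nodes.append((a, b))
        joins.append(((a, b), score))
    return nodes[0], joins


tree, joins = cluster(NAMES, SIM)
print(tree)
```

With these inputs the joining order is human with mouse, then the two bacterial sequences, then yeast with the mammalian pair, and finally the bacterial pair with the rest, which is the topology described for Figure 7.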

NOTE ADDED IN PROOF

We have learnt from Professor G. F. Crouse that the mouse Rep-1 sequence (now renamed Rep-3) has recently been revised and will shortly be resubmitted to the GenBank database. The revised amino acid sequence displays the ATP/GTP binding site and extends for more than 100 amino acids over the end of the old Rep-1 sequence, making mouse and human sequences very similar to each other.

ACKNOWLEDGEMENTS

We are grateful to F. Borlina, D. Marvulli, M. Gerotto, G. Sartori and M. Pelosi for helping with the sequencing work. We thank M. Gent and S. G. Oliver for sending the PM3712 DNA and P. M. Sharp for the CODONS program. This work was supported by the Commission of the European Communities under the BAP program of the Division of Biotechnology and by the Consiglio Nazionale delle Ricerche, Comitato per le Biotecnologie e la Biologia Molecolare.

REFERENCES

Bairoch, A. (1990). PROSITE: a Dictionary of Protein Sites and Patterns. University of Geneva. Fifth release.
Bankier, A. T., Weston, K. M. and Barrell, B. G. (1987). Random cloning and sequencing by the M13/dideoxynucleotide chain termination method. Methods Enzymol. 155, 51-93.
Barclay, B. J., Huang, T., Nagel, M. G., Misener, V. L., Game, J. C. and Wahl, G. M. (1988). Mapping and sequencing of the dihydrofolate reductase gene (DFR1) of Saccharomyces cerevisiae. Gene 63, 175-185.
Claverys, J. P. and Lacks, S. A. (1986). Heteroduplex deoxyribonucleic acid base mismatch repair in bacteria. Microbiol. Rev. 50, 133-165.
Dobson, M. J., Tuite, M. F., Roberts, M. A., Kingsman, A. J., Kingsman, S. M., Perkins, S. E., Conroy, S. C., Dunbar, B. and Fothergill. (1982). Conservation of high efficiency promoter sequences in Saccharomyces cerevisiae. Nucleic Acids Res. 10, 2625-2637.
Fujii, H. and Shimada, T. (1989). Isolation and characterization of cDNA clones derived from the divergently transcribed gene in the region upstream from the human dihydrofolate reductase gene. J. Biol. Chem. 264, 10057-10064.
Haber, L. T., Pang, P. P., Sobel, D. I., Mankovich, J. A. and Walker, G. C. (1988). Nucleotide sequence of the Salmonella typhimurium mutS gene required for mismatch repair: homology of MutS and HexA of Streptococcus pneumoniae. J. Bacteriol. 170, 197-202.
Hahn, S., Hoar, E. T. and Guarente, L. (1985). Each of three "TATA elements" specifies a subset of the transcription initiation sites at the CYC1 promoter of Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 82, 8562-8566.
Henikoff, S. (1987). Unidirectional digestion with exonuclease III in DNA sequence analysis. Methods Enzymol. 155, 156-165.
Higgins, D. G. and Sharp, P. M. (1988). CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 73, 237-244.
Kramer, B., Kramer, W., Williamson, M. S. and Fogel, S. (1989a). Heteroduplex DNA correction in Saccharomyces cerevisiae is mismatch specific and requires functional PMS genes. Mol. Cell. Biol. 9, 4432-4440.
Kramer, W., Kramer, B., Williamson, M. S. and Fogel, S. (1989b). Cloning and nucleotide sequence of DNA mismatch repair gene PMS1 from Saccharomyces cerevisiae: homology of PMS1 to procaryotic MutL and HexB. J. Bacteriol. 171, 5339-5346.
Linton, J. P., Yen, J. Y. J., Selby, E., Chen, Z., Chinsky, J. M., Liu, K., Kellems, R. E. and Crouse, G. F. (1989). Dual bidirectional promoters at the mouse dhfr locus: cloning and characterization of two mRNA classes of the divergently transcribed Rep-1 gene. Mol. Cell. Biol. 9, 3058-3072.
Maniatis, T., Fritsch, E. F. and Sambrook, J. (1982). Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
Mankovich, J. A., McIntyre, C. A. and Walker, G. C. (1989). Nucleotide sequence of the Salmonella typhimurium mutL gene required for mismatch repair: homology of MutL to HexB of Streptococcus pneumoniae and to PMS1 of the yeast Saccharomyces cerevisiae. J. Bacteriol. 171, 5325-5331.
Meselson, M. S. and Radding, C. M. (1975). A general model for genetic recombination. Proc. Natl. Acad. Sci. USA 72, 358-361.
Modrich, P. (1987). DNA mismatch correction. Ann. Rev. Biochem. 56, 435-466.
Mortimer, R. K., Schild, D., Contopoulou, C. R. and Kans, J. A. (1989). Genetic map of Saccharomyces cerevisiae. Yeast 5, 321-404.
Muster-Nassal, C. and Kolodner, R. (1986). Mismatch correction catalyzed by cell-free extracts of Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 83, 7618-7622.
Pearson, W. R. and Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444-2448.
Priebe, S. D., Hadi, S. M., Greenberg, B. and Lacks, S. A. (1988). Nucleotide sequence of the hexA gene for DNA mismatch repair in Streptococcus pneumoniae and homology of hexA to mutS of Escherichia coli and Salmonella typhimurium. J. Bacteriol. 170, 190-196.
Prudhomme, M., Martin, B., Mejean, V. and Claverys, J. P. (1989). Nucleotide sequence of the Streptococcus pneumoniae HexB mismatch repair gene: homology of HexB to MutL of Salmonella typhimurium and to PMS1 of the yeast Saccharomyces cerevisiae. J. Bacteriol. 171, 5332-5338.
Sanger, F., Nicklen, S. and Coulson, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
Sharp, P. M. and Li, W. H. (1987). The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281-1295.
Su, S. S. and Modrich, P. (1986). Escherichia coli mutS-encoded protein binds to mismatched DNA base pairs. Proc. Natl. Acad. Sci. USA 83, 5057-5061.
Williamson, M. S., Game, J. C. and Fogel, S. (1985). Meiotic gene conversion mutants in Saccharomyces cerevisiae. I. Isolation and characterization of PMS1-1 and PMS1-2. Genetics 110, 609-646.
Wilson, C., Bergantino, E., Lanfranchi, G., Valle, G., Carignani, G. and Frontali, L. (1992). A putative serine/threonine protein kinase gene on chromosome III of Saccharomyces cerevisiae. Yeast. In press.
