YEAST VOL. 8: 761-768 (1992)

Yeast Sequencing Reports

Sequence of a 12.7 kb Segment of Yeast Chromosome II Identifies a PDR-like Gene and Several New Open Reading Frames

TH. DELAVEAU, C. JACQ AND J. PEREA

Laboratoire de Génétique Moléculaire, CNRS URA 1302, Ecole Normale Supérieure, 46 rue d'Ulm, 75230 Paris cedex 05, France

Received 1 June 1992; accepted 5 June 1992

A 12,684 bp DNA fragment from the left arm of chromosome II of Saccharomyces cerevisiae, located between FUS3 and the centromere, was sequenced as part of the European project to sequence the whole chromosome. This segment contains at least five complete new open reading frames (ORFs) and the beginning (the first 191 codons) of an ORF whose putative translational product is highly similar to that of the multidrug resistance gene PDR1, previously characterized by Balzi et al. (1987) on chromosome VII.

KEY WORDS: Genome sequencing; Saccharomyces cerevisiae; chromosome II; PDR1; multidrug resistance; Zn binuclear cluster; Leu zipper

INTRODUCTION

As part of the European BRIDGE project to sequence chromosome II of Saccharomyces cerevisiae, we have determined the sequence of a 12,684 bp DNA segment located between the FUS3 locus and the centromere (Figure 1). This fragment represents the centromere-proximal part of the insert contained in the cosmid alpha 1008.5 constructed by R. Stucka and H. Feldmann (München), whereas the rest of the insert has been sequenced by J. Skala et al. (Yeast, submitted).

MATERIALS AND METHODS

Strains

Saccharomyces cerevisiae alpha S288C strain was used by Feldmann et al. to construct a cosmid library in the vector pYc3030 (Hohn and Hinnen, 1980). Escherichia coli TG1 strain (Δ(lac pro), thi1, supE44, hsdD5, F′ (traD36, proA+B+, lacIq, lacZΔM15)) was used for every step of subcloning and sequencing.

Figure 1. Map of the centromere-proximal part of the insert in the cosmid alpha 1008.5. The 32 kb insert is indicated as a thick line. The left part of the insert (20 kb) has been sequenced by J. Skala et al. (Yeast, submitted). A previously known gene, FUS3 (Elion et al., 1990), has been localized. (C = ClaI, B = BamHI, S = SalI, X = XbaI.)

0749-503X/92/090761-08$09.00
© 1992 by John Wiley & Sons Ltd


Figure 2. Sequencing strategy and localizations of the ORFs. The lower part of the figure presents the sequencing strategy. Single arrows correspond to sequencing reactions made from exonuclease III nested deletions. Starred arrows indicate sequencing reactions made from oligonucleotides. A simple restriction map is indicated; C = ClaI, B = BamHI, X = XbaI, S = SalI. The upper part presents the seven non-overlapping ORFs indicated with arrows. Three short ORFs are indicated as overlapping longer ORFs.

Sequencing strategy and methods (Figure 2)

The alpha 1008.5 cosmid was digested with ClaI, producing four fragments of 22.0, 8.5, 8.0 and 3.4 kb. The 22 kb fragment, containing the pYc3030 vector, was ligated to itself and its 12.6 kb insert was sequenced. This 12.6 kb insert represents the centromere-proximal part of the 32 kb insert in the original alpha 1008.5 cosmid. ClaI-SalI digestion of the 22 kb cosmid generated three fragments of 6.8 kb, 4.4 kb and 1.5 kb. These fragments were cloned in the Bluescript phagemids pBS-KS+ or pBS-KS-, or in pGEM-7Zf(-). Double-stranded DNA deletions were made using exonuclease III digestions. Dideoxy-sequencing reactions were performed with either T7 DNA polymerase (Pharmacia) or Sequenase (USB) and [35S]dATP on either single-stranded or double-stranded DNA. A total of 240 sequencing reactions were carried out on 132 of the clones obtained after nested deletions. In addition, 14 oligonucleotide primers were used to cover gaps and junctions between the subfragments. Assembly of the sequences was made using the DNA STAR software. Preliminary analyses of sequences were carried out using DNA Strider software (Marck, 1988).
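The subcloning step above relies on predictable restriction fragments. As a minimal sketch (not the software used in this work), the Python below locates the recognition sequences of the four enzymes named in the restriction maps and returns the fragment lengths of a linear molecule. The function names are illustrative assumptions, and for simplicity each cut is placed at the first base of the recognition site rather than at the true cleavage position within it.

```python
# Recognition sequences of the enzymes used in the restriction maps.
SITES = {
    "ClaI": "ATCGAT",
    "BamHI": "GGATCC",
    "SalI": "GTCGAC",
    "XbaI": "TCTAGA",
}


def site_positions(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def linear_fragments(seq, enzymes):
    """Fragment lengths of a linear molecule digested with all `enzymes`.

    Simplification (assumption): the cut is placed at the start of each
    recognition site, so lengths differ from the real ones by a few bases.
    """
    cuts = sorted({p for e in enzymes for p in site_positions(seq, SITES[e])})
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

Applied to a real sequence, `linear_fragments(seq, ["ClaI", "SalI"])` would give a list comparable to the three subfragments reported above, whose sizes (6.8 + 4.4 + 1.5 kb) sum to 12.7 kb, in agreement with the 12,684 bp sequenced.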

Comparisons of nucleotide and amino acid sequences to data banks (GenBank release 66, NBRF release 27 and EMBL release 26) were performed using the Genetics Computer Group, Inc. package (GCG).

RESULTS AND DISCUSSION

Sequence determination

A few ambiguities remained after reading sequences on both strands. Running dITP sequencing reactions clarified these uncertainties, which were due to compressions.

Sequence analysis

The complete sequence of 12,684 bp is given in Figure 3. Seven new non-overlapping open reading frames (ORFs) are presented and three other ORFs, YBL03-19, YBL03-20 and YBL03-24, are included in longer ORFs. At the two extremities of the sequenced fragment, two interrupted ORFs are likely to continue in the flanking sequences. This has already been demonstrated for YBL03-15B, which is the 3' end of

Figure 3. Complete sequence of a 12,684 bp segment of chromosome II. The sequence reads 5' to 3' from the left telomere towards the centromere. The seven non-overlapping ORFs are boxed.

Table 1. Characteristics of the different translation ORF products. The complete sizes of the putative proteins are given in amino acids except for the interrupted ORFs: 03-19, 03-20, 03-24. CAI indicates the values of the codon adaptation indexes (Sharp and Li, 1987).

ORF           SIZE (amino acids)   CAI     MAIN FEATURES
YBL 03-15B    > 151                0.196   C-terminal end of YBL 03-15A (Goffeau et al.); glutamic-rich protein
YBL 03-16     280                  0.123   Leucine zipper + basic region
YBL 03-18     840                  0.128
YBL 03-19     151                  0.106   ?
YBL 03-20     150                  0.082   ?
YBL 03-21     1244                 0.190   Non polar repetitive C-terminal domain
YBL 03-22     145                  0.058
YBL 03-23     > 191                0.122   Zn(II)2Cys6 binuclear cluster; homology with PDR1 (Balzi et al., 1987)
YBL 03-24     > 104                0.083   ?

a longer ORF, YBL03-15A, recently sequenced by J. Skala et al. (Yeast, submitted). Table 1 presents the main features of the putative translational products of the different ORFs. The codon adaptation index (Sharp and Li, 1987) is generally low, suggesting that most of these ORFs are expressed at a very low level. The particularly low values for YBL03-19 and YBL03-20 may in fact suggest that these ORFs, which overlap YBL03-18, are not expressed.

More careful analyses of the putative ORF products revealed that at least three of them merit closer attention. YBL03-16 encodes a 280 amino acid protein with a basic motif/leucine zipper domain: eight leucines plus one phenylalanine recur at every seventh position (Figure 4). More interestingly, this leucine zipper motif (Landschulz et al., 1988) is immediately followed by a short stretch of basic amino acid residues. Both features suggest that this protein might be a sequence-specific DNA binding protein.
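The codon adaptation index used in Table 1 can be illustrated with a short sketch. The Python below is not the program used for this work; it assumes a user-supplied list of highly expressed reference genes (hypothetical input), derives the relative adaptiveness of each codon from them, and returns the geometric mean of those weights over an ORF. Methionine, tryptophan and stop codons are skipped, and the replacement of zero counts by 0.5 is a convention adopted here to avoid taking the logarithm of zero.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into complete codons, reading from position 1."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """Weight of each codon: its count divided by the count of the most
    used synonymous codon in the reference set."""
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODON_TABLE)
    weights = {}
    for aa in set(AMINO) - set("*MW"):  # single-codon amino acids and stops carry no signal
        synonymous = [c for c, a in CODON_TABLE.items() if a == aa]
        best = max(counts[c] for c in synonymous)
        if best == 0:
            continue  # amino acid absent from the reference set
        for c in synonymous:
            # Assumption: unused codons get a pseudo-count of 0.5.
            weights[c] = max(counts[c], 0.5) / best
    return weights


def cai(orf, weights):
    """Geometric mean of the codon weights over the ORF."""
    logs = [math.log(weights[c]) for c in codons(orf) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

With weights computed once from a reference set, `cai(orf, weights)` gives a value between 0 and 1; values close to those in Table 1 would indicate codon usage far from that of highly expressed genes.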


YBL03-21 encodes a 1244 amino acid protein with a clear two-domain structure, apparent from the distribution of charged amino acids. Figure 5 reveals that the C-terminal part of the protein (amino acids 860 to 1244) contains very few basic or acidic amino acids, whereas the N-terminal part is rich in charged amino acids. Moreover, the uncharged C-terminal domain is composed of homologous repeated sequences which, most of the time, start with the three amino acids TGG. Some of these repeats are very similar; for instance, the repeat TGGAMMPQTSFNALPQV occurs several times in closely related forms (Figure 5). Homology has been detected between YBL03-21 and the human kinase-related transforming protein slk (Kawakami et al., 1986).

YBL03-23 codes for the N-terminal end (191 amino acids) of a protein whose coding sequence certainly extends to the right of the sequenced clone. This fragment contains a cysteine-rich region homologous to

Figure 4. Translation product of YBL 03-16. The leucines (plus one phenylalanine) forming the heptad repeats are boxed and the adjacent basic-rich domain is underlined.
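The heptad periodicity that defines a leucine zipper can be checked mechanically. The sketch below is an illustration only, not the analysis performed for Figure 4: it scans a protein sequence for runs of leucine (or phenylalanine) spaced exactly seven residues apart. The function name, the accepted residue set and the minimum run length are assumptions chosen for the example.

```python
def heptad_runs(protein, residues="LF", min_repeats=4):
    """Return (1-based start position, number of repeats) for every run of
    `residues` recurring at every seventh position.

    A run is reported once, from its first member.
    """
    protein = protein.upper()
    runs = []
    for start, aa in enumerate(protein):
        if aa not in residues:
            continue
        # Skip positions that merely continue a run begun seven residues earlier.
        if start >= 7 and protein[start - 7] in residues:
            continue
        count, pos = 0, start
        while pos < len(protein) and protein[pos] in residues:
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start + 1, count))
    return runs
```

For a protein such as the YBL03-16 product, a run of nine members (eight leucines plus one phenylalanine) would be expected from the description in the text.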

Figure 5. (I) Diagram of acidic (A) and basic (B) residues in the translation product of YBL 03-21. (II) Complete protein sequence of YBL 03-21. The TGG sequences at the beginning of the different repeat units are underlined.
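A charge map of the kind shown in panel I can be approximated by counting charged residues in successive windows along the protein. The sketch below is a hedged illustration rather than the DNA Strider routine: the window size is arbitrary, and treating only D, E, K and R as charged is an assumption made for the example.

```python
# Assumption: aspartate and glutamate count as acidic, lysine and arginine as basic.
CHARGED = set("DEKR")


def charged_fraction(protein, window=50, step=50):
    """Fraction of charged residues in successive windows.

    Returns a list of (1-based window start, fraction) tuples; a trailing
    stretch shorter than `window` is ignored.
    """
    protein = protein.upper()
    profile = []
    for start in range(0, len(protein) - window + 1, step):
        segment = protein[start:start + window]
        charged = sum(residue in CHARGED for residue in segment)
        profile.append((start + 1, charged / window))
    return profile
```

On a two-domain protein like the YBL03-21 product, such a profile would be expected to drop sharply at the boundary between the charged N-terminal part and the uncharged, repetitive C-terminal part.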

Figure 6. Homology between PDR1 and YBL03-23. Only the NH2-terminal region of YBL 03-23, which is known, is represented. The PDR1 sequence is from Balzi et al. (1987). Double and single lines indicate identity and similarity between residues respectively. Arrows indicate six putative metal-binding cysteines which define a Zn(II)2Cys6 binuclear cluster, a motif conserved in several yeast regulatory proteins (Balzi et al., 1987; Pan and Coleman, 1990; Bai and Kohlhaw, 1991).
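The six-cysteine arrangement marked in Figure 6 can be searched for with a regular expression. The sketch below is illustrative only: the spacing pattern CX2CX6CX5-12CX2CX6-8C is a commonly quoted consensus for Zn(II)2Cys6 clusters and is an assumption here, not a definition taken from this report. The two test fragments are the cysteine-rich stretches as transcribed from the Figure 6 alignment (alignment gaps removed) and may carry transcription errors.

```python
import re

# Assumed consensus spacing of the six cysteines (not taken from this report).
ZN2CYS6 = re.compile(r"C.{2}C.{6}C.{5,12}C.{2}C.{6,8}C")

# Cysteine-rich stretches transcribed from the Figure 6 alignment.
YBL03_23_FRAGMENT = "KVKKSTRSKVSTACVNCRKRKIKCTGKYPCTNCISYDCTC"
PDR1_FRAGMENT = "KIRKPRSKVSKACDNCRKRKIKCNGKFPCASCEIYSCEC"


def find_zn_clusters(protein):
    """Return (1-based start within the given sequence, matched motif) pairs."""
    protein = protein.replace("-", "").upper()  # drop alignment gaps
    return [(m.start() + 1, m.group()) for m in ZN2CYS6.finditer(protein)]
```

Both fragments contain one match under this pattern, with identical spacing between the cysteines, which is consistent with the homology described in the legend. Positions are counted within the fragment supplied, not within the full-length protein.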

that of several yeast regulatory proteins (Figure 6; Balzi et al., 1987). This cysteine-rich region might form a 'cloverleaf'-like Zn(II)2Cys6 binuclear cluster, as proposed for similar proteins (Pan and Coleman, 1990; Bai and Kohlhaw, 1991). More interestingly, the similarity with the multidrug resistance gene PDR1 (Balzi et al., 1987) extends on both sides of the putative Zn(II)2Cys6 binuclear cluster. This suggests that this new gene could turn out to be more similar to PDR1 and could belong to the PDR gene family (Balzi and Goffeau, 1991). Mutations conferring multiple drug resistance or hypersensitivity phenotypes have been localized in this region (Guerineau et al., 1974; Subik et al., 1986).

ACKNOWLEDGEMENTS

We thank A. Goffeau for help and advice during this work and R. Stucka and H. Feldmann for providing the recombinant cosmid and the information concerning the localization of this sequence in the yeast chromosome. This work was supported in part by the Commission of the European Communities BRIDGE Program and by the Ministère de l'Education Nationale (MEN, France).

REFERENCES

Bai, Y. and Kohlhaw, G.B. (1991). Manipulation of the "zinc cluster" region of transcriptional activator LEU3 by site-directed mutagenesis. Nucl. Acids Res. 19, 5991-5997.

Balzi, E., Chen, W., Ulaszewski, S., Capieaux, E. and Goffeau, A. (1987). The multidrug resistance gene PDR1 from Saccharomyces cerevisiae. J. Biol. Chem. 262, 16871-16879.

Balzi, E. and Goffeau, A. (1991). Multiple or pleiotropic drug resistance in yeast. Biochim. Biophys. Acta 1073, 241-252.

Elion, E.A., Grisafi, P.L. and Fink, G.R. (1990). FUS3 encodes a cdc2+/CDC28-related kinase required for the transition from mitosis to conjugation. Cell 60, 640-664.

Guerineau, M., Slonimski, P.P. and Avner, P.R. (1974). Yeast episome: oligomycin resistance associated with a small covalently closed non mitochondrial DNA. Biochem. Biophys. Res. Commun. 61, 462-469.

Hohn, B. and Hinnen, A. (1980). Cosmids. In Setlow, J.K. and Hollaender, A. (Eds), Genetic Engineering, vol. 2. Plenum Press, New York, pp. 169-183.

Kawakami, T., Pennington, C.Y. and Robbins, K.C. (1986). Isolation and oncogenic potential of a novel human src-like gene. Mol. Cell. Biol. 12, 4195-4201.

Landschulz, W.H., Johnson, P.F. and McKnight, S.L. (1988). The leucine zipper: a hypothetical structure common to a new class of DNA binding proteins. Science 240, 1759-1764.

Marck, C. (1988). "DNA Strider": a "C" program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers. Nucl. Acids Res. 16, 1829-1836.

Pan, T. and Coleman, J.E. (1990). GAL4 transcription factor is not a "zinc finger" but forms a Zn(II)2Cys6 binuclear cluster. Proc. Natl. Acad. Sci. USA 87, 2077-2081.

Sharp, P.M. and Li, W.H. (1987). The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications. Nucl. Acids Res. 15, 1281-1295.

Subik, J., Ulaszewski, S. and Goffeau, A. (1986). Genetic mapping of nuclear mucidin resistance mutations in Saccharomyces cerevisiae. A new pdr locus on chromosome II. Curr. Genet. 10, 665-670.
