YEAST
VOL. 8: 761-768 (1992)
Yeast Sequencing Reports
Sequence of a 12.7 kb Segment of Yeast Chromosome II Identifies a PDR-like Gene and Several New Open Reading Frames
TH. DELAVEAU, C. JACQ AND J. PEREA
Laboratoire de Génétique Moléculaire, CNRS URA 1302, Ecole Normale Supérieure, 46 rue d'Ulm, 75230 Paris cedex 05, France
Received 1 June 1992; accepted 5 June 1992
A 12,684 bp DNA fragment from the left arm of chromosome II of Saccharomyces cerevisiae, located between FUS3 and the centromere, was sequenced as part of the European project to sequence the whole chromosome. This segment contains at least five complete new open reading frames (ORFs) and the beginning (the first 191 codons) of an ORF whose putative translational product is highly similar to that of the multidrug resistance gene PDR1, previously characterized by Balzi et al. (1987) on chromosome VII.
KEY WORDS: Genome sequencing; Saccharomyces cerevisiae; chromosome II; PDR1; multidrug resistance; Zn binuclear cluster; Leu zipper
INTRODUCTION
As part of the European BRIDGE project to sequence chromosome II of Saccharomyces cerevisiae, we have determined the sequence of a 12,684 bp DNA segment located between the FUS3 locus and the centromere (Figure 1). This fragment represents the centromere-proximal part of the insert contained in the cosmid alpha 1008.5 constructed by R. Stucka and H. Feldmann (München), whereas the rest of the insert has been sequenced by J. Skala et al. (Yeast, submitted).
MATERIALS AND METHODS
Strains
The Saccharomyces cerevisiae alpha S288C strain was used by Feldmann et al. to construct a cosmid library in the vector pYc3030 (Hohn and Hinnen, 1980). The Escherichia coli TG1 strain (Delta(lac pro), thi1, supE44, hsdD5, F' (traD36, proA+B, lacIq, lacZ delta M15)) was used for every step of subcloning and sequencing.
Figure 1. Map of the centromere-proximal part of the insert in the cosmid alpha 1008.5. The 32 kb insert is indicated as a thick line. The left part of the insert (20 kb) has been sequenced by J. Skala et al. (Yeast, submitted). A previously known gene, FUS3 (Elion et al., 1990), has been localized. (C = ClaI, B = BamHI, S = SalI, X = XbaI.)
0749-503X/92/090761-08$09.00 © 1992 by John Wiley & Sons Ltd
Sequencing strategy and methods (Figure 2)
Figure 2. Sequencing strategy and localizations of the ORFs. The lower part of the figure presents the sequencing strategy. Single arrows correspond to sequencing reactions made from exonuclease III nested deletions. Starred arrows indicate sequencing reactions made from oligonucleotides. A simple restriction map is indicated; C = ClaI, B = BamHI, X = XbaI, S = SalI. The upper part presents the seven non-overlapping ORFs indicated with arrows. Three short ORFs are indicated as overlapping longer ORFs.
The alpha 1008.5 cosmid was digested with ClaI, producing four fragments of 22.0, 8.5, 8.0 and 3.4 kb. The 22 kb fragment, containing the pYc3030 vector, was ligated to itself and its 12.6 kb insert was sequenced. This 12.6 kb insert represents the centromere-proximal part of the 32 kb insert in the original alpha 1008.5 cosmid. ClaI-SalI digestion of the 22 kb cosmid generated three fragments of 6.8 kb, 4.4 kb and 1.5 kb. These fragments were cloned in either the Bluescript phagemids pBS-KS+ and pBS-KS- or pGEM-7Zf(-). Deletions were made on double-stranded DNA by exonuclease III digestion. Dideoxy sequencing reactions were performed with either T7 DNA polymerase (Pharmacia) or Sequenase (USB) and [35S]dATP on either single-stranded or double-stranded DNA. A total of 240 sequencing reactions were carried out on 132 of the clones obtained after nested deletions. In addition, 14 oligonucleotide primers were used to cover gaps and junctions between the subfragments. The sequences were assembled using the DNASTAR software. Preliminary analyses of sequences were carried out using the DNA Strider software (Marck, 1988).
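A restriction map of the kind used above can be cross-checked against a finished sequence by locating the recognition sites and deriving the expected fragment lengths. The following is only a minimal sketch under stated assumptions: the function names and the toy sequence are hypothetical, the molecule is treated as linear, and each cut is placed at the first base of the recognition site rather than at the true cleavage position.

```python
# Recognition sequences of the enzymes used for mapping. All four are
# palindromic, so scanning one strand is sufficient.
SITES = {
    "ClaI": "ATCGAT",
    "BamHI": "GGATCC",
    "SalI": "GTCGAC",
    "XbaI": "TCTAGA",
}


def find_sites(seq, site):
    """Return the 0-based start positions of every occurrence of `site`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, enzymes):
    """Fragment lengths of a linear molecule digested with `enzymes`.

    Simplification of this sketch: each cut is placed at the first base
    of the recognition site.
    """
    cuts = sorted({p for e in enzymes for p in find_sites(seq, SITES[e])})
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical toy sequence, not taken from the paper.
toy = "AAAA" + "ATCGAT" + "CCCCCCCC" + "GTCGAC" + "TT"
print(fragment_lengths(toy, ["ClaI", "SalI"]))
```

The same bookkeeping applies to the sizes quoted above: the three ClaI-SalI fragments (6.8, 4.4 and 1.5 kb) add up to 12.7 kb, in line with the 12,684 bp finally assembled.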
Comparisons of nucleotide and amino acid sequences to data banks (GenBank release 66, NBRF release 27 and EMBL release 26) were performed using the Genetics Computer Group, Inc. package (GCG).
RESULTS AND DISCUSSION
Sequence determination
A few ambiguities remained after reading sequences on both strands. Running dITP sequencing reactions clarified these uncertainties, which were due to compressions.
Sequence analysis
The complete sequence of 12,684 bp is given in Figure 3. Seven new non-overlapping open reading frames (ORFs) are presented, and three other ORFs (YBL03-19, YBL03-20 and YBL03-24) are included in longer ORFs. At the two extremities of the sequenced fragment, two interrupted ORFs are likely to continue in the flanking sequences. This has already been demonstrated for YBL03-15B, which is the 3' end of
Figure 3. Complete sequence of a 12,684 bp segment of chromosome II. The sequence reads 5' to 3' from the left telomere towards the centromere. The seven non-overlapping ORFs are boxed.
Figure 3. (cont'd)
Table 1. Characteristics of the different translation ORF products. The complete sizes of the putative proteins are given in amino acids except for the interrupted ORFs: 03-19, 03-20, 03-24. CAI indicates the values of the codon adaptation index (Sharp and Li, 1987).

ORF          SIZE (amino acids)   CAI     MAIN FEATURES
YBL 03-15B   > 151                0.196   C-terminal end of YBL 03-15A (Goffeau et al.); glutamic-rich protein
YBL 03-16    280                  0.123   Leucine zipper + basic region
YBL 03-18    840                  0.128
YBL 03-19    151                  0.106   ?
YBL 03-20    150                  0.082   ?
YBL 03-21    1244                 0.190   Non-polar repetitive C-terminal domain
YBL 03-22    145                  0.058
YBL 03-23    > 191                0.122   Zn(II)2Cys6 binuclear cluster; homology with PDR1 (Balzi et al., 1987)
YBL 03-24    > 104                0.083   ?
a longer ORF, YBL03-15A, recently sequenced by J. Skala et al. (Yeast, submitted). Table 1 presents the main features of the putative translational products of the different ORFs. The codon adaptation index (Sharp and Li, 1987) is generally low, suggesting that most of these ORFs are expressed at a very low level. The low values obtained for YBL03-19 and YBL03-20 may in fact suggest that these ORFs, which overlap YBL03-18, are not expressed. More careful analyses of the putative ORF products revealed that at least three of them deserve further consideration. YBL03-16 encodes a protein with a very interesting basic motif/leucine zipper domain. This 280 amino acid long protein contains a periodic repetition of eight leucines plus one phenylalanine, one at every seventh position (Figure 4). More interestingly, this leucine zipper motif (Landschulz et al., 1988) is immediately followed by a short stretch of basic amino acid residues. Both features suggest that this protein might be a sequence-specific DNA-binding protein.
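For readers unfamiliar with the codon adaptation index, the calculation can be sketched as follows: each codon receives a relative adaptiveness w (its count in a reference gene set divided by the count of the most used synonymous codon), and the CAI of an ORF is the geometric mean of w over its codons. The code is a hedged illustration, not the procedure actually used for Table 1: the function names are hypothetical, the reference counts in the usage lines are invented toy values, and codons absent from the reference table are simply skipped.

```python
import math
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(ref_counts):
    """w = count of a codon / count of the most used synonymous codon."""
    best = {}
    for codon, n in ref_counts.items():
        aa = CODON_TABLE[codon]
        best[aa] = max(best.get(aa, 0), n)
    return {c: n / best[CODON_TABLE[c]] for c, n in ref_counts.items() if n > 0}


def cai(orf, w):
    """Geometric mean of w over the codons of `orf`.

    Met, Trp and stop codons are ignored, as are codons missing from `w`
    (a simplification of this sketch).
    """
    orf = orf.upper()
    logs = []
    for i in range(0, len(orf) - len(orf) % 3, 3):
        codon = orf[i:i + 3]
        if CODON_TABLE.get(codon) in ("M", "W", "*", None):
            continue
        if codon in w:
            logs.append(math.log(w[codon]))
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Invented reference counts for two amino acids only (Lys and Glu).
toy_w = relative_adaptiveness({"AAA": 10, "AAG": 40, "GAA": 30, "GAG": 10})
print(cai("AAGGAAAAA", toy_w))
```

A real calculation would take the reference counts from a set of highly expressed yeast genes; an ORF built mostly from rarely used codons then obtains a low value, which is the basis of the expression argument made above.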
YBL03-21 encodes a 1244 amino acid long protein. This long protein has a clear two-domain structure which is apparent from the distribution of charged amino acids. Figure 5 reveals that the C-terminal part of the protein (amino acids 860 to 1244) contains very few basic or acidic amino acids, whereas the N-terminal part of the protein is rich in charged amino acids. Moreover, the uncharged C-terminal domain is composed of homologous repeated sequences which, most of the time, start with the three amino acids TGG. Some of these repeats are actually very similar; for instance, the repeat TGGAMMPQTSFNALPQV can be found several times in very closely related forms (Figure 5). Homology has been detected between YBL03-21 and the human kinase-related transforming protein slk (Kawakami et al., 1986).
YBL03-23 codes for the N-terminal end (191 amino acids) of a protein whose coding sequence certainly extends to the right of the sequenced clone. This fragment contains a cysteine-rich region homologous to
Figure 4. Translation product of YBL 03-16. The leucines (+ one phenylalanine) forming the heptad repeats are boxed and the adjacent basic-rich domain is underlined.
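The heptad periodicity highlighted in Figure 4 can be searched for mechanically. The sketch below is an assumption-laden illustration: the function name, the default threshold and the toy sequences are hypothetical and are not taken from YBL03-16. It reports every maximal run of allowed residues spaced exactly seven positions apart.

```python
def heptad_runs(protein, allowed="L", min_repeats=4):
    """Return (start, n) for each maximal run of `allowed` residues
    found at positions start, start + 7, start + 14, ...

    `start` is 0-based; `n` is the number of residues in the run.
    """
    protein = protein.upper()
    runs = []
    for start in range(len(protein)):
        if protein[start] not in allowed:
            continue
        # Skip positions that merely continue a run begun 7 residues earlier.
        if start >= 7 and protein[start - 7] in allowed:
            continue
        n = 0
        pos = start
        while pos < len(protein) and protein[pos] in allowed:
            n += 1
            pos += 7
        if n >= min_repeats:
            runs.append((start, n))
    return runs


# Hypothetical toy sequences: leucines every seventh residue, with one
# position replaced by phenylalanine in the second sequence.
toy_leu = "MK" + "LAAAAAA" * 4 + "LGG"
toy_phe = "MK" + "LAAAAAA" * 2 + "FAAAAAA" + "LAAAAAA" + "L"
print(heptad_runs(toy_leu))
print(heptad_runs(toy_phe, allowed="LF"))
```

Passing allowed="LF" mirrors the situation described in the caption, where one phenylalanine takes the place of a leucine in the repeat.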
Figure 5. (I) Diagram of acidic (A) and basic (B) residues in the translation product of YBL 03-21. (II) Complete protein sequence of YBL 03-21. The TGG sequences at the beginning of the different repeat units are underlined.
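The contrast between the charged N-terminal part and the uncharged C-terminal repeats shown in Figure 5 can be expressed as a simple charged-residue fraction, computed over the whole fragment or over a sliding window. This is a minimal sketch under stated assumptions: the function names and window size are hypothetical, histidine is not counted as basic, and the two inputs are the first 40 residues of YBL03-21 as printed in Figure 5 and the repeat unit quoted in the text.

```python
ACIDIC = set("DE")
BASIC = set("KR")  # assumption of this sketch: histidine is not counted


def charged_fraction(seq):
    """Fraction of residues that are acidic (D, E) or basic (K, R)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(aa in ACIDIC or aa in BASIC for aa in seq) / len(seq)


def charge_profile(seq, window=20):
    """Charged fraction of every full-length window, sliding by one residue."""
    return [charged_fraction(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


# First 40 residues of YBL03-21 as printed in Figure 5.
NTERM = "MTVFLGIYRAVYAYEPQTPEELAIQEDDLLYLLQKSDIDD"
# Repeat unit of the C-terminal domain quoted in the text.
REPEAT = "TGGAMMPQTSFNALPQV"

print(charged_fraction(NTERM), charged_fraction(REPEAT))
print(charge_profile(NTERM, window=20)[:3])
```

The repeat unit contains no D, E, K or R at all, so its fraction is zero, whereas the N-terminal fragment carries several acidic residues; plotting the profile along the full sequence would give a curve comparable in spirit to panel (I).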
Figure 6. Homology between PDR1 and YBL03-23. Only the NH2-terminal region of YBL 03-23, which is known, is represented. The PDR1 sequence is from Balzi et al. (1987). Double and single lines indicate identity and similarity between residues respectively. Arrows indicate six putative metal-binding cysteines which define a Zn(II)2Cys6 binuclear cluster, which was found to be homologous in several yeast regulatory proteins (Balzi et al., 1987; Pan and Coleman, 1990; Bai and Kohlhaw, 1991).
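The spacing of the six cysteines marked in Figure 6 can be checked with a regular expression. The sketch below is illustrative only: the pattern is a commonly quoted consensus for this class of motif (C-X2-C-X6-C-X5-12-C-X2-C-X6-8-C) and is an assumption rather than something stated in this report, the function name is hypothetical, and the two inputs are the cysteine-rich stretches of YBL03-23 and PDR1 transcribed from the first block of the Figure 6 alignment (the alignment gap is removed before searching).

```python
import re

# Assumed consensus spacing for a Zn(II)2Cys6 binuclear cluster; each
# group captures one spacer between consecutive ligand cysteines.
ZN2CYS6 = re.compile(r"C(.{2})C(.{6})C(.{5,12})C(.{2})C(.{6,8})C")


def find_cluster(protein):
    """Return (start, matched_text, spacer_lengths) or None.

    `start` is 0-based and refers to the sequence after removal of
    alignment gaps ('-').
    """
    match = ZN2CYS6.search(protein.upper().replace("-", ""))
    if match is None:
        return None
    return match.start(), match.group(0), [len(g) for g in match.groups()]


# Fragments transcribed from the Figure 6 alignment.
YBL0323_FRAGMENT = "KVKKSTRSKVSTACVNCRKRKIKCTGKYPCTNCISYDCTC"
PDR1_FRAGMENT = "KIRKP-RSKVSKACDNCRKRKIKCNGKFPCASCEIYSCEC"

for name, frag in (("YBL03-23", YBL0323_FRAGMENT), ("PDR1", PDR1_FRAGMENT)):
    print(name, find_cluster(frag))
```

Both fragments match with the same spacer lengths, which is one simple way of stating that the cysteine positions are conserved between the two proteins. Note that a seventh cysteine lies inside the last spacer in each fragment, so the pattern alone does not decide which residues bind the metal.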
that of several yeast regulatory proteins (Figure 6; Balzi et al., 1987). This cysteine-rich region might form a 'cloverleaf'-like Zn(II)2Cys6 binuclear cluster, as proposed for similar proteins (Pan and Coleman, 1990; Bai and Kohlhaw, 1991). More interestingly, the similarity with the multidrug resistance gene PDR1 (Balzi et al., 1987) extends on both sides of the putative Zn(II)2Cys6 binuclear cluster. This suggests that this new gene could turn out to be more similar to PDR1 and could belong to the PDR gene family (Balzi and Goffeau, 1991). Mutations conferring multiple drug resistance or hypersensitivity phenotypes have been localized in this region (Guerineau et al., 1974; Subik et al., 1986).
ACKNOWLEDGEMENTS
We thank A. Goffeau for help and advice during this work and R. Stucka and H. Feldmann for providing the recombinant cosmid and the information concerning the localization of this sequence in the yeast chromosome. This work was supported in part by the Commission of the European Communities BRIDGE Program and by the Ministère de l'Education Nationale (MEN, France).
REFERENCES
Bai, Y. and Kohlhaw, G.B. (1991). Manipulation of the "zinc cluster" region of transcriptional activator LEU3 by site-directed mutagenesis. Nucl. Acids Res. 19, 5991-5997.
Balzi, E., Chen, W., Ulaszewski, S., Capieaux, E. and Goffeau, A. (1987). The multidrug resistance gene PDR1 from Saccharomyces cerevisiae. J. Biol. Chem. 262, 16871-16879.
Balzi, E. and Goffeau, A. (1991). Multiple or pleiotropic drug resistance in yeast. Biochim. Biophys. Acta 1073, 241-252.
Elion, E.A., Grisafi, P.L. and Fink, G.R. (1990). FUS3 encodes a cdc2+/CDC28-related kinase required for the transition from mitosis into conjugation. Cell 60, 649-664.
Guerineau, M., Slonimski, P.P. and Avner, P.R. (1974). Yeast episome: oligomycin resistance associated with a small covalently closed non-mitochondrial DNA. Biochem. Biophys. Res. Commun. 61, 462-469.
Hohn, B. and Hinnen, A. (1980). Cosmids. In Setlow, J.K. and Hollaender, A. (Eds), Genetic Engineering, vol. 2. Plenum Press, New York, pp. 169-183.
Kawakami, T., Pennington, C.Y. and Robbins, K.C. (1986). Isolation and oncogenic potential of a novel human src-like gene. Mol. Cell. Biol. 12, 4195-4201.
Landschulz, W.H., Johnson, P.F. and McKnight, S.L. (1988). The leucine zipper: a hypothetical structure common to a new class of DNA binding proteins. Science 240, 1759-1764.
Marck, C. (1988). "DNA Strider": a "C" program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers. Nucl. Acids Res. 16, 1829-1836.
Pan, T. and Coleman, J.E. (1990). GAL4 transcription factor is not a "zinc finger" but forms a Zn(II)2Cys6 binuclear cluster. Proc. Natl. Acad. Sci. USA 87, 2077-2081.
Sharp, P.M. and Li, W.H. (1987). The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications. Nucl. Acids Res. 15, 1281-1295.
Subik, J., Ulaszewski, S. and Goffeau, A. (1986). Genetic mapping of nuclear mucidin resistance mutations in Saccharomyces cerevisiae. A new pdr locus on chromosome II. Curr. Genet. 10, 665-670.