Current Genetics

Current Genetics (1983) 7:21-28

© Springer-Verlag 1983

The Structure of the Gene for Subunit I of Cytochrome c Oxidase in Neurospora crassa Mitochondria

Jenny C. de Jonge and Hans de Vries

Laboratory of Physiological Chemistry, State University Medical School, Bloemsingel 10, 9712 KZ Groningen, The Netherlands

Summary. We have sequenced the gene for cytochrome c oxidase subunit 1 (CO I) in Neurospora crassa mitochondrial DNA. The gene is encoded on the same strand as the rRNA and tRNA genes. The coding sequence predicts a protein of 557 amino acids, starting with methionine and ending with asparagine. Comparison with the N-terminal amino acid sequence of the mature protein (Werner et al. 1980) reveals that the methionine is located at position -2. No other in-frame AUG codons have been found upstream. The C-terminal part of the gene is about 70 base pairs longer than the corresponding parts of the Saccharomyces and mammalian genes. The homology between the Neurospora coding sequence and those of yeast and mammals is very high. Introns i1 through i5 of the Saccharomyces gene are absent in Neurospora.

Key words: Mitochondrial DNA - Neurospora crassa - Cytochrome c oxidase

Introduction

The Neurospora crassa mitochondrial genome contains genes for 2 rRNAs, about 25 tRNAs (Terpstra et al. 1977) and a limited number of proteins. These include the genes for subunits 1, 2 and 3 of cytochrome c oxidase. The CO III gene has recently been sequenced completely (Browning and RajBhandary 1982). The N-terminal part of the subunit 2 DNA sequence was determined in our laboratory (Van den Boogaart et al. 1982b). Comparison of the CO III DNA sequence with the amino acid sequence (Sebald et al. 1973), combined with the finding that mature CO III is formylated at the N-terminus (W. Machleidt, pers. comm.), shows that CO III is synthesized in its mature form. In contrast, a similar comparison of the CO II sequence with the amino acid sequence (Machleidt and Werner 1979) shows that subunit 2 is synthesized with an N-terminal extension of 12 amino acid residues. Recent work by Van 't Sant et al. (1981) showed that three of the mitochondrial translation products are candidates for being precursor proteins. The largest of these (apparent mol. weight 45 kDal) was shown to be identical to the precursor of subunit 1 of cytochrome c oxidase (CO I) found by Bertrand and Werner (1979) in the mitochondrial mutant mi-3. Werner and colleagues have also determined the N-terminal amino acid sequence of the mature CO I (Werner et al. 1980). Interestingly, neither the yeast nor the mammalian CO I is synthesized as a precursor. We determined the approximate location of the CO I gene on the mitochondrial genome earlier (Agsteribbe et al. 1980), using cloned segments of the Saccharomyces oxi3 region as probes. Most of these probes hybridized to the EcoRI fragment E3 (9,000 bp). We set out to characterize and sequence the CO I gene completely for two reasons. The first was to find the sequence of the N-terminal prepiece. The second was to find out whether or not the Neurospora gene has a complex organization similar to that of the Saccharomyces gene. In this paper, we present the complete DNA sequence of the CO I gene.

Offprint requests to: H. de Vries

Abbreviations: CO I, CO II, CO III = subunits 1, 2 and 3, respectively, of cytochrome c oxidase

Materials and Methods

Construction of Plasmids. pBE3 contains the EcoRI fragment E3 of Neurospora crassa mtDNA cloned into the EcoRI site of pBR322. pBH7c contains the two adjacent HindIII fragments H20 and H7c, both located within E3, cloned into the HindIII site of pBR322. In both cases, pBR322 was linearized with either EcoRI or HindIII and treated with calf intestinal alkaline phosphatase before ligation to the mtDNA fragment. We used electrophoretically purified E3 in the former ligation and total HindIII-digested mtDNA in the latter. The presence of both H20 and H7c in pBH7c resulted from partial digestion fragments in the HindIII digest. Ampicillin-resistant colonies were selected on plates and screened by small-scale DNA preparation (Birnboim and Doly 1979).

Plasmids Containing Yeast oxi3 Fragments. The plasmids were gifts from Drs. L. A. Grivell and L. A. M. Hensgens.

DNA Hybridization. Restriction digests were separated by electrophoresis on agarose gels and transferred to nitrocellulose paper (Southern 1975). The plasmids containing yeast mtDNA were nick-translated and hybridized at a non-stringent temperature as described previously (Agsteribbe et al. 1980).

DNA Sequencing. Restriction fragments with 5' protruding ends were 3' labeled with α-[32P]dNTPs using the Klenow fragment of DNA polymerase I. Sequencing was performed according to Maxam and Gilbert (1980).

Table 1. Heterologous hybridization of yeast CO I probes to Neurospora crassa mitochondrial DNA

Yeast probe   Exon          Intron              Degree of hybridization   To fragment
pKL41         A1            i1                  very weak                 H7c
pKL71                       i1                  not detected
pKLC          A2            i1, i2              not detected
pKL20                       i2                  not detected
pKL134        A3            i2, i3              not detected
pKLD          A3, A4        i2, i3              strong                    H7c
pKLA          A4, A5        i4                  strong                    H7c (also to H2^a)
pKL111        A5            ai5α, β             strong                    H7c
pKL108        A5, A6, A7    ai5β, γ, i5, i6     strong                    H7c

^a H2 probably contains the cytochrome b gene (cf. Macino 1980)

Fig. 1. Physical map of the EcoRI fragment E3. Restriction sites: E EcoRI, H HindIII, b BglII. The bar represents 1,000 bp. The arrow indicates the direction of transcription in the tRNA-rRNA region, which lies roughly opposite on the circular genome
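Figs. 1 and 2 are physical maps, which record the positions of restriction enzyme recognition sites. In this work those maps were derived experimentally from digests. The Python below is only an illustrative sketch: it shows how the same recognition sequences can be located in a known DNA string. The recognition sequences are the standard ones for the enzymes named in the figure captions. `find_sites` and the demo sequence are hypothetical and are not taken from Neurospora mtDNA.

```python
import re

# Standard 5'->3' recognition sequences of the enzymes named in Figs. 1 and 2.
# N means any nucleotide. All of these sites are palindromic, so scanning one
# strand finds every site.
SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "BglII": "AGATCT",
    "PstI": "CTGCAG",
    "TaqI": "TCGA",
    "HpaII": "CCGG",
    "Sau3AI": "GATC",
    "HinfI": "GANTC",
    "DdeI": "CTNAG",
}


def site_pattern(site: str) -> "re.Pattern[str]":
    """Compile a recognition sequence into a regex in which N matches any base.
    The zero-width lookahead also reports overlapping occurrences."""
    body = "".join("[ACGT]" if base == "N" else base for base in site)
    return re.compile(f"(?={body})")


def find_sites(seq: str, enzymes: dict = SITES) -> dict:
    """Map each enzyme name to the 1-based start positions of its sites in seq."""
    seq = seq.upper()
    return {
        name: [m.start() + 1 for m in site_pattern(site).finditer(seq)]
        for name, site in enzymes.items()
    }


if __name__ == "__main__":
    demo = "GGAATTCAAGCTTCTGCAGATCT"  # hypothetical test sequence
    for enzyme, positions in find_sites(demo).items():
        if positions:
            print(enzyme, positions)
```

For the demo string, the script reports EcoRI at 2, HindIII at 8, PstI at 14, BglII at 18 and Sau3AI at 19. The lookahead keeps overlapping sites, such as the BglII and Sau3AI sites here, from hiding each other.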

Results

The probes from the Saccharomyces oxi3 region were derived from strain KL14-4A, the "long strain". This strain carries even more introns in the CO I gene than D273-10B (Grivell et al. 1982; Bonitz et al. 1980). The probes form a representative sample that covers all exons and most of the introns. A complete description of the plasmids is given in Grivell et al. (1982). We found earlier, using a somewhat less complete set of oxi3 probes, that all probes hybridized with E3 only (Agsteribbe et al. 1980). Table 1 summarizes the hybridizations of the set of yeast probes to HindIII digests of total Neurospora mtDNA and of pBE3. For convenience, the HindIII/BglII map of E3 is shown (Fig. 1). With one exception, KLA, the probes showing detectable homology to Neurospora mtDNA hybridized solely to the 2,700 bp HindIII fragment H7c. Probe KLA, which contains the yeast intron i4, also hybridizes to BglII fragment b12 (Agsteribbe et al. 1980) and to HindIII fragment H2. The cytochrome b gene is located on H2 (cf. Macino 1980). The Neurospora cytochrome b gene may therefore be homologous to the i4 intron or to parts of exons A4 and A5 of the yeast CO I gene. From the results in Table 1, we concluded that the gene for CO I is located on H7c, probably in its entirety. We therefore sequenced the coding part of the CO I gene and its flanking regions in the plasmid pBH7c. For sequencing, we used the restriction sites shown in Fig. 2 for cleavage and/or labeling. Figure 3 shows the sequence obtained. The sequence coding for the N-terminal part of the mature protein starts at nucleotide position 475 with a serine TCA codon. The N-terminal protein sequence determined by Werner et al. (1980) was confirmed in its entirety.

Fig. 2. Physical map of the HindIII fragment H7c. Restriction sites used for mapping and/or sequencing: H HindIII, T TaqI, h HpaII, S Sau3AI (MboI), f HinfI, D DdeI, P PstI. N marks the TCA codon for the N-terminal serine. ATG is the methionine codon upstream of the N-terminal TCA. TAA marks the in-frame stop codons upstream and downstream of the CO I gene

Fig. 3. Nucleotide sequence of the N. crassa CO I gene. The sequence shown is that of the nontranscribed strand. It begins with the PstI site preceding the gene and ends at the TAA codon that terminates the gene. The two palindromic sequences upstream of the gene are underlined. The in-frame TAA codons preceding and following the gene, as well as the initiator codons, are boxed

Discussion

Knowing the DNA sequence of the CO I gene allows us to draw several conclusions about gene organization, codon usage and differences from other CO I genes. Table 2 shows the codon usage in this gene.
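A codon usage table such as Table 2 is a tally of the codons in the reading frame. The Python sketch below counts codons and translates them with the standard genetic code. It applies the one change discussed next for mitochondria: UGA (TGA) is read as tryptophan. AUA needs no change, because it already codes for isoleucine in the standard code, as the text reports for Neurospora. This modified table corresponds to NCBI translation table 4. The input sequence is hypothetical, and the function names are chosen for illustration only.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code: amino acids for codons in the order TTT, TTC, TTA,
# TTG, TCT, ..., GGG ('*' marks a stop codon).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    b1 + b2 + b3: STANDARD_AA[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Mitochondrial reassignment discussed in the text: UGA (TGA) reads as Trp.
MITO_CODE = {**STANDARD_CODE, "TGA": "W"}


def codons(seq: str) -> list:
    """Split an in-frame sequence into triplets. RNA U is read as T, and any
    trailing partial codon is ignored."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def codon_usage(seq: str) -> Counter:
    """Count each codon, as in a codon usage table."""
    return Counter(codons(seq))


def translate(seq: str, table: dict = MITO_CODE) -> str:
    """Translate codon by codon ('*' = stop). Raises KeyError on non-ACGT."""
    return "".join(table[c] for c in codons(seq))


def fraction_at_third(seq: str) -> float:
    """Share of codons ending in A or T (U), reflecting the C/G bias noted below."""
    cs = codons(seq)
    return sum(c[2] in "AT" for c in cs) / len(cs)


if __name__ == "__main__":
    orf = "ATGTCATCATGATAA"  # hypothetical: ATG TCA TCA TGA TAA
    print(translate(orf))                  # MSSW*
    print(translate(orf, STANDARD_CODE))   # MSS**
    print(codon_usage(orf).most_common(1)) # [('TCA', 2)]
```

The same TGA codon reads as a stop in the standard table but as Trp in the mitochondrial one. That difference is why a UGA-rich mitochondrial gene looks truncated when translated with the standard code.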

As in all mitochondrial genes sequenced so far, UGA codes for tryptophan (Coruzzi and Tzagoloff 1979; Barrell et al. 1979; Heckman et al. 1980). None of the 15 tryptophan residues is encoded by the orthodox UGG codon. Of these 15, 9 occupy the same position as in yeast and human. Arginine is present 15 times. Of these, 13 are encoded by AGA, 1 by AGG and 1 by CGT. Hence, the absence of CGN triplets in the Neurospora CO III gene (Browning and RajBhandary 1982) does not reflect the absence of a CGN isoacceptor from mitochondria. Finally, a strong bias against codons ending in C or G is evident. This resembles the situation in yeast and in the Neurospora mitochondrial gene for CO III. The ATPase proteolipid-like gene described by Van den Boogaart et al. (1982b) also shows a similar codon usage. In Fig. 4 we compare the Neurospora CO I amino acid sequence with those of Saccharomyces cerevisiae D273-10B (Bonitz et al. 1980) and of human (Anderson et al. 1981). In some respects the Neurospora gene is strikingly different from both the yeast and the human (and other mammalian) CO I genes.

Table 2. Codon usage in the N. crassa CO I gene

Fig. 4A and B. Comparison of the amino acid sequences of CO I from N. crassa (upper lines), S. cerevisiae D273-10B (middle lines) and human (lower lines), as deduced from their DNA sequences. The Met of yeast at position 90 was inferred in view of the recent finding that AUA codes for Met in yeast (Hudspeth et al. 1982). The exon-exon junctions in yeast are designated by vertical lines. At the A1-A2 transition we have added a Gly, and at the A2-A3 transition we have left out a Cys (*). The amino acids of the possible yeast intron 7 are inserted in lower case letters

Table 3. Number of identical amino acids between the subunits I of cytochrome oxidase of different species

                 N. crassa   S. cerevisiae   H. sapiens
N. crassa           555           341            320
S. cerevisiae                     512            298
H. sapiens                                       513

Invariant: 262 residues

From the sequence of the yeast CO I gene (Bonitz et al. 1980), it appears most probable that the mature yeast CO I protein has no N-terminal extension. Furthermore, the yeast gene has a very complex organization. It contains 5 to 7 introns in D273-10B and three more in KL14-4A (Hensgens et al. 1982). All but one of these introns contain open reading frames. A structural and functional relationship between CO I intron i4 and cytochrome b intron i4 has also been shown (Bonitz et al. 1980; Netter et al. 1982; Grivell et al. 1982). The mammalian CO I genes, on the other hand, are uninterrupted. As in yeast, the mammalian protein is not synthesized as an N-terminally extended precursor (Anderson et al. 1981).

The Neurospora gene has the following features:

1) At the 5' end of the gene, an ATG codon is present at position 469 (amino acid position -2), upstream of the N-terminal serine codon. No other in-frame ATG codon lies between this ATG and the in-frame TAA stop codon at 327. The precursor of CO I has an apparent molecular weight of 45 kDal, 4 kDal more than that of the mature subunit. It may seem surprising that a difference of only 2 amino acid residues could cause such a large difference in mobility on SDS gels. However, mobilities on gels often do not reflect true molecular weights, notably for mitochondrially synthesized proteins. The 4 kDal difference may therefore well be caused by a conformational change induced by removal of the N-terminal dipeptide formyl-methionyl-serine. If the ATG at -2 were not the initiation codon, we would face a problem, because no other in-frame ATG codons are present in the 5'-terminal part. In that case, one would have to invoke splicing of a prepiece sequence or the use of codons other than ATG for initiation.
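Point 1 relies on the reading frame to tie residue numbers to nucleotide positions. The ATG at 469 lies two codons (6 nt) before the TCA at 475. The numbering skips 0: Met is -2, Ser is -1 and the mature N-terminal Ser is +1, as in Fig. 4. The Python sketch below encodes that convention. `MATURE_START`, `residue_number` and `codon_start` are names chosen for this illustration. Positions are assumed to refer to the first base of each codon.

```python
# Convention taken from the text and Fig. 4 (positions = first base of a codon):
# the TCA at nucleotide 475 encodes the mature N-terminal Ser (+1), the ATG at
# 469 is residue -2, and there is no residue 0.
MATURE_START = 475


def residue_number(nt_pos: int) -> int:
    """Residue number of the codon whose first base is at nt_pos."""
    offset = nt_pos - MATURE_START
    if offset % 3 != 0:
        raise ValueError(f"{nt_pos} is not in frame with {MATURE_START}")
    index = offset // 3  # 0 for the mature N-terminal Ser
    return index + 1 if index >= 0 else index


def codon_start(residue: int) -> int:
    """First nucleotide of the codon for a given residue number."""
    if residue == 0:
        raise ValueError("there is no residue 0 in this numbering")
    index = residue - 1 if residue > 0 else residue
    return MATURE_START + 3 * index


if __name__ == "__main__":
    print(residue_number(469))             # -2: the initiator ATG
    print(residue_number(475))             # 1: the mature N-terminal Ser
    print(MATURE_START - codon_start(-2))  # 6 nt, i.e. a 2-residue prepiece
```

Skipping 0 in both directions keeps the mature protein numbered from 1, so the prepiece residues can be added without renumbering the rest of the alignment.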
The ATA codon, for example, is present several times upstream of the ATG codon. ATA is used for methionine in animal mitochondria and for internal methionine in yeast mitochondria (Hudspeth et al. 1982). In this respect, the N-terminal part of the Aspergillus nidulans CO I gene does not appear to contain any ATG codon at all, whereas several ATA codons were found (R. B. Waring, pers. commun.). However, Browning and RajBhandary (1982) have shown unambiguously that in Neurospora AUA codes for isoleucine, not methionine. Nevertheless, we cannot completely exclude the possibility that the initiator tRNAMet recognizes both AUG and AUA. This tRNA has an unmodified CAU anticodon (Heckman et al. 1978).

2) The 5' part of the sequence contains two palindromic regions. One is a long hairpin of T's and A's starting at position 150: 24 T's followed by 17 A's, 1 G and 3 A's. The other is an 18-bp perfect palindrome, ATCACCTACGTAGGTGAT. The latter closely resembles the 18-bp palindromes with double PstI sites found by RajBhandary's group. These occur at the 5' and 3' ends of all genes (or groups of genes) in the rRNA-tRNA region (Yin et al. 1981) and around the CO III gene (Browning and RajBhandary 1982). These authors suggested a role in RNA processing for these palindromic sequences. The 18-bp palindrome reported here may have a similar role. The function of the long poly(T)-poly(A) tract is not clear. Note, however, that this tract and the 18-bp palindrome both lie in a long open reading frame of rather unusual composition. This reading frame starts at position 25 with an ATG and ends at the TAA at 325. It could code for a protein of 100 amino acid residues and can therefore be considered an unassigned reading frame (URF). The overall polarity of the hypothetical product is relatively high (42%). Its codon usage is also quite abnormal for mitochondria: it contains CCG, AAG, AGG, ACG, TGG and TCC, none of which occurs more than once in the CO I gene. We therefore doubt that this URF, with its poly(T)-poly(A) tract, really codes for a protein. Besides these palindromic sequences, there is a long stretch of very A+T-rich DNA between nucleotides 315 and 420 (over 85% A+T).
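Point 2 rests on three simple sequence calculations. The first checks that the 18-bp sequence equals its own reverse complement. The second checks that an ATG at 25 and a TAA at 325 bound a 100-codon frame. The third computes the A+T fraction of a stretch. A Python sketch of each follows. It assumes that quoted positions refer to the first base of a codon. The function names are illustrative and do not come from any particular package.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_perfect_palindrome(seq: str) -> bool:
    """True if the duplex reads identically 5'->3' on both strands."""
    return seq.upper() == reverse_complement(seq)


def in_frame(pos_a: int, pos_b: int) -> bool:
    """True if codons starting at pos_a and pos_b lie in the same reading frame."""
    return (pos_b - pos_a) % 3 == 0


def sense_codons(atg_start: int, stop_start: int) -> int:
    """Number of codons from the ATG up to, but not including, the stop codon."""
    if not in_frame(atg_start, stop_start):
        raise ValueError("start and stop codons are in different frames")
    return (stop_start - atg_start) // 3


def at_content(seq: str) -> float:
    """Fraction of A and T in seq."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


if __name__ == "__main__":
    print(is_perfect_palindrome("ATCACCTACGTAGGTGAT"))  # True
    print(sense_codons(25, 325))  # 100, the URF length given in the text
    print(in_frame(325, 469))     # True
```

Under the first-base assumption, the last check shows that the URF stop codon at 325 lies in the same frame as the CO I ATG at 469. The two are 144 nt, or 48 codons, apart.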
This A+T-rich region may contain a promoter or a ribosome binding site, as Browning and RajBhandary (1982) suggested for the CO III gene.

3) The gene has a continuous reading frame of 1,665 bp. No introns are present, at least in the first 1,449 bp. Hence, none of the large introns i1, i2, i3, i4 and i5 of yeast D273-10B has a counterpart in Neurospora. In yeast, it is still uncertain whether the part of the gene downstream of intron i5 is interrupted by small introns or forms a single exon. Bonitz et al. (1980) made a preliminary proposal, based entirely on comparison of the human and yeast sequences. In this proposal, the C-terminal part consists of 3 exons and 2 introns: exon A6 (76 triplets) - intron i6 (7 triplets) - A7 (10 triplets) - i7 (15 triplets) - A8 (25 triplets). There are, however, no compelling reasons for this gene arrangement. The amino acid homology between yeast and human drops sharply below 50% halfway through exon A6 and stays low through the remainder of the gene. Yeast may therefore equally well have a single continuous exon of 133 triplets. The Neurospora gene sequence cannot settle this point, because its C-terminal part differs from both the yeast and the human sequences. Strong amino acid homology to yeast is found up to the A6-i6 transition (position 481). Next comes a sequence coding for 25 amino acids that shows no DNA or amino acid homology to yeast i6 + A7 + i7 (7 + 10 + 15 triplets). It also shows none to the human sequence at the comparable position. After this nonhomologous region, moderate homology is again found among the three sequences. It spans amino acids 507 through 533 of Neurospora on the one hand and the yeast A8 sequence and the C-terminal part of the human gene on the other. Thus, in both yeast and Neurospora it is unclear whether this part of the gene is coding or non-coding. However, because the entire Neurospora sequence is continuous, we consider it most likely that this region is coding. A similar situation near the C-terminal end of the CO I gene has been found in Aspergillus nidulans (R. B. Waring, pers. comm.). Finally, where the yeast, Aspergillus and mammalian genes terminate, the Neurospora gene continues. An additional stretch coding for some 24 amino acids is present before a TAA stop codon is reached. It is not yet established whether this sequence codes for the C-terminal part of the mature protein. Alternatively, the 3'-terminal sequence may be removed from the transcript by RNA processing, or a C-terminal peptide may be proteolytically removed from a precursor protein. Regarding the first alternative, preliminary hybridization results do not suggest the presence of low-molecular-weight processing products.
The second possibility may seem rather unusual, but it needs to be tested experimentally at the protein level. In summary, two features set the Neurospora CO I gene apart from the other two species. The first is an N-terminal prepiece of probably 2 amino acids. The second is a C-terminal extension of 23 amino acids relative to the yeast protein. It is difficult to imagine a function for this "extra" piece at the C-terminus. Unlike the rest of the protein, it does not have a very hydrophobic composition, which argues against a role in anchoring the protein in the membrane.

The Neurospora sequence allows one conclusion about the intron-exon boundaries of the yeast CO I gene. For the A1-A2 transition, Bonitz et al. (1980) offer as the most probable choice: Asn at 59 - Leu at 61. For the A2-A3 transition, they proposed: Cys (marked * in Fig. 4) between 71 and 72 - Thr at 72. The Neurospora and yeast sequences are strongly homologous in that region. We therefore expect one more amino acid, Gly at 60, to be present at the A1-A2 transition. Furthermore, we regard the Cys at the end of A2 as highly unlikely, since neither the Neurospora nor the human gene has an amino acid at this position.

4) Several regions of CO I are highly conserved in amino acid sequence among the three organisms. The long stretches of hydrophobic amino acids in the middle part of the protein are especially homologous. The charged amino acids also usually occupy the same positions in N. crassa, yeast and human. Counting Arg-Lys and Glu-Asp exchanges as identical, 39 positions are identical, 9 are shared with only one of the other species and 11 are unique to N. crassa. The positions of aromatic amino acids are strongly conserved: of the 87 aromatic residues, 46 are invariant and 12 are Trp/Tyr/Phe substitutions. Table 3 shows the overall amino acid homology among the CO I proteins of the three organisms. Neurospora CO I is only slightly less homologous to the human protein than to the yeast protein. This indicates that the evolutionary distance between the two Ascomycetes is considerable indeed (cf. Küntzel et al. 1981).

The N. crassa mitochondrial CO I gene codes for a protein of at most 555 amino acid residues, if no splicing, trimming or other processing of RNA or protein occurs. The predicted molecular weight of this protein is about 60,000. Its overall polarity, calculated according to Capaldi and Vanderkooi (1972), is only 31.3%. This low value is consistent with the location of CO I within the mitochondrial inner membrane.

In conclusion, the DNA sequence of the N. crassa CO I gene alone does not answer all questions about the organization of the gene and the processing of its products. Preliminary Northern blot hybridization shows that the major transcript of the CO I gene is surprisingly long, about 6,000 nucleotides (E. Agsteribbe and H. de Vries, unpublished).
Further characterization of the transcript(s), of the boundaries of the long transcript on the mitochondrial DNA and of its possible processing is in progress in our laboratory.
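Several figures quoted in points 3 and 4 come from simple arithmetic on the sequence and on Table 3. The Python sketch below reproduces them as a cross-check. Some choices here are ours, not the paper's. Percent identity is expressed relative to the shorter sequence of each pair. The molecular weight estimate uses a rule-of-thumb average residue mass of 110 Da. The polarity function encodes our reading of the Capaldi and Vanderkooi (1972) index, defined as the percentage of Asp, Asn, Glu, Gln, Lys, Ser, Arg, Thr and His. That residue set should be treated as an assumption. The function is shown only on a made-up peptide.

```python
# Table 3: sequence lengths (diagonal) and pairwise identical residues.
LENGTH = {"Nc": 555, "Sc": 512, "Hs": 513}
IDENTICAL = {
    frozenset({"Nc", "Sc"}): 341,
    frozenset({"Nc", "Hs"}): 320,
    frozenset({"Sc", "Hs"}): 298,
}
# Assumed Capaldi-Vanderkooi polar set (one-letter codes).
POLAR = set("DNEQKSRTH")


def percent_identity(a: str, b: str) -> float:
    """Identical residues as a percentage of the shorter sequence of the pair."""
    return 100 * IDENTICAL[frozenset({a, b})] / min(LENGTH[a], LENGTH[b])


def polarity_index(protein: str) -> float:
    """Percentage of residues (one-letter code) that fall in the polar set."""
    return 100 * sum(aa in POLAR for aa in protein) / len(protein)


if __name__ == "__main__":
    for a, b in [("Nc", "Sc"), ("Nc", "Hs"), ("Sc", "Hs")]:
        print(a, b, round(percent_identity(a, b), 1))  # 66.6, 62.4, 58.2
    print(1665 // 3)              # codons in the 1,665 bp reading frame: 555
    print(76 + 7 + 10 + 15 + 25)  # A6 + i6 + A7 + i7 + A8 triplets: 133
    print(555 * 110)              # rough molecular weight, 110 Da per residue
    print(polarity_index("DNEQKSRTHA"))  # made-up peptide: 90.0
```

With these conventions, the N. crassa-human identity (about 62%) is only a few points below the N. crassa-yeast value (about 67%), the pattern described in point 4. The 555 × 110 estimate of 61,050 agrees with the quoted value of about 60,000.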

Acknowledgements. This research was supported in part by a grant to A. M. Kroon from the Netherlands Foundation for Chemical Research (S.O.N.), with financial aid from the Netherlands Organization for the Advancement of Pure Research (Z.W.O.). We thank Drs. Etienne Agsteribbe and Ab Kroon for their continuing interest and criticism, Marijke Holtrop for advice and help, John Samallo for his help with some critical experiments, and Karin van Wijk and Rinske Kuperus for typing the manuscript. We are grateful to Dr. R. B. Waring for sending sequence data of the Aspergillus CO I gene.


Note. In the final stage of this project, we learned that Burger, Scriven, Machleidt and Werner have also sequenced the Neurospora CO I gene. Their data, which have been submitted for publication elsewhere, agree completely with ours except at two positions of minor importance.

References

Agsteribbe E, Samallo J, Vries H de, Hensgens LAM, Grivell LA (1981) In: Kroon AM, Saccone C (eds) The organization and expression of the mitochondrial genome, pp 51-59
Anderson SA, Bankier AT, Barrell BG, Bruyn MHL de, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R, Young IG (1981) Nature 290:457-465
Barrell BG, Bankier AT, Drouin J (1979) Nature 282:189-194
Bertrand H, Werner S (1979) Eur J Biochem 98:9-18
Birnboim HC, Doly J (1979) Nucl Acids Res 7:1513-1523
Bonitz SG, Coruzzi G, Thalenfeld BE, Tzagoloff A, Macino G (1980) J Biol Chem 255:11927-11941
Boogaart P van den, Dijk S van, Agsteribbe E (1982a) FEBS Lett 147:97-100
Boogaart P van den, Samallo J, Agsteribbe E (1982b) Nature 298:187-189
Browning KS, RajBhandary UL (1982) J Biol Chem 257:5253-5256
Capaldi RA, Vanderkooi G (1972) Proc Natl Acad Sci USA 69:930-932
Coruzzi G, Tzagoloff A (1979) J Biol Chem 254:9324-9330
Goodman HM, MacDonald RJ (1980) In: Grossman L, Moldave K (eds) Methods in enzymology, vol 65. Academic Press, pp 75-90
Grivell LA, Hensgens LAM, Osinga KA, Tabak HF, Boer PH, Crusius JBA, Laan JC van der, Haan M de, Horst G van der, Evers RF, Arnberg A (1982) In: Slonimski PP, Borst P, Attardi G (eds) Mitochondrial genes. Cold Spring Harbor Press, pp 225-240
Heckman JE, Hecker LI, Schwartzbach SD, Barnett WE, Baumstark B, RajBhandary UL (1978) Cell 13:83-95
Heckman JE, Sarnoff J, Alzner-de Weerd B, Yin S, RajBhandary UL (1980) Proc Natl Acad Sci USA 77:3159-3163
Hensgens LAM, Bonen L, Haan M de, Horst G van der, Grivell LA (1983) Cell (in press)
Hudspeth MES, Ainley WM, Shumard DS, Butow RA, Grossman LI (1982) Cell 30:617-626
Küntzel H, Heidrich M, Piechulla B (1981) Nucl Acids Res 9:1451-1461
Machleidt W, Werner S (1979) FEBS Lett 107:327-330
Macino G (1980) J Biol Chem 255:10563-10565
Maxam AM, Gilbert W (1980) In: Grossman L, Moldave K (eds) Methods in enzymology, vol 65. Academic Press, pp 499-560
Netter P, Jacq C, Carignani G, Slonimski PP (1982) Cell 28:733-738
Sebald W, Machleidt W, Otto J (1973) Eur J Biochem 38:311-324
Southern EM (1975) J Mol Biol 98:503-517
Terpstra P, Vries H de, Kroon AM (1977) In: Bandlow W, Schweyen RJ, Wolf K, Kaudewitz F (eds) Mitochondria 1977. De Gruyter, pp 291-302
Van 't Sant P, Mak JFC, Kroon AM (1981) Eur J Biochem 121:21-26
Werner S, Machleidt W, Bertrand H, Wild G (1980) In: Kroon AM, Saccone C (eds) The organization and expression of the mitochondrial genome, pp 399-411
Yin S, Heckman JE, RajBhandary UL (1981) Cell 26:325-332

Communicated by R. J. Schweyen

Received October 4, 1982 / November 19, 1982
