Biochimica et Biophysica Acta, 1140 (1992) 105-134


© 1992 Elsevier Science Publishers B.V. All rights reserved 0005-2728/92/$05.00

Review

BBABIO 43745

Conservation of sequences of subunits of mitochondrial complex I and their relationships with other proteins

Ian M. Fearnley and John E. Walker

M.R.C. Laboratory of Molecular Biology, Hills Road, Cambridge (UK)

(Received 18 June 1992)

Key words: Complex I; Mitochondrion; Sequence conservation; Sequence relationship

Contents

I. General introduction
II. Sequence comparison procedures
III. Subunits of complex I encoded in mitochondrial DNA
    A. Scope of the comparisons
    B. Hydrophobicity of ND1-ND6 and ND4L
    C. The ND1 subunits
    D. Comparison of ND1 with a bacterial glucose dehydrogenase
    E. The ND2 subunits
    F. The ND3 subunits
    G. The ND4L subunits
    H. The ND4 subunits
    I. The ND5 subunits
    J. Relationship of ND5 to a product of the Escherichia coli hyc operon
    K. The ND6 subunits
    L. Relationships between subunits ND2, ND4 and ND5
IV. Subunits of complex I encoded in nuclear genes
    A. Subunit functions and complexity
    B. Nucleotide binding proteins in complex I
        1. The 51-kDa subunit
        2. The 39-kDa subunit
    C. The iron-sulphur proteins of complex I
    D. Other conserved subunits of complex I
V. Conclusions
Acknowledgement
References

Correspondence to: J.E. Walker, M.R.C. Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, UK.

I. Introduction

Complex I, or NADH:ubiquinone oxidoreductase, provides the entry point for electrons from NADH into the electron transport chain of mitochondria. The electrons pass via a series of enzyme-bound redox centres to the mobile electron carrier ubiquinone, which is reduced to ubiquinol. During this process, redox energy is converted into an electrochemical proton potential gradient across the inner membrane of the mitochondrion, and for each electron pair that passes through the enzyme it is generally considered that four protons are translocated out of the mitochondrial matrix (for reviews see Refs. 1-3).

TABLE I

Homologues of nuclear encoded subunits of bovine mitochondrial complex I in Neurospora crassa

No equivalents of the N. crassa 21.3-kDa and the 29/21-kDa subunits [107,108] have been detected in bovine complex I.

Cow            N. crassa    Reference
75 kDa (IP)    78 kDa       [14,109]
51 kDa (FP)    50 kDa       [17,109]
49 kDa (IP)    49 kDa       [11,110]
39 kDa         40 kDa       [12,85]
30 kDa (IP)    31 kDa       [16,100]
PGIV           22 kDa       [18,99]
18 kDa (IP)    18.3 kDa     [4,102]
B13            29.9 kDa     [4,103]
SDAP           ACP          [15,101]

Bovine complex I is now known to be an assembly of at least 41 different polypeptides [3,4]. It is the most complex enzyme that has been characterised, the total protein sequence in its subunits exceeding the combined sequences of all of the constituent polypeptides of the Escherichia coli ribosome. Seven hydrophobic intrinsic membrane subunits, known as ND1-ND6 and ND4L, encoded in the mitochondrial genome [5-7], have been sequenced in a wide range of species. Homologues of these subunits are also encoded in the chloroplast genomes of higher plants [8-10]. The remainder of the constituent polypeptides of mitochondrial complex I are nuclear gene products that are imported into the organelle from the cytoplasm of the cell, and the sequences of 32 of the 34 or more such subunits are now known in the bovine enzyme [3,11-21]. Homologues of some subunits have also been sequenced in other species, notably in Neurospora crassa (see Table I). In addition to the homologues of ND1-6 and ND4L, homologues of at least four of the nuclear encoded subunits of complex I are encoded in chloroplast genomes [11,16,18,21]. Genes for the 49- and

[Figure 1: gene-cluster diagram (scale 0-8 kb) for Rb. capsulatus, P. denitrificans, P. tetraurelia (777-2142 and 12,207-14,917), E. coli fhl, Synechocystis sp. PCC 6803, M. polymorpha (51,233-52,877), N. tabacum (50,981-82,659) and O. sativa (47,988-49,662); labelled genes include ndhE, ndhA, ndhH (49 kDa), FeS and ndhK.]

Fig. 1. Clusters of genes encoding subunits of complex I in various species. Genes coding for proteins related to complex I subunits are shaded. The arrows at the end of each gene indicate the direction of transcription. In the upper part, regions are compared from the chloroplast genomes of three different plants. The genes designated ndhH (formerly known as ORF392 or ndh392, and ORF393 or ndh393) encode proteins that are homologous to the nuclear encoded 49-kDa subunit of bovine mitochondrial complex I [11], and ndhA (ndh1), ndh6 and ndhE (ndh4L) encode homologues of subunits of NADH dehydrogenase that are encoded in mitochondrial DNA. The ndhI genes were formerly called frxB, and are the chloroplast equivalents of the TYKY subunit of mitochondrial complex I [18]. The psaC gene (formerly frxA) codes for a part of photosystem I [118,119]. In the middle part of the diagram are shown the arrangements of genes encoding homologues of complex I subunits in Rb. capsulatus [29] and P. denitrificans [26]. NQO3, the gene for the 66-kDa subunit from Paracoccus denitrificans (the homologue of the bovine 75-kDa subunit), has been sequenced. It is found beyond the 3' end of NQO1 in an unspecified location [26]. 'Urfs' are unidentified reading frames. In the lower part of the diagram are shown regions from the mitochondrial genome of P. tetraurelia, an operon (hyc) encoding subunits of formate hydrogen lyase (fhl) in E. coli, and other loci in a cyanobacterium and plant chloroplast genomes. In P. tetraurelia, NdhC, NdhH, NdhJ and NdhK are homologues of the bovine ND3, 49-kDa, 30-kDa and PSST subunits, respectively; homologues of ND1, ND2, ND4 and ND5 have also been identified in its mitochondrial DNA. Rps12 is a mitochondrial ribosomal component in P. tetraurelia. The relationships between the E. coli Hyc genes encoding components of formate hydrogen lyase and complex I subunits are indicated below the genes. HycF is a potential iron-sulphur protein of unknown function, ndhC is the homologue of the gene for ND3, and ndhJ and ndhK encode homologues of the nuclear encoded 30-kDa and PSST subunits of complex I [16,62]. ndhK was formerly known as psbG. ndhA and ndhH have the same meaning as above. In M. polymorpha, ndh3 is the equivalent of ndhC in other species. The diagram is based upon data published in Refs. 9, 50, 10, 60 and 32. The scale is in kilobases.

30-kDa subunits are in nuclear DNA in many species, but in Paramecium tetraurelia their homologues are encoded in mt-DNA [22] (see Fig. 1), and a homologue of the 30-kDa subunit has also been found to be encoded in mitochondrial DNA in the slime mould, Dictyostelium discoideum [23].

An NADH:ubiquinone oxidoreductase complex related to mitochondrial complex I has been identified in bacteria [24]. No intact bacterial complex has yet been purified, but the bacterial enzymes may have fewer subunits than the mitochondrial enzymes, and the sequences of some subunits have been determined in Paracoccus denitrificans [25-28], Rhodobacter capsulatus [29] and Synechocystis [30]. The hyc operon in Escherichia coli encodes subunits of formate hydrogen lyase [31]. Four of the Hyc gene products are related to complex I subunits [32] (see Fig. 1). The 51- and 24-kDa subunits of complex I and residues 1-200 of the 75-kDa subunit are closely related to the α and γ subunits of an NAD+-reducing hydrogenase from Alcaligenes eutrophus [17]. The hydrogenase has four non-identical subunits and its NAD+ reductase activity is associated with the αγ dimer.

This review has two main purposes. The first is to examine the conservation of the individual subunits of complex I by comparison of homologous sequences from different species. In the case of the intrinsic membrane subunits in particular, little is known about their functions, and by such comparisons conserved amino acids that are likely to be important in the structure and function of these subunits can be identified. A second purpose of the review, also aimed at obtaining clues about the functions of complex I constituents, is to assess sequence relationships between subunits of complex I and other proteins of known function.

The review is organized in two main sections. In the first section, the conservation of the seven subunits of complex I that are encoded in mitochondrial DNA is examined in a wide range of species, from mammals to fungi. These sequences are also compared with related proteins encoded in chloroplast and bacterial genomes. In the second section, relationships between nuclear encoded subunits of bovine complex I and homologues in other species are presented, and weak relationships between the complex I subunits and diverse other proteins are examined.

II. Sequence comparison procedures

Homologous protein sequences were aligned with AMPS (alignment of multiple protein sequences; Ref. 33). This program first uses the Needleman and Wunsch algorithm [34] to determine an order of similarity headed by the most similar pair of proteins. The sequences are then aligned with MULTALIGN using a length-independent gap penalty of 8.0 and the scoring matrix MDM78 [35]. Each sequence was compared with the overall alignment at least three times. Pairwise comparisons of sequences were made with DIAGON [36] using MDM78 and a window length of 25 amino acids. Unless indicated otherwise, values exceeding a threshold score of 280 were recorded. Hydrophobicity profiles of subunits were calculated with a sequence window of 11 amino acids using HYDROPLOT, a version of SOAP [37].

When a sequence, such as P. tetraurelia ND2, was too distant from its relatives to be aligned correctly by MULTALIGN, regions of homology detected by DIAGON were included in the alignments (see Fig. 5). In regions of ND5 and ND6 it was found to be necessary to make further adjustments to sequence alignments based upon their hydrophobicity profiles (see below). The proposed positions of transmembrane helices in ND1-6 and ND4L (see Figs. 3, 5-9) refer only to the bovine proteins. In aligning other weakly homologous sequences it was sometimes found necessary to introduce gaps in these membrane spanning regions. It may be that these subunits have somewhat different secondary structures from those of the bovine enzyme in these regions. However, it is difficult to arrive at an overall alignment that satisfies both the sequence data and the hydrophobicity profiles.
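The pairwise step of this procedure can be illustrated with a minimal Python sketch of global alignment scoring in the manner of Needleman and Wunsch, in which each gap costs a fixed 8.0 whatever its length. This is not the AMPS or MULTALIGN code. The names `global_alignment_score` and `identity_score` are hypothetical, the toy scoring function stands in for MDM78 (which is not reproduced here), and the decision to penalise gaps at the ends of the sequences is an assumption made for illustration.

```python
from typing import Callable

NEG_INF = float("-inf")


def identity_score(x: str, y: str) -> float:
    # Toy stand-in for MDM78 (assumption): reward identities, penalise mismatches.
    return 10.0 if x == y else -5.0


def global_alignment_score(
    a: str,
    b: str,
    score: Callable[[str, str], float],
    gap_penalty: float = 8.0,
) -> float:
    """Best global alignment score when each gap costs gap_penalty once,
    however many residues it spans."""
    n, m = len(a), len(b)
    # match[i][j]: best score with a[i-1] aligned to b[j-1]
    # gap_b[i][j]: best score with a[i-1] placed opposite a gap in b
    # gap_a[i][j]: best score with b[j-1] placed opposite a gap in a
    match = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    gap_b = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    gap_a = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    match[0][0] = 0.0
    for i in range(1, n + 1):
        gap_b[i][0] = -gap_penalty  # a single leading gap (end gaps penalised: assumption)
    for j in range(1, m + 1):
        gap_a[0][j] = -gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(a[i - 1], b[j - 1])
            match[i][j] = s + max(
                match[i - 1][j - 1], gap_b[i - 1][j - 1], gap_a[i - 1][j - 1]
            )
            # Extending an open gap is free; opening a new gap costs gap_penalty.
            gap_b[i][j] = max(
                match[i - 1][j] - gap_penalty,
                gap_b[i - 1][j],
                gap_a[i - 1][j] - gap_penalty,
            )
            gap_a[i][j] = max(
                match[i][j - 1] - gap_penalty,
                gap_a[i][j - 1],
                gap_b[i][j - 1] - gap_penalty,
            )
    return max(match[n][m], gap_b[n][m], gap_a[n][m])
```

With the toy scoring, aligning "ACDEFG" or "ACDEEFG" against "ACFG" gives the same score, because the two-residue and three-residue deletions each incur the penalty once. That is the property that distinguishes a length-independent penalty from a per-residue one.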

III. Subunits of complex I encoded in mitochondrial DNA

III-A. Scope of the comparisons

Comparisons have been made of the sequences of complex I subunits encoded in the mitochondrial DNA of the following representative species: cattle, Bos taurus [5], the fin whale, Balaenoptera physalus [38], chicken, Gallus gallus [39], African clawed toad, Xenopus laevis [40], Atlantic cod, Gadus morhua [41], sea urchin, Strongylocentrotus purpuratus [42], fruit fly, Drosophila yakuba or D. melanogaster [43,44], nematode worm, Caenorhabditis elegans [45], the protozoan ciliate, P. tetraurelia [22], and the ascomycete, N. crassa [46-49; C. Breitenberger and K. Browning, personal communication]. The comparisons include NdhA-NdhF, homologues of ND1-5 and ND4L encoded in the chloroplast genome of the tobacco plant Nicotiana tabacum [9], NdhG, an ND6 homologue encoded in the chloroplast genome of the liverwort, Marchantia polymorpha [8,50], the ND3 protein from wheat (Triticum aestivum) mitochondria [51] and NdhA (ND1) from Rhodobacter capsulatus [29]. In discussion of the sequences in the text, amino acid residue numbers refer to the bovine sequence, unless stated otherwise.

III-B. Hydrophobicity of ND1-ND6 and ND4L

In addition to the seven components of complex I, many mitochondrial DNAs also encode the CO I, CO II and CO III subunits of cytochrome c oxidase, the cytochrome b component of the cytochrome bc1 complex, and ATPase-6 and A6L, two intrinsic membrane components of the ATP synthase complex. Amongst these 13 hydrophobic proteins, CO I, CO III and cytochrome b are better conserved than ND1, the least

[Figure 2: hydrophobicity profiles of bovine ND1, ND2, ND3, ND4L, ND4, ND5 and ND6, each plotted against residue number.]

Fig. 2. Hydrophobicity profiles of the ND subunits of the bovine complex I. The positions of proposed membrane spanning hydrophobic segments are indicated by horizontal bars. The positions of the various segments are also indicated on the sequence alignments (Figs. 3, 5-9 and 11).
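As an illustration of how a profile of this kind is obtained, the following Python sketch averages a per-residue hydrophobicity value over a sliding window of 11 amino acids, the window length given in section II. It is not the HYDROPLOT or SOAP program. The Kyte-Doolittle hydropathy scale is used here as an assumption, since the scale behind the published profiles is not stated in this text, and `hydrophobicity_profile` is a hypothetical name.

```python
from typing import List

# Kyte-Doolittle hydropathy values (assumed scale, see the note above).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(sequence: str, window: int = 11) -> List[float]:
    """Mean hydropathy of each full window along the sequence.

    Element k is the mean over residues k .. k + window - 1 (0-based),
    so it is naturally plotted at the window centre, k + window // 2.
    Sequences shorter than the window give an empty profile.
    """
    if window < 1 or window > len(sequence):
        return []
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    running = sum(values[:window])
    profile = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one residue: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        profile.append(running / window)
    return profile
```

A run of windows with high mean values, extending over roughly 20 residues or more, is the sort of feature that would be marked as a candidate membrane span. Any numerical cut-off used for that decision would be a further assumption.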

divergent of the seven complex I subunits. A6L is the most divergent of all (see Table II). The conservation of the cytochrome oxidase components may well reflect their central role in the catalytic mechanism of the enzyme [52], whereas the divergence of some of the sequences of the complex I subunits could be taken to suggest that only restricted regions of their polypeptide chains are intimately concerned with catalysis, and are thereby constrained from evolving. These conserved regions are revealed below by comparison of their sequences. The sequences of ND1-ND6 and ND4L are considerably more hydrophobic than any of the subunits of complex I that are encoded by nuclear genes [3], and their hydrophobicities are reflected in their physicochemical properties. For example, ND1, ND2, ND3,

[Figure 3: alignment of ND1 and related sequences from B. taurus, B. physalus, G. gallus, X. laevis, S. purpuratus, D. melanogaster, C. elegans, N. crassa, P. tetraurelia, N. tabacum and Rb. capsulatus, numbered from 10 to 310 according to the bovine sequence, with hydrophobic segments A-H marked above the alignment.]

Fig. 3. Sequences of ND1 subunits of mitochondrial complex I and related proteins. The numbers above the alignment refer to the bovine sequence. Dashes represent gaps introduced by the program to optimize the alignment, the symbol t denotes identical residues in the sequences that are presented, and asterisks denote residues identical in all known sequences of this subunit. The black bars above these sequences, and those in Figs. 5-9 and Fig. 11, mark the positions of hydrophobic segments that are proposed to span the membrane. There are 8 such segments, A-H, in bovine ND1. The symbol § denotes the positions of amino acid substitutions in the human sequence that are associated with Leber's hereditary optic neuropathy (LHON).
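The residues marked as identical in an alignment of this kind can be counted mechanically. The Python sketch below finds the columns in which every aligned sequence carries the same residue and expresses their number as a percentage of the ungapped length of a reference sequence, which is how a figure such as "19 amino acids (6% of the bovine sequence)" is reached. The function names are hypothetical, the three short sequences in the usage note are invented for illustration and are not taken from Fig. 3, and treating a column that contains a gap as not conserved is an assumption.

```python
from typing import List


def invariant_columns(aligned: List[str], gap: str = "-") -> List[int]:
    """0-based indices of columns where all sequences share one residue."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    columns = []
    for i in range(length):
        residues = {seq[i] for seq in aligned}
        # A column counts only if it holds a single residue type and no gap.
        if len(residues) == 1 and gap not in residues:
            columns.append(i)
    return columns


def percent_of_reference(
    aligned: List[str], ref_index: int = 0, gap: str = "-"
) -> float:
    """Invariant columns as a percentage of the reference's ungapped length."""
    ref_length = sum(1 for residue in aligned[ref_index] if residue != gap)
    return 100.0 * len(invariant_columns(aligned, gap)) / ref_length
```

For the toy alignment `["GESELVSG", "GESELVAG", "AES-LVSG"]`, columns 1, 2, 4, 5 and 7 are invariant, which is 62.5% of the eight-residue reference.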

TABLE II

Conservation of protein sequences encoded in mitochondrial genomes

The proteins are arranged in descending order of conservation. The values were taken from ¹ Ref. 5, ² Ref. 42 and ³ Ref. 111.

            Amino acid conservation (%)
Protein     cow vs.    cow vs.      cow vs.         mouse vs.
            man ¹      Xenopus ²    sea urchin ²    D. yakuba ³
CO I        91         89           76              75
CO III      87         81           62              65
cyt b       79         72           62              67
ND1         78         72           59              46
CO II       73         76           63              57
ND3         74         61           50              42
ND4         74         59           45              42
ATPase 6    78         54           39              36
ND5         70         59           46              33
ND4L        73         46           34              40
ND2         63         54           39              35
ND6         63         34           31              17
A6L         51         34           25              26

ND4, ND4L and ND5 have been detected in chloroform:methanol extracts of mitochondria and complex I [53,54; J.M. Skehel and J.E. Walker, unpublished work]. All seven of their sequences are typical of intrinsic membrane proteins, being composed predominantly of hydrophobic segments about 22-25 amino acids long that are likely to be folded into transmembrane α-helices (see Fig. 2). The hydrophobic segments are linked by short and more hydrophilic sequences that are presumed to lie outside the lipid bilayer. There appear to be no extensive extramembrane domains in any of these proteins. The numbers of α-helices estimated to be in the seven bovine proteins are summarised in Table III. In some cases, the presence of α-helices can be predicted with reasonable confidence, but in other regions of the proteins the predictions are less sure. Therefore, these estimates should be viewed as approximate.

TABLE III

Number of transmembrane α-helices estimated to be present in the sequences of subunits of complex I encoded in mitochondrial DNA

Subunit    No. of helices
ND1        8
ND2        9-10
ND3        3
ND4L       3
ND4        12
ND5        14-16
ND6        4-5

III-C. The ND1 subunits

The vertebrate, S. purpuratus and D. yakuba proteins each appear to contain eight clearly defined hydrophobic membrane spanning segments, and it is possible that a somewhat less hydrophobic region between amino acids 38 and 59 could form an additional membrane span. In the 11 aligned sequences, 19 amino acids (6% of the bovine sequence) are conserved, the two best conserved regions being in polar segments (amino acids 24-48 and 192-210) linking hydrophobic spans A and B, and E and F, respectively (see Fig. 3). The sequence Glu-Ser-Glu-Leu-Val (residues 204-208) is invariant in all but the tobacco chloroplast sequence, where it is deleted. According to a secondary structure model of ND1 based on its hydrophobicity profile, the most conserved hydrophilic stretches are on the same side of the mitochondrial membrane. The N. crassa sequence is longer than the other sequences in this group, about 40 extra residues being inserted in helix G after residue 266 [46]. P. tetraurelia ND1 is shorter than other ND1 sequences and is related to residues 1-240 (containing hydrophobic segments A-F) of the bovine sequence. NdhA from Rhodobacter capsulatus [29] is more similar to the mammalian ND1 proteins than are either the ND1 proteins from C. elegans and P. tetraurelia or NdhA in chloroplasts. E. coli HycD (not shown in Fig. 3) is the most distant sequence from bovine ND1, the best sequence homology being between amino acids 194-214 of the bovine sequence and 191-211 of HycD. HycD is part of formate hydrogen lyase, and chloroplast NdhA may be a subunit of a chloroplast enzyme of unknown function. Therefore, it is possible that they fulfil somewhat different functions from those carried out in NADH:ubiquinone oxidoreductase by mitochondrial ND1 and bacterial NdhA.

III-D. Comparison of ND1 with a bacterial glucose dehydrogenase

Because ubiquinone is lipophilic, the site at which it is reduced has been assumed to be in the hydrophobic membrane sector of complex I. A number of inhibitors of the enzyme, including rotenone, the classical inhibitor of complex I, bind close to ubiquinone [55].
A photoactivatable derivative of rotenone [56] and photoactivated dihydrorotenone [57] both react with ND1, which suggests, but does not prove, that the ubiquinone site may be associated with this subunit. In support of this suggestion, it has been proposed that ND1 is related to a bacterial glucose dehydrogenase from Acinetobacter calcoaceticus [58]. This is an attractive idea, since glucose dehydrogenase is a membrane spanning enzyme that oxidises glucose to gluconolactone in the periplasm and transfers electrons via pyrroloquinoline quinone to ubiquinone, and, like complex I, it is inhibited by piericidin. However, comparison of the sequences of human ND1 and the bacterial dehydrogenase shows that they are at best only weakly related over a short stretch of sequence (see Fig. 4). The most significant similarity is between residues 108 and 130 of glucose dehydrogenase and residues 66 and 88 of human ND1. The score for this alignment is enhanced by the presence of two tryptophan residues, which score highly in the Dayhoff comparison matrix. The region of similarity is not extended by lowering the alignment score, but the random background is increased. It might be anticipated that the ubiquinone binding site would be in the region of greatest sequence conservation amongst ND1 sequences (Fig. 3), but the glucose dehydrogenase sequence is related to a sequence in ND1 that is not highly conserved and is not in the region of greatest similarity. Therefore, it is doubtful that the short range and weak sequence relationship between ND1 and glucose dehydrogenase has any significance in the formation of a ubiquinone binding site, and it provides no convincing support for the view that ubiquinone binds to ND1.

III-E. The ND2 subunits

Bovine ND2 has 8 convincing candidates for membrane spanning α-helices (segments A-C, E and G-J), and two other less hydrophobic segments, D and F, are also possible candidates (Fig. 2). The argument in favour of segment D being membrane spanning stems from its sequence relationship with regions in the ND4 and ND5 subunits (see section III-L), which are more hydrophobic than segment D in bovine ND2 and are likely to be membrane spanning. Since homologous sequences have a common secondary structure, segment D would therefore be a membrane span also. Segment D is more hydrophobic in other mitochondrial ND2 proteins than in bovine ND2, but the equivalent region is more hydrophilic in the tobacco chloroplast protein, in which other potential membrane spans are rather clearly defined. Therefore, the alternative proposal that segment D is not a membrane span cannot be dismissed. Segment F is more hydrophobic in other species than in the bovine protein, and for this reason is included in Figs. 2 and 5 as a possible membrane span.

The ND2 sequences are amongst the most divergent of the seven ND subunits (see Table II and Refs. 5,43,59). Invariant amino acids are confined to a conserved region between residues 249 and 265, which encompasses part of hydrophobic segment H and the region linking hydrophobic segments H and I (Fig. 5). All of the known ND2 sequences are conserved in this region, and three of the four conserved residues in the nine aligned proteins are also invariant in all known ND2 sequences (see Fig. 5). P. tetraurelia ND2 is remarkable in that 48% of its sequence is composed of Phe and Leu residues. It is the most distant of the group from bovine ND2 [60], the relationship being confined to residues 71 to 141 of the protozoan protein. Therefore, only this segment was included in the alignment (Fig. 5). It is possible that RNA editing occurs in Paramecium as in other protozoans, although there is no evidence for such a process. The fungal ND2 sequences from N. crassa and P. anserina are considerably longer than the vertebrate ND2 subunits (556 and 582 amino acids in N. crassa and P. anserina, respectively). Six regions of their sequences are related to the vertebrate ND2 sequences, including the region that is generally conserved [47,61], but most of the rest of the fungal sequences are unrelated, and neither is included in the alignment. It is possible that they have 12 transmembrane segments.
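Local relationships of this kind, in which only a limited segment of one sequence is related to another, are what a windowed pairwise comparison is designed to reveal. The following Python sketch scores every pair of equal-length windows from two sequences and keeps those at or above a threshold, in the manner of the comparisons described in section II (a window of 25 residues and a threshold of 280 with MDM78). It is not the DIAGON program: `window_matches` and `identity_score` are hypothetical names, the toy scoring function stands in for MDM78, and the brute-force loop is chosen for clarity rather than speed.

```python
from typing import Callable, List, Tuple


def identity_score(x: str, y: str) -> float:
    # Toy stand-in for MDM78 (assumption): 10 for an identity, 0 otherwise.
    return 10.0 if x == y else 0.0


def window_matches(
    a: str,
    b: str,
    score: Callable[[str, str], float],
    window: int = 25,
    threshold: float = 280.0,
) -> List[Tuple[int, int, float]]:
    """Return (start_in_a, start_in_b, total) for every pair of windows
    whose summed residue-by-residue score reaches the threshold."""
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            # Sum the scores along the diagonal segment starting at (i, j).
            total = sum(score(a[i + k], b[j + k]) for k in range(window))
            if total >= threshold:
                hits.append((i, j, total))
    return hits
```

Plotting each hit at its pair of coordinates gives a diagonal line wherever two sequences are related over a stretch longer than the window. Lowering the threshold adds hits from unrelated windows as well, which corresponds to the increase in random background noted for the ND1 and glucose dehydrogenase comparison.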

[Figure 4: plot of the pairwise comparison; horizontal axis, A. calcoaceticus glucose dehydrogenase (ticks at 200, 400, 600 and 800); vertical axis ticks at 100, 200 and 300.]

Fig. 4. Sequence relationships between mitochondrial ND1 and a bacterial protein. Pairwise comparison of human ND1 with glucose dehydrogenase from Acinetobacter calcoaceticus. The region of most significant similarity is between residues 108-130 of glucose dehydrogenase [120] and residues 66-88 of ND1.

III-F. The ND3 subunits

Closely related pairs of ND3 sequences are conserved from residue 35 to the C-terminus, but in wheat and P. tetraurelia mitochondria (the least similar sequences in the alignment) the conserved region is confined to a segment between residues 38-80 (wheat numbering). In the 10 aligned sequences (Fig. 6), 5 amino acids are identical, and, with the exceptions of two glutamic acids near to the C-terminus of the bovine protein, all of the charged residues in the bovine sequence are conserved in other species. Other notable residues are Cys-39, the most strongly conserved cysteine residue in the seven ND proteins, flanked by the invariant residues Glu-38 and Gly-40, which are conserved in all known mitochondrial ND3 sequences. In NdhC from N. tabacum, and also from other chloroplasts and Synechocystis 6803 [62], this cysteine is replaced by a serine. The ND3 proteins appear to contain three potential membrane spanning α-helices, with a short loop between helices B and C. The invariant lysine residue and the conserved sequence (Glu-Cys/Ser-Gly) are located in an extensive loop between potential membrane spanning segments A and B. Two invariant acidic residues occur within span B and would be buried in the lipid bilayer.
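The conserved Glu-Cys/Ser-Gly sequence described above can be expressed as a simple pattern and searched for in a set of sequences. The sketch below assumes one-letter amino acid codes; the function name `find_motif` and the two toy sequences are hypothetical and stand in for real ND3 and NdhC sequences.

```python
import re
from typing import List, Tuple

# Glu, then Cys or Ser, then Gly, in one-letter code.
MOTIF = re.compile(r"E[CS]G")


def find_motif(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched tripeptide) for each motif match."""
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequences: one with the Cys form, one with the Ser form.
    toy_sequences = {
        "mitochondrial_like": "MKLSPYECGFDPLG",
        "chloroplast_like": "MKLSTYESGIEPMG",
    }
    for name, seq in toy_sequences.items():
        print(name, find_motif(seq))
```

Reporting the matched tripeptide alongside its position makes it easy to see which sequences carry the cysteine and which carry the serine replacement.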

III-G. The ND4L subunits

An amino acid residue is conserved in each of the 3 hydrophobic regions of ND4L (Figs. 2 and 7). The tobacco chloroplast sequence is the most divergent in the group, its relationship to other ND4L proteins being confined to amino acids 20-36 and 65-82. Cys-32 is conserved in vertebrates, in the echinoderms S. purpuratus, P. lividus and Pisaster ochraceus [63,64] and in four species of chloroplasts, but is replaced by serine in the invertebrates D. melanogaster, C. elegans, Ascaris suum [45] and Artemia [65] and in the ascomycetes, N. crassa and P. anserina [61].

III-H. The ND4 subunits

About 12 hydrophobic segments with the potential to form transmembrane α-helices are present in bovine ND4 (see Figs. 2 and 8). The hydrophobic N-terminal regions of the mitochondrial ND4 proteins appear to be too short to traverse the membrane as an α-helix,

[Fig. 5 alignment: ND2 sequences of B. taurus, B. physalus, G. gallus, X. laevis, S. purpuratus, D. yakuba, C. elegans, P. tetraurelia (residues 71-141) and N. tabacum, with potential membrane spans A-J marked above the bovine sequence.]

Fig. 5. Sequences of ND2 subunits of complex I and related proteins. The ND2 homologue encoded in the chloroplast genome of tobacco (N. tabacum) is termed Ndh B. Identical residues are marked only in the region of the alignment that includes the P. tetraurelia sequence. Ten potential membrane spans in bovine ND2 are denoted by black bars (A-J). P. tetraurelia ND2 is too dissimilar to be correctly aligned using the MULTALIGN program, and so it was aligned using the information obtained from a pairwise comparison. For the meaning of symbols under the sequence see the legend to Fig. 3.
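Potential membrane spans such as those marked in Fig. 5 are usually proposed from a hydrophobicity profile. The sketch below shows one common way of computing such a profile, a sliding-window mean on the Kyte-Doolittle hydropathy scale. The choice of that scale, the 19-residue window and the 1.6 cut-off are assumptions made here for illustration; they are not stated to be the procedure used in the review, and the toy sequence is hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> List[float]:
    """Return the mean hydropathy of every window of length `window`.

    Element k of the result covers residues k+1 .. k+window (1-based).
    """
    seq = sequence.upper()
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]  # KeyError on non-standard residues
    return [
        sum(values[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: charged flanks around a hydrophobic core.
    toy = "DKRE" + "LIVALLFIVLAGLLIVAFL" + "RKDE"
    profile = hydropathy_profile(toy, window=19)
    candidates = [k + 1 for k, mean in enumerate(profile) if mean > 1.6]
    print("window starts with mean hydropathy > 1.6:", candidates)
```

A profile of this kind only suggests candidate spans; as the discussion of segments D and F shows, comparison between homologous sequences is still needed to decide borderline cases.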

[Fig. 6 alignment: ND3 sequences of B. taurus, B. physalus, G. gallus, X. laevis, G. morhua, S. purpuratus, D. melanogaster, C. elegans, T. aestivum, N. crassa, P. tetraurelia and N. tabacum, with potential membrane spans A-C marked above the bovine sequence and asterisks under the alignment.]

Fig. 6. Sequences of ND3 subunits of complex I and related proteins. The sequence of wheat ND3 (Triticum aestivum) is from its mitochondrial genome [51]. The N. tabacum sequence is that of Ndh C encoded in chloroplast DNA. A-C denote the positions of potential membrane spans in bovine ND3. For the meaning of asterisks see the legend to Fig. 3.

but in the longer tobacco chloroplast protein, this region could form an additional membrane span. The hydrophilic regions between helices A and B, and helices K and L in the tobacco sequence are also expanded. C. elegans ND4 is 50 amino acids shorter than the bovine protein. It lacks the region corresponding to amino acids 1-20 of the bovine protein, and the hydrophilic regions between membrane spans are also truncated. As with other ND proteins, the hydrophobicity profile of P. tetraurelia ND4 is the least similar in the group, but both P. tetraurelia and C. elegans sequences also appear to contain about 12 potential membrane spans. Significant similarities are found throughout the ND4 sequences (Fig. 8), with the exception of the P. tetraurelia and tobacco chloroplast sequences, where similarity is confined to amino acids 100-360 and 100-420, respectively. The sequences are best conserved in a region (residues 203-245) that includes hydrophobic segments E and F, and in a second region (residues 273-294) between hydrophobic segments G and H and within segment H. These regions include two charged residues, Lys-237 and Arg-245, within hydrophobic segment F. The sequences of the chloroplast protein and of the P. tetraurelia mitochondrial protein are more distant from the other mitochondrial proteins. In all nine sequences, 22 residues are invariant, and 15 of them are identical in all known ND4 sequences. An additional 17 amino acids are identical in all but the P. tetraurelia sequence, and 9 more residues are conserved in all but the chloroplast sequence. There are no cysteines amongst the conserved residues, but His-220 is conserved, and three more histidines (213, 293 and 319) are present in all but one sequence. The only other conserved histidines in the ND subunits of complex I are in ND5 (see below). As discussed below, the two

[Fig. 7 alignment: ND4L sequences of B. taurus, B. physalus, G. gallus, G. morhua, X. laevis, S. purpuratus, D. melanogaster, C. elegans, N. crassa and N. tabacum, with segments A-C marked above the bovine sequence and asterisks under the alignment.]

Fig. 7. Sequences of ND4L subunits of complex I and related proteins. Three segments A-C proposed to form transmembrane segments in bovine ND4L are indicated. For the meaning of asterisks under the sequence, see the legend to Fig. 3.

[Fig. 8 alignment: ND4 sequences of B. taurus, B. physalus, G. gallus, X. laevis, S. purpuratus, D. melanogaster, C. elegans, P. tetraurelia and N. tabacum, with segments A-L marked above the bovine sequence and symbols under the alignment.]

Fig. 8. Sequences of ND4 subunits of complex I and related proteins. The N. tabacum chloroplast sequence is known as Ndh E. Twelve segments, A-L, in the bovine protein are proposed to be transmembrane segments. The sequence of N. crassa ND4 has not been determined. The meaning of symbols under the sequences is given in the legend to Fig. 3.
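Counts of the kind quoted in the text for ND4 (residues invariant in all aligned sequences, and further residues identical in all but one sequence) can be derived from a multiple alignment column by column. The following is a minimal sketch under stated assumptions: the alignment is given as equal-length strings with '-' for gaps, and the function name and toy alignment are hypothetical, not data from Fig. 8.

```python
from typing import List, Optional, Sequence


def invariant_columns(alignment: Sequence[str], skip: Optional[int] = None) -> List[int]:
    """Return 1-based columns in which every row carries the same residue.

    If `skip` is given, the row with that 0-based index is left out, which
    yields the columns identical in all but that sequence as well as the
    fully invariant ones. Columns containing a gap ('-') are never counted.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with rows of equal length")
    rows = [row for k, row in enumerate(alignment) if k != skip]
    columns = []
    for position, column in enumerate(zip(*rows), start=1):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            columns.append(position)
    return columns


if __name__ == "__main__":
    # Hypothetical toy alignment; the last row plays the role of a divergent sequence.
    toy = [
        "HLWLPKAH",
        "HLWLPKAH",
        "HYWLTKTH",
        "H-WLPDTH",
    ]
    in_all = invariant_columns(toy)
    in_all_but_last = invariant_columns(toy, skip=3)
    print("invariant in all rows:", in_all)
    print("additional columns identical in all but the last row:",
          sorted(set(in_all_but_last) - set(in_all)))
```

Taking the set difference between the two results separates the residues that become invariant only when the divergent sequence is left out, which mirrors the way the counts are reported in the text.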

[Fig. 9 alignment, first part: ND5 sequences of B. taurus, B. physalus, G. gallus, X. laevis, S. purpuratus, D. melanogaster, C. elegans, N. crassa, P. tetraurelia and N. tabacum, residues 10-470, with segments A-M marked above the bovine sequence and symbols under the alignment.]

Fig. 9. See p. 116.

116 480

490

500

I B. B. G. X. S. D. C, N. P. N.

taurus ND5 ;ghysa I us gallus laevis purpuratus melanogaster elegans crassa tetraurella tabacum

taurus ND5 physalus gallus laevis puzpuratus melanogaster elegans crassa tetraurelia tabacum

520

530

540

550

560

I

P Q M T M P Y Y L K T T A L IVTI LGF I LALE I S N M T K N ~ Y H Y P S N A F K F S T L L G Y F P T . . . . . . IMHRLAP Y M - N L S M S Q K S A S S L L D L IWLEAI LPK .... P I/V/TMPLH L K L T A L A M T T LG F I IAFE I N L D T Q N L K H K H P S N S F K F S T L L G Y F P T . . . . . I I ~ R L P P HL-D L L ~ Q K L A T S L L D L T W L E T I LPK---P P M 2 ~ T I T K T A A I IVTTLG I I LALE LS S L S Y S L T P P K I ~ I / 4 q F S S S L G Y F N P . . . . . LTHRI SP S I - L L H T G Q K IASHLI DMAWYKIq4GPE .... P I N / I ~ T L A K Q A A I IVSVTGLI IAMD L S K L T T Y INQESKTNI H S F S N L L G F F P T . . . . . I I H I ~ K T - N I / ~ L A Q N I A T H LI D L S W Y E K S G P Q - - - TP L IVP I I G V A A L F M S L I S S T S N S I G SNAH S A T T S Q W F F V D A V H LS I I ~ M S L A L S F F S . . . . . . SRTLDRGWQENI GPQ---I C LP I Y ~ L L T L F V C I V G G L F G Y L I S L S N L F F I / ~ L F M Y N L S T F L - G S M W F M P . . . . . . Y I STYQMI F YP L N Y G Q L V - V K S F D Q G W S E YFGGQ--- PS L F L Y V D F F G P L V F L F M M I F L S F L I L K I ~ L M Y K F L V D Y L A ~ S Iy l ~ , M~TEFAVPTLFKTLPFMFT~FSLMALVLSEKYPNLVVHFKLSRLGYNTFGFFNQRFLVELFYNKY I ~ - ~ I- ~ I~ K G S ~ F G ~ YLLS LS T A T D F Y L V Y V K T F S F T L A P L L S E A A L L N Y S F F Y W I I A I F F V I L V L F S Y Y Q K K T T A E F V QE G G N L D I LS K W L A Y F G I F I A S F L Y K P I YS S L I ~ E LINSFVKKGPI(R I L W D K I ING I Y D W S Y N R A Y IDAF YTRF LVGG IRGLAE F T H F F D R R V I DGTDN

570

B. B. G. X. S. D. C. N. P. N.

510

N

580

590

I

600 O

I

T I S L A Q ~ - A S T L V T N Q K G L IK L Y F LSFL I T I L I ~ 4 I L F N F H E T T A L I Q L K - A S T L T S N Q Q G L I K L Y F L S F L IT I T I ~ I L F N Y P E G L A N L H L T M T K I S T T L H T G L I K S Y L G S F A L T I L T T ILLI QK (IMVNQQLPMI I~T'I'I'NIQ Q G L I K T Y L T L F LMTSAI I I TLF G I A P T S T A L S K I S Q A G Q I GL IKRY I L S S I ~ V L V I L A L S L L I LS HLYQKLSMYSKTLFI/4HNNS LK I Y L L L F V F W I L ILLI L L F L LFI.N'NINSKGYTLFLS S G I ~ K N Y Y L K S L N F N S V V V L I F I FFMI C K V L I K W S K D M A S LS TS I V T N Y A L F I L V G F I L Y V F T F I S LLEGG LD LNLS LF I L L T S L T S STS S SD SKEGKMI K K A V V S T K N ~ ......... SLAGFFDFFLGGFFF GVGVI SF IVGEG IKY IGGGRI SS Y L F L Y L A Y V S I F L L V Y Y L L F S T L

Fig. 9. Sequences of ND5 subunits of complex I and related proteins. The fifteen segments A - O are proposed to form t r a n s m e m b r a n e segments in bovine ND5. Three residues that are conserved in all but one ND5 sequence are denoted by the symbol °. Other symbols are explained in the legend to Fig. 3.

most highly conserved regions in ND4 sequences are related to regions of ND2 and ND5.

III-I. The ND5 subunits

The hydrophobicity profile of bovine ND5 suggests that it contains about 14 to 16 potential transmembrane helices, and 15 hydrophobic segments, A-O, are indicated in Fig. 9. Some of the uncertainties in the precise number of helices arise because of the need to reconcile the secondary structures of ND2, ND4 and ND5 in the regions where their sequences are related. This involves one region of ND5 (segment F) that is not particularly hydrophobic, and an additional segment, I, in a hydrophobic region that may fold into only one membrane span. Two additional hydrophobic segments in the bovine sequence (residues 518-543 and 543-577) could span the membrane, but the first region is absent from the hydrophobicity profiles in invertebrates, and the second is more hydrophilic in the whale and chicken ND5 sequences. The most strongly conserved sequence in ND5 is found between residues 80 and 390. Thirty-nine residues are identical in the aligned sequences and 37 of them, including one histidine and two tryptophan residues, are invariant in all known ND5 sequences (see Fig. 9). Almost all of them are in regions that are related to the ND2 and ND4 proteins (see below). Three further residues (Asp-179, His-328 and His-332) are identical in all but one species. Some of the conserved residues are between predicted transmembrane spans, the most conserved regions of sequence being in residues 219-260 and 294-315 between hydrophobic segments F and G, and segments H and I, respectively. Five conserved amino acids occur at intervals of three amino acids in the same stretch of sequence in helix J, suggesting that the α-helix has a conserved face.
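The kind of hydrophobicity profile referred to in this section can be illustrated with a sliding-window average over a hydropathy scale. The sketch below is only an illustration: it assumes the Kyte-Doolittle scale, a 19-residue window and a cut-off of 1.6, none of which is stated in this section, and the toy sequence is invented rather than taken from any ND5 protein.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not necessarily the one
# used for the profiles discussed in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq."""
    values = [KD[aa] for aa in seq]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return 0-based start positions of windows whose mean exceeds threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, mean in enumerate(profile) if mean > threshold]


# Hypothetical toy sequence: a 19-residue hydrophobic stretch flanked by
# charged and polar residues.
toy = "KDEKR" + "LIVALLFIAVLLGIVLAFL" + "RKDEN"
candidates = hydrophobic_windows(toy)
```

Windows that clear the threshold are only candidates for membrane spans; as the text notes for segments F and I, deciding the actual number of helices also depends on comparison with related sequences.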

III-J. Relationship of ND5 to a product of the E. coli hyc operon

A 608 amino acid protein (ORF3) encoded in the E. coli hyc operon is related to ND4 [32], but it is more strongly related to ND5 [1] (see Fig. 10). The regions of similarity (between amino acids 268-299 and 327-390 in E. coli ORF3 and 284-316 and 328-392 in ND5, respectively) are within regions of ND5 that are related to ND4 (see below). They include the motif HXWXP, which is also conserved in ND4, ND5 and ND2.

III-K. The ND6 subunits

Bovine ND6 appears to contain 4 or 5 hydrophobic spans (see Fig. 2). There is uncertainty about residues 25-74, which could contain one or two hydrophobic spans. Two (B and C) are indicated in this region in Figs. 2 and 11. The hydrophobicity profiles of vertebrate sequences are similar to each other. The length of hydrophilic sequence between spans D and E varies and is most extensive in N. crassa. The best conserved regions are in span C (Fig. 11), and Leu-33 is the only invariant residue. Vertebrate ND6 sequences are related over the N-terminal 88 amino acids and in residues 145-170, but other ND6 sequences are more diverse. Even amongst vertebrates there is little or no sequence conservation in span D (residues 95-130) or in the hydrophilic region between segments D and E. The liverwort chloroplast sequence is least similar to the bovine mitochondrial protein; only amino acids 50-70 and 143-164 are related to residues 56-76 and 128-149, respectively, of the liverwort sequence.
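The correspondence between the bovine and liverwort residue ranges quoted above can be checked with a little arithmetic. The following sketch is only a bookkeeping aid; the function and variable names are hypothetical, and the only data used are the four ranges given in the text.

```python
def span_length(start, end):
    """Number of residues in an inclusive residue range."""
    return end - start + 1


# (bovine range, liverwort range), as quoted in the text.
related_regions = [
    ((50, 70), (56, 76)),
    ((143, 164), (128, 149)),
]


def summarize(regions):
    """For each pair, report both lengths and the liverwort-minus-bovine offset."""
    rows = []
    for (b_start, b_end), (l_start, l_end) in regions:
        rows.append({
            "bovine_len": span_length(b_start, b_end),
            "liverwort_len": span_length(l_start, l_end),
            "offset": l_start - b_start,
        })
    return rows


summary = summarize(related_regions)
for row in summary:
    print(row)
```

Both pairs have matching lengths (21 and 22 residues), while the offset changes from +6 to -15. On these numbers the stretch separating the two related regions is 21 residues shorter in the liverwort sequence than in the bovine one.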

III-L. Relationships between subunits ND2, ND4 and ND5


ND2, ND4 and ND5 are weakly related, suggesting that they have evolved from a common ancestor [66]. ND4 and ND5 are more similar to each other than either is to ND2, suggesting that ND4 may be either the common ancestor or an evolutionary intermediate between ND2 and ND5. The most significant relationships between the three proteins are shown in Fig. 12. Two related regions (Figs. 13a and 13b) are common to all three proteins. The sequence similarities imply that these regions have a common secondary structure (see Fig. 13). The related regions in Fig. 13b include the conserved motif HXWXP in a hydrophilic segment between helices E and F, and F and G, in ND5 and ND4, respectively. In ND2, the motif is also in a hydrophilic sequence following a hydrophobic segment, but the sequence immediately after the motif is not particularly hydrophobic and would not be expected

Fig. 10. Comparison of ORF3 from the hyc operon of E. coli and ND5 from bovine mitochondria [5,32,31]. The regions of greatest similarity are between amino acids 218-230, 268-299 and 327-390 of E. coli ORF3, and 234-246, 284-316 and 328-392 of ND5, respectively.

[Fig. 11: aligned ND6 sequences from B. taurus, B. physalus, G. gallus, X. laevis, S. purpuratus, D. melanogaster, C. elegans and N. crassa, and the related chloroplast sequence from M. polymorpha; segments A-E marked.]

Fig. 11. Sequences of ND6 subunits of complex I and related subunits. A-E are proposed membrane spanning segments in bovine ND6. Ndh G is encoded in chloroplast DNA in M. polymorpha. The asterisk denotes a conserved residue in all known ND6 sequences.

[Dot-matrix comparison plots, including panel (b); labelled axis: B. taurus ND4, residue scale 100-400.]
