Nucleic Acids Research, Vol. 18, No. 3, page 411

© 1990 Oxford University Press

Organization of the mitochondrial genome of Atlantic cod, Gadus morhua

Steinar Johansen*, Per Henrik Guddal and Terje Johansen

Departments of Cell Biology and Biochemistry, Institute of Medical Biology, University of Tromsø, PO Box 977, N-9001 Tromsø, Norway

Received December 5, 1989; Accepted December 21, 1989

ABSTRACT The mitochondrial DNA (mtDNA) from the Atlantic cod, Gadus morhua, was mapped using 11 different restriction enzymes and cloned into plasmid vectors. Sequence data obtained from more than 10 kilobases of cod mtDNA show that the genome organization, genetic code, and overall codon usage have been conserved throughout the evolution of vertebrates. Comparison of the derived amino acid sequences of proteins encoded by cod mtDNA with those encoded by Xenopus laevis mtDNA revealed that amino acid identity ranges from 46% to 93% for the different proteins. ND4L is the most divergent and COI the most conserved. GUG was found as the translation initiation codon of the COI gene, indicating a dual coding function for this codon. The sequences of the 997 base pair displacement-loop (D-loop)-containing region and of the origin of L-strand replication (oriL) are presented. Only a few of the primary and secondary structure features found to be conserved among mammalian mitochondrial D-loops can be identified in cod. The presence of CSB-2 in the D-loop-containing region and of the conserved hairpin structure at oriL indicates that replication of bony fish mtDNA may follow the same general scheme as described for higher vertebrates.

EMBL accession nos. X17658-X17662 (incl.)

*To whom correspondence should be addressed

INTRODUCTION

The complete mitochondrial DNA (mtDNA) sequences have been determined for some vertebrates, including several mammals (1-3) and a frog (4). These mtDNA genomes have a conserved organization and gene content, consisting of two ribosomal RNA (rRNA) genes, 22 distinct transfer RNA (tRNA) genes, and 13 protein genes, all of which encode components of the inner mitochondrial membrane (5). The mitochondrial genome has an extremely compact organization with only one major non-coding region, the displacement-loop (D-loop). Replication of mtDNA in mammals and frog is initiated in this non-coding region at the origin of heavy-strand replication (oriH) and proceeds asymmetrically by displacement of the parental strand (6). When synthesis of the H-strand is 3/4 complete, the light-strand origin of replication (oriL) is activated.

It is not known whether primitive vertebrates such as bony fish diverge from the conserved genome organization and function seen in higher vertebrates. Reports of size polymorphisms and heteroplasmy in fish mtDNA (7,8), which indicate DNA rearrangements in these genomes, have made this question even more pertinent. Furthermore, the complete mtDNA sequences from two species of sea urchins (9,10), which are phylogenetically closely related to vertebrates (11,12), show that extensive rearrangements have occurred in these mitochondrial genomes during the last 600 million years of evolution (12). Therefore, to obtain more information about the evolution of vertebrate mtDNA, we have sequenced regulatory and gene-coding regions of the mitochondrial genome of the bony fish Atlantic cod, Gadus morhua. The genome organization, genetic code, codon usage and features of the regulatory regions are discussed in comparison with the mtDNA of Xenopus laevis (4,13) and other vertebrates.

MATERIALS AND METHODS

Isolation of mtDNA. Thirteen Norwegian Coastal (NC) cod and 8 Arcto-Norwegian (AN) cod were captured, and mtDNA was isolated from eggs and liver tissue as described (14). The NC and AN stocks correspond to the E and D stocks, respectively, of Mork et al. (15).

Vectors and enzymes. The plasmid vector pGEM2 (Promega Biotec) and the M13mp18 and M13mp19 vectors (16) were used. Restriction endonucleases, T4 DNA polymerase, and T4 DNA ligase were purchased from New England BioLabs and United States Biochemical Corporation. Calf intestinal alkaline phosphatase and RNase A were from Boehringer Mannheim. The Klenow fragment of E. coli DNA polymerase I was obtained from Pharmacia.

mtDNA restriction map. All restriction enzyme digestions, dephosphorylation reactions, and cloning of specific restriction fragments by direct ligation in low-melting agarose (BioRad) were done according to Johansen (17). The circular mtDNA was mapped by single and double restriction enzyme digestions of both native and cloned mtDNA. Restriction fragments were separated on 0.7% agarose gels (BioRad); to resolve small fragments, 3-4% NuSieve agarose gels (FMC BioProducts) were used.

DNA sequencing and computer analysis of sequences. DNA sequencing was performed on restriction fragments cloned into M13mp18 and M13mp19, using both a commercial Sequenase kit (Version 2.0; United States Biochemical Corporation) and a Klenow sequencing kit (New England BioLabs). [35S]dATPαS (1000 Ci/mmol, New England Nuclear) was used for labeling. Two synthetic oligonucleotides were used as primers to obtain the complete sequence of both strands in the D-loop region. All sequences shown in this paper were obtained from both strands. The sequence data obtained from only one strand and used in Tables 1 and 2 were all unambiguous. Computer analysis of DNA sequences was performed on a DEC VAX/VMS computer using the GCG (Genetics Computer Group, Version 6.1; University of Wisconsin, 18) and GENEUS (19) software packages.

RESULTS

Cloning and sequencing of cod mtDNA. Native mtDNA from a Norwegian Coastal cod (NC-1) was analysed using 16 different restriction enzymes, and a restriction map including 11 different enzymes was constructed (Fig. 1A). The complete mtDNA from NC-1 was cloned as four restriction fragments (Fig. 1B). Two EcoRI-SacI fragments of 4.3 kb and 4.0 kb were cloned into the corresponding restriction sites in pGEM2, yielding pCOD101 and pCOD102, respectively. A 7.5 kb EcoRI fragment was cloned into EcoRI-cut and dephosphorylated pGEM2, yielding pCOD2. The remaining 0.73 kb EcoRI fragment was cloned as part of an overlapping 3.1 kb PstI-SacI fragment into PstI-SacI-cut pGEM2, yielding pCOD3. Plasmid-cloned mtDNA fragments were subcloned into M13mp18 and M13mp19 and sequenced using the dideoxynucleotide procedure (20) (Fig. 1D). The nucleotide and derived amino acid (aa) sequences were compared with the Xenopus mtDNA sequence (4,13) and other available complete vertebrate mtDNA sequences (1-3).
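Mapping a circular molecule by single and double digests rests on simple bookkeeping: n cuts in a circle give n fragments whose lengths add up to the genome size. The Python sketch below illustrates that logic; the function names and toy sequences are hypothetical and are not the GCG or GENEUS routines used in this study. It assumes palindromic recognition sites (so searching one strand suffices) and uses the standard top-strand cut offsets, e.g. EcoRI G^AATTC (offset 1) and PstI CTGCA^G (offset 5).

```python
def find_cuts(seq: str, site: str, offset: int) -> list[int]:
    """Top-strand cut positions (0-based) for a palindromic recognition site
    in a circular sequence; matches spanning the origin are included."""
    n = len(seq)
    wrapped = seq.upper() + seq[: len(site) - 1].upper()  # let matches run across the origin
    return sorted((i + offset) % n for i in range(n) if wrapped.startswith(site, i))


def circular_fragments(cuts: list[int], genome_length: int) -> list[int]:
    """Fragment lengths (largest first) after cutting a circle at `cuts`."""
    positions = sorted(set(c % genome_length for c in cuts))
    if not positions:
        return [genome_length]  # uncut: a single circular molecule
    frags = [b - a for a, b in zip(positions, positions[1:])]
    frags.append(genome_length - positions[-1] + positions[0])  # fragment spanning the origin
    return sorted(frags, reverse=True)


# Hypothetical toy circle with two EcoRI sites.
toy = "GAATTC" + "A" * 10 + "GAATTC" + "T" * 8
ecori_cuts = find_cuts(toy, "GAATTC", offset=1)
print(ecori_cuts, circular_fragments(ecori_cuts, len(toy)))  # [1, 17] [16, 14]

# A double digest simply pools the cut positions of both enzymes.
print(circular_fragments([10, 60] + [30], 100))  # [50, 30, 20]

# The four cloned fragments described above should tile the whole circle.
cloned_kb = {"pCOD101": 4.3, "pCOD102": 4.0, "pCOD2": 7.5, "0.73 kb EcoRI (in pCOD3)": 0.73}
print(round(sum(cloned_kb.values()), 2))  # 16.53
```

The 16.53 kb total of the cloned fragments is consistent with the 16.5 kb genome size estimated from restriction analysis and sequencing.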

TABLE 1. Sequence comparisons between cod and Xenopus laevis protein and rRNA genes.

Identified cod   No. of nucleotide   % nucleotide   No. of aa   % amino acid homology(a)
mtDNA gene       positions           homology       positions   A     B
SSUrRNA           751                72              -          -     -
LSUrRNA           207                80              -          -     -
ND1               432                70             144         76    92
ND2               413                58             138         56    80
COI              1386                79             462         93    99
COII              672                74             224         80    92
ATPase8           165                67              55         55    76
ATPase6           211                62              71         49    68
COIII             547                73             182         85    91
ND3               348                68             116         74    88
ND4L              294                54              98         46    66
ND4               482                61             160         57    78
ND5              1352                70             450         72    85
ND6               226                62              75         64    85
Cytb              255                72              85         65    80

(a) Column A gives the percentage of identical aa residues; column B gives the sum of identical and chemically similar aa residues (51). Groups of chemically similar amino acids are defined as follows: A,G,P,S,T; E,D,N,Q; C; I,L,M,V; H,K,R; F,W,Y.

TABLE 2. Genetic code and codon usage of cod mitochondrial DNA (Ter, termination codon).

Phe TTT  87    Ser TCT  43    Tyr TAT  37    Cys TGT  13
Phe TTC  58    Ser TCC  30    Tyr TAC  27    Cys TGC   4
Leu TTA  88    Ser TCA  47    Ter TAA   7    Trp TGA  71
Leu TTG  18    Ser TCG   5    Ter TAG   2    Trp TGG   7

Leu CTT 106    Pro CCT  51    His CAT  31    Arg CGT  10
Leu CTC  56    Pro CCC  48    His CAC  37    Arg CGC   9
Leu CTA  98    Pro CCA  30    Gln CAA  41    Arg CGA  24
Leu CTG  16    Pro CCG   5    Gln CAG  16    Arg CGG   6

Ile ATT 117    Thr ACT  52    Asn AAT  37    Ser AGT  17
Ile ATC  39    Thr ACC  41    Asn AAC  26    Ser AGC  33
Met ATA  72    Thr ACA  70    Lys AAA  27    Ter AGA   0
Met ATG  41    Thr ACG   8    Lys AAG   8    Ter AGG   0

Val GTT  34    Ala GCT  65    Asp GAT  29    Gly GGT  34
Val GTC  27    Ala GCC  68    Asp GAC  26    Gly GGC  43
Val GTA  55    Ala GCA  77    Glu GAA  42    Gly GGA  36
Val GTG  13    Ala GCG   8    Glu GAG  15    Gly GGG  24
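Table 2 can be checked arithmetically against the text, which bases the genetic code on 2312 codons and notes a tendency to exclude G from the third codon position. The sketch below tallies the table by third-position base. One assumption is flagged: the CCG count is illegible in the scanned table, and 5 is used here because that value makes the total equal the 2312 codons cited in the text.

```python
# Codon counts transcribed from Table 2 (cod mitochondrial protein genes).
# Assumption: the CCG entry is illegible in the scan; 5 is the value that
# makes the grand total equal the 2312 codons cited in the text.
CODON_COUNTS = {
    "TTT": 87, "TTC": 58, "TTA": 88, "TTG": 18,
    "CTT": 106, "CTC": 56, "CTA": 98, "CTG": 16,
    "ATT": 117, "ATC": 39, "ATA": 72, "ATG": 41,
    "GTT": 34, "GTC": 27, "GTA": 55, "GTG": 13,
    "TCT": 43, "TCC": 30, "TCA": 47, "TCG": 5,
    "CCT": 51, "CCC": 48, "CCA": 30, "CCG": 5,
    "ACT": 52, "ACC": 41, "ACA": 70, "ACG": 8,
    "GCT": 65, "GCC": 68, "GCA": 77, "GCG": 8,
    "TAT": 37, "TAC": 27, "TAA": 7, "TAG": 2,
    "CAT": 31, "CAC": 37, "CAA": 41, "CAG": 16,
    "AAT": 37, "AAC": 26, "AAA": 27, "AAG": 8,
    "GAT": 29, "GAC": 26, "GAA": 42, "GAG": 15,
    "TGT": 13, "TGC": 4, "TGA": 71, "TGG": 7,
    "CGT": 10, "CGC": 9, "CGA": 24, "CGG": 6,
    "AGT": 17, "AGC": 33, "AGA": 0, "AGG": 0,
    "GGT": 34, "GGC": 43, "GGA": 36, "GGG": 24,
}


def third_position_totals(counts: dict[str, int]) -> dict[str, int]:
    """Sum codon counts by the base in the third codon position."""
    totals = {base: 0 for base in "TCAG"}
    for codon, n in counts.items():
        totals[codon[2]] += n
    return totals


total = sum(CODON_COUNTS.values())
third = third_position_totals(CODON_COUNTS)
print("codons:", total)
for base in "TCAG":
    print(f"third-position {base}: {third[base]:4d} ({100 * third[base] / total:.1f}%)")
```

With these counts, only 192 of the 2312 codons end in G (about 8%), which is the third-position G exclusion described under Genetic code and codon usage.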


MtDNA in two cod populations. MtDNA was analysed from 21 randomly captured individuals representing the NC and AN stocks (see Materials and Methods). The DNA was digested with EcoRI, EcoRV, HindIII, SacI, PstI, PvuII, XmnI, and HincII. The only polymorphism detected was the loss of one PstI site, located in the highly divergent ATPase6 gene, in one individual from the Norwegian Coastal stock (NC-1). These results confirm the findings of Mork et al. (15), which indicated very little genetic variation among stocks of North Atlantic cod.

Genome organization of cod mtDNA. Based on restriction fragment analysis and DNA sequencing, the size of cod mtDNA was estimated to be 16.5 kb. This is 1 kb less than the 17.55 kb Xenopus mtDNA reported by Roe et al. (4). The actual size of Xenopus mtDNA seems to be 17.7 kb, since extensive sequencing errors have been discovered (13,21) in the D-loop and the SSUrRNA gene of the sequence reported by Roe et al. (4). The size discrepancy between cod and Xenopus mtDNA is due to two regions of the mitochondrial genome. The D-loop-containing region in cod is 1.16 kb smaller than in Xenopus; the sizes are 997 bp and 2152-2153 bp (13,21), respectively. The second region varying in size is the non-coding region between the tRNAThr and tRNAPro genes. This intergenic region is 74 bp in cod mtDNA compared with 26 bp in Xenopus (13). In white sturgeon (Acipenser transmontanus) mtDNA (22), these two tRNA genes are separated by only three nucleotides, so the presence of this non-coding region seems not to be a conserved feature among bony fishes.

[Figure 1, panels A to D: restriction map on a 0 to 16 kb scale; pGEM2 subclones pCOD101, pCOD102, pCOD2 and pCOD3; gene organization including oriL and the D-loop; extent of M13 sequence reads]

Figure 1. Organization of the cod mitochondrial genome. A) Restriction map of the cod mitochondrial genome. The numbers indicate the genome length in kilobases. Restriction enzymes are SalI (A), BglII (B), ClaI (C), DraI (D), EcoRI (E), HindIII (H), PstI (P), EcoRV (R), SacI (S), PvuII (V), XmnI (X). B) Subclones of cod mitochondrial DNA in pGEM2. C) Gene organization of the cod mitochondrial genome. SSUrRNA, LSUrRNA, small and large subunit ribosomal RNA, respectively; ND1, ND2, ND3, ND4, ND4L, ND5, and ND6, subunits of NADH dehydrogenase; COI, COII, COIII, subunits of cytochrome oxidase; A6, A8, subunits 6 and 8 of the Fo mitochondrial ATP synthase; Cyt b, apocytochrome b; oriL, origin of light-strand replication; D-loop, displacement-loop-containing region. The one-letter aa abbreviation is used to show the location of the corresponding tRNA genes. DNA sequences of the tRNA genes flanking LSUrRNA have not been obtained, but their localization in other vertebrates is shown in parentheses. Arrows indicate the orientation of all the mtDNA genes. D) The extent and direction of each sequence read from M13 subclones.

In Table 1, ribosomal and protein gene sequences of cod mtDNA are compared with the Xenopus sequences (4,13). The mitochondrial gene order and orientation in cod (Fig. 1C) were found to be identical to those described in other vertebrates. These results indicate that the genome organization of vertebrate mtDNA has been conserved from bony fish to humans, over an evolutionary time scale of about 400 million years (23).

Structural RNA genes. The cod mtSSU and mtLSU rRNA genes were identified on the basis of primary and secondary structure homology to Xenopus mtDNA (see Tab. 1). Furthermore, 20 of the 22 tRNA genes were identified by sequence homology, conservation of secondary structure, and the anticodon region. The proposed cloverleaf structures inferred from 12 of these tRNA genes are displayed in figure 4 and show many similarities to the Xenopus tRNAs (4). As figure 4 shows, there are many mismatches and atypical base pairings in the stem regions of the tRNAs, especially in the aminoacyl and D-arms. In animal mitochondria, the serine (AGY) tRNA has a truncated D-arm, which is replaced by the D-arm replacement loop (5). Surprisingly, the corresponding cod tRNA is much less truncated, and a cloverleaf structure containing the D-arm is proposed in figure 4.
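Counting "mismatches and atypical base pairings" in a stem amounts to classifying each pair as Watson-Crick, G·U wobble, or mismatch. The sketch below does this for a single helix; the function name and the 7-bp example stem are hypothetical and are not taken from the cod tRNAs. Both arms are supplied 5' to 3', so the 3' arm is reversed before pairing.

```python
# Base pairs accepted in RNA helices; any other combination counts as a mismatch.
PAIR_CLASS = {
    ("G", "C"): "WC", ("C", "G"): "WC",
    ("A", "U"): "WC", ("U", "A"): "WC",
    ("G", "U"): "GU", ("U", "G"): "GU",
}


def classify_stem(five_prime_arm: str, three_prime_arm: str) -> list[str]:
    """Classify each base pair of a stem.

    Both arms are written 5'->3'. The 3' arm is reversed so that position i of
    the 5' arm lines up with its pairing partner. DNA input (T) is read as U.
    """
    if len(five_prime_arm) != len(three_prime_arm):
        raise ValueError("stem arms must have equal length")
    left = five_prime_arm.upper().replace("T", "U")
    right = three_prime_arm.upper().replace("T", "U")[::-1]
    return [PAIR_CLASS.get((a, b), "mismatch") for a, b in zip(left, right)]


# Hypothetical 7-bp acceptor stem with one G.U pair and one mismatch.
pairs = classify_stem("GCUAGAC", "GUCAGGC")
print(pairs)  # ['WC', 'WC', 'GU', 'mismatch', 'WC', 'WC', 'WC']
print({kind: pairs.count(kind) for kind in ("WC", "GU", "mismatch")})
```

Treating G·U as a separate class, rather than as a mismatch, follows common practice for RNA helices; whether a given cloverleaf model counts it as a mismatch is a choice of the model.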

Protein genes. Table 1 compares the derived aa sequences of the cod and Xenopus mtDNA protein-encoding genes. The most conserved proteins are the three mitochondrially encoded cytochrome oxidase subunits (COI, II, and III), which are responsible for the catalytic function of the holoenzyme (5). The ND subunits encoded by mtDNA belong to the hydrophobic protein fraction of NADH ubiquinone oxidoreductase. Compared with Xenopus, the derived aa identities range from 76% for ND1 to 46% for ND4L (Tab. 1). The low aa sequence conservation may be due to the fact that the functionally important centers are located on the nuclear-encoded ND subunits; the mitochondrially encoded subunits may therefore have only a limited regulatory role (5). Aa alignments of ND3 and ND4L (Fig. 3) show that, relative to Xenopus, humans and sea urchins, cod ND3 is more conserved than ND4L. ATPase6 and ATPase8 belong to the Fo part of the mitochondrial ATP synthase and are among the most divergent proteins encoded by mtDNA (see Tab. 1). Like the Xenopus and sea urchin proteins, cod ATPase8 lacks the last 10-12 aa seen in the mammalian protein (Fig. 3). The aa identities of cod ATPase8 are only 55%, 27%, and 25% compared with Xenopus, humans, and sea urchins, respectively; when changes to chemically similar aa are included, the similarities are 76%, 41%, and 47%, respectively. No equivalent of the ATPase8 subunit has been found in bacteria (24). The ATPase8 gene is absent from the mitochondrial genome of nematodes (25), suggesting that this gene has either been transferred to the nuclear genome or lost during the evolution of these primitive animals. Recent studies have indicated that the rate of aa substitution in fish mtDNA-encoded proteins is about 5 times lower than among mammals (23,26).

Although the salmonid and cod fishes are very distantly related bony fishes, a high degree of homology can be found in the aa sequences of ND3 and ND4L (Fig. 3), as well as of COIII and ATPase6. This homology is much higher than among mammals (26). These observations corroborate the findings of a slower rate of aa substitution among bony fishes.
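The identity and similarity percentages in Table 1 and Fig. 3 follow from two counts over an alignment: identical residues, and residues in the same chemical group (the groups defined in the Table 1 footnote). The sketch below recomputes the cod versus Xenopus ATPase8 values from the Fig. 3 alignment, with the scanned letter 'O' read as Q. Using the number of gap-free aligned columns as the denominator is an assumption; for this pair it equals the 55 cod residues.

```python
# Groups of chemically similar amino acids, as defined in the Table 1 footnote.
SIMILARITY_GROUPS = ["AGPST", "EDNQ", "C", "ILMV", "HKR", "FWY"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILARITY_GROUPS) for aa in group}


def identity_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identical and percent identical-or-similar residues of two aligned
    sequences. Columns with a gap ('-') in either sequence are skipped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1
    return 100 * identical / compared, 100 * (identical + similar) / compared


# ATPase8 alignment from Fig. 3 (scanned 'O' read as Q).
COD_ATP8 = "MPQLNPAPWFMIFMFTWAIFLTILPPKVMAHTFPNEPSPQGMTTPK-TAPWNWPWH"
XEN_ATP8 = "MPQLNPGPWFLILIFSWLVLLTFIPPKVLKHKAFNEPTTQTTEKSK-PNPWNWPWT"

ident, sim = identity_similarity(COD_ATP8, XEN_ATP8)
print(f"identity {ident:.0f}%, identity + similarity {sim:.0f}%")  # identity 55%, identity + similarity 76%
```

The result (30 identical and 12 similar positions out of 55) reproduces the 55/76 values given for ATPase8.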

Genetic code and codon usage. The translation initiation codons were identified for 12 of the 13 cod mt protein genes. All genes except the highly conserved COI gene use AUG as the initiation codon; COI, however, seems to use GUG (Fig. 2B). In vertebrates, GUG as an initiator codon has previously been reported only for ND1 in rat mitochondria (27), but in non-vertebrate animal mitochondria GUG has been indicated as the initiation codon for ND5 in Drosophila yakuba (28) and for ATPase8 in two different sea urchin species, Strongylocentrotus purpuratus (9) and Paracentrotus lividus (10). Except for this possible dual function of the GUG codon, the genetic code in cod mtDNA, based on a comparison of 2312 codons (Tab. 2), is the same as reported for Xenopus (4) and mammalian mitochondria (5). The codon usage is similar to that reported for Xenopus mtDNA (4), but some small differences can be seen (see Tab. 2). In both cod and Xenopus there is a clear tendency to exclude G from the third codon position; this bias is more pronounced in Xenopus than in cod. Furthermore, the strong preference for 'NCA' codons seen in Xenopus is not seen in the cod mitochondrial protein genes.

D-loop-containing region. The D-loop-containing region is located between the tRNAPro and tRNAPhe genes (Fig. 1C). This 64% A+T-rich region is 997 bp long (Fig. 2A), which is smaller than reported for frogs (13,29). Brown et al. (30) have divided the D-loop-containing region into three domains: the A+T-rich left (L) and right (R) domains at the 5' and 3' ends of the D-loop, which have been shown to have a relatively unstable sequence profile (31), and the more stable central (C) domain. Computer analysis shows that the sequence of the D-loop in cod is not conserved relative to frogs and mammals. Only one conserved sequence block (CSB), corresponding to CSB-2, can be identified in cod (see Fig. 2A).
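The overall A+T content of the region and the division into A+T-rich L and R domains around a more stable C domain can be examined with a sliding-window profile. The sketch below is a minimal version; the function names, window and step sizes are arbitrary choices, and the toy sequence is invented for illustration rather than taken from the cod D-loop.

```python
def at_percent(seq: str) -> float:
    """Percentage of A and T in a DNA sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return 100.0 * (s.count("A") + s.count("T")) / len(s)


def at_profile(seq: str, window: int = 100, step: int = 25) -> list[tuple[int, float]]:
    """(window start, % A+T) for successive windows along the sequence."""
    return [(i, at_percent(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


# Hypothetical toy region: A+T-rich flanks around a G+C-richer centre.
toy = "AATATTAT" * 5 + "GCGCATGC" * 5 + "TTAATATA" * 5
for start, at in at_profile(toy, window=40, step=40):
    print(start, round(at, 1))  # 0 100.0 / 40 25.0 / 80 100.0
```

Overlapping windows (step smaller than window) give a smoother profile at the cost of correlated neighbouring values.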
[Figure 2, panels A to E: L-strand nucleotide sequences of cod mtDNA with derived amino acid sequences, gene boundaries and annotated features]

Figure 2. Primary sequence of the L-strand of cod mtDNA. The polarity of all genes is indicated by arrows below the nucleotide sequence. Asterisks denote the stop codons of ATPase8, ND3 and ND4L. In A), the sequence of the D-loop-containing region is shown. R1-4, tandem repeats; Py-17 and Py-27, pyrimidine stretches of 17 and 27 nucleotides; TAS, putative termination-associated sequence; CSB-2, conserved sequence block 2; TI-region, putative transcription initiation region. The TAS in sequence A) and the GTG initiation codon of COI in sequence B) are underlined. Facing arrows above the nucleotide sequence at TAS in sequence A) and at the origin of L-strand replication in sequence B) indicate possible hairpin structures. The sequences shown in A to E have been assigned the accession numbers X17660, X17658, X17659, X17661, and X17662, respectively, in the EMBL Data Library.

ATPase8
Cod  MPQLNPAPWFMIFMFTWAIFLTILPPKVMAHTFPNEPSPQGMTTPK-TAPWNWPWH
Xen  MPQLNPGPWFLILIFSWLVLLTFIPPKVLKHKAFNEPTTQTTEKSK-PNPWNWPWT  55/76
Hum  MPQLNTTVWPTMITPMLLTLFLITQLKMLNTNYHLPPSPKPNKMKNYNKPWEPKWTKICSLHSLPPQS  27/41
Sur  MPQLEFAWJWIVNFSLIWASVLIVISLLLNSFPPNSAGQSSSSLTLN-KTTTNWQWL  25/47

ND3
Cod  MNLISTVILIASALSLILILVSFWLPQLSPDYEKLSPYECGFDPLGSARLPFSLRFFLIAILFLLFDLEIALLLPLPWGDQLSNPTLTFMWATSVLALLTLGLIYEWLQGGLEWAE
Rtr  MNLITTIITITITLSAVLATISFWLPQISPDAEKLSPYECGFDPLGSARLPFSLRFFLIAILFLLFDLEIALLLPLPWGDQLHTPTLTLIWSTAVLALLTLGLIYEWTQGGLEWAE  84/93
Xen  M--TATILMIAMTLSTILAILSFWLPQMTPDMEKLSPYECGFDPLGSMRLPFSMRFFLIAILFLLFDLEIALLLPFPWAAQLNTPSIVILWAALILTLLTLGLIYEWLQGGLEWAE  74/88
Hum  MN-FALILMINTLLALLLMIITFWLPQLNGYMEKSTPYECGFDPMSPARVPFSMKFFLVAITFLLFDLEIALLLPLPWALQTTNLPLMVMSSLLLIIILALSLAYEWLQKGLDWAE  56/82
Sur  MTTIIFLFSITIAVAVLGLAAHALPNRTSDSEKSSPYECGFDPLNSARLPFSFRFFLVAILFLLFDLEIALLFPLPAASLITPPSTLIPISMVFMVILTLGLVFEWINGGLEWAE  55/75

ND4L
Cod  MTPTHFTISSAFLLGMMGLAFHRTHLLSALLCLEAMMLALFIALSLWSLQLDATGCSTAPMLMLAFSACEASAGLALLVATARTHGTDHMQALNLLQC
Rtr  MTPVHFSFTSAFILGLMGLAFHRTHLLSALLCLEGMMLSLFIALSLWALQMEATGTSVAPMLLLAFSACEASAGLALLVATARTHGTDRLQS  82/96
Xen  MTLIHFSFCSAFILGLTGLALNRSPILSILLCLEGMLLMSMDGIVLTPLHLTIYLSSMMLYIMLPFAAPEAATGLSLNSDHYTTHGTDKLFSLNLLEC  46/66
Hum  MPLIYMNIMLAFTISLLGMLVYRSHLMSSLLCLEGMMLSLFIMATLMTLNTHSLLANIVPIAMLVFAACEAAVGLALLVSISNTYGLDYVHNLNLLQC  46/67
Sur  MALL-IVILSMFYLGLMGILLNRLHFLSILLCLELLLISLFIGIAIWNNNTGVPQNTTFNLFVLTLVACEASIGLSLMVGLSRTHSSNLVGSLSLLQT  40/66

Figure 3. Alignments of the gene-derived amino acid sequences of three cod mitochondrial proteins with their counterparts from other animal species. Published sequences used for comparison are rainbow trout (Rtr: 26), Xenopus (Xen: 4), human (Hum: 1), and sea urchin (Sur: 9). Two dots indicate identical aa residues, single dots indicate changes to chemically similar aa residues (see Tab. 1), and dashes indicate gaps. The numbers indicate the percentages of identical aa residues and of identical plus chemically similar aa residues shared by the cod and the other animal protein sequences.

The CSBs are located in the oriH region in the L-domain of the D-loop (32) and seem to be involved in the recognition of primer RNA by RNase MRP (33) during the transition from RNA synthesis to heavy-strand DNA synthesis. CSB-1 is not well conserved between Xenopus and mammals (21); in fact, this sequence motif was not found to be identical between two individuals of the same rat species (30). CSB-1 cannot be identified in cod from sequence alignments with frogs and mammals. Furthermore, we did not detect any sequences in the L-domain capable of folding into cloverleaf structures, as Brown et al. have suggested for other vertebrates (30). Studies of frog and mammalian mtDNA show that H-strand synthesis can be arrested near the 3' end of the D-loop-containing region within putatively stable secondary structures (30,34). These secondary structures include the termination-associated sequence (TAS) and the conserved pentanucleotide 5'-TACAT-3' (34). In cod, a sequence similar to the proposed TAS consensus (35) is found in the R-domain of the D-loop (Fig. 2A). This sequence, which may form a hairpin structure and also contains the conserved pentanucleotide, is a candidate site for D-loop DNA termination. The cod D-loop contains two pyrimidine stretches of 17 and 27 nucleotides (Fig. 2A). These are putative binding sites for the mtSSB protein, which binds specifically to single-stranded pyrimidine regions and is thought to be important in the regulation of DNA replication (36). Two directly repeated octamer sequences were identified in the L-domain, 15 to 30 nt upstream of the 5' end of the tRNAPhe gene. These sequences are nearly identical to the inverted octanucleotide promoter sequence found at the same location in Xenopus mtDNA (37): the sequences are 3'-ACGTTATT-5' and 5'-ACPuTTATA-3' in cod and frog, respectively. Thus, these motifs may be important for transcription initiation in cod mtDNA as well.

The presence of a functional open reading frame (ORF) in the C-domain of mammalian D-loops has been proposed by Saccone et al. (38). We have searched for such an ORF in the cod sequence (Fig. 2A) and in three frog sequences, including two Xenopus D-loop sequences (13,21) and the D-loop sequence from Rana (29). Both strands were searched for ORFs displaying aa sequence homology, but no such ORFs were found. In the R-domain of the cod D-loop there are 4 perfect direct repeats of 40 bp each (Fig. 2A). A 4 bp overlap is seen between the first repeat and the 5' end of the tRNAPro gene. Such direct repeats have not been identified in the R-domain of mammalian D-loops, but they are present in frogs: in Rana catesbeiana there are six perfect direct repeats of 40 bp each (29), and in Xenopus there are two 45 bp repeats (13,21).
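Features such as the 17- and 27-nt pyrimidine stretches and the four perfect 40-bp direct repeats in the D-loop are the kind of pattern simple scans detect. The sketch below shows one such scan for each; the function names, the default minimum run length, and the toy sequences are hypothetical illustrations, not the GCG routines used in this study. The repeat scan reports only perfect head-to-tail copies and returns the leftmost phase of each array.

```python
import re


def pyrimidine_runs(seq: str, min_len: int = 15) -> list[tuple[int, int]]:
    """(start, length) of uninterrupted C/T runs of at least `min_len` nt."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"[CT]{%d,}" % min_len, seq.upper())]


def tandem_repeats(seq: str, unit_len: int) -> list[tuple[int, int]]:
    """(start, copies) of perfect head-to-tail repeats of a `unit_len`-nt unit,
    reporting arrays of two or more copies. The scan skips past each array it
    reports, so the same array is not reported twice."""
    s = seq.upper()
    hits = []
    i = 0
    while i + 2 * unit_len <= len(s):
        unit = s[i:i + unit_len]
        copies = 1
        while s[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
            copies += 1
        if copies >= 2:
            hits.append((i, copies))
            i += copies * unit_len
        else:
            i += 1
    return hits


# Hypothetical toy sequences, not the cod D-loop.
print(pyrimidine_runs("AA" + "CT" * 10 + "GG" + "CCT"))            # [(2, 20)]
print(tandem_repeats("GG" + "ACGTACGTAA" * 3 + "GG", unit_len=10))  # [(2, 3)]
```

Imperfect repeats, such as those that would arise after point mutations in one copy, need an approximate-matching method instead.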

Origin of L-strand replication. The light-strand origin of replication (oriL) is located in a cluster of five tRNA genes. This region contains a stem-loop structure of about 40 nucleotides partially overlapping the tRNACys gene. OriL consists of a stem of 11-12 bp, a loop of 12-13 nucleotides, and a short flanking sequence at the 5' end (Fig. 5). OriL of human mtDNA has been thoroughly studied (39,40). Replication of the L-strand begins with the synthesis of a primer RNA on a T-rich sequence in the loop region. The short 5' flanking sequence is thought to be a recognition site for a mitochondrial RNase H activity and for processing of the primer RNA prior to DNA synthesis. As can be seen from figure 5, the cod oriL shares the overall structure, including the stem-loop and the 5' flanking sequence. This indicates that the mechanism proposed for light-strand replication in mammals may be conserved among all vertebrates. There is one main difference between the cod oriL structure and those from mammals and frog: the conserved T-rich loop sequence is replaced by a C-rich sequence (see Fig. 5). Consequently, the RNA primer is most probably initiated at a run of pyrimidines in vertebrates, and initiation is not restricted to thymines as suggested previously (39). Recently, Welter et al. (41) found a DNA curvature structure in the 3' region of the ND2 gene in human mtDNA, and DNA bending has been suggested to play a regulatory role in the initiation of L-strand replication. We found no such sequence in the 3' region of the cod ND2 gene.
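A stem-loop such as oriL is an inverted repeat: one arm of the stem is the reverse complement of the other, separated by the loop. The brute-force search below is a minimal sketch that finds perfect Watson-Crick stems of a given length closing a loop within a given size range. The function names and the toy hairpin (a 7-bp stem around a C-rich 12-nt loop) are hypothetical; real oriL stems are 11-12 bp and may contain mismatches, which this perfect-match search would miss.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_hairpins(seq: str, stem: int, min_loop: int, max_loop: int) -> list[tuple[int, int]]:
    """(start, loop length) for every perfect Watson-Crick stem of `stem` bp
    closing a loop of min_loop..max_loop nt."""
    s = seq.upper()
    hits = []
    for i in range(len(s) - 2 * stem - min_loop + 1):
        target = revcomp(s[i:i + stem])  # sequence the 3' arm must have
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(s):
                break
            if s[j:j + stem] == target:
                hits.append((i, loop))
    return hits


# Hypothetical toy hairpin: 7-bp stem around a C-rich 12-nt loop.
arm = "GGGCCGA"
toy = "TT" + arm + "C" * 12 + revcomp(arm) + "TT"
for start, loop in find_hairpins(toy, stem=7, min_loop=10, max_loop=14):
    print(start, loop, toy[start + 7:start + 7 + loop])  # 2 12 CCCCCCCCCCCC
```

Printing the loop sequence makes it easy to check its composition, for example whether it is T-rich as in mammals or C-rich as in cod.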

[Figure 4: proposed cloverleaf structures for 12 cod mitochondrial tRNAs, including tRNA-Tyr, -Cys, -Asn, -Ala, -Trp, -Gly, -Arg, -His, -Ser(AGY), -Phe and -Leu(CUN), with the amino-acyl arm, D-arm, anticodon arm and TψC-arm labelled]

Figure 4. Proposed cloverleaf structures of cod mitochondrial tRNAs as inferred from the sequences of the genes. The nucleotides CCA, added by post-transcriptional modification at the 3' end, are included. No modified ribonucleotides are indicated in the figure. Anticodons are boxed.

DISCUSSION

Genome organization of animal mitochondria. Sequence data from mammals, amphibians and now bony fish confirm that the mitochondrial genome organization is conserved among vertebrates. Conservation of gene order and orientation is also seen among arthropods (28,42-44), among echinoderms (9,10,45,46) and among nematodes (25), but small gene rearrangements have been described among members of the same phylum (44,46). It seems that each animal phylum has its own way of organizing the genes encoded in the mitochondria. Possible evolutionary rearrangements between phylum-specific genome organizations have been proposed by Cantatore et al. (12). A possible explanation for the conservation of mitochondrial gene organization within animal phyla can be found in the hypothesis of Kotylak et al. (47), who suggest that the loss of genetic recombination in the mitochondrial genomes of metazoans may be a consequence of the loss of introns and intron-encoded DNA recombinases.

[Figure 5: stem-loop structures at oriL for cod (nt 212-258), Xenopus (nt 7224-7270) and human (nt 5727-5771), with the primer initiation site marked and the putative primer processing site boxed]

Figure 5. Proposed stem-loop structures at the origin of mitochondrial L-strand replication in cod, Xenopus and humans. Nucleotide sequences of the template H-strands are shown, and arrows indicate the proposed direction of L-strand synthesis (39). The putative RNA primer processing site at the 5' end (40) is boxed. The numbers refer to nucleotide positions in the respective mitochondrial genomes (1, 4, and Fig. 2B).

GUG translation initiation codon in COI

DNA and protein sequencing studies have shown that in mammalian mitochondria both AUA and AUU can function as non-AUG initiation codons specifying methionine as the amino-terminal residue (48). GUG is frequently found as an initiation codon in Gram-positive bacteria and is also used to some extent in Gram-negative bacteria. In cod mitochondria, GUG probably has a dual coding function: as an initiation codon specifying methionine and as an internal codon specifying valine. In vertebrate mitochondria there is only one tRNAMet gene, encoding a tRNA with a CAU anticodon (5). In cod this tRNA must recognize both AUG and GUG codons, indicating that an atypical G·U base pair is accepted between the first position of the initiation codon and the third position of the anticodon. Based on observations from sea urchin and rodent mtDNA, Cantatore et al. (10) have suggested that GUG is used as an initiation codon only in genes that start immediately after the end of the preceding gene. This is not the case in cod mtDNA, where one spacer nucleotide separates the 5' end of the tRNATyr gene from the GUG initiation codon of the COI gene (Fig. 2B).

No open reading frames in the vertebrate D-loop region

The presence of a short open reading frame (ORF) in the C-domain of the D-loop has been proposed for mammalian mtDNA by Saccone et al. (38). Amino acid sequence comparisons of the putative ORFs showed a small common region. However, for this ORF to be functional it would have to be decoded differently from the rest of the mitochondrial genome, since it contains the mitochondrial stop codons AGA and AGG in frame. There have been no reports of expression of such an ORF. On comparison, we find that this ORF is not conserved between humans and three different ape species (35). The ape D-loop sequences lack a start codon for the proposed ORF and contain frameshifts relative to the human ORF. Furthermore, since we did not find such putative ORFs in the cod and frog D-loops, we conclude that it is unlikely that this region has any gene coding function in vertebrate mitochondria.
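The argument above turns on the vertebrate mitochondrial genetic code, in which AGA and AGG are terminators rather than arginine codons (and TGA is read as tryptophan). As an illustration only, the minimal Python sketch below scans one strand for ORFs under that code. The function name `find_orfs`, the 30-codon default minimum and the start-codon set (ATG plus the ATA, ATT and GTG initiators discussed in the text) are assumptions made for this example, not the procedure used in this study; a complete search would also scan the reverse complement.

```python
# Minimal forward-strand ORF scan under the vertebrate mitochondrial genetic code.
# Assumption for illustration: sequences are DNA (T, not U), written 5'->3'.

# Vertebrate mitochondrial terminators. AGA and AGG are stops here, whereas
# the standard nuclear code reads them as arginine (and the nuclear stop TGA
# is read as tryptophan in vertebrate mitochondria).
MITO_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})

# Assumed start set: ATG plus the non-ATG initiators discussed in the text
# (ATA and ATT in mammals, GTG for cod COI). Adjust for other analyses.
MITO_STARTS = frozenset({"ATG", "ATA", "ATT", "GTG"})


def find_orfs(seq, min_codons=30, starts=MITO_STARTS, stops=MITO_STOPS):
    """Return a list of (start, end, frame) tuples for ORFs on this strand.

    start: 0-based index of the first base of the start codon.
    end:   index just past the stop codon.
    An ORF runs from the first start codon in a frame to the next in-frame
    stop; it is kept only if it spans at least `min_codons` codons (stop
    excluded). ORFs that run off the end without a stop are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon in starts:
                    start = i
            elif codon in stops:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence: a start codon, five Ala (GCT) codons, then AGA.
    toy = "ATG" + "GCT" * 5 + "AGA"
    # Under the mitochondrial code, AGA terminates the ORF.
    print(find_orfs(toy, min_codons=3))  # [(0, 21, 0)]
    # With standard-code stops, AGA is read through and no closed ORF is found.
    print(find_orfs(toy, min_codons=3, stops={"TAA", "TAG", "TGA"}))  # []
```

The toy comparison mirrors the point made in the text: the same stretch of sequence yields a terminated reading frame under the mitochondrial code but reads through under standard-code stops, which is why an ORF containing in-frame AGA or AGG cannot be translated by the normal mitochondrial machinery.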

R-domain repeats and heteroplasmy

The presence of perfect tandem repeats in the R-domain of the D-loop-containing region in cod and frogs may be linked to some of the mtDNA size heteroplasmies reported in lower vertebrates (7,8,49,50). These heteroplasmies are due to direct tandem duplications in the D-loop-containing region. Notably, this type of heteroplasmy has been reported in the frog Rana esculenta (49), which belongs to the same genus as Rana catesbeiana, where six copies of a 40 bp direct repeat are found. In lizards, mtDNA heteroplasmy has been attributed to the presence of 3-9 copies of a 64 bp tandemly repeated sequence (50) in the D-loop-containing region, but not within the D-loop itself. This agrees with the localization of the repeats at the 3' end, as seen in cod and frogs. Slipped mispairing during replication is the most likely mechanism for variation in the copy number of these tandemly repeated sequences (50).

In conclusion, we have shown that the gene content, genome organization, genetic code, and probably also the mechanism of DNA replication are conserved among vertebrate mitochondrial genomes from bony fish to humans. Since the D-loop region in cod is highly diverged compared to those of mammals and frogs, sequence data must be obtained from more closely related species to identify conserved primary and secondary structure features. Furthermore, experimental identification of the initiation and termination sites of D-loop DNA, as well as of the promoter regions in fish mtDNA, will lead to a greater understanding of the mode of action and evolution of mtDNA replication and transcription among vertebrates in general.
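For the R-domain repeats and size heteroplasmy discussed in this section, the length variants can be read as variation in a single integer: the number of perfect copies of the repeat unit. Each slippage event that adds or removes one copy shifts the length of the repeat array by exactly one unit length. The minimal Python sketch below illustrates this arithmetic. It assumes the repeat unit and the position of its first copy are already known, and the helper names `tandem_copies` and `array_lengths` are hypothetical, chosen for this example only.

```python
# Minimal sketch: count perfect tandem copies of a known repeat unit and
# relate copy number to the length of the repeat array.
# Assumption: the repeat unit and the position of its first copy are known.


def tandem_copies(seq, unit, start=0):
    """Count perfect, directly adjacent copies of `unit` starting at `start`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    count = 0
    pos = start
    while seq.startswith(unit, pos):
        count += 1
        pos += len(unit)
    return count


def array_lengths(unit_len, min_copies, max_copies):
    """Map each copy number in [min_copies, max_copies] to array length in bp.

    Gaining or losing one copy (e.g. by slipped mispairing) changes the
    length by exactly unit_len.
    """
    return {k: k * unit_len for k in range(min_copies, max_copies + 1)}


if __name__ == "__main__":
    # Toy sequence: 3 bp of flank, four copies of a 7 bp unit, 2 bp of flank.
    toy = "CCA" + "GATTACA" * 4 + "TT"
    print(tandem_copies(toy, "GATTACA", start=3))  # 4

    # Lizard figures from the text: 3-9 copies of a 64 bp repeat (50).
    lengths = array_lengths(64, 3, 9)
    print(lengths[3], lengths[9])  # 192 576
```

With the lizard figures, the repeat array alone would range from 192 to 576 bp, so heteroplasmic variants differing only in copy number would be expected to differ in size by multiples of 64 bp.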

ACKNOWLEDGEMENTS

We thank Dr. Zbigniew Kotylak and Dr. Finn B. Haugli for critically reading this manuscript. This work was supported by a grant from the Norwegian Research Council for Science and the Humanities to T.J.

REFERENCES

1. Anderson,S., Bankier,A.T., Barrell,B.G., deBruijn,M.H.L., Coulson,A.R., Drouin,J., Eperon,I.C., Nierlich,D.P., Roe,B.A., Sanger,F., Schreier,P.H., Smith,A.J.H., Staden,R. and Young,I.G. (1981) Nature, 290,457-465.
2. Anderson,S., deBruijn,M.H.L., Coulson,A.R., Eperon,I.C., Sanger,F. and Young,I.G. (1982) J. Mol. Biol., 156,683-717.
3. Bibb,M.J., Van Etten,R.A., Wright,C.T., Walberg,M.W. and Clayton,D.A. (1981) Cell, 26,167-180.
4. Roe,B.A., Ma,D., Wilson,R.K. and Wong,J.F. (1985) J. Biol. Chem., 260,9759-9774.
5. Cantatore,P. and Saccone,C. (1987) Int. Rev. Cytol., 108,149-208.
6. Clayton,D.A. (1982) Cell, 28,693-705.
7. Bermingham,E., Lamb,T. and Avise,J.C. (1986) J. Hered., 77,249-252.
8. Bentzen,P., Leggett,W.C. and Brown,G.G. (1988) Genetics, 118,509-518.
9. Jacobs,H.T., Elliott,D.J., Math,V.B. and Farquharson,A. (1988) J. Mol. Biol., 202,185-217.
10. Cantatore,P., Roberti,M., Rainaldi,G., Gadaleta,M.N. and Saccone,C. (1989) J. Biol. Chem., 264,10965-10975.
11. Margulis,L. and Schwartz,K.V. (1982) Five Kingdoms: An Illustrated Guide to the Phyla of Life on Earth. Freeman, San Francisco.
12. Cantatore,P., Gadaleta,M.N., Roberti,M., Saccone,C. and Wilson,A.C. (1987) Nature, 329,853-855.
13. Dunon-Bluteau,D., Volovitch,M. and Brun,G. (1985) Gene, 36,65-78.
14. Johansen,S. (1988) DNA Prot. Engin. Techn., 1,77-79.
15. Mork,J., Ryman,N., Ståhl,G., Utter,F. and Sundnes,G. (1985) Can. J. Fish. Aquat. Sci., 42,1580-1587.
16. Yanisch-Perron,C., Vieira,J. and Messing,J. (1985) Gene, 33,103-119.
17. Johansen,T. (1988) DNA Prot. Engin. Techn., 1,57-59.
18. Devereux,J., Haeberli,P. and Smithies,O. (1984) Nucleic Acids Res., 12,387-395.
19. Harr,R., Fällman,P., Haggstrom,M., Wahlstrom,L. and Gustafson,P. (1986) Nucleic Acids Res., 14,273-284.
20. Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA, 74,5463-5467.
21. Cairns,S.S. and Bogenhagen,D.F. (1986) J. Biol. Chem., 261,8481-8487.
22. Gilbert,T.L., Brown,J.R., O'Hara,P.J., Buroker,N.E., Beckenbach,A.T. and Smith,M.J. (1988) Nucleic Acids Res., 16,11825.
23. Kocher,T.D., Thomas,W.K., Meyer,A., Edwards,S.V., Paabo,S., Villablanca,F.X. and Wilson,A.C. (1989) Proc. Natl. Acad. Sci. USA, 86,6196-6200.
24. Nagley,P. (1988) Trends Genet., 4,46-52.
25. Wolstenholme,D.R., MacFarlane,J.L., Okimoto,R., Clary,D.O. and Wahleithner,J.A. (1987) Proc. Natl. Acad. Sci. USA, 84,1324-1328.
26. Thomas,W.K. and Beckenbach,A.T. (1989) J. Mol. Evol., 29,233-245.
27. Gadaleta,G., Pepe,G., De Candia,G., Quagliariello,C., Sbisa,E. and Saccone,C. (1988) Nucleic Acids Res., 16,6233.
28. Clary,D.O. and Wolstenholme,D.R. (1985) J. Mol. Evol., 22,252-271.
29. Yoneyama,Y. (1987) Nippon Ika Daigaku Zasshi, 54,429-440.
30. Brown,G.G., Gadaleta,G., Pepe,G., Saccone,C. and Sbisa,E. (1986) J. Mol. Biol., 192,503-511.
31. Mignotte,B., Dunon-Bluteau,D., Reiss,C. and Mounolou,J. (1987) J. Theor. Biol., 124,57-69.
32. Walberg,M.W. and Clayton,D.A. (1981) Nucleic Acids Res., 9,5411-5421.
33. Chang,D.D., Fisher,R.P. and Clayton,D.A. (1987) Biochim. Biophys. Acta, 909,85-91.
34. Dunon-Bluteau,D.C. and Brun,G.M. (1987) Biochem. Int., 14,643-657.
35. Foran,D.R., Hixson,J.E. and Brown,W.M. (1988) Nucleic Acids Res., 16,5841-5861.
36. Mignotte,B., Barat,M. and Mounolou,J. (1985) Nucleic Acids Res., 13,1703-1716.
37. Bogenhagen,D.F. and Yoza,B.K. (1986) Mol. Cell. Biol., 6,2543-2550.
38. Saccone,C., Attimonelli,M. and Sbisa,E. (1987) J. Mol. Evol., 26,205-211.
39. Wong,T.W. and Clayton,D.A. (1985) Cell, 42,951-958.
40. Hixson,J.E., Wong,T.W. and Clayton,D.A. (1986) J. Biol. Chem., 261,2384-2390.
41. Welter,C., Dooley,S., Zang,K.D. and Blin,N. (1989) Nucleic Acids Res., 17,6077-6086.
42. Garesse,R. (1988) Genetics, 118,649-663.
43. Batuecas,B., Garesse,R., Calleja,M., Valverde,J.R. and Marco,R. (1988) Nucleic Acids Res., 16,6515-6529.
44. Haucke,H. and Gellissen,G. (1988) Curr. Genet., 14,471-476.
45. Himeno,H., Masaki,H., Kawai,T., Ohta,T., Kumagai,I., Miura,K. and Watanabe,K. (1987) Gene, 56,219-230.
46. Smith,M.J., Banfield,D.K., Doteval,K., Gorski,S. and Kowbel,D.J. (1989) Gene, 76,181-185.
47. Kotylak,Z., Lazowska,J. and Slonimski,P.P. (1985) In Quagliariello,E., Slater,E.C., Palmieri,F., Saccone,C. and Kroon,A.M. (eds), Achievements and Perspectives of Mitochondrial Research. Elsevier, Amsterdam, Vol. 2, pp. 1-21.
48. Fearnley,I.M. and Walker,J.E. (1987) Biochemistry, 26,8247-8251.
49. Monnerot,M., Mounolou,J. and Solignac,M. (1984) Biol. Cell, 52,213-218.
50. Densmore,L.D., Wright,J.W. and Brown,W.M. (1985) Genetics, 110,689-707.
51. French,S. and Robson,B. (1983) J. Mol. Evol., 19,171-175.
