J Mol Evol (1991) 33:83-91
Journal of Molecular Evolution © Springer-Verlag New York Inc. 1991
The Main Regulatory Region of Mammalian Mitochondrial DNA: Structure-Function Model and Evolutionary Pattern
Cecilia Saccone,1 Graziano Pesole,1 and Elisabetta Sbisà2
1 Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Italy
2 Centro di Studio sui Mitocondri e Metabolismo Energetico, CNR, Bari, Italy
Summary.
The evolution of the main regulatory region (D-loop) of the mammalian mitochondrial genome was analyzed by comparing the sequences of eight mammalian species: human, common chimpanzee, pygmy chimpanzee, dolphin, cow, rat, mouse, and rabbit. The best alignment of the sequences was obtained by optimization of the sequence similarities common to all these species. The two peripheral left and right D-loop domains, which contain the main regulatory elements so far discovered, evolved rapidly in a species-specific manner, generating heterogeneity in both length and base composition. They are prone to the insertion and deletion of elements and to the generation of short repeats by replication slippage. However, the preservation of some sequence blocks and similar cloverleaf-like structures in these regions indicates a basic similarity in the regulatory mechanisms of the mitochondrial genome in all mammalian species. We found, particularly in the right domain, significant similarities to the telomeric sequences of the mitochondrial (mt) and nuclear DNA of Tetrahymena thermophila. These sequences may be interpreted as relics of telomeres present in ancestral linear forms of mtDNA or may simply represent efficient templates of RNA primase-like enzymes. Due to their peculiar evolution, the two peripheral domains cannot be used to estimate in a quantitative way the genetic distances between mammalian species. On the other hand, the central domain, highly conserved during evolution, behaves as a good molecular clock. Reliable estimates of the times of divergence between closely and distantly related species were obtained from the central domain using a Markov model and assuming nonhomogeneous evolution of nucleotide sites.
Offprint requests to: C. Saccone
Key words: Mammalian mitochondrial DNA -- Origin of replication -- Mitochondrial DNA evolution -- Stationary Markov model -- Phylogenetic tree -- Telomeres -- D-loop -- Regulatory region
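As a rough illustration of the kind of pairwise comparison that underlies a distance estimate from aligned sequences, the sketch below counts differing sites between two aligned fragments, skipping gapped positions, and applies the Jukes-Cantor correction. This is a deliberately simpler, swapped-in model and not the stationary Markov model with nonhomogeneous sites referred to in the Summary. The function names and the toy sequences are assumptions made up for this example; they are not taken from the paper or its alignment. Converting such a distance into a divergence time would additionally require a calibration point, which is not shown.

```python
import math

GAP = "-"


def p_distance(seq_a, seq_b):
    """Fraction of differing sites among aligned positions where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == GAP or y == GAP:
            continue  # insertions/deletions are ignored in this simple sketch
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return differing / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (substitutions per site) for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy aligned fragments, invented for illustration only (hypothetical data).
toy_a = "ACGTAC-GTACGTTAG"
toy_b = "ACGTTCAGTAC-TTGG"

p = p_distance(toy_a, toy_b)
d = jukes_cantor(p)
print(f"p-distance = {p:.4f}, Jukes-Cantor distance = {d:.4f}")
```

Because a correction of this type assumes that all sites evolve at the same rate, it would be expected to behave poorly on the fast-evolving peripheral domains, where insertions, deletions, and repeats dominate; it is shown here only to make the notion of a pairwise distance concrete.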
Introduction
The presence of only one major noncoding segment in the mitochondrial genome is a feature common to all metazoa. In vertebrates this region, spanning between the Phe- and Pro-tRNA genes, is called the D-loop-containing region because of the three-stranded displacement (D) loop structure created by the nascent heavy (H) strand at the level of the H-strand replication origin (OH). It also contains promoters for the transcription of both the heavy strand (HSP) and the light strand (LSP). This region is the target site for numerous proteins and enzymes, such as DNA and RNA polymerases and transcription and regulatory factors, and is thus subjected to various evolutionary pressures. Because all these proteins are coded for by nuclear DNA, the study of the D-loop-containing region is also extremely important for shedding light on the processes inherent in nucleus-mitochondrion coevolution. In order to gain deeper insight into the evolutionary dynamics of the noncoding region of the mammalian mitochondrial genome, we undertook a detailed investigation of its evolution at the molecular level. In previous papers we have identified several well-preserved features in the evolution of
[Figure: alignment of the D-loop-containing region sequences of the eight species, labeled PYG (pygmy chimpanzee), COM (common chimpanzee), MAN (human), DOL (dolphin), COW (cow), RAT (rat), MUS (mouse), and RAB (rabbit), with positions numbered along the alignment (10 to 940 in this part).]