Nucleotide sequence of the arrestin-like 49 Kd protein gene of Drosophila miranda.

5894 Nucleic Acids Research, Vol. 18, No. 19

Rajesh Krishnan and Ranjan Ganguly*

Department of Zoology and Cell, Molecular and Developmental Biology Program, University of Tennessee, Knoxville, TN 37996, USA

Submitted August 14, 1990

EMBL accession no. X54084

We report the genomic nucleotide sequence of the D. miranda homologue of gene 507, which encodes a 49 Kd photoreceptor-cell-specific protein in D. melanogaster (1, 2). This protein undergoes light-activated phosphorylation and shares 42% amino acid identity with vertebrate arrestin (1), which is involved in the vertebrate visual phototransduction pathway (3). Gene 507 is different from the gene mapped at locus 36D of D. melanogaster, whose product shows 40% amino acid identity with vertebrate arrestin (4, 5). D. melanogaster and D. miranda diverged 45 million years ago and, in the course of this divergence, gene 507, which is autosomal (66D) in D. melanogaster (6), has become X-linked (12A) in D. miranda (2). Consequently, this gene has been subjected to additional selection pressures brought on by its new X-linked and dosage-compensated status in D. miranda (2). Viewed against this background, the coding regions of gene 507 in D. miranda and D. melanogaster are remarkably conserved, showing 87% homology at the nucleic acid level and 98% similarity at the amino acid level. None of the amino acid changes occur in domains thought to be similar in function to the vertebrate homologue (1). Six of the amino acid substitutions are in exon I and one is in exon II. Shown below is the genomic sequence of gene 507 in D. miranda. The translational start and stop codons, identified by similarity to the D. melanogaster sequence, are located at nucleotides 139 and 1469, respectively, and are underlined. Also underlined are a TATA-like element and a putative polyadenylation signal at positions 17 and 1657, respectively. We have not determined the exact transcriptional start site. Two arrowheads show the positions of the boundaries of the two introns, whose sequences are underlined.
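The conservation figures quoted above are percentages of matching positions between aligned sequences. The note does not say how they were computed, so the following is only a minimal sketch of one common way to obtain such a number: counting identical residues over the ungapped columns of a pairwise alignment. The function name `percent_identity`, the gap symbol and the two toy sequences are hypothetical and are not taken from gene 507.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns in which both sequences agree.

    Assumes seq_a and seq_b are already aligned (equal length, gaps as `gap`).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip columns where either sequence has a gap.
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy sequences, for illustration only.
toy_a = "ATGGTCGTCA"
toy_b = "ATGGTGGTTA"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

The same function works on aligned protein sequences. Note that a "similarity" score at the amino acid level may also count conservative substitutions as matches, which this sketch does not do.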

ACKNOWLEDGEMENTS

We thank Dr D.Brian, Dr P.Sethna and Dr S.Abraham for use of the Beckman Microgenie program. This work was supported by the University of Tennessee Start-up Research Fund and a Faculty Research Award to R.G.

REFERENCES

1. Yamada,T. et al. (1990) Science 248, 483-486.
2. Krishnan,R., Swanson,K. and Ganguly,R. (1990) Chromosoma, in press.
3. Kuhn,H. and Wilden,U. (1987) J. Recept. Res. 7, 283-298.
4. Hyde,D. et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1008-1012.
5. Smith,D., Shieh,B. and Zuker,C. (1990) Proc. Natl. Acad. Sci. USA 87, 1003-1007.
6. Levy,L.S., Ganguly,R., Ganguly,N. and Manning,J.E. (1982) Dev. Biol. 94, 451-464.

[Figure: genomic nucleotide sequence of gene 507 of D. miranda, with nucleotide positions marked every 10 bases.]

* To whom correspondence should be addressed
