Nucleic Acids Research, Vol. 18, No. 7 1731

The chorion genes of the medfly, Ceratitis capitata, I: structural and regulatory conservation of the s36 gene relative to two Drosophila species

Mary Konsolaki1, Katia Komitopoulou2, Peter P. Tolias3, Dennis L. King3, Candace Swimmer3 and Fotis C. Kafatos1,3,*

1Institute of Molecular Biology and Biotechnology, Research Center of Crete, PO Box 1527 and Department of Biology, University of Crete, Heraclio 71110, Crete, Greece, 2Department of Biochemistry, Cell and Molecular Biology and Genetics, University of Athens, Panepistimiopolis, Kouponia, Athens 15701, Greece and 3Department of Cellular and Developmental Biology, The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA

EMBL accession nos X51342, X51343

Received December 22, 1989; Revised and Accepted March 1, 1990

ABSTRACT

We have used low stringency screening with the Drosophila melanogaster s36 chorion gene to recover its homologue from genomic and cDNA libraries of the medfly, Ceratitis capitata. The same gene has also been recovered from a genomic library of D. virilis. The medfly s36 gene shows a developmental specificity similar to that in Drosophila (early choriogenesis). It is also specifically amplified in ovarian follicles; this is the first report of chorion gene amplification outside the genus Drosophila. Alignments of s36 sequences from three species show that, in addition to its regulatory conservation, the s36 gene is extensively conserved in sequence, in a region corresponding to a central protein domain, and in short regions of 5' flanking DNA that might correspond to cis-regulatory elements.

INTRODUCTION

Our laboratories have recently undertaken a detailed study of the molecular evolution of dipteran chorion genes. In Drosophila melanogaster, the major genes of this small family are organized in two clusters, one on the X and the other on the third chromosome (1,2,3). They are expressed with strict developmental specificity, exclusively in the follicular epithelium, within a limited period during chorion formation. Both clusters are differentially amplified in the follicle cells, ensuring production of adequate amounts of proteins in the short time available for choriogenesis (4).

The structural and regulatory evolution of the autosomal cluster has been examined in four distantly related species of the genus Drosophila (5,6). The regulatory properties of this cluster have remained remarkably constant: gene amplification occurs, and each gene is expressed with similar temporal specificities in all four species. Furthermore, the overall organization of the cluster has remained constant, with the same four genes maintained in tandem, in the same order and with similar spacing. In strong contrast, the transcription units show substantial diversification in sequence; however, in the 5' end of each gene and in the proximal 5' flanking DNA, islands of sequence conservation have been noted which may correspond to cis-regulatory genetic elements.

In the present report, we extend these studies in two respects. First, we examine a chorion gene and its surrounding DNA from the X-linked rather than the autosomal cluster; preliminary cross-hybridization studies had suggested that the X-linked cluster may be somewhat more conservative in terms of sequence evolution (J.C. Martínez, pers. commun.). Second, we examine this gene both in two species of the genus Drosophila and in the Mediterranean fruitfly (medfly), Ceratitis capitata, a member of a different dipteran family (Tephritidae); this distant comparison tests the generality of the pattern of sequence conservation and divergence that is prevalent in the Drosophilidae.

MATERIALS AND METHODS

Screening of libraries

The medfly genomic library was a gift of M. Rina and C. Savakis (IMBB, Crete), and the D. virilis genomic library was prepared by Ron Blackman (Harvard University). The medfly cDNA library was constructed using total RNA from hand-dissected ovaries, as described previously (7). All libraries were screened using standard procedures (8,9), at 50°C (medfly genomic library), 60°C (D. virilis genomic library), or 65°C (medfly cDNA library). All probes were gel purified and nick-translated (10), except for a synthetic oligonucleotide which was end-labelled (8). All restriction mapping and subcloning was done using standard procedures. Southern blot analysis was performed on Gene-Screen Plus nylon membrane (DuPont), using the hybridization conditions described in (9).

* To whom correspondence should be addressed at Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA

Sequence analysis and alignments

C. capitata cDNA and D. virilis genomic restriction fragments were subcloned into M13mp18 and M13mp19 vectors, and the DNA was sequenced using the chain termination procedure (11) with 35S-dATP (12) and Sequenase (13), in approximately 40 cm gradient gels. Overlapping restriction fragments of C. capitata genomic DNA were subcloned into the pBluescript II KS+ vector (Stratagene), and single-stranded DNA was prepared using R408 helper phage (14) and sequenced as above. Each strand was sequenced at least three times, and the data were analyzed using computer programs (15,16,17,18).

For the protein alignments in Fig. 3, small deletions were allowed only when they resulted in two-way matches of at least two amino acids or three-way matches of at least one amino acid. The DNA alignments in Fig. 4 were established according to the following rules. Perfect matches of at least five, or five out of six, consecutive nucleotides were identified by computer and used as primary anchor points; these were extended to encompass contiguous two-way matches and boxed. Three-way matches of three nucleotides, three-way matches of four out of six nucleotides, or two-way matches of nine nucleotides were used as secondary anchor points. Between anchor points, insertions/deletions were positioned following the rules described in (5).

RNA blot hybridization

Total RNA was prepared using the protocol described in (7), from males, ovariectomized females, ovaries, and from follicles individually staged according to morphological criteria analogous to those used in Drosophila (19). After electrophoresis on 1.4% agarose gels, RNA was blotted onto nylon membrane (Bio-Rad Zetaprobe) by the alkaline method (20), modified for RNA (the gel was not acid treated prior to blotting and the NaOH concentration was reduced to 25 mM). The filter was hybridized (9) with nick-translated probe as above, at 65°C.
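The primary anchor rule in the alignment procedure (perfect matches of at least five consecutive nucleotides shared by the sequences) can be illustrated with a short Python sketch. This is not the program used in the study (refs 15-18); the function name `shared_kmers` and the toy sequences are hypothetical, and the five-out-of-six rule and the secondary anchors are left out for brevity.

```python
def shared_kmers(seqs, k=5):
    """Find exact k-mers present in every sequence (candidate primary anchors).

    Returns {kmer: [positions_in_seq0, positions_in_seq1, ...]} with
    0-based start positions. Hypothetical helper, not the authors' software.
    """
    indexes = []
    for s in seqs:
        s = s.upper()
        positions = {}
        for i in range(len(s) - k + 1):
            positions.setdefault(s[i:i + k], []).append(i)
        indexes.append(positions)
    # Keep only the k-mers that occur in all sequences.
    common = set(indexes[0]).intersection(*indexes[1:])
    return {kmer: [idx[kmer] for idx in indexes] for kmer in sorted(common)}


# Toy sequences (invented for illustration), each containing TCACGT.
toy = ["AATCACGTGG", "CCTCACGTAA", "GTCACGTTTT"]
anchors = shared_kmers(toy, k=5)
for kmer, where in anchors.items():
    print(kmer, where)
```

Overlapping hits such as the two 5-mers found here would then be merged into one extended anchor, which corresponds to the extension step described in the text.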

Amplification analysis

Genomic DNAs from male flies, ovariectomized females and hand-dissected ovaries were prepared as described in (21). After restriction digestion with HindIII, DNA was blotted on a nitrocellulose filter and hybridized as above, at 65°C. Both probes, the plasmid containing Ccs36 cDNA and a fragment of the C. capitata PS2α integrin gene, were used simultaneously.

RESULTS

Conservation and diversification of the s36 gene in D. virilis, C. capitata and D. melanogaster

Genomic libraries of D. virilis and the medfly, C. capitata, constructed in phage vectors, were screened at low to moderate stringency for homologues of the single-copy s36 chorion gene of D. melanogaster. Several positive recombinant clones were isolated from each library, and subjected to Southern hybridization with the same probe. They defined a single locus in each species, as diagrammed in Fig. 1. In D. virilis polytene chromosomes, the locus maps to 13B-C (nomenclature as in ref. 22), in the chromosome equivalent to the X chromosome of D. melanogaster. A 1.6 kb BglII-XbaI segment of the D. virilis DNA was selected by cross-hybridization to the D. melanogaster probe, subcloned into M13 vectors and sequenced by the dideoxynucleotide procedure. Fig. 2 shows the nucleotide sequence, spanning positions -228 to +1400 relative to the transcription start site,

[Figure 1 graphic: restriction maps with length scales in kb. Upper panel, C. capitata (genomic and cDNA clones); lower panel, D. virilis (genomic clones).]

Figure 1. Structure and partial restriction map of genomic and cDNA clones encompassing the Ccs36 and Dvs36 genes. At the top of each panel is a length scale with the coding parts of the exons indicated by filled boxes. Transcription is from left to right. Below, genomic clones are diagrammed, and their sequenced regions are shaded. In the upper panel, the cDNA sequence is also diagrammed with shaded boxes; it extends 19 nucleotides downstream of the polyadenylation signal (AATAAA, Fig. 2), after which an approximately 90-100 nucleotide long poly-A tail is found. Open boxes underneath the cDNA clone represent the 40-mer synthetic oligonucleotide that was used to map the 5' end in a genomic clone, and the 500 base pair KpnI fragment that was used as probe for the RNA blot. Restriction sites shown are: AccI (A), BamHI (B), BglII (Bg), ClaI (C), EcoRI (E), HindIII (H), KpnI (K), SalI (S), XbaI (X).

as determined by sequence homology to the D. melanogaster s36 gene.

Similarly, the same probe was used to select by cross-hybridization a 1.6 kb EcoRI genomic DNA fragment from the C. capitata locus (Fig. 1). That fragment was used as a homospecific probe to select from a medfly ovarian cDNA library an s36 cDNA clone, which proved to be full-length (see below). The cDNA sequence was determined, and a synthetic 40-mer oligonucleotide corresponding to the 5' end was used to locate the 5' end of the s36 gene plus upstream sequences in genomic DNA. The sequence of 1523 bp of C. capitata genomic DNA, from -623 to +900, was then determined. The cDNA and genomic sequences together define the sequence of the medfly locus, which is also presented in Fig. 2; 742 bp of that sequence was obtained from both cDNA and genomic DNA, which were identical. We have also sequenced independently the s36 gene of D. melanogaster and obtained a sequence identical to that reported in (23), from 642 to 2544 (their numbering).

Conceptual translation of the sequences and comparison of the encoded proteins showed remarkable similarities in all three species (Fig. 3, and see Discussion). These similarities, and the single-copy nature of the DNA, confirmed that the D. virilis and C. capitata transcription units are orthologous and correspond to the D. melanogaster s36 chorion gene. We shall refer to these genes as Dvs36, Ccs36 and Dms36, respectively, and the encoded proteins as Dvs36, Ccs36 and Dms36.

[Figure 2 graphic: nucleotide sequence listings with the deduced amino acid sequences printed beneath the coding regions. Upper panel, C. capitata s36 (numbered from -623); lower panel, D. virilis s36 (numbered from -228). See the EMBL accession numbers listed with the title.]

Figure 2. Complete nucleotide and deduced protein sequences of the Ccs36 (upper panel) and Dvs36 (lower panel) loci. Arrows indicate transcription start sites, identified by genomic-cDNA sequence comparisons in the case of C. capitata and cap site homologies to D. melanogaster in the case of D. virilis (see text). Dots indicate putative TATA boxes at the 5' end and putative polyadenylation signals at the 3' end of both sequences. Asterisks indicate the TCACGT hexamer. The Dvs36 sequence is from genomic DNA. For Ccs36, the 5' flanking and intron sequences are from genomic DNA, and the exons are from the cDNA clone. The entire first exon and 631 nucleotides of the second exon were also sequenced from genomic DNA, and were identical to the cDNA.

[Figure 3 graphic: three-way alignment of the deduced Ccs36, Dvs36 and Dms36 protein sequences.]

Figure 3. Alignment of the deduced s36 protein sequences. Upper case letters indicate amino acids that are identical in at least two species. Residues identical in all three species are shaded. Dashes correspond to putative deletions and dots above the sequences mark spaces of ten aligned positions. The relative numbering of amino acids appears on the left of the sequences. Note that the 139 amino acid region between residues 80 and 219 in C. capitata shows no insertions/deletions and includes long segments that are identical in all three species (5 to 40 residues long). Note also that the C-terminal portion of the proteins is highly divergent and is enriched in tyrosines. The species codes for the homologous s36 proteins are: Ccs36, C. capitata; Dvs36, D. virilis; and Dms36, D. melanogaster.

[Figure 4 graphic: three-way alignment of the Ccs36, Dvs36 and Dms36 5' flanking and 5' untranslated sequences, with the TATA box labelled.]

Figure 4. Sequence comparison of the 5' flanking (untranscribed) and untranslated regions of the s36 gene. All sequences end at the putative start of translation. Elements that consist of at least five consecutive or five out of six consecutive invariant nucleotides are boxed and served as anchor points for the alignment of the rest of the sequences (see text). Invariant bases in all three species are shaded and invariant bases in at least two species are shown in upper case letters. Dots correspond to putative deletions and are positioned following the rules described in the text. Asterisks indicate the TCACGT hexamer and the numbers in this box correspond to the relative positions in C. capitata (upper) and D. melanogaster (lower). Numbers to the left of the sequences are relative to the start of transcription, which is indicated by a black arrow. White arrows delimit a DNA fragment which is sufficient to support s36-like expression of a lacZ reporter gene in transgenic D. melanogaster (40). The species codes for the homologous s36 genes are: Ccs36, C. capitata; Dvs36, D. virilis; and Dms36, D. melanogaster.

[Figure 5 graphic: RNA blot; follicle lanes are labelled by stage: 1-9, 10, 11, 12, 13, 14.]

Figure 5. RNA blot hybridization analysis of Ccs36, showing the tissue and temporal specificity of expression of the gene. Total RNA was prepared from ovaries (ov.), ovariectomized females (♀), males (♂) and individually staged follicles. The KpnI fragment that was used as a homospecific probe is described in Fig. 1. Note that expression in staged follicles, which is restricted to stages 12 and 13, is comparable to the developmental profile observed in the two Drosophila species (see text).

[Figure 6 graphic: Southern blot; the Ccs36 band is labelled.]

Figure 6. Amplification of the Ccs36 chorion locus. Genomic DNA was isolated from males (♂), ovariectomized females (♀) and ovaries (ov.), restricted with HindIII, fractionated by electrophoresis, blotted on nitrocellulose and hybridized. Equal amounts of DNA were loaded in all three lanes and two probes of different specific activity were used simultaneously: the plasmid containing Ccs36 cDNA hybridized to a 9 kb band that was specifically amplified in the ovaries, and the HindIII-XhoI fragment from PS2α hybridized to a 5.5 kb band that is equally intense in all three lanes. A double, unamplified band of ca. 3 kb that also appears on the blot is discussed in the text.

Despite the protein similarities, the non-coding DNA shows substantial divergence, with only a few islands of conserved sequence elements (Fig. 4). In all three species the s36 gene has a single intron near the 5' end, preceded only by 14 to 16 codons corresponding to part of the signal peptide sequence, plus 31 to 68 nt of 5' untranslated region. In all three species the cap site sequence is similar (AAGTAGTA); it has been identified by primer extension in D. melanogaster (24), by homology in D. virilis, and in C. capitata by its presence in the cDNA sequence, immediately preceded by a G which is not encoded in the DNA but results from attempted reverse transcription of the methylated cap (N. Brown, pers. commun.). The cap site is preceded in the DNA, at a canonical distance, by a TATA box; in D. virilis a second TATA box is found 26 nt farther upstream. Most interestingly, all three species possess nearby a single copy of the TCACGT hexamer (-69 in D. melanogaster, -106 in D. virilis and -62 in C. capitata). This motif is found in all chorion genes characterized to date in four widely divergent Drosophila species (25,26,23,5,6), as well as in most moth chorion promoters (27,28,29,30). A more detailed analysis of the s36 DNA sequence evolution is presented in the Discussion.
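As a sketch of how the hexamer coordinates quoted above can be checked, the Python snippet below locates TCACGT in a short stretch of the Ccs36 5' flanking DNA as printed in Fig. 2 (the row beginning at -73). The function name `motif_positions` is hypothetical, only the forward strand is searched, and the convention that numbering runs ..., -2, -1, +1, +2, ... with no position 0 is an assumption.

```python
def motif_positions(seq, motif, first_pos):
    """Return the coordinate of each exact forward-strand occurrence of motif.

    first_pos is the coordinate of seq[0] relative to the transcription
    start site. Assumption: there is no position 0 (-1 is followed by +1).
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    i = seq.find(motif)
    while i != -1:
        pos = first_pos + i
        if first_pos < 0 and pos >= 0:
            pos += 1  # step over the non-existent position 0
        hits.append(pos)
        i = seq.find(motif, i + 1)
    return hits


# First 30 nt of the Fig. 2 row labelled -73 (C. capitata).
fragment = "GAATTCCCGGATCACGTACGTAAGAACTCA"
print(motif_positions(fragment, "TCACGT", first_pos=-73))  # [-62]
```

The result agrees with the position of -62 reported for C. capitata.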

Conservatism of developmental regulation of the s36 gene

The conservatism of a major portion of the C. capitata s36 protein, relative to Drosophila, contrasting with the substantial diversification of the 5' flanking DNA, raised the question of how extensively the regulation of s36 was conserved during evolution. A key aspect of chorion gene regulation is the extreme tissue and temporal specificity of expression into mRNA. This specificity is very similar in D. melanogaster and D. virilis: in both species the s36 gene is expressed only in ovarian follicles, during the early stages of chorion formation (stages 11-13 of oogenesis; ref. 1,6). To determine the developmental profile of its expression in C. capitata, we assayed for s36 transcripts in RNA from individually staged follicles, by electrophoresis and blot hybridization using a 0.5 kb KpnI fragment of the Ccs36 coding region as probe. As shown in Fig. 5, in this species also, transcripts are not detected in adult males or females without ovaries, and expression is strictly limited to early choriogenesis: Ccs36 transcripts are absent from prechoriogenic follicles, reach maximum abundance at stage 12, and decline steeply at stage 13. The staging criteria used for different species are consistent, although the exact correspondence cannot be established, because of some differences in morphological markers of developing follicles. Thus, temporal as well as tissue specificity of s36 expression is essentially indistinguishable between medfly and the two Drosophila species.

A second key feature of chorion regulation is the differential amplification of all major chorion genes, exclusively in the follicular epithelial cells, shortly before and during choriogenesis (4). Amplification of the autosomal chorion cluster occurs in all Drosophila species examined to date, including both D. melanogaster and D. virilis (5). Similarly, we have shown that the s36 gene is amplified in D. virilis as well as in D. melanogaster (data not shown). To examine whether chorion genes are amplified in the medfly, HindIII-digested genomic DNAs from males, ovariectomized females and whole ovaries were electrophoresed and blotted on a nitrocellulose filter; the filter was then probed with the plasmid containing the Ccs36 cDNA and a single-copy control probe, a HindIII-XhoI fragment of the medfly PS2α integrin gene (31). Results are shown in Fig. 6. A 9 kb band, encompassing Ccs36, was amplified at least tenfold in ovarian DNA, as compared to DNA from males and ovariectomized females. The specificity of amplification was confirmed by the constancy of the 5.5 kb PS2α band, which also verified the equal loading of DNA in different lanes. An unamplified 3 kb band was also observed; it hybridized weakly with the chorion probe, and may correspond to a duplicated, nonamplifying minor chorion gene, or alternatively may result from fortuitous sequence similarity.
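One common way to turn such a blot into a fold-amplification estimate is to normalize the chorion band to the single-copy control band within each lane and then compare lanes. The Python sketch below illustrates that arithmetic only; the paper does not state how the tenfold estimate was obtained, the function name `fold_amplification` is hypothetical, and the band intensities are invented numbers, not measurements from Fig. 6.

```python
def fold_amplification(target, control, ref_target, ref_control):
    """Ratio of control-normalized target signal: test lane over reference lane.

    target, control:         band intensities in the test lane (e.g. ovary DNA)
    ref_target, ref_control: band intensities in the reference lane (e.g. male DNA)
    """
    if min(target, control, ref_target, ref_control) <= 0:
        raise ValueError("band intensities must be positive")
    return (target / control) / (ref_target / ref_control)


# Invented intensities in arbitrary densitometry units.
ovary_vs_male = fold_amplification(
    target=1000.0, control=100.0,      # ovary lane: 9 kb band, 5.5 kb band
    ref_target=80.0, ref_control=100.0,  # male lane: same two bands
)
print(round(ovary_vs_male, 1))  # 12.5
```

Because the two probes differ in specific activity, the target/control ratio within a single lane is not meaningful by itself; taking the ratio of ratios between lanes cancels that difference as well as any unequal loading.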


DISCUSSION

The s36 protein and its conservation

We have compared the s36 chorion gene in two Drosophila species, belonging to different subgenera (Sophophora for D. melanogaster; Drosophila for D. virilis), as well as in the medfly, a member of a different dipteran family. The Drosophila subgenera are thought to have separated 50 to 80 million years ago (32,5), while the families of Drosophila and Ceratitis are estimated to have separated approximately 120 million years ago (32). Unlike the autosomal chorion genes examined in four Drosophila species, the s36 gene in Ceratitis and two Drosophila shows high conservation in the coding sequence, and low conservation in non-coding regions, including the proximal 5' flanking DNA. In 225 aligned positions (excluding the C-terminal end, which is of variable length), the medfly protein, Ccs36, is 61 and 64% identical to Dms36 and Dvs36 respectively; the latter two are 84% identical to each other. This degree of conservation far exceeds the previously documented similarities of autosomal chorion proteins in four Drosophila species (5,6; Martinez et al. in preparation). Presumably the conservation of s36 reflects some important role in the chorion. Indeed, in D. melanogaster two mutations that abolish or alter s36 protein synthesis result in major disruption of chorion morphogenesis, even though all the other chorion proteins are produced normally (33,34).

Conservation of the s36 protein is particularly notable in a region between residues 80 and 219 in C. capitata (Fig. 3). In this central 139 amino acid region the three sequences show 76 to 97% identities; there are no insertions or deletions, and 76% of the residues are invariant, forming several exceptionally long blocks that are perfectly conserved in all three species (note blocks of 40, 16, 9 and 7 invariant residues in Fig. 3). The amino terminal region (1-79 in C. capitata), including the signal peptide, is less extensively conserved: the Ccs36-encoded protein is 47 and 43% identical to the Dvs36 and Dms36 sequences, respectively, and each of the three sequences shows small gaps relative to the other two. Finally, the carboxyl terminal region (220-320 in C. capitata) is the most variable: here the sequences can only be aligned to ca. 23-54% identity, and differ significantly in length (101 residues in Ccs36, 74 in Dvs36 and 63 in Dms36). Because of the differences in the N-terminal and C-terminal regions, the three proteins are somewhat different in size: 320, 293 and 286 amino acids for Ccs36, Dvs36 and Dms36, respectively.

Interestingly, secondary structure predictions suggest that the Dms36 protein consists of a highly structured central domain (residues 99 to 222 in that sequence), flanked by arms that may be less structured and may participate in protein-protein interactions (35). Based on the three-species comparisons, we would extend the central domain only slightly, to encompass residues 84 through 223 in Dms36. This region is predicted as a series of α-helical segments followed by β-sheet strands (35). It appears that the arms, especially the C-terminal one, are evolutionarily more variable as well as less structured than the s36 central domain. Despite substantial differences in primary sequences, a similar structural pattern is seen in moth chorion proteins, which are also tripartite and show a highly structured and extensively conserved central domain. An additional similarity between moth and fly chorion components is the terminal distribution of residues that are thought to serve in crosslinking: the arms of the moth proteins are enriched in cysteines (36), and the arms of the fly s36 proteins (especially the C-terminal arm) are similarly enriched in tyrosines, which form di-tyrosine and tri-tyrosine crosslinks in the mature chorion (37,38). The C-terminal arm is also enriched in alanine, and in Dvs36 and Ccs36 also in serine.
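The pairwise identity figures quoted in this section are ratios of matching residues to aligned positions. A minimal Python sketch of that calculation follows; the function name `percent_identity` is hypothetical, the two aligned strings are toy examples rather than s36 sequences, and skipping positions where either sequence has a gap is an assumption, since the text does not say how gapped positions were counted.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Positions where either sequence carries a gap are skipped (assumption).
    Comparison is case-insensitive, since Fig. 3 mixes upper and lower case.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


# Toy alignment: 5 comparable positions, 4 of them identical.
print(percent_identity("ACD-EF", "ACE-EF"))  # 80.0
```

Applied to each of the three sequence pairs over the same set of aligned columns, a function of this kind would yield a table comparable to the 61%, 64% and 84% values reported above.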

Conserved regulatory properties and the s36 DNA sequences

We have established that the s36 gene is differentially amplified in the ovaries of C. capitata. This is the first demonstration of chorion gene amplification outside the genus Drosophila; it does not occur in silkmoths, where it would be superfluous because of the high multiplicity of chorion genes in the haploid genome and the ten-fold longer duration of choriogenesis (39). Similarly, we have established that the s36 gene of C. capitata is expressed during early choriogenesis, with a temporal specificity comparable to that in Drosophila species. Thus, the regulatory properties of s36 have remained constant for approximately 120 million years, since the last common ancestor of drosophilid and tephritid flies, even though the morphology of the chorion underwent considerable changes (38).

We have compared the available non-coding (transcribed as well as non-transcribed) DNA sequences of s36 in the medfly and two Drosophila species. As previously observed for the autosomal chorion genes in four Drosophila species (5,6), the intron and 3' untranslated sequences of s36 have been effectively randomized (results not shown). The most interesting results were obtained from sequence comparisons in the 5' end of the gene. Here, sequence conservation is not extensive. However, five elements are evident in all three species, each showing conservation in at least five out of six contiguous nucleotide positions (see boxes in Fig. 4). The two elements farthest downstream may be related to translational initiation and the specification of the cap site. A third element encompasses the TATA box, and a fourth spans the chorion-specific TCACGT hexamer. The fifth element, farthest upstream, is aGATCTgG. The fourth and fifth elements are almost contiguous in C. capitata, but in Drosophila are separated by DNA of variable length (much longer in D. virilis) and considerable sequence similarity. Conversely, Ccs36 has the longest 5' untranslated region.
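The "five out of six contiguous positions" criterion can be illustrated with a minimal Python sketch. It assumes that a position counts as conserved when all aligned sequences carry the same base there, which is one plausible reading of the criterion and not necessarily the procedure used to draw the boxes in Fig. 4. The function name `conserved_windows` and the toy sequences are hypothetical; the toy strings are not the s36 sequences.

```python
def conserved_windows(aligned, window=6, min_match=5):
    """Return start indices of windows with >= min_match fully conserved columns.

    `aligned` is a list of equal-length, pre-aligned sequences ('-' marks a gap).
    Assumption: a column is conserved only if every sequence has the same
    non-gap base at that column.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")

    # One boolean per alignment column: identical base in all sequences, no gap.
    conserved = [
        aligned[0][i] != "-" and len({seq[i].upper() for seq in aligned}) == 1
        for i in range(length)
    ]

    # Slide a window along the alignment and count conserved columns in it.
    return [
        start
        for start in range(length - window + 1)
        if sum(conserved[start:start + window]) >= min_match
    ]


# Hypothetical toy alignment (NOT the s36 sequences), sharing a TCACGT core.
toy = ["AATCACGTGG", "CTTCACGTAC", "GGTCACGTTA"]
hits = conserved_windows(toy)
print(hits)
```

Overlapping hits such as these would normally be merged into a single element before being reported; that step is omitted here to keep the sketch short.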

The conservation of these elements is of special interest, because of the documented conservation of tissue-specific and temporally specific expression patterns of the gene in all three species. Indeed, it has been shown by transformation that a Dms36 DNA fragment only 84 bp long (-132/-49; see white arrows in Fig. 4) is sufficient to specify s36-specific expression of an attached hsp70/lacZ reporter gene (40). It would be interesting to determine by transformation experiments to what extent the corresponding region of the Ceratitis homologue, which is only 38 bp long but encompasses both the aGATCTgG and the TCACGT elements, can specify s36-like expression in transgenic Drosophila.
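As a small worked check of the fragment size quoted above, the sketch below computes the length of a fragment from its end coordinates. It assumes the common convention that positions are numbered relative to the cap site (+1), that both end positions are included, and that there is no position 0. The function name `fragment_length` is hypothetical.

```python
def fragment_length(start, end):
    """Inclusive length in bp of a fragment running from `start` to `end`.

    Assumption: coordinates are relative to the cap site (+1) and there is
    no position 0, so -1 is immediately followed by +1.
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    # A fragment spanning the cap site skips the non-existent position 0.
    if start < 0 < end:
        length -= 1
    return length


print(fragment_length(-132, -49))  # the Dms36 fragment coordinates quoted in the text
```

Under these assumptions the coordinates -132/-49 give 84 bp, in agreement with the length stated for the Dms36 fragment.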

ACKNOWLEDGEMENTS

We are grateful to N. Brown for guidance in constructing the cDNA library, C. Savakis and M. Rina (IMBB, Crete) for the Ceratitis genomic library, R. Blackman for the D. virilis genomic library, and M. Frisardi for in situ hybridization analysis of the Dvs36 gene. We thank E. Fenerjian for secretarial assistance, Marie Yuk-See for graphics and B. Klumpar for photography.

This work was supported by grants from ACS and USDA (F.C.K.) and from the Greek General Secretariat for Research and Technology (K.K. and F.C.K.).

REFERENCES

1. Spradling, A.C. (1981) Cell 27, 193-201.
2. Griffin-Shea, R., Thireos, G. and Kafatos, F.C. (1982) Dev. Biol. 91, 325-336.
3. Parks, S., Wakimoto, B. and Spradling, A.C. (1986) Dev. Biol. 117, 294-305.
4. Spradling, A.C. and Mahowald, A.P. (1980) Proc. Natl. Acad. Sci. USA 77, 1096-1100.
5. Martinez-Cruzado, J.C., Swimmer, C., Fenerjian, M.G. and Kafatos, F.C. (1988) Genetics 119, 663-677.
6. Fenerjian, M.G., Martinez-Cruzado, J.C., Swimmer, C., King, D. and Kafatos, F.C. (1989) J. Mol. Evol. 29, 108-125.
7. Brown, N.H. and Kafatos, F.C. (1988) J. Mol. Biol. 203, 425-437.
8. Maniatis, T., Fritsch, E.F. and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor.
9. Church, G.M. and Gilbert, W. (1984) Proc. Natl. Acad. Sci. USA 81, 1991-1995.
10. Rigby, P.W., Dieckman, M., Rhodes, C. and Berg, P. (1977) J. Mol. Biol. 113, 237-248.
11. Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
12. Biggin, M.D., Gibson, T.J. and Hong, G.F. (1983) Proc. Natl. Acad. Sci. USA 80, 3963-3965.
13. Tabor, S. and Richardson, C.C. (1987) Proc. Natl. Acad. Sci. USA 84, 4767-4771.
14. Russel, M., Kidd, S. and Kelley, M.R. (1986) Gene 45, 333-338.
15. Staden, R. (1982) Nucleic Acids Res. 10, 4731-4751.
16. Staden, R. (1982) Nucleic Acids Res. 12, 399-504.
17. Pustell, J. and Kafatos, F.C. (1982) Nucleic Acids Res. 10, 4765-4782.
18. Pustell, J. and Kafatos, F.C. (1984) Nucleic Acids Res. 12, 643-655.
19. King, R.C. (1970) Ovarian Development in Drosophila melanogaster. Academic Press, New York.
20. Reed, K.C. and Mann, D.A. (1985) Nucleic Acids Res. 13, 7207-7221.
21. Delidakis, C. and Kafatos, F.C. (1987) J. Mol. Biol. 197, 11-26.
22. Gubenko, I.S. and Evgen'ev, M.B. (1984) Genetica 65, 127-139.
23. Spradling, A.C., DeCicco, D.V., Wakimoto, B.T., Levine, J.F., Kalfayan, L.J. and Cooley, L. (1987) EMBO J. 6, 1045-1053.
24. Romano, C.P., Bienz-Tadmor, B., Mariani, B.D. and Kafatos, F.C. (1988) EMBO J. 7, 783-790.
25. Wong, Y.-C., Pustell, J., Spoerel, N. and Kafatos, F.C. (1985) Chromosoma 92, 124-135.
26. Levine, J. and Spradling, A. (1985) Chromosoma 92, 136-142.
27. Spoerel, N., Nguyen, H.T. and Kafatos, F.C. (1986) J. Mol. Biol. 190, 23-35.
28. Regier, J.C., Hatzopoulos, A.K. and Durot, A.C. (1986) Dev. Biol. 118, 432-441.
29. Hibner, B.L., Burke, W.D., Lecanidou, R., Rodakis, G.C. and Eickbush, T.H. (1988) Dev. Biol. 125, 423-431.
30. Mitsialis, S.A., Veletza, S. and Kafatos, F.C. (1989) J. Mol. Evol. 29, 486-495.
31. Brown, N.H., King, D.L., Wilcox, M. and Kafatos, F.C. (1989) Cell 59, 185-195.
32. Beverley, S.M. and Wilson, A.C. (1984) J. Mol. Evol. 21, 1-13.
33. Digan, M.E., Spradling, A.C., Waring, G.L. and Mahowald, A.P. (1979) In Axel, R., Maniatis, T. and Fox, C.F. (eds), Eucaryotic Gene Regulation. ICN-UCLA Symposium. Academic Press, New York, pp. 171-181.
34. Komitopoulou, K. (1982) PhD Thesis, University of Athens, Greece.
35. Hamodrakas, S.J., Batrinou, A. and Christophoratou, T. (1989) Int. J. Biol. Macromol. 11, 307-313.
36. Regier, J.C. and Kafatos, F.C. (1985) In Kerkut, G.A. and Gilbert, L.I. (eds), Embryogenesis and Reproduction. Comprehensive Insect Physiology, Biochemistry and Pharmacology. Pergamon Press, Oxford, Vol. 1, pp. 113-151.
37. Petri, W.H., Wyman, A.R. and Kafatos, F.C. (1976) Dev. Biol. 49, 185-199.
38. Margaritis, L.H. (1985) Can. J. Zool. 63, 2194-2206.
39. Hatzopoulos, A.K. and Regier, J.C. (1986) Mol. and Cell. Biol. 6, 3215-3220.
40. Tolias, P.P. and Kafatos, F.C. (1990) EMBO J. In Press.
