
Biochimica et Biophysica Acta, 1087 (1990) 180-189 Elsevier

BBAEXP 92168

Structural organization of the human CaMIII calmodulin gene

Markus Koller, Bruno Schnyder and Emanuel E. Strehler

Laboratory for Biochemistry, Swiss Federal Institute of Technology, Zurich (Switzerland)

(Received 14 May 1990)

Key words: Calmodulin gene; Gene structure; Evolution; Promoter; (Human)

The complete structural organization of the human calmodulin III gene has been determined. This gene specifies the mRNA represented by the previously reported cDNA ht6. The gene contains six exons spread over a total of approx. 10 kb of DNA. Its exon-intron organization is identical to that of the only known chicken calmodulin gene and to that of two of the three characterized rat calmodulin genes. As in many other genes encoding Ca²⁺-binding proteins, intron 1 separates the ATG initiation codon from the remainder of the coding region. The major and two minor sites of transcription initiation have been determined by primer extension and ribonuclease protection assays. The DNA sequence in the promoter and 5' untranslated region is extremely GC-rich. No typical TATA and CAAT boxes are present upstream of the major transcriptional start site; however, a consensus CAAT box sequence is found further upstream and may play a role in transcriptional initiation from the minor start sites. Six sequence elements with high similarity to monkey SV40-like Sp1-binding regions are present in the putative promoter region, two of which contain perfect GGGCGG core sequences. The structure of the human calmodulin III gene promoter indicates that this gene belongs to a class of 'house-keeping' genes but that its level of expression may also be specifically regulated.
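To illustrate what "GC-rich" and a "perfect GGGCGG core" mean in practice, the following minimal Python sketch computes the GC fraction of a short fragment and lists matches to the Sp1 core on both strands. The fragment is two adjacent 10-nt blocks copied from the upstream sequence shown in Fig. 2; the function names are hypothetical helpers chosen for this illustration and are not the software used in the paper. A core match on its own does not establish a functional Sp1 site.

```python
# Sketch: GC content and Sp1 core (GGGCGG) matches in a promoter fragment.
# The fragment is illustrative input copied from the Fig. 2 upstream region.

FRAGMENT = "tggggcgggaggcggcggcg"
SP1_CORE = "GGGCGG"  # perfect core sequence named in the abstract

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (case-insensitive)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_motif(seq, motif):
    """0-based start indices of all matches, overlapping ones included."""
    seq = seq.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def sp1_core_sites(seq):
    """Core matches on the given strand ('+') and the opposite strand ('-')."""
    return {
        "+": find_motif(seq, SP1_CORE),
        "-": find_motif(seq, reverse_complement(SP1_CORE)),
    }


if __name__ == "__main__":
    print("GC fraction:", gc_fraction(FRAGMENT))
    print("Sp1 core matches:", sp1_core_sites(FRAGMENT))
```

For this 20-nt fragment the sketch reports a GC fraction of 0.9 and a single core match on the given strand, starting at the third base.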

Introduction

The ubiquitous Ca²⁺-binding protein calmodulin is known to act as an important mediator of the intracellular Ca²⁺ signal. It is involved in the regulation of a vast number of fundamental cellular processes including cell division, motility, secretion and intermediary metabolism [1,2]. Its primary amino acid sequence has been highly conserved throughout evolution, displaying only a single residue difference between fish and mammalian species [3]. In addition to its presence in all animal phyla, calmodulin has also been characterized from higher plants [4,5,6], green algae [7], yeast [8,9] and the fungus Achlya klebsiana [10]. Within a given species only a single calmodulin protein is generally found, so far with the sole exception of the sea urchin Arbacia punctulata [11] and possibly Naegleria flagellates [12] and chicken [13], where two slightly different calmodulin species may exist at the protein level. At the mRNA and gene level, however, multiple calmodulin coding

The sequence data in this paper have been submitted to the EMBL/GenBank Data Libraries under the accession numbers X52606, X52607 and X52608. Correspondence: E.E. Strehler, Laboratory for Biochemistry, ETH Zentrum, Universitätsstrasse 16, CH-8092 Zurich, Switzerland.

sequences have been characterized from several mammalian [14-20], frog [21] and sea urchin [11] species. In addition, multiple retropseudogenes and calmodulin-like genes have been identified in several higher eukaryotes [20,22-25]. In contrast, a single calmodulin gene has been shown to be present in Dictyostelium discoideum [26], Drosophila melanogaster [27,28], Chlamydomonas reinhardtii [7], the freshwater mold Achlya klebsiana [10] and yeast [8,9]. The presence in mammals of a true multigene family consisting of at least three separate bona-fide members [19,20], in addition to several calmodulin-like genes and pseudogenes, suggests that the regulation of expression of this important Ca²⁺-binding protein in these organisms may, at least in part, be occurring at the transcriptional level. In order to assess the importance of transcriptional contributions to the regulated expression of calmodulin, and to learn more about the evolutionary origin and interrelationship of different calmodulin genes, it is important to determine their structural organization and putative regulatory sequence features. Recently, the complete characterization of three rat bona-fide calmodulin genes as well as of a total of four retropseudogenes has been accomplished [20,23,29]. To date, these studies represent the single most complete set of data on the molecular organization of calmodulin coding and calmodulin-related sequences in a higher organism. A similar genomic complexity of calmodulin-related sequences can be expected to occur in humans.

0167-4781/90/$03.50 © 1990 Elsevier Science Publishers B.V. (Biomedical Division)

Here we present the first complete description of the structural organization of a human bona-fide calmodulin gene, specifying the calmodulin III mRNA in the terminology introduced by Nojima and his collaborators [14,20]. This gene shows an intron-exon structure identical to that of the rat CaMI and CaMIII genes, containing 6 exons distributed over approx. 10 kb of DNA. Its close relationship with the rat CaMIII gene is further emphasized by the sequence of its immediate upstream region, which is extremely GC-rich and lacks a typical TATA box.

Materials and Methods

Materials

All reagents and chemicals were of the highest grade available. Restriction enzymes, nucleic acid modifying enzymes and DNA molecular weight standards were from Pharmacia LKB Biotechnology, Boehringer Mannheim and Bethesda Research Laboratories. Oligolabeling kits were purchased from Pharmacia LKB Biotechnology, Geneclean kits were from BIO 101 and dideoxynucleotide sequencing kits from United States Biochemical. [³⁵S]dATPαS (> 600 Ci/mmol), [α-³²P]dCTP (> 3000 Ci/mmol), [α-³²P]CTP (≥ 400 Ci/mmol) and [γ-³²P]ATP (> 3000 Ci/mmol) were obtained from Amersham. Synthetic oligodeoxynucleotides were obtained from S. Keller (Institute for Cell Biology, ETH Zurich, Switzerland) and were purified by polyacrylamide gel electrophoresis followed by passage of the eluted DNA through a Sep-Pak column (Millipore) as suggested by the supplier. Human genomic DNA was isolated from fresh blood samples as described [30] and total RNA from human NTera2D cells was a gift of Dr. R. Fischer (Rorer Biotechnology, King of Prussia, PA, U.S.A.). Human leukocyte genomic DNA libraries (partial Sau3A digests in λEMBL3) were obtained from Dr. S. Orkin (Children's Hospital, Harvard Medical School, Boston, MA, U.S.A.) and from Clontech Laboratories.

Screening of human genomic DNA libraries with radiolabeled probes

DNA fragments were labeled with [α-³²P]dCTP by the oligolabeling method [31] and were used as described [32] for hybridization (5 × 10⁵ cpm/ml) to nitrocellulose replica filters taken from agar plates containing plaques of the recombinant λ-phage libraries. The regions corresponding to positive signals were identified on the original plates and the plaques were replated for further screenings until well-separated single plaques could be isolated.

Lambda-phage DNA preparation and isolation of insert DNA

Lambda-phage DNA was prepared from plaque-purified clones according to published procedures [32,33]. To release the inserts, about 20 µg of DNA was digested with SalI or varying combinations of SalI and other restriction enzymes with no or only few sites in the lambda DNA. The digests were separated on 0.6 to 1.0% agarose gels and the gels were either used for Southern blotting [34] or to isolate individual DNA fragments by excising the corresponding gel slices and purifying the DNA by the Geneclean procedure as indicated by the supplier of the kit (BIO 101).

DNA subcloning and nucleotide sequencing

Restriction fragments of the genomic DNA clones were ligated into pUC18/19 plasmids using standard procedures, and the DNA was transformed into Escherichia coli JM101 cells as described [32]. Recombinants were selected by preparing the plasmid DNA of individually picked colonies according to the alkali miniprep procedure [32]. The identity of the plasmids was tested by restriction digest analysis. For nucleotide sequencing according to the dideoxy chain termination method [35], the desired restriction fragments were subcloned into M13mp18/19 sequencing vectors [36] and sequencing was performed with modified T7 DNA polymerase ('Sequenase') in the presence of [³⁵S]dATPαS according to the protocols recommended by the manufacturer of the corresponding kit (United States Biochemical). The 5' flanking and exon sequences as well as about 50% of intronic regions were determined on both strands (with the exception of about 50% of the 3' untranslated region in exon 6, which was sequenced only on one strand). Overlapping sequences were obtained to align all fragments within each sequence block.

Nucleotide sequence analysis

Sequences were analysed with the University of Wisconsin Computer Group Sequence Analysis Software Package [37] (Version 6, April 1989).

Mapping of the genomic clones

Mapping was done by restriction digest analysis of whole lambda and of plasmid genomic-subclone DNA, and by using Southern blotting procedures [34] on complete and partial DNA digests [32,38] combined with nucleotide sequence determination.

Primer extension analysis

About 100 ng (17 pmol) of polyacrylamide gel-purified oligodeoxynucleotide primer was 5' end-labeled with T4 polynucleotide kinase and [γ-³²P]ATP as described [38]. 0.36 × 10⁶ cpm (approx. 0.17 pmol) of labeled primer was annealed to 20 µg of total RNA for 15 min at 65°C in a total volume of 6.5 µl double-distilled, autoclaved H₂O. After the addition of 1 µl of 10× reverse transcriptase buffer (500 mM Tris-HCl (pH 8.3), 500 mM KCl, 50 mM MgCl₂, 50 mM DTT) the mixture was transferred to a 42°C waterbath and was allowed to cool to 42°C over a period of 15 to 30 min. 0.5 µl of a dNTP stock solution (25 mM each of dATP, dCTP, dGTP and dTTP) and 2 µl (30 units) of AMV reverse transcriptase were then added and the reaction was allowed to proceed for 60 min at 42°C. The reaction was stopped by the addition of 5 µl 250 mM EDTA (pH 8) and was extracted once with phenol/chloroform, once with chloroform, twice with ether and finally ethanol-precipitated in the presence of 0.3 M sodium acetate (pH 5.5). The pellet was washed once with 70% ethanol, briefly dried in a speedvac concentrator and redissolved in 20 µl of 85% deionized formamide, 10 mM Tris-HCl (pH 8), 1 mM EDTA, 0.01% dye mix (Bromophenol blue and Xylene Cyanol). The samples were heated at 95°C for 3 min and finally run on a 6.7% polyacrylamide, 8.3 M urea sequencing-type gel. The sequencing ladder of a known DNA fragment served as size standard. After electrophoresis the gel was fixed in 10% acetic acid/10% ethanol, dried and exposed to Kodak XAR-5 x-ray film for 24 to 48 h.

RNase protection assays

To generate uniformly labeled sense and antisense RNA probes, a 363-bp SmaI-SmaI fragment encompassing the immediate upstream region and the beginning of exon 1 of the calmodulin gene was cloned into the transcription vector pSP65 (Promega Biotech.) according to standard protocols [32]. The orientation of the insert in recombinant plasmids was determined by restriction analysis. Plasmids pSP-H7a and pSP-H7b were used to generate sense and antisense RNA transcripts, respectively. Plasmids were digested with HindIII and the linearized template DNA was isolated from agarose gels by the Geneclean procedure. High-specific-activity radiolabeled RNA was then synthesized in the presence of [α-³²P]CTP and using SP6 polymerase as described [39]. Annealing of 10⁶ cpm of labeled RNA transcripts to 20 µg of total human RNA and RNase treatment were as described [39].
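Returning briefly to the reverse transcription step described under 'Primer extension analysis': the final composition of that reaction is not stated explicitly, but it follows from the listed volumes. The sketch below works through the dilution arithmetic. It assumes that the volumes are simply additive (6.5 + 1 + 0.5 + 2 = 10 µl) and ignores any contribution from the enzyme storage buffer; the dictionary keys and function names are hypothetical.

```python
# Sketch: final concentrations in the primer extension reaction.
# Assumptions: volumes are additive; the enzyme storage buffer is ignored.

VOLUMES_UL = {
    "annealed primer and RNA in water": 6.5,
    "10x reverse transcriptase buffer": 1.0,
    "dNTP stock": 0.5,
    "AMV reverse transcriptase": 2.0,
}

# Component concentrations (mM) of the 10x buffer as listed in the protocol.
BUFFER_10X_MM = {"Tris-HCl": 500.0, "KCl": 500.0, "MgCl2": 50.0, "DTT": 50.0}
DNTP_STOCK_MM = 25.0  # each of dATP, dCTP, dGTP and dTTP


def final_mm(stock_mm, added_ul, total_ul):
    """Dilute a stock: C_final = C_stock * V_added / V_total."""
    return stock_mm * added_ul / total_ul


def reaction_composition():
    """Return (total volume in ul, dict of final concentrations in mM)."""
    total_ul = sum(VOLUMES_UL.values())
    buffer_ul = VOLUMES_UL["10x reverse transcriptase buffer"]
    final = {
        name: final_mm(conc, buffer_ul, total_ul)
        for name, conc in BUFFER_10X_MM.items()
    }
    final["each dNTP"] = final_mm(
        DNTP_STOCK_MM, VOLUMES_UL["dNTP stock"], total_ul
    )
    return total_ul, final


if __name__ == "__main__":
    total, final = reaction_composition()
    print(f"total volume: {total} ul")
    for name, conc in final.items():
        print(f"{name}: {conc} mM")
```

Under these assumptions the 10 µl reaction contains 50 mM Tris-HCl, 50 mM KCl, 5 mM MgCl₂, 5 mM DTT and 1.25 mM of each dNTP.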
The protected RNA samples were run on 6.7% denaturing sequencing-type gels as described above.

Results

Isolation and characterization of genomic clones covering the human calmodulin III gene

About 10⁶ clones of a human leukocyte genomic library were screened for inserts encoding the CaMIII gene corresponding to the human calmodulin cDNA ht6 described previously [19]. Three clones positive with a 0.6 kb PvuII-EcoRI fragment (from nucleotide 1579 to the 3' end of the calmodulin cDNA clone ht6) were analyzed [19]. One of them, named λhg2, turned out to contain a part of the human CaMIII gene but lacked the 5'-most exon(s). To obtain the first exon specifying the 5' untranslated region, as well as the 5' flanking (promoter) sequences, another human leukocyte genomic library was screened using as probes the cDNA fragments SmaI-EcoRI (from nucleotide 945 to the 3' end) and SmaI-SmaI (from nucleotide 92 to 559). 10 out of 10⁶ clones gave positive signals with a combination of the two fragments. One clone, named λhg7, was positive with the radiolabeled calmodulin cDNA fragment EcoRI-SmaI (from nucleotide 1 to 91) but negative with a probe derived from the 3' end of the same cDNA clone. The DNA of the two clones λhg2 and λhg7 was purified and the inserts were analyzed by restriction enzyme mapping and Southern blotting. A 4.5 kb SacI-SacI subfragment of clone λhg2, hybridizing to the 0.6 kb PvuII-EcoRI cDNA fragment that is entirely located within the 3' untranslated region of clone ht6, was further characterized by restriction enzyme mapping and sequencing. This fragment corresponds to the 4.5 kb band observed on a Southern blot of SacI-digested human genomic DNA [19]. Sequencing of this fragment resulted in the identification of four exons (exons 3-6) and the corresponding intervening sequences of the human CaMIII gene. In a Southern blot experiment a 3.0 kb SacI-SacI subfragment of clone λhg7 was positive with the 97 bp EcoRI-SmaI fragment representing the 5' end of the calmodulin cDNA clone ht6. Using a radiolabeled 52 bp HaeIII-HaeIII subfragment of ht6 (nucleotides 95 to 146) comprising the putative exon 2, we were able to locate exon 2 in a 1.3 kb NcoI-NcoI fragment of clone λhg7. Further restriction enzyme mapping and sequencing resulted in the complete characterization of exon 1 and exon 2 of the human CaMIII gene.

Structural organization of the human CaMIII gene

The intron/exon organization shown in Fig. 1 was determined by comparing the sequence of the human CaM cDNA ht6 with the sequence of the CaMIII gene as shown in Fig. 2.
The coding sequence of the human CaMIII gene is interrupted by five introns. Intron/exon boundaries are found at positions identical to those in the chicken CaMII gene [40] and the rat CaMI [23] and CaMIII [20] genes. The first intron is located immediately 3' of the ATG translation start codon. The 3'-most intron is placed 26 bp upstream from the TGA translation stop codon. Both intron positions are conserved in genes belonging to the CaM-SpecI-myosin light chain group of Ca²⁺-binding proteins. All intron/exon junctions follow the so-called GT/AG rule (underlined in Fig. 2). In the 5' and 3' untranslated regions several nucleotide substitutions were detected with respect to the published cDNA sequence ht6 [19]. All differences in the 5' untranslated sequence were found to be due to errors in the cDNA sequence, as demonstrated by re-sequencing the corresponding cDNA fragments and using dITP mixes in the sequencing reactions. The changes are: insertion of a C at position 100 of the published ht6 sequence (thereby creating an NcoI restriction site that includes the ATG initiation codon), switching the position of C-46 and G-47, insertion of a G between nucleotides 18 and 19 and insertion of a G between positions 13 and 14. On the other hand, some of the divergences noted in the 3' untranslated sequences may be due to allelic variations. This appears to be the case for positions 878 (T→C) and 1239 (G→A), since the sequence divergences at these positions could be confirmed on the cDNA and the genomic DNA level.

Fig. 1. Schematic illustration of the human CaMIII cDNA ht6 and the human CaMIII gene. Note the different scale bars for the cDNA and the genomic sequences, respectively. A partial restriction enzyme map for the CaMIII gene is included in the corresponding bar. Restriction enzymes are: B, BamHI; N, NcoI and S, SacI. Coding regions, starting with the ATG initiation codon and ending with a TGA termination codon, are shown by open boxes. 5' and 3' untranslated regions are represented by hatched boxes. ATA indicates the AATAAA consensus polyadenylation signal. In the schematic representation of the CaMIII gene, lines indicate intronic and intergenic sequences. Exons are numbered from 1 to 6. λhg7 and λhg2 are the two overlapping clones spanning the entire human CaMIII gene region.

Determination of the transcriptional start site of the human CaMIII gene

In order to identify the start site of transcription, primer extension and RNase protection experiments were performed. For primer extension analysis the 18mer oligonucleotide primer 5'-GGGGATCAAGGTTCCTCC-3', complementary to nucleotides 58 to 75 in the 5'-untranslated region of CaMIII cDNA ht6 [19],

was synthesized and radiolabeled. After hybridization of the primer to 20 µg of total human teratoma RNA, the elongation products of the reverse transcriptase reaction were analyzed on a 6.7% denaturing polyacrylamide gel (see Materials and Methods). Two strong bands with sizes of 96 and 97 nucleotides, and two weak bands of 161 and 240 nucleotides length were observed (Fig. 3a). From the length of these extension products the positions of the transcriptional start sites were identified in the CaMIII gene sequence. They are indicated in Fig. 2: +1 corresponds to the major and -64 and -143 to two minor start sites. These results were verified by RNase protection assays (Fig. 3b). A 363 bp SmaI-SmaI fragment, starting at nucleotide -250 in the immediate upstream region and ending at nucleotide +113 in exon 1, was cloned into the transcription vector pSP65. Radiolabeled transcription products representing antisense RNA were hybridized to 20 µg of total human teratoma RNA and single-stranded RNA was digested by RNase A and T1. The protected double-stranded RNA was analyzed on a 6.7% denaturing

-253  cccgggaggc ggggcgcgcg gcgagggaaa gtagtccggc gacgggagcg agcgcgcgcg cgcccggcgg caaacccaat tcctgtgcag ggtggggacq
site             -143                                                                                  -64
-153  agagattcgc ~ggcccgtag gtgtggagcg gggcgcggag ggatccgtgg gagccgcagt gcggcggcgc gcgggccgGg tgggqcgqg~ ccgagc~,Jgg
site                                                            +1
-53   cggggcgcgc ggcggccgtt gaqggaccgt tggggcggga ggcggcggcg gcg~CGGCGC GCGCTGCGGG CAGTGAGTSr? GGAGGCGCGG ACGCGCGGCG
+48   GAGCTGGAAC TGCTGCAGCT GCTGCCGCCG CCGGAGGAAC CTTGATCCCC GTGCTCCGGA CACCCCGGGC CTCGCCATG& igagtqaggc
aa                                                                                       M
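The step from extension product lengths to start-site coordinates can be checked with a short calculation. The sketch below is an illustration only, under two assumptions that the text does not state in this form: that the "+48" label in the sequence above marks the first base of its row, and that gene coordinates skip position 0 (..., -2, -1, +1, +2, ...). It locates the primer target in the exon 1 row, converts the four product lengths to coordinates, and checks the length of the -250 to +113 SmaI probe. All function names are hypothetical.

```python
# Sketch: convert primer extension product lengths into start-site coordinates.
# Assumptions: the "+48" label marks the first base of the row copied below,
# and gene coordinates skip position 0 (... -2, -1, +1, +2 ...).

EXON1_FRAGMENT = "GAGCTGGAACTGCTGCAGCTGCTGCCGCCGCCGGAGGAACCTTGATCCCC"
FRAGMENT_START = 48  # coordinate of the first base of EXON1_FRAGMENT
PRIMER = "GGGGATCAAGGTTCCTCC"  # 18-mer given in the text, written 5' to 3'

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def primer_5prime_coordinate():
    """Gene coordinate of the base paired with the primer's 5' end."""
    target = reverse_complement(PRIMER)
    index = EXON1_FRAGMENT.find(target)
    if index == -1:
        raise ValueError("primer target not found in fragment")
    return FRAGMENT_START + index + len(target) - 1


def start_site(product_length, primer_end):
    """Coordinate of the 5' end of an extension product of the given length."""
    linear = primer_end - product_length + 1  # numbering that includes 0
    return linear if linear >= 1 else linear - 1  # skip position 0


def span_length(first, last):
    """Number of bases from first to last inclusive, with no position 0."""
    length = last - first + 1
    return length - 1 if first < 0 < last else length


if __name__ == "__main__":
    end = primer_5prime_coordinate()
    print("primer 5' end pairs with position", end)
    for n in (97, 96, 161, 240):
        print(n, "nt product starts at", start_site(n, end))
    print("SmaI probe length:", span_length(-250, 113), "bp")
```

Under these assumptions the primer's 5' end pairs with position +97, so the 97, 161 and 240 nucleotide products map to +1, -64 and -143, in agreement with the start sites reported in the text, while the 96 nucleotide product maps to +2. The -250 to +113 probe comes out at 363 bp, matching the stated fragment size.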
