DNA AND CELL BIOLOGY Volume 10, Number 6, 1991 Mary Ann Liebert, Inc., Publishers
Organization of the
BEATRIX REDECKER, BERND HECKENDORF,* HANS-WILHELM GROSCH, GÜNTHER MERSMANN, and ANDREJ HASILIK
A fragment of human DNA containing the cathepsin D (CATD) gene was isolated. Nucleotide sequencing, primer extension, protection from mung bean nuclease, and promoter activity assays were used to characterize the gene. The transcribed portion of the gene is about 11,000 bp and is organized into 9 exons, analogous to the human pepsinogen A gene. Human pepsinogen A and CATD proteins have 42% sequence identity, while the two cDNAs are 55.7% identical. The positions of the splice junctions are fully conserved in these two genes. The noncoding sequences of the two genes are dissimilar. We report the nucleotide sequence of an Eco RI-Bam HI fragment that contains the transcription initiation site. The promoter region contains no TATA and CCAAT boxes, but five potential Sp1 binding sites (one of them in the first intron) and four AP-2 binding sites (two of them in the first intron). In COS-1 cells, the region containing the three proximal Sp1 sites possesses the bulk of the promoter activity of the 5'-flanking sequence. The transcription start site of the CATD gene is localized within a CpG cluster. In the interval -390 through +450, the content of CpG is 5.8 times above the average throughout the human genome.
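The CpG enrichment quoted above can be illustrated with a minimal sketch. It assumes that "content of CpG" means the fraction of dinucleotide positions occupied by 5'-CG-3', which the text does not spell out. The function names, the toy sequence, and the reference frequency passed in are all hypothetical; the genome-wide average is left as a parameter because its value is not given here.

```python
def cpg_frequency(seq: str) -> float:
    """Fraction of dinucleotide positions occupied by CpG (5'-CG-3')."""
    # Remove the blanks used to print sequences in 10-nt blocks.
    s = "".join(seq.split()).upper()
    if len(s) < 2:
        return 0.0
    # "CG" cannot overlap with itself, so str.count gives the exact number.
    return s.count("CG") / (len(s) - 1)


def fold_enrichment(seq: str, reference_frequency: float) -> float:
    """CpG frequency of seq relative to a reference (e.g. genome-wide) frequency."""
    if reference_frequency <= 0:
        raise ValueError("reference_frequency must be positive")
    return cpg_frequency(seq) / reference_frequency


# Hypothetical 14-nt toy sequence, NOT taken from the CATD gene.
toy = "gcgcc gggac gcgt"
toy_frequency = cpg_frequency(toy)  # 4 CpG in 13 dinucleotide positions
```

Applied to the interval -390 through +450 and divided by a genome-wide CpG frequency, `fold_enrichment` would give a ratio comparable to the 5.8-fold value, provided the same definition of CpG content is used.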
INTRODUCTION
Cathepsin D (CATD) is a soluble lysosomal enzyme synthesized as a larger-molecular-weight phosphorylated precursor. The phosphorylation of carbohydrate side chains supports the targeting of this and other soluble lysosomal enzymes into a prelysosomal compartment. After segregation from the secretory pathway, the CATD precursor is subject to activation and proteolytic maturation (for review see von Figura and Hasilik, 1986). Besides being involved in the intralysosomal proteolysis, CATD probably participates in antigen processing (van Noort and van der Drift, 1989) and extracellular proteolysis. It exerts a mitogenic effect in mouse liver (Humphries and Ayad, 1983; Morioka and Terayama, 1984). Such processes, if biologically relevant, may be modulated by changing the rate of synthesis and/or the efficiency of the segregation with respect to the secretion of the CATD precursor. CATD synthesis is regulated by estrogens in MCF7 cells (Cavailles et al., 1989), and CATD protein (Stein et al., 1987) and mRNA (Redecker et al., 1989) synthesis are regulated by calcitriol (1,25-dihydroxy vitamin D3) in U937 promonocytes. Although the rate of the expression of
CATD can be manipulated with such treatments and the concentration of the enzyme varies among different tissues (Dorn et al., 1986; Reid et al., 1986), the molecular basis of this regulation is unknown. To study these phenomena, we isolated the CATD gene, determined its exon-intron organization, and characterized its 5' region.
MATERIALS AND METHODS
Cells, plasmids, enzymes, and reagents
U937 monocytes (Sundström and Nilsson, 1976) were cultured in RPMI-1640 medium (GIBCO-BRL, Eggenstein) containing 10% heat-inactivated fetal calf serum. COS-1 cells (Gluzman, 1981) were grown in Eagle's minimum essential medium with Earle's salts (Biochrom KG, Berlin) containing 10% fetal calf serum. The plasmids pBC12/PL/SEAP and pBC12/RSV/SEAP (Berger et al., 1988) were a gift from Dr. B. Cullen (Duke University, Durham, NC). The restriction enzymes were obtained from Boehringer-Mannheim, Biozym, Hameln, and New England BioLabs (Schwalbach, FRG). The radiochemicals, [α-32P]dATP (specific radioactivity 110 TBq/mmole), [γ-32P]ATP (110 TBq/mmole), and [α-35S]dATP (22 TBq/
*Present address: Institut für Biochemie, Universität Stuttgart, Pfaffenwaldring 55, D-7000 Stuttgart 80, FRG.
Institute of Physiological Chemistry and Pathobiochemistry, Westfälische Wilhelms-Universität, D-4400 Münster, FRG.
REDECKER ET AL.
mmole) were from Amersham-Buchler (Braunschweig, FRG); the T7 DNA polymerase sequencing kit was from Pharmacia-LKB (Freiburg, FRG); T4 DNA ligase, T4 polynucleotide kinase, Klenow fragment, and mung bean nuclease were from Boehringer-Mannheim; and the lipofectin reagent was from GIBCO-BRL (Eggenstein, FRG).
Isolation of genomic clones and sequence analysis
A genomic library from human leukocytes in λL47.1 was kindly provided by Dr. B. Horsthemke (University of Essen, FRG). An aliquot of the multiplied library, 8 × 10⁵ phages, was screened by hybridization with the random-primed cDNA of CATD (Redecker et al., 1989). Rescreening of the positive plaques was performed with a 30-mer oligonucleotide corresponding to amino acids -64 to -55 and a 36-mer oligonucleotide corresponding to amino acids 325-336 of human CATD (Faust et al., 1985). After restriction endonuclease mapping, appropriate fragments were subcloned in M13mp18 and/or mp19. Exons and splice junctions were determined by sequencing short subclones that shared sequences with the cDNA (Barnum et al., 1989). The sequencing was performed by the chain-termination method (Sanger et al., 1977) with [α-35S]dATP and T7 DNA polymerase using a sequencing kit from Pharmacia-LKB. In a few cases, where the sequence was determined only in one direction, the DNA was purified and analyzed several times. DNA sequences were compiled with the computer program Genepro, version 4.2 (Riverside Scientific Enterprises, Seattle, USA) and sequence similarities were evaluated by the method of Pearson and Lipman (1988). Unless determined by sequencing, the length of the introns was estimated by agarose gel electrophoresis of gene fragments cut within adjacent or distal exons and identified by hybridization with subclones containing single exons.
Mapping of CATD mRNA
Total RNA was isolated from U937 cells in the presence of guanidine thiocyanate as described by Chirgwin et al. (1979). To enhance the level of CATD mRNA, the cells were incubated with 100 nM calcitriol for 3 days. Poly(A)+ RNA was prepared on oligo(dT)-cellulose. The transcription initiation site of the CATD gene was determined by primer extension and protection from mung bean nuclease (Ausubel et al., 1987; Calzone et al., 1987). The probe for primer extension analysis was the 30-mer oligonucleotide mentioned above. The probe was 5' end-labeled with T4 polynucleotide kinase and [γ-32P]ATP. The 32P-labeled oligonucleotide, 10⁵ cpm, was annealed to 50 µg of either total U937 RNA or yeast tRNA, or to 5 µg of poly(A)+ RNA, by an overnight incubation at 42°C. The hybridized probe was extended by incubation with 400 U of Moloney murine leukemia virus reverse transcriptase at 42°C for 30 min (Ausubel et al., 1987). The products were analyzed on a 6% acrylamide/7 M urea sequencing gel. The protection from digestion with mung bean nuclease was performed with a 445-bp Bgl I fragment of the CATD gene containing the first ATG codon. This fragment was cloned in the Eco RI-Hind III sites of M13mp19 and single-stranded phage DNA was isolated. The second strand was synthesized with the above-mentioned 30-mer oligonucleotide as a primer and [α-32P]dATP (Calzone et al., 1987). The DNA was denatured and the labeled single-stranded DNA was separated from the template by agarose gel electrophoresis. The single-stranded 32P-labeled probe, 5 × 10⁵ cpm, was hybridized to 50 µg of either total U937 RNA or yeast tRNA, or to 5 µg of poly(A)+ RNA, in 80% formamide, 0.4 M NaCl, 0.05 M PIPES pH 6.4, and 1 mM EDTA overnight at 60°C (Ausubel et al., 1987). The hybridized samples were digested at 37°C with 50 U of mung bean nuclease for 1 hr, and the length of the products was determined on a 6% acrylamide/7 M urea sequencing gel (Sanger et al., 1977).
Determination of the promoter activity
Several partially overlapping fragments of the 5'-flanking region and the first exon of the genomic CATD clone were cloned into the Eco RV site of the pBC12/PL/SEAP vector (Berger et al., 1988). The ends of the fragments that were obtained with endonucleases Eag I and Rsr II were filled using Klenow fragment, and Bgl I ends were flushed by the exonuclease activity of T4 DNA polymerase. Only the clones containing the promoter fragments in the correct orientation were analyzed further. COS-1 cells (4 × 10⁵) were transfected with the lipofectin reagent (GIBCO-BRL) and 10 µg of DNA was added according to the method of Felgner et al. (1987). Two days after transfection, the activity of the secreted alkaline phosphatase was assayed (Berger et al., 1988). An insert-free plasmid (pBC12/PL/SEAP) and a plasmid containing the Rous sarcoma virus promoter (pBC12/RSV/SEAP) were used as controls.
RESULTS
Isolation of CATD gene clones encoding the entire cDNA
A human library in the vector λL47.1 was screened with CATD cDNA. Seventeen positive clones were isolated. These clones were hybridized with two synthetic probes representing the amino- and carboxy-terminal coding regions of the cDNA. Four of them reacted with both probes and the remainder with the carboxy-terminal probe only. One of the four clones appeared to contain the amino-terminal coding region ~3 kb from the 5' end of the insert, and was chosen for further analyses. Since the Bam HI cloning site of the vector λL47.1 had been destroyed, fragments for further analysis were prepared with Eco RI and Hind III (Fig. 1A). By subtracting the phage-encoded sequences, the length of the isolated gene fragments was estimated to be 16 kb.
HUMAN CATHEPSIN D GENE
FIG. 1. Partial restriction map of the human CATD gene and sequencing strategy of its 5' region. A. Physical map of restriction fragments hybridizing with probes complementary to portions of CATD cDNA encoding amino acid residues (numbering according to Faust et al., 1985) -64 through -55 (T) and 325-336 (*). The dashed line represents the phage-encoded sequence. B. Physical map of an Eco RI-Bam HI fragment encoding the 5' region. C. Physical map and sequencing strategy of a Hind II-Bam HI fragment. The extent and direction of sequencing are indicated by the arrows. A, Apa I; B, Bam HI; E, Eco RI; H, Hind III; K, Kpn I; N, Nae I; P, Pst I; S, Sst I; V, Sma I; X, Xho I; Y, Hind II; Z, Bgl I.
Mapping and sequencing of the 5' region of the human CATD gene
The genomic CATD clone yielded a 10-kb Eco RI fragment, which encoded the amino-terminus. This DNA was digested with Bam HI to remove 4.5 kb from its 3' end. The remaining 5.5-kb Eco RI-Bam HI fragment was mapped with several nucleases (Fig. 1B). A 3.2-kb Hind II-Bam HI subfragment was sequenced (Fig. 2) following the strategy shown in Fig. 1C.
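The fragment sizes quoted above follow from simple subtraction, which the sketch below makes explicit. The `digest` helper is hypothetical and not part of the original work. The cut coordinates are assumptions inferred from the rounded sizes in the text, and only the Hind II site bounding the sequenced subfragment is considered.

```python
def digest(length_bp, cut_sites_bp):
    """Return fragment lengths (5' to 3') of a linear DNA cut at the given positions."""
    cuts = sorted(set(cut_sites_bp))
    if any(c <= 0 or c >= length_bp for c in cuts):
        raise ValueError("cut sites must lie strictly inside the fragment")
    edges = [0] + cuts + [length_bp]
    return [b - a for a, b in zip(edges, edges[1:])]


# Coordinates in bp from the 5' Eco RI end; all values are rounded estimates.
eco_ri_fragment = 10_000  # the 10-kb Eco RI fragment
bam_hi_site = 5_500       # assumption: Bam HI cuts 4.5 kb from the 3' end
hind_ii_site = 2_300      # assumption: Hind II site 3.2 kb upstream of Bam HI

eco_bam = digest(eco_ri_fragment, [bam_hi_site])  # Eco RI-Bam HI piece, then the removed 3' piece
hind_bam = digest(bam_hi_site, [hind_ii_site])    # 5' remainder, then the Hind II-Bam HI subfragment
```

With these assumed coordinates, `eco_bam` is 5,500 and 4,500 bp and `hind_bam` is 2,300 and 3,200 bp, matching the 5.5-kb and 3.2-kb fragments described in the text.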
human pepsinogen genes, we predicted that the CATD gene consists of several exons, all probably being < 300 bp long. Therefore, using subclones containing short portions of the gene fragment, we could determine the entire coding sequence and all its interruptions due to introns. The results show that the human CATD gene is organized into 9 exons and 8 introns (Fig. 2). The sizes of the exons and introns and the nucleotide sequences at all splice junctions are compiled in Table 1. Exon 9 contains 168 bp through the stop codon, 724 bp of 3' untranslated sequences through the polyadenylation signal, and another 24 bp (Faust et al., 1985; Westley and May, 1987) to 25 bp (Augereau et al., 1988) at the 3' end of the cDNA. In the sequence shown in Fig. 2, an alternative polyadenylation signal is present 431 bp downstream from the first one. This signal is probably not utilized, because Northern blots show a band of CATD mRNA of 2.1-2.2 kb either alone (Faust et al., 1985; Redecker et al., 1989) or with a minor companion of 4.5 kb (Augereau et al., 1988). No cDNA sequences supporting the existence of the 4.5 kb mRNA have been identified.
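The exon 9 arithmetic and the search for a second polyadenylation signal can be sketched as follows. This is an illustration only. The text does not state the signal sequence, so the hexamer AATAAA is an assumption (it is the canonical signal). The helper name and the toy sequence are hypothetical and not taken from Fig. 2.

```python
import re


def find_signals(seq, signal="AATAAA"):
    """Return 0-based start positions of every occurrence of `signal` in `seq`."""
    s = "".join(seq.split()).upper()  # drop the blanks between 10-nt blocks
    # A zero-width lookahead reports overlapping occurrences as well.
    return [m.start() for m in re.finditer(f"(?={re.escape(signal)})", s)]


# Exon 9 length from the figures quoted in the text:
# coding part + 3' untranslated part through the signal + 24 or 25 bp tail.
exon9_min = 168 + 724 + 24
exon9_max = 168 + 724 + 25

# Hypothetical toy sequence with two signals, NOT the CATD 3' region.
toy = "ccc" + "aataaa" + "g" * 10 + "aataaa" + "tt"
positions = find_signals(toy)
spacing = positions[1] - positions[0]  # distance between the two signal starts
```

Run on the genomic sequence downstream of the first signal, the same scan would locate a candidate alternative signal, whose offset could then be compared with the 431 bp reported above.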
Mapping of the transcription
Two methods were used to determine the transcriptional start site with either total or poly(A)+ RNA isolated from cultured human cells. In the first approach, an oligonucleotide complementary to the mRNA encoding the first 10 amino acids of human prepro-CATD was used as a primer for mRNA-directed DNA synthesis. The products of this reaction were compared to a standard obtained by extension of a phage M13 primer (Fig. 3). The three most prominent products shown contained extensions of 68, 125, and 292 nucleotides. In the second assay, mung bean nuclease was used to determine the length of the hybrid formed between CATD mRNA and a single-stranded DNA that was synthesized using a 445-bp Bgl I fragment of CATD gene
gtcgacgtgag cgaatttcgc gtattttagg gccagtgtga ctaagcacag gagctggttt gctggggact cgtctggagg
tggacaaaag cccctcaaga gtggaatact ggctggaatt
gtgcccgacc gacagctgtg ttgcctgcct tgggatctgg
tcctaaggga cagtgccata tcgggcctgc aggctagagc cgcccagagg gagagggcgg acgggtccga tccaggtttc tctggaagcc ctgtagagga
aggagcttct tcaaaacatg tgtctgccac catcggtgag ccccctttgc gcggagggtc
ctagaaaacg tgaaaggaat gtgaggctgt
ggcagggcct cattcggtgg ttgaatttaa ccttggtttg caagaggctt ccagagagga tgtctgggag gggacgaggg ggcgccggga ggagcaggtg caggagccca cggcgcaggc cccgcgcagg cctggacgcg gggacggccg cggcggccgg gacaggggtc accccgcggg gccctccagg gtgggccgcc ccacgacccc aggccaggcc gaaacgggaa tcctccagac
-751 -691 -631 -571 -511 -451 -391 -331 -271 -211
gaccccgcgg gcgcgagcgg cgggaactgt