Molecular and Cellular Endocrinology, 78 (1991) 115-125
© 1991 Elsevier Scientific Publishers Ireland, Ltd. 0303-7207/91/$03.50
MOLCEL 02517

Identification of multiple transcription start sites in the human insulin-like growth factor-I gene

E. Jansen 1, P.H. Steenbergh 1, D. LeRoith 2, C.T. Roberts, Jr. 2 and J.S. Sussenbach 1

1 Laboratory for Physiological Chemistry, State University of Utrecht, Utrecht, The Netherlands, and 2 Section of Molecular and Cellular Physiology, Diabetes Branch, NIDDK, National Institutes of Health, Bethesda, MD, U.S.A.

(Received 15 January 1991; accepted 18 February 1991)

Key words: Insulin-like growth factor-I; Transcription start sites; Leader exon; Polymerase chain reaction; RNase protection
Summary

We have localized four transcription initiation sites in the human insulin-like growth factor-I (IGF-I) gene. Two transcription start sites were identified which result in a longer and a shorter version of the leader derived from the known exon 1 of the IGF-I gene. Transcription starting at the upstream transcription initiation site results in a leader exon 1 of about 1155 nucleotides (nt), whereas transcription starting at the downstream initiation site results in a leader of about 240 nt. The majority of the transcripts initiate at the latter site. We further identified a region in the human IGF-I gene between exons 1 and 2 which shows a high degree of homology with the rat IGF-I leader exon 1B. By means of the polymerase chain reaction (PCR) we detected human IGF-I mRNAs containing this novel leader. The corresponding exon was designated exon 1B according to the rat IGF-I gene terminology. PCR and RNase protection analyses identified two transcription start sites within this alternative leader exon 1B. Transcription initiated at the most upstream start site results in a leader of about 750 nt, whereas transcription starting at the downstream site is heterogeneous, resulting in leaders 65-75 nt long. No consensus TATA-box or AT-rich regions are present immediately upstream of any of the four transcription start sites identified, nor are these regions particularly GC-rich. The IGF-I gene is known to be expressed differentially in a tissue- and development-specific fashion. Differential activation of multiple promoters could very well play a crucial role in IGF-I gene regulation.
Address for correspondence: E. Jansen, Laboratory for Physiological Chemistry, Vondellaan 24a, 3521 GG Utrecht, The Netherlands. Tel.: +31-30-880521; Fax: +31-30-888443.

As a consequence of novel leader exons being detected in various species, the nomenclature regarding the exons of the IGF-I gene has become somewhat confusing. During the 2nd International Symposium on IGFs in San Francisco (January 12-16, 1991) a proposal for a new nomenclature was put forward; however, this nomenclature has not yet been generally agreed upon and published. The exons now designated 1C and 1B, etc., will be renamed according to the internationally accepted new nomenclature as soon as possible.

The nucleotide sequence data of exon 1B reported in this paper will appear in the EMBL, GenBank and DDBJ Nucleotide Sequence Databases under the accession number M59812.
Introduction

Insulin-like growth factor-I (IGF-I) is a 70-amino acid basic polypeptide which shows structural homology with insulin. IGF-I plays a fundamental role in postnatal mammalian growth, where it mediates the growth-promoting effects of growth hormone. The liver is known to be the major site of IGF-I production, but many other tissues also produce IGF-I, where it may act as an autocrine or paracrine growth factor. cDNA analysis and gene mapping have shown that the human IGF-I gene consists of at least 5 exons, spanning a minimum of 90 kilobases (kb) of chromosomal DNA (Jansen et al., 1983; Ullrich et al., 1984; Bell et al., 1985; De Pagter-Holthuizen et al., 1986; Le Bouc et al., 1986; Rotwein et al., 1986). Alternative splicing yields two classes of mRNA which encode IGF-I precursors with different carboxyl-terminal extensions. The 1.1 kb IGF-Ia mRNA consists of exons 1-3 and 5 (Jansen et al., 1983), whereas the 1.3 kb IGF-Ib mRNA is derived from exons 1-4 (Rotwein, 1986). The third major mRNA species of 7.6 kb has been shown to contain exon 5 and no exon 4 sequences (Höppener et al., 1988). In these mRNAs, the regions encoding the mature IGF-I peptide are flanked by regions coding for amino-terminal and carboxyl-terminal peptides, indicating that IGF-I is synthesized as a precursor molecule (prepro IGF-I). The structure of the human IGF-I gene shows many similarities to the rat IGF-I gene (Shimatsu and Rotwein, 1987a; Daughaday and Rotwein, 1989; Sussenbach, 1989). A difference between the two genes is the number of leader exons identified. Three different, alternatively used leaders, designated class A, B and C, have been described in the rat, based upon cDNA sequences (Roberts et al., 1987a). Recent evidence, however, suggests that only the class B and C sequences are present in rat mRNA and that the sequence unique to the class A cDNA is an artifact of the cloning procedure (Kato et al., 1990; Adamo, M. et al., in preparation).
Both of these leaders contain upstream in-frame translation initiation codons and may therefore encode different signal peptides. The expression of the different leaders is development-dependent and differentially regulated by growth hormone in a tissue-specific way (Adamo et al., 1989; Lowe et al., 1987). In rat liver, the major class C leader is found in about 75% of the IGF-I messengers. The rat exon 1B is located between exon 1C and exon 2 (Bucci et al., 1989). Exon 1B-specific cDNAs have also been reported in guinea pig, mouse and sheep (Bell et al., 1986, 1990; Wong et al., 1989). In human IGF-I mRNA, only one type of leader has been found to date. This leader is highly homologous to the rat class C leader. The exact location of the 5’ end of the IGF-I mRNAs has not yet been determined with certainty for any species. Employing polymerase chain reaction (PCR) and RNase protection analysis, we localized the 5’ ends of human IGF-I mRNAs. In addition to human adult liver, uterus leiomyoma tissue was used as a source of IGF-I mRNA. IGF-I expression in this type of tumor has been investigated in detail by us (Höppener et al., 1988; Gloudemans et al., 1990). No differences in mRNA lengths or nucleotide sequences have been detected in leiomyoma tissue in comparison to adult liver. The IGF-I mRNA levels are often higher in leiomyoma than in normal adult liver tissue.

Materials and methods

RNA isolation

Total cellular RNA was isolated from human adult liver and uterus leiomyoma using the guanidine thiocyanate procedure (Chirgwin et al., 1979). Poly(A)+ RNA was prepared by oligo(dT)-cellulose affinity chromatography (Aviv and Leder, 1972) or the PolyATract mRNA isolation system (Promega).
Probe synthesis and purification. Anti-sense RNA probes were synthesized and labeled according to the following procedure: a typical reaction mixture contained transcription buffer (BRL), 10 mM dithiothreitol (DTT), 1 mM each of GTP, ATP and CTP, 0.3 mg/ml bovine serum albumin (BSA), 5 units RNasin (Promega), 1 μg linearized template, 10 units T3 or T7 RNA polymerase (BRL), 125 pmol (50 μCi) [α-32P]UTP (Amersham) and diethylpyrocarbonate (DEPC)-treated water to 15 μl. After 1 h at 37°C, spin-dialysis through Sephadex G-50F in the presence of 0.1% sodium dodecyl sulfate (SDS) was performed and a sample was taken for Cerenkov counting. 20 μg of tRNA, 0.1 volume of 3 M sodium acetate and 2 volumes of 96% ethanol were added. After 30 min at -70°C the precipitate was collected by centrifugation, dried in vacuo and resuspended in sample buffer (95% deionized formamide and 0.1% each of bromophenol blue and xylene cyanol). The sample was heated at 100°C for 5 min, chilled on ice and electrophoresed on a 5% polyacrylamide/8.3 M urea denaturing gel. After autoradiography, full-length probe was cut out of the gel and eluted with 0.5 M NH4Cl, 10 mM Mg-acetate, 1 mM EDTA, 0.1% SDS and 20 μg tRNA for 5-7 h at 37°C.

Hybridization and RNase digestion. 10’-10^6 cpm of anti-sense RNA probe was added to total or poly(A)+ RNA. 0.1 volume of 3 M sodium acetate and 2 volumes of 96% ethanol were added. After 30 min at -70°C the precipitate was collected by centrifugation, dried in vacuo and resuspended in 24 μl 100% formamide. Then, 6 μl 5× hybridization buffer was added to final concentrations of 80 mM Pipes/pH 6.4, 400 mM NaCl and 2 mM EDTA. This mixture was heated at 100°C for 5 min and subsequently incubated at 42°C for about 16 h. After hybridization, 300 μl RNase digestion buffer (10 mM Tris-HCl/pH 7.5, 300 mM NaCl, 5 mM EDTA), containing varying RNase A and RNase T1 concentrations (Boehringer), was added. After 30 min incubation at 0°C or 37°C, SDS to a final concentration of 0.6% and 50 μg proteinase K were added, followed by a 15 min incubation at 37°C. The mixture was extracted with an equal volume of phenol/chloroform/isoamyl alcohol (25:24:1). To the aqueous layer, 20 μg of tRNA, 0.1 volume of 3 M sodium acetate and 2 volumes of 96% ethanol were added. After 30 min at -70°C the precipitate was collected by centrifugation, dried in vacuo and resuspended in sample buffer. The sample was heated at 100°C for 5 min, chilled on ice and electrophoresed on a 5% polyacrylamide/8.3 M urea denaturing gel. Densitometric scanning of autoradiographs was done with the LKB Ultroscan XL Enhanced Laser Densitometer. The data were processed with the Gel Scan XL 2.0 software (Pharmacia).
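The hybridization mixture described above combines 24 μl of formamide with 6 μl of 5× hybridization buffer. As a rough check of that dilution, the sketch below back-calculates what the 5× stock would have to contain to reach the stated final concentrations. The stock concentrations and the final formamide percentage are derived here for illustration and are not stated in the text; the variable names are hypothetical.

```python
# Volumes taken from the protocol (microlitres).
FORMAMIDE_UL = 24.0
BUFFER_5X_UL = 6.0

# Final concentrations stated in the protocol (mM).
FINAL_MM = {"Pipes": 80.0, "NaCl": 400.0, "EDTA": 2.0}

total_ul = FORMAMIDE_UL + BUFFER_5X_UL
dilution_factor = total_ul / BUFFER_5X_UL

# Assumption: the 5x buffer is the only source of these components.
stock_mm = {name: conc * dilution_factor for name, conc in FINAL_MM.items()}

# Final formamide content, assuming volumes are additive.
formamide_percent = 100.0 * FORMAMIDE_UL / total_ul

print(f"total volume: {total_ul:.0f} ul, dilution factor: {dilution_factor:.0f}x")
for name, conc in stock_mm.items():
    print(f"5x stock {name}: {conc:.0f} mM")
print(f"final formamide: {formamide_percent:.0f}% (v/v)")
```

The 30 μl total gives exactly a five-fold dilution of the buffer, which is consistent with its "5×" designation.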
cDNA synthesis and PCR

Complementary DNA was synthesized on human adult liver and uterus leiomyoma poly(A)+ RNA. A typical reaction mixture contained 3 μg poly(A)+ RNA, 12 pmol RT primer, 6 mM MgCl2, 40 mM KCl, 50 mM Tris-HCl/pH 8.15, 1 mM DTT, 250 μM dNTPs (Pharmacia), 16 pmol (50 μCi) [α-32P]dCTP (Amersham), 20 units RNasin (Promega) and DEPC-treated water to a volume of 50 μl. This mixture was incubated for 3 min at 65°C and slowly cooled to room temperature. 20 units AMV reverse transcriptase (Pharmacia) were added and the mixture was incubated at 42°C for 2.5 h. The amount of cDNA synthesized was monitored by measuring 32P incorporation. 0.5% of the cDNA was heated for 3 min at 100°C, chilled on ice and used as template for polymerase chain reaction (PCR) in a 50 μl reaction mixture containing 50 mM KCl, 10 mM Tris-HCl/pH 8.3, 1.5 mM MgCl2, 0.01% gelatine, 250 μM dNTPs, 0.5 μM of each of the two synthetic oligonucleotides (synthesized with a Pharmacia-LKB Gene Assembler Plus DNA synthesizer) and 2.5 units of Ampli-Taq DNA polymerase (Perkin-Elmer Cetus). The mixture was overlaid with 50 μl mineral oil (Sigma). 30 cycles of PCR were performed with a programmable thermal cycler (Bioexcellence). Each cycle consisted of 1 min at 94°C (denaturation), 2 min at 57°C (annealing) and 3 min at 72°C (polymerization). The product was examined by electrophoresis on a 1% agarose gel.

Results and discussion

Determination of the transcription start site

To determine the position of the transcription start site of human IGF-I mRNA, we performed RNase protection experiments. Exon 1 is located within a genomic EcoRI fragment of 1300 basepairs (bp) (Rotwein et al., 1986). This fragment contains over 1000 nucleotides upstream and 38 nt downstream of the longest exon 1 sequence described thus far (Rotwein, 1986). This EcoRI fragment was cloned into the Bluescript pKS+-I vector and used as a template for the production of a radiolabeled anti-sense RNA probe of 1380 nt employing T3 RNA polymerase (probe a, Fig. 3). Total RNA
[Fig. 1 gel image; size markers (bases): 1632, 517, 396, 344, 298, 221/220, 154]
Fig. 1. RNase protection analysis with exon 1 anti-sense RNA probe a. Lane 1, no RNase added; lane 2, no RNA added; lane 3, 50 μg total RNA from human adult liver; lane 4, 5 μg total RNA from human uterus leiomyoma. Hybridization with 4 × 10’ cpm 32P-labeled probe was carried out at 42°C, followed by an RNase digestion using 10 U/ml RNase T1 and 1 μg/ml RNase A for 30 min at 30°C. The exposure time was 2 days with an intensifying screen at -70°C.
isolated from human adult liver and uterus leiomyoma was hybridized to the RNA probe and subsequently subjected to RNase A and T1 digestion. The products were characterized by gel electrophoresis. One major protected fragment of about 240 nt was detected (Fig. 1). Since no splice acceptor sequence is found around this position, this result suggests that a transcription start site is located about 240 nucleotides upstream of the 3’ end of exon 1 (at position nt 1030, Fig. 3). This corresponds well to the length of this exon predicted by different IGF-I cDNA clones (Le Bouc et al., 1986; Rotwein, 1986). Furthermore, the lengths of the small IGF-I mRNAs of 1.1 kb and 1.3 kb can be accounted for by the addition of a 240 nt-long exon 1 to the lengths of the remaining exons of which they are known to be composed, assuming an average length of 200 residues for the poly(A) tail. We noticed that in the RNase protection assays longer protected fragments, in addition to the 240 nt fragment, were reproducibly present (Fig. 1). Since this phenomenon might be indicative of the presence of longer leaders, we cloned a 333 bp genomic AccI-EcoRI fragment into the Bluescript pKS+-I vector and prepared a radiolabeled anti-sense RNA probe using T7 RNA polymerase (probe b, Fig. 3). This probe contains only 60 nucleotides upstream of the transcription start site identified by the previous experiment. RNase protection analysis with this probe and total RNA isolated from human adult liver or uterus leiomyoma resulted in two protected fragments (Fig. 2). The most prominent protected fragment is again the 240-nt product, confirming the transcription start site identified previously. The other protected fragment is 295 nucleotides long and corresponds to the distance between the AccI site (at position nt 977, Fig. 3) and the 3’ end of exon 1. This result confirms the presence in IGF-I mRNA of a longer version of exon 1, extending upstream from the major transcription start site, suggesting the existence of a second transcription start site. Densitometric scanning of the two protected bands showed that the ratio of mRNAs derived from the major transcription start site (around position nt 1030, Fig. 3) over transcripts derived from the upstream transcription start site is about 5 in adult liver and about 20 in uterus leiomyoma. In order to localize the upstream transcription start site, PCR experiments were performed with single-stranded cDNA, synthesized by reverse transcription (RT) of adult liver poly(A)+ RNA, primed with an exon 2 specific primer (RT-2, Fig. 3).
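The densitometric ratios reported above (major over upstream start site, about 5 in adult liver and about 20 in uterus leiomyoma) can be restated as the share of exon 1 transcripts that initiate upstream. The sketch below is only an illustrative conversion. It assumes that band intensity is proportional to molar transcript amount, which the text does not specify (for a uniformly labeled probe, a correction for fragment length may be needed), and the function name is hypothetical.

```python
def upstream_fraction(major_to_upstream_ratio: float) -> float:
    """Fraction of exon 1 transcripts initiating at the upstream site,
    given the ratio (major-site transcripts) / (upstream-site transcripts)."""
    if major_to_upstream_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return 1.0 / (1.0 + major_to_upstream_ratio)


# Approximate ratios quoted in the text.
reported_ratios = {"adult liver": 5.0, "uterus leiomyoma": 20.0}

for tissue, ratio in reported_ratios.items():
    share = 100.0 * upstream_fraction(ratio)
    print(f"{tissue}: about {share:.0f}% of exon 1 transcripts start upstream")
```

Under these assumptions, a ratio of 5 corresponds to roughly one transcript in six starting upstream, and a ratio of 20 to roughly one in twenty-one.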
PCR with exon 2-specific oligonucleotides (primers 1 and 2, Fig. 3) located 5’ of the RT primer and a set of oligonucleotides located upstream of the major transcription start site in exon 1 (primers 3-5, Fig. 3) revealed that the 5’ end of the longer version of exon 1 is located between the positions of primer 4 (yielding a PCR product of
[Fig. 2 gel image; size markers (bases): 396, 298, 221/220, 154, 75; protected fragments: 295, 240]
Fig. 2. RNase protection analysis with exon 1 anti-sense RNA probe b. Lane 1, no RNase added; lane 2, no RNA added; lane 3, 50 μg total RNA from human adult liver; lane 4, 5 μg total RNA from human uterus leiomyoma. Hybridization with 2 × 10^5 cpm 32P-labeled probe was carried out at 42°C, followed by an RNase digestion using 10 U/ml RNase T1 and 40 μg/ml RNase A for 30 min at 30°C. The exposure time was 4 days with an intensifying screen at -70°C.
the expected length) and primer 5 (no product was made). In control PCR experiments using cloned genomic DNA, products of the expected lengths were obtained with all primers used. In order to map the 5’ end of the longer version of exon 1 more precisely, we performed PCR with primers 6-9 (Fig. 3). PCR with primers 6 and 7 as well as with 6 and 8 yielded the expected products, but with primers 6 and 5 no product was detected. The same result was obtained when primer 9 was used instead of primer 6. The controls with cloned genomic DNA were positive for all primers. Since there is no splice acceptor sequence present in this region, these data suggest that the upstream transcription start site is located between the positions of primers 5 and 8, resulting in an exon 1 that is 1142-1172 nt long. The above experiments show clearly that the longer version of the leader is spliced to exon 2 derived sequences. We also synthesized single-stranded cDNAs on adult liver poly(A)+ RNA, using RT primers in exon 4 and exon 5, respectively (RT-4, RT-5, Fig. 3). PCR analysis of these cDNAs employing primers 10 and 11, respectively (Fig. 3), in combination with a primer in exon 1 (primer 12, Fig. 3), yielded products of the expected lengths. This indicates that sequences derived from the upstream transcription start site in exon 1 are present in fully processed IGF-I mRNA. In the rat, the class C sequence is highly homologous to the human exon 1. The exact position of the transcription start site of the rat class C leader has not been defined precisely, although by primer extension analysis the length of the 5’-untranslated region is estimated to be approximately 1100 nucleotides (Shimatsu and Rotwein, 1987a). It has been established that within the rat exon 1C alternative splicing may occur, due to the presence of a small 186 bp ‘part-time’ intron within the rat exon 1C (Shimatsu and Rotwein, 1987b). Our PCR data demonstrate that in human adult liver no such alternative splicing event occurs within exon 1, despite conserved splice junctions surrounding the homologous region (position nt 1005-1190, Fig. 3) in the human IGF-I gene.

Identification of a novel leader exon

In the rat, one of the leaders is derived from exon 1B. This exon is located between exon 1C and exon 2 in the rat IGF-I gene (Bucci et al., 1989).
To establish whether a similar leader exon is present in the human IGF-I gene, we hybridized a radiolabeled rat exon 1B specific cDNA fragment (nt 1-750, Roberts et al., 1987b) with a Southern blot of various restriction endonuclease digests of a cosmid clone containing a 40 kb human genomic DNA insert encompassing exons 1 and 2 (De Pagter-Holthuizen et al., 1986). Strong hybridization was found with a region between these exons. The homologous region was mapped more precisely (Fig. 4a) and after subcloning into an M13 sequencing vector, the nucleotide sequence
120 qaattctcaa a tgttgccggc
tggcaaaggc
aagtgtacat
tataaatagc
cagtcaccca
gttgagggat
*at******** CAAGAGG-
GCTGGTGTTA
TTTAGhhTAC
*****it**** ttqaatqaca TCATAACCCT 5+ ACAAhAATCA GAGAAAGAAA
ACACACTCTG
180
********** TTGCTAGCCA 8-a
aaaacagctg
gcttggacca
60 120
7-P GCACACAGAC
TCCCTCTGTC
ATACACACAC
ACACACACAC
ACACACACAC
ACACACACAC
240
AGAGGTTTGA
GTTATATGGA
AAA"TCAAAC
CCCAGGTACC
300
CTTCTCCCAG
AGTGGTGGGG
TGGGGAGGGG
AACAGGAAAA TTGTTTGCCC C6 43 ACAGTGACAG GCAGCCTAGT
AGAAGAATAA
360
AGAAAAATGT
TCTATTTCAG
TTGGGTTTTA
CAGCTCGGCA
TAGTCTTTGC
CTCATCGCAG
420
GAGhAAAAGT
ATGAGACAGT
GCCCTAAAGG
GACCAATCCA
ATGCTGCCTG
CCCCTCCATA
480
GGTTCTAGGA
AATGAGATCA
CACCCCTCAC
TTGGCAACTG
GGACAAGGGG
TCACCCGAGT
540
GCTGTCTTCC
ACCCCAGTCA
CTTCAGGGTT
AMATTGTAG
AGTTTGCTGG f9
600
AGAGGGTCTT
AATCTACTTT 3+ ATCGTCCTTT
CTTTCTTTTT
TTGTTTThhh
TAATGCATTT
GCTCTAGAAT
660
CTAAAATTGC
TCTCCCATCC
CCCATATTCC
TTTAATACTG
GTAAGGTGTA
TTAGCAGACG
720
TTTGTGTCTT
CATGCCCAGC
AGAAAGTTAh
TCAGAAAACA
GATCCTTATT
TTCTATGGCA
780
GCATAAGTAT
TTTAATGTCT
GCGAhCCCTG
TCACTAACAC
ACATTCTTTT
AAGGGAAAAA
840
AATGCTTCTG
TGCTCTAGTI:
TTAAAATGCA
AAGGTATGAT
GTTATTTGTC
ACCATGCCCA
900
AAAAAGTCCT
TACTCAATAA
CTTTGCCAGA
AGAGGGAGAG
AGAGAGAAGG
CAAATGTTCC
960
CCCAGCTGTT
TCCTGTSAC
AGTGTCTGTG
TTTTGTAGAT
AAATGTGAGG
ATTTTCTCTA
1020
=2+*** AATCCCTCTT
*** CTGTTTGCTA
AATCTCACTG
TCACTGCTAA
ATTCAGAGCA
GATAGAGCCT
1080
GCGChhTGGA
ATAAAGTCCT
CAAAATTGAA
ATGTGACATT
GCTCTCAACA
TCTCCCATCT
1140
CTCTGGATTT
CTTTTTGCTT
CATTATTCCT
GCTAACCAAT
TCATTTTCAG
ACTTTGTAS;
1200
TCAGAAGCAA
TGGGAAAAAT
CAGCAGTCTT
CCAACCCAAT
TATTTAAGTG
CTGCTTTTGT
1260
exon 2 GCACACCATG
TCCTCCTCGC
ATCTCTTCTA
CCTGGCGCTG
1320
CTCTGCCACG CGTGTGTGGA
GCTGGACCGG AGACGCTCTG C RT-2 GACAGGGGCT TTTATTTdi
CGGGGCTGAG sron 3 CAAGCCCACA
1380
CTGGTGGATG
gAAG"T T CCTTCACCAG Cl CTCTTCAGTT
GGGTATGGCT
CCAGCAGTCG
GAGGGCGCCT
CAGACAGGTA
TCGTGGATGA
GTGCTGCTTC
1500
CGGAGCTGTG
ATCTAAGGAG
GCTGGAGATG
TATTGCGCAC
CCCTCAAGCC
TGCCAAGTCA
1560
TCCGTGCCCA
GCGCCACACC
CAAGTAGAGG fll GAAGAGTGAC
GAGTGCAGGA
GATTT:%A=A$z TGCCTGCTCA
GCTCGCTCTG TTG~G~CG CCTGAGGAGT
1440
GACATGCCCA
AGACCCAGAA
&g:@AY
1620
AACAAGAACT f- RT-5 ATGCCACCGC AGGATCCTTT
ACAGGATGTA
GGAAGACCCT
1680
GCTCTGCACG
AGTTACCTGT
1740
TAFiACTTTGG
AACACCTACC
AAAAAATAAG
TTTGhTAACA
TTTAAAAGAT
GGGCGTTTCC
1800
CCCAATGAAA
TACACAAGTA
AACATTCCAA
CATTGTCTTT
AGGAGTGATT
TGCACCTTGC
1860
AAAhATGGTC
CTGGAGTTGG
TAGATTGCTG
TTGATCTTTT
ATCAATAATG
TTCTATAGAA
1920
AhGAAAAAAh
A
CATCTACCAA CAAGhhCACG c-10 ACACATCChG-----------
AhGTCTCAGA
GAAGGAAAGG R!e-4
TTGGCCAAAG
exon 4 TATCAGCCCC
f
60
Fig. 3. Nucleotide sequence of the human exon 1 region and positions of the transcription start sites. The positions of the anti-sense RNA probes are indicated by a, b and c. Underlined sequences and numbered arrows represent the positions and 5’ to 3’ direction of primers used in cDNA synthesis and PCR. Regions of transcriptional initiation are indicated by asterisks. Sequences which were not detected in mRNA are in lowercase. Only the first 70 nt of exon 4 are depicted. Sequence data are from Jansen et al., 1983; Rotwein, 1986; Rotwein et al., 1986.
was established (Fig. 4b). This sequence shows 72% homology with the rat exon 1B. The position of this region in the human IGF-I gene relative to exons 1 and 2 is well conserved. To investigate whether the human counterpart of exon 1B is indeed expressed at the transcriptional level, we performed Northern blot analysis. However, Northern blot analysis of human poly(A)+ RNA isolated from adult liver and uterus leiomyoma, hybridized with a probe (position nt 567-823, Fig. 4b) specific for the exon 1B homologous region, failed to detect the presence of exon 1B specific mRNAs. We also tried to detect mRNA containing the exon 1B-like region in human adult liver and uterus leiomyoma using the more sensitive PCR technique. Single-stranded cDNA was synthesized using the exon 2 specific RT primer (RT-2, Fig. 3). PCR analysis with exon 2 specific oligonucleotides (primers 1 and 2, Fig. 3) located 5’ to the RT-2 primer and with an oligonucleotide within the exon 1B-like region (primer 1, Fig. 4b) yielded specific products of the expected lengths. These experiments reveal that IGF-I mRNA with an exon 1B-like leader is present in human liver tissue and uterus leiomyoma. By analogy with the rat leader exons, this leader will henceforth be designated exon 1B in the human gene, whereas the former human exon 1, homologous to the rat exon 1C, will now be referred to as exon 1C. An impression of the ratio of exon 1B and exon 1C-containing mRNAs was obtained by PCR with adult liver single-stranded cDNA obtained by reverse transcription of IGF-I mRNA primed with the RT-2 oligonucleotide. Employing an exon 2 primer (primer 1, Fig. 3) and a primer specific for exon 1C (primer 12, Fig. 3), the expected product is 392 bp long. When the same exon 2 primer is combined with an oligonucleotide specific for exon 1B (primer 1, Fig. 4b), this results in a 236 bp long product.
In order to obtain equal amounts of amplified exon 1C and exon 1B derived products, only 3% of input single-stranded cDNA was needed in the exon 1C directed PCR, compared to the exon 1B-specific PCR (Fig. 5). This suggests that IGF-I mRNA containing this region of exon 1B constitutes only about 3% of the total IGF-I mRNA in human liver. It should be noted that this ratio is only a rough estimate, since it depends on comparison of two PCRs using different primers. Exon 1B-like leader sequences have also been identified in cDNA clones of guinea pig (281 nt, Bell et al., 1990), mouse (137 nt, Bell et al., 1986) and sheep (44 nt, Wong et al., 1989). The human exon 1B sequence is 80-90% homologous to these sequences. Since no cDNAs with exon 1C type leaders have been identified in guinea pig and mouse, it seems likely that the 1B type leader is more prominently expressed in these species than in the human tissues used in this study.

Determination of the transcription start site of leader 1B

To localize the transcription start site of mRNA with leader 1B, PCR experiments were performed with single-stranded cDNA, synthesized by reverse transcription (RT) of adult liver poly(A)+ RNA, primed with the exon 2 specific primer (RT-2, Fig. 3). PCR with exon 2-specific oligonucleotides (primers 1 and 2, Fig. 3) located 5’ of the RT primer and a set of oligonucleotides located upstream of the 3’ end of exon 1B (primers 1-10, Fig. 4b) revealed that a transcription start site of the human exon 1B is located between the positions of primer 4 (yielding a PCR product of the expected length) and primer 5 (no product was made). In control PCR experiments using cloned genomic DNA, products of the expected lengths were obtained with all primers used. This transcription start site, which is at a position comparable to the 5’ end of the rat class B cDNA (Roberts et al., 1987b), results in an exon 1B derived leader of 750 nucleotides. The total fraction of IGF-I mRNA with a leader which is not derived from exon 1C was determined in an RNase protection assay. Total RNA isolated from human liver, placenta, skin and kidney was hybridized to an anti-sense RNA probe encompassing exons 2 and 3 and parts of exons 1C and 5 (nt 1199-1616, probe c, Fig. 3). mRNAs with leader 1C should yield a protected fragment of 418 nt (exons 1C, 2, 3 and 5, nt 1199-1616, Fig. 3), whereas IGF-I mRNAs with non-1C leaders should yield a protected fragment of 344 nt (exons 2, 3 and 5, nt 1273-1616, Fig. 3). The result of this protection experiment is shown in Fig. 6. It appears that non-1C leaders are also
[Fig. 4a: schematic map; scale bar, 1 kb]
Fig. 4. (a) Schematic presentation of the 5’ part of the human IGF-I gene and mRNA leaders. The boxes represent exons 1C and 1B (the shaded regions indicate the shorter versions of these exons) and exon 2 (solid box). Four alternative IGF-I mRNA leaders, initiating at the four transcription start sites identified, are indicated below the genomic map. B = BamHI; E = EcoRI; H = HindIII; P2 = PvuII; X = XbaI. (b) Genomic nucleotide sequence encompassing exon 1B. Underlined sequences and numbered arrows represent the positions and 5’ to 3’ direction of primers used in PCR. The position of the anti-sense RNA probe is indicated by d. Regions of transcriptional initiation are marked by asterisks. Exon sequences are shown in uppercase, intron sequences in lowercase. The splice donor site of exon 1B is indicated by an open circle. The in-frame translation initiation codon ( - 32”“‘) is indicated (+i+).
present in placenta, in addition to liver and uterus leiomyoma as shown before. The existence of non-1C leaders in skin and kidney cannot be excluded, since the amount of IGF-I mRNAs is very low in these tissues. According to the signal intensity of the protected fragments, the relative amount of IGF-I mRNAs containing a non-1C leader is about 20% of total IGF-I mRNA in human liver, much higher than the 3% estimated for the exon 1B specific transcripts (Fig. 5). It should be realized that, due to the choice of the primers, the PCR analysis described before only detects mRNA containing a 1B leader longer than 163 nucleotides. If shorter versions of the 1B leader exist (as is the case for the 1C leader), these mRNAs would not have been detected. To check the existence of shorter versions of the 1B leader, we performed RNase protection assays using an anti-sense RNA probe (probe d, Fig. 4b) corresponding to the exon 1B sequence (nt 983-1205). Using poly(A)+ RNA from liver and uterus leiomyoma, three major protected fragments of about 65-75 nucleotides were observed in both tissues (Fig. 7). Since there are no splice acceptor sequences present in this region, these data suggest a cluster of transcription start sites located 65-75 nucleotides upstream of the 3’ end of exon 1B. Transcripts derived from the upstream transcription start site in exon 1B, as identified previously, account for only a very small fraction of the exon 1B-containing IGF-I transcripts, since the corresponding protected fragment (223 nt) is hardly detectable in the RNase protection assay. The total amount of exon 1B transcripts derived from the transcription start sites at positions about 750 nt and 65-75 nt upstream of the 3’ end of exon 1B could well account for all of the IGF-I mRNAs (about 20%) containing a leader other than leader 1C.

Concluding remarks

We have now identified four different transcription start sites in the human IGF-I gene: two transcription start sites yielding mRNAs with exon 1C specific leaders of approximately 1155 nt and 240 nt, and two regions of transcriptional initiation corresponding to 750 nt and 65-75 nt long exon 1B derived leaders. Experiments are now in progress to establish promoter activity of the regions immediately upstream of the identified transcription start sites in the human IGF-I gene. No consensus TATA box or AT-rich regions are present within the first 50 nt upstream of the four transcription start sites determined, nor are these regions particularly GC-rich, suggesting that the IGF-I promoter regions belong to the class of TATA-less, non-GC-rich promoters. Differential tissue-specific and development-dependent activation of the multiple promoters of the human IGF-I gene could well be an essential element in the complex mechanism of regulation of IGF-I expression.
In this context it is important to note that in case of the rat IGF-I gene, the expression of the alternative leaders is differentially regulated by growth hormone (Lowe et al., 1987). The upstream transcription start sites of both human exons 1B and 1C lead to long 5’-untrans-
b
tgcatatttg
tataatttaa
acaaatacat
actgtatatg
gaaagcagaa
actttctaag
60
ccaacttttc
tgtttagaag
aggactttca
tgggcaaagt
ttggacttgg
120
ggttctgtgt
tataaaactc
tgattttata
ttcagtgtcg
tgaaqtccct
ttaqqtaaat 9+
180
ctqqctgctg
ctgtcagtgc
accgacttct
cgtttccgat
tgctggccgt
agttctagtt
240
tccattctca
gcaaaattat
atccttcaag
acttgtgttt
tttttcaatt
tgcaagcgct
300
tttaaqctqc
tqtcactqqc 8-s
tccaccgatt
caattgcctg
agggctcaat
tcataagacg
360
tctctqccac l=+
ttaqtqcaqc ~~ l ****
attcagttgc
tgctttcaaa
cacttcacca 6-B
ctacqacttt
420
CTTGTCAGGC
ACCTGATTCT
480
aqctqcttqc
lo+
sctagtcaa
GTAAGTGGCT
***+I******
CCAGGAGTTA
l ********t
AAGATACACC
5+
I-9
GCTGTTCCTG
AGATGCCAAC
ACATGCAGGC
CACCTTGCTT
TCAAAGAAAT
GACGTCACTG
540
TGCATACATA
CTATGTGATC
TAGCAGCTGG
AGTTTTTGTC
TCCTTACTTA
GGGGATCATA
600
AAAGAGGCTG
TGGAGCGTTA
TCTCTGCATT
AATTACAAGT
TAAGAAAATT
GTTTCCAAAT
660
GCACTTTTCA
TGCTGTGTAT
GCTGAACACT
AGCTCTTAAT
AAGTTGTTAC
720
TTTTTTCTGT
ACTTGAAGCA
GGAAGTGGTT
TCAGAAGTGG 3+ TGAGGGAGCT
GCGTGGTCTT
CACATGTAAT
780
TCAGTGGGTA
AAGGTGTCCT
GCCCAGAGGC
AGAGCTACAC
CAGCTGATTG
TACTCTGACT
840
CTCAAGGTAT 2-+ GGGTCACTAT
TTCCAAGTGA
GTGAGTCGGG
GGAAGGGAGT
AAGGGAGTGG
ACTGGAGCTT
900
CTGGGCTGCT
ACAATAGGCA
TACAATGGAA
ATAGGTGGCT
TGACTGGGGG
960
TTTGTTCATT
GTTTCAATGG
ACAAAAGGCA
1020
CTGGGTGTCC 1+ AGCACATGTT TTTAAGACTT
AAATGTAACT
AGATGCTTTC
1080
CAGTTTTCTA
TTCACATCGG
GAAAAGATTG
ACTTAAATCC
AGESGTGCAA
GTTTACCCAG
GCTCATAATA
GCATACCTGC
ACAAACCCCA
CCCACAAAGC
~TCATAATA
cccAccc~GA CCTGCTGTAA AAGACCTGGA A~AAACAAAA &EATTACA~
1200
CTACAOtgag -2 gcccagacat
tattttctta
tgactgttgc
cctcaaattt
tacagggcat
tttcattgtg
1260
ctggaatcaa
ttaatattcc
atttatctaa
gattaaaaaa
aaagaacttt
1320
aaaatttggg
tttgtgaatg
atttttgaga
aagtgttttc
tgattttttt
ttzttttttt
1380
ctcatgtctt
tctgattctt
cccttttttt
tctatcatat
ctttcctttc
tctctattga
1440
tttcttttgt
gtttggcaaa
ataaaaggcc
aaggaaataa
tgaacatatg
gaccacttgt
1500
ttcacacttt
aaactcctaa
gcaagttcgg
tattgttttc
atttgtggga
acataaaatt
1560
ctctggttct
ctgtgggtgt
actggactgt
tccctacact
aaaggaaaat
gcactagagg
1620
ttctgtctgt
ctag
l ****t****
1140
Fig. 4 (continued).
lated sequences. The function of 5'-untranslated sequences is still unknown, but they may influence mRNA stability and/or translation efficiency, as has already been shown for several other genes. This effect may be due to secondary structure within the 5'-untranslated sequences alone or in combination with the 3'-untranslated sequence (e.g. inverted repeats). In a number of cases, specific proteins which bind to these regions have been implicated in the regulatory mechanism (Aziz and Munro, 1987; Hentze et al., 1989). Translation efficiency is known to be affected by the presence of short open reading frames upstream of the actual translation start site (Kozak, 1988, 1989; Herman, 1989). Such short open reading frames are present in both exons 1B and 1C. A further feature of interest is that in exon 1C, shortly downstream of the upstream transcription start site, a stretch of alternating purines and pyrimidines ([CA]n) is present with the capacity to form Z-DNA. It must be emphasized, however, that the majority of the transcripts of both leader exons
Fig. 5. Estimation of the relative amounts of leader 1B-containing mRNA in comparison with leader 1C-containing mRNA. Lane 1, PCR with exon 1B and exon 2 specific primers; lane 2, PCR with exon 1C and exon 2 specific primers; lanes 3-5, PCR with exon 1C and exon 2 specific primers performed with a 5-, 10- and 30-fold reduced cDNA input, respectively.
initiate at the downstream start sites, which would minimize the lengths of the 5'-untranslated sequences.
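The two leader features discussed above (short upstream open reading frames and a stretch of alternating purines and pyrimidines) can be located in any leader sequence with a simple scan. The following is a minimal sketch, not the method used in this study: the function names `find_uorfs` and `alternating_runs`, the `max_codons` and `min_len` thresholds, and the toy sequences are all illustrative assumptions, and the sketch says nothing about whether a detected feature is functional.

```python
# Illustrative sketch only: names, thresholds and test sequences are assumptions,
# not taken from the paper.

STOP_CODONS = {"TAA", "TAG", "TGA"}
PURINES = set("AG")


def find_uorfs(leader, max_codons=50):
    """Return (start, end, n_codons) for every ATG...stop ORF that lies
    entirely within the leader. Coordinates are 0-based, end-exclusive;
    n_codons counts sense codons (the stop codon is excluded)."""
    seq = leader.upper().replace("U", "T")
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk in frame from the codon after the ATG until a stop is found.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons <= max_codons:
                    orfs.append((start, pos + 3, n_codons))
                break
    return orfs


def alternating_runs(seq, min_len=10):
    """Return (start, end) of maximal stretches in which purines and
    pyrimidines strictly alternate. Any base other than A or G is treated
    as a pyrimidine, which is a simplification."""
    s = seq.upper()
    runs, start = [], 0
    for i in range(1, len(s) + 1):
        # A run ends at the sequence end or where two neighbours share a class.
        if i == len(s) or (s[i] in PURINES) == (s[i - 1] in PURINES):
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


toy_leader = "ccATGaaaTAAggATGTGAcc"    # two short ORFs, made up for illustration
toy_repeat = "TTCACACACACATT"           # contains a [CA]n-like stretch
uorfs = find_uorfs(toy_leader)
runs = alternating_runs(toy_repeat)
```

For the toy inputs, `uorfs` holds one tuple per short ORF and `runs` holds the coordinates of the alternating stretch. Applying the same functions to the long and short versions of a leader would show which features are lost when transcription initiates at the downstream site.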
Fig. 6. RNase protection analysis with anti-sense RNA probe c. Lanes 1-2, 30 µg total RNA from human liver (19- and 67-year-old male, respectively); lanes 3-5, 30 µg total RNA from human placenta, skin and kidney, respectively; lane 6, no RNA added; lane 7, no RNase added. Hybridization with 2 × 10⁵ cpm ³²P-labeled probe was carried out at 45°C, followed by an RNase digestion using 0.7 U/ml RNase T1 and 40 µg/ml RNase A for 60 min at 30°C. The exposure time was 4 days with two intensifying screens at -70°C.
Fig. 7. RNase protection analysis with exon 1B specific anti-sense RNA probe d. Lane 1, no RNase added; lane 2, no RNA added; lanes 3-4, 20 µg poly(A)+ RNA from human adult liver and uterus leiomyoma, respectively. Hybridization with 7 × 10’ cpm ³²P-labeled probe was carried out at 42°C, followed by an RNase digestion using 100 U/ml RNase T1 and 75 µg/ml RNase A for 30 min at 0°C. The exposure time was 3 days with an intensifying screen at -70°C.
Little is known about the biosynthesis of IGF-I. According to the sequences of exons 1C and 2, there are three potential in-frame translation initiation codons (Met-48 in exon 1C, Met-25 and Met-22 in exon 2). Data obtained by in vitro transcription/translation studies indicate that protein biosynthesis is initiated at the first in-frame methionine codon (Rotwein et al., 1987). Since exon 1C contains multiple in-frame stop codons located directly upstream of the Met-48 at position 1210 (Fig. 3), the transcripts derived from either the upstream or downstream transcription
start site probably encode the same protein product. It is important to note that the long and short versions of the leader 1B both contain an in-frame translation initiation codon (Met-32, Fig. 4b) and may therefore encode a signal peptide different from the one encoded by leader 1C.

References
Adamo, M., Lowe, Jr., W.L., LeRoith, D. and Roberts, Jr., C.T. (1989) Endocrinology 124, 2737-2744.
Aviv, H. and Leder, P. (1972) Proc. Natl. Acad. Sci. U.S.A. 69, 1408-1412.
Aziz, N. and Munro, H.N. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 8478-8482.
Bell, G.I., Gerhard, D.S., Fong, N.M., Sanchez-Pescador, R. and Rall, L.B. (1985) Proc. Natl. Acad. Sci. U.S.A. 82, 6450-6454.
Bell, G.I., Stempien, M.M., Fong, N.M. and Rall, L.B. (1986) Nucleic Acids Res. 14, 7873-7882.
Bell, G.I., Stempien, M.M., Fong, N.M. and Seino, S. (1990) Nucleic Acids Res. 18, 4275.
Bucci, C., Mallucci, P., Roberts, Jr., C.T., Frunzio, R. and Bruni, C.B. (1989) Nucleic Acids Res. 17, 3596.
Chirgwin, J.M., Przybyla, A.E., MacDonald, R.J. and Rutter, W.J. (1979) Biochemistry 18, 5294-5299.
Daughaday, W.H. and Rotwein, P. (1989) Endocr. Rev. 10, 68-91.
De Pagter-Holthuizen, P., Van Schaik, F.M.A., Verduijn, G.M., Van Ommen, G.J.B., Bouma, B.N., Jansen, M. and Sussenbach, J.S. (1986) FEBS Lett. 195, 179-184.
Gloudemans, T., Prinsen, I., Van Unnik, J.A.M., Lips, C.J.M., Den Otter, W. and Sussenbach, J.S. (1990) Cancer Res. 50, 6689-6695.
Hentze, M.W., Rouault, T.A., Harford, J.B. and Klausner, R.D. (1989) Science 244, 357-359.
Herman, R.C. (1989) Trends Biochem. Sci. 14, 219-222.
Hoppener, J.W.M., Mosselman, S., Roholl, P.J.M., Lambrechts, C., Slebos, R.J.C., De Pagter-Holthuizen, P., Lips, C.J.M., Jansz, H.S. and Sussenbach, J.S. (1988) EMBO J. 7, 1379-1385.
Jansen, M., Van Schaik, F.M.A., Ricker, A.T., Bullock, B., Woods, D.E., Gabbay, K.H., Nussbaum, A.L., Sussenbach, J.S. and Van den Brande, J.L. (1983) Nature 306, 609-611.
Kato, H., Takenaka, A., Miura, Y., Nishiyama, M. and Noguchi, T. (1990) Agric. Biol. Chem. 54, 2225-2230.
Kozak, M. (1988) J. Cell Biol. 107, 1-7.
Kozak, M. (1989) J. Cell Biol. 108, 229-241.
Le Bouc, Y., Dreyer, D., Jaeger, F., Binoux, M. and Sondermeyer, P. (1986) FEBS Lett. 196, 108-112.
Lowe, Jr., W.L., Roberts, Jr., C.T., Lasky, S.R. and LeRoith, D. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 8946-8950.
Roberts, Jr., C.T., Lasky, S.R., Lowe, Jr., W.L. and LeRoith, D. (1987a) Biochem. Biophys. Res. Commun. 146, 1154-1159.
Roberts, Jr., C.T., Lasky, S.R., Lowe, Jr., W.L., Seaman, W.T. and LeRoith, D. (1987b) Mol. Endocrinol. 1, 243-248.
Rotwein, P. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 77-81.
Rotwein, P., Pollock, K.M., Didier, D.K. and Krivi, G.G. (1986) J. Biol. Chem. 261, 4828-4832.
Rotwein, P., Folz, R.J. and Gordon, J.I. (1987) J. Biol. Chem. 262, 11807-11812.
Shimatsu, A. and Rotwein, P. (1987a) J. Biol. Chem. 262, 7894-7900.
Shimatsu, A. and Rotwein, P. (1987b) Nucleic Acids Res. 15, 7196.
Sussenbach, J.S. (1989) Prog. Growth Factor Res. 1, 33-48.
Ullrich, A., Berman, C.H., Dull, T.J., Gray, A. and Lee, J.M. (1984) EMBO J. 3, 361-364.
Wang, E.A., Ohlsen, S.M., Godfredson, J.A., Dean, D.M. and Wheaton, J.E. (1989) DNA 8, 649-657.