Molecular and Cellular Endocrinology, 78 (1991) 115-125 © 1991 Elsevier Scientific Publishers Ireland, Ltd. 0303-7207/91/$03.50

MOLCEL 02517

Identification of multiple transcription start sites in the human insulin-like growth factor-I gene

E. Jansen 1, P.H. Steenbergh 1, D. LeRoith 2, C.T. Roberts, Jr. 2 and J.S. Sussenbach 1

1 Laboratory for Physiological Chemistry, State University of Utrecht, Utrecht, The Netherlands, and 2 Section of Molecular and Cellular Physiology, Diabetes Branch, NIDDK, National Institutes of Health, Bethesda, MD, U.S.A.

(Received 15 January 1991; accepted 18 February 1991)

Key words: Insulin-like growth factor-I; Transcription start sites; Leader exon; Polymerase chain reaction; RNase protection

Summary

We have localized four transcription initiation sites in the human insulin-like growth factor-I (IGF-I) gene. Two transcription start sites were identified which result in a longer and a shorter version of the leader derived from the known exon 1 of the IGF-I gene. Transcription starting at the upstream transcription initiation site results in a leader exon 1 of about 1155 nucleotides (nt), whereas transcription starting at the downstream initiation site results in a leader of about 240 nt. The majority of the transcripts initiate at the latter site. We further identified a region in the human IGF-I gene between exons 1 and 2 which shows a high degree of homology with the rat IGF-I leader exon 1B. By means of the polymerase chain reaction (PCR) we detected human IGF-I mRNAs containing this novel leader. The corresponding exon was designated exon 1B according to the rat IGF-I gene terminology. PCR and RNase protection analyses identified two transcription start sites within this alternative leader exon 1B. Transcription initiated at the most upstream start site results in a leader of about 750 nt, whereas transcription starting at the downstream site is heterogeneous, resulting in leaders 65-75 nt long. No consensus TATA-box or AT-rich regions are present immediately upstream of any of the four transcription start sites identified, nor are these regions particularly GC-rich. The IGF-I gene is known to be expressed differentially in a tissue- and development-specific fashion. Differential activation of multiple promoters could very well play a crucial role in IGF-I gene regulation.

Address for correspondence: E. Jansen, Laboratory for Physiological Chemistry, Vondellaan 24a, 3521 GG Utrecht, The Netherlands. Tel.: +31-30-880521; Fax: +31-30-888443.

As a consequence of novel leader exons being detected in various species, the nomenclature regarding the exons of the IGF-I gene has become somewhat confusing. During the 2nd International Symposium on IGFs in San Francisco (January 12-16, 1991) a proposal for a new nomenclature was put forward; however, this nomenclature has not yet been generally agreed upon and published. The exons now designated 1C and 1B, etc., will be renamed according to the internationally accepted new nomenclature as soon as possible.

The nucleotide sequence data of exon 1B reported in this paper will appear in the EMBL, GenBank and DDBJ Nucleotide Sequence Databases under the accession number M59812.

Introduction

Insulin-like growth factor-I (IGF-I) is a 70-amino acid basic polypeptide which shows structural homology with insulin. IGF-I plays a fundamental role in postnatal mammalian growth, where it mediates the growth-promoting effects of growth hormone. The liver is known to be the major site of IGF-I production, but many other tissues also produce IGF-I, where it may act as an autocrine or paracrine growth factor.

cDNA analysis and gene mapping have shown that the human IGF-I gene consists of at least 5 exons, spanning a minimum of 90 kilobases (kb) of chromosomal DNA (Jansen et al., 1983; Ullrich et al., 1984; Bell et al., 1985; De Pagter-Holthuizen et al., 1986; Le Bouc et al., 1986; Rotwein et al., 1986). Alternative splicing yields two classes of mRNA which encode IGF-I precursors with different carboxyl-terminal extensions. The 1.1 kb IGF-Ia mRNA consists of exons 1-3 and 5 (Jansen et al., 1983), whereas the 1.3 kb IGF-Ib mRNA is derived from exons 1-4 (Rotwein, 1986). A third major mRNA species of 7.6 kb has been shown to contain exon 5 and no exon 4 sequences (Höppener et al., 1988). In these mRNAs, the regions encoding the mature IGF-I peptide are flanked by regions coding for amino-terminal and carboxyl-terminal peptides, indicating that IGF-I is synthesized as a precursor molecule (prepro IGF-I).

The structure of the human IGF-I gene shows many similarities to the rat IGF-I gene (Shimatsu and Rotwein, 1987a; Daughaday and Rotwein, 1989; Sussenbach, 1989). A difference between the two genes is the number of leader exons identified. Three different, alternatively used leaders, designated class A, B and C, have been described in the rat, based upon cDNA sequences (Roberts et al., 1987a). Recent evidence, however, suggests that only the class B and C sequences are present in rat mRNA and that the sequence unique to the class A cDNA is an artifact of the cloning procedure (Kato et al., 1990; Adamo, M. et al., in preparation). Both of these leaders contain upstream in-frame translation initiation codons and may therefore encode different signal peptides. The expression of the different leaders is development-dependent and differentially regulated by growth hormone in a tissue-specific way (Adamo et al., 1989; Lowe et al., 1987). In rat liver, the major class C leader is found in about 75% of the IGF-I messengers. The rat exon 1B is located between exon 1C and exon 2 (Bucci et al., 1989). Exon 1B-specific cDNAs have also been reported in guinea pig, mouse and sheep (Bell et al., 1986, 1990; Wong et al., 1989).

In the human IGF-I mRNA, only one type of leader has been found to date. This leader is highly homologous to the rat class C leader. The exact location of the 5' end of the IGF-I mRNAs has not yet been determined with certainty for any species. Employing polymerase chain reaction (PCR) and RNase protection analysis we localized the 5' ends of human IGF-I mRNAs. In addition to human adult liver, uterus leiomyoma tissue was used as a source of IGF-I mRNA. IGF-I expression in this type of tumor has been investigated in detail by us (Höppener et al., 1988; Gloudemans et al., 1990). No differences in mRNA lengths or nucleotide sequences have been detected in leiomyoma tissue in comparison to adult liver. The IGF-I mRNA levels are often higher in leiomyoma than in normal adult liver tissue.

Materials and methods

RNA isolation

Total cellular RNA was isolated from human adult liver and uterus leiomyoma using the guanidine thiocyanate procedure (Chirgwin et al., 1979). Poly(A)+ RNA was prepared by oligo(dT)-cellulose affinity chromatography (Aviv and Leder, 1972) or the PolyATract mRNA isolation system (Promega).

Probe synthesis and purification. Anti-sense RNA probes were synthesized and labeled according to the following procedure: a typical reaction mixture contained transcription buffer (BRL), 10 mM dithiothreitol (DTT), 1 mM each of GTP, ATP and CTP, 0.3 mg/ml bovine serum albumin (BSA), 5 units RNasin (Promega), 1 µg linearized template, 10 units T3 or T7 RNA polymerase (BRL), 125 pmol (50 µCi) [α-32P]UTP (Amersham) and diethylpyrocarbonate (DEPC)-treated water to 15 µl. After 1 h at 37°C, spin-dialysis through Sephadex G-50F in the presence of 0.1% sodium dodecyl sulfate (SDS) was performed and a sample was taken for Cerenkov counting. 20 µg of tRNA, 0.1 volume of 3 M sodium acetate and 2 volumes of 96% ethanol were added. After 30 min at -70°C the precipitate was collected by centrifugation, dried in vacuo and resuspended in sample buffer (95% deionized formamide and 0.1% each of bromophenol blue and xylene cyanol). The sample was heated at 100°C for 5 min, chilled on ice and electrophoresed on a 5% polyacrylamide/8.3 M urea denaturing gel. After autoradiography, full-length probe was cut out of the gel and eluted with 0.5 M NH4Cl, 10 mM Mg-acetate, 1 mM EDTA, 0.1% SDS and 20 µg tRNA for 5-7 h at 37°C.

Hybridization and RNase digestion. 10^5-10^6 cpm of anti-sense RNA probe was added to total or poly(A)+ RNA. 0.1 volume of 3 M sodium acetate and 2 volumes of 96% ethanol were added. After 30 min at -70°C the precipitate was collected by centrifugation, dried in vacuo and resuspended in 24 µl 100% formamide. Then, 6 µl 5× hybridization buffer was added to final concentrations of 80 mM Pipes/pH 6.4, 400 mM NaCl and 2 mM EDTA. This mixture was heated at 100°C for 5 min and subsequently incubated at 42°C for about 16 h. After hybridization, 300 µl RNase digestion buffer (10 mM Tris-HCl/pH 7.5, 300 mM NaCl, 5 mM EDTA), containing varying RNase A and RNase T1 concentrations (Boehringer), was added. After 30 min incubation at 0°C or 37°C, SDS to a final concentration of 0.6% and 50 µg proteinase K were added, followed by a 15 min incubation at 37°C. The mixture was extracted with an equal volume of phenol/chloroform/isoamyl alcohol (25:24:1). To the aqueous layer, 20 µg of tRNA, 0.1 volume of 3 M sodium acetate and 2 volumes of 96% ethanol were added. After 30 min at -70°C the precipitate was collected by centrifugation, dried in vacuo and resuspended in sample buffer. The sample was heated at 100°C for 5 min, chilled on ice and electrophoresed on a 5% polyacrylamide/8.3 M urea denaturing gel. Densitometric scanning of autoradiographs was done with the LKB Ultroscan XL Enhanced Laser Densitometer. The data were processed with the Gel Scan XL 2.0 software (Pharmacia).
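The composition of the hybridization mixture described above can be checked with simple dilution arithmetic. The following is a minimal sketch, not part of the original protocol: the volumes and final concentrations are the ones stated in the text, while the function name and the derived stock concentrations of the 5× buffer are illustrative values computed here, not figures given by the authors.

```python
# Volumes stated in the text, in microlitres.
FORMAMIDE_UL = 24.0
BUFFER_5X_UL = 6.0

# Final concentrations stated in the text, in mM.
FINAL_MM = {"Pipes": 80.0, "NaCl": 400.0, "EDTA": 2.0}


def hybridization_mix(formamide_ul=FORMAMIDE_UL,
                      buffer_ul=BUFFER_5X_UL,
                      final_mm=FINAL_MM):
    """Return total volume, buffer dilution factor, formamide percentage
    and the implied stock concentrations of the concentrated buffer.

    The stock concentrations are derived (final x dilution factor);
    they are an inference from the text, not values reported in it.
    """
    total_ul = formamide_ul + buffer_ul
    dilution = total_ul / buffer_ul                    # 30 / 6
    formamide_percent = 100.0 * formamide_ul / total_ul
    stock_mm = {name: conc * dilution for name, conc in final_mm.items()}
    return total_ul, dilution, formamide_percent, stock_mm


if __name__ == "__main__":
    total, dilution, formamide, stock = hybridization_mix()
    print(f"total volume: {total} ul, buffer diluted {dilution}-fold")
    print(f"formamide: {formamide:.0f}% (v/v)")
    for name, conc in stock.items():
        print(f"implied stock {name}: {conc:.0f} mM")
```

Under these stated volumes the buffer is diluted five-fold, which agrees with its description as a 5× stock, and the hybridization takes place in 80% formamide.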

cDNA synthesis and PCR

Complementary DNA was synthesized on human adult liver and uterus leiomyoma poly(A)+ RNA. A typical reaction mixture contained 3 µg poly(A)+ RNA, 12 pmol RT primer, 6 mM MgCl2, 40 mM KCl, 50 mM Tris-HCl/pH 8.15, 1 mM DTT, 250 µM dNTPs (Pharmacia), 16 pmol (50 µCi) [α-32P]dCTP (Amersham), 20 units RNasin (Promega) and DEPC-treated water to a volume of 50 µl. This mixture was incubated for 3 min at 65°C and slowly cooled to room temperature. 20 units AMV reverse transcriptase (Pharmacia) were added and the mixture was incubated at 42°C for 2.5 h. The amount of cDNA synthesized was monitored by measuring 32P incorporation. 0.5% of the cDNA was heated for 3 min at 100°C, chilled on ice and used as template for polymerase chain reaction (PCR) in a 50 µl reaction mixture containing 50 mM KCl, 10 mM Tris-HCl/pH 8.3, 1.5 mM MgCl2, 0.01% gelatine, 250 µM dNTPs, 0.5 µM of each of the two synthetic oligonucleotides (synthesized with a Pharmacia-LKB Gene Assembler Plus DNA synthesizer) and 2.5 units of Ampli-Taq DNA polymerase (Perkin-Elmer Cetus). The mixture was overlaid with 50 µl mineral oil (Sigma). 30 cycles of PCR were performed with a programmable thermal cycler (Bioexcellence). Each cycle consisted of 1 min at 94°C (denaturation), 2 min at 57°C (annealing) and 3 min at 72°C (polymerization). The product was examined by electrophoresis on a 1% agarose gel.

Results and discussion

Determination of the transcription start site

To determine the position of the transcription start site of human IGF-I mRNA, we performed RNase protection experiments. Exon 1 is located within a genomic EcoRI fragment of 1300 basepairs (bp) (Rotwein et al., 1986). This fragment contains over 1000 nucleotides upstream and 38 nt downstream of the longest exon 1 sequence described thus far (Rotwein, 1986). This EcoRI fragment was cloned into the Bluescript pKS+-I vector and used as a template for the production of a radiolabeled anti-sense RNA probe of 1380 nt employing T3 RNA polymerase (probe a, Fig. 3). Total RNA

Fig. 1. RNase protection analysis with exon 1 anti-sense RNA probe a. Lane 1, no RNase added; lane 2, no RNA added; lane 3, 50 µg total RNA from human adult liver; lane 4, 5 µg total RNA from human uterus leiomyoma. Hybridization with 4 × 10^5 cpm 32P-labeled probe was carried out at 42°C, followed by an RNase digestion using 10 U/ml RNase T1 and 1 µg/ml RNase A for 30 min at 30°C. The exposure time was 2 days with an intensifying screen at -70°C. Size labels (bases): 1632, 517, 396, 344, 298, 221/220, 154.

isolated from human adult liver and uterus leiomyoma was hybridized to the RNA probe and subsequently subjected to RNase A and T1 digestion. The products were characterized by gel electrophoresis. One major protected fragment of about 240 nt was detected (Fig. 1). Since no splice acceptor sequence is found around this position, this result suggests that a transcription start site is located about 240 nucleotides upstream of the 3' end of exon 1 (at position nt 1030, Fig. 3). This corresponds well to the length of this exon predicted by different IGF-I cDNA clones (Le Bouc et al., 1986; Rotwein, 1986). Furthermore, the lengths of the small IGF-I mRNAs of 1.1 kb and 1.3 kb can be accounted for by the addition of a 240 nt-long exon 1 to the lengths of the remaining exons of which they are known to be composed, assuming an average length of 200 residues for the poly(A) tail.

We noticed that in the RNase protection assays longer protected fragments, in addition to the 240 nt fragment, were reproducibly present (Fig. 1). Since this phenomenon might be indicative of the presence of longer leaders, we cloned a 333 bp genomic AccI-EcoRI fragment into the Bluescript pKS+-I vector and prepared a radiolabeled anti-sense RNA probe using T7 RNA polymerase (probe b, Fig. 3). This probe contains only 60 nucleotides upstream of the transcription start site identified by the previous experiment. RNase protection analysis with this probe and total RNA isolated from human adult liver or uterus leiomyoma resulted in two protected fragments (Fig. 2). The most prominent protected fragment again is the 240 nucleotide-long product, confirming the transcription start site identified previously. The other protected fragment is 295 nucleotides long and corresponds to the distance between the AccI site (at position nt 977, Fig. 3) and the 3' end of exon 1. This result confirms the presence in IGF-I mRNA of a longer version of exon 1, extending upstream from the major transcription start site, suggesting the existence of a second transcription start site. Densitometric scanning of the two protected bands showed that the ratio of mRNAs derived from the major transcription start site (around position nt 1030, Fig. 3) over transcripts derived from the upstream transcription start site is about 5 in adult liver and about 20 in uterus leiomyoma.

In order to localize the upstream transcription start site, PCR experiments were performed with single-stranded cDNA, synthesized by reverse transcription (RT) of adult liver poly(A)+ RNA, primed with an exon 2 specific primer (RT-2, Fig. 3). PCR with exon 2-specific oligonucleotides (primers 1 and 2, Fig. 3) located 5' of the RT primer and a set of oligonucleotides located upstream of the major transcription start site in exon 1 (primers 3-5, Fig. 3) revealed that the 5' end of the longer version of exon 1 is located between the positions of primer 4 (yielding a PCR product of

Fig. 2. RNase protection analysis with exon 1 anti-sense RNA probe b. Lane 1, no RNase added; lane 2, no RNA added; lane 3, 50 µg total RNA from human adult liver; lane 4, 5 µg total RNA from human uterus leiomyoma. Hybridization with 2 × 10^5 cpm 32P-labeled probe was carried out at 42°C, followed by an RNase digestion using 10 U/ml RNase T1 and 40 µg/ml RNase A for 30 min at 30°C. The exposure time was 4 days with an intensifying screen at -70°C. Size labels (bases): 396, 298, 295, 240, 221/220, 154, 75.
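The fragment sizes seen with probe b can be cross-checked against the numbers given in the text. The sketch below is illustrative only. It assumes that the EcoRI end of the 333 bp AccI-EcoRI fragment is the same site that lies 38 nt downstream of exon 1 in the 1300 bp EcoRI fragment, which the text implies but does not state outright, and the function names are hypothetical. The fractions it reports refer only to transcripts protected by probe b, that is, to the two exon 1 leader variants.

```python
# Values taken from the text.
PROBE_B_BP = 333            # genomic AccI-EcoRI fragment
DOWNSTREAM_FLANK_NT = 38    # genomic sequence 3' of exon 1 (assumed shared EcoRI end)
MAJOR_FRAGMENT_NT = 240     # protected fragment from the major start site
RATIO_MAJOR_TO_UPSTREAM = {"adult liver": 5.0, "uterus leiomyoma": 20.0}


def long_leader_fragment(probe_bp, downstream_flank_nt):
    """Length protected by a transcript spanning the whole exonic part of the probe."""
    return probe_bp - downstream_flank_nt


def upstream_fraction(ratio_major_to_upstream):
    """Fraction of exon 1 transcripts that start upstream of the probe."""
    return 1.0 / (1.0 + ratio_major_to_upstream)


if __name__ == "__main__":
    long_fragment = long_leader_fragment(PROBE_B_BP, DOWNSTREAM_FLANK_NT)
    print(f"fragment protected by the long leader: {long_fragment} nt")
    print(f"probe sequence upstream of the major start: "
          f"{long_fragment - MAJOR_FRAGMENT_NT} nt")
    for tissue, ratio in RATIO_MAJOR_TO_UPSTREAM.items():
        print(f"{tissue}: {100 * upstream_fraction(ratio):.1f}% upstream starts")
```

Under that assumption the long leader should protect 333 - 38 = 295 nt, matching the second band, and the probe extends 55 nt beyond the major start site, in line with the roughly 60 nt quoted in the text.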

the expected length) and primer 5 (no product was made). In control PCR experiments using cloned genomic DNA, products of the expected lengths were obtained with all primers used. In order to map the 5' end of the longer version of exon 1 more precisely, we performed PCR with primers 6-9 (Fig. 3). PCR with primers 6 and 7 as well as with 6 and 8 yielded the expected products, but with the primers 6 and 5 no product was detected. The same result was obtained when primer 9 was used instead of primer 6. The controls with cloned genomic DNA were positive for all primers. Since there is no splice acceptor sequence present in this region, these data suggest that the upstream transcription start site is located between the positions of primers 5 and 8, resulting in an exon 1 that is 1142-1172 nt long.

The above experiments show clearly that the longer version of the leader is spliced to exon 2 derived sequences. We also synthesized single-stranded cDNAs on adult liver poly(A)+ RNA, using RT primers in exon 4 and exon 5, respectively (RT-4, RT-5, Fig. 3). PCR analysis of these cDNAs employing primers 10 and 11, respectively (Fig. 3), in combination with a primer in exon 1 (primer 12, Fig. 3), yielded products of the expected lengths. This indicates that sequences derived from the upstream transcription start site in exon 1 are present in fully processed IGF-I mRNA.

In the rat, the class C sequence is highly homologous to the human exon 1. The exact position of the transcription start site of the rat class C leader has not been defined precisely, although by primer extension analysis the length of the 5'-untranslated region is estimated to be approximately 1100 nucleotides (Shimatsu and Rotwein, 1987a). It has been established that within the rat exon 1C alternative splicing may occur, due to the presence of a small 186 bp 'part-time' intron within the rat exon 1C (Shimatsu and Rotwein, 1987b). Our PCR data demonstrate that in human adult liver no such alternative splicing event occurs within exon 1, despite conserved splice junctions surrounding the homologous region (position nt 1005-1190, Fig. 3) in the human IGF-I gene.

Identification of a novel leader exon

In the rat, one of the leaders is derived from exon 1B. This exon is located between exon 1C and exon 2 in the rat IGF-I gene (Bucci et al., 1989). To establish whether a similar leader exon is present in the human IGF-I gene, we hybridized a radiolabeled rat exon 1B specific cDNA fragment (nt 1-750, Roberts et al., 1987b) with a Southern blot of various restriction endonuclease digests of a cosmid clone containing a 40 kb human genomic DNA insert encompassing exons 1 and 2 (De Pagter-Holthuizen et al., 1986). Strong hybridization was found with a region between these exons. The homologous region was mapped more precisely (Fig. 4a) and after subcloning into M13 sequencing vector, the nucleotide sequence

[Fig. 3: annotated nucleotide sequence of the exon 1 region, numbered in steps of 60 nt up to nt 1920 and followed by the first 70 nt of exon 4; primers 1-12, the RT primers RT-2, RT-4 and RT-5, and exons 2, 3 and 4 are marked. The sequence is not legible in this copy.]

Fig. 3. Nucleotide sequence of the human exon 1 region and positions of the transcription start sites. The positions of the anti-sense RNA probes are indicated by a, b and c. Underlined sequences and numbered arrows represent the positions and 5' to 3' direction of primers used in cDNA synthesis and PCR. Regions of transcriptional initiation are indicated by asterisks. Sequences which were not detected in mRNA are in lowercase. Only the first 70 nt of exon 4 are depicted. Sequence data are from Jansen et al., 1983; Rotwein, 1986; Rotwein et al., 1986.

was established (Fig. 4b). This sequence shows 72% homology with the rat exon 1B. The position of this region in the human IGF-I gene relative to the exons 1 and 2 is well conserved.

To investigate whether the human counterpart of exon 1B is indeed expressed at the transcriptional level, we performed Northern blot analysis. However, Northern blot analysis of human poly(A)+ RNA isolated from adult liver and uterus leiomyoma, hybridized with a probe (position nt 567-823, Fig. 4b) specific for the exon 1B homologous region, failed to detect the presence of exon 1B specific mRNAs. We also tried to detect mRNA containing the exon 1B-like region in human adult liver and uterus leiomyoma using the more sensitive PCR technique. Single-stranded cDNA was synthesized using the exon 2 specific RT primer (RT-2, Fig. 3). PCR analysis with exon 2 specific oligonucleotides (primers 1 and 2, Fig. 3) located 5' to the RT-2 primer and with an oligonucleotide within the exon 1B-like region (primer 1, Fig. 4b) yielded specific products of the expected lengths. These experiments reveal that IGF-I mRNA with an exon 1B-like leader is present in human liver tissue and uterus leiomyoma. By analogy with the rat leader exons, this leader will henceforth be designated exon 1B in the human gene, whereas the former human exon 1, homologous to the rat exon 1C, will now be referred to as exon 1C.

An impression of the ratio of exon 1B and exon 1C-containing mRNAs was obtained by PCR with adult liver single-stranded cDNA obtained by reverse transcription of IGF-I mRNA primed with the RT-2 oligonucleotide. Employing an exon 2 primer (primer 1, Fig. 3) and a primer specific for exon 1C (primer 12, Fig. 3) the expected product is 392 bp long. When the same exon 2 primer is combined with an oligonucleotide specific for exon 1B (primer 1, Fig. 4b) this results in a 236 bp long product. In order to obtain equal amounts of amplified exon 1C and exon 1B derived products, only 3% of input single-stranded cDNA was needed in the exon 1C directed PCR, compared to the exon 1B-specific PCR (Fig. 5). This suggests that IGF-I mRNA containing this region of exon 1B constitutes only about 3% of the total IGF-I mRNA in human liver. It should be noted that this ratio is only a rough estimate, since it depends on comparison of two PCRs using different primers.

Exon 1B-like leader sequences have also been identified in cDNA clones of guinea pig (281 nt, Bell et al., 1990), mouse (137 nt, Bell et al., 1986) and sheep (44 nt, Wong et al., 1989). The human exon 1B sequence is 80-90% homologous to these sequences. Since no cDNAs with exon 1C type leaders have been identified in guinea pig and mouse, it seems likely that the 1B type leader is more prominently expressed in these species than in the human tissues used in this study.

Determination of the transcription start site of leader 1B

To localize the transcription start site of mRNA with leader 1B, PCR experiments were performed with single-stranded cDNA, synthesized by reverse transcription (RT) of adult liver poly(A)+ RNA, primed with the exon 2 specific primer (RT-2, Fig. 3). PCR with exon 2-specific oligonucleotides (primers 1 and 2, Fig. 3) located 5' of the RT primer and a set of oligonucleotides located upstream of the 3' end of exon 1B (primers 1-10, Fig. 4b) revealed that a transcription start site of the human exon 1B is located between the positions of primer 4 (yielding a PCR product of the expected length) and primer 5 (no product was made). In control PCR experiments using cloned genomic DNA, products of the expected lengths were obtained with all primers used. This transcription start site, which is at a position comparable to the 5' end of the rat class B cDNA (Roberts et al., 1987b), results in an exon 1B derived leader of 750 nucleotides.

The total fraction of IGF-I mRNA with a leader which is not derived from exon 1C was determined in an RNase protection assay. Total RNA isolated from human liver, placenta, skin and kidney was hybridized to an anti-sense RNA probe encompassing the exons 2, 3 and parts of exons 1C and 5 (nt 1199-1616, probe c, Fig. 3). mRNAs with leader 1C should yield a protected fragment of 418 nt (exons 1C, 2, 3 and 5, nt 1199-1616, Fig. 3), whereas IGF-I mRNAs with non-1C leaders should yield a protected fragment of 344 nt (exons 2, 3 and 5, nt 1273-1616, Fig. 3). The result of this protection experiment is shown in Fig. 6. It appears that non-1C leaders are also

Fig. 4. (a) Schematic presentation of the 5' part of the human IGF-I gene and mRNA leaders (scale bar, 1 kb). The boxes represent exons 1C and 1B (the shaded regions indicate the shorter versions of these exons) and exon 2 (solid box). Four alternative IGF-I mRNA leaders, initiating at the four transcription start sites identified, are indicated below the genomic map. B = BamHI; E = EcoRI; H = HindIII; P2 = PvuII; X = XbaI. (b) Genomic nucleotide sequence encompassing exon 1B. Underlined sequences and numbered arrows represent the positions and 5' to 3' direction of primers used in PCR. The position of the anti-sense RNA probe is indicated by d. Regions of transcriptional initiation are marked by asterisks. Exon sequences are shown in uppercase, intron sequences in lowercase. The splice donor site of exon 1B is indicated by an open circle. The in-frame translation initiation codon (-32) is indicated.
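The protected-fragment lengths quoted in the text for probes c and d follow from inclusive coordinate arithmetic on the numbered sequences. The sketch below reproduces those figures; it uses only coordinates stated in the text (Fig. 3 numbering for probe c and the 'part-time' intron homolog, Fig. 4b numbering for probe d), and the helper name `inclusive_length` is an illustrative choice.

```python
def inclusive_length(start_nt, end_nt):
    """Number of nucleotides from start_nt to end_nt, counting both ends."""
    if end_nt < start_nt:
        raise ValueError("end_nt must not precede start_nt")
    return end_nt - start_nt + 1


# Coordinates as quoted in the text.
REGIONS = {
    "probe c, leader 1C transcripts (nt 1199-1616)": (1199, 1616),
    "probe c, non-1C transcripts (nt 1273-1616)": (1273, 1616),
    "probe d, exon 1B (nt 983-1205)": (983, 1205),
    "homolog of rat 'part-time' intron (nt 1005-1190)": (1005, 1190),
}

if __name__ == "__main__":
    for label, (start, end) in REGIONS.items():
        print(f"{label}: {inclusive_length(start, end)} nt")
    # Exon 1C sequence covered by probe c: the difference between the two
    # fragments expected with that probe.
    exon_1c_part = inclusive_length(1199, 1616) - inclusive_length(1273, 1616)
    print(f"exon 1C portion of probe c: {exon_1c_part} nt")
```

The computed lengths agree with the 418 nt, 344 nt, 223 nt and 186 bp values given in the text, and the two probe c fragments differ by the 74 nt of exon 1C that the probe covers.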

present in placenta, in addition to liver and uterus leiomyoma as shown before. The existence of non-1C leaders in skin and kidney cannot be excluded, since the amount of IGF-I mRNAs is very low in these tissues. According to the signal intensity of the protected fragments, the relative amount of IGF-I mRNAs containing a non-1C leader is about 20% of total IGF-I mRNA in human liver, much higher than the 3% estimated for the exon 1B specific transcripts (Fig. 5). It should be realized that due to the choice of the primers, the PCR analysis described before only detects mRNA containing a 1B leader longer than 163 nucleotides. If shorter versions of the 1B leader exist (as is the case for the 1C leader), these mRNAs would not have been detected.

To check the existence of shorter versions of the 1B leader, we performed RNase protection assays using an anti-sense RNA probe (probe d, Fig. 4b) corresponding to the exon 1B sequence (nt 983-1205). Using poly(A)+ RNA from liver and uterus leiomyoma, three major protected fragments of about 65-75 nucleotides in both liver and uterus leiomyoma were observed (Fig. 7). Since there are no splice acceptor sequences present in this region, these data suggest a cluster of transcription start sites located 65-75 nucleotides upstream of the 3' end of exon 1B. Transcripts derived from the upstream transcription start site in exon 1B, as identified previously, account only for a very small fraction of the exon 1B-containing IGF-I transcripts, since the corresponding protected fragment (223 nt) is hardly detectable in the RNase protection assay. The total amount of exon 1B transcripts derived from the transcription start sites at positions about 750 nt and 75-65 nt upstream of the 3' end of exon 1B could well account for all of the IGF-I mRNAs (about 20%) containing a leader other than leader 1C.

Concluding remarks

We have now identified four different transcription start sites in the human IGF-I gene: two transcription start sites yielding mRNAs with exon 1C specific leaders of approximately 1155 nt and 240 nt, and two regions of transcriptional initiation corresponding to 750 nt and 65-75 nt long exon 1B derived leaders. Experiments are now in progress to establish promoter activity of the regions immediately upstream of the identified transcription start sites in the human IGF-I gene. No consensus TATA box or AT-rich regions are present within the first 50 nt upstream of the four transcription start sites determined, nor are these regions particularly GC-rich, suggesting that the IGF-I promoter regions belong to the class of TATA-less, non-GC-rich promoters. Differential tissue-specific and development-dependent activation of the multiple promoters of the human IGF-I gene could well be an essential element in the complex mechanism of regulation of IGF-I expression.
In this context it is important to note that in the case of the rat IGF-I gene, the expression of the alternative leaders is differentially regulated by growth hormone (Lowe et al., 1987).

The upstream transcription start sites of both human exons 1B and 1C lead to long 5'-untranslated sequences. The function of 5'-untranslated sequences is still unknown, but they may influence mRNA stability and/or translation efficiency. This has already been shown for several other genes. This effect may be due to secondary structure within the 5'-untranslated sequences alone or in combination with the 3'-untranslated sequence (e.g. inverted repeats). In a number of cases, specific proteins which bind to these regions have been implicated in the regulatory mechanism (Aziz and Munro, 1987; Hentze et al., 1989). Translation efficiency is known to be affected by the presence of short open reading frames upstream of the actual translation start site (Kozak, 1988, 1989; Herman, 1989). Such short open reading frames are present in both exons 1B and 1C. A further feature of interest is that in exon 1C, shortly downstream of the upstream transcription start site, a stretch of alternating purines and pyrimidines ([CA]n) is present with the capacity to form Z-DNA. It must be emphasized, however, that the majority of the transcripts of both leader exons

[Fig. 4b: genomic nucleotide sequence encompassing exon 1B, numbered in steps of 60 nt up to nt 1620, with primers 1-10 and probe d marked. The sequence is not legible in this copy; it is deposited under accession number M59812.]

Fig. 4 (continued).

initiate at the downstream start sites, which would minimize the lengths of the 5’-untranslated sequences.

Fig. 5. Estimation of the relative amounts of leader 1B-containing mRNA in comparison with leader 1C-containing mRNA. Lane 1, PCR with exon 1B and exon 2 specific primers; lane 2, PCR with exon 1C and exon 2 specific primers; lanes 3-5, PCR with exon 1C and exon 2 specific primers performed with a 5-, 10- and 30-fold reduced cDNA input, respectively.
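The short upstream open reading frames discussed in the text can be located by scanning a leader for ATG codons that lie upstream of the main initiation codon and are followed by an in-frame stop codon. The Python sketch below illustrates the idea only. The helper name `find_uorfs` and the leader sequence are hypothetical, and the sketch does not reproduce the authors' analysis of exons 1B and 1C.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(leader, main_start):
    """Return (start, end) index pairs of upstream ORFs in `leader`.

    leader     -- DNA string of the transcript 5' region (any case)
    main_start -- 0-based index of the A of the main ATG
    An upstream ORF is an ATG located before `main_start` that is followed
    by an in-frame stop codon; `end` is exclusive and includes the stop.
    """
    seq = leader.upper()
    uorfs = []
    for i in range(0, min(main_start, len(seq) - 2)):
        if seq[i:i + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this upstream ATG.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                uorfs.append((i, j + 3))
                break
    return uorfs


# Hypothetical leader: an upstream ATG-CCC-TAA minicistron at index 2,
# followed by the main ATG at index 17.
leader = "GGATGCCCTAAGGCACCATGGCTGGAAAA"
print(find_uorfs(leader, main_start=17))  # [(2, 11)]
```

Only upstream ATGs with an in-frame stop are reported; an upstream ATG that reads through into the main coding frame would need separate handling.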

Fig. 6. RNase protection analysis with anti-sense RNA probe c. Lanes 1-2, 30 µg total RNA from human liver (19- and 67-year-old male, respectively); lanes 3-5, 30 µg total RNA from human placenta, skin and kidney, respectively; lane 6, no RNA added; lane 7, no RNase added. Hybridization with 2 × 10^5 cpm 32P-labeled probe was carried out at 45°C, followed by an RNase digestion using 0.7 U/ml RNase T1 and 40 µg/ml RNase A for 60 min at 30°C. The exposure time was 4 days with two intensifying screens at -70°C.

Fig. 7. RNase protection analysis with exon 1B specific antisense RNA probe d. Lane 1, no RNase added; lane 2, no RNA added; lanes 3-4, 20 µg poly(A)+ RNA from human adult liver and uterus leiomyoma, respectively. Hybridization with 7 × 10’ cpm 32P-labeled probe was carried out at 42°C, followed by an RNase digestion using 100 U/ml RNase T1 and 75 µg/ml RNase A for 30 min at 0°C. The exposure time was 3 days with an intensifying screen at -70°C.
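The mapping logic behind Fig. 7 is coordinate arithmetic. Probe d covers nt 983-1205 of the Fig. 4b sequence, and no splice acceptor lies within that region, so a protected fragment of length n implies a transcript 5’ end n nucleotides upstream of the 3’ end of exon 1B. The Python sketch below works this through. The helper name is hypothetical, and the calculation assumes that every protected fragment extends to position 1205.

```python
PROBE_START = 983   # first nucleotide covered by probe d (Fig. 4b numbering)
PROBE_END = 1205    # last nucleotide covered by probe d


def start_site_from_fragment(fragment_length, probe_end=PROBE_END):
    """Position of the 5' end implied by a protected fragment.

    Assumes the protected fragment ends at `probe_end`, i.e. protection is
    interrupted only at the transcript's 5' end.
    """
    if fragment_length < 1:
        raise ValueError("fragment length must be positive")
    return probe_end - fragment_length + 1


# Full-length protection: the probed region itself spans 223 nt, matching
# the 223-nt fragment expected for transcripts from the upstream start site.
full_length = PROBE_END - PROBE_START + 1
print(full_length)                          # 223

# Protected fragments of about 65-75 nt map to a cluster of start sites.
for n in (65, 75):
    print(n, start_site_from_fragment(n))   # 65 -> 1141, 75 -> 1131
```

Under these assumptions the downstream start sites fall at roughly nt 1131-1141, and a fully protected probe (223 nt) indicates a 5’ end at or upstream of nt 983.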

Little is known about the biosynthesis of IGF-I. According to the sequences of exons 1C and 2, there are three potential in-frame translation initiation codons (at position -48 in exon 1C, and at positions -25 and -22 in exon 2). Data obtained by in vitro transcription/translation studies indicate that protein biosynthesis is initiated at the first in-frame methionine codon (Rotwein et al., 1987). Since exon 1C contains multiple in-frame stop codons located directly upstream of the -48 codon at position 1210 (Fig. 3), the transcripts derived from either the upstream or downstream transcription start site probably encode the same protein product. It is important to note that the long and short versions of the leader 1B both contain an in-frame translation initiation codon (position -32, Fig. 4b) and may therefore encode a signal peptide different from the one encoded by leader 1C.

References

Adamo, M., Lowe, Jr., W.L., LeRoith, D. and Roberts, Jr., C.T. (1989) Endocrinology 124, 2737-2744.
Aviv, H. and Leder, P. (1972) Proc. Natl. Acad. Sci. U.S.A. 69, 1408-1412.
Aziz, N. and Munro, H.N. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 8478-8482.
Bell, G.I., Gerhard, D.S., Fong, N.M., Sanchez-Pescador, R. and Rall, L.B. (1985) Proc. Natl. Acad. Sci. U.S.A. 82, 6450-6454.
Bell, G.I., Stempien, M.M., Fong, N.M. and Rall, L.B. (1986) Nucleic Acids Res. 14, 7873-7882.
Bell, G.I., Stempien, M.M., Fong, N.M. and Seino, S. (1990) Nucleic Acids Res. 18, 4275.
Bucci, C., Mallucci, P., Roberts, Jr., C.T., Frunzio, R. and Bruni, C.B. (1989) Nucleic Acids Res. 17, 3596.
Chirgwin, J.M., Przybyla, A.E., MacDonald, R.J. and Rutter, W.J. (1979) Biochemistry 18, 5294-5299.
Daughaday, W.H. and Rotwein, P. (1989) Endocr. Rev. 10, 68-91.
De Pagter-Holthuizen, P., Van Schaik, F.M.A., Verduijn, G.M., Van Ommen, G.J.B., Bouma, B.N., Jansen, M. and Sussenbach, J.S. (1986) FEBS Lett. 195, 179-184.
Gloudemans, T., Prinsen, I., Van Unnik, J.A.M., Lips, C.J.M., Den Otter, W. and Sussenbach, J.S. (1990) Cancer Res. 50, 6689-6695.
Hentze, M.W., Rouault, T.A., Harford, J.B. and Klausner, R.D. (1989) Science 244, 357-359.
Herman, R.C. (1989) Trends Biochem. Sci. 14, 219-222.
Hoppener, J.W.M., Mosselman, S., Roholl, P.J.M., Lambrechts, C., Slebos, R.J.C., De Pagter-Holthuizen, P., Lips, C.J.M., Jansz, H.S. and Sussenbach, J.S. (1988) EMBO J. 7, 1379-1385.
Jansen, M., Van Schaik, F.M.A., Ricker, A.T., Bullock, B., Woods, D.E., Gabbay, K.H., Nussbaum, A.L., Sussenbach, J.S. and Van den Brande, J.L. (1983) Nature 306, 609-611.
Kato, H., Takenaka, A., Miura, Y., Nishiyama, M. and Noguchi, T. (1990) Agric. Biol. Chem. 54, 2225-2230.
Kozak, M. (1988) J. Cell Biol. 107, 1-7.
Kozak, M. (1989) J. Cell Biol. 108, 229-241.
Le Bouc, Y., Dreyer, D., Jaeger, F., Binoux, M. and Sondermeyer, P. (1986) FEBS Lett. 196, 108-112.
Lowe, Jr., W.L., Roberts, Jr., C.T., Lasky, S.R. and LeRoith, D. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 8946-8950.
Roberts, Jr., C.T., Lasky, S.R., Lowe, Jr., W.L. and LeRoith, D. (1987a) Biochem. Biophys. Res. Commun. 146, 1154-1159.
Roberts, Jr., C.T., Lasky, S.R., Lowe, Jr., W.L., Seaman, W.T. and LeRoith, D. (1987b) Mol. Endocrinol. 1, 243-248.
Rotwein, P. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 77-81.
Rotwein, P., Pollock, K.M., Didier, D.K. and Krivi, G.G. (1986) J. Biol. Chem. 261, 4828-4832.
Rotwein, P., Folz, R.J. and Gordon, J.I. (1987) J. Biol. Chem. 262, 11807-11812.
Shimatsu, A. and Rotwein, P. (1987a) J. Biol. Chem. 262, 7894-7900.
Shimatsu, A. and Rotwein, P. (1987b) Nucleic Acids Res. 15, 7196.
Sussenbach, J.S. (1989) Prog. Growth Factor Res. 1, 33-48.
Ullrich, A., Berman, C.H., Dull, T.J., Gray, A. and Lee, J.M. (1984) EMBO J. 3, 361-364.
Wang, E.A., Ohlsen, S.M., Godfredson, J.A., Dean, D.M. and Wheaton, J.E. (1989) DNA 8, 649-657.
