GENOMICS 12, 459-464 (1992)

Genomic Structure and Mapping of the Chromosomal Gene for Transcobalamin I (TCN1): Comparison to Human Intrinsic Factor

JENNIFER JOHNSTON, TERESA YANG-FENG, AND NANCY BERLINER¹

Department of Internal Medicine, Section of Hematology and the Department of Genetics, Yale University School of Medicine, New Haven, Connecticut 06510

Received July 31, 1991; revised October 14, 1991

¹ To whom correspondence should be addressed at Department of Internal Medicine, Section of Hematology WWW423a, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06510.

Transcobalamin I (TCI) is a vitamin B12 binding protein that is found in the secondary granules of mature neutrophils. The expression of the gene for TCI (TCN1) within neutrophils has been shown to be restricted to the later stages of myeloid development and can therefore be used as a marker for granulocyte differentiation. To study transcriptional control regions important in late stage myeloid gene regulation the genomic sequence for TCN1 has been cloned. Clones were isolated from a genomic library constructed in Charon 4A using homologous full-length cDNA probes. Southern blot analysis showed the gene to reside on five EcoRI fragments totaling 14 kb in length. Two overlapping phage clones, containing the entire 14 kb, were isolated and the introns and exons were mapped using Southern blotting and dideoxy sequencing of subclones. The cDNA is represented by nine exons contained within 12 kb of genomic DNA. Comparison of the genomic structure to gastric intrinsic factor (GIF), another vitamin B12 binding protein, revealed a strikingly similar intron/exon structure, with several positionally conserved splice sites. The gene was localized to chromosome 11 using in situ hybridization. © 1992 Academic Press, Inc.

INTRODUCTION

Transcobalamin I (TCN1) is transcribed late in myeloid differentiation and the TCI protein is contained within neutrophil secondary granules (Kane and Peters, 1975). Neutrophil TCN1 expression appears to be coordinately regulated with that of the other secondary granule protein genes, lactoferrin, collagenase, and gelatinase. Because genes for all of these proteins are transcribed late in myeloid differentiation they can be used as markers for granulocytic maturation. HL60, an acute leukemia cell line, does not express TCN1 or other secondary granule protein genes even after induction toward myeloid differentiation (Fontana et al., 1980; Rado et al., 1984; Johnston et al., 1989; Devarajan et al., 1991). It is thought that common regulatory signals within the promoters of these genes may be involved in loss of their activity. We have therefore isolated genomic clones of TCN1 to study the promoter looking for specific sequences responsible for late-stage gene regulation in myeloid cells.

TCI is a member of the R-binder family of proteins present in a number of secretions and tissues. R-binders function as vitamin B12 binding proteins in vivo, although the importance of this function remains unknown (Grasbeck, 1969). They are all identical in amino acid composition and are distinguished by their differential glycosylation (Burger and Allen, 1974). Intrinsic factor (GIF), another vitamin B12 binding protein, mediates uptake of vitamin B12 from the ileum (Wilson and Strauss, 1959). Two areas of homology between TCN1 and GIF have recently been identified (Johnston et al., 1989; Hewitt et al., 1991).

We report here the genomic cloning of TCN1 including mapping of the exons and sequencing the exon/intron boundaries. Comparison of the genomic structure of TCN1 and GIF shows striking similarity of intron/exon structure, suggesting that the genes arose by a gene duplication. Southern blot analysis has been included to show that one gene likely accounts for all R-binders. Finally, in situ hybridization that localizes TCN1 to the centromere of chromosome 11 is presented.

MATERIALS AND METHODS

Isolation of genomic clones. Cloning and characterization of a 1.6-kb cDNA probe for human TCN1 has been previously described (Johnston et al., 1989). This probe was used to screen a previously described Charon 4A human genomic library (Lawn et al., 1978). Phage were plated at 5 × 10⁴ pfu/plate. Duplicate nitrocellulose filters were lifted and hybridized at 42°C overnight to nick-translated cDNA probes. Filters were washed at 55°C in 0.1% sodium dodecyl sulfate and 15 mM Na citrate and exposed overnight at -70°C. Positive plaques were purified by secondary and tertiary screening. Genomic fragments were subcloned into plasmid vectors and analyzed by restriction enzyme mapping and DNA sequencing using the dideoxy chain termination method (Sanger et al., 1977).

Southern blot analysis. High-molecular-weight DNA was prepared from peripheral blood by previously described methods (Berliner et al., 1985). DNA was digested with EcoRI at 37°C for 6 h, size fractionated by agarose gel electrophoresis at 60 V in TAE buffer, and transferred to nitrocellulose filters by the Southern method (Southern, 1975). Filters were hybridized overnight to ³²P-labeled, nick-translated probes of the TCN1 gene, washed at 55°C in 0.1% sodium dodecyl sulfate and 15 mM Na citrate, and exposed to film overnight at -70°C.

Primer extension. Reactions were performed using two antisense oligonucleotides corresponding to sequence in the 5′ end of the cDNA and separated by approximately 200 bp. Oligonucleotides were labeled with [γ-³²P]ATP using polynucleotide kinase. Labeled oligonucleotides were heated with 10 µg of total RNA from CML, normal bone marrow, or HL60 cells in the presence of 0.1 mM EDTA at 65°C for 5 min. Reverse transcriptase buffer was then added and the reaction was heated to 65°C for 15 min, followed by slow cooling to room temperature, or 42°C over 1-2 h. dNTPs were added to 2 mM, and primer extension was performed with AMV reverse transcriptase for 10 min at room temperature, 10 min at 37°C, and 60 min at 42°C; alternatively the reaction was not allowed to cool past 42°C and was incubated for 60 min at 42°C. Reactions were stopped with 1 µl 0.5 M EDTA, and the reaction products were precipitated with ethanol. The pellets were resuspended in 10 µl TE. Loading dye was added, consisting of 95% formamide and 20 mM EDTA; samples were heated to 95°C and analyzed on 8% polyacrylamide gels.

S1 analysis. A DNA probe for S1 analysis was prepared from a genomic subclone containing the first exon of the TCN1 gene. To prepare the probe, a PCR reaction was performed using a sense primer to the polylinker region of the genomic subclone and an antisense oligonucleotide at the 5′ end of the TCN1 cDNA, which had been kinased using [γ-³²P]ATP and polynucleotide kinase. 5 × 10⁴ counts of end-labeled probe was heated for 5 min at 95°C with 10 µg of total RNA from CML, HL60 cells, or yeast tRNA in hybridization buffer. The samples were then hybridized overnight at 48°C. Hybridization buffer consisted of 80% formamide, 1 mM EDTA, 0.4 M NaCl, and 40 mM PIPES, pH 6.4. The hybrids were digested with S1 nuclease, and the reaction products were precipitated with ethanol. The pellets were resuspended in 10 µl TE. Loading dye was added, consisting of 95% formamide and 20 mM EDTA; samples were heated to 95°C and analyzed on 8% polyacrylamide gels.

Chromosome localization. A cDNA (3′ end) probe was ³H-labeled to a specificity of 2.3 × 10⁷ cpm/µg and used for hybridization at a concentration of 25 ng/ml. In situ hybridization to chromosome spreads prepared from a normal individual was carried out as previously described (Yang-Feng et al., 1985). Emulsion exposures took place for 10 to 14 days. Chromosomes were G-banded using Wright's stain and analyzed for silver-grain localization.

FIG. 1. Southern blot of DNA from human peripheral blood digested with EcoRI and probed with full-length TCN1 cDNA. The size of the bands is indicated.

RESULTS AND DISCUSSION

Southern analysis of the TCN1 gene. Southern blot analysis of genomic DNA using the full-length TCN1 cDNA as a probe was used to estimate the approximate size of the gene at 14 kb (Fig. 1). Low-stringency Southern analysis was used to determine whether the R-binder family of proteins is represented by a single gene as suggested by their nearly identical amino acid compositions. At low stringency, if the R-binder proteins were represented by more than one homologous gene, bands other than those accounted for by the TCN1 genomic map would be expected. Results identical to those of the high-stringency Southern were observed (data not shown), suggesting that all R-binder proteins are in fact produced from a single gene.

Genomic cloning. Screening a Charon 4A genomic library yielded two clones that were determined to be overlapping by restriction digest analysis. Ordering the clones was accomplished by restriction digests and probing with portions of the cDNA. Clone TCN1A contained five exons including nucleotides 1 to 822 of the TCN1 cDNA as well as 6 kb of promoter sequence. Clone TCN1B contained eight exons including nucleotides 155 to 1537 as well as flanking 3′ sequence. The limits of the phage clones are diagramed in Fig. 2.

Genomic organization. Figure 2 shows a restriction map of the genomic region surrounding the TCN1 gene. The introns and exons were positioned using Southern

FIG. 2. Genomic organization of the human TCN1 gene. Boxes identify the exon locations. Restriction enzyme sites are depicted by the vertical lines. Abbreviations are as follows: A, AvaI; B, BamHI; E, EcoRI; H, HindIII; P, PvuII; Ps, PstI; S, Sac. The extent of the phage clones is depicted by the lines under the map.

TABLE 1
Intron-Exon Boundaries of the Transcobalamin I Gene

Exon No.  Exon length (bp)  Splice donor description     Intron No. and length  Splice acceptor description
1         79 + 5′ UT        ATTTGTG gtgagt (Ile Cys G)   1 (2.1 kb)             ctgccacag AGGTA (lu Val)
2         180               AGCAGAT gtaagt (Ser Arg L)   2 (1.6 kb)             tcttttcag TGTCA (eu Ser)
3         141               AATATGG gtaaga (Asn Met G)   3 (0.9 kb)             caaaactag AAGCA (lu Ala)
4         156               TCAGTAG gtgagt (Ser Val A)   4 (2.1 kb)             tcctgacag ATACT (sp Thr)
5         191               ATG CAG gtaagt (Met Gln)     5 (3.2 kb)             cctcattag GCC CTC (Ala Leu)
6         190               GCTTCAG gtataa (Ala Ser G)   6 (1.3 kb)             ctctgccag GTAAC (ly Asn)
7         184               TTTGG gtaggt (Phe Gl)        7 (1.3 kb)             cacccacag TTTC (y Phe)
8         119               AGC CAAG gtaagg (Ser Gln G)  8 (160 bp)             ccttttcag GAGCT (ly Ala)
9         59 + 3′ UT
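The junction sequences listed in Table 1 lend themselves to a quick mechanical check: each intron is expected to begin with gt and end with ag, and the bases just ahead of each acceptor can be scored for pyrimidine content. The following is a minimal Python sketch under stated assumptions. The sequences are transcribed by hand from Table 1 (intron bases only, lowercase), and the helper names `follows_gt_ag` and `pyrimidine_fraction` are illustrative, not part of the original analysis.

```python
# Intron ends transcribed from Table 1 (assumption: hand-copied, lowercase = intron).
DONORS = ["gtgagt", "gtaagt", "gtaaga", "gtgagt",
          "gtaagt", "gtataa", "gtaggt", "gtaagg"]
ACCEPTORS = ["ctgccacag", "tcttttcag", "caaaactag", "tcctgacag",
             "cctcattag", "ctctgccag", "cacccacag", "ccttttcag"]


def follows_gt_ag(donor: str, acceptor: str) -> bool:
    """True when the intron starts with 'gt' and ends with 'ag'."""
    return donor.lower().startswith("gt") and acceptor.lower().endswith("ag")


def pyrimidine_fraction(acceptor: str) -> float:
    """Fraction of c/t among the intron bases that precede the terminal 'ag'."""
    tract = acceptor.lower()[:-2]
    return sum(base in "ct" for base in tract) / len(tract)


# One boolean per intron, in intron order 1..8.
results = [follows_gt_ag(d, a) for d, a in zip(DONORS, ACCEPTORS)]

if __name__ == "__main__":
    for number, (ok, acc) in enumerate(zip(results, ACCEPTORS), start=1):
        print(f"intron {number}: gt...ag={ok}, "
              f"pyrimidine fraction={pyrimidine_fraction(acc):.2f}")
```

Running the script prints one line per intron, which makes it easy to spot a junction that was mistyped during transcription. The pyrimidine score here covers only the seven intron bases shown in the table, so it is a rough indicator rather than a measure of the full polypyrimidine tract.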

blot analysis as well as dideoxy sequencing. All of the EcoRI genomic fragments identified by Southern analysis were consistent with the map. Overlapping subclones were mapped to ensure that all intron sequences were accounted for within the EcoRI fragments. The TCN1 gene contains nine exons and eight introns and spans approximately 12 kb, as shown in Fig. 2. The exons range in size from 59 to 191 bp and the introns range in size from 160 bp to 3.2 kb. Boundaries were determined by comparing the sequence of the genomic DNA to that of the cDNA. All boundaries agreed with the AG-GT rule, and the 3′ splice sites were all pyrimidine rich (Table 1) (Breathnach and Chambon, 1981). The sequences of the exons agreed with the previously reported cDNA sequence except at base pairs 431 and 432 of the cDNA (Johnston et al., 1989); rechecking the cDNA sequence revealed the true sequence to be that found in the genomic clone, changing CT to TC.

Chromosomal localization. In situ hybridization of the TCN1 probe to human chromosomes resulted in specific labeling on the long arm of chromosome 11, at bands q11→q12 (Fig. 3). Of 75 metaphase cells, 25 (33%) exhibited specific labeling on one or both chromosomes 11. Grains over this 11q11→q12 region represented 15.5% (25/161) of all chromosomal label and no other site was labeled above background. GIF has also been localized to chromosome 11 using mouse/human somatic cell hybrids (Hewitt et al., 1991). An individual has been reported to have a combined congenital deficiency of both of these proteins (Zittoun et al., 1988). This suggests that the two genes may lie close to one another on the chromosome. Further confirmation of this would require mapping of GIF to a specific region on chromosome 11.

Sequence analysis of the 5′ end of TCN1. Using primer extension and S1 analysis the primary TCN1 transcript has been predicted to be 53 bp shorter than the reported cDNA (Fig. 4A). Two oligonucleotides, located at 208 and 404 bp in the cDNA, were used for primer extension. The major primer extension products were approximately 150 and 350 bp, respectively. This suggests that the major transcription start site for TCN1 actually occurs within the cDNA. To check for the possibility of secondary structure causing reverse transcription to stall at this point, S1 analysis was performed using a 1-kb probe containing the first 94 bp of the cDNA. The S1 product was 40 bp, predicting a transcription start site in agreement with the primer extension (data not shown). A minor primer extension product corresponding with the length of the reported cDNA suggests that transcription from that point also occurs, but at lower frequency (Fig. 4B). In the region directly 5′ of the predicted transcription start there are consensus TATA and CAAT sequences at approximately -30 and -100, respectively. The length of the minor product corresponds to the length of our cDNA, and the entire sequence of our cDNA is contiguous in the genomic clone. Therefore, we conclude that the minor product is most likely a result of a second transcription start site 60 bp 5′ of the major transcript initiation site, rather than the result of a splicing event. This conclusion was further verified by the S1 findings, which also revealed a minor protected fragment of 90-100 bp.

Current understanding of secondary granule protein genes suggests that their expression is coordinately regulated. Consequently, the sequence of 1 kb of DNA 5′ of the transcript initiation site was obtained (Fig. 5) and searched by computer for consensus sequences known to be present in the promoter regions of other secondary granule genes. These elements include GF-1 binding sites, SP-1 binding sites, and estrogen response elements (Johnston et al., submitted). The program used

picked out sequences within two mismatches of the consensus. Designated sequences were further analyzed visually to determine if the mismatches were consistent with functional control sequences or if they had been shown in other systems to make the sites nonfunctional. No SP-1 sites or estrogen response elements were found; a single GF-1 site that had the sequence GATA intact was found. Binding studies would be necessary to determine the potential significance of this site.

Comparison of TCN1 genomic organization to intrinsic factor. Comparison of predicted amino acid sequence for TCN1 and GIF reveals low overall homology, but several areas of highly conserved sequence. We have previously hypothesized that these areas held in common

-1005  aattcaattgttttgatttttagatcccacaaataagtgagaacatgcaatgtatgtctttctgtgatcctggcttatttcacttaacat  -916
 -915  aattatctccagttccatccattttgtcacaactgacaggatttcattcttttatatggctgaatagtactccattgtgtatatgtacat  -826
 -825  ttgctgtgttcatttatctgttgatagacttagcttqcttccaaatcttggttattgtaAacagtgctgcaacaaacatgggagtgca  -736
 -735  gatatctcttcaatatactggtcttctttcttttgggtatatagccagcagtgggattgctggatcatatggtagctctatgtttagttt  -646
 -645  tttgaggaagctccaaactgttctccatagtgattgtgattgtactaatttacattcccaccaacagtgtacaaggcttcccttttctctacatcc  -556
 -555  gtgccagcatttgttattgcctgtgttttgaataaagccatttttagctgagatgagaggatatctcattgtagctttgatttccattta  -466
 -465  tctgatgatcaatgatgttaagcaccttttcaagcactttttcatatgcctgcttcccatttggatgtcttcttttacttccatttttat  -376
 -375  atgttgtatgtgtttgcatgatcaaatctttatcaggaaacataaaacataaagtgaaacggaaaaaaaaggaaagattatagg~aa  -286
 -285  cagtcatttgcctaatatatcttcactgcacggaaggcaaataatagatataaacccgaagggtttaggacaggagacggaaaggaaagt  -196
 -195  tattgccttattctccctaaccacatccttcaagtcttcttccttatttattcaccacatggacaaatacacttgaccacacctcctgac  -106
 -105  attgct~atttgcttgagtgaaaagaggctgaggcaacctgaaggaggagctctcattaccttctgcccatcacttaataaatagcc  -16
  -15  agccahttcatcaacATTCTGGTACACTGTTGGAGAGATGAGACAGTCACACCAGCTGCCCCTAGTGGGGCTCTTACTGTTTTCTTTTAT  75
                                            M  R  Q  S  H  Q  L  P  L  V  G  L  L  L  F  S  F  I
   76  TCCAAGCCAACTATGCGAGATTTGTG  101
        P  S  Q  L  C  E  I  C

FIG. 5. Nucleotide sequence of the promoter region and the first exon of TCI. The putative primary transcription start site is capitalized and the start of the cDNA is marked by an arrow. The associated TATA box and CAAT box are underlined. Numbering of the exon sequence begins at the putative primary transcription start site.
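The consensus search described above, which reports any window lying within two mismatches of a consensus element, can be sketched as a simple sliding-window comparison. This is an illustrative Python sketch, not the program the authors used. The function name `find_near_matches`, the TATAAA consensus string, and the 26-nt example fragment (copied from the promoter sequence a short distance upstream of the transcription start site) are all assumptions made for the example.

```python
def find_near_matches(sequence: str, consensus: str, max_mismatches: int = 2):
    """Return (offset, window, mismatches) for every window of the sequence
    that differs from the consensus at no more than max_mismatches positions.
    Offsets are 0-based within the supplied sequence."""
    sequence = sequence.lower()
    consensus = consensus.lower()
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Hamming distance between the window and the consensus.
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Assumed example input: a short stretch copied from the promoter sequence.
FRAGMENT = "ttctgcccatcacttaataaatagcc"

if __name__ == "__main__":
    for offset, window, mismatches in find_near_matches(FRAGMENT, "TATAAA"):
        print(offset, window, mismatches)
```

A scan like this only nominates candidates. As the text notes, each hit still has to be inspected to decide whether its mismatches are compatible with a functional site, so the mismatch threshold is best treated as a tunable parameter.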
