GENOMICS 11, 981-990 (1991)
Genomic Cloning, Complete Nucleotide Sequence, and Structure of the Human Gene Encoding the Major Intrinsic Protein (MIP) of the Lens
M. MICHELE PISANO¹ AND ANA B. CHEPELINSKY²
Laboratory of Molecular and Developmental Biology, National Eye Institute, National Institutes of Health, Bethesda, Maryland 20892
Received May 21, 1991; revised July 20, 1991
Major intrinsic protein (MIP, also called MP26) is the predominant fiber cell membrane protein of the ocular lens. MIP has been suggested to play a role in cell-cell communication in the lens. Its expression is tissue-specific and developmentally regulated. We have isolated and characterized the human gene encoding MIP and report here its genomic structure and entire nucleotide sequence. The gene is 3.6 kb, contains four exons separated by introns ranging in size from 0.4 to 1.6 kb, and is present in single copy per haploid human genome. Primer extension of human lens RNA indicates that transcription of the gene initiates from a single site 26 nt downstream from the TATA box. Three complete Alu repetitive elements are found in tandem in the 5’-flanking region of the gene, and a single complete Alu sequence is present in the third intron. The interspecies comparisons of the MIP gene coding sequence and homologies to other members of a putative transmembrane channel protein superfamily are also discussed. © 1991 Academic Press, Inc.
INTRODUCTION
The vertebrate ocular lens constitutes an excellent paradigm of developmentally regulated and tissue-specific gene expression (see McAvoy, 1980; Wistow and Piatigorsky, 1988, for reviews). The lens, which is derived from the embryonic surface ectoderm, is composed of an anterior layer of epithelial cells that continually differentiate into fiber cells at the equatorial epithelium (McAvoy, 1980). Since the differentiation of epithelial cells into fiber cells entails extensive cell elongation with a resultant 1000-fold increase in the elaboration of new plasma membrane by elongating cells, proper membrane biosynthesis and physiology are of utmost importance in maintaining the transparent state of the lens (Alcala and Maisel, 1985).
The crystallins, or major water-soluble proteins of the lens, account for 80-90% of the dry weight of the tissue (Wistow and Piatigorsky, 1988), while the remaining 10-20% of lens dry weight is composed of various cytoskeletal and membrane proteins, collectively referred to as the water-insoluble fraction (Alcala and Maisel, 1985). Major intrinsic protein (MIP, also called MP26),³ the principal constituent of this fraction, constitutes up to 80% of total lens membrane protein (Broekhuyse et al., 1976).
While significant inroads have been made in delineating the molecular basis of crystallin diversity and differential expression (see Wistow and Piatigorsky, 1988; Piatigorsky and Zelenka, 1991, for reviews), far fewer insights have been gained into the molecular biological basis of lens membrane protein structure and function. Thus, we have initiated a molecular characterization of the MIP gene, which encodes the predominant intrinsic protein of the lens fiber cell membrane.
MIP has a molecular mass of 28,200 Da based on its deduced amino acid sequence (Gorin et al., 1984) and appears to be conserved throughout evolution as judged by immunological cross-reactivity and peptide mapping (Bouman and Broekhuyse, 1981; Takemoto et al., 1981). This integral membrane protein not only appears to be lens-specific, but is restricted to lens fiber cells (Paul and Goodenough, 1983; Watanabe et al., 1989; Yancey et al., 1988). Both MIP mRNA and protein are first detectable in the primary lens fibers at the early lens vesicle stage and are subsequently found in secondary lens fibers as they differentiate from epithelial cells (Yancey et al., 1988; Watanabe et al., 1989). Hence,
¹ Present address: Department of Anatomy, Jefferson Medical College, 1020 Locust Street, Philadelphia, PA 19107.
² To whom correspondence should be addressed at National Institutes of Health, Building 6, Room 211, Bethesda, MD 20892.
³ Abbreviations used: MIP, major intrinsic protein; nt, nucleotide(s).
0888-7543/91 $3.00
Copyright © 1991 by Academic Press, Inc.
All rights of reproduction in any form reserved.
982
PISANO
AND
CHEPELINSKY
[Figure 1 panels: (A) Clone HMIPλlG-1, with SalI, BglII, and NdeI sites; (B) Probes A (bp -31 to +201), B (bp 282-602), C (bp 603-820), and D (bp 1-35); (C) Plasmid subclones, including pHMIP16, pHMIP2, pHMIP4, and pHMIP5.]
FIG. 1. Restriction endonuclease cleavage map of genomic clone HMIPλlG-1 and relative localization of human MIP gene sequences. (A) Schematic representation of genomic clone HMIPλlG-1 showing restriction enzyme sites used in mapping and subcloning the gene. Wavy lines represent λ EMBL-3 sequences and the solid bar above the map indicates the relative location of the human MIP gene. (B) Summarized results from Southern blot hybridization of clone HMIPλlG-1 to probes A-C (derived by restriction digest from the bovine cDNA) and to probe D (an oligodeoxyribonucleotide corresponding to the initial 35 bases of the bovine MIP coding sequence). The relative nucleotide locations of these fragments in the bovine cDNA are noted in parentheses, with +1 being the translation initiation site. Hybridization to specific restriction fragments of the human genomic clone is noted by “+” beneath the respective fragment. (C) Plasmid subclones of HMIPλlG-1 in pBluescript II.
MIP serves as an excellent marker of lens fiber cell differentiation. The lens, which lacks vasculature, innervation, and cellular connective tissue, maintains an extensive network of intercellular transmembrane pathways. Numerous electrophysiological and dye passage studies indicate that lens fiber cells are thoroughly coupled (Mathias and Rae, 1989). MIP, reconstituted into planar lipid bilayers and membrane vesicles, exhibits channel-forming activity (Gooden et al., 1985; Girsch and Peracchia, 1985; Ehring et al., 1990). Additional evidence demonstrates that antibodies to MIP block channel activity in reconstituted membrane systems (Gooden et al., 1985) and in cultured lens cells (Johnson et al., 1988). Such results support the notion that MIP may play an integral role in lens cell-cell communication. Indeed, recent evidence indicates that MIP belongs to a growing superfamily of putative transmembrane channel proteins (Rao et al., 1990; Yamamoto et al., 1990; Pao et al., 1991; Wistow et al., 1991). To gain further insight regarding the function of this protein, which constitutes the preponderant component of the lens fiber cell membrane, and to identify the elements underlying the tissue-specific and developmentally regulated expression of MIP, we have cloned the gene encoding this protein. Here we report the isolation and characterization of the human MIP
gene, representing the first description of the genomic structure of a lens membrane protein gene.
MATERIALS AND METHODS
Isolation and Identification of Genomic Clones for the Human MIP Gene
A human leukocyte genomic library of MboI random fragments inserted into the BamHI site in λ EMBL-3 phage (Clontech, Inc., Palo Alto, CA) was screened. Approximately 5 × 10⁵ recombinant phage plaques (corresponding to approximately two haploid genomes) were screened with a bovine MIP cDNA probe containing nearly the entire protein-coding sequence. The hybridization probe was a 780-bp NdeI-HindIII fragment isolated from the bovine MIP cDNA clone, MP4 (Gorin et al., 1984). Screening was performed using the probe radiolabeled with [α-³²P]dCTP (3000 Ci/mmol; Amersham Corp., Arlington Heights, IL) by random oligonucleotide priming (Boehringer-Mannheim Biochemicals, Indianapolis, IN) to a specific activity of approximately 3 × 10⁸ cpm/μg. Hybridization-positive phage clones were purified by successive plaque hybridization. Recombinant plaques were transferred to nitrocellulose and the filters hybridized and washed according to standard methods (Maniatis et al., 1982). Phage DNAs were purified and
STRUCTURE
OF
THE
LENS
MAJOR
INTRINSIC
PROTEIN
GENE
983
synthesized on an Applied Biosystems 380A DNA synthesizer and purified by passage over a Sephadex G-25 column (5 Prime-3 Prime, Paoli, PA) according to the manufacturer’s directions.
DNA Sequencing
A 5.0-kb WI-NdeI fragment (HMIP5.0) and a 1.5-kb NdeI fragment (HMIP1.5) of the human MIP gene were separately multimerized, sheared by sonication, and cloned into the SmaI site of M13mp10. The resulting clones were sequenced by the dideoxynucleotide chain-termination method (Sanger et al., 1977) using T7 DNA polymerase (Tabor and Richardson, 1987). The nucleotide sequence was established by sequencing the fragments in M13 with universal primers. Specific oligodeoxyribonucleotides based on the human MIP gene sequence were used as primers to sequence the junction between HMIP5.0 and HMIP1.5 and to complete several single-stranded regions.
Southern Analysis
FIG. 2. Southern hybridization of human genomic DNA. (A) Human genomic DNA was digested with the indicated restriction enzymes. The blotted filter was hybridized with a 1.9-kb BglII fragment of the human MIP gene containing coding and 5’-flanking sequences (see Fig. 2B for relative location of probe) and washed as described under Materials and Methods. HindIII-digested λ DNA was electrophoresed in parallel with genomic DNA samples. The locations of these size standards are indicated at the left. (B) Restriction endonuclease cleavage map of the human MIP gene illustrating sites for enzymes used in Southern analysis of the gene. The bar above the map indicates the relative location of the 1.9-kb BglII fragment used as a hybridization probe. The transcription initiation site is denoted as +1.
the positive clones characterized by restriction enzyme mapping. Different DNA fragments of the human MIP gene were subcloned into Bluescript II plasmid (Stratagene, La Jolla, CA) and plasmid DNA was prepared by alkaline lysis followed by two series of cesium chloride/ethidium bromide equilibrium centrifugation (Maniatis et al., 1982).
Oligodeoxyribonucleotide Synthesis
Oligodeoxyribonucleotides used for sequencing, Southern hybridizations, and primer extensions were
Recombinant plasmid DNA (2.0 μg) or genomic DNA prepared from human placental tissue (8.0 μg) (Clontech, Inc.) was digested with individual or multiple restriction enzymes as indicated, fractionated through a 0.7% agarose gel in 0.04 M Tris-acetate/0.001 M ethylenediaminetetraacetic acid, and transferred to nitrocellulose according to standard methods (Southern, 1975) in 10X SSC (1X SSC: 0.15 M sodium chloride, 0.015 M sodium citrate, pH 7.0). Filters were prehybridized for 4 h at 42°C in 5X SSC, 0.5% sodium dodecyl sulfate (SDS), 5X Denhardt’s solution (1X Denhardt’s: 0.02% each of bovine serum albumin, polyvinylpyrrolidone, and Ficoll), 100 μg salmon sperm DNA per milliliter, and 50% formamide. Hybridizations were performed for 18 h at 42°C in the same buffer with the addition of 1.0-2.0 × 10⁷ cpm of [α-³²P]dCTP-labeled probe prepared by random oligodeoxyribonucleotide priming (Boehringer-Mannheim Biochemicals). Following hybridization, the filters were washed at room temperature twice in 2X SSC, 0.1% SDS for 15 min each; twice in 1X SSC, 0.1% SDS for 15 min each; and twice at 68°C in 0.1X SSC, 0.1% SDS for 15 min each, unless otherwise stated. For autoradiography, washed filters were exposed to Kodak XAR film with an intensifying screen at -70°C.
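The wash series above steps down in salt concentration, which is what raises stringency. As a rough illustration, the following sketch computes the sodium ion concentration at each SSC dilution from the 1X composition given in the text. The function names are illustrative, and the calculation assumes the sodium citrate is the trisodium salt (three Na+ per citrate), which the text does not state explicitly.

```python
# Composition of 1X SSC as stated in the text (mol/L).
SSC_1X = {"sodium chloride": 0.15, "sodium citrate": 0.015}


def ssc(fold):
    """Return component molarities for an n-fold SSC solution (e.g. 0.1 for 0.1X)."""
    return {name: conc * fold for name, conc in SSC_1X.items()}


def sodium_molarity(fold):
    """Total Na+ molarity, ASSUMING trisodium citrate (3 Na+ per citrate)."""
    mix = ssc(fold)
    return mix["sodium chloride"] + 3 * mix["sodium citrate"]


if __name__ == "__main__":
    # Transfer buffer, prehybridization buffer, and the three wash steps.
    for fold in (10, 5, 2, 1, 0.1):
        print(f"{fold}X SSC: {sodium_molarity(fold):.4f} M Na+")
```

Under that assumption, the final 68°C wash in 0.1X SSC contains one-twentieth of the sodium present in the first 2X SSC wash.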
RNA Isolation
Total cellular RNA was isolated from human lenses by the acid guanidinium thiocyanate-phenol-chloroform extraction method (Chomczynski and Sacchi,
FIG. 3. Dot matrix comparison of the bovine MIP cDNA and human MIP gene sequences. The human MIP gene and its flanking sequences are shown on the horizontal axis (human MIP genomic sequence). The bovine MIP cDNA is shown on the vertical axis. Four regions of homology (A through D) are noted between 2800 and 6400 bp, delineating the four exons of the human MIP gene.
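To illustrate how a dot matrix comparison of a cDNA against genomic DNA reveals exons, here is a minimal sketch. It is not the Maizel and Lenk program used in the paper: it records exact matches of fixed-length words rather than scored windows, and the two short sequences are invented toy data. Each exon shows up as a run of matches on its own diagonal, and the shift between diagonals equals the length of the intervening intron.

```python
from collections import Counter, defaultdict


def dot_matrix(genomic, cdna, window=8):
    """Return (i, j) pairs where genomic[i:i+window] == cdna[j:j+window]."""
    index = defaultdict(list)
    for j in range(len(cdna) - window + 1):
        index[cdna[j:j + window]].append(j)
    hits = []
    for i in range(len(genomic) - window + 1):
        for j in index.get(genomic[i:i + window], []):
            hits.append((i, j))
    return hits


def diagonals(hits):
    """Count hits per diagonal; the key is the offset (genomic index - cDNA index)."""
    return Counter(i - j for i, j in hits)


# Hypothetical toy sequences: two exons separated by a 19-nt intron.
exon1 = "ATGGCCTTAGCA"
intron = "GTAAGTCCCCCCTTTTCAG"
exon2 = "GGATCCGATTAC"
genomic = exon1 + intron + exon2
cdna = exon1 + exon2

if __name__ == "__main__":
    for offset, count in sorted(diagonals(dot_matrix(genomic, cdna)).items()):
        print(f"diagonal offset {offset}: {count} matching words")
```

In this toy case the second diagonal is displaced by 19 positions, the intron length. A real comparison would use a longer window and tolerate mismatches, since human and bovine sequences are not identical.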
1987). Normal human lenses were generously provided by Dr. J. Horwitz (Jules Stein Eye Institute, Los Angeles, CA) and stored at -70°C until processed.
Primer Extension Analysis
Oligo 3729 (5’ GACATAGAAGAGGGTGGC 3’) and oligo 3730 (5’ CAGGAGCCCAGCGCAGTGAGGAC 3’) were 5’-end-labeled with [γ-³²P]ATP (7000 Ci/mmol; ICN Pharmaceuticals, Inc., Irvine, CA) and T4 polynucleotide kinase (Pharmacia, Piscataway, NJ) to a specific activity of approximately 5 × 10⁷ cpm/μg. Oligo 3729 is complementary to the coding sequences corresponding to nucleotides 99 to 116 (encoding amino acids 19 to 24) and oligo 3730 is complementary to the coding sequences corresponding to nucleotides 131 to 153 (encoding amino acids 30 to 36), both in exon 1 of the human MIP gene. Primer extensions were performed essentially as described (Chepelinsky et al., 1987) using approximately 9.0 μg of total RNA, 1.0 × 10⁶ cpm of primer, and 85 units of avian myeloblastosis virus reverse transcriptase (Seikagaku America, Inc., Rockville, MD). Primer-extended products were analyzed on a 10% polyacrylamide-8 M urea sequencing gel with ³²P-end-labeled MspI-digested pBR322 DNA fragments and a sequencing ladder as size markers.
Materials
Unless otherwise noted, restriction and modifying enzymes were purchased from New England Biolabs, Inc. (Beverly, MA); Boehringer-Mannheim Biochemicals (Indianapolis, IN); or Stratagene (La Jolla, CA). All other chemicals and reagents were of the highest grade available.
Software
Alignment and analysis of the nucleotide and protein sequences were performed using the NIH Molecular Biology User Group PC-Tools Distribution Package (MBUG) software for the PC, the Integrated Database and Extended Analysis System for Nucleic Acids and Protein (IDEAS), and Sequence Analysis Software Package of the Genetics Computer Group (GCG) for the VAX computer.
RESULTS AND DISCUSSION
Isolation and Characterization of a Clone for the Human MIP Gene
To clone the human MIP gene, approximately 500,000 recombinant phage plaques from a human genomic library in λ EMBL-3 were screened using a bovine cDNA fragment containing nearly the entire protein-coding sequence of the MIP gene (from nucleotide 38 encoding amino acid 13 to nucleotide 820, 26 nucleotides 3’ to the translation termination codon) (Gorin et al., 1984). Three positive recombinant clones were identified and carried through to tertiary screening. All three clones were of the same overall length. Two of the recombinant clones (HMIPλlG-1 and HMIPλlG-3) contained 16-kb SalI inserts that appeared to be identical by restriction enzyme mapping. The restriction fragment map of genomic clone HMIPλlG-1 is diagramed in Fig. 1A. The insert from the third recombinant clone could not be isolated from the phage vector, possibly due to loss or disruption of one of the SalI sites. The 16-kb SalI insert from positive clone HMIPλlG-1 and various restriction fragments of the insert were subcloned into pBluescript II (Stratagene) (Fig. 1C).
Localization of MIP Sequences within the Genomic Clone
As a means of localizing MIP sequences within the positive genomic clone, pHMIP16 was digested with the restriction enzymes noted in Fig. 1A and the DNA fragments were subjected to Southern analysis using three probes derived from the bovine MIP cDNA and an oligodeoxyribonucleotide. Probe A was a 232-bp
203 263 363 443 TGTCTCTCTATTGCCCTGACTCCCTGACTGGTGCAGGTGAATCAGCTCCCCTGGGGTCTGGGACAATATGTGCATGTGTG
523
AGCATGTGTGTGTTGAAGTGGTGAGGATTGACAGGTGGTTTAGAG~TTTAAGGAAGG~ATAGGGCCCTGGA~TGAGAATT
603
AAGAAACCTGAGTTTGAGTTTCAGCTTTCCTTCCAACTCCTTGCAGGTTCTTAAGGAAACTTATTTTAC~TTT~AAGGCC
683
TCAGTTTCCTCATACATGTAAGTGGCTGCAATCATTTCCATCATGAAGGAGCACTGTTAGGAGATGGTAAGATGCAAATA
763
CAATGATTATGAAGGGGGTTGTATTATCCTCTCCTCTCCACGGACCTGATGCCGTAACGAGATCCTTGCCGGGGAAGTCTT
043 923 1003 TAAGCAGGAAGGAGA
1063
GCAACTTCTCATTCAAGGATTCCGGGGG~C~CTAAAC~TTTC~~AACTTTTAAGGGTT~TT~~TG~~TACTT~A~TGT~AA
1163
ACATGAACATTGTCCATTGCACTCTCTCTACTGGCACAAAGGAATCAACTCTGC~CTATCCCTCTCTTGTGACTGCTATATC
1243
TAGTCCTTCTTGACCCCAAGGTAGAAATGAACGTATCATGGCTAGACATGGGCTTTGCCTATGGATCCATAGTCTGTTCC
1323
CAGACAGGGCATCAGTTGCCCTCACCCCATTGCAGGAACCCCTGGAGAGTCATGCAGGCTGTCCCTCTCCACTTGCTGCT
1403
GGCTGGAGAAAAGATGGCAGCT
1463 1563
BTTACAACTGTCTCTTTTGCAG TGAGTGAAGGGAGAGGGGCAAGATCCTGAAGCCCTTCTGGATGGTTGTGGACTGCA
1643
GGTTCCAGGCTGGATTCCTGCTTTCCTTCTGCGTGTGTTGGCTGCTGTACCAGCACTCAGCTAAGGGGGCTGTCAGAGAG
1723
TTTGGCATGCCTGTGTGAGCAGGATTCATGATTTTGCTAGAAGGAGAGGCTCTTGCTCATATTTTTCCTTCTCTGGTTAG
1603
TCAGGGAGTATGCACTGAGTATTCACTGGTTGCTGAACTAGCGGGTATGAAGAGAGACAACTAAATATGAGCAGATAAAT
1663
TCTGCTCTCAGAGATCAAGATGTGTTGTGAGCACAGATGGGCCTCAGCCCCATTCCAGTTCTAATACGTTATCGGCTTTG
1963
TGTGTGACCATGGGCAAATAATCACTCTGTGCTTCAGTTTGTTTTACTATAAAATGGGACATTACGAGAAATGTGTGAAA
2043
GTTATATGTGAGAAACATTTAGCACAGAATCTGATATAAAGTAAGCACTCAATAAATATTGGTTATGTTGATGTTGTGCA
2123
AGCCAAACATATGGAAATAATTTAAAAATTTTTCAAAGCTGTACACCCATTTTCATAACAGCATTATTCACAATAGCCAA
2203
GAGGTAAAAGCAACCCAAGTGTTCCTCAGTGGATGAATGGATAAACAAAATGTGGTATATACATAAATGGAATATTACCT
2263
GTAAAAAGGAAGGAAATTGTGACACATACCACAACATGGATGAATCTTGAGGACATTATGCTAAGTGAATAAGCCAGTCA
2363
CAAAAAGACAAATACTGTATGATTACATTTATATGGGGTATTTAGAGTACTCAAATTCA~AGACGCAAAGTAGAAGGATA
2443
GTTACCAGGGGCTGATGGGGGTGGGGTGGAATAGGCAGTTGTTTAATGGGTATAGAGTTACAGCTTTGCAAGATGAAAAA
2523
GTTCTAGACATAGGTTGCACAACAATGTGAATGTGAATATACTTAACACTACTAAACTGTACACTTAAAAACATATATATATTTTT
2603
TTGAGATGGAGTCTCCCTCTGTCACCCAGGCTGGAGTGCAGrGGAG~TTGATCTCAGGATCTCAGCTCACTGCAACCTCC
2683
~CCTCCTGGGTTCAAGCAATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGATTTCAGGCGCCTGCCACCACGCCCAGCTA
2763
CTTTTTGTATTTTTAGTAGAGATGGGGTTTCTCCCTGTTGGTCAGGCTGTTCTCAAATGGCTGACCTTGTGATCCGCCCG
2643
TCTTGTGATCCGCCCGTCTCAGCCTCCCAAAGTGCTGGGATTACAGGCATAAGCCACTGCGCCTGT~CAAAAATATTTAA
2923
TATGGTAAATTTTATATGTGTTTTACCACAATGTATAAAATTTTTTTGAAAGCAACATACAAACTAGTGCAAATTGATAA
3003
CAAATAGAATATACACATGCTAAGGTGTGGGATAAAGGAGTAATTTGATGAATCAGAAAATAAATGGTGAGTTTATTGTG
3063
GATATCAAGTGTACAATATCTTGTTTTCCACTAAGGTGGCTGG~AAAAGAGCAGCGTTGCTGCTCTGTCCCCTCCCCAAG . .
TATTCCTTTTCTCTTTCTACAG~ *
GCTGGTTGGTGCAAACTTCCCTTCCTCCCCATCCCACCACCCTTCGCCGTGTGTGCTGATTGTGCATATG
FIG. 4. The complete nucleotide sequence of the human MIP gene. The nucleotide sequence of the human MIP gene was determined as indicated under Materials and Methods. The transcription initiation site was determined by primer extension of human lens RNA (see Fig. 5) and is noted as position +1. Numbers correspond to positions relative to the transcription initiation site. The four exons of the gene are indicated by reverse images (white letters on a black background). The translation initiation codon (ATG) is denoted by asterisks (***) beginning at position 45, as is the translation termination codon (TAG) beginning at position 3375. Putative lariat branch points (CEAC) in each of the introns are boxed and shaded gray. An Alu repetitive element, located in the third intron, is underlined. A TATA box, present 26 bp upstream from the transcription initiation site, is delineated by a box.
FIG. 5. Localization of the human MIP gene transcription initiation site. The 5’ end of the gene was mapped by primer extension of human lens RNA. Oligos 3729 and 3730 were 5’-end-labeled, hybridized to 9.0 μg of RNA from 1-year-old normal human lenses, and extended as described under Materials and Methods. Extended products were separated on a 10% polyacrylamide-8 M urea sequencing gel. Primer-extended products of 115 bases (lane 3729) and 152 bases (lane 3730), indicated by arrows, were obtained with the respective oligonucleotide. Dideoxy sequencing reactions (lanes T and G) and MspI-digested pBR322 DNA fragments (lane M) were simultaneously run as markers. The sizes of the pBR322 fragments are indicated to the right.
PpuMI/HincII fragment of the bovine cDNA which encompassed 30 bp of the 5’-untranslated region and the coding sequence for the first 66 amino acids of the protein. Probe B was a 321-bp PflMI DNA fragment encoding amino acids 91-197 of the bovine MIP. Probe C was a 218-bp PflMI/HindIII fragment of the bovine cDNA containing the coding sequence for amino acids 197 to 263, and 29 bp of the 3’-untranslated region. Probe D, an oligodeoxyribonucleotide, corresponded to the initial 35 bases of the bovine MIP coding sequence. Results from the Southern analysis of the genomic clone insert, using these probes, are summarized in Fig. 1B. Collectively, these results indicated that the entire coding region of the human MIP gene was contained within the 1.9-kb BglII, 1.2-kb BglII/NdeI, and 1.5-kb NdeI fragments of the genomic clone. Subsequent sequence analysis demonstrated that the human MIP gene spanned 3.6 kb, as illustrated by the solid bar over clone HMIPλlG-1 in Fig. 1A.
MIP Sequences within the Human Genome
Southern analysis of total human genomic DNA was performed to determine the copy number of the MIP gene in the human genome. Autoradiographic results from this analysis as well as a restriction map of the human MIP gene are shown in Figs. 2A and 2B, respectively. Human genomic DNA was digested with various enzymes, fractionated in an agarose gel, transferred to nitrocellulose, and hybridized with the 1.9-kb BglII fragment of pHMIP16 which encompassed the 5’ end of the MIP gene as shown in Fig. 1B. Based on prior restriction mapping of the genomic clone and subsequent computer analysis for restriction enzyme sites in the human MIP nucleotide sequence, it was anticipated that a single HindIII, BamHI, and BglII fragment and two EcoRI fragments would hybridize to the 1.9-kb BglII probe. These results are evident in Fig. 2A. PstI-digested DNA was expected to produce four hybridization-positive bands of 1389, 730, and 82 bp and one larger than 2200 bp. The PstI fragments that hybridized to the probe were approximately 4400, 1400, and 700 bp. The expected 82-bp PstI fragment probably was not retained in the 0.7% agarose gel during the separation due to its small size. No additional hybridization-positive bands were detected on the Southern blot, even under low stringency hybridization conditions (data not shown). The five resultant genomic DNA hybridization patterns evident in Fig. 2A indicate the presence of a single MIP gene copy per haploid human genome. This finding is consistent with the mapping of MIP to the long arm of human chromosome 12 (Sparkes et al., 1986).
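The reasoning behind the expected band pattern can be sketched in a few lines: cut a linear region at the known enzyme sites, keep the fragments that overlap the probe, and drop any too small to be retained on the gel. The coordinates below are hypothetical and chosen only for illustration; they are not the PstI sites of the MIP gene, and the 200-bp retention cutoff is likewise an assumption.

```python
def digest(length, cut_sites):
    """Cut a linear molecule of the given length; return half-open (start, end) fragments."""
    bounds = [0] + sorted(cut_sites) + [length]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def hybridizing(fragments, probe):
    """Keep fragments that overlap the probe interval by at least one base."""
    p_start, p_end = probe
    return [(a, b) for a, b in fragments if a < p_end and b > p_start]


def visible_sizes(fragments, min_retained=200):
    """Sizes of fragments large enough to be retained on the gel (cutoff is an assumption)."""
    return [b - a for a, b in fragments if b - a >= min_retained]


# Hypothetical 6000-bp region, four cut sites, and a 1900-bp probe.
region_length = 6000
sites = [1500, 2900, 3600, 3700]
probe = (2000, 3900)

positive = hybridizing(digest(region_length, sites), probe)

if __name__ == "__main__":
    print("expected bands:", [b - a for a, b in positive])
    print("bands retained:", visible_sizes(positive))
```

With these toy values the probe spans four fragments, but the smallest (100 bp) falls below the cutoff, so only three bands would be scored. That parallels the explanation given above for the missing 82-bp PstI fragment.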
Nucleotide Sequence and Structural Organization of the Human MIP Gene
The complete nucleotide sequence of the human MIP gene and approximately 3.0 kb of the 5’-flanking region was determined by dideoxynucleotide sequencing of single-stranded DNA from recombinant M13 clones, as indicated under Materials and Methods. A total of 133 overlapping M13 templates assembled HMIP5.0 and 31 overlapping templates assembled HMIP1.5. Using this strategy, the human MIP gene was sequenced in its entirety on both strands. To determine the overall structural organization and intron/exon boundaries of the gene, the entire sequence obtained for the human MIP gene (coding and flanking sequences) was compared to the full-length bovine cDNA sequence using a computer-generated dot matrix analysis (Maizel and Lenk, 1981). Results of this dot matrix analysis (Fig. 3) demonstrated four significant regions of homology (A, B, C,
[Figure 6 alignment: human, bovine, rat, and chick MIP amino acid sequences, numbered in blocks of 50 residues.]
FIG. 6. Alignment of the predicted amino acid sequences of human, bovine, rat, and chicken MIP. The deduced amino acid sequence of human MIP is aligned with the deduced sequences of MIP from bovine (Ref. (11)) and partial sequences from rat (Ref. (32)) and chicken (Ref. (14)). Amino acid identities are noted by a dashed line. Bovine amino acid 14 obtained by peptide sequencing is a phenylalanine (Ref. (39)). Conservative sequence changes due to polarity and steric considerations are indicated by an asterisk above the amino acid. A potential glycosylation site beginning at amino acid 197 is indicated by a solid box, while a putative calmodulin binding site beginning at amino acid 225 (Ref. (25)) is demarcated by a dashed box. Potential phosphorylation sites (Refs. (8, 15)) at positions 229, 231, 235, and 245 are denoted by arrows.
and D) between nucleotides 2800 and 6400, indicating the presence of four exons in the gene. This finding was confirmed by computer-assisted alignment of the nucleotide sequences of the human MIP gene and bovine MIP cDNA using the NUCALN program. The exon-intron junctions were also verified by scanning the nucleotide sequences for a consensus 5’-donor splice signal ((C/A)AG/GT(A/G)AGT) and 3’-acceptor splice signal ((T/C)nNCAG/G) (Mount, 1982). The sequences flanking the exon-intron junctions in the human MIP gene conform well, but are not identical, to the donor and acceptor splice consensus sequences. However, the consensus intron border dinucleotides, GT and AG (Breathnach et al., 1978), were found at the 5’ and 3’ borders, respectively, of all three introns in the human MIP gene. Lariat signals having the consensus sequence YNYURAY are typically located 20-50 nucleotides upstream from the acceptor signal in the intron (see Smith et al., 1989). Signals having either the sequence ctgag or ctcag were found in each of the three introns, 21 to 28 nucleotides upstream from the exon-intron junction (Fig. 4). The four exons of the
gene are 404, 165, 81, and 369 bp, respectively. The three introns are 498, 438, and 1605 bp.
Transcription Start Site of the Human MIP Gene
Primer extension analysis of RNA isolated from 1-year-old normal human lenses was performed to delineate the 5’ end of the gene and to establish the precise transcription initiation site(s). Two oligodeoxyribonucleotides, a 23-mer (oligo 3730) complementary to the mRNA encoding amino acids 30 to 36 and an 18-mer (oligo 3729) complementary to the mRNA encoding amino acids 19 to 24, were extended with reverse transcriptase and the reverse-transcribed products analyzed as detailed under Materials and Methods. Oligo 3730 produced a single primer-extended product of 152 bases and oligo 3729 produced a single product of 115 bases (see Fig. 5), indicating a single site of transcriptional initiation in the human MIP gene. Based on these results, the transcription initiation site of the human MIP gene (designated +1
in Fig. 4) was mapped to a single site 26 bp downstream from the TATA box.
3’-Untranslated Sequence of the Human MIP Gene
The 3’ end of the human MIP gene was determined by comparing the nucleotide sequence of the 3’-untranslated region of the gene with the bovine cDNA sequence. The translation stop codon of the human MIP gene, TAG, is located at nucleotide 3375 (Fig. 4). The 3’-untranslated regions of the human gene and bovine cDNA are approximately 90% identical from the translation termination codon to the polyadenylation site that had been delineated for the bovine MIP cDNA (Gorin et al., 1984). Based on the excellent sequence homology between the 3’-untranslated regions of the human and bovine MIP genes, exon 4 of the human MIP gene was determined to be 369 bp, the last 186 bp of which are untranslated. Analysis of the 3’ end of the human gene sequence indicated that the eukaryotic polyadenylation consensus signal AATAAA, usually located 10-30 bp 5’ to the polyadenylation site (Proudfoot and Brownlee, 1976), is absent from the human MIP gene. Although the AATAAA sequence has been found to be highly conserved, natural variations of this sequence that are still active in specifying the poly(A) addition site have been found in many genes (see Leff et al., 1986, for review). Additional sequences have been suggested to play a role in mRNA cleavage and polyadenylation (McLauchlan et al., 1985; Leff et al., 1986; Renan, 1987). The hexanucleotide AAGAAA, located at nucleotide 3449 in the human MIP gene, is found in an identical position in the bovine MIP cDNA (Gorin et al., 1984). Analysis of the partial rat MIP cDNA sequence (Shiels et al., 1988) suggests that this polyadenylation signal may be utilized. The apparent divergence in the 3’ end sequence of the MIP cDNA from rat, bovine, and chicken (Kodama et al., 1990) suggests that alternative polyadenylation may be involved in processing of the MIP gene.
Human MIP Gene Coding Sequence
Based on delineation of the 5’ and 3’ ends of the human MIP gene as detailed above, the gene was determined to be 3560 bp. Alignment of the MIP gene sequence with the bovine cDNA sequence using the NUCALN program demonstrated 81 and 90% sequence identity for the 5’- and 3’-untranslated regions of the genes, respectively. Exons 1 through 4 of the human gene were found to be 90, 95, 93, and 89% identical to the coding sequence of the bovine cDNA. Minimal overall sequence divergence between the coding sequences of the two genes indicates a high degree of evolutionary conservation.
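The reported sizes can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text (exon and intron lengths, the ATG at position 45, and the 186 untranslated bp at the end of exon 4). The variable names are illustrative, and the calculation assumes that the 186-bp figure counts the stop codon as untranslated.

```python
# Sizes reported in the text (bp).
exons = [404, 165, 81, 369]
introns = [498, 438, 1605]

gene_length = sum(exons) + sum(introns)   # transcribed region, +1 to the 3' end
mrna_length = sum(exons)                  # spliced transcript, excluding poly(A)

atg_position = 45                         # first base of the initiation codon
utr5 = atg_position - 1                   # bases upstream of the ATG in exon 1
utr3 = 186                                # ASSUMED to include the TAG stop codon

coding_nt = mrna_length - utr5 - utr3
codons, remainder = divmod(coding_nt, 3)

# First untranslated base in genomic coordinates, i.e. where the stop codon begins.
stop_codon_start = gene_length - utr3 + 1

if __name__ == "__main__":
    print("gene length:", gene_length)
    print("coding nucleotides:", coding_nt, "=", codons, "codons")
    print("stop codon begins at:", stop_codon_start)
```

Under that assumption the numbers agree with one another: the exons and introns sum to the stated gene length, the coding region is a whole number of codons matching the 263-amino-acid protein, and the stop codon falls at the stated position.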
A comparison of the deduced amino acid sequences from the human MIP gene and bovine (Gorin et al., 1984), partial rat (Shiels et al., 1988), and partial chicken (Kodama et al., 1990) cDNAs is presented in Fig. 6. The human MIP gene encodes a 263-amino-acid protein that bears 92% overall sequence identity to the bovine protein. The human and bovine proteins possess even greater homology (98%) if one considers conservative amino acid changes based on polarity of the residue. Comparison of the derived rat and chicken amino acid sequences with the human MIP primary amino acid sequence, accounting for sequence identities and conservative changes based on polarity, demonstrated 96 and 89% sequence homology, respectively, in the regions available. Such comparisons of the primary amino acid sequences of the human, bovine, rat, and chicken MIP suggest that the protein has been highly conserved throughout evolution, a finding that has previously been suggested based on the immunological cross-reactivity and peptide mapping of MIP in various species (Bouman and Broekhuyse, 1981; Takemoto et al., 1981). Based on computer alignment of the amino acid coding sequences of several recently cloned membrane genes and cDNAs, an expanding superfamily of putative transmembrane channel proteins has been identified. Included in this superfamily are MIP, the Drosophila big brain protein (bib), soybean nodulin 26 protein (nod26), the Escherichia coli glycerol facilitator protein (glpF), the root-specific proteins TobRB7 (from tobacco) and AtRB7 (from Arabidopsis), and a soybean tonoplast protein (see Rao et al., 1990; Yamamoto et al., 1990; Pao et al., 1991). It has been noted recently that the 28-kDa erythrocyte transmembrane protein may also belong to this superfamily (Smith and Agre, 1991).
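As an illustration of how a percent identity figure is derived from an alignment drawn in the style of Fig. 6, where a dash marks a residue identical to the human sequence, consider the following sketch. The ten-residue reference is the start of the human sequence as printed in the figure; the comparison row and its single substitution are invented for the example, and the function name is an assumption.

```python
def percent_identity(reference, aligned_row):
    """Percent identity of an aligned row against the reference.

    In the row, '-' means "same residue as the reference" (the convention
    used in Fig. 6); any letter is compared with the reference directly.
    """
    if len(reference) != len(aligned_row):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        raise ValueError("empty alignment")
    identical = sum(
        1 for ref, other in zip(reference, aligned_row)
        if other == "-" or other == ref
    )
    return 100.0 * identical / len(reference)


human_start = "MWELRSASFW"   # first ten residues of human MIP, as in Fig. 6
toy_row = "------T---"       # hypothetical row with one substitution

if __name__ == "__main__":
    print(f"{percent_identity(human_start, toy_row):.1f}% identical")
```

A homology figure that also credits conservative changes, such as the 98% value quoted above, would additionally need a table of residue classes; that is left out here because the paper does not list the groupings it used.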
The amino acid sequences of MIP, bib, nod26, glpF, TobRB7, and AtRB7 bear no marked homology to known transport proteins, yet each appears to play some intrinsic role in intercellular transport or communication within their respective environmental locales. Analyses of the amino acid sequences of MIP, bib, nod26, and glpF have delineated the presence of a twofold repeat in the primary structure of these proteins (Pao et al., 1991; Wistow et al., 1991). The first repeat corresponds to the sequences encoded by a single exon (exon 1) of the human MIP gene; the second repeat encompasses exons 2 through 4 (Wistow et al., 1991). This finding suggests that members of this superfamily may have evolved by gene duplication of a single structural motif, perhaps representing an ancestral monomer capable of forming higher-order multimeric structures.
Human MIP Gene Noncoding Sequences
Comparison of the human MIP gene sequence to those nucleotide sequences in GenBank indicated the
presence of multiple Alu repetitive sequences in and around the human MIP gene. Alu repeats, the most abundant family of interspersed repetitive DNA in the human genome, typically found in intergenic regions and introns, share a 300-bp conserved consensus sequence consisting of two imperfect, directly repeated monomeric units that are separated by an adenine-rich spacer (see Schmid and Jelinek, 1982, for review). Their expression may regulate various aspects of cell proliferation, differentiation, and transformation (Howard and Sakamoto, 1990). Three complete Alu sequences are found in tandem in the 5’-flanking region of the human MIP gene, and a single complete Alu sequence in the third intron. The Alu repeats in the human MIP gene and its 5’-flanking sequence are from 77 to 87% identical to the Alu consensus sequence. The three Alu repeats upstream of the MIP gene are classic Alu sequences in that they are all flanked by inverted direct repeats and terminate with a stretch of poly(A), whereas the repeat present in the third intron of the gene, while 84% identical to the Alu repeat consensus sequence, is not flanked by direct repeats and terminates with an imperfect region of poly(A)/poly(T).
In summary, the present report delineates the genomic structure and complete nucleotide sequence of the human gene encoding the major intrinsic protein of the ocular lens, the first such report on the gene structure of a lens membrane protein. The isolation of the coding and noncoding sequences of the MIP gene has allowed us to initiate analyses of the cis-elements and trans-acting factors governing the tissue-specific and developmental profile of the protein.
ACKNOWLEDGMENTS
We gratefully acknowledge Dr. Michael B. Gorin for providing the bovine MIP cDNA clone, Dr. Joseph Horwitz for the provision of human lenses, and Dr. Abdul Ally of Biotechnica International, Inc., for assistance in dideoxynucleotide sequencing of the gene. We also acknowledge Marvin Shapiro of the NIH Division of Computer Research and Technology, Dr. David Landsman of the National Library of Medicine’s Center for Biotechnology Information for assistance with computer analyses, as well as the National Cancer Institute for allocation of computer time and staff support, and in particular Mark Gunnell at the Advanced Scientific Computing Laboratory of the Frederick Cancer Research Facility. Special thanks is expressed to Drs. John Klement, Douglas Lee, Graeme Wistow, and Joram Piatigorsky for their insightful comments and thoughtful evaluation of the manuscript and to Ms. Gabriela Tobal for assistance in verifying the gene sequence.
REFERENCES
1. ALCALA, J., AND MAISEL, H. (1985). Biochemistry of lens plasma membranes and cytoskeleton. In “The Ocular Lens: Structure, Function, and Pathology” (H. Maisel, Ed.), pp. 169-222, Dekker, New York.
2. BOUMAN, A. A., AND BROEKHUYSE, R. M. (1981). Lens membranes XIV. Comparative study of immunological characteristics of the fiber membrane polypeptides from calf, pig, sheep and chicken lenses. Exp. Eye Res. 33: 299-308.
3. BREATHNACH, R., BENOIST, C., O’HARE, K., GANNON, F., AND CHAMBON, P. (1978). Ovalbumin gene: Evidence for a leader sequence in mRNA and DNA sequences at the exon-intron boundaries. Proc. Natl. Acad. Sci. USA 75: 4853-4857.
4. BROEKHUYSE, R. M., KUHLMAN, E. D., AND STOLS, A. L. (1976). Lens membranes II. Isolation and characterization of the main intrinsic polypeptide (MIP) of bovine lens fiber membranes. Exp. Eye Res. 23: 365-371.
5. CHEPELINSKY, A. B., SOMMER, B., AND PIATIGORSKY, J. (1987). Interaction between two different regulatory elements activates the murine αA-crystallin gene promoter in explanted lens epithelia. Mol. Cell. Biol. 7: 1807-1814.
6. CHOMCZYNSKI, P., AND SACCHI, N. (1987). Single step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem. 162: 156-159.
7. EHRING, G. R., ZAMPIGHI, G. A., HORWITZ, J., BOK, D., AND HALL, J. E. (1990). Properties of channels reconstituted from the major intrinsic protein of lens fiber membrane. J. Gen. Physiol. 96: 631-664.
8. GARLAND, D., AND RUSSELL, P. (1985). Phosphorylation of lens fiber cell membrane proteins. Proc. Natl. Acad. Sci. USA 82: 653-657.
9. GIRSCH, S. J., AND PERACCHIA, C. (1985). Lens cell-to-cell channel protein. I. Self-assembly into liposomes and permeability regulation by calmodulin. J. Membr. Biol. 83: 217-225.
10. GOODEN, M., RINTOUL, D., TAKEHANA, M., AND TAKEMOTO, L. (1985). Major intrinsic polypeptide (MIP26K) from lens membrane: Reconstitution into vesicles and inhibition of channel forming activity by peptide antiserum. Biochem. Biophys. Res. Commun. 128: 993-999.
11. GORIN, M. B., YANCEY, S. B., CLINE, J., REVEL, J. P., AND HORWITZ, J. (1984). The major intrinsic protein (MIP) of the bovine lens fiber membrane: Characterization and structure based on cDNA cloning. Cell 39: 49-59.
12. HOWARD, B. H., AND SAKAMOTO, K. (1990). Alu interspersed repeats: Selfish DNA or a functional gene family? New Biologist 2: 759-770.
13. JOHNSON, R. G., KLUKAS, K. A., TZE-HONG, L., AND SPRAY, D. C. (1988). Antibodies to MP28 are localized to lens junctions, alter intercellular permeability and demonstrate increased expression during development. In “Gap Junctions” (E. L. Hertzberg and R. G. Johnson, Eds.), pp. 81-98, A. R. Liss, New York.
14. KODAMA, R., AGATA, N., MOCHII, M., AND EGUCHI, G. (1990). Partial amino acid sequence of the major intrinsic protein (MIP) of the chicken lens deduced from the nucleotide sequence of a cDNA clone. Exp. Eye Res. 50: 737-741.
15. LAMPE, P. D., AND JOHNSON, R. G. (1990). Amino acid sequence of in vivo phosphorylation sites in the main intrinsic protein (MIP) of lens membranes. Eur. J. Biochem. 194: 541-547.
16. LEFF, S. E., ROSENFELD, M. G., AND EVANS, R. M. (1986). Complex transcriptional units: Diversity in gene expression by alternative RNA processing. Annu. Rev. Biochem. 55: 1091-1117.
17. MAIZEL, J. V., AND LENK, R. P. (1981). Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc. Natl. Acad. Sci. USA 78: 7665-7669.
18. MANIATIS, T., FRITSCH, E. F., AND SAMBROOK, J. (1982). “Molecular Cloning: a Laboratory Manual,” Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
19. MATHIAS, R. T., AND RAE, J. L. (1989). Cell to cell communication in lens. In “Cell Interactions and Gap Junctions” (N. Sperelakis and W. C. Cole, Eds.), Vol. I, pp. 29-50, CRC Press, Boca Raton, FL.
20. MCAVOY, J. W. (1980). Induction of the eye lens. Differentiation 17: 137-149.
21. MCLAUGHLAN, J., GAFFNEY, D., WHITTON, J. L., AND CLEMENTS, J. B. (1985). The consensus sequence YGTGTTYY located downstream from the AATAAA signal is required for efficient formation of mRNA 3’ termini. Nucleic Acids Res. 13: 1347-1368.
22. MOUNT, S. M. (1982). A catalogue of splice junction sequences. Nucleic Acids Res. 10: 459-472.
23. PAO, G. M., WU, L-F., JOHNSON, K. D., HOFTE, H., CHRISPEELS, M. J., SWEET, G., SANDAL, N. N., AND SAIER, M. H. (1991). Evolution of the MIP family of integral membrane transport proteins. Mol. Microbiol. 5: 33-37.
24. PAUL, D. L., AND GOODENOUGH, D. A. (1983). Preparation, characterization, and localization of antisera against bovine MP26, an integral protein from lens fiber plasma membrane. J. Cell Biol. 96: 625-632.
25. PERACCHIA, C. (1989). Control of gap junction permeability and calmodulin-like proteins. In “Cell Interactions and Gap Junctions” (N. Sperelakis and W. C. Cole, Eds.), Vol. I, pp. 125-142, CRC Press, Boca Raton, FL.
26. PIATIGORSKY, J., AND ZELENKA, P. (1991). Transcriptional regulation of crystallin genes: Cis elements, trans-factors and signal transduction system in the lens. In “Advances in Developmental Biochemistry” (P. Wassarman, Ed.), Vol. I, pp. 211-256.
27. PROUDFOOT, N. J., AND BROWNLEE, G. G. (1976). 3’ Non-coding region sequences in eukaryotic messenger RNA. Nature 263: 211-214.
28. RAO, Y., JAN, L. Y., AND JAN, Y. N. (1990). Similarity of the product of the Drosophila neurogenic gene big brain to transmembrane channel proteins. Nature 345: 163-167.
29. RENAN, M. J. (1987). Conserved 12-bp element downstream from mRNA polyadenylation sites. Gene 60: 245-254.
30. SANGER, F., NICKLEN, S., AND COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467.
31. SCHMID, C. W., AND JELINEK, W. R. (1982). The Alu family of dispersed repetitive sequences. Science 216: 1065-1070.
32. SHIELS, A., KENT, N. A., MCHALE, M., AND BANGHAM, J. A. (1988). Homology of MIP26 to Nod26. Nucleic Acids Res. 16: 9348.
33. SMITH, B. L., AND AGRE, P. (1991). Erythrocyte Mr 28,000 transmembrane protein exists as a multisubunit oligomer similar to channel proteins. J. Biol. Chem. 266: 6407-6415.
34. SMITH, C. W. J., PATTON, J. G., AND NADAL-GINARD, B. (1989). Alternative splicing in the control of gene expression. Annu. Rev. Genet. 23: 527-577.
35. SOUTHERN, E. M. (1975). Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98: 503-517.
36. SPARKES, R. S., MOHANDAS, T., HEINZMANN, C., GORIN, M. B., HORWITZ, J., LAW, M. L., JONES, C. A., AND BATEMAN, J. B. (1986). The gene for the major intrinsic protein (MIP) of the ocular lens is assigned to human chromosome 12cen-q14. Invest. Ophthalmol. Visual Sci. 27: 1351-1354.
37. TABOR, S., AND RICHARDSON, C. C. (1987). DNA sequence analysis with modified bacteriophage T7 DNA polymerase. Proc. Natl. Acad. Sci. USA 84: 4767-4771.
38. TAKEMOTO, L. J., HANSEN, J. S., AND HORWITZ, J. (1981). Interspecies conservation of the main intrinsic polypeptide (MIP) of the lens membrane. Comp. Biochem. Physiol. B 68: 101-106.
39. TAKEMOTO, L. J., HANSEN, J. S., NICHOLSON, B. J., HUNKAPILLER, M., REVEL, J-P., AND HORWITZ, J. (1983). Major intrinsic polypeptide of lens membrane. Biochemical and immunological characterization of the major cyanogen bromide fragment. Biochim. Biophys. Acta 731: 267-274.
40. WATANABE, M., KOBAYASHI, H., RUTISHAUSER, U., KATAR, M., ALCALA, J., AND MAISEL, H. (1989). NCAM in the differentiation of embryonic lens tissue. Dev. Biol. 135: 414-423.
41. WISTOW, G., AND PIATIGORSKY, J. (1988). The lens crystallins: Evolution and expression of proteins for a highly specialized tissue. Annu. Rev. Biochem. 57: 479-504.
42. WISTOW, G., PISANO, M. M., AND CHEPELINSKY, A. B. (1991). Tandem sequence repeats in transmembrane channel proteins. Trends Biochem. Sci. 16: 170-171.
43. YAMAMOTO, Y. T., CHENG, C-L., AND CONKLING, M. A. (1990). Root-specific genes from tobacco and Arabidopsis homologous to an evolutionarily conserved gene family of membrane channel proteins. Nucleic Acids Res. 18: 7449.
44. YANCEY, S. B., KOH, K., CHUNG, J., AND REVEL, J. P. (1988). Expression of the gene for main intrinsic polypeptide (MIP): Separate spatial distributions of MIP and β-crystallin gene transcripts in rat lens development. J. Cell Biol. 106: 705-714.