Eur. J. Biochem. 191, 619-625 (1990)

© FEBS 1990

A novel gene member of the human glycophorin A and B gene family. Molecular cloning and expression

Alain VIGNAL¹, Cécile RAHUEL¹, Jacqueline LONDON¹, Baya CHERIF-ZAHAR¹, Sophie SCHAFF¹, Claude HATTAB¹, Yasuko OKUBO² and Jean-Pierre CARTRON¹

¹ Institut National de la Santé et de la Recherche Médicale, Unité 76, Paris, France
² Osaka Red Cross Blood Center, Japan

(Received February 12, 1990) - EJB 90 0146

A new gene closely related to the glycophorin A (GPA) and glycophorin B (GPB) genes has been identified in the normal human genome as well as in the genomes of persons with known alterations of GPA and/or GPB expression. This gene, called glycophorin E (GPE), is transcribed into a 0.6-kb message that encodes a 78-amino-acid protein with a putative leader peptide of 19 residues. The first 26 amino acids of the mature protein are identical to those of M-type GPA, but the C-terminal domain (residues 27-59) differs significantly from those of GPA and GPB. The GPE gene consists of four exons distributed over 30 kb of DNA, and its nucleotide sequence is homologous to those of the GPA and GPB genes in the 5' region, up to exon 3. Because of branch-site and splice-site mutations, the GPE gene contains a large intron that includes sequences used as exons in the GPA and GPB genes. Compared with its counterpart in the GPB gene, exon 3 of the GPE gene contains several point mutations, an insertion of 24 bp and a stop codon that shortens the reading frame. Downstream from exon 3, the GPE and GPB sequences are virtually identical and include the same Alu repeats. Thus, it is likely that the GPE and GPB genes have evolved by a similar mechanism. From the analysis of the GPA, GPB and GPE genes in glycophorin variants [En(a-), S-s-U- and Mk], it is proposed that the three genes are organized in tandem on chromosome 4. Deletion events within this region may remove one or two structural genes and may generate new hybrid structures in which the promoter region of one gene is positioned upstream from the body of another gene of the same family. This model of gene organization provides a basis for explaining the diversity of the glycophorin gene family.
Glycophorin A (GPA) and glycophorin B (GPB) are the major sialoglycoproteins of the human red cell membrane and carry the blood group MN and Ss antigens, respectively; they are also ligands for viruses, bacteria and parasites [1, 2]. The amino acid sequences of GPA and GPB and their cDNA sequences have been determined [3-8]. More recently, their genes have been characterized [9, 10]. Each is spread over 30 kb of DNA, and they contain seven and five exons, respectively. Complete sequence determination has confirmed the high similarity between the two genes in their 5' region and has demonstrated divergence in the 3' region [9]. It was suggested that the GPB gene evolved by acquiring 3' sequences different from those of the GPA gene, through duplication and homologous recombination at Alu repeats [9]. Southern blot analyses of genomic DNAs from several glycophorin variants [7, 10-14] indicate that the lack of GPA in En(a-) individuals or of GPB in S-s-U- individuals is, in most cases, related to gene deletions that have not been characterized [7, 11, 12]. In other variants, hybrid glycophorin molecules have been found to be products of unequal crossing-over and/or gene conversion between the GPA and GPB genes [10, 13, 14]. During these investigations, we identified on Southern blots unexpected bands defining a new gene closely related to the GPA or GPB genes [10, 11]. This gene, which was first called invariant (inv), has now been isolated and sequenced. Its transcript, identified in human erythroblasts and K562 cells, encodes a new glycophorin species. Following the glycophorin nomenclature [15], and since four main glycophorins, GPA, GPB, GPC and GPD, have already been characterized [16], it is proposed that the glycoprotein encoded by the inv gene be called glycophorin E (GPE) and that the inv gene be renamed the GPE gene. Also, as the genomic alterations occurring in the glycophorin variants lacking GPA and/or GPB have not previously been characterized precisely, we have investigated the status of the glycophorin genes in some of these individuals [En(a-), S-s-U- and Mk]. These studies have provided a general model for the organization of the glycophorin genes, as well as new insights into the potential mechanisms that generate glycophorin gene diversity.

Correspondence to J.-P. Cartron, INTS, 6 rue Alexandre Cabanel, F-75015 Paris, France
Abbreviations. GPA, GPB, GPC, GPD, GPE, glycophorins A, B, C, D and E; GPA, GPB, GPE, genes encoding glycophorins A, B and E; inv, invariant gene, renamed GPE; PCR, polymerase chain reaction.
Note. The novel nucleotide sequence data published here have been deposited with the EMBL sequence data banks and are available under accession numbers X53004-X53010.

MATERIALS AND METHODS

Cell samples and DNA preparation

DNA from the En(a-) individual (G. W.) was obtained from an Epstein-Barr-virus (WES-2)-transformed B cell line (gift from Prof. C. G. Gahmberg and L. C. Anderson, Helsinki, Finland). Blood from the S-s-U- donor (Fav.) was a gift from Dr M. Girard (CTS Asnières, France). The Mk blood sample (R. S.) was collected at the Osaka Red Cross Blood Center (Osaka, Japan) and shipped to Paris for analysis. The human-hamster hybrid cell line 9TK [17] was a gift from Dr E. Chu (University of Michigan, Ann Arbor). The chromosome content of this hybrid was checked by karyotype analysis after R-banding (N. Creau-Goldberg, Hôpital Necker, Paris). DNA samples were prepared as previously described [7, 10, 11].

Materials

Restriction enzymes, bacterial alkaline phosphatase and pUC vectors were from Appligene (Strasbourg, France). T4 DNA ligase and T4 polynucleotide kinase were from Biolabs, and radiolabeled nucleotides were from Amersham International (Bucks, UK). pUC sequencing kits were from Pharmacia (Uppsala, Sweden). Avian myeloblastosis virus reverse transcriptase and Thermus aquaticus polymerase (Taq polymerase) were from Perkin-Elmer (USA).

Nucleic acid probes

cDNA probes and the genomic probe (1.7 kb) have already been described [7, 10, 11] and were labeled with [α-32P]dCTP by random priming (Boehringer, Mannheim, FRG) to a specific activity of 10⁹ cpm/µg. Oligonucleotide probes were synthesized on a Milligen Biosearch 8700 DNA synthesizer and labeled with [γ-32P]ATP using T4 polynucleotide kinase to a specific activity of 5 × 10⁶ cpm/pmol. Oligonucleotide P4 (5'-GTGGCAATGCACACTTCAACCTCTTCTTCAGTCACA-3') is located within the exon 2 region common to the three glycophorin genes [10]. Probe P15 (5'-GTACTTACCACTTTCACAGCCCCA-3') is located 5' upstream from exon 1 of each gene, and P18 (5'-TTCTGTTATATTGGGTATGAGATC-3') is within intron 1. Probe P21 (5'-GGAGAAACG/CGGACAACTTGTCCAT-3') is specific for exon B-3 of the GPB gene. Probes P33 and P34 were used for GPE transcript amplification, as shown in Fig. 1.

● AGTTGTCTTTGGTAGTTTTTTTGCACTMCTTCAGGAGCCAGCTCGTGATCTCAGG

ATG TAT GGA AAA ATA ATC TTT GTA TTA CTA TTG TCA GGA ATT GTG AGC ATA TCA GCA
Met Tyr Gly Lys Ile Ile Phe Val Leu Leu Leu Ser Gly Ile Val Ser Ile Ser Ala   -1

TCA AGT ACC ACT GGT GTG GCA ATG CAC ACT TCA ACC TCT TCT TCA GTC ACA AAG AGT
Ser Ser Thr Thr Gly Val Ala Met His Thr Ser Thr Ser Ser Ser Val Thr Lys Ser   19

TAC ATC TCA TCA CAG ACA AAT GGG ATA ACA CTC ATT AAT TGG TGG GCG ATG GCT CGT
Tyr Ile Ser Ser Gln Thr Asn Gly Ile Thr Leu Ile Asn Trp Trp Ala Met Ala Arg   38

GTT ATT TTT GAG GTG ATG CTT GTT GTT GTT GGA ATG ATC ATC TTA ATT TCT TAC TGT
Val Ile Phe Glu Val Met Leu Val Val Val Gly Met Ile Ile Leu Ile Ser Tyr Cys   57

ATT CGA TGA GGATGTGGCCTGCATGCTGCCTGATCTTGCCTAGAACCAGCTGCACCTGCTGTTCTCTTGCTT
Ile Arg ***                                                                   59

ATGCAAACTGGCTGCACCTGCTATTCCTTTGCTTATGCCCCMCCCTTGGCTATCCTMCT

[Fig. 1B: hydropathy plot of the GPE sequence; x-axis, amino acid position (1-78)]

Fig. 1. Nucleotide sequence of GPE and predicted amino acid sequence of GPE. (A) Nucleotide sequence of the PCR amplification product from RNA preparations of human spleen erythroblasts and K562 cells, using amplimers P33 and P34 (underlined), located in exons E-1 and E-4 of the GPE gene shown in Fig. 2. Predicted amino acid sequences are given in the three-letter code. We assume that the transcription-initiation site of the GPE mRNA (closed circle) is identical to those of GPA and GPB, since primer extension analysis carried out with two primers located in their common nucleotide sequence (5' region) provided a single extension product [24]. (B) Hydropathy plot of the GPE amino acid sequence, calculated over windows of seven amino acid residues [25].
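The lengths quoted for GPE (a 78-residue precursor, a 19-residue leader and a 59-residue mature chain) and the position of probe P4 can be cross-checked against Fig. 1A. The Python sketch below is only a consistency check, not the authors' procedure: the codon table is the standard genetic code, and the names `GPE_CDS`, `translate` and `LEADER_LENGTH` are our own.

```python
# Sketch: re-translate the GPE coding sequence of Fig. 1A and check the
# lengths quoted in the text. Codons are written with spaces for
# readability; the spaces are stripped before translation.

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Row 4, codon 5 (mature residue 43) is read as GTG (Val), as the
# amino acid line of Fig. 1A requires.
GPE_CDS = (
    "ATG TAT GGA AAA ATA ATC TTT GTA TTA CTA TTG TCA GGA ATT GTG AGC ATA TCA GCA "
    "TCA AGT ACC ACT GGT GTG GCA ATG CAC ACT TCA ACC TCT TCT TCA GTC ACA AAG AGT "
    "TAC ATC TCA TCA CAG ACA AAT GGG ATA ACA CTC ATT AAT TGG TGG GCG ATG GCT CGT "
    "GTT ATT TTT GAG GTG ATG CTT GTT GTT GTT GGA ATG ATC ATC TTA ATT TCT TAC TGT "
    "ATT CGA TGA"
).replace(" ", "")

# Oligonucleotide P4 (exon 2 region common to the three genes), from Methods.
P4 = "GTGGCAATGCACACTTCAACCTCTTCTTCAGTCACA"
LEADER_LENGTH = 19  # putative leader peptide, residues -19 to -1


def translate(cds: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


precursor = translate(GPE_CDS)
leader, mature = precursor[:LEADER_LENGTH], precursor[LEADER_LENGTH:]

print(len(precursor), len(leader), len(mature))   # 78 19 59
print(leader[-7])                                 # G: Gly at position -7
print(mature[:5])                                 # SSTTG: Ser1 ... Gly5 (M type)
print(GPE_CDS.find(P4) // 3 - LEADER_LENGTH + 1)  # 6: first mature residue covered by P4
```

The Gly at position -7 and the Ser1/Gly5 pair are the features the text uses to distinguish GPE from GPB and to relate it to M-type GPA.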

Isolation of genomic clones and sequence determination

GPE and GPB genomic clones were isolated from a human leucocyte genomic library as described for GPA and GPB clones, using probes GPB-1 and GPB-3 [10]. GPB clones (217 and 151) were distinguished from GPE clones (116 and A20) by restriction analysis and by Southern hybridization [18] with probe P21, which recognizes GPB clones only. EcoRI restriction fragments were subcloned in pUC18 and sequenced by the dideoxynucleotide chain-termination method [19].

Polymerase chain reaction (PCR) amplification

For amplification of exon 1 from the GPA, GPB and GPE genes, 20 pmol each of primers P15 and P18 were added to 2.5 units of Taq polymerase [20] and 1 µg genomic DNA. Incubations were performed in a Perkin-Elmer thermal DNA cycler for 30 cycles, each comprising 1 min denaturation at 94 °C, 1 min annealing at 53 °C and 2 min chain extension at 72 °C. GPE transcripts were identified by PCR amplification of cDNAs prepared from human erythroblast and K562 total RNAs. First cDNA strands were synthesized by hybridization of

70 ng of P34 oligonucleotide primer to 1 µg RNA, followed by elongation with 1 mM dNTP in the presence of 7 units of avian myeloblastosis virus reverse transcriptase, as previously described [21]. The cDNA strands were amplified by PCR with the P33 and P34 oligomers as above, except that annealing was carried out at 51 °C. Amplified fragments were purified on 1.5% agarose gels, subcloned into pUC18 vectors and sequenced as described above.

RESULTS AND DISCUSSION

Identification of the GPE gene in genomic DNAs from glycophorin variants and normal individuals

Extending our previous observations [7, 10] that a new gene, now called GPE, is present in the normal human genome as well as in persons with the glycophorin variant phenotypes En(a-) and Mi.V, we found in the present study that the GPE gene is also present in the genome of an S-s-U- individual (whose red cells lack GPB) and in the genome of a very rare Mk-homozygous donor whose red cells are totally deficient in both GPA and GPB [22, 23]. Indeed, the 1.7-kb and P4 probes, which detect exons 1 and 2, respectively, common to the GPA and GPB genes [10], hybridized on Southern blots to three distinct genomic fragments instead of the two expected for GPA and GPB, as seen from the HindIII, BamHI and SacI digests of common genomic DNA (Table 1). The possibility that the third band was due to a common polymorphism was ruled out by showing that it was present in the DNA samples of all 11 unrelated individuals tested. The 1.7-kb and P4 probes recognized two restriction fragments in the En(a-) or S-s-U- samples and only one in the homozygous Mk sample. When the probe GPB-3 (specific for the 3' region of the GPB gene) was used, two fragments were detected in the normal control and En(a-) DNAs, but only one in the S-s-U- and Mk samples (Table 1). Further Southern hybridization studies (not shown) with several cDNA and oligonucleotide probes specific for the coding regions of GPA and GPB indicated that exons A-2 to A-7 of the GPA gene were absent in the En(a-) variant, that exons B-2 to B-5 of the GPB gene were missing from the S-s-U- variant, and that both these gene regions were absent from the Mk genome. On the basis of these findings and our previous studies [10, 11], each restriction fragment detected with the probes P4 and GPB-3 could be assigned to a given gene from the defects occurring in the glycophorin variants (Table 1). The correct assignment of the fragments detected with the 1.7-kb probe was deduced from PCR experiments, as described below. Together with previous data [10, 11], these findings confirm the presence of three related genes (GPA, GPB and GPE) in common genomic DNA. Only two of these genes are present in En(a-) and S-s-U- individuals (GPB or GPA, respectively, in addition to GPE), and only one in the Mk donor (GPE). The GPE gene, therefore, is present in all human genomes investigated so far. That a third glycophorin gene may exist was suspected independently [34], but this gene has not previously been characterized.

Table 1. Restriction analysis of genomic DNA from glycophorin variants with selected probes for GPA and GPB
Genomic DNA from glycophorin variants lacking expression of cell surface GPA [En(a-)], GPB (S-s-U-) or both GPA and GPB (Mk) was investigated in comparison with the DNA from a control (C) individual. Hybridization of restriction fragments with the probes was carried out according to Southern [18]. The 1.7-kb genomic probe and the P4 oligonucleotide probe detect exons 1 and 2, respectively, of the glycophorin genes. The cDNA probe GPB-3 detects the 3' untranslated sequences of the GPB and GPE genes. Exon 1 assignment was performed by PCR amplification of genomic DNAs (see text). Gene assignments are listed in the same order as the fragments in the control (C) column.

Probe   Enzyme   Size of restriction fragments (kb)                Gene assignment
                 C               En(a-)     Mk    S-s-U-           Southern         PCR
P4      HindIII  6.4, 6.2, 1.5   6.4, 6.2   6.4   6.4, 1.5         GPE, GPB, GPA    -
P4      SacI     7.8, 6.3, 2.7   7.8, 6.3   6.3   6.3, 2.7         GPB, GPE, GPA    -
1.7-kb  BamHI    11.0, 9.5, 7.0  11.0, 9.5  9.5   9.5, 7.0         -                GPE, GPA, GPB
GPB-3   PvuII    6.0, 1.7        6.0, 1.7   1.7   1.7              GPB, GPE         -
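The Southern assignments in Table 1 follow a presence/absence rule: a fragment is attributed to the gene whose structural exons are missing from exactly the same variants. A minimal Python sketch of that rule, assuming the gene content of each sample stated in the text (all function and variable names are hypothetical):

```python
# Gene bodies (exons 2 onward) retained in each sample, as deduced in the text:
# En(a-) lacks GPA, S-s-U- lacks GPB, Mk lacks both.
GENES_PRESENT = {
    "C": {"GPA", "GPB", "GPE"},
    "En(a-)": {"GPB", "GPE"},
    "Mk": {"GPE"},
    "S-s-U-": {"GPA", "GPE"},
}


def carriers_of_gene(gene):
    """Samples whose genome still contains this gene body."""
    return frozenset(s for s, genes in GENES_PRESENT.items() if gene in genes)


def assign_bands(bands_by_sample):
    """Map each fragment size (kb) to the gene with the same presence/absence pattern.

    Returns None for a band whose pattern matches no gene, or more than one.
    """
    all_bands = set().union(*bands_by_sample.values())
    assignment = {}
    for band in sorted(all_bands, reverse=True):
        carriers = frozenset(s for s, bands in bands_by_sample.items() if band in bands)
        matches = [g for g in ("GPA", "GPB", "GPE") if carriers_of_gene(g) == carriers]
        assignment[band] = matches[0] if len(matches) == 1 else None
    return assignment


# Fragment sizes (kb) from Table 1.
hindiii_p4 = {"C": {6.4, 6.2, 1.5}, "En(a-)": {6.4, 6.2}, "Mk": {6.4}, "S-s-U-": {6.4, 1.5}}
bamhi_17kb = {"C": {11.0, 9.5, 7.0}, "En(a-)": {11.0, 9.5}, "Mk": {9.5}, "S-s-U-": {9.5, 7.0}}

print(assign_bands(hindiii_p4))  # {6.4: 'GPE', 6.2: 'GPB', 1.5: 'GPA'}
print(assign_bands(bamhi_17kb))  # {11.0: 'GPB', 9.5: 'GPE', 7.0: 'GPA'}
```

For the P4 and GPB-3 rows the rule reproduces the Southern assignments. For the exon 1 probe (BamHI, 1.7 kb) it returns GPB, GPE and GPA for the 11.0-, 9.5- and 7.0-kb bands, which disagrees with the PCR assignment (GPE, GPA, GPB). The rule assumes that a band is lost together with the rest of its gene, and that does not hold for exon 1 when a deletion leaves one gene's exon 1 in place upstream of another gene's body.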

GPE gene transcript and deduced amino acid sequence

Since the results of Southern analysis suggested that the GPB and GPE transcripts were related, PCR amplifications of cDNAs prepared from RNAs of human erythroblasts and erythroleukemic K562 cells were carried out with amplimers P33 and P34 deduced from the GPB gene sequence. A strong signal of approximately 0.5 kb was detected, whereas control PCR amplifications with human adult liver RNAs were negative. After cloning and sequencing, most (> 80%) of the amplified sequences were found to be identical to GPB mRNA, whereas others identified a new message with an open reading frame encoding a protein of 78 amino acids (Fig. 1A). Calculation of the hydrophobicity index [25] from the protein sequence shown in Fig. 1A indicated that GPE has the structural characteristics of a membrane protein, with a putative leader peptide of 19 amino acids and a hydrophobic carboxy-terminal domain (Fig. 1B). The leader peptides of the three glycophorins are identical, except for a polymorphism that places Ala, Glu or Gly at position -7 in the GPA, GPB and GPE precursors, respectively. The mature GPE is predicted to be 59 amino acids long (6.5 kDa), with a C-terminal region (residues 27-59) that differs significantly from those of the GPA and GPB proteins. The first 26 residues, however, are identical to those of M-type GPA and carry the typical Ser1/Gly5 polymorphism [1, 17, 26]. This raises the question of whether an N-type GPE (Leu1/Glu5) exists, and the possibility that other glycophorin-A- or glycophorin-B-related glycoproteins remain to be characterized. Examination of the protein sequence shown in Fig. 1A indicates that GPE lacks a consensus sequence for N-glycosylation [27] but may carry O-linked oligosaccharides such as those present on the N-terminal region of GPA and GPB [3, 28].
Assuming the presence of eleven such chains, as in GPB, the molecular mass of the GPE glycoprotein should be close to 17 kDa. Among other similar features, both the GPE and GPB polypeptides contain a cysteine residue, although at a position which is not conserved between the two proteins.
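The 6.5-kDa and 17-kDa figures can be reproduced approximately. The Python sketch below sums standard average residue masses for the mature chain and adds eleven O-linked chains; the structure assumed for each chain (the disialylated tetrasaccharide commonly described for glycophorins) is our assumption and is not stated in the text.

```python
# Average residue masses in Da (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153

# Mature GPE (residues 1-59) as read from Fig. 1A.
MATURE_GPE = "SSTTGVAMHTSTSSSVTKSYISSQTNGITLINWWAMARVIFEVMLVVVGMIILISYCIR"

# Assumption (not stated in the text): each O-linked chain is the
# disialylated tetrasaccharide NeuAc-Gal-(NeuAc)-GalNAc; residue masses in Da.
NEUAC, GAL, GALNAC = 291.25, 162.14, 203.19
TETRASACCHARIDE = 2 * NEUAC + GAL + GALNAC
O_GLYCAN_CHAINS = 11  # "as in GPB"


def peptide_mass(seq):
    """Average mass of a linear polypeptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


peptide = peptide_mass(MATURE_GPE)
glycoprotein = peptide + O_GLYCAN_CHAINS * TETRASACCHARIDE
print(f"peptide {peptide / 1000:.1f} kDa, glycoprotein {glycoprotein / 1000:.1f} kDa")
# peptide 6.5 kDa, glycoprotein 16.9 kDa
```

The estimate is only as good as the glycan assumption; apparent masses from SDS gels of sialoglycoproteins can differ considerably from such calculated values.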

[Fig. 2: nucleotide sequence alignment of the GPA, GPB and GPE genes, panels A-C]

A glycoprotein with the properties expected for GPE has not yet been characterized on human erythrocytes, even on Mk erythrocytes, which lack the major glycoproteins GPA and GPB and react with none of the anti-M, anti-N, anti-S and anti-s reagents [22, 23]. GPE might be expressed at a low copy number and therefore be difficult to detect by conventional methods for glycoprotein identification. Inefficient translation of GPE seems unlikely, as its mRNA contains the same consensus nucleotide sequence around the translation-initiation codon [29] as the GPA and GPB mRNAs. Alternatively, this glycoprotein might not be correctly inserted into the red cell membrane. Indeed, transmembrane proteins usually contain a stretch of 20-25 uncharged amino acids with a strong hydrophobic character, whereas the corresponding hydrophobic segment in GPE is only 15 residues long (Fig. 1A and B). Other investigations have shown, however, that an artificial hydrophobic domain of only 16 residues can still anchor membrane proteins [30]. On the other hand, hydropathy profiles can be misleading in identifying integral membrane proteins, as exemplified by dogfish lactate dehydrogenase and trypsinogen, which each contain a typical stretch of apolar residues yet are water-soluble proteins [25, 31]. The GPE and GPB transcripts could not be identified separately on Northern blots, since they have similar sizes (0.5-0.6 kb) and the short oligonucleotide sequence specific for GPE (24-mer, see below) gives no detectable signal. Nevertheless, when RNAs from various cell lines and tissues were examined for glycophorin expression [32], a 0.6-kb transcript was detected only in those of erythroid origin, suggesting that the GPE message, like those of GPA and GPB, is erythroid specific. The biological function of GPE, however, will remain an open question until this glycoprotein is detected on red cells.
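Fig. 1B is a sliding-window hydropathy plot with a window of seven residues [25]. A minimal Python sketch of such a profile follows; it assumes the Kyte-Doolittle scale (the reference list in this copy is truncated before [25], so the scale is not confirmed here), and it reports the highest-scoring window overall and within the mature chain. Names such as `hydropathy_profile` are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (assumed to be the scale of ref. [25]).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# GPE precursor (leader residues -19 to -1, then mature residues 1-59), Fig. 1A.
LEADER = "MYGKIIFVLLLSGIVSISA"
MATURE = "SSTTGVAMHTSTSSSVTKSYISSQTNGITLINWWAMARVIFEVMLVVVGMIILISYCIR"
PRECURSOR = LEADER + MATURE
WINDOW = 7


def hydropathy_profile(seq, window=WINDOW):
    """Mean hydropathy of every full window; entry i covers seq[i:i + window]."""
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


profile = hydropathy_profile(PRECURSOR)
best_overall = max(range(len(profile)), key=profile.__getitem__)
# Only windows lying entirely within the mature chain.
best_mature = max(range(len(LEADER), len(profile)), key=profile.__getitem__)

for label, start in (("overall", best_overall), ("mature", best_mature)):
    segment = PRECURSOR[start:start + WINDOW]
    print(f"{label}: precursor residues {start + 1}-{start + WINDOW} "
          f"{segment} {profile[start]:.2f}")
```

With this scale the top window overall lies in the leader peptide, which is cleaved off, while the top mature window lies in the C-terminal domain. A seven-residue window locates hydrophobic regions but does not by itself define the length of a membrane-spanning segment; the 15-residue figure in the text is the authors' reading of Fig. 1.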

Structure and organization of the GPE gene

The GPE gene structure (except for exon 1) was determined by restriction mapping (not shown) and nucleotide sequencing of two genomic clones [10], using the GPE transcript sequence for exon assignment. The GPE gene consists of only four exons (E-1 to E-4) distributed over 30 kb of DNA, whereas the GPA and GPB genes are composed of seven exons (A-1 to A-7) and five exons (B-1 to B-5), respectively [9, 10]. Alignment of the nucleotide sequences revealed that the three genes are homologous over most of their structure (Fig. 2). The GPE gene, however, more closely resembles the GPB gene, since it contains identical 3' end sequences that are not present in the GPA gene. Exons 1 of the three glycophorin genes are nearly identical and were difficult to assign, since they are located more than 15 kb upstream from their respective exons 2 and no overlapping clone could be identified. The assignment was finally deduced from an analysis of the genomic DNA from the glycophorin variants (see below). The nucleotide sequence of exon E-2 was essentially identical to that of exon A-2 or exon B-2 of the GPA and GPB genes, respectively. One point mutation at the first nucleotide of exon 2 is responsible for the leader peptide polymorphism at position -7, and three others for the MN blood type polymorphism of the mature proteins (Fig. 2A). Downstream from exon E-2 there is a relatively large intron of 3.6 kb, which includes the nucleotide sequences used as exons A-3 and A-4 in the GPA gene and as exon B-3 in the GPB gene. This can be explained by two groups of point mutations: (a) a dinucleotide exchange that converts the branch site sequence tcttgac, required for lariat formation [33] in the GPA gene, into tcttgtt in the GPE gene, and (b) two point mutations that change the 5' splice sites G/gtat of two exon-intron junctions of the GPA gene into gatat in the GPE gene (Fig. 2A). In the GPB gene, the tcttgac to tcttgtt exchange and one of the G/gtat to gatat changes are also present [9]. The third exon of the GPE gene differs in several ways from its counterparts, exons A-5 and B-4 of the GPA and GPB genes, respectively. First, exon E-3 contains an in-frame 24-nucleotide insertion encoding eight amino acids found in neither GPA nor GPB; second, it carries as many as ten point mutations, of which eight introduce amino acid substitutions into the protein sequence, one is silent and another generates a stop codon, shortening the reading frame by four codons compared with its exon B-4 counterpart. A search of the GenBank database did not reveal homology between this sequence and any known gene.
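The argument that point mutations turned exons A-3 and A-4 into intronic sequence in GPE rests on two textbook splicing signals: the GT dinucleotide at the start of an intron (GT-AG rule) and a branch-point sequence upstream of the 3' splice site. The Python sketch below checks the motifs quoted in the text against the GT-AG rule and the classical mammalian branch-point consensus YNYURAY. Both are simplifications of real splice-site recognition, and the function names are hypothetical.

```python
import re

# Classical mammalian branch-point consensus YNYURAY (Y = C/T, R = A/G,
# N = any base, U written as T in DNA; the branch adenosine is the A).
BRANCH_POINT = re.compile(r"[ct][acgt][ct]t[ag]a[ct]")


def has_branch_point(intron_seq: str) -> bool:
    """True if the sequence contains a YNYURAY match anywhere."""
    return BRANCH_POINT.search(intron_seq.lower()) is not None


def has_gt_donor(junction: str) -> bool:
    """`junction` = last exon base followed by the first intron bases, e.g. 'Ggtat'.

    Under the GT-AG rule, the intron must start with 'gt'.
    """
    return junction[1:3].lower() == "gt"


# Sequences quoted in the text (GPA vs GPE).
print(has_branch_point("tcttgac"), has_branch_point("tcttgtt"))  # True False
print(has_gt_donor("Ggtat"), has_gt_donor("gatat"))              # True False

# The 24-nt insertion in exon E-3 keeps the reading frame and adds 24 // 3 residues.
insertion = 24
print(insertion % 3 == 0, insertion // 3)                         # True 8
```

In both cases the GPE version loses the motif that the GPA version carries, which is consistent with the text's explanation of why these sequences are spliced out as intron in GPE.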
As exon E-3 contains a stop codon, the 3' untranslated sequences are encoded in both exons E-3 and E-4, which are separated by an intron at a position conserved with respect to the GPB gene. Downstream from exon E-3, the GPE and GPB nucleotide sequences are very similar (> 95% identical) and thus diverge from those of the GPA gene (Fig. 2B and C). Alu sequences are also present between exons E-3 and E-4 (not shown), as in the GPA and GPB gene counterparts [9]. The direct repeats flanking these sequences in the GPE and GPB genes are 95% identical but differ from those in the GPA gene [9]. Moreover, the postulated breakpoint junction generating GPB from the GPA gene [9] is present at the same position and with the same sequence in the GPE gene (EMBL accession number X53009). It has recently been postulated [9] that the GPB gene arose by acquiring 3' sequences different from those of GPA through homologous recombination at Alu repeats. As we found the same 3' sequences and the same 3' Alu repeats in the GPE and GPB genes, our data indicate that these genes may have evolved similarly; however, which gene appeared first remains unclear. Preliminary evolutionary studies of non-human primates suggest that GPB and GPE might have appeared at about the same time, following a duplication of the ancestral GPA gene. Indeed, whereas the GPA gene was present as far back as the capuchin monkey, the GPE and GPB genes were not detected beyond the gorilla [34]. Some features of the nucleotide sequences, such as the positions of splice sites in exon E-3 (compared with its counterpart A-5), suggest that the GPE gene appeared after the GPB gene, from a duplication of the GPA gene. Others, such as the near-identity of the 3' end regions of the GPE and GPB genes, suggest that GPE arose directly from a duplication of the GPB gene.

Fig. 2. Nucleotide sequence alignment of the GPA, GPB and GPE genes. Nucleotides of exons are given in capital letters, whereas nucleotides of introns and of 3' and 5' flanking regions are given in lower-case letters. GPA nucleotide sequence data were determined from a genomic clone (16) isolated in our laboratory [10] and are in excellent agreement with sequence data reported recently [9], except for a minor difference (a stretch of seven T residues in exon 1). However, among PCR-amplified exons A-1, some have seven T residues and others only six. Sequence data for GPB were partly derived from one of our clones (151) and partly taken from published data [9]. A, B and E refer to nucleotide sequences, and a, b and e to amino acid sequences, of the GPA, GPB and GPE genes and proteins, respectively. Sequences identical to the GPA nucleotide and protein sequences are indicated by dashes. Exons are numbered A-1 to A-7 for GPA, B-1 to B-5 for GPB and E-1 to E-4 for GPE. (A) Nucleotide sequences in which the GPA, GPB and GPE genes are homologous. The open arrow indicates a 5' splice site mutation within the GPB and GPE genes. Closed arrows 1 and 2 indicate a splice site mutation within the GPE and GPB genes, respectively. (B) 3'-end sequence of the GPA gene. (C) 3'-end sequences of the GPB and GPE genes. Assignment of exon 1 and upstream sequences among the glycophorin genes was deduced from PCR analysis of genomic DNAs from glycophorin variants (see text). Splicing branch points are indicated by asterisks, and amino acid residues are numbered on the right-hand side.

Fig. 3. Proposed model of GPA, GPB and GPE gene arrangement on chromosome 4. This model shows the position of the three genes on chromosome 4, as well as the postulated deletions (thick lines) occurring in the En(a-) (Finnish type), S-s-U- (Fav.) and Mk (R. S.) glycophorin variants. For each gene, exon 1 (named A-1, B-1 and E-1, respectively) is represented separately, since it is located at least 15 kb upstream from the second exon.

Exon 1 and upstream sequences of glycophorin genes in glycophorin variants

The genomic clones isolated from the human leucocyte genomic library carry the bulk of the GPE gene but not exon 1 (see above). We first attempted to isolate this exon from the Mk variant DNA, since GPE is the only glycophorin gene present in this genome (see Table 1). A fragment containing exon 1 was obtained from two different PCR amplifications (using P15 and P18 as amplimers), but when its nucleotide sequence, determined from four distinct clones, was compared with that of the GPE transcript, two A/G nucleotide exchanges were observed, at positions 38 and 46 (see Figs 1A and 2A). Surprisingly, this sequence corresponds to that of exon A-1 of the GPA gene. Since this result was unexpected, further studies were carried out to examine which exon 1 structures are present in the genomic DNA of other glycophorin variants. Each time, two independent amplification experiments were performed and several clones sequenced to eliminate possible replication errors introduced by the PCR methodology [35].
Out of 10 clones sequenced from En(a-) DNA amplifications, two had nucleotide sequences typical of exon E-1, and eight had sequences typical of exon A-1 (Fig. 2A). Similarly, amplification analysis of S-s-U- DNA provided three clones with exon 1 structures typical of A-1 and B-1, but no clone carrying exon E-1. To differentiate exon A-1 from exon B-1, the 40 nucleotides upstream of these exons were compared in the same amplification products. These sequences could be distinguished by T/C nucleotide exchanges at positions -18 and -28 (Fig. 2A). Both exon B-1 and its upstream sequences were confirmed independently by sequence analysis of a genomic clone (151). We found, in addition, that the GPE gene also carries a T/C mutation at position -18 (Fig. 2A). Together, these data provided the information necessary to distinguish the 5' regions of the three genes. From the nucleotide sequences of the amplification products of S-s-U-, En(a-) and Mk DNAs, we concluded that the En(a-) genome contains the 5' regions of the GPA and GPE genes, whereas the S-s-U- genome contains the 5' regions of the GPA and GPB genes. The only 5' region identified in the Mk genome was that of the GPA gene, not that of the GPE gene itself.

Chromosome assignment of the GPE gene

DNAs from hybrid 9TK cells, which contain chromosome 4 as the only residual human chromosome [17], and from control hamster cells were digested with BamHI endonuclease and hybridized on Southern blots with the 1.7-kb genomic probe. Three strongly hybridizing bands, matching the 11.0-, 9.5- and 7.0-kb exon 1 fragments of the glycophorin genes (Table 1), were present in the 9TK DNA digest, whereas no hybridizing bands were found in the control hamster DNA digest (not shown). These results indicate that human chromosome 4 carries all three glycophorin genes, although their relative distances remain unknown.

A model for the organization of the glycophorin genes

It was known from previous studies that the GPA and GPB genes are adjacent on chromosome 4 and arranged in that order in the direction of transcription [36]. Since the three glycophorin genes are located on the same chromosome (see above), and since the GPA and GPB genes, but not the GPE gene, are deleted in the Mk genome, the only likely positions for the GPE gene are on the 5' or on the 3' side of the other two. That the GPE gene lies on the 3' side was deduced from the finding that exon A-1, but not exon E-1, could be detected in the Mk genome: if GPE lay on the 5' side, a single deletion removing exon E-1 together with the GPA and GPB gene bodies would also remove exons E-2 to E-4 and exon A-1, which lie between them. Thus, the deletion occurring in Mk can be explained by the removal of a large DNA region covering most of the GPA gene (exons A-2 to A-7), the entire GPB gene (exons B-1 to B-5) and exon E-1 of the GPE gene. This would bring exon A-1 and its 5' flanking sequences upstream of the bulk of the GPE gene (exons E-2 to E-4). Similarly, appropriate deletions would bring exons A-1 and B-1 and their flanking sequences upstream of the GPB and GPE genes, respectively, in the genomes of En(a-) and S-s-U- individuals.
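The positional argument can be made explicit by enumerating every single contiguous deletion for the two candidate gene orders and keeping those that leave exactly the segments found in Mk (the GPE gene body by Southern analysis, exon A-1 by PCR). A minimal Python sketch, assuming each gene can be reduced to two segments, exon 1 and the rest of the gene; the labels A, B and E stand for GPA, GPB and GPE, and all names are hypothetical.

```python
def layout(gene_order):
    """Each gene contributes exon 1 ('X-1') and, >15 kb downstream, its body ('X-body')."""
    segments = []
    for gene in gene_order:
        segments += [f"{gene}-1", f"{gene}-body"]
    return segments


def single_deletions_explaining(gene_order, retained):
    """Half-open segment ranges (i, j) whose removal leaves exactly `retained`."""
    segments = layout(gene_order)
    hits = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments) + 1):
            remaining = segments[:i] + segments[j:]
            if set(remaining) == retained:
                hits.append(((i, j), remaining))
    return hits


def promoter_for_each_body(segments):
    """Pair each gene body with the nearest exon 1 (and 5' region) upstream of it."""
    pairs, upstream = [], None
    for seg in segments:
        if seg.endswith("-1"):
            upstream = seg
        else:
            pairs.append((seg, upstream))
    return pairs


# Mk retains the GPE body (Southern) and exon A-1 only (PCR).
mk = {"A-1", "E-body"}
print(single_deletions_explaining("EAB", mk))  # []: GPE 5' of GPA cannot explain Mk
hits = single_deletions_explaining("ABE", mk)
print(hits)                                    # [((1, 5), ['A-1', 'E-body'])]
print(promoter_for_each_body(hits[0][1]))      # [('E-body', 'A-1')]
```

With the A-B-E order, the same helper returns the deletions (1, 3) and (3, 5) for the En(a-) and S-s-U- segment sets, placing exon A-1 upstream of the GPB body and exon B-1 upstream of the GPE body, respectively.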

A model that accounts for the gene alterations observed in the glycophorin variants is shown in Fig. 3. The deletion events proposed in this model might occur by chromosome misalignment followed by unequal crossing-over between homologous regions of the three glycophorin genes, as occurs in the α-globin and haptoglobin gene families [37, 38]. For the three glycophorin variants, the deletions should occur between exons 1 and 2 of each gene; when two genes are missing, as in Mk, the deletion may arise either by a single-step mechanism or by a two-step process in which the En(a-) or S-s-U- types of rearrangement would be intermediates. A further prediction of the model is that such unequal cross-overs should also yield reciprocal gene-fusion products carrying gene duplications. Such events could expand the tandem gene repeats and create additional glycophorin gene variants still to be discovered. Moreover, this general model allows the use of different homologous regions for cross-overs generating hybrid genes of the Lepore and anti-Lepore types, as previously postulated for the Mi.V, Stones and Dantu variants [10, 11, 13, 14]. As a result of partial deletions within the glycophorin gene cluster, new hybrid gene structures are generated by positioning a given exon 1 and its 5' flanking sequences upstream of another glycophorin gene. These rearrangements are of interest for studying the regulation of glycophorin gene expression. For instance, although the hybrid GPB gene of En(a-) individuals appears to be under the control of the GPA promoter region, the level of GPB protein remains the same as in normal controls, possibly because important regulatory sequences within the GPA gene, such as those in the β-globin gene [39], have been removed.
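The prediction that unequal crossing-over yields reciprocal duplication products can be illustrated with a two-segment representation of each gene (exon 1 and the rest of the gene). The Python sketch below is a toy model with hypothetical names, not a model of the actual breakpoints: it pairs two copies of the cluster misaligned by two genes and returns both recombinants.

```python
def layout(gene_order):
    """Exon 1 ('X-1') and gene body ('X-body', exons 2 onward) for each gene, 5' to 3'."""
    segments = []
    for gene in gene_order:
        segments += [f"{gene}-1", f"{gene}-body"]
    return segments


def unequal_crossover(chromatid, i, j):
    """Exchange between two misaligned copies of the same cluster.

    Copy 1 breaks just before segment index i and copy 2 just before index j
    (for example, both within homologous intron-1 sequences). Returns the
    two reciprocal recombinants: (deletion product, duplication product).
    """
    if not 0 < i < j <= len(chromatid):
        raise ValueError("expected 0 < i < j <= len(chromatid)")
    deletion = chromatid[:i] + chromatid[j:]
    duplication = chromatid[:j] + chromatid[i:]
    return deletion, duplication


cluster = layout("ABE")  # ['A-1', 'A-body', 'B-1', 'B-body', 'E-1', 'E-body']
# Misalignment by two genes: intron 1 of GPA (before index 1) pairs with
# intron 1 of GPE (before index 5).
deletion, duplication = unequal_crossover(cluster, 1, 5)
print(deletion)     # ['A-1', 'E-body']  (Mk-like)
print(duplication)  # ['A-1', 'A-body', 'B-1', 'B-body', 'E-1', 'A-body', 'B-1', 'B-body', 'E-1', 'E-body']
```

The deletion product has the Mk structure. Its reciprocal carries duplicated GPA and GPB segments and places exon E-1 upstream of a GPA gene body, a gene fusion of the kind the model anticipates.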
Whether the hybrid GPE genes of the S-s-U- and Mk genomes can be transcribed under the control of the GPB and GPA promoter regions, respectively, is not known. Erythroid tissues from these donors are not available for analysis, and the GPE protein itself has not yet been found in either common or variant erythrocytes. Functional analysis of the different promoters of the glycophorin gene cluster should help to clarify these questions and to understand how the expression of these genes is regulated.

After submission of this paper, we became aware that the GPE gene and transcript had been identified independently by Kudo and Fukuda [40].

We thank C. André (UPR48, CNRS) for assistance in the identification of some genomic clones, M. P. Lefranc (URA199, CNRS) for the gift of unrelated human DNA samples, N. Creau-Goldberg (INSERM U173) and E. Chu (Ann Arbor) for the human-hamster hybrid cell line (9TK) and P. H. Roméo (INSERM U91) for the supply of human erythroblast RNAs. We are also grateful to Dr M. Girard (CTS Asnières) for the S-s-U- blood sample and to Prof. C. G. Gahmberg and L. C. Anderson (Helsinki, Finland) for the En(a-) cell line.

REFERENCES
1. Dahr, W. (1986) in Recent advances in blood group biochemistry (Vengelen-Tyler, V. & Judd, W. J., eds) pp. 23-65, American Association of Blood Banks, Arlington, VA.
2. Hadley, T. J., Klotz, F. W. & Miller, L. H. (1986) Annu. Rev. Microbiol. 40, 451-477.
3. Tomita, M., Furthmayr, H. & Marchesi, V. T. (1978) Biochemistry 17, 4756-4770.
4. Blanchard, D., Dahr, W., Hummel, M., Latron, F., Beyreuther, K. & Cartron, J. P. (1987) J. Biol. Chem. 262, 5808-5811.
5. Siebert, P. D. & Fukuda, M. (1986) Proc. Natl Acad. Sci. USA 83, 1665-1669.
6. Siebert, P. D. & Fukuda, M. (1987) Proc. Natl Acad. Sci. USA 84, 6735-6739.
7. Rahuel, C., London, J., d’Auriol, L., Mattei, M. G., Tournamille, C., Skryznia, C., Lebouc, Y., Galibert, F. & Cartron, J. P. (1988) Eur. J. Biochem. 172, 147-153.
8. Tate, C. G. & Tanner, M. J. A. (1988) Biochem. J. 254, 743-750.
9. Kudo, S. & Fukuda, M. (1989) Proc. Natl Acad. Sci. USA 86, 4619-4623.
10. Vignal, A., Rahuel, C., El-Maliki, B., London, J., Le Van Kim, C., Blanchard, D., André, C., d’Auriol, L., Galibert, F., Blajchman, M. A. & Cartron, J. P. (1989) Eur. J. Biochem. 184, 337-344.
11. Rahuel, C., London, J., Vignal, A., Cherif-Zahar, B., Colin, Y., Siebert, P., Fukuda, M. & Cartron, J. P. (1988) Eur. J. Biochem. 177, 605-614.
12. Huang, C. H., Johe, K., Moulds, J. J., Siebert, P. D., Fukuda, M. & Blumenfeld, O. O. (1987) Blood 70, 1830-1835.
13. Huang, C. H. & Blumenfeld, O. O. (1988) Proc. Natl Acad. Sci. USA 85, 9640-9644.
14. Huang, C. H., Guizzo, M. L., Kikuchi, M. & Blumenfeld, O. O. (1989) Blood 74, 836-843.
15. Marchesi, V. T., Tillack, T. W., Jackson, R. L., Segrest, J. P. & Scott, R. E. (1972) Proc. Natl Acad. Sci. USA 69, 1445-1449.
16. Cartron, J. P., Colin, Y., Kudo, S. & Fukuda, M. (1990) in Blood cell biochemistry (Harris, J. R., ed.) vol. 1, pp. 299-335, Plenum Press, New York.
17. Stanley, W. & Chu, E. H. Y. (1978) Cytogenet. Cell Genet. 22, 228-231.
18. Southern, E. M. (1975) J. Mol. Biol. 98, 503-517.
19. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl Acad. Sci. USA 74, 5463-5467.
20. Saiki, R. K., Bugawan, T. L., Horn, G. T., Mullis, K. B. & Erlich, H. A. (1988) Nature 324, 163-166.
21. Colin, Y., Le Van Kim, C., Tsapis, A., Clerget, M., d’Auriol, L., London, J., Galibert, F. & Cartron, J. P. (1989) J. Biol. Chem. 264, 3773-3780.
22. Dahr, W., Uhlenbruck, G. & Knott, H. (1977) J. Immunogenet. 4, 191-200.
23. Tokunaga, E., Sasakawa, S., Tamaka, K., Kawamata, H., Giles, C. M., Ikin, E. W., Poole, J., Anstee, D. J., Mawby, W. & Tanner, M. J. A. (1979) J. Immunogenet. 6, 383-390.
24. Rahuel, C., Vignal, A., London, J., Hamel, S., Roméo, P. H., Colin, Y. & Cartron, J. P. (1989) Gene 85, 471-477.
25. Kyte, J. & Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132.
26. Anstee, D. J. (1980) in Immunobiology of the erythrocyte (Sandler, S. G., Nusbacher, J. & Schanfield, M. S., eds) pp. 67-98, Alan R. Liss Inc., New York.
27. Marshall, R. D. (1982) Biochem. Soc. Symp. 40, 17-26.
28. Furthmayr, H. (1978) Nature 271, 519-524.
29. Kozak, M. (1989) J. Cell Biol. 108, 229-241.
30. Davis, N. G. & Model, P. (1985) Cell 41, 607-614.
31. Engelman, D., Steitz, T. A. & Goldman, A. (1986) Annu. Rev. Biophys. Biophys. Chem. 15, 321-353.
32. Le Van Kim, C., Colin, Y., Mitjavila, M. T., Clerget, M., Dubart, A., Nakazawa, M., Vainchenker, W. & Cartron, J. P. (1989) J. Biol. Chem. 264, 20407-20414.
33. Padgett, R. A., Grabowski, P. J., Konarska, M. M., Seiler, S. & Sharp, P. A. (1986) Annu. Rev. Biochem. 55, 1119-1150.
34. Creau-Goldberg, N., London, J., Cochet, C., Rahuel, C., Cartron, J. P., Turleau, C. & de Grouchy, J. (1989) in Human gene mapping conference (HGM10), p. 54, New Haven.
35. Krawczak, M., Reiss, J., Schmidtke, J. & Rosler, U. (1989) Nucleic Acids Res. 17, 2197-2201.
36. Murray, J. C., Buetow, K. H., Ferrel, R. E., Siebert, P. D. & Fukuda, M. (1988) Cytogenet. Cell Genet. 47, 149-151.
37. Higgs, D. R., Vickers, M. A., Wilkie, A. O. M., Pretorius, I.-M., Jarman, A. P. & Weatherall, D. J. (1989) Blood 73, 1081-1104.
38. Maeda, N. & Smithies, O. (1986) Annu. Rev. Genet. 20, 81-108.
39. Antoniou, M., de Boer, E., Habets, G. & Grosveld, F. (1988) EMBO J. 7, 377-384.
40. Kudo, S. & Fukuda, M. (1990) J. Biol. Chem. 265, 1102-1110.
