Biochimica et Biophysica Acta, 1088 (1991) 95-103 © 1991 Elsevier Science Publishers B.V. (Biomedical Division) 0167-4781/91/$03.50 ADONIS 016747819100062V


BBAEXP 92195

Primary structure of pregnancy zone protein. Molecular cloning of a full-length PZP cDNA clone by the polymerase chain reaction

Koen Devriendt, Herman Van den Berghe, Jean-Jacques Cassiman and Peter Marynen

Center for Human Genetics, University of Leuven, Campus Gasthuisberg, Leuven (Belgium)

(Received 14 May 1990) (Revised manuscript received 31 August 1990)

Key words: Pregnancy zone protein; Cloning; DNA sequence; (Human)

A full-length cDNA clone of the human pregnancy zone protein (PZP) was cloned from the hepatocellular carcinoma cell line Hep3B. Based on the exon sequences of the PZP gene (Devriendt et al. (1989) Gene 81, 325-334; Marynen et al., unpublished data), primer pairs were designed to amplify six overlapping fragments of the PZP cDNA. The obtained cDNA is 4609 bp long and contains an open reading frame coding for 1482 amino acids, including a signal peptide of 25 amino acid residues. Comparison with the published partial PZP amino acid sequence (Sottrup-Jensen et al. (1984) Proc. Natl. Acad. Sci. USA 81, 7353-7357) and the PZP genomic sequences confirmed the identity as a PZP cDNA. 71% of the corresponding amino acid residues in PZP and human α2-macroglobulin (α2M) are identical and all cysteine residues are conserved. A typical internal thiol ester site and a bait domain were identified. A Pro/Thr polymorphism was identified at amino acid position 1180, and an A/G nucleotide polymorphism at bp 4097.

Introduction

Human pregnancy zone protein (PZP) is quantitatively the most important pregnancy associated plasma protein. PZP was first described by Smithies, who detected a protein zone on starch gel electrophoresis, present in late pregnancy serum [3]. Later, this protein was found in high concentrations in the plasma of all pregnant women (mean value of 1000-2000 mg/l), while very low basal concentrations were detected in nonpregnant women (5-25 mg/l) and in men (5-15 mg/l), with a slight age-dependent increase [4-6]. Increased PZP production was shown in response to the administration of estrogens, as in patients with prostate carcinoma [7], or in women taking oral contraceptives

Abbreviations: PZP, pregnancy zone protein; α2M, α2-macroglobulin; αM, α-macroglobulin; bp, basepair; PCR, polymerase chain reaction; nt, nucleotide; EtBr, ethidium bromide; α1I3, rat α1 inhibitor 3; C3, C4, complement factor 3 and 4.

The nucleotide sequence data reported in this paper will appear in the EMBL/Genbank and DDBJ nucleotide sequence Database under the accession number X54380.

Correspondence: P. Marynen, Center for Human Genetics, University of Leuven, Campus Gasthuisberg, Herestraat 49, B-3000 Leuven, Belgium.

[8]. In addition, elevated PZP blood levels were found in association with certain tumors [9,10]. PZP is a member of the α2-macroglobulin plasma protein family (for a review see Ref. 11). PZP is an α2-glycoprotein with an Mr of 360000, consisting of two identical, covalently bound subunits of 180000 [12,13]. Partial amino acid sequence determination of PZP showed that 68% of the amino acid residues are conserved in PZP and α2M [2]. The PZP protein contains a stretch of amino acids with recognition sites for a variety of proteinases [14-16]. Upon cleavage of this 'bait domain', a sequence of events is triggered, leading eventually to the hydrolysis of an internal thiol ester bond and the trapping of the proteinase [15-17]. In addition, a conformational change of the molecule occurs, resulting in the exposure of a receptor binding domain, recognized by a cellular receptor on hepatocytes [18], fibroblasts [17], macrophages [19] and on the syncytiotrophoblast [20]. This mediates the fast removal from the circulation of the proteinase-macroglobulin complex, by receptor mediated endocytosis [21]. Recently, genomic cloning of the PZP gene and chromosome mapping showed a cluster of three related genes, PZP, α2M and an α2M pseudogene on human chromosome 12p [1]. This further illustrated the close evolutionary relationship between PZP and α2M. In this report, we present the cloning and sequence of a full-length PZP cDNA clone from the human hepatocellular carcinoma cell line Hep3B, obtained by means of the polymerase chain reaction (PCR) technique.

Materials and Methods

Cell culture

The hepatoma cell lines Hep3B and HepG2 [22] were cultured in DMEM-F12 (Dulbecco's modified Eagle's medium 50%, Ham F12 50%) without phenol red, with 4~ glucose (GIBCO) and supplemented with 2% Ultroser-G (GIBCO).

cDNA synthesis

Total RNA was extracted as described [23]. cDNA was prepared from 20 µg total RNA with MMLV-reverse transcriptase (500 U, BRL), and primed with oligo(dT)17 (2.5 µg), in the recommended buffer, with 500 µM of each dNTP and RNasin (20 U). After ethanol precipitation, the cDNA was resuspended in distilled water and 25% of the reaction product was used for amplification.

Fig. 1. Schematic representation of the amplification of the 5' end of the PZP cDNA with the adapted RACE protocol [25]. The four steps of the amplification procedure to obtain the 5' end of the PZP cDNA are shown. The different oligonucleotide primers are represented by different arrows. Step 1: Reverse transcription with the PZP-specific primer PZP 405R. Step 2: Poly(A)-tailing of the obtained cDNA, after removal of the PZP 405R primer. Step 3: First amplification reaction, with the primers adaptor(dT)17, PZP 363R and adaptor primer. Step 4: Reamplification of the fragments of 300-1000 bp from the first PCR, by means of the primer PZP 347R and the adaptor primer.

Polymerase chain reaction

Polymerase chain reactions were performed in 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, 0.01% gelatin, 100 pmol primers, 200 µM of each dNTP and 2.5 U Taq DNA polymerase (Perkin-Elmer Cetus, Norwalk, CA), in a total volume of 100 µl, overlaid with 100 µl mineral oil [24]. After an initial 4 min denaturation at 94°C, 40 to 50 cycles were performed on a Perkin-Elmer Cetus DNA Thermal Cycler: denaturation for 1 min at 94°C, annealing for 1 min (62°C to 72°C, depending on the primers) and primer extension for 1 to 2 min at 72°C, followed by a final extension step at 72°C for 10 min. As a general rule, we took 1 min extension for each 1000 bp. Each amplification experiment included a water control and a negative control containing an α2M cDNA clone (pα2M 1, obtained from C.C. Kan, La Jolla, CA). The oligonucleotides were synthesized on a Cyclone DNA synthesizer (Biosearch, San Rafael, CA).
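The extension-time rule of thumb stated above can be written out as a small calculation. The following is a minimal sketch and not part of the original protocol: the function name is hypothetical, the clamping to the 1 to 2 min range of the extension step is an assumption about how the rule was applied, and the fragment lengths are the approximate values given in kb in Fig. 2.

```python
# Illustrative helper (hypothetical name): extension time from fragment length,
# following the stated rule of 1 min extension per 1000 bp.
def extension_minutes(length_bp, min_minutes=1.0, max_minutes=2.0):
    """Return the primer extension time in minutes for a fragment of length_bp.

    Assumption: the proportional rule is clamped to the 1 to 2 min range
    given for the extension step in the cycling conditions.
    """
    if length_bp <= 0:
        raise ValueError("fragment length must be positive")
    proportional = length_bp / 1000.0  # 1 min per 1000 bp
    return min(max(proportional, min_minutes), max_minutes)


# Approximate fragment lengths in bp, read from Fig. 2 (values given in kb).
FRAGMENT_LENGTHS_BP = {"II": 700, "III": 1310, "IV": 1370, "V": 1400, "VI": 600}

if __name__ == "__main__":
    for name, length in FRAGMENT_LENGTHS_BP.items():
        print(f"fragment {name}: {length} bp -> {extension_minutes(length):.2f} min")
```

Under these assumptions the short fragments (II and VI) fall back to the 1 min minimum, while fragments III to V get between 1.3 and 1.4 min.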

Cloning of the amplification product

Amplification products were extracted with 100 µl chloroform and ethanol precipitated before analysis on ethidium bromide stained agarose gels. Appropriate bands were cut from the gel and electroeluted. Fragments were phosphorylated and ligated in the dephosphorylated SmaI-cut plasmid vector pGem-3Z (Promega, Madison, WI). Alternatively, some fragments were cloned after digestion with the appropriate restriction enzymes.

Cloning of the 5' and 3' ends of the PZP cDNA

The 5' end of the PZP cDNA was obtained following the RACE protocol [25] (Fig. 1). 50 µg total RNA was reverse transcribed with the PZP-specific primer 405R. Excess primer was removed on a Bio-Gel A1.5m column (Bio-Rad) and the cDNA was ethanol precipitated. Next, a poly(A) tail was added to the cDNA with terminal deoxynucleotidyl transferase (30 U) and 200 µM dATP in 1 × tailing buffer (BRL), in a total volume of 60 µl, for 10 min at 37°C. Subsequently, amplification of the PZP 5' end cDNA was performed with 50% of the tailed cDNA. The primers were an internal PZP-specific oligonucleotide (PZP 363R, 25 pmol) and the aspecific primers oligo(dT)-adaptor (10 pmol) and the adaptor primer (25 pmol). The first cycle consisted of 1 min at 94°C, 2.5 min at 55°C and 60 min at 72°C. This was followed by 40 cycles of 94°C (1 min), 62°C (1 min) and 72°C (1 min). An aliquot was analyzed on an EtBr stained agarose gel and by a Southern blot, hybridized to the oligonucleotide PZP 297F. The DNA fragments between 300 bp and 1000 bp were eluted from the gel and reamplified using the primer PZP 347R and the adaptor primer, 100 pmol each. The PCR product was again analyzed on an EtBr stained agarose gel and with a Southern blot hybridized to the internal oligonucleotide PZP 297F. The fragments were cloned in the vector pGem-3Z after SalI digestion. To isolate additional clones containing the longest fragments, colony hybridization was done, using the PZP-specific oligonucleotide PZP 17F (5' ~TTATCCCTCACAATG). The procedures for Southern blot analysis, cloning and colony hybridization have been described [26].

The 3' end of the PZP cDNA was cloned in a similar way as the 5' end [25]. cDNA was synthesized with the oligo(dT)17 adaptor as a primer (2.5 µg). After ethanol precipitation, PCR was performed with the primers PZP 3884F (25 pmol), the adaptor dT(17) (10 pmol) and the adaptor primer (25 pmol), using the same PCR cycles as for the 5' end. Reamplification was done using 1/1000 of the reaction product with the primer PZP 3977F and the adaptor primer.

Fig. 2. PCR cloning strategy of the PZP cDNA. The PZP cDNA (4609 bp) is schematically represented 5' to 3'. Below, the positions of the six overlapping amplified fragments (I to VI) are shown with their respective lengths in kb (II, 0.7; III, 1.31; IV, 1.37; V, 1.4; VI, 0.6). The positions of the start codon (ATG) and stop codon (TGA) are indicated. The location of three important functional domains is shown: the bait domain, the internal thiol ester site (*) and the receptor recognition site (thick line).

DNA sequencing

Dideoxy chain termination sequencing [27] was done with T7 DNA polymerase (Pharmacia, Uppsala, Sweden). Compressions were resolved by means of 7-deaza dGTP (Pharmacia, Uppsala, Sweden). Subclones were constructed in the vector pGem-3Z with the appropriate restriction fragments or by means of shotgun cloning. At least three independent clones from each amplified fragment were sequenced. The obtained data were analyzed with the GENEPRO program (Riverside Sci, Seattle, WA).

Results

Our attempts to clone a PZP cDNA by conventional techniques were hampered by two problems. First, the apparently very low basal levels of PZP mRNA, and our inability to induce detectable synthesis of PZP in vitro. Second, the difficulty of detecting this rare PZP transcript in a population of α2M mRNA. Indeed, the human α2M and PZP sequences are closely related, as was apparent from the partial PZP amino acid sequence [2], and from the exon sequences of genomic PZP clones [1]. In addition, those tissues thought to produce PZP also synthesize α2M [11]. As a first step in the characterization of PZP, we isolated a genomic PZP clone, representing the 3' end of the gene [1]. Using this clone, additional clones spanning the entire PZP gene were isolated, and the PZP exons were sequenced (Marynen et al., unpublished data). From these data, a set of primers specific for PZP was designed to amplify the complete PZP cDNA in six overlapping fragments (Fig. 2). The oligonucleotide primers showed minimal sequence homology with the α2M cDNA sequence, especially at their 3' ends (Table I). Some oligonucleotides were modified and contained a restriction site, to facilitate the cloning of the amplified DNA fragments.

TABLE I

Sequence of the oligonucleotide primers

The sequence of the oligonucleotide primers for each amplified fragment (I to VI) is shown in the 5' to 3' direction. The numbering indicates the position of the first nucleotide in the PZP cDNA. 'F' means forward primer, 'R' means reverse primer.

PZP 363R PZP 347R

ITi ~TGTTCAGTACCAGA (reverse transcription primer) C'ITGCGTA~CCCTTTATC TATCTGGATGCTAAGGAATGCC

Fragment II:

PZP 297F PZP 1002R

CTCCCAAGGATC TCAGCCTCTTCA C i'TCrCGGATCCT~CCACTCT

Fragment III:

PZP 939F PZP 2252R

GCATC~CAGA~ACAAATACC-~

Fragment IV:

PZP 2901F PZP 3465R

ATCCCTTCCATGTCTGCAGGA GC CKTK;TCCCCTCt.: !T I GCTACA

Fragment V:

PZP 2792F PZP 4188R

I T I CAGATCTATGACCTGTGCC GAA'll'CACCATATrC_K~AAGCAGGACC-~JI l -I

Fragment VI:

PZP 3884F PZP 3977F

CG'I']'CAGGA'I~CACAGACu1-1-1 - 1 C T A C A A A ~ AGAATATGTCATAACAGTAA~AAA

Fragment I:

PZP 405R

AAAATCA'VrGCGCACCG i H CAGGGA

Adaptor primer:

GCATC_~GTCGACGAG

Adaptor (dT)17:

GCATC~GTCGACGAG-O")I 7

Total RNA, extracted from the hepatocarcinoma cell line Hep3B, was reverse transcribed into cDNA. Subsequently, four overlapping fragments of the PZP cDNA (fragments II to V, Fig. 2) were amplified using PZP-specific oligonucleotides. Distinct bands were visible on ethidium bromide stained agarose gels. Aspecific fragments appeared in the amplification reactions, but these could largely be reduced by using higher annealing temperatures and shorter extension times.

The 5' end of the PZP cDNA was cloned following a protocol for amplification with single sided specificity (see Materials and Methods). After the first amplification reaction, no sharp bands were visible on an EtBr stained agarose gel, and a Southern blot, hybridized to an internal oligonucleotide, showed a high background. Therefore, a second amplification reaction was performed, using the adaptor primer and a PZP-specific primer internal to the first PZP primer. This resulted in four distinct bands on an EtBr stained agarose gel. Further analysis by means of a Southern blot (Fig. 3) and by sequencing showed that the four PCR products originated from the 5' end of the PZP cDNA. Clones from the largest fragment of 420 nt had an added poly(A) tail 46 to 51 base pairs long. Twelve out of the thirteen clones analyzed started at the same position (G, nt 1 of the cDNA), whereas one clone started 12 nucleotides downstream. The two shorter fragments of 320 nt and 200 nt contained a poly(A) tail of 15 to 20 basepairs and started at two distinct positions of the cDNA (at bp 72 and 195, respectively). The shortest fragment of 140 nt was not analyzed.

The 3' end of the PZP cDNA was obtained in a similar way as the 5' end, including a second amplification round to increase the specificity of the amplification process. The poly(A) tails of the sequenced clones ranged in length from 11 to 50 basepairs. Some of the clones lacked the last two basepairs (T and C), most likely the result of a mispriming of the oligo(dT)-adaptor primer to the three adenine nucleotides immediately upstream.

Fig. 3. Southern blot analysis of the 5' end amplification product. 10% of the PCR products of the second amplification reaction of the 5' end (see Materials and Methods) was analyzed by means of a Southern blot, hybridized to the specific oligonucleotide PZP 297F. The nucleotide length of the fragments is indicated (400, 320, 200 and 140).

The cDNA sequence

The different amplified PZP cDNA fragments were cloned and sequenced and a contiguous cDNA sequence of 4609 bp was obtained (Fig. 4). The sequence from at least three independent clones from the same amplification reaction was determined. A total of 44 869 nucleotides were sequenced, representing both strands. A single open reading frame was found, with an ATG start codon at bp 30 and a TGA stop codon at bp 4476. A polyadenylation signal (AATAAA) was found 26 bp upstream of the poly(A) tail. The identification of the fragments as PZP cDNA clones was based on the comparison with the published partial PZP amino acid sequence [2] and with the genomic PZP clones ([1], Marynen et al., unpublished data).
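The coordinates quoted above can be cross-checked against the reported length of the open reading frame. The following is a minimal sketch of that arithmetic: the constant and function names are illustrative, the position of the AATAAA signal (bp 4583) is taken from the legend of Fig. 4, and the 25-residue signal peptide is the value stated in the abstract.

```python
# Coordinates quoted in the text and in the legend of Fig. 4 (1-based cDNA positions).
CDNA_LENGTH = 4609          # bp
ATG_START = 30              # first nucleotide of the start codon
TGA_STOP = 4476             # first nucleotide of the stop codon
POLYA_SIGNAL_START = 4583   # first nucleotide of AATAAA
SIGNAL_PEPTIDE = 25         # residues, as stated in the abstract


def coding_codons(start, stop):
    """Number of sense codons between the start codon and the stop codon."""
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return span // 3


codons = coding_codons(ATG_START, TGA_STOP)          # residues in proPZP
mature_residues = codons - SIGNAL_PEPTIDE            # residues after signal peptide removal
signal_to_end = CDNA_LENGTH - POLYA_SIGNAL_START     # bp from AATAAA to the cDNA end

if __name__ == "__main__":
    print(codons, mature_residues, signal_to_end)
```

With these inputs the reading frame spans 1482 codons, which leaves 1457 residues for the mature protein, and the polyadenylation signal starts 26 bp before the end of the 4609 bp sequence, consistent with the figures given in the text.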

Fig. 4. The cDNA and amino acid sequence of PZP. The cDNA sequence of PZP is shown (upper row) in the 5' to 3' direction. The ATG start codon (bp 30) and the AATAAA polyadenylation signal (bp 4583) are in bold letters. The TGA stop codon (bp 4476) is indicated by a (*). The corresponding amino acid sequence is shown below in the single-letter code; (+1) indicates the amino terminal Thr of the mature protein. The bait domain is underlined and the thiol ester site is in bold letters.

[Fig. 4 sequence listing: the 4609 bp PZP cDNA sequence with the deduced amino acid sequence in single-letter code, numbered from residue -25 of the signal peptide; the nucleotide sequence is available under accession number X54380.]

Protein structure of PZP

The cDNA contains an open reading frame coding for the 1482 amino acids of the proPZP protein. A typical signal peptide of 25 amino acid residues is found at the amino terminal end. The calculated molecular weight of the mature, unglycosylated protein is 161 028, which is in good agreement with earlier results, given a reported 10% carbohydrate content [12]. Comparison of the PZP and human α2M amino acid sequences shows an overall high conservation (Fig. 5). 71% of all corresponding amino acid residues are identical in both proteins. This figure increases to 78% when chemically similar residues are included (V = I = L, K = R, T = S, E = D, F = Y, G = A, Q = N). As described for rat α2M and α1I3, the carboxy-terminal half of PZP is also more conserved than the amino-terminal half of the molecule (Fig. 5). PZP contains 26 cysteine residues in a position corresponding exactly to the human and rat α2M cysteines [28-30]. In the mature protein, ten potential N-glycosylation sites were identified, eight of them being conserved in human α2M [28,29]. Taken together, this predicts a very similar tertiary structure for these two macroglobulins.

Fig. 5. Dot blot comparison of PZP and human α2M. Human proPZP (x-axis) and proα2M (y-axis) were compared in a dot blot matrix by means of the Genepro program (window 3, matches 3, ktup 1). The positions of the bait domain, internal thiol ester site (*) and the 20 kDa carboxyl terminal receptor binding domain are indicated.

Near the middle of the protein, a region is found that shows no resemblance to any of the corresponding αM sequences known so far. This domain is present in all α-macroglobulins and is called the 'bait domain', since it contains recognition sites for a wide variety of proteinases [14]. The PZP bait domain is 49 amino acid residues long, which is ten more than the human α2M bait domain [14]. 58 amino acids upstream of the bait domain, a second divergent region is found in PZP and human α2M. Similarly, an unconserved region has been described in rat α2M and α1I3 in the corresponding position [30-32]. Immediately downstream of the bait domain, a long stretch of 108 amino acids is found that is extremely conserved in human α2M and PZP (95% identical residues). The sequence CGEQ (aa 953 to 956) is found in a position identical to human α2M [28,29], rat α2M [30] and α1I3 [31,32], flanked by two regions more conserved than the overall sequence. An internal thiol ester bond is formed between the Cys and Glu residue and constitutes a major feature of the αM plasma protein family.

TABLE II

Comparison with the published partial PZP amino acid sequence

The nineteen amino acid differences between the PZP sequence in Hep3B (column 1) and the published partial amino acid sequence [2] (column 2) are shown. Numbering is according to the sequence in Fig. 4. For comparison, the corresponding human α2M sequence is shown in column 3. (*) indicates chemically related amino acid. Residue 663 is within the bait domain, and residue 624 is in the unconserved region preceding the bait domain. The Asn residues 728 and 907 are potentially glycosylated in PZP. The Ile residue at position 663 is in agreement with the recently published PZP bait domain sequence [14].

Amino acid   PZP Hep3B   PZP [2]   Hu α2M
18           Pro         Val       Thr
20           Lys         Ile       Lys
40           Ser         Glu       Ser
268          Asn         Lys       His
394          Val         Ser       Ser
399          Pro         Val       Val
411          Asp(*)      Glu       Gln
455          Arg(*)      Lys       Gly
460          Glu         Gln       Gly
470          Ile         Thr       Ile
542          Ser         His       Ser
624          Gly         Thr       Gly
663          Ile         Thr       -
728          Asn         Gln       Asn
832          Ser         Asn       Asn
907          Asn         Glu       Glu
918          Ser         Pro       Pro
1180         Pro/Thr     Pro       Pro
1450         Thr(*)      Ser       Lys

When compared to the published partial amino acid sequence based on the analysis of random tryptic peptides of PZP [2], 19 amino acid differences are noted (Table II). The sequence of the Hep3B cDNA shown here, however, was confirmed by sequencing at least three independent clones and no polymerase misincorporations were found in these positions (see below). In addition, the corresponding exon sequence is in agreement with these results, except at amino acid 832, where an Asn residue is found (Marynen et al., unpublished data). These amino acid differences could in fact be amino acid sequence polymorphisms. Indeed, three of them show substitution by a chemically related amino acid. Furthermore, the data seem to support the presence of a Pro/Thr amino acid polymorphism at residue 1180: of four isolates of fragment V obtained from Hep3B mRNA, three showed a Thr at position 1180 and one a Pro. Three additional clones were obtained from mRNA isolated from the HepG2 cell line (not shown), one coding for a Thr and two coding for a Pro, suggesting both cell lines to be heterozygous. Interestingly, the Pro in this position is a highly conserved residue, since it is present in all thiol ester protein sequences known so far, including the complement factors C3 and C4. In addition, at nucleotide 4097, an A/G polymorphism was found in Hep3B and HepG2, both being heterozygous at this locus (15 isolates of fragments V and VI were sequenced, 6 with a G and 9 with an A). This DNA sequence polymorphism is not apparent at the protein level, since both codons, CAA and CAG, code for a Gln residue.
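The relation between the two polymorphic nucleotides and the protein sequence can be checked with a short calculation. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the standard genetic code is assumed, residues are numbered as in Fig. 4 (signal peptide -25 to -1, mature protein from +1), and the third base used for the Pro and Thr codons is an arbitrary placeholder, since only the first base is implied by the text.

```python
# Positions taken from the text (1-based cDNA coordinates); names are illustrative.
ATG_START = 30        # first nucleotide of the ATG start codon
TGA_STOP = 4476       # first nucleotide of the TGA stop codon
SIGNAL_PEPTIDE = 25   # residues numbered -25 to -1

# Standard genetic code, built from the usual TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def locate(nt):
    """Map a coding nucleotide to (residue number, position within codon 1..3)."""
    if not ATG_START <= nt < TGA_STOP:
        raise ValueError("nucleotide lies outside the open reading frame")
    offset = nt - ATG_START
    codon_index = offset // 3  # 0 for the initiator Met
    if codon_index < SIGNAL_PEPTIDE:
        residue = codon_index - SIGNAL_PEPTIDE      # -25 .. -1
    else:
        residue = codon_index - SIGNAL_PEPTIDE + 1  # +1 onwards, no residue 0
    return residue, offset % 3 + 1


def is_synonymous(codon_a, codon_b):
    """True when both codons encode the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


if __name__ == "__main__":
    print(locate(3642), locate(4097))
    print(is_synonymous("CAA", "CAG"), is_synonymous("CCA", "ACA"))
```

Under these assumptions, nucleotide 3642 (the second polymorphic site mentioned under PCR artefacts) is the first base of the codon for residue 1180, where a C/A difference changes Pro (CCN) to Thr (ACN), whereas nucleotide 4097 is a third codon position, where CAA and CAG both encode Gln.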

PCR artefacts

When we aligned the sequences of independent isolates, a total of 36 differences were noted. This is the result of misincorporations in the amplification process. The error frequency in the final cloned amplification products ranged from 1/345 to 1/752 basepairs, depending on the amplification conditions. We observed a majority of transitions A-T to G-C. Almost half of the mutations consisted of a T to C transition, which is in agreement with previous reports [24,33-35]. No deletions or insertions were noted. In the clones V, isolated from Hep3B, three different haplotypes at the two polymorphic sites, nucleotide 3642 and nucleotide 4097, were found. This resulted from a PCR artefact, described as the shuffling of alleles [24]. Incomplete primer extension will yield a shorter DNA fragment that can anneal, in a next cycle, to a fragment representing the other allele.

Discussion

The polymerase chain reaction circumvents the problems associated with the cloning of a rare PZP mRNA in a population of cross-hybridizing α2M mRNA. By means of this highly sensitive amplification reaction, readily clonable amounts of cDNA were obtained, starting from a limited number of cDNA copies. It has been shown that specific amplification of two closely related sequences can be obtained with primers differing only at the 3' end nucleotide [36]. The knowledge of exon sequences allowed us to construct highly specific PZP amplification primers, not cross-hybridizing with α2M cDNA. The specificity of the amplification relied on small (20 to 30 nt) oligonucleotides, hybridizing at a high temperature. Also, reamplification with internal primers was done for certain fragments to increase the specificity of the amplification process. Primers annealing to sequences from different exons were chosen to avoid the amplification of contaminating genomic DNA.

The amplification process also yields artefacts, such as truncated fragments and amplification of contaminating genomic DNA or reverse transcribed, unspliced RNA. These were easily recognized when compared to the known PZP amino acid and nucleotide sequences. In addition to this, a more generally observed problem with PCR is nucleotide misincorporation by the Taq DNA polymerase. Its frequency has been estimated at around 1/10000 to 1/50000, depending on the conditions of the amplification process (such as the concentration of dNTPs and Mg2+, the eventual presence of DMSO, and the temperature) [37]. Since these errors are propagated in the following amplification cycles, mistakes will be present in the final amplification product, the relative number of which will depend on the number of initial template molecules. Unless the initial number of templates is extremely low, or the misincorporation rate is very high, the chance of selecting two random clones containing the same misincorporation is very small. Indeed, if only two template cDNA molecules were present in the original reaction mixture and a misincorporation occurred in the first reaction cycle, only 25% of the reaction products would carry the mutation. In the present experiments, amplification reactions with 10% of the amount of cDNA normally used as described in Materials and Methods also yielded a visible PZP band on EtBr stained gels (not shown). Therefore, at least ten template molecules (and probably more) were used for the amplification reaction reported above. This agrees with the fact that all observed artefacts were different from each other. Thus, the analysis of three independent clones from the same amplification reaction allowed us to establish a consensus sequence.
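The 25% figure in the argument above follows from simple strand counting, which can be sketched as follows. This is an idealized model and not the authors' calculation: it assumes single-stranded cDNA templates, perfect doubling of all strands in every cycle, a single misincorporation on one newly made strand, and clones picked independently; the function names are hypothetical.

```python
def mutant_fraction(n_templates, error_cycle=1):
    """Fraction of final strands carrying one particular misincorporation.

    Assumptions: n_templates single-stranded cDNA molecules, every strand is
    copied once per cycle, and the error arises on one newly made strand in
    cycle error_cycle and is inherited by all of its descendants.
    """
    if n_templates < 1 or error_cycle < 1:
        raise ValueError("n_templates and error_cycle must be at least 1")
    # After error_cycle cycles there are n_templates * 2**error_cycle strands,
    # exactly one of which is mutant; later doubling keeps the ratio constant.
    return 1.0 / (n_templates * 2 ** error_cycle)


def same_error_in_two_clones(n_templates, error_cycle=1):
    """Chance that two independently picked clones share that misincorporation."""
    return mutant_fraction(n_templates, error_cycle) ** 2


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, mutant_fraction(n), same_error_in_two_clones(n))
```

With two templates and an error in the first cycle the model gives the 25% quoted in the text; with ten templates the fraction drops to 5%, so two random clones would share that error in roughly 0.25% of cases, which is why a consensus of three independent clones is a reasonable safeguard.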
The PCR protocol used to clone the 5' end of the PZP cDNA may overcome the problem of identifying the transcription initiation site. The first step in this procedure is a regular primer extension step towards the transcription initiation site. The obtained cDNA, containing full-length cDNA copies, is tailed and subsequently amplified. Apart from the truncated fragments resulting from a PCR artefact, all regular clones, except one, started at the same position. In addition, Southern blot analysis of the PCR product revealed no longer fragments, even after prolonged exposure (Fig. 3). We therefore conclude that the identified 5' end clones probably are full length, starting at the probable transcription initiation site in the Hep3B cell line. This result remains to be confirmed by means of a primer extension experiment and an RNase protection assay, although these will be difficult to perform, given the very low basal PZP mRNA levels.

The PZP protein, as derived from the cDNA sequence, has all the main characteristics of the members of the αM plasma protein family. A signal peptide, typical of secreted proteins, is found at the amino terminal end. The internal thiol ester site (CGEQ) constitutes a unique feature of the αM. An internal β-cysteinyl-γ-glutamyl thiol ester bond is formed posttranslationally between the cysteine residue and the glutamine residue. The mechanism by which this occurs has not been determined. The bait region consists of a stretch of amino acids near the middle of the protein, with a sequence totally different in all the αM sequenced so far [14]. In addition, its length varies considerably, from 32 amino acids in rat α2M to 53 amino acids in rat α1I3 [14], which is significantly more than the occasional 1 to 4 amino acid insertions or deletions found in the rest of the molecule.

After bait cleavage and thiol ester lysis, a conformational change occurs. As a result, a domain in the PZP molecule is exposed, which is recognized by a receptor on hepatocytes, macrophages and fibroblasts and on the placenta [17-20]. In human α2M, this receptor binding domain has been assigned to a 20 kDa carboxyl terminal fragment [38]. Human PZP and α2M are recognized by the same receptors [17-20], and interspecies conservation of the receptor and receptor binding domain is suggested by the receptor mediated endocytosis of the human α-macroglobulins by rat monocytes and hepatocytes. Indeed, the carboxyl terminal end contains stretches of extremely conserved amino acid residues ([1,39], present work). The tertiary structure of these macroglobulins is very similar, as can be predicted from the overall conserved primary sequence, with cysteine residues present in the corresponding positions. The quaternary structure of PZP differs from that of α2M [16,40]. PZP occurs mainly as disulfide-bridged dimers, which associate to form tetramers upon cleavage of the bait region and binding of the proteinase.
The physiological significance of this difference in quaternary structure is not yet understood.

Two polymorphisms in PZP have been identified. A Pro/Thr amino acid polymorphism is present at position 1180. The nucleotide polymorphism at position 4097 is a silent site polymorphism. A Met/Val polymorphism has been identified in the bait region [41], and comparison with the partial amino acid sequence [2] points to additional possible polymorphisms. It may be that amino acid polymorphisms are in fact more frequent than recognized so far. This is due largely to the fact that classical cDNA sequencing usually relies on the analysis of single cDNA clones, and to the difficulty of examining single base pair polymorphisms that do not create an RFLP. The PCR technique overcomes this problem, allowing the rapid and accurate sequence determination of specific DNA fragments. PCR will therefore be very useful in the further detection and investigation of the polymorphic sites in PZP and in other proteins.

The structure of the α-macroglobulins and their mechanism of proteinase inhibition are extremely well conserved through evolution. This close evolutionary relationship is further illustrated by the clustering of the human PZP and α2M genes and an α2M pseudogene on human chromosome 12p. However, two features differ in most of the αM studied so far: the bait region sequence, which defines the inhibitory spectrum of proteinases, and the specific physiological or pathological setting in which the macroglobulins are produced. Thus, the organism is provided with a defense mechanism against a wide spectrum of potentially harmful proteinases in different situations. The identification of the proteinases inhibited in vivo, and the elucidation of the factors inducing PZP production, will lead to further understanding of the function of this protein.

Acknowledgements

P.M. is a "Bevoegdverklaard Navorser" and K.D. an "Aspirant" of the "Nationaal Fonds voor Wetenschappelijk Onderzoek" of Belgium. The excellent technical assistance of Hilde Braeken and Marleen Willems is gratefully acknowledged. We thank Staf Doucet for the synthesis of the oligonucleotides and Karel Rondou for the photography. This investigation was supported by grant 3.0027.88 of the Fonds voor Geneeskundig Wetenschappelijk Onderzoek, Belgium, by the Inter-University Network for Fundamental Research sponsored by the Belgian Government (1987-1991) and by a grant "Geconcerteerde Acties" from the Belgian Government.

References

1 Devriendt, K., Zhang, J., Van Leuven, F., Van den Berghe, H., Cassiman, J.J. and Marynen, P. (1989) Gene 81, 325-334.
2 Sottrup-Jensen, L., Folkersen, J., Kristensen, T. and Tack, B.F. (1984) Proc. Natl. Acad. Sci. USA 81, 7353-7357.
3 Smithies, O. (1959) in Advances in Protein Chemistry, Vol. XIV, pp. 103-107.
4 Von Schoultz, B. (1974) Am. J. Obstet. Gynecol. 15, 792-797.
5 Von Schoultz, B. and Stigbrand, T. (1982) in Pregnancy Proteins, Biology, Chemistry and Clinical Application (Grudzinskas, J.G., Teisner, B. and Seppälä, M., eds.), pp. 167-175.
6 Folkersen, J., Teisner, B., Grunnet, N., Grudzinskas, J.G., Westergaard, J.G. and Hindersson, P. (1981) Clin. Chim. Acta 110, 139-145.
7 Cooper, D.W. (1963) Nature 200, 892.
8 Damber, M.-G., Von Schoultz, B., Solheim, F. and Stigbrand, T. (1975) Am. J. Obstet. Gynecol. 124, 289-292.
9 Stimson, W.H. (1975) J. Clin. Path. 28, 868-871.
10 Than, G.N., Csaba, I.F., Szabó, D.G., Karg, N.J. and Novák, P.F. (1975) Arch. Gynäk. 218, 125-130.
11 Sottrup-Jensen, L. (1987) in The Plasma Proteins. Structure, Function and Genetic Control (Putnam, F.W., ed.), Vol. V, pp. 191-291, Academic Press, Orlando.

12 Von Schoultz, B. and Stigbrand, T. (1974) Biochim. Biophys. Acta 359, 303-310.
13 Stimson, W.H. and Farquharson, D.M. (1978) Int. J. Biochem. 9, 839-843.
14 Sottrup-Jensen, L., Sand, O., Kristensen, L. and Fey, G.H. (1989) J. Biol. Chem. 264, 15781-15789.
15 Sand, O., Folkersen, J., Westergaard, J.G. and Sottrup-Jensen, L. (1985) J. Biol. Chem. 260, 15723-15735.
16 Christensen, U., Simonsen, M., Harrit, N. and Sottrup-Jensen, L. (1989) Biochemistry 28, 9324-9331.
17 Van Leuven, F., Cassiman, J.J. and Van den Berghe, H. (1986) J. Biol. Chem. 261, 16622-16625.
18 Gliemann, J., Moestrup, S., Jensen, P.H., Sottrup-Jensen, L., Andersen, H.B., Petersen, C.M. and Sonne, O. (1986) Biochim. Biophys. Acta 883, 400-406.
19 Moestrup, S.K., Christensen, E.I., Sottrup-Jensen, L. and Gliemann, J. (1987) Biochim. Biophys. Acta 930, 297-303.
20 Jensen, P.H., Moestrup, S.K., Sottrup-Jensen, L., Petersen, C.M. and Gliemann, J. (1988) Placenta 9, 463-477.
21 Van Leuven, F., Cassiman, J.J. and Van den Berghe, H. (1979) J. Biol. Chem. 254, 5155-5160.
22 Knowles, B.B., Howe, C.C. and Aden, D.P. (1980) Science 209, 497-499.
23 Marynen, P., Zhang, J., Cassiman, J.J., Van den Berghe, H. and David, G. (1989) J. Biol. Chem. 264, 7017-7024.
24 Saiki, R.K., Gelfand, D.H., Stoffel, S., Scharf, S.J., Higuchi, R., Horn, G.T., Mullis, K.B. and Erlich, H.A. (1988) Science 239, 487-491.
25 Frohman, M.A., Dush, M.K. and Martin, G.R. (1988) Proc. Natl. Acad. Sci. USA 85, 8998-9002.
26 Maniatis, T., Fritsch, E.F. and Sambrook, J. (1982) in Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York.

27 Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
28 Sottrup-Jensen, L., Stepanik, T.M., Kristensen, T., Wierzbicki, D.M., Jones, C.M., Lønblad, P.B., Magnusson, S. and Petersen, T.E. (1984) J. Biol. Chem. 259, 8318-8327.
29 Kan, C.-C., Solomon, E., Belt, K.T., Chain, A.C., Hiorns, L.R. and Fey, G. (1985) Proc. Natl. Acad. Sci. USA 82, 2282-2286.
30 Gehring, M.R., Shiels, B.R., Northemann, W., De Bruijn, M.H.L., Kan, C.-C., Chain, A.C., Noonan, D.J. and Fey, G.H. (1987) J. Biol. Chem. 262, 446-454.
31 Braciak, T.A., Northemann, W., Hudson, G.O., Shiels, B.R., Gehring, M.R. and Fey, G.H. (1988) J. Biol. Chem. 263, 3999-4012.
32 Aiello, L.P., Shia, M.A., Robinson, G.S., Pilch, P.F. and Farmer, S.R. (1988) J. Biol. Chem. 263, 4013-4022.
33 Tindall, K.R. and Kunkel, T.A. (1988) Biochemistry 27, 6008-6013.
34 Dunning, A.M., Talmud, P. and Humphries, S.E. (1988) Nucleic Acids Res. 16, 10393.
35 Keohavong, P. and Thilly, W.G. (1989) Proc. Natl. Acad. Sci. USA 86, 9253-9257.
36 Newton, C.R., Graham, A., Heptinstall, L.E., Powell, S.J., Summers, C., Kalsheker, N., Smith, J.C. and Markham, A.F. (1989) Nucleic Acids Res. 17, 2503-2516.
37 Gibbs, R.A. and Chamberlain, J.S. (1989) Genes Dev. 3, 1095-1098.
38 Van Leuven, F., Marynen, P., Sottrup-Jensen, L., Cassiman, J.J. and Van den Berghe, H. (1986) J. Biol. Chem. 261, 11369-11373.
39 Enghild, J.J., Thøgersen, I.B., Roche, P.A. and Pizzo, S.V. (1989) Biochemistry 28, 1406-1412.
40 Sottrup-Jensen, L. (1989) J. Biol. Chem. 264, 11539-11542.
41 Marynen, P., Devriendt, K., Van den Berghe, H. and Cassiman, J.J. (1990) FEBS Lett. 262, 349-352.
