DNA AND CELL BIOLOGY Volume 10, Number 5, 1991 Mary Ann Liebert, Inc., Publishers Pp. 319-328

cDNA and Gene Structure for a Human Subtilisin-Like Protease with Cleavage Specificity for Paired Basic Amino Acid Residues

PHILIP J. BARR, OWEN B. MASON, KATHERINE E. LANDSBERG, POLLY A. WONG, MICHAEL C. KIEFER, and ANTHONY J. BRAKE

ABSTRACT

A cDNA encoding the human fur gene product was isolated from a human hepatoma cell line. The cDNA encodes a protein with significant amino acid sequence identity to the prokaryotic subtilisin family of serine proteases. More extensive sequence identity was found when the protein was compared with eukaryotic proteases such as PRB1 of Saccharomyces cerevisiae, and with PC2 and PC3, the only other known mammalian subtilisin-like proteases. In contrast to these proteins, however, the fur gene product shares a more extensive topographic and functional homology with the KEX2 endoprotease of S. cerevisiae. Each protease contains a signal peptide, a glycosylated extracytoplasmic domain, a hydrophobic membrane-spanning region, and a short, hydrophilic "tail" sequence. As with KEX2, the expressed human protease was shown to cleave mammalian proproteins at their paired basic amino acid processing sites. We have, therefore, proposed the function-based acronym PACE (paired basic amino acid cleaving enzyme) for this prototypic mammalian proprotein processing enzyme.

Chiron Corporation, Emeryville, CA 94608.

INTRODUCTION

Propeptide cleavage at paired basic amino acid residues is an essential process in the maturation and activation of many secreted mammalian proteins. Such propeptides function in a variety of roles as the protein precursors move through the secretory apparatus. These roles include the correct folding and disulfide bond formation in the maturation of such growth and metabolic factors as nerve growth factor (NGF) (Edwards et al, 1988), transforming growth factor-β (TGF-β) (Derynck et al, 1985), platelet-derived growth factor (PDGF) (Antoniades et al, 1987), bone morphogenetic proteins (BMPs) (Wozney et al, 1988) and insulin (Steiner, 1982), and coagulation factors such as von Willebrand factor (vWF) (Wise et al, 1988). More extensive modifications, such as the γ-carboxylation of glutamic acid residues in the vitamin K-dependent bone proteins bone Gla protein/osteocalcin (Kiefer et al, 1990) and bone matrix Gla protein (Pan and Price, 1985), are also linked to the processing of the corresponding precursors at paired basic amino acid residues. This is also the case for vitamin K-dependent plasma proteins such as factor IX, protein C, and thrombin (Furie and Furie, 1988). Other important roles for such an enzyme activity include the regulation of synthesis of multiple peptides from a single precursor, typified by pro-opiomelanocortin (POMC), the precursor to seven neuroendocrine hormones that include adrenocorticotrophic hormone (ACTH), lipotropins, and β-endorphin (Douglass et al, 1984; Thomas et al, 1988a), directing intracellular targeting of, for example, pro-somatostatin (Sevarino et al, 1989), and maturation of serum precursors such as proalbumin (Brennan and Peach, 1988).

Despite the profound significance of this proteolytic process, the enzyme or enzymes responsible for this processing have not been characterized at the molecular level. Recently, however, a clue to the identity of such a mammalian processing enzyme was derived from studies with the KEX2 endoprotease of Saccharomyces cerevisiae. KEX2, a membrane-bound Ca2+-dependent serine protease of the subtilisin family, has been considered to be a prototypic proprotein convertase. The KEX2 endoprotease


functions late in the secretory pathway of S. cerevisiae, and cleaves the polypeptide chains of prepro-killer toxin and prepro-α-factor at the paired basic amino acid sequences Lys-Arg and Arg-Arg (Julius et al, 1984). In addition to these natural functions, the KEX2 protease also accurately processes proinsulin and proalbumin expressed in yeast (Thim et al, 1986; Sleep et al, 1990). Also, KEX2 that was overexpressed in yeast, and partially purified, was capable of efficient and accurate cleavage of the propeptide of proalbumin in vitro (Bathurst et al, 1987). Furthermore, this activity was inhibited by α1-antitrypsin Pittsburgh, a naturally occurring variant of the liver-derived protease inhibitor that inhibits proalbumin conversion in vivo (Owen et al, 1983). Last, when the KEX2 gene was transfected into murine cells that cannot process POMC, the expressed protease was capable of processing the prohormone precursor to a set of product peptides normally found in vivo (Thomas et al, 1988b). More recently, two cDNAs that encode novel mammalian subtilisin-related proteins, designated PC2 and PC3, were cloned and fully characterized and their encoded products were implicated in the endoproteolytic processing of prohormones based on their homology to KEX2 (Seidah et al, 1990; Smeekens and Steiner, 1990; Smeekens et al, 1991). To date, however, no functional activities have been reported for either putative protease.

Full structural characterization of the KEX2 protease (Fuller et al, 1989) has led to the detection of significant sequence identity to furin, the product of the human fur gene (Roebroek et al, 1986a,b). The fur gene was first identified by virtue of its proximity to the human fes/fps proto-oncogene. Sequence analysis of a partial cDNA from the fur gene transcript revealed a translation product that contains a transmembrane domain and a cysteine-rich region with significant topographic homology to the insulin and epidermal growth factor (EGF) receptors. However, truncation of the cDNA did not allow the detection of any relationship of the fur gene product to the subtilisin family of serine proteases. Subsequent cDNA cloning confirmed the overall homology of the fur gene product to KEX2 (Fuller et al, 1989; van den Ouweland et al, 1989). Here we report the structural characterization of a fur cDNA isolated from a human liver cell line, the homology of the encoded protein product to eukaryotic and prokaryotic subtilisins, and the intron/exon organization of the fur gene deduced from the sequence of a human genomic clone. Since the cDNA, when cotransfected into mammalian cells with cDNAs for human prepro-vWF (Wise et al, 1990) and murine prepro-NGF (Bresnahan et al, 1990) directs expression of a product that is capable of cleaving these precursors at their paired basic amino acid processing sites, we have proposed the function-based acronym PACE (paired basic amino acid cleaving enzyme) for this prototypic mammalian proprotein convertase (Barr et al, 1990; Wise et al, 1990).

MATERIALS AND METHODS

Molecular cloning of PACE cDNAs and gene

DNA manipulations and RNA preparations were essentially as described (Sambrook et al, 1989). Restriction enzymes, T4 DNA ligase, and other enzymes were purchased from NEB, BRL, and Boehringer Mannheim. Synthetic oligonucleotides were synthesized by the phosphoramidite method using Applied Biosystems 380 DNA synthesizers. The molecular cloning of a PACE cDNA of 4.35 kb from HEPG2 poly(A)+ mRNA has been described previously (Wise et al, 1990).

Genomic DNA cloning

To obtain the human PACE gene, a human genomic library cloned in the cosmid pWE15 (Stratagene) was screened using overlapping PACE cDNA clones as probes. An approximately 44-kb clone that contained the entire c-fur and c-fes/fps genes was obtained. For sequencing of the fur gene exons, we initially subcloned a 13-kb Not I-Eco RI fragment into the modified plasmid cloning vector pUC19 Not I. This vector is a derivative of pUC19 that was modified by the insertion of an in-frame polylinker that contains Eco RI, Sac I, Kpn I, Pst I, Nco I, Not I, Xma III, Bam HI, Sph I, Sal I, Eco RV, Xba I, and Hind III restriction sites.

DNA sequencing strategy

The 4.35-kb composite PACE cDNA was excised with Eco RI and subcloned into M13mp18 and M13mp19 phage vectors; the single-stranded viral DNA was used as template for sequencing by the dideoxynucleotide chain-termination method (Sanger et al, 1977). Synthetic primers (18mers) were used to sequence the cDNA in both directions. For confirmation of the first exons of the PACE gene, we excised fragments from the cloned 13-kb Not I-Eco RI fragment of the fur gene described above. Overlapping fragments that encompassed an ~9-kb region from the Not I site shown (see Fig. 3) to an Xho I site in the 3' region of the fur coding sequence (Roebroek et al, 1986a) were subcloned into M13 vectors as above and then sequenced in each direction using specific oligonucleotide primers.

Northern blot analysis

Poly(A)+ RNA was fractionated on a 1.4% agarose gel in the presence of formaldehyde and directly transferred to nitrocellulose. Filters were prehybridized for 1-2 h at 37°C in 40% formamide, 5x SSC (1x SSC = 0.15 M sodium chloride/0.015 M sodium citrate pH 7), 5x Denhardt's solution (1x Denhardt's solution = 0.02% polyvinylpyrrolidone/0.02% Ficoll/0.02% bovine serum albumin), 10% dextran sulfate, 50 mM sodium phosphate pH 6.8, 1 mM sodium pyrophosphate, 0.1% NaDodSO4, and 50 µg/ml denatured salmon sperm DNA. A truncated PACE cDNA probe (coding region only) was 32P-labeled (Feinberg and Vogelstein, 1984), added to the hybridization mix at a concentration of 10^6 cpm/ml, and allowed to hybridize overnight at 37°C. The filters were washed twice at 65°C in 0.1x SSC and 0.1% NaDodSO4. The truncated cDNA was generated by a polymerase chain reaction (PCR) (Saiki et al, 1985) using synthetic primers at the 5' end of the coding sequence and approximately 70 bp into the 3' untranslated region. The 3' primer generated a Sal I cloning site. The 5' primer generated an Eco RI site for cloning into Bluescript SK (Stratagene). All products of PCR reactions were verified by M13 dideoxy sequencing.

Southern blot analysis

For genomic blots, 10 µg of genomic DNA was digested with Eco RI, fractionated on a 0.7% agarose gel, and transferred to nitrocellulose. Hybridization and washing conditions were identical to those described for Northern blot analysis. For clone blots, DNA from genomic clones was digested with various restriction endonucleases, fractionated on 1% agarose gels, transferred to nitrocellulose, and hybridized as above except that formamide was omitted from the hybridization solution. The filters were washed at 60-65°C in 2x SSC and 0.1% NaDodSO4. Mammalian DNAs were obtained from Clontech. Xenopus and Drosophila DNAs were kindly provided by D. Julius and U. Heberlein, respectively.
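The salt concentrations implied by the SSC strengths used in the hybridization and wash steps follow directly from the 1x definition given above (0.15 M sodium chloride/0.015 M sodium citrate). The short Python sketch below works through that arithmetic; the function and variable names are hypothetical and are not part of the original protocol.

```python
# 1x SSC as defined in the Methods: 0.15 M sodium chloride / 0.015 M sodium citrate.
SSC_1X_MOLAR = {"sodium chloride": 0.15, "sodium citrate": 0.015}


def ssc_molarity(fold):
    """Return the molar concentration of each SSC component at `fold` x strength."""
    return {salt: fold * molar for salt, molar in SSC_1X_MOLAR.items()}


# SSC strengths named in the Methods (labels are illustrative only).
conditions = [
    ("hybridization mix, 5x SSC", 5),
    ("Northern wash, 0.1x SSC", 0.1),
    ("clone-blot wash, 2x SSC", 2),
]

for label, fold in conditions:
    rounded = {salt: round(m, 4) for salt, m in ssc_molarity(fold).items()}
    print(label, rounded)
```

Lower salt at a given temperature corresponds to a more stringent wash, which is why the 0.1x SSC wash is the high-stringency condition and the 2x SSC wash the low-stringency one.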

RESULTS

Molecular cloning of PACE cDNAs

We constructed an oriented cDNA library in the yeast expression vector pAB23BXN (Schild et al, 1990) using poly(A)+ mRNA from the human liver cell line HEPG2. Oligonucleotide probes, based on the fur cDNA sequence (Roebroek et al, 1986a,b), were used to isolate a 3,259-bp clone from the library. For isolation of the 5' end of the PACE cDNA, we constructed a second cDNA library from HEPG2 poly(A)+ mRNA in bacteriophage λZAPII using a specific internally primed message. Using the longest clone isolated from this library, we were able to construct a composite cDNA for PACE of 4,351 bp, containing 388 bp of 5' untranslated region, a proposed coding sequence corresponding to 794 amino acids, and 1,597 bp of 3' untranslated region, including two termination codons and a tail of 17 dA residues. The full sequence of the composite PACE cDNA and the encoded protein sequence are shown in Fig. 1.

Structure of PACE

The translation of PACE is probably initiated at the ATG start codon at nucleotides 389-391. Although there are four ATG codons upstream from nucleotide 389, the ATG at 389-391 is the only in-frame Met codon in the 5' region of the cDNA, and the subsequent 26 amino acids constitute a classical hydrophobic signal sequence (von Heijne, 1986) that would be predicted for a membrane-bound protein. The large open reading frame encodes a PACE precursor protein with a calculated molecular weight of 86.7 kD. A predicted signal peptidase cleavage site (von Heijne, 1986) occurs between amino acids 26-27. In addition, several paired basic amino acid residues are located in the amino-terminal region of the PACE precursor (Fig. 1), and could represent autolytic processing sites, as is seen for the subtilisin and protease B1 (PRB1) precursors (Moehle et al, 1987). The amino terminus of mature PACE is currently unknown. The appearance of multiple immunoprecipitated PACE proteins from transfected cells with gel mobilities in the 75-kD range (Wise et al, 1990) may indicate proteolytic processing of the expressed product. The sequence contains three consensus sites for N-linked glycosylation and 22 cysteine residues. The 3' untranslated region is relatively long (1,597 bp) and contains a possible polyadenylation signal (ATTAAA) at nucleotides 3,939-3,943 of our composite clone. There are numerous regions of extensive potential secondary structure involving coding sequences, and the 3' untranslated sequences around the termination codon. For example, in the 60 bases starting at the AUG initiation codon of the RNA, a hairpin structure exists with a ΔG value of -29.6 kcal/mole. In other systems, RNA structures of this type and location severely impair translation (see, for example, Clements et al, 1989). The effect of these multiple hairpin structures found throughout the mRNA on the efficiency of translation of PACE is currently unknown.

PACE belongs to a subclass of the family of subtilisin-like proteases

Comparison of the amino acid sequences of PACE and KEX2, as well as those of the PC2- and PC3-encoded proteins reveals a striking similarity (Fig. 2). This is particularly evident in a region of approximately 250 residues (50-60% identity) that includes a putative catalytic domain homologous to the family of subtilisin-related serine proteases (Fig. 1). Prior to the molecular characterization of the KEX2 gene from S. cerevisiae and the homologous KEX1 gene from Kluyveromyces lactis (Mizuno et al, 1988), this family was thought to consist only of nonspecific proteases from bacteria (Bacillus subtilisins and Thermoactinomycetes vulgaris thermitase) and from fungi (S. cerevisiae vacuolar protease B1, Tritirachium album proteinase K, and Yarrowia lipolytica alkaline extracellular protease K) [for compilation, see Moehle et al, 1987]. PACE, KEX2, PC2, and PC3 exhibit considerably more sequence similarity to one another than to other subtilisin-related proteases, sharing a number of identical residues that distinguish them from other members of this family and apparently forming a distinct branch. These residues may confer a more narrow substrate specificity upon these enzymes. In addition to 5 invariant Cys residues, stretches of especially high similarity are clustered around regions that are aligned with residues of subtilisin that are thought to be involved in catalysis. Asp-153, His-194, and Ser-368 correspond to residues in subtilisin that constitute the "charge relay" system during catalysis (Kraut, 1977), and are invariant among all members of this protease family. Several amino acids near the catalytic residues are identical in these three proteins but different in all other members of the subtilisin family. Notable examples are Arg-197 (His in other family members) and Asp-154 (Ser or Thr in others). The significant topographic homology of KEX2 and PACE, and alignment of their catalytic residues is shown (Fig. 2). Also shown are schematic representations of the structures of PC2 (Seidah et al, 1990; Smeekens and Steiner, 1990), PC3 (Smeekens et al, 1991), the yeast vacuolar

[FIG. 1 sequence listing: nucleotide sequence of the composite PACE cDNA with the deduced amino acid sequence.]

FIG. 1. PACE cDNA and protein sequences. Numbering is based on the single open reading frame translated from the DNA sequence. Oligonucleotide adaptor sequences present in the cDNA are indicated by lower-case italics. The putative signal peptide is indicated by underlining and the transmembrane domain by shading. Likely active site residues are indicated by asterisks. Consensus sites for Asn-linked glycosylation are marked by diamonds and cysteine residues are marked by bars. Potential dibasic auto-proteolytic processing sites are indicated by arrows above the protein sequence. The cDNA coding sequence is identical to that described (van den Ouweland et al, 1990), except for the 5' region up to nucleotide 238 (see text for discussion). Intron/exon splice junctions within the coding sequence are arrowed beneath the DNA sequence.
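As an illustration of how the dibasic sites marked in Fig. 1 can be located, the following Python sketch translates nucleotides 301-360 of the coding sequence (copied from Fig. 1) and lists every adjacent pair of Lys/Arg residues. The helper names are hypothetical, the standard genetic code is assumed, and treating every Lys/Arg pair as a candidate site is a simplification for illustration; it is not the procedure used to annotate the figure.

```python
from itertools import product

# Standard genetic code in TCAG order (assumption: nuclear code, no exceptions).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Codons 101-120 (nucleotides 301-360 of the coding sequence), copied from Fig. 1.
CODONS = (
    "GCA AAG CGA CGG ACT AAA CGG GAC GTG TAC "
    "CAG GAG CCC ACA GAC CCC AAG TTT CCT CAG"
).split()
FIRST_RESIDUE = 101  # codon 101 begins at nucleotide 301


def translate(codons):
    """Translate a list of DNA codons into a one-letter protein string."""
    return "".join(CODON_TABLE[codon] for codon in codons)


def paired_basic_sites(protein, first_residue=1):
    """Return (residue number of first member, dipeptide) for adjacent Lys/Arg pairs."""
    basic = {"K", "R"}
    return [
        (first_residue + i, protein[i:i + 2])
        for i in range(len(protein) - 1)
        if protein[i] in basic and protein[i + 1] in basic
    ]


peptide = translate(CODONS)
print(peptide)                                     # AKRRTKRDVYQEPTDPKFPQ
print(paired_basic_sites(peptide, FIRST_RESIDUE))  # [(102, 'KR'), (103, 'RR'), (106, 'KR')]
```

Overlapping pairs are reported separately, so the Lys-Arg-Arg run at residues 102-104 yields two entries.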

FIG. 2. Organization comparison of PACE and homologous proteases. The sequence organization of PACE and five homologous proteins is shown schematically. Active-site Asp, His, Asn(Asp), and Ser residues are shown. Potential glycosylation sites are indicated by diamonds. Putative hydrophobic signal peptides (Sig) and transmembrane domains (TMD) are indicated by solid boxes. Cysteine-rich (CRR) and serine/threonine-rich (S/TRR) regions are cross-hatched boxes. Pro-regions of the protease B and subtilisin precursors are shaded boxes.

subtilisin-like protease PRB1 (Moehle et al, 1987), and the prokaryotic subtilisin BPN' (Wells et al, 1983). Both PACE and KEX2 retain an Asn residue (Asn-295 in PACE) that is present in all other members of this family except human PC2, where an Asp residue is present at this position (Smeekens and Steiner, 1990). On the basis of site-directed mutagenesis studies, this Asn residue appears to be important for stabilization of the transition state during catalysis by subtilisin (Wells et al, 1986). The observation (Seidah et al, 1990) that murine PC2 also contains an Asp residue at this position would seem to rule out the possibility that this is a cloning artefact.

Significant similarity extends beyond the subtilisin-like regions among these three sequences. In addition, both PACE and KEX2 contain potential hydrophobic transmembrane (TM) domains distal to the homologous regions. This may be relevant to the membrane association seen for the KEX2 proalbumin and proinsulin processing activities present in rat liver and insulinoma (Brennan and Peach, 1988; Rhodes et al, 1989). Between the homologous regions and putative TM domains, PACE contains a cysteine-rich region, whereas KEX2 possesses a region rich in serine and threonine (Fig. 2). Both of these features have been found in other membrane-bound proteins (Yamamoto et al, 1984). PC2 and PC3 lack either type of region, and also appear to lack a lipid bilayer-spanning domain. More recently, it has been proposed that predicted amphipathic helical regions in the carboxyterminal domains of PC2 and PC3 may function as membrane anchors (Smeekens et al, 1991). This type of membrane interaction has been noted for carboxypeptidase E (Fricker et al, 1990).

Gene structure

The intron/exon organization of the fur gene, and its proximity to the c-fes/fps proto-oncogene is shown schematically (Fig. 3). Our cDNA, from a human liver cell line, contains sequences from an additional 5' untranslated exon when aligned with the previously published cDNA (van den Ouweland et al, 1990) and gene (van den Ouweland et al, 1989) sequences. We refer to this exon as 1A (Fig. 3). Although the source of the previously described cDNA was not reported, it is possible that this additional exon is the result of transcription from an alternate promoter downstream from the exon 1 promoter followed by an alternate splicing event, or that it is due to alternate splicing following transcription from a single promoter upstream from exon 1. Preliminary Northern blot analysis using oligonucleotides specific for exon 1 and exon 1A as probes suggests that several cell types including liver, kidney, and lymphocytes primarily contain the exon 1A sequences, whereas monocytes contain both exon 1 and 1A sequences (data not shown). Thus, there may be some tissue specificity in the use of alternative PACE promoters or splice sites. Between our first two noncoding exons (exons 1 and 1A) and the proportionately large 3' exon (exon 16), the 14 exons that encode the putative catalytic domain and glycosylated regions are relatively small (encoding 25-70 amino acids) and are separated by short introns. Interestingly, several important structural features of the PACE protease are encoded by their own discrete exons. For example, the proposed signal peptide (exon 2), and the four putative active site amino acid residues Asp-153 (exon 5), His-194 (exon 7), Asn-295 (exon 9), and Ser-368 (exon 10) are found on separate exons. Also, the cysteine-rich region, along with the TM domain and hydrophilic tail sequences that are characteristic of membrane-bound proteins, such as the homologous receptors for insulin and EGF (Roebroek et al, 1986b), are all encoded by the 3' exon (exon 16). The intron/exon splice junctions are highlighted in Fig. 4.

The PACE gene is single-copy, and is expressed in many tissues

The diversity of tissue sources and protein precursor substrates for proteases with specificity for paired basic amino acid residues has suggested that these enzymes might be encoded by a multigene family. To address this

FIG. 3. Schematic representation of the intron/exon organization of the contiguous c-fur and c-fes/fps genes. Coding regions are shown in black. The expanded diagram shows the structure of the PACE-encoding c-fur gene showing the 17 exons (1, 1A, and 2-16). Catalytic site residues are located on exon 5 (Asp-153), exon 7 (His-194), exon 9 (Asn-295), and exon 10 (Ser-368). Precise locations of known intron/exon boundaries are shown in Fig. 1. The 5' terminus of exon 1A is currently unknown, and corresponds, as drawn here, to the 5' end of our cDNA clone. Intron/exon splice junctions are shown in Fig. 4.
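The exon assignments of the catalytic residues can be cross-checked against the exon sizes listed in Fig. 4. The Python sketch below is a minimal illustration under one stated assumption: exon 2 is taken to end after codon 59 (the Gln/Ile junction listed in Fig. 4), so that 177 of its 336 bp are coding and the remainder is 5' untranslated. The function and constant names are hypothetical.

```python
# Sizes (bp) of coding exons 2-15, as listed in Fig. 4.
EXON_SIZES = {2: 336, 3: 99, 4: 96, 5: 129, 6: 77, 7: 89, 8: 173,
              9: 213, 10: 101, 11: 104, 12: 118, 13: 180, 14: 125, 15: 51}

# Assumption: exon 2 ends after codon 59 (Gln/Ile junction), so only
# 59 * 3 = 177 nucleotides of exon 2 belong to the coding sequence.
EXON2_CODING_NT = 59 * 3


def exon_for_residue(residue):
    """Return the exon that contains the first nucleotide of the codon for `residue`."""
    first_nt = 3 * (residue - 1) + 1  # coding-sequence position, A of ATG = 1
    end = 0
    for exon in sorted(EXON_SIZES):
        end += EXON2_CODING_NT if exon == 2 else EXON_SIZES[exon]
        if first_nt <= end:
            return exon
    return 16  # everything downstream of exon 15 lies in the large 3' exon


for name, residue in [("Asp", 153), ("His", 194), ("Asn", 295), ("Ser", 368)]:
    print(f"{name}-{residue}: exon {exon_for_residue(residue)}")
```

Under this assumption the sketch returns exons 5, 7, 9, and 10 for the four residues, in agreement with the assignments stated in the legend.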

Exon     Exon size   5' Splice donor    3' Splice acceptor   Intron size   Amino acid
number   (bp)                                                (bp)          interrupted
1        119         TGG TGA Agtatgg    nd                   nd
1A       nd          CCC CAG Ggtaagt    tctcagGGT CGG C      3959
2        336         G GGC CAGgtaagt    ttgcagATC TTC G      338           Q/I
3        99          G CCT CAAgtgagt    ccacagGTA CAG C      109           Q/V
4        96          G TAC CTGgtacgt    ccacagTCT GGT G      270           L/S
5        129         C AAT TATgtatgg    tcttagGAT CCT G      110           Y/D
6        77          AC AAC AGgtaaga    tggcagG CAC GGC      179           R
7        89          ATT GGA Ggtgagt    ggccagGG GTG CG      379           G
8        173         T AGC CAGgtgagg    ccacagGGC CGA G      452           Q/G
9        213         G CAG ATCgtgagt    tggcagGTG ACG A      467           I/V
10       101         AG GCC AAgtaagt    ttgcagT AAG AAC      125           N
11       104         CGG AAA Ggtgagg    ccacagTG AG CCA      92            V
12       118         AG CCC AAgtgagg    gcacagA GAC ATC      109           K
13       180         CA GCC AGgtgctt    ccccagG CCA GAT      416           R
14       125         AAC TAT Ggtactg    gaacagGG ACG CT      115           G
15       51          TGT GTG Ggtcagt    cggcagTG TGC GA      245           V
16       2167

FIG. 4. Intron/exon organization of the c-fur gene. Exon sequences are in capital letters; intron sequences are in lowercase letters. Amino acids interrupted by intron/exon junctions are shown (see also Fig. 1) for all the coding exons (2-16), as are the intron sizes.
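The amino acids listed as interrupted in Fig. 4 can be re-derived from the junction sequences themselves. The following Python sketch is an illustration only and is not part of the original analysis. It assumes that uppercase letters are exon and lowercase letters intron (as stated in the legend), that the spacing within the exon sequence marks codon boundaries, and that the standard genetic code applies. The function names (`split_case`, `interrupted`, `annotate_junctions`) are hypothetical, and the junction strings are transcribed by hand from Fig. 4 for the donors of coding exons 2-15 paired with the acceptor of the following exon. The sketch also checks that every listed intron begins with gt and ends with ag.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# (exon, 5' splice donor, 3' splice acceptor of the next exon),
# transcribed from Fig. 4; the pairing of columns is an assumption.
JUNCTIONS = [
    ("2", "G GGC CAGgtaagt", "ttgcagATC TTC G"),
    ("3", "G CCT CAAgtgagt", "ccacagGTA CAG C"),
    ("4", "G TAC CTGgtacgt", "ccacagTCT GGT G"),
    ("5", "C AAT TATgtatgg", "tcttagGAT CCT G"),
    ("6", "AC AAC AGgtaaga", "tggcagG CAC GGC"),
    ("7", "ATT GGA Ggtgagt", "ggccagGG GTG CG"),
    ("8", "T AGC CAGgtgagg", "ccacagGGC CGA G"),
    ("9", "G CAG ATCgtgagt", "tggcagGTG ACG A"),
    ("10", "AG GCC AAgtaagt", "ttgcagT AAG AAC"),
    ("11", "CGG AAA Ggtgagg", "ccacagTG AG CCA"),
    ("12", "AG CCC AAgtgagg", "gcacagA GAC ATC"),
    ("13", "CA GCC AGgtgctt", "ccccagG CCA GAT"),
    ("14", "AAC TAT Ggtactg", "gaacagGG ACG CT"),
    ("15", "TGT GTG Ggtcagt", "cggcagTG TGC GA"),
]


def split_case(seq):
    """Split a junction string into exon tokens (uppercase) and intron (lowercase)."""
    exon = "".join(ch for ch in seq if ch.isupper() or ch == " ")
    intron = "".join(ch for ch in seq if ch.islower())
    return exon.split(), intron


def interrupted(donor, acceptor):
    """Return the amino acid(s) at the junction, in the notation of Fig. 4."""
    donor_tokens, donor_intron = split_case(donor)
    acceptor_tokens, acceptor_intron = split_case(acceptor)
    if not (donor_intron.startswith("gt") and acceptor_intron.endswith("ag")):
        raise ValueError("intron does not begin with gt and end with ag")
    last, first = donor_tokens[-1], acceptor_tokens[0]
    if len(last) == 3 and len(first) == 3:
        # Intron lies between two complete codons.
        return CODON_TABLE[last] + "/" + CODON_TABLE[first]
    if len(last) + len(first) == 3:
        # Intron splits a single codon; rejoin its two parts.
        return CODON_TABLE[last + first]
    raise ValueError("inconsistent reading frame across the junction")


def annotate_junctions(junctions=JUNCTIONS):
    """Return (exon, interrupted amino acid) for each junction."""
    return [(exon, interrupted(d, a)) for exon, d, a in junctions]


if __name__ == "__main__":
    for exon, amino_acids in annotate_junctions():
        print(exon, amino_acids)
```

For example, the donor of exon 6 ends with the partial codon AG and the following exon begins with G, giving AGG (Arg), which matches the R listed for that junction in Fig. 4.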


issue, we performed Southern blot analysis on Eco RI-digested human DNA, and a variety of other eukaryotic DNAs, using labeled PACE cDNA probes (Fig. 5A). Banding patterns revealed the presence of only one PACE gene in each mammalian genome. Of the three nonmammalian genomes analyzed, one (chicken) hybridized to the PACE probe, while two (Xenopus and Drosophila) displayed no hybridizing bands under the high-stringency conditions used (Fig. 5A, legend). Low-stringency washing of the Southern blots (55-60°C, 2× SSC, 0.1% NaDodSO4) revealed several additional weakly hybridizing bands that could be detected beneath an extremely high, nonspecific background (data not shown). These weakly hybridizing bands may represent additional PACE-related genes; however, due to the high background level the results remain inconclusive.

Similarly, Northern blotting with poly(A)+ mRNA from human liver and kidney and from hamster pancreatic tissue sources revealed a single transcribed product of approximately 4.5 kb in all tissues tested (Fig. 5B). Furthermore, the PACE mRNA transcripts were present in the human lymphoblastoid cell line UC-729-6 and the monocyte cell line U937. This is consistent with the widespread tissue distribution observed in Northern blot analysis of fur (Schalken et al., 1987).

FIG. 5. Southern and Northern blot analysis of PACE. Left. Southern blot of genomic DNA. Eco RI-digested genomic DNA (15 µg) from man (lane 1), rhesus monkey (lane 2), rat (lane 3), mouse (lane 4), cattle (lane 5), dog (lane 6), pig (lane 7), chicken (lane 8), guinea pig (lane 9), Xenopus (lane 10), and Drosophila (lane 11). Molecular weight markers were derived from Hind III-digested λ DNA and Hae III-digested φX174 DNA (BRL). The markers were run on an adjacent lane and identified by ethidium bromide staining of the gel prior to transfer to nitrocellulose. Digestion of human genomic DNA with Hind III and Bam HI and subsequent Southern blotting revealed only the bands expected from the genomic DNA sequence, thereby confirming that fur is a single-copy gene (data not shown). Right. Northern blot analysis. Poly(A)+ RNAs (2 µg) from human liver (lane 1, Clontech), human embryonic liver (lane 2), human hepatoma cell line HEPG2 (lane 3), human embryonic hepatoma cell line WRL (lane 4), human embryonic kidney cell line 293 (lane 5), and hamster insulinoma HIT-15 (lane 6) were separated by electrophoresis on a 1.4% agarose-formaldehyde gel, transferred to nitrocellulose, and hybridized with the PACE probe. 28S and 18S RNAs were identified by staining with methylene blue.

DISCUSSION

That the mammalian genome encodes subtilisin-like proteases is a somewhat surprising finding. No mammalian subtilisin-like protease had been identified previously by enzymatic activity or protein characterization. Indeed, the notion that such proteases were restricted to prokaryotes had led to their paradigmatic role as a model for convergent evolution. Here, the alignment of catalytic triad residues (Asp, His, and Ser) was proposed to have evolved by convergence towards an identical function, the charge relay system, as, for example, the His, Asp, and Ser residues of the trypsin family of serine proteases (Kraut, 1977). The availability now of subtilisin-like protease genes that contain introns will allow a more rigorous analysis of the mechanisms underlying the evolutionary development of the charge relay system. This will include the localization of specific functional domains to discrete exons, and the roles, if any, of intron/exon junction sliding and exon shuffling (Craik et al., 1983; Gilbert, 1985) in the "convergence" toward mechanistically similar enzymes.

In separate studies (Bresnahan et al., 1990; Wise et al., 1990), we have begun to define the substrate specificity for PACE. Based on previous studies with KEX2, it seemed highly likely that PACE represented the long-sought mammalian proprotein convertase with cleavage specificity for paired basic amino acid residues. Our finding that PACE can cleave the precursors to vWF (Wise et al., 1990) and NGF (Bresnahan et al., 1990) at their natural Lys-Arg processing sites suggests a crucial role for PACE in the numerous and extremely important biochemical, biological, and biomedical processes involving proprotein cleavage at paired amino acid residues. The nature of the observed activities together with the widespread tissue distribution of PACE expression (Schalken et al., 1987, and present study) suggests strongly that PACE is an integral enzyme in the constitutive secretory pathway in most, if not all, mammalian cell types. Possible roles for PACE, and the other newly discovered KEX2-related enzymes PC2 and PC3, in the processing of peptide precursors stored in the secretory granules on the regulated secretory pathway (Kelly, 1985) remain to be determined.

ACKNOWLEDGMENTS

We acknowledge our many friends and collaborators for their encouragement and advice, and in particular, we thank Drs. W.J. Rutter, P. Valenzuela, R.S. Fuller, and J. Thorner. We also thank P.T. Anderson for preparation of the manuscript, and Dr. S.P. Smeekens for communication of results prior to publication.

REFERENCES

ANTONIADES, H.N., PANTAZIS, P., and OWEN, A.J. (1987). Human platelet-derived growth factor and the sis/PDGF-2 gene. In Oncogenes, Genes and Growth Factors. G. Guroff, ed. (John Wiley & Sons, New York) pp. 1-40.

BARR, P.J., MASON, O.B., WONG, P.A., GIBSON, H.L., KIEFER, M.C., and BRAKE, A.J. (1990). Yeast expression: Secretion and intracellular systems. In Recombinant Systems in Protein Expression. K.K. Alitalo, M.-L. Huhtala, J. Knowles, and A. Vaheri, eds. (Elsevier Science Publishers B.V., Amsterdam) pp. 37-46.

BATHURST, I.C., BRENNAN, S.O., CARRELL, R.W., COUSENS, L.S., BRAKE, A.J., and BARR, P.J. (1987). Yeast KEX2 protease has the properties of a human proalbumin converting enzyme. Science 235, 348-350.

BRENNAN, S.O., and PEACH, R.J. (1988). Calcium-dependent KEX2-like protease found in hepatic secretory vesicles converts proalbumin to albumin. FEBS 229, 167-170.

BRESNAHAN, P.A., LEDUC, R., THOMAS, L., THORNER, J., GIBSON, H.L., BRAKE, A.J., BARR, P.J., and THOMAS, G. (1990). Human fur gene encodes a yeast KEX2-like endoprotease that cleaves pro-β-NGF in vivo. J. Cell Biol. 111, 2851-2859.

CLEMENTS, J.M., LAZ, T., and SHERMAN, F. (1989). The role of yeast mRNA sequences and structures in translation. In Yeast Genetic Engineering. P.J. Barr, A.J. Brake, and P. Valenzuela, eds. (Butterworths, Stoneham, MA) pp. 65-82.

CRAIK, C.S., RUTTER, W.J., and FLETTERICK, R. (1983). Splice junctions: Association with variation in protein structure. Science 220, 1125-1229.

DAVIDSON, H.W., RHODES, C.J., and HUTTON, J.C. (1988). Intraorganellar calcium and pH control proinsulin cleavage in the pancreatic β cell via two distinct site-specific endopeptidases. Nature 333, 93-96.

DERYNCK, R., JARRETT, J.A., CHEN, W.Y., EATON, D.H., BELL, J.R., ASSOIAN, R.K., ROBERTS, A.B., SPORN, M.B., and GOEDDEL, D.V. (1985). Human transforming growth factor beta 1 (TGF-β1). Nature 316, 701-705.

DOUGLASS, J., CIVELLI, O., and HERBERT, E. (1984). Polyprotein gene expression: Generation of diversity of neuroendocrine peptides. Annu. Rev. Biochem. 53, 665-715.

EDWARDS, R.H., SELBY, M.J., GARCIA, P.D., and RUTTER, W.J. (1988). Processing of the native nerve growth factor precursor to form biologically active nerve growth factor. J. Biol. Chem. 263, 6810-6815.

FEINBERG, A.P., and VOGELSTEIN, B. (1984). A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 137, 266-267.

FRICKER, L.D., DAS, B., and ANGELETTI, R.H. (1990). Identification of the pH-dependent membrane anchor of carboxypeptidase E (EC 3.4.17.10). J. Biol. Chem. 265, 2476-2482.

FULLER, R.S., BRAKE, A.J., and THORNER, J. (1989). Intracellular targeting and structural conservation of a prohormone-processing endoprotease. Science 246, 482-486.

FURIE, B., and FURIE, B.C. (1988). The molecular basis of blood coagulation. Cell 53, 505-518.

GILBERT, W. (1985). Genes-in-pieces revisited. Science 228, 823-824.

JULIUS, D., BRAKE, A., BLAIR, L., KUNISAWA, R., and THORNER, J. (1984). Isolation of the putative structural gene for the lysine-arginine-cleaving endopeptidase required for processing of yeast prepro-α-factor. Cell 37, 1075-1089.

KELLY, R.B. (1985). Pathways of protein secretion in eukaryotes. Science 230, 25-32.

KIEFER, M.C., SAPHIRE, A.C.S., BAUER, D.M., and BARR, P.J. (1990). The cDNA and derived amino acid sequences of human and bovine bone Gla protein. Nucleic Acids Res. 18, 1909.

KRAUT, J. (1977). Serine proteases: Structure and mechanism of catalysis. Annu. Rev. Biochem. 46, 331-358.

MIZUNO, K., NAKAMURA, T., OHSHIMA, T., TANAKA, S., and MATSUO, H. (1988). Yeast KEX2 gene encodes an endopeptidase homologous to subtilisin-like serine proteases. Biochem. Biophys. Res. Commun. 156, 246-254.

MOEHLE, C.M., TIZARD, R., LEMMON, S.R., SMART, J., and JONES, E.W. (1987). Protease B of the lysosome-like vacuole of the yeast Saccharomyces cerevisiae is homologous to the subtilisin family of serine proteases. Mol. Cell Biol. 7, 4390-4399.

OWEN, M.C., BRENNAN, S.O., LEWIS, J.H., and CARRELL, R.W. (1983). Mutation of antitrypsin to antithrombin: α1-Antitrypsin Pittsburgh (358 Met → Arg), a fatal bleeding disorder. N. Engl. J. Med. 309, 694-698.

PAN, L.C., and PRICE, P.A. (1985). The propeptide of rat bone γ-carboxyglutamic acid protein shares homology with other vitamin K-dependent protein precursors. Proc. Natl. Acad. Sci. USA 82, 6109-6113.

RHODES, C.J., BRENNAN, S.O., and HUTTON, J.C. (1989). Proalbumin to albumin conversion by a proinsulin processing endopeptidase of insulin secretory granules. J. Biol. Chem. 264, 14240-14246.

ROEBROEK, A.J.M., SCHALKEN, J.A., BUSSEMAKERS, M.J.G., VAN HEERIKHUIZEN, H., ONNEKINK, C., DEBRUYNE, F.M.J., BLOEMERS, H.P.J., and VAN DE VEN, W.J.M. (1986a). Characterization of human c-fes/fps reveals a new transcription unit (fur) in the immediately upstream region of the proto-oncogene. Mol. Biol. Rep. 11, 117-125.

ROEBROEK, A.J.M., SCHALKEN, J.A., LEUNISSEN, J.A.M., ONNEKINK, C., BLOEMERS, H.P.J., and VAN DE VEN, W.J.M. (1986b). Evolutionary conserved close linkage of the c-fes/fps proto-oncogene and genomic sequences encoding a receptor-like protein. EMBO J. 5, 2197-2203.

SAIKI, R.K., SCHARF, S., FALOONA, F., MULLIS, K.B., HORN, G.T., ERLICH, H.A., and ARNHEIM, N. (1985). Enzymatic amplification of β-globin genome sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230, 1350-1354.

SAMBROOK, J., MANIATIS, T., and FRITSCH, E.F. (1989). Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY).

SANGER, F., NICKLEN, S., and COULSON, A.R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.

SCHALKEN, J.A., ROEBROEK, A.J.M., OOMEN, P.P.C.A., WAGENAAR, S.Sc., DEBRUYNE, F.M.J., BLOEMERS, H.P.J., and VAN DE VEN, W.J.M. (1987). fur Gene expression as a discriminating marker for small cell and nonsmall cell lung carcinomas. J. Clin. Invest. 80, 1545-1549.

SCHILD, D., BRAKE, A.J., KIEFER, M.C., YOUNG, D., and BARR, P.J. (1990). Cloning of three human multifunctional de novo purine biosynthetic genes by functional complementation of yeast mutations. Proc. Natl. Acad. Sci. USA 87, 2916-2920.

SEIDAH, N.G., GASPAR, L., MION, P., MARCINKIEWICZ, M., MBIKAY, M., and CHRÉTIEN, M. (1990). cDNA sequence of two distinct pituitary proteins homologous to Kex2 and furin gene products: Tissue-specific mRNAs encoding candidates for pro-hormone processing proteinases. DNA Cell Biol. 9, 415-424.

SEVARINO, K.A., STORK, P., VENTIMIGLIA, R., MANDEL, G., and GOODMAN, R.H. (1989). Amino-terminal sequences of prosomatostatin direct intracellular targeting but not processing specificity. Cell 57, 11-19.

SLEEP, D., BELFIELD, G.P., and GOODY, A.R. (1990). The secretion of human serum albumin from the yeast Saccharomyces cerevisiae using five different leader sequences. Biotechnology 8, 42-46.

SMEEKENS, S.P., and STEINER, D.F. (1990). Identification of a human insulinoma cDNA encoding a novel mammalian protein structurally related to the yeast dibasic processing protease Kex2. J. Biol. Chem. 265, 2997-3000.

SMEEKENS, S.P., AVRUCH, A.S., LAMENDOLA, J., CHAN, S.J., and STEINER, D.F. (1991). Identification of a cDNA encoding a second putative prohormone convertase related to PC2 in AtT20 cells and islets of Langerhans. Proc. Natl. Acad. Sci. USA 88, 340-344.

STEINER, D.F. (1982). Proteolytic processing of secretory proteins. In Molecular Genetic Neuroscience. F.O. Schmitt, S.J. Bird, and F.E. Bloom, eds. (Raven Press, NY) pp. 149-159.

THIM, L., HANSEN, M.T., NORRIS, K., HOEGH, I., BOEL, E., FORSTROM, J., AMMERER, G., and FIIL, N.P. (1986). Secretion and processing of insulin precursors in yeast. Proc. Natl. Acad. Sci. USA 83, 6766-6770.

THOMAS, G., THORNE, B.A., and HRUBY, D.E. (1988a). Gene transfer techniques to study neuropeptide processing. Annu. Rev. Physiol. 50, 323-332.

THOMAS, G., THORNE, B.A., THOMAS, L., ALLEN, R.G., HRUBY, D.E., FULLER, R., and THORNER, J. (1988b). Yeast KEX2 endopeptidase correctly cleaves a neuroendocrine prohormone in mammalian cells. Science 241, 226-230.

VAN DEN OUWELAND, A.M.W., VAN GRONINGEN, J.J.M., ROEBROEK, A.J.M., ONNEKINK, L., and VAN DE VEN, W.J.M. (1989). Nucleotide sequence analysis of the human fur gene. Nucleic Acids Res. 17, 7101-7102.

VAN DEN OUWELAND, A.M.W., VAN DUIJNHOVEN, H.L.P., KEIZER, G.D., DORSERS, L.C.J., and VAN DE VEN, W.J.M. (1990). Structural homology between the human fur gene product and the subtilisin-like protease encoded by yeast KEX2. Nucleic Acids Res. 18, 664.

VON HEIJNE, G. (1986). A new method for predicting signal sequence cleavage sites. Nucleic Acids Res. 14, 4683-4690.

WELLS, J.A., FERRARI, E., HENNER, D.J., ESTELL, D.A., and CHEN, E.Y. (1983). Cloning, sequencing, and secretion of Bacillus amyloliquefaciens subtilisin in Bacillus subtilis. Nucleic Acids Res. 11, 7911.

WELLS, J.A., CUNNINGHAM, B.C., GRAYCAR, T.P., and ESTELL, D.A. (1986). Importance of hydrogen bond formation in stabilizing the transition state of subtilisin. Phil. Trans. R. Soc. Lond. A. Math. Phys. Sci. 317, 415-423.

WISE, R.J., PITTMAN, D.D., HANDIN, R.I., KAUFMAN, R.J., and ORKIN, S.H. (1988). The propeptide of von Willebrand factor independently mediates the assembly of von Willebrand factor multimers. Cell 52, 229-236.

WISE, R.J., BARR, P.J., WONG, P.A., KIEFER, M.C., BRAKE, A.J., and KAUFMAN, R.J. (1990). Expression of a human proprotein processing enzyme: Correct cleavage of the von Willebrand precursor at a paired basic amino acid site. Proc. Natl. Acad. Sci. USA 87, 9378-9382.

WOZNEY, J.M., ROSEN, V., CELESTE, A.J., MITSOCK, L.M., WHITTERS, M.J., KRIZ, R.W., HEWICK, R.M., and WANG, E.A. (1988). Novel regulators of bone formation: Molecular clones and activities. Science 242, 1528-1534.

YAMAMOTO, T., DAVIS, C.G., BROWN, M.S., WOLFGANG, J.S., CASEY, M.L., GOLDSTEIN, J.L., and RUSSELL, D.W. (1984). The human LDL receptor: A cysteine-rich protein with multiple Alu sequences in its mRNA. Cell 39, 27-38.

Address reprint requests to:
Dr. Philip J. Barr
Chiron Corporation
4560 Horton Street
Emeryville, CA 94608

Received for publication November 9, 1990, and in revised form March 22, 1991.
