Scand. J. Immunol. 36, 703-712, 1992

Analysis of the Genomic and Derived Protein Structure of a Novel Human Serum Amyloid A Gene, SAA4

G. WATSON, S. COADE & P. WOO
Section of Molecular Rheumatology, Clinical Research Centre, Harrow, Middlesex, UK

Watson G, Coade S, Woo P. Analysis of the Genomic and Derived Protein Structure of a Novel Human Serum Amyloid A Gene, SAA4. Scand J Immunol 1992;36:703-12

We have determined the structure of the novel SAA gene, SAA4. The gene is 6.2 kb in length and comprises three introns and four exons. Introns 2 and 3 are significantly longer than those of the other human SAA genes. We have sequenced the exons and junction fragments and have shown that the sequence is the same as c-SAA [1] and does not correspond to the pseudogene carried on GSAA4 [2]. The predicted SAA4 protein sequence has an eight amino acid insertion relative to the other human SAA proteins and is more closely related to rabbit and mouse SAA proteins than to the other human SAA proteins, or to those of animal species which also possess an insertion. We have analysed the predicted SAA4 protein relative to the other human SAA proteins and have identified three important structural regions. We predict that region 1 of SAA4 represents a lipid binding domain. Region 2 forms an extensive, distinctive, hydrophobic β sheet region in place of a helical region. In region 3, SAA4 is the only SAA protein having an α helix which is not amphipathic. We predict that the SAA4 protein retains a modified function of the conserved region, retains the Ca2+ binding site, has an amino terminal surface site and has a potentially distinct secretion pattern. Together, these differences indicate a distinct function from those of the other SAA proteins.

Dr G. Watson, Section of Molecular Rheumatology, Clinical Research Centre, Watford Road, Harrow, Middlesex HA1 3UJ, UK

The serum amyloid A (SAA) proteins are a group of 12-kDa acute phase proteins. Levels of SAA increase 1000-fold in response to trauma, infection and inflammation. In chronic inflammatory diseases such as rheumatoid arthritis, continued elevation of SAA may lead to the potentially fatal complication known as amyloidosis. In this condition, fibrils derived in part from SAA are formed and deposited in the vital organs, resulting in compromise of their function [3]. Previous work in our group has identified four SAA genes [4]. These genes have been mapped to chromosome 11p and appear to lie within a 350-kb fragment of DNA [5, and unpublished observations]. SAA1 has polymorphisms at amino acids (aa) 52 and 57, where SAA1α has Val and Ala respectively, while SAA1β has Ala and Val respectively [6]. SAA2 has a polymorphism at amino acid 71, where the α form has His and the β form has Arg [4]. SAA1 and SAA2 cDNAs have been isolated and these two proteins form the predominant circulating SAA proteins [7, 8].

There is, at present, controversy over whether SAA3 is a functional gene. No cDNA has yet been isolated for this gene and discrepancies in the reported sequence of SAA3 include the presence of a premature stop codon in exon 3 [9, 10]. The fourth SAA gene was identified by Betts et al. [4] in our laboratory and maps to within 10 kb of SAA2. A novel cDNA, designated c-SAA, has been isolated and encodes a predicted 130 residue polypeptide from which an 18 aa leader peptide is cleaved. The protein level does not increase during an acute phase response, indicating that the protein is likely to be constitutively expressed. Furthermore, approximately 50% of the molecules are glycosylated [1]. The SAA genes are highly conserved across several species, indicating an essential function for the SAA proteins. As yet, no role has been assigned to the protein. However, SAA3 has homology with a rabbit protein known to induce collagenase, which is of importance in the normal maintenance of connective tissue and in disease

(a)
   1  ggtaccaacT ATAGCTCCAC CGGCCAGAAG ATACCAGCAG CTCTGCCTTT   50

  51  ACTGAAATTT CAGCTGGAGA AAGGTCCACA Ggtgaggttt cttccaagga  100

      M R L F T G I V F
 101  g ... t tctcttcagC ACAATGAGGC TTTTCACAGG CATTGTTTTC  150

      C S L V M G V T S E S W R S F F K
 151  TGCTCCTTGG TCATGGGAGT CACCAGTGAA AGCTGGCGTT CGTTTTTCAA  200

      E A L Q
 201  GGAGGCTCTC CAAggtaaga actctggagg agtgagggc ... t  250

      V G D M G R A Y W D I M I
 251  ttctttccaG GGGTTGGGGA CATGGGCAGA GCCTATTGGG ACATAATGAT  300

      S N H Q N S N R Y L Y A R G N Y D
 301  ATCCAATCAC CAAAATTCAA ACAGATATCT CTATGCTCGG GGAAACTATG  350

      A A Q R G P G G V W A A K L I
 351  ATGCTGCCCA AAGAGGACCT GGGGGTGTCT GGGCTGCTAA ACTCATCAGg  400

 401  taacacagat ccccggggac ... ctcctttctt cgggttgcag  450

      R S R V Y L Q G L I D Y Y L F G N
 451  CCGTTCCAGG GTCTATCTTC AGGGATTAAT AGACTACTAT TTATTTGGAA  500

      S S T V L E D S K S N E K A E E
 501  ACAGCAGCAC TGTATTGGAG GACTCGAAGT CCAACGAGAA AGCTGAGGAA  550

      W G R S G K D P D R F R P D G L P
 551  TGGGGCCGGA GTGGCAAAGA CCCCGACCGC TTCAGACCTG ACGGCCTGCC  600

      K K Y
 601  TAAGAAATAC cgctcctctg ctcrcaggga aactgggctg  650

 651  tgagccacac acttctcccc ccagacagga cacagggtca ctgagctttg  700

 701  tgT.ccccagg aactggtata gggcacctag aggtgttcaa taaatgtttg  750

 751  tcaaattgaa t-tgttggrg gaaactggga acattgaggc agactttctg  800

 801  ggaagaatgg tcatctgagg ccatacggaa gataaacagc accatggcag  850

 851  atgagcctgt gactgagggg gaccaaggac rcccag  886

(b) [Restriction map of the SAA4 gene; scale bar 1 kb. Sites labelled: KpnI, XbaI, PstI, BglII, BglII, PstI, PvuII, NcoI, SmaI, PstI, PvuII, HindIII.]

(c)
            SAA4   SAA1β  SAA2β  SAA3
INTRON 1    1121    576    562   1330
INTRON 2    3548   2138   1831   2640
INTRON 3     919    372    369    314


states such as rheumatoid arthritis [10, 11]. SAA is also thought to have important effects in reverse cholesterol transport. The enzyme lecithin:cholesterol acyltransferase (LCAT) converts cholesterol to cholesteryl ester in a reaction which occurs on high density lipoprotein (HDL) and requires an apolipoprotein as an activator. The activator Apo A-1, like SAA, is found associated with HDL. At low concentrations, SAA markedly decreases the activation of LCAT by Apo A-1, resulting in reduced LCAT activity and increased levels of cholesterol [12]. This may explain the observation that mortality from cardiovascular disease is increased in chronic inflammatory conditions such as rheumatoid arthritis [13].

We have now isolated the gene for human SAA4 and have determined its structure and its exon and intron/exon junction sequences, and have shown that the sequence is the same as that of c-SAA [1]. We show that a pseudogene recently isolated by Sack & Talbot [2], and described as corresponding to the SAA4 locus which was first reported by Betts et al. [4], differs in structure, restriction map and nucleotide sequence from SAA4. We have analysed the predicted protein and present a comparative analysis of the SAA4 protein with the known members of the SAA family. We demonstrate significant structural differences in SAA4 which predict a serum amyloid A protein with a distinct function from the other human SAA proteins.

MATERIALS AND METHODS

Isolation of DNA. DNA was isolated from plasmids and cosmids by the method of Birnboim & Doly [14] and purified on pZ523 columns (5 Prime-3 Prime Inc., PA, USA). Fragments of DNA required for sequencing were isolated from agarose using prepagene (Northumbria Biologicals Ltd, Northumberland, UK) after appropriate restriction enzyme digestion and gel electrophoresis.

Sequence analysis. Fragments of DNA were cloned into M13mp18 and M13mp19 to obtain clones in both orientations [15]. Single-stranded templates were prepared from 1-ml cultures of TG1 or JM109 cells infected with a single M13 recombinant plaque [16]. Sequencing was performed using the Sequenase DNA sequencing kit (United States Biochemical Corporation, Cleveland, OH, USA) and universal, -40 or reverse primers.

Computer analyses. Sequences were assembled using the Microgenie sequence analysis program (Beckman Instruments Inc., Palo Alto, CA, USA) [17]. The sequence was translated and the primary peptide structure and hydrophilicity of the predicted protein were determined using the same program. SAA protein sequences were obtained from the Swiss-Prot database and were aligned and their structures analysed using the University of Wisconsin Genetics Computer Group (GCG) programs [18]. Relationships between the proteins were defined using the Phylip programs.

RESULTS

Structure of the SAA4 gene
A cosmid library of genomic DNA from cell line HPB-ALL (gift of Dr P. Brickell) was screened with a cDNA probe from exons 2, 3 and 4 of SAA1. The clone cos 2.3 was isolated, which carried both SAA2 and a novel gene, designated SAA4, 10 kb downstream from SAA2 [4]. A 6.4-kb KpnI-HindIII fragment carrying the SAA4 gene was isolated from cos 2.3 and cloned into the Bluescript vector pKS, to produce the clone pKSH. Fragments of pKSH were cloned into M13 and sequenced to obtain the nucleotide sequence of the coding regions and of the exon/intron boundaries. These are presented in Fig. 1a with the predicted protein sequence. The structure of the SAA4 gene is presented in Fig. 1b, together with the sequencing strategy. The sizes and positions of the introns were established using DNA sequence and restriction map information. The sizes of the SAA4 introns are shown compared with those of the other human genes in Fig. 1c. Introns 2 and 3 are significantly longer than those of SAA1, SAA2 and SAA3, and intron 1 is significantly longer than those of SAA1 and SAA2.
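The intron-size comparison can be checked directly from the values listed in Fig. 1c. The sketch below is illustrative only: the function names are hypothetical, and the summed absolute difference used as a "distance" is an assumption made here for illustration, not a measure used in this study. It is included because the Discussion describes SAA4 as most similar to SAA3 by intron size.

```python
# Intron sizes in base pairs (intron 1, intron 2, intron 3), as listed in Fig. 1c.
INTRON_SIZES = {
    "SAA4": (1121, 3548, 919),
    "SAA1β": (576, 2138, 372),
    "SAA2β": (562, 1831, 369),
    "SAA3": (1330, 2640, 314),
}


def longer_than(gene, intron, others):
    """Return True if intron number `intron` (1-based) of `gene` is longer
    than the same intron in every gene named in `others`."""
    size = INTRON_SIZES[gene][intron - 1]
    return all(size > INTRON_SIZES[other][intron - 1] for other in others)


def intron_distance(gene_a, gene_b):
    """Sum of absolute intron-size differences between two genes.
    Illustrative metric only (an assumption, not the paper's method)."""
    return sum(abs(a - b) for a, b in zip(INTRON_SIZES[gene_a], INTRON_SIZES[gene_b]))


def closest_gene(gene):
    """Name of the gene whose intron sizes are nearest to those of `gene`."""
    others = [name for name in INTRON_SIZES if name != gene]
    return min(others, key=lambda name: intron_distance(gene, name))


if __name__ == "__main__":
    for intron in (1, 2, 3):
        print(intron, longer_than("SAA4", intron, ["SAA1β", "SAA2β", "SAA3"]))
    print(closest_gene("SAA4"))
```

Under this simple metric, intron 1 of SAA4 is shorter than that of SAA3 while introns 2 and 3 are the longest of the four genes, in agreement with the statements above.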

FIG. 1. Structure of the SAA4 gene. (a) The nucleotide sequence of the exons and intron/exon junctions of SAA4 is presented in numbered rows. Exon sequences are shown in upper case and intron sequence in lower case. Dots indicate intron regions (sequence not shown). The predicted protein sequence is shown in one letter amino acid code, above the nucleotide sequence. (b) The structural gene for SAA4. Open boxes represent exons. Appropriate restriction sites are shown above the map. Fragments used in the determination of nucleotide sequences are shown below the line. (c) Intron sizes of the human SAA genes. Intron size in base pairs is shown for each of the human SAA genes.
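The relationship between the nucleotide rows and the one-letter protein sequence in Fig. 1a can be reproduced with a short translation routine. This is a minimal sketch using the standard genetic code, not the Microgenie program used in this study. The names `translate`, `EXON2_CODING` and `INSERTION` are introduced here for illustration. The two sequences are copied from Fig. 1a: the exon 2 coding sequence from the ATG to the last complete codon shown, and the 24 bp of exon 4 that encode the octapeptide insertion.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA sequence; whitespace and case are ignored."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Exon 2 coding sequence from Fig. 1a (ATG through the final CAA codon).
EXON2_CODING = """
ATGAGGC TTTTCACAGG CATTGTTTTC TGCTCCTTGG TCATGGGAGT
CACCAGTGAA AGCTGGCGTT CGTTTTTCAA GGAGGCTCTC CAA
"""

# The 24 bp of exon 4 from Fig. 1a that encode the eight amino acid insertion.
INSERTION = "GACTACTAT TTATTTGGAA ACAGC"

if __name__ == "__main__":
    print(translate(EXON2_CODING))
    print(translate(INSERTION))
```

The first 18 residues of the exon 2 translation correspond to the leader peptide, so the mature protein begins at the glutamic acid that follows.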

[Fig. 2a. Alignment, over positions 1-131, of (in order): Human SAA4, Mink SAA1, Mink SAA2, Dog SAA, Cat SAA, Bovine SAA, Horse SAA, Hamster SAA1, Hamster SAA2, Hamster SAA3, Mouse SAA1, Mouse SAA2, Human SAA1, Macaque SAA, Rabbit SAA, Human SAA3, Duck SAA. Fig. 2b. Unrooted tree of the same proteins.]

FIG. 2. Comparison of the SAA amino acid sequences. (a) An alignment of SAA4 with the other human SAA and animal SAA proteins is presented. The alignment was prepared using the PILEUP program of GCG software. Dots indicate gaps inserted to optimize alignments. (b) An unrooted tree showing protein parsimony relationships between the SAA proteins. The tree was created using the PROTPARS program of GCG software, from the alignment presented in (a).

Comparative analysis of the SAA4 primary structure
A comparison of the human and several animal SAA amino acid sequences is presented in the alignment in Fig. 2a. Substitutions in the leader sequences of the human genes are conservative at all positions relative to SAA1 and SAA2 and in all except the substitution of phenylalanine for serine at position 4 relative to SAA3. The amino terminus of the mature protein differs from SAA1 and SAA2 in 7 of the first 12 amino acids and from SAA3 in 7 amino acids. Of these, the substitutions are conservative at positions 3, 7 and 11 with respect to SAA1 and 2 and at positions 2, 5 and 7 with respect to SAA3. In the protein region coded for by exon 3 of the SAA genes, a conserved region of 12 amino acids exists from amino acid 33 to amino acid 44. The conserved sequence is DKYFHARGNYDA and is invariant in all other SAA proteins identified in both humans and in a variety of animal species (Fig. 2a). However, in SAA4 there are four substitutions at amino acids 33, 34, 36 and 37 which result

[Fig. 3. Predicted secondary structure traces for SAA4, SAA1β, SAA2α and SAA3, with region 1, the conserved region, region 2 and region 3 marked.]

FIG. 3. Comparison of SAA secondary structure. (a) Secondary structure comparison of the leader sequences using the MICROGENIE program. Top line indicates position of the amino acid in the sequence. The structure of SAA4 is presented in line 2; SAA1β structure on line 3; SAA2α structure on line 4 and SAA3 structure on line 5. A represents α helix, T represents a turn and B represents a β sheet structure.

in an alteration in charge. At amino acid 33 asparagine is substituted for aspartic acid, at amino acid 34 arginine is substituted for lysine, at amino acid 36 leucine is substituted for phenylalanine and at amino acid 37 tyrosine is substituted for histidine. The substitutions are conservative at amino acids 34 and 36. The Ca2+ binding site sequence GPGG is a further conserved sequence found at amino acids 48-51 in all SAA proteins [19] and is retained by SAA4. The region coded for by exon 4 differs markedly from the other human genes. In particular, a 24-bp insertion relative to other human SAA genes is present in exon 4. The insertion maintains the reading frame and results in an octapeptide insertion with the sequence DYYLFGNS. The octapeptide insertion has homology with amino acids in equivalent positions in the SAA proteins of several species including cat [20], dog [21], cow [22], mink [23] and horse [24] (Fig. 2a). Interestingly, the insertion in SAA4 forms a glycosylation site not present in any other SAA protein, including those with an equivalent region. Using the PROSITE program, no additional motifs were found in SAA4 which were not also present in the other human proteins. A search of the GenEmbl

[Fig. 4. Hydrophilicity traces for SAA4, SAA1β, SAA2α and SAA3, with region 1, the conserved region, region 2 and region 3 marked above the traces.]

FIG. 4. Comparison of hydrophilicity of SAA proteins. Hydrophilicity plots of the SAA proteins were determined using the MICROGENIE program. Hydropathic index scale from 2 to -2 is provided on the left. Numbers above the traces indicate the position in the amino acid sequence. Traces are provided for SAA4 (top line), SAA1β (line 2), SAA2α (line 3) and SAA3 (line 4).

database using the TFASTA program revealed that the insertion polypeptide is a unique sequence not possessed by any other protein. Using the Phylip program protein parsimony algorithm and the alignment shown in Fig. 2a, SAA4 was found to be more closely related to the rabbit and mouse proteins than to the other human SAA proteins, or those of the various animal species with an equivalent insertion region. A tree is provided in Fig. 2b which illustrates the protein parsimony of SAA.
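The residue positions quoted in this section can be located in the mature SAA4 sequence with a few lines of code. The sketch below is illustrative and is not the PROSITE search performed in this study. `MATURE_SAA4` is the Fig. 1a translation with the 18 amino acid leader removed. The glycosylation pattern N-X-S/T (X not proline) is the commonly used N-glycosylation sequon and is an assumption about which motif is meant. The function names are hypothetical.

```python
import re

# Mature SAA4 (Fig. 1a translation minus the 18-residue leader), 112 residues,
# written in blocks of ten so that positions are easy to count.
MATURE_SAA4 = (
    "ESWRSFFKEA" "LQGVGDMGRA" "YWDIMISNHQ" "NSNRYLYARG" "NYDAAQRGPG"
    "GVWAAKLISR" "SRVYLQGLID" "YYLFGNSSTV" "LEDSKSNEKA" "EEWGRSGKDP"
    "DRFRPDGLPK" "KY"
)

# Conserved 12-residue sequence of the other SAA proteins (amino acids 33-44).
CONSERVED = "DKYFHARGNYDA"
CONSERVED_START = 33


def substitutions(seq, consensus, start):
    """List (position, consensus residue, observed residue) for each mismatch
    between `consensus` and `seq`, beginning at 1-based position `start`."""
    window = seq[start - 1:start - 1 + len(consensus)]
    return [
        (start + offset, expected, observed)
        for offset, (expected, observed) in enumerate(zip(consensus, window))
        if expected != observed
    ]


def sequons(seq):
    """1-based positions of Asn residues in an N-X-S/T motif (X not Pro).
    The lookahead lets overlapping candidate motifs be examined."""
    return [match.start() + 1 for match in re.finditer(r"N(?=[^P][ST])", seq)]


if __name__ == "__main__":
    print(substitutions(MATURE_SAA4, CONSERVED, CONSERVED_START))
    print(MATURE_SAA4.find("GPGG") + 1)  # start of the Ca2+ binding site
    print(sequons(MATURE_SAA4))
```

On this numbering the octapeptide DYYLFGNS occupies amino acids 70-77, so the single motif found by `sequons` lies at the carboxy-terminal end of the insertion.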

Comparative analysis of the secondary structure of SAA4
The secondary structure of the human SAA proteins was analysed using the MICROGENIE program and the results are presented in Fig. 3. All of the leader sequences are predominantly hydrophobic (data not shown). Several differences exist in the secondary structure of the mature proteins. The SAA4 amino terminal α helix from amino acid 1 to 12 (labelled region 1) forms an amphipathic helix. The remaining human SAA proteins have an equivalent amphipathic helix between amino acids 9 and 32 of the mature proteins, which in SAA1 and 2 is known to be involved in lipid binding [25]. The conserved region from amino acids 33 to 44, in which SAA4 has four substitutions, forms predominantly the same secondary structure as the other human SAA proteins. The regions from amino acids 53 to 67 in SAA1 and 2 and from amino acids 58 to 66 in SAA3 form a second predominantly α helical structure. In SAA1, this region forms a further amphipathic region. The equivalent region in SAA4, labelled region 2, forms a predominantly β sheet region, with a single interruption at the position of the exon 3/4 boundary at amino acid 59. This β sheet region extends to the first 4 amino acids of the 8 amino acid insertion. A third α helical region, region 3, from amino acids 81 to 94, does not form an amphipathic helix in SAA4, but does form an amphipathic helix in the other human proteins. There are no significant structural differences in the remainder of the protein.

Hydrophilicity
Hydrophilicity plots of the SAA4, SAA1β, SAA2α and SAA3 proteins are presented in Fig. 4. In sharp contrast to the other human proteins, the amino terminus of the mature SAA4 protein is hydrophilic in nature. Overall, however, region 1 of SAA4 is more hydrophobic than the equivalent α helices in the other human proteins. The N-terminal region of SAA4 to amino acid 82 is more hydrophobic than that of SAA1, SAA2 or SAA3. The effect of the unique substitutions within the conserved region is to make this region more hydrophobic than in any of the other human SAA proteins. Within region 2 from the exon 3/4 boundary onwards, SAA4 is considerably more hydrophobic than the other SAA proteins. The exon 3/4 boundary falls within a hydrophilic motif from amino acids 57 to 61 in SAA4, which corresponds to the disruption in the β sheet structure described above. The equivalent motif is more hydrophilic in the other proteins. The insertion encodes a highly hydrophobic motif and its C terminus marks the boundary between this and the highly hydrophilic C terminus of the protein. The C terminus of SAA4 including region 3 is more hydrophilic than SAA1, 2 or 3 and does not have the hydrophobic motif at amino acids 96 and 97 which is present in the others. Using the GCG PeptideStructure program, the conserved region, the exon 3/4 boundary and the carboxy terminus from amino acid 100 were found to be potential surface sites in all of the human SAA proteins. In addition, SAA4 has a potential surface site at the amino terminus of the mature protein. These surface sites were also found to correspond to antigenic regions.
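A profile of the kind plotted in Fig. 4 is obtained by averaging a per-residue scale over a sliding window. The sketch below illustrates the calculation only. It uses the Kyte-Doolittle hydropathy scale, which is an assumption: the scale and window used by the Microgenie program are not stated here, and on the Kyte-Doolittle scale positive values are hydrophobic, the opposite sign convention to a hydrophilicity plot. Its output should therefore not be expected to reproduce the traces in Fig. 4.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Mean hydropathy of each full window of `window` residues along `seq`.
    Entry i of the result covers residues i+1 .. i+window (1-based)."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    # First twelve residues of mature SAA4 (region 1), from Fig. 1a.
    for start, score in enumerate(hydropathy_profile("ESWRSFFKEALQ", window=7), 1):
        print(start, round(score, 2))
```

A shorter window gives a noisier trace that resolves short motifs, whereas a longer window smooths the profile, so comparisons between proteins should use the same window throughout.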

DISCUSSION

The SAA4 gene is 6.2 kb in length from exon 1 to the polyadenylation site and is organized into 4 exons and 3 introns. The exon/intron boundaries conform to the consensus sequence with exceptions only at the more variable positions. The sequence obtained from the coding regions of SAA4 confirms that it is the same as the cDNA sequence described by Whitehead et al. [1]. The SAA4 gene is predicted to encode a functional protein. A recent report that the clone GSAA4 corresponds to SAA4 and is a pseudogene differs from our findings in several important points: (1) exon 1 of GSAA4 falls within a 9-kb BglII fragment. Exons 3 and 4 of SAA4 fall within a 9-kb BglII fragment, but due to the presence of BglII sites within the gene, exons 1 and 2 do not; (2) the restriction maps of SAA4 and GSAA4 are different. SAA4 has no HindIII sites or EcoRI sites within the gene, but these are present in GSAA4; (3) exons 1 and 2 of SAA4 are not detected under standard hybridization conditions using probes from exon 2 of SAA1, due to the low degree of homology between SAA4 and SAA1 in these regions. However, we have now identified all four exons of the SAA4 gene. Exon 2 of GSAA4 could not be found by Sack & Talbot [2]; (4) the exon 1 nucleotide sequence of GSAA4 is 30 bp in length and is not the same as the 72-bp exon 1 sequence of SAA4; (5) in GSAA4, the 3' donor splice site at the end of exon 1 is AG/GC. In SAA4 this splice site is AG/GT. We therefore conclude that GSAA4 does not correspond to SAA4. Furthermore, the results presented in this paper indicate that SAA4 is a functional protein.

The SAA4 protein is most closely related to the rabbit and mouse proteins by protein parsimony analysis, and the SAA4 gene is most similar in structure to the SAA3 gene, by comparison of intron sizes. From the unrooted tree analysis, SAA4 appears to have developed or retained the eight amino acid insertion independently of the other animal species within this region of the tree.

Analysis of the human SAA proteins revealed significant differences between SAA4 and the other proteins. The overall effect of substitutions in the leader sequence is to increase the polarity of this region. The replacement of the short α helix with a β sheet structure in this region may indicate some alteration in the secretion process of SAA4 (Fig. 3a). Turnell et al. identified the N-terminal amphipathic α helix as a lipid-binding, trans-membrane region potentially responsible for the association of SAA1 and SAA2 with HDL [19]. In SAA4, the N-terminal region is hydrophilic, suggesting that this is a surface site and not a buried trans-membrane region. However, by analogy with SAA1 and 2, the hydrophobic region 1 α helix represents a potential lipid-binding domain of SAA4 [19]. SAA4 also retains the Ca2+ binding region GPGG found in the other proteins. A novel finding is that the region conserved throughout the species is variant in SAA4, having four substitutions. Although secondary structure is maintained, increased hydrophobicity may modify the functional efficiency of this region in SAA4.

Region 2 represents a significant structural difference from the other human SAA proteins, having an extended β sheet region in place of an α helical structure and being considerably more hydrophobic than the equivalent region in the other human SAA proteins. McCubbin et al. [25] suggest that reduced helical structure may indicate or reflect a stronger tendency to aggregate in murine SAA. Although SAA4 may aggregate more readily than the other human SAA proteins, it is produced at constitutively low levels and is not the major constituent of amyloid fibrils [1]. We predict that this distinctive, significant region confers a specific function on SAA4 relative to the other human proteins, perhaps by affecting location of the SAA4 protein. In SAA1 this region forms an amphipathic helix and SAA4 would be expected to bind lipid with a lower efficiency than SAA1. This effect is emphasized by the absence of an α helix with an amphipathic nature in region 3, as region 3 forms an amphipathic helix in all human SAA proteins except SAA4. This may indicate a further functional difference between SAA4 and the other human SAA proteins, and SAA4 is therefore predicted to exhibit a weaker association with HDL than SAA1 or 2. This may explain the observations by Whitehead et al. [1] that c-SAA is found at low concentrations in the HDL fraction of plasma and that its level does not increase in the acute phase response. The unique insertion sequence of SAA4 within region 2 is also predicted to affect the function of the protein, as it forms a hydrophobic domain and produces a glycosylation site. Taken together, these features suggest that SAA4 has a function which is distinct from that of the other human SAA proteins. We are currently investigating this hypothesis.
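Whether a helical segment is amphipathic, as discussed above for regions 1 and 3, can be quantified with a hydrophobic moment: each residue's hydropathy is treated as a vector pointing outward from the helix axis, successive residues are rotated by 100 degrees, and the vectors are summed. The sketch below is not the analysis performed in this study, which relied on the Microgenie and GCG programs. It assumes an ideal α helix and, as a further assumption, uses the Kyte-Doolittle scale rather than any particular published moment scale. The function name is hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic); an assumed scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment per residue of `seq`, assuming successive side
    chains are rotated by `angle_deg` around the helix axis (100 degrees for
    an ideal alpha helix). Larger values indicate a more amphipathic helix."""
    delta = math.radians(angle_deg)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


if __name__ == "__main__":
    # Region 3 of SAA4 (amino acids 81-94 of the mature protein, from Fig. 1a).
    print(hydrophobic_moment("LEDSKSNEKAEEWG"))
    # A uniform 18-residue helix has no preferred face, so its moment is ~0.
    print(hydrophobic_moment("A" * 18))
```

A segment with hydrophobic residues clustered on one face gives a large moment, whereas a uniformly hydrophobic or uniformly hydrophilic segment gives a value near zero, so the measure separates amphipathic helices from helices that are merely hydrophobic.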

ACKNOWLEDGMENTS

We would like to thank the Clinical Research Centre computing staff for help with the programs.

REFERENCES

1 Whitehead AS, DeBeer MC, Steel DM et al. Identification of novel members of the serum amyloid A protein superfamily as constitutive apolipoproteins of high density lipoprotein. J Biol Chem 1992;267:3862-7.
2 Sack GH, Talbot CC. The human serum amyloid A locus SAA4 is a pseudogene. Biochem Biophys Res Commun 1992;183:362-6.
3 Benditt EP, Eriksen N. Amyloid protein SAA is associated with high density lipoprotein from human serum. Proc Natl Acad Sci USA 1977;74:4025-8.
4 Betts JC, Edbrooke MR, Thakker RV, Woo P. The human acute-phase serum amyloid A gene family: structure, evolution and expression in hepatoma cells. Scand J Immunol 1991;34:221-32.
5 Kluve-Beckerman B, Naylor SL, Marshall A, Gardner JC, Shows TB, Benson MD. Localization of human SAA gene(s) to chromosome 11 and detection of DNA polymorphisms. Biochem Biophys Res Commun 1986;137:1196-204.
6 Parmelee DC, Titani K, Ericsson LH, Eriksen N, Benditt EP, Walsh KA. Amino acid sequence of amyloid-related apoprotein (apo-SAA) from human high density lipoprotein. Biochem 1982;21:3298-303.
7 Sipe JD, Colten HR, Goldberger G et al. Human serum amyloid A (SAA): biosynthesis and postsynthetic processing of preSAA and structural variants defined by complementary DNA. Biochem 1985;24:2931-6.
8 Kluve-Beckerman B, Long GL, Benson MD. DNA sequence evidence for polymorphic forms of human serum amyloid A (SAA). Biochem Genet 1986;24:795-803.
9 Sack GH. Molecular cloning of human genes for serum amyloid A. Gene 1983;21:19-24.
10 Kluve-Beckerman B, Brinckerhoff C, Benson MD. Sequence analysis of a third human SAA gene. In: Natvig JB, Forre O, Husby G et al., eds. Amyloid and Amyloidosis, Proceeding of the VIth International Symposium on Amyloidosis. Norway: Kluwer Academic Publishers, 1990:24-7.
11 Brinckerhoff CE, Mitchell TI, Karmilowicz MJ, Kluve-Beckerman B, Benson MD. Autocrine induction of collagenase by serum amyloid A-like and β2-microglobulin-like proteins. Science 1989;243:655-7.
12 Steinmetz A, Hocke G, Saile R, Puchois P, Fruchart J. Influence of serum amyloid A on cholesterol esterification in human plasma. Biochim Biophys Acta 1989;1006:173-8.
13 Pincus T, Callahan LF. Taking mortality in rheumatoid arthritis seriously: predictive markers, socioeconomic status and comorbidity. J Rheumatol 1986;13:841-5.
14 Birnboim HC, Doly J. A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucl Acid Res 1979;7:1513-22.
15 Messing J. New M13 vectors for cloning. Methods in Enzymol 1983;101:20-78.
16 Sambrook J, Fritsch EF, Maniatis T. Molecular Cloning: a Laboratory Manual, 2nd edn. Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 1989.
17 Queen C, Korn LJ. Microgenie sequence analysis program. Nucl Acid Res 1984;12:581-9.
18 Devereux J, Haeberli P, Smithies O. A comprehensive set of sequence analysis programs for the VAX. Nucl Acid Res 1984;12:387-95.
19 Turnell W, Sarra R, Glover ID et al. Secondary structure prediction of human SAA1. Presumptive identification of calcium and lipid binding sites. Mol Biol Med 1986;3:387-407.
20 Kluve-Beckerman B, Dwulet FE, Di Bartola SP, Benson MD. Primary structure of dog and cat amyloid A proteins: comparison to human AA. Comp Biochem Physiol 1989;94B:175-83.
21 Sellar GC, DeBeer MC, Lelias JM et al. Dog serum amyloid A protein. J Biol Chem 1991;266:3505-10.
22 Benson MD, Di Bartola SP, Dwulet FE. A unique insertion in the primary structure of bovine amyloid AA protein. J Lab Clin Med 1989;113:67-72.
23 Marhaug G, Husby G, Dowton SB. Mink SAA protein: expression and primary structure based on cDNA sequences. J Biol Chem 1990;265:10049-54.
24 Sletten K, Husebekk A, Husby G. The primary structure of equine serum amyloid A protein. Scand J Immunol 1989;30:117-22.
25 McCubbin WD, Kay CM, Narindrasorasak S, Kisilevsky R. Circular-dichroism studies on two murine serum amyloid A proteins. Biochem J 1988;256:775-83.

Received 27 April 1992 Accepted in revised form 26 June 1992
