A year in the life of the immunoglobulin superfamily.


Immunology Today, vol. 8, No. 10, 1987


The superfamily of molecules with immunoglobulin-like domains has recently been gaining new members, largely on the basis of sequence homology. Here Alan Williams reviews this new work and reveals how the comparison of sequence patterns enables decisions on membership to be made. Accommodation of the new structures demands the provision of new categories, and forces the abandonment of the conserved disulphide bond as the last invariant characteristic of an immunoglobulin-type domain. The new structures may, however, provide more clues to the origins and evolution of the immunoglobulin superfamily.

The concept of the immunoglobulin (Ig) superfamily as a set of structures involved in basic cell surface recognition events (Ref. 1) has been greatly strengthened by sequences published over the past year or so. In particular, Ig-related structures found on neural but not lymphoid tissues have been identified (Fig. 1), including the neural cell adhesion molecule (N-CAM) (Refs 2, 3), myelin-associated glycoprotein (MAG) (Refs 4-6) and Po, the major glycoprotein of peripheral myelin (Refs 4, 7). The platelet-derived growth factor receptor (PDGFR) (Ref. 8) and colony-stimulating factor-1 receptor (CSF1R) have also been identified as being Ig-related (Ref. 4), as has the mouse macrophage Fc receptor (FcR) (Refs 9, 10). Most recently carcino-embryonic antigen (CEA) has joined the family (Refs 11-13), possessing a flamboyant structure with seven Ig-related domains (Fig. 1). T-cell antigens have also featured recently with the reports of sequences for CD1 (Ref. 14), CD2 (Refs 15, 16), CD3 ε chain (Ref. 17) and CD8 chain II (Ref. 18), along with further analysis of the Ig characteristics of CD4 antigen (Ref. 19) and the CD3 γ and δ chains (Ref. 20). CD5 antigen has also been claimed as Ig-related (Ref. 21) but I would query this assignment (see below).
Finally, some structures appear to break the generalization that Ig superfamily molecules are at some stage cell surface molecules, with secreted or degraded molecules as possible soluble forms. These structures are the serum glycoprotein alpha 1B-gp, which has a five-domain structure (Ref. 22), and the link protein of basement membranes, which has a V-like domain at the amino terminus of a chain of 334 amino acids (Refs 23, 24). Alpha 1B-gp may yet be found in a cell surface form, but it seems unlikely that the link protein is expressed as an integral plasma membrane protein since its function is to bind together the hyaluronic and proteoglycan components of the basement membrane matrix. In considering these structures, I identify a set of domains that constitute a second category of sequences that may be folded as Ig C domains. I also propose the abandonment of the conserved disulphide bond as the last invariant characteristic of the Ig-related domains, and consider functional and evolutionary aspects of the Ig superfamily.


MRC Cellular Immunology Unit, Sir William Dunn School of Pathology, University of Oxford, Oxford, OX1 3RE, UK

Alan F. Williams

New sequence patterns

Immunoglobulin domains show two related but distinct folding patterns, namely the V- and C-domain patterns (Fig. 2, centrepages) (Refs 25, 26). These structures are similar, with two β-sheets forming the characteristic sandwich of the Ig fold, which is stabilized by the conserved disulphide bond. In the β-strands, hydrophobic and hydrophilic amino acids alternate, with the hydrophobic side chains pointing inwards to form the interior of the sandwich (Ref. 27). The main structural difference between V and C domains is that the V-domain fold has an extra loop in the middle that forms the second hypervariable region in antibodies. This loop is made up of β-strands C' and C'' in the centrefold. Some conserved sequences are common to V and C domains while others are specific to either the V or C domain types. The conserved patterns often involve substitution of similar amino acids, and favoured exchanges are indicated by the Dayhoff scoring matrix, which is based on the frequency of amino acid substitutions seen between equivalent molecules in distantly related species (Ref. 28).

Sequences of V or V-related domains are shown at the top of the centrefold (V-SET) while the C or C-related domains found in antigen receptors or major histocompatibility (MHC) antigens are shown at the bottom. The latter sequences are called the C1-SET to distinguish them from a new grouping that I call the C2-SET, which forms the middle group of alignments. The alignments begin at β-strand B, which contains one Cys of the conserved disulphide bond, and continue to β-strand F, which contains the other Cys. The exclusion of β-strands A and G from the alignments simplifies the data and excludes the main regions where highly conserved sequences in the immunoglobulins may exist to mediate interactions between Ig chains (Ref. 25). However, in the statistical analyses discussed below, β-strand A and G sequences are included. In Fig. 1 all putative Ig domains are placed in the V-, C1- or C2-SETS.
Sometimes these assignments are somewhat arbitrary, and in allocating the C-related domains to the C1- or C2-SET, sequences have been assigned as C2 unless they have the typical C1 pattern (ALIGN analysis as discussed below generally supports this bias). The CD2 antigen domain II sequence is one sequence in the C2-SET that is somewhat atypical. Conserved residues that are characteristic of the whole superfamily are centrally placed in β-strands B, C, E and F, and these strands make up the core of the Ig fold in both V and C domains (Ref. 29). In contrast, conserved patterns for V-SET and C1-SET sequences are at the bends or edges of β-strands. In the V-SET the sequence Asp-X-Gly/Ala is commonly found adjacent to β-strand F and this correlates with an Arg (or Lys, not shown) at the base of β-strand D. In Ig domains of known structure the Asp and Arg residues form a salt bridge and this may also occur for other sequences in the V-SET (Refs 30, 31). In the C1-SET the bend between β-strands B and C shows highly conserved residues; this is notable given that in V domains the bend between these strands constitutes the first hypervariable region, which has many possible sequences.

© 1987, Elsevier Publications, Cambridge 0167-4919/87/$02.00

The new grouping, the C2-SET, contains sequences that are mostly from the recently determined structures. They are grouped together because, although they commonly resemble V domains in β-strands E and F, they cannot contain the loop between β-strands C and D characteristic of V domains. In statistical tests using the ALIGN algorithm, these sequences generally score best within the set but also give good scores with some V-SET and C1-SET sequences (Fig. 3). A characteristic feature of some C2-SET sequences is an apparent loss of part of the middle of the domain. This is seen in CEA domains III, V (see centrepages) and VII, and in the CD3 γ and δ sequences, and also in the mouse CD3 ε sequence, which has 8 fewer residues than human CD3 ε between the Cys residues (Ref. 20). In some cases the fold may consist of only three clear β-strands in each β-sheet (A, B, E and C, G, F) with a short interconnecting sequence across the top of the domain between β-strands C and E.

A more extreme divergence from the typical Ig pattern is seen in domains that appear to be Ig-related but lack the conserved disulphide bond previously thought of as an invariant feature of the Ig fold (Ref. 30). The disulphide bond would not be expected to be essential for folding of the domain, and a functional antibody exists in which the Cys of β-strand F in the VH domain is replaced with a Tyr residue (Ref. 32). Domains lacking the disulphide bond can also be argued for the T-cell antigens CD2 (domain I) (Ref. 16) and CD4 (domain III) (Ref. 19), for the PDGF and CSF1 receptors (domains IV) (Ref. 4) and for the first domain of CEA (Ref. 11). These domains from CD2, PDGFR and CEA are shown in the centrefold.
Some of the conserved β-strand patterns are present and, in the three cases shown, the hydrophobic amino acids Ile, Val, Val and Val, Met, Leu are aligned at the positions of the conserved Cys residues; these could point inwards and contribute to the hydrophobic interior between the β-sheets.

Criteria for inclusion in the Ig superfamily

It is commonly thought that the evolution of the Ig superfamily has involved divergence of sequences within constraints imposed by conservation of basic elements of the Ig fold. Initially, the argument for an Ig relationship is made on the basis of sequence similarities, with the ultimate prospect of this being tested by structure determination by X-ray crystallography. Thus far the similarity of structure has been reported only for Ig V and C domains and β2-microglobulin (Ref. 33). Other indications of structure can come from determination of disulphide bonds and measurement of β-sheet or α-helical content by circular dichroism. Secondary structure predictions can be used to test domain assignments (Ref. 34) but these require interpretation and are about 60% accurate on average. The exon-intron gene structure can also indicate domains since Ig-related domains are usually confined to one exon. However, this is not always the case: the nucleotides coding for the Cys residues of the disulphide bond are separated by introns in the genes coding for domain I of CD4 antigen (Ref. 35) and all the N-CAM domains (Ref. 2).

[Fig. 1 diagram: schematic cell-surface structures labelled N-CAM, MAG, PDGFR, CEA, CSF1R, TcR, CD4, CD3, CD1, CD8 and CD2.]

Fig. 1. Ig superfamily structures at cell surfaces that have recently been sequenced or analysed. The sources of the molecules are: Po (rat), N-CAM (chicken), MAG (rat), PDGFR (mouse), CEA (human), FcR (mouse), CD3 (human), TcR (mouse), CD1 (human R4B3), CD8 (rat), CD4 (human) and CD2 (rat). The CD3 molecules are shown in association with TcR α and β chains. The circles show segments that may form Ig-related domains with at least three β-strands in each of two β-sheets. The names V, C1 and C2 inside the circles indicate the possible domain type as defined in the text. One set of symbols indicates putative disulphide bonds that appear similar to the conserved Ig disulphide bond, and a second set indicates potential sites for N-linked glycosylation (Asn-X-Thr/Ser). The amino-terminal part of CD1 is based on unpublished data and shown with permission from L. Martin, F. Calabi and C. Milstein. In CD4 the second segment, shown without a circle, has some sequence similarities to Ig but is far from typical. In CD1 the segments shown as wavy lines have no recognizable similarity to Ig superfamily domains. Ig superfamily structures not shown include the new sequences alpha 1B-gp and link protein, which are not known to exist at cell surfaces, plus the following molecules: Ig L (2 domains) and H (4-5 domains) chains; TcR γ (2 domains); Thy-1 (1 domain); MHC class I and Tl and Qa antigens (structures as for CD1); MHC class II α (1 domain) and β (1 domain) chains; MRC OX-2 (2 domains); poly IgR (5 domains).
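The Asn-X-Thr/Ser rule used in the legend to mark potential N-linked glycosylation sites is simple enough to sketch in code. The following is a minimal illustration in Python, not the procedure used for the figure: the function name `find_sequons`, the toy sequence and the optional `exclude_proline` switch are all assumptions added here (the legend itself states only Asn-X-Thr/Ser, without any restriction on X).

```python
import re


def find_sequons(seq, exclude_proline=False):
    """Return 0-based positions of Asn residues that start an Asn-X-Thr/Ser motif.

    `exclude_proline` is a hypothetical option, not taken from the legend:
    when True, motifs with Pro at the X position are skipped.
    """
    middle = "[^P]" if exclude_proline else "."
    # A lookahead is used so that overlapping motifs (e.g. N-N-T-S) are all reported.
    pattern = re.compile("N(?=" + middle + "[ST])")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Toy one-letter sequence (hypothetical, not from any molecule in Fig. 1).
toy = "ANGTNPSNKS"
print(find_sequons(toy))
print(find_sequons(toy, exclude_proline=True))
```

Counting the positions returned for each domain of a real sequence would give the kind of per-domain site tally that is quoted for CEA later in the text.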

Fig. 2. Details of alignments. (See centrepage diagram.) Domains are numbered from the amino terminus as is obvious in Fig. 1. CD4 has four domains and the three segments in CD1 H chains are labelled from the amino terminus as alpha 1, 2 and 3. The residues marked with colours include conserved identities or alternative residues that are commonly seen at each position. The following sequences are referenced by NBRF protein database codes given in parentheses or by literature references. Ig V lambda, mouse (L1MS4E); Ig VH, human (G1HUNM); TcR V alpha, mouse (RWMSAV); TcR V beta, human (RWHUW); CD4, human (R:.THUTA); CD8, rat (Ref. 18); poly IgR, rabbit (QRRBG); MRC OX-2, rat (TDRTOX); Po, rat (Ref. 7); Thy-1, rat (TDRT); N-CAM, chicken (Ref. 2); MAG, rat (Ref. 4); PDGFR, mouse (Ref. 8); CEA, human (Ref. 11); alpha 1B-gp, human (OMHUIB); FcR, mouse (Ref. 9); CD3 epsilon, human (Ref. 17); CD2, rat (Ref. 16); Ig C lambda, human (L2HU); Ig C kappa, human (K3HU); Ig C heavy, human (GHHU); TcR C beta, human (RWHUCY); TcR C gamma, mouse (R£~MSC1); β2-M, human (MGHUB2); MHC I alpha 3, human (HLHUB2); MHC II beta 2, human (HLHU3D); CD1 alpha 3, human (Ref. 14).

[Fig. 3 matrix: sequence similarities assessed by the ALIGN program between Ig superfamily or control sequences and sequences in the C2-SET. The individual scores are not legible.]

Fig. 3. Sequence similarity assessed by the ALIGN program between Ig superfamily or control sequences and sequences in the C2-SET. Sequences are from references in the Fig. 2 legend except for L-CA and IL-2 receptor, where the segments are shown in Fig. 4. In all cases domains for statistical analysis are defined by identifying Cys residues equivalent to those forming the conserved disulphide bond of Ig, or residues thought to replace them as in Fig. 2, and taking 20 residues on either side of these positions. If the amino terminus begins less than 20 residues from the Cys then sequences are included from the leader segment if available (in N-CAM (Ref. 2) only 11 residues before the first Cys were available when the ALIGN tests were run). Differences in sequence length in the middle of the domain are ignored. The ALIGN program was scored using the matrix shown in the centrefold with a bias of 6 and gap penalty of 6 (Ref. 28). ALIGN scores of 3, 4 and 5 SD indicate chance probabilities of about 10^-3, 10^-5 and 10^-7 when randomized forms of the same sequence are scored. However, these probabilities underestimate the chance result with domains selected as for Ig-related sequences, since the mean and standard deviation of 682 control SD scores (see text) was 0.54 ± 1.
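The procedure in this legend (cut a window around the Cys pair, score the alignment, then express the score in SD units relative to randomized sequences) can be sketched as follows. This is only an illustration under stated assumptions and is not the ALIGN program: the alignment is a plain Needleman-Wunsch style global alignment with a toy scoring scheme (match 1, mismatch 0, gap -1) standing in for the Dayhoff matrix, bias and gap penalty of the legend; both sequences are shuffled, which is also an assumption; and the function names `domain_window`, `global_score` and `sd_score` and the toy sequence are hypothetical.

```python
import random
import statistics


def domain_window(seq, first_cys, second_cys, flank=20, leader=""):
    """Cut `flank` residues either side of the Cys pair (0-based indices).

    If fewer than `flank` residues precede the first Cys, the shortfall is
    taken from the end of the leader segment when one is supplied.
    """
    start = first_cys - flank
    prefix = ""
    if start < 0:
        if leader:
            prefix = leader[start:]  # last -start residues of the leader
        start = 0
    return prefix + seq[start:second_cys + flank + 1]


def global_score(a, b, match=1, mismatch=0, gap=-1):
    """Best global alignment score under a toy scheme (assumed, not ALIGN's matrix)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def sd_score(a, b, n_shuffles=100, seed=0):
    """Real score in SD units above the mean score of shuffled sequences."""
    rng = random.Random(seed)
    real = global_score(a, b)
    random_scores = []
    for _ in range(n_shuffles):
        ra, rb = list(a), list(b)
        rng.shuffle(ra)
        rng.shuffle(rb)
        random_scores.append(global_score(ra, rb))
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero spread; use longer sequences")
    return (real - statistics.mean(random_scores)) / sd


# Toy sequence (hypothetical); a sequence scored against itself should stand
# many SD above its shuffled forms.
toy = "EVQLVESGGGLVQPGGSLRLSCAASGFTFS"
print(round(sd_score(toy, toy), 1))
```

Shuffling preserves composition and length, so a high SD score reflects the order of residues rather than simply which residues are present. Scoring one test domain against every member of a set, as in Fig. 3, would amount to calling `sd_score` once per set member.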


At present, for many structures only the predicted sequence from cDNA is available and a criterion of significant sequence similarity should be used to identify candidates for the superfamily. The ALIGN program of Dayhoff et al. (Ref. 28) scores similarities between test sequences in terms of the number of standard deviations that the score obtained with the best alignment of the real sequence is away from the random mean best score obtained by scrambling the sequence a given number of times (e.g. 100-150). However, this does not take into account the conserved patterns of sequence that are so important in assessing relationships by eye. This can be overcome by testing a new sequence against a set of Ig-related sequences. If repeated good scores are obtained, this indicates the presence of a sequence pattern that is similar to the conserved pattern in the Ig-related set (Ref. 36).

Figure 3 illustrates some scores obtained when sequences are tested by the ALIGN program against 27 domains assigned to the C2-SET. Scores for the C and V domains of an Ig λ chain are shown along with those for N-CAM domain IV, which is a typical C2-SET sequence. It can also be seen that CEA domains IV and V would be regarded as typical C2-SET sequences and that they match particularly well with N-CAM and MAG sequences. The second domains of PDGFR (sequence not shown) and CD2 (centrepages) are sequences that are less typical of the C2-SET but both give 10/27 ALIGN scores of ≥ 3 SD against the C2-SET. Figure 3 also shows scores for PDGFR domain IV and CD2 domain I, which both lack Cys residues. Both give a pattern of good scores, and CSF1R domain IV scores in a similar manner to the PDGFR domain IV sequence (not shown).

The amino-terminal domain of CEA and domain III of CD4 are candidates for a V-like fold, and the CEA domain gives 9/21 scores ≥ 3 SD with diverse sequences from the V-SET (not shown). CD4 domain III does not look convincing against the family of sequences, with only 3/21 scores ≥ 3 SD for human CD4 versus the V-SET. However, in this case a score of 7 SD with a 200 amino acid stretch of poly IgR that included CD4 domain III, and secondary structure prediction, were considered sufficient for domain assignment (Ref. 38), given that CD4 is known to be Ig-related on the basis of its undeniably V-like amino-terminal domain (Ref. 37). The pattern of disulphide bonds (Ref. 38) and exons (Ref. 33) in CD4 also supports the domain assignments shown in Fig. 1. Finally, the last two domains of CD4 match convincingly with CD2 domains I and II (Ref. 16) and the CD2 domains score well with other C2-SET (Fig. 3) and V-SET sequences (Ref. 16). The clearest simple case for the inclusion of CD2 in the superfamily is made by comparing CD2 domains I and II with N-CAM domains IV and V. Over these regions (175 residues), scores of 7.2 and 6.3 SD units were obtained in ALIGN analysis with rat and human CD2, respectively. This is particularly interesting since N-CAM and CD2 are both cell-adhesion molecules.

It could be argued that the ALIGN scores are misleading due to sequence selection or because some sequence patterns may be common in cell surface membrane proteins. To test this I have chosen sequences from integral membrane glycoproteins including the leucocyte-common antigen (L-CA), the interleukin 2 (IL-2) receptor, and influenza neuraminidase and haemagglutinin, on the basis that they should have Cys residues at about the spacing of the conserved disulphide bond of the Ig superfamily with not more than two other Cys residues in between.
In some cases a Trp was also present at about the β-strand C position. Two of the control sequences are shown in Fig. 4, while their ALIGN scores against the C2-SET are included in Fig. 3. Altogether 11 control sequences were tested against 21, 27 and 14 sequences from the V-SET, C2-SET and C1-SET, respectively. Out of the 682 comparisons the numbers of SD scores > 5, 4-5 and 3-4 were 0, 3 and 13, respectively. The scores for the L-CA sequence in Fig. 4 were typical for most of the controls, while those for the IL-2R sequence show 2/27 favourable scores that have presumably occurred by chance. Another pair of controls that contained appropriate Cys residues were an α-2 domain from class I MHC [NBRF database*, HLMSKD] and a β-1 domain from class II MHC [HLHUDB]. These domains do not show Ig-related patterns other than the Cys residues, and in a total of 68 comparisons with V-SET or C1-SET sequences no scores > 3 SD were obtained.

The ALIGN tests were also applied to the amino-terminal sequence of CD5 shown in Fig. 4, which has been suggested as an Ig-related domain (Ref. 21). In tests against 62 domains in the V-, C1- and C2-SETS no scores > 3 SD were obtained for mouse or human CD5 and only 12/124 scores were > 2. Thus on this basis CD5 should not be considered Ig-related. The issue may be resolved when the positions of disulphide bonds in CD5 are determined, since it is not obvious that all the Cys residues in the CD5 sequence in Fig. 4 could be disulphide-bonded in accord with the Ig fold.

*Protein Identification Resource (1987) Protein Sequence Database (Natl Biomed. Res. Found., Washington, DC) Release 12.0.
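The counts quoted for the control comparisons can be checked with a little arithmetic. The sketch below, in Python, reproduces the totals from the numbers given in the text and compares the observed number of control scores of 3 SD or more with what a standard normal distribution would predict. The normal approximation and the variable names are assumptions made here for illustration; the comparison is only a rough guide, since the Fig. 3 legend notes that the control scores have a mean above zero.

```python
import math


def upper_tail(z):
    """One-sided tail probability of a standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Numbers taken from the text.
n_controls = 11
set_sizes = {"V-SET": 21, "C2-SET": 27, "C1-SET": 14}
observed = {">5": 0, "4-5": 3, "3-4": 13}

total = n_controls * sum(set_sizes.values())   # 11 x 62 comparisons
observed_ge3 = sum(observed.values())          # control scores of 3 SD or more

# Expected count of scores >= 3 SD if control scores were standard normal
# (an assumption made for this sketch only).
expected_ge3 = total * upper_tail(3.0)

# CD5: mouse and human sequences against the same 62 domains.
cd5_tests = 2 * sum(set_sizes.values())
cd5_fraction_gt2 = 12 / cd5_tests

print("total control comparisons:", total)
print("observed >= 3 SD:", observed_ge3, "expected (normal):", round(expected_ge3, 2))
print("CD5 tests:", cd5_tests, "fraction > 2 SD:", round(cd5_fraction_gt2, 3))
for z in (3, 4, 5):
    print(z, "SD ->", f"{upper_tail(z):.1e}")
```

On these assumptions the 16 control scores of 3 SD or more are well above the roughly one expected from a standard normal distribution, which is in line with the caution in the Fig. 3 legend that the quoted probabilities underestimate the chance result for domains selected in this way.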

Structural evolution in the Ig superfamily

Analysis of the new structures highlights the difficulties in trying to determine a lineage of evolution for the Ig superfamily. It remains likely that the whole superfamily was derived from a primordial single domain (Ref. 39), but multi-domain structures may sometimes share a common ancestor that also had a multi-domain structure, while others may arise independently by gene duplication from one- or two-domain structures. The PDGFR and CSF1R sequences have similar five-domain patterns and probably shared a common ancestor with a similar domain pattern. In contrast, the seven-domain pattern of CEA seems to have been built up by a double gene duplication of a two-domain segment that gave rise to the last six domains of the structure. Domains II, IV and VI are about 70% identical to each other, as are III, V and VII, while between these sets the identities are about 25% (Ref. 11). Domains IV and V are shown in the centrepages. In many cases it will be difficult to determine whether multidomain structures are related by duplication and divergence from a previous multidomain structure or whether they independently evolved by duplication of similar single domains. The concept of duplication and deletion of domains has been extensively discussed with regard to the Ig V-domain families (Ref. 40).

A second puzzle in structural evolution is the selective mechanism for the conservation of sequence patterns as shown in the centrepages. For example, why is the disulphide bond so commonly conserved if it is not essential for a functional conformation? The conserved sequences cannot be a direct result of the biological specificities of the molecules, since the essence of these is that they provide unique recognition specificities that do not cross-react (see below). Selection for resistance to proteolysis may be a factor in maintaining the conserved sequence patterns, since the Ig-related molecules mostly function in environments where proteases are common. The disulphide bond may reduce the time that a domain exists in a conformation that is susceptible to proteolysis and hence maintain the concentration of a cell surface molecule at a level that is functionally effective. Selection for stability may also prevent the sequence of a domain drifting to one that retains the disulphide bond and produces an Ig fold, but otherwise does not show the sequence pattern of the Ig superfamily. Sequences with these properties may be possible, but an evolutionary path from an Ig-related sequence to an unrecognizable alternative may be difficult to achieve without going through a structure that is susceptible to proteolysis.

It seems unlikely that the similarities in sequences could be produced by convergent evolution, since at every β-strand and bend position variants exist that deviate from the patterns that generally occur at each position. Given that such diversity is possible, it is not obvious that similar sequence patterns would have arisen in structures that had converged to a similar folding pattern.

Functional aspects

Antibodies recognize foreign determinants without the involvement of other molecules, but T-cell receptors are restricted to determinants presented in association with MHC antigens. The poly Ig receptor was found to be Ig-related at about the same time that the TcR-Ig relationship was established (Refs 41, 42), and together these findings prompted the idea that recognition within the Ig superfamily of molecules may be a common phenomenon (Ref. 43). This would naturally arise if the superfamily had evolved from cell-surface single domains that were capable of homophilic interaction between cells (Fig. 5) (Refs 36, 43, 44).

The idea that Ig-related molecules commonly recognize each other is strongly reinforced by the new structures. The macrophage Ig Fc receptor has a two-domain structure and an Ig relationship may be expected for other Fc receptors (Refs 9, 10). N-CAM is thought to take part in homophilic adhesion interactions (Ref. 2) mediated by the amino-terminal part of the sequence that consists solely of four Ig-related domains. Po constitutes 50% of the protein in peripheral myelin, and a homophilic interaction is also suggested to occur between its external domains on opposing membrane surfaces when compaction of myelin membrane occurs (Ref. 7). If this is proven correct it will provide a model of considerable interest regarding possible functions of the putative primordial domains. MAG is also thought to have a recognition role in myelin formation but this is not well defined (Refs 4-6). Among the T-cell antigens, CD4 and CD8 are suggested to interact with MHC class II and I molecules as ligands on opposing cells, but this also requires proof. The CD2 antigen, however, is clearly involved in recognition of LFA-3 antigen in adhesion reactions between cells (Ref. 45), and the sequence of LFA-3 will reveal whether or not this represents another case of interactions within the superfamily.

LCA ANTIGEN
EWKIKNKFTCDIQKISYNFR~TPEMKTFALDKHGTL~LHNLTVRTNYTCAA
EVLYNNVILLKQDRRVQTDFGTPEMLPHVQ~KNSTNSTTLVSWAEPASKHH

IL-2 RECEPTOR
LNCECKRGFRRIKSGSLYML,~TGNSSHSS~DNQCQCTSSATRNTTKQVTPQ
PEEQKERKTTEMQSPMQPVDQASLPGH~REPPPWENEATARIYHFVV

HUMAN CD5 ANTIGEN
RLSWYDPDFQARLTRSNSKr~QGQLEVYLKDG.G.~HMVCSQSWGRSSKQWEDPSQ
ASKVCQRLNCGVPLSLGPFLVTYTPQSS
I ILCJYGQLGSFSNCSHSRNDMCHS

Fig. 4. Sequence segments that are unrelated to the Ig superfamily in terms of ALIGN analysis, with Cys and Trp residues (boxed) in positions roughly equivalent to the conserved residues of Ig domains. Sequences are from the partial rat LCA sequence residues 88-189 (NBRF code TDRTL1), human IL-2R residues 47-145 (NBRF code UHHU2) and human CD5 residues 1-103 (Ref. 21). In the CD5 segment there are 19 rather than 20 residues before the first Cys, and an extra residue from the leader was not included in this case in ALIGN analysis (see text and legend to Fig. 3).

PDGFR and CSF1R are not involved in cell-cell interactions; here the molecules act as a trigger for cell growth and differentiation after binding the appropriate growth hormone (Ref. 8). This function is perhaps similar to that of cell surface Ig, although the process of signal transmission may be very different since Ig has a very small cytoplasmic domain while the PDGFR and CSF1R have large internal domains with tyrosine kinase activity. The transmembrane and cytoplasmic parts of the Ig superfamily molecules show enormous diversity, ranging from forms with a glycophospholipid membrane anchor and no cytoplasmic domain (Thy-1 (Ref. 46) and the 120 kDa form of N-CAM (Refs 2, 3)) to structures with very large cytoplasmic protein sequences. CEA is interesting since a hydrophobic carboxy-terminal region is predicted from its cDNA but basic residues as expected in a cytoplasmic domain are absent. CEA is shown as having a protein tail in Fig. 1, but this sequence may in fact be the signal for attachment of a glycophospholipid tail after cleavage of the hydrophobic domain, by analogy with Thy-1 and presumably the N-CAM 120 kDa form.

[Fig. 5 diagram. Panel text, in order:]
Ig-like domain structure for interactions between primitive cells (neural type?). [Cell type 1, cell type 2.]
Duplication and divergence to give A and B domains such that A:A and A:B interactions occur but not B:B. [Cell type 2, type 2.]
Differential gene expression such that cell type 3 expresses only domain B. Cell type 3 can only recognise type 2. [Cell type 2, type 3.]
Duplication and divergence of A:B system for cell:cell recognition and other receptor functions. May include specificities for programmed cell death. Note the A1-An and B1-Bn units may represent single to multiple domain and two chain structures. [Various cell types.]
Specificity of a cell death system changed to incorporate a determinant of a common pathogen (P). Diversification of this system gives the Ig related vertebrate immune system. [Killer or phagocytic cell, target.]

Fig. 5. A speculative scheme for the evolution of an immune system based on Ig superfamily molecules (adapted from Ref. 36).

The function of CEA is unknown but it has a strikingly large number of predicted N-linked glycosylation sites: 28 for the 7 external domains, with six per domain in some cases. In essence it can be argued that the Ig domain is a stable structure which allows presentation of unique determinants for recognition via sequence variation at the bends or on the faces of the β-sheets (Ref. 1). This concept is compatible with the display of carbohydrate structures and these may be the functionally relevant parts of CEA. This possibility has also been considered for Thy-1, which has three N-linked structures on one domain (Ref. 1).

In summary, the new molecules are generally consistent with the concept of structures whose role is to function at cell surfaces to control the movement or differentiation of cells. The link protein strays from this theme but perhaps does not present a large conceptual jump, since basement membrane is itself involved in various interactions with cell surfaces.

Evolution of an immune system

It now seems likely that the Ig superfamily originated from molecules that first evolved to mediate interactions between cells and that the vertebrate immune system developed out of this set of structures. Extensive diversification in the superfamily probably occurred as primitive sensory systems developed, and it is interesting that Thy-1 and Po, the only Ig-related molecules known to exist alone as single-domain structures, are major molecules of neurons and glial cells, respectively. Neuronal cells also express the MRC OX-2 antigen, which has a simple two-domain structure (Ref. 47), and N-CAM, while glial cells express N-CAM and MAG. Genetic linkage brings together structures involved in neural and immune aspects, the genes for Thy-1, N-CAM and the CD3 ε, γ and δ chains all being found on the q23 band of chromosome 11 (Ref. 48). This linkage may identify a chromosomal segment where extensive early duplication of Ig superfamily genes occurred.

A key point in Ig superfamily evolution seems likely to have been the development of heterophilic recognition between related molecules, and a possible scheme for this, beginning with homophilic recognition between single domains, is shown in Fig. 5. Extensive diversification of a heterophilic recognition system would allow sophisticated signalling in cell interactions. However, how can recognition-mediating interactions within an organism be turned outwards to produce an immune system? One possibility is derivation from a cytotoxicity system involved in programmed cell death. In neural differentiation in Caenorhabditis elegans many cells die in an ordered way; usually cells differentiate into a cell type that dies and is phagocytosed, but in some cases one cell appears to kill another before phagocytosis (Refs 49, 50). Such cytotoxic cells could provide a precursor for the vertebrate immune system, and integration of the functional and structural aspects could occur if the specificity of a natural killer cell for programmed cell death was mediated by Ig-related molecules. If this specificity was modified to incorporate a determinant of a common virus or parasite, then the result would be a recognition system with features similar to those of the antigen receptors of T lymphocytes (Fig. 5).

I am most grateful for assistance from: Dr A. Neil Barclay for help with the computer; Dr Francis Marriott, Oxford University Department of Biomathematics, for advice on statistics; Mr Stan Buckingham and Ms Catherine Lee for photography; and Ms Denise Roby for manuscript preparation.

References
1 Williams, A.F. (1982) J. Theoret. Biol. 98, 221-234
2 Cunningham, B.A., Hemperly, J.J., Murray, B.A. et al. (1987) Science 236, 799-806
3 Barthels, D., Santoni, M-J., Wille, W. et al. (1987) EMBO J. 6, 907-914
4 Lai, C., Brow, M.A., Nave, K-A. et al. (1987) Proc. Natl Acad. Sci. USA 84, 4337-4341
5 Salzer, J.L., Holmes, W.P. and Colman, D.R. (1987) J. Cell Biol. 104, 957-965
6 Arquint, M., Roder, J., Chia, L-S. et al. (1987) Proc. Natl Acad. Sci. USA 84, 600-604
7 Lemke, G. and Axel, R. (1985) Cell 40, 501-508
8 Yarden, Y., Escobedo, J.A., Kuang, W-J. et al. (1986) Nature 323, 226-232
9 Lewis, V.A., Koch, T., Plutner, H. and Mellman, I. (1986) Nature 324, 372-375
10 Ravetch, J.V., Luster, A.D., Weinschank, R. et al. (1986) Science 234, 718-725
11 Oikawa, S., Imajo, S., Noguchi, T., Kosaki, G. and Nakazato, H. (1987) Biochem. Biophys. Res. Commun. 144, 634-642
12 Thompson, J.A., Pande, H., Paxton, R.J. et al. (1987) Proc. Natl Acad. Sci. USA 84, 2965-2969
13 Zimmerman, W., Ortlieb, B., Friedrich, R. and von Kleist, S. (1987) Proc. Natl Acad. Sci. USA 84, 2960-2964
14 Martin, L.H., Calabi, F. and Milstein, C. (1986) Proc. Natl Acad. Sci. USA 83, 9154-9158

15 Sewell, W.A., Brown, M.H., Dunne, J., Owen, M.J. and Crumpton, M.J. (1986) Proc. Natl Acad. Sci. USA 83, 8718-8722
16 Williams, A.F., Barclay, A.N., Clark, S.J., Paterson, D.J. and Willis, A.C. (1987) J. Exp. Med. 165, 368-380
17 Gold, D.P., Puck, J.M., Pettery, C.L. et al. (1986) Nature 321, 431-434
18 Johnson, P. and Williams, A.F. (1986) Nature 323, 74-76
19 Clark, S.J., Jefferies, W.A., Barclay, A.N., Gagnon, J. and Williams, A.F. (1987) Proc. Natl Acad. Sci. USA 84, 1649-1653
20 Gold, D.P., Clevers, H., Alarcon, B. et al. (1987) Proc. Natl Acad. Sci. USA (in press)
21 Huang, H-J.S., Jones, N.H., Strominger, J.L. and Herzenberg, L.A. (1987) Proc. Natl Acad. Sci. USA 84, 204-208
22 Ishioka, N., Takahashi, N. and Putnam, F.W. (1986) Proc. Natl Acad. Sci. USA 83, 2363-2367
23 Bonnet, F., Perin, J-P., Lorenzo, F., Jollès, J. and Jollès, P. (1986) Biochim. Biophys. Acta 873, 152-155
24 Neame, P.J., Christner, J.E. and Baker, J.R. (1986) J. Biol. Chem. 261, 3519-3535
25 Amzel, L.M. and Poljak, R.J. (1979) Annu. Rev. Biochem. 48, 961-997
26 Edmundson, A.B., Ely, K.R., Abola, E.E., Schiffer, M. and Panagiotopoulos, N. (1975) Biochemistry 14, 3953-3961
27 Beale, D. and Feinstein, A. (1976) Q. Rev. Biophysics 9, 135-180
28 Dayhoff, M.O., Barker, W.C. and Hunt, L.T. (1983) Meth. Enzymol. 91, 524-545

29 Lesk, A.M. and Chothia, C. (1982) J. Mol. Biol. 160, 325-342
30 Williams, A.F. (1984) Immunol. Today 5, 219-221
31 Schiffer, M., Wu, T.T. and Kabat, E.A. (1986) Proc. Natl Acad. Sci. USA 83, 4461-4463
32 Rudikoff, S. and Pumphrey, J.G. (1986) Proc. Natl Acad. Sci. USA 83, 7875-7878
33 Becker, J.W. and Reeke, G.N. (1985) Proc. Natl Acad. Sci. USA 82, 4225-4229
34 Cohen, F.E., Novotny, J., Sternberg, M.J.E., Campbell, D.G. and Williams, A.F. (1981) Biochem. J. 195, 31-40
35 Littman, D.R. and Gettner, S.N. (1987) Nature 325, 453-455
36 Barclay, A.N., Johnson, P., McCaughan, G.W. and Williams, A.F. The T-cell Receptor (Mak, T., ed.), Plenum Press (in press)
37 Maddon, P.J., Littman, D.R., Godfrey, M. et al. (1985) Cell 42, 93-104
38 Classon, B.J., Tsagarotos, J., McKenzie, I.F.C. and Walker, I.D. (1986) Proc. Natl Acad. Sci. USA 83, 4499-4503
39 Hill, R.L., Delaney, R., Fellows, R.E. and Lebovitz, H.E. (1966) Proc. Natl Acad. Sci. USA 56, 1762-1769
40 Hood, L., Eichmann, K., Lackland, H., Krause, R.M. and Ohms, J.J. (1970) Nature 228, 1040-1049
41 Mostov, K.E., Friedlander, M. and Blobel, G. (1984) Nature 308, 37-43
42 Hedrick, S.M., Cohen, D.I., Nielsen, E.A. and Davis, M.M. (1984) Nature 308, 149-153
43 Williams, A.F. (1984) Nature 308, 12-13
44 Williams, A.F., Barclay, A.N., Clark, M. and Gagnon, J. (1985) Proc. Sigrid Juselius Symp. 125-138
45 Selvaraj, P., Plunkett, M.L., Dustin, M. et al. (1987) Nature 326, 400-403
46 Tse, A.G.D., Barclay, A.N., Watts, A. and Williams, A.F. (1985) Science 230, 1003-1008
47 Clark, M.J., Gagnon, J., Williams, A.F. and Barclay, A.N. (1985) EMBO J. 4, 113-118
48 Gold, D.P., van Dongen, J.J.M., Morton, C.C. et al. (1987) Proc. Natl Acad. Sci. USA 84, 1664-1668
49 Horvitz, H.R., Ellis, H.M. and Sternberg, P.W. (1982) Neurosci. Commun. 1, 56-65
50 Hedgecock, E.M., Sulston, J.E. and Thompson, J.N. (1983) Science 220, 1277-1279

