Biomed & Pharmacother (1992) 46, 343-351
© Elsevier, Paris

Critical sites: a semantic approach to protein sequences. Application to the HIV-1 envelope molecule

JF Zagury¹, H Cantalloube², J Bernard¹, A Lachgar¹, L Fall¹, A Achour¹, B Bizzini³, E Thoreau⁴, JP Mbika¹, MH Cosme¹, F Pellion¹, W Issing¹, C Carelli¹, I Callebaut⁴, A Burny⁵, JP Mornon⁴, D Zagury¹

¹Laboratoire de Physiologie Cellulaire, Université Pierre et Marie Curie, 4 place Jussieu (Tour 32), 75003 Paris; ²Office National d'Études et de Recherches Aérospatiales, Châtillon; ³Institut Pasteur, Paris; ⁴Universités P6/P7, CNRS URA09, Paris, France; ⁵Université libre de Bruxelles, Brussels, Belgium

(Received 29 June 1992; accepted 31 July 1992)

Summary - We have designed two software systems allowing the study of proteins through comparison with those stored in data banks. The first one, “Automat”, locates in a systematic manner all identities shared by a given protein and the proteins in a data bank. The second, “Critic”, enables the selection of specific segments in a given molecule by comparing them with those gathered in a data bank. These sites were termed “critical” since they mostly correspond to functional sites (active sites) of the well-known proteins studied with this program (somatostatin, insulin, IL2, etc). Automat allowed us to reveal similarities between HIV-1 and CD4 which had remained undetected until now. These similarities proved to be critical sites of CD4 (according to Critic). The putative involvement of these sites in the physiopathological processes induced by HIV-1 is worth considering, since the results of our experiments are consistent with this assumption.

Key words: HIV / automat / critic / protein

Résumé - Critical sites: a semantic approach to protein sequences. Application to the HIV-1 envelope molecule. We have written two software programs that make it possible to deduce information about proteins from sequence data banks. The first, Automat, systematically reveals the sequence similarities that may exist between a protein and all the proteins of a data bank. The second, Critic, identifies particular sites in a given protein by comparing its sequence with all those in the data bank. We call these sites “critical” because they generally correspond to the functional sites (active sites) of the known proteins that we analyzed with this program (somatostatin, insulin, IL2, etc). In particular, Automat revealed similarities between HIV-1 and CD4 that had not been detected until then. These similarities turn out to be critical sites of CD4 (according to Critic). It is justified to consider that these sites may participate in the physiopathological processes induced by HIV-1, as suggested by the preliminary results of an experimental study.

Key words: HIV / automat / critic / protein

Introduction

Protein sequences are at present mostly determined by DNA sequencing. The number of known protein sequences is steadily increasing: from 15 000 in 1989, it reached 45 000 at the end of 1991 in the MIPS data bank. To exploit this abundance of information, fast and efficient analytic and predictive programs are required. The programs presently available either use small sets of well-known proteins to ascribe defined values to the various amino acid species on the basis of various criteria, or work on a comparison (alignment) basis [3, 5, 8, 10, 13, 15]. However, none of these programs can take advantage of the whole information stored in a data bank to study a particular protein. As a consequence, much information is lost. This limitation prompted us to develop a novel approach that can exploit all the available information when studying a single protein; the software we have designed is called “Critic”. This system was able to locate important functional sites in the proteins we tested. In our search for segments of viral proteins similar to segments of proteins in the data bank, we also designed another software system, “Automat”, which, unlike the Fastp program [12], finds similitudes between protein sequences in a systematic manner. Following this computer investigation, HIV-1 peptides selected for their similarity with CD4 were tested experimentally to evaluate their role in T4 cell activation.

Materials and methods

Software systems

We worked on a VAX-VMS system. The software Critic is based on a pre-screening of the data bank by the FASTP program designed by Lipman and Pearson [12]. We mostly worked with a reduced, non-redundant version of the MIPS library made in Dr Mornon's laboratory, designated RMIPS. FASTP gathers a list of alignments between RMIPS proteins and a given protein, ranked according to a score judging the quality of the alignment. The FASTP parameters were standard, with 2-tuples and the Dayhoff matrix [12].

“Critic” and “Automat” software systems

Critic first ascribes a value of 0 to each amino acid of a tested sequence and then screens the alignments pre-selected by FASTP (we chose the first 600 to 1000 alignments depending on the size of the sequence studied, which usually corresponded to an “initial score” window of 30 to 70). Each time an amino acid of the tested sequence participates in a significant match, Critic increments the value associated with this amino acid. This leads to a histogram such as that shown in figure 1. A significant match is defined as follows: N being a chosen integer, it is a succession of N or more consecutive identities in the protein alignments screened by Critic. The higher N is chosen, the more stringent the selection. “Automat” directly screens the whole data bank for significant matches with a given protein. It works systematically, and all matches in the bank sequences are picked up. However, it is possible to define classes of equivalent amino acids so that slight differences are also retrieved (eg Asp = Glu).
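The Critic scoring step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' VAX-VMS implementation: the FASTP pre-selection is assumed to have been done elsewhere, and each retained alignment is assumed to arrive as a hypothetical `(query_start, aligned_query, aligned_subject)` triple of gapped strings using `-` for gaps. The function name `critic_histogram` is illustrative.

```python
from typing import List, Tuple

# Hypothetical input format (not FASTP's own output format):
# (0-based start of the alignment in the query, gapped query, gapped subject)
Alignment = Tuple[int, str, str]


def critic_histogram(query_length: int, alignments: List[Alignment], n: int = 3) -> List[int]:
    """For each query residue, count the alignments in which it lies inside a
    run of n or more consecutive identities (a "significant match")."""
    counts = [0] * query_length  # every residue starts with the value 0

    def flush(run: List[int]) -> None:
        # Credit the residues of a finished run only if the run is long enough.
        if len(run) >= n:
            for pos in run:
                counts[pos] += 1

    for start, q_aln, s_aln in alignments:
        if len(q_aln) != len(s_aln):
            raise ValueError("gapped query and subject must have equal length")
        q_pos = start            # query index of the current column
        run: List[int] = []      # query indices of the current identity run
        for q_char, s_char in zip(q_aln, s_aln):
            if q_char != "-" and q_char == s_char:
                run.append(q_pos)   # identity: extend the run
            else:
                flush(run)          # a mismatch or a gap ends the run
                run = []
            if q_char != "-":
                q_pos += 1          # gaps in the query do not consume residues
        flush(run)                  # a run may reach the end of the alignment
    return counts


# Toy alignments against a 9-residue query (made-up data, for illustration only)
alignments = [
    (0, "ACDEF", "ACDQF"),   # run ACD (3 identities) counts; F alone does not
    (2, "DEFGH", "DEFGH"),   # one run of 5 identities
    (4, "FG-HI", "FGWHI"),   # runs of 2 on each side of the gap: ignored for n = 3
]
print(critic_histogram(9, alignments, n=3))  # [1, 1, 2, 1, 1, 1, 1, 0, 0]
```

Raising `n` makes the counting more stringent, as stated in the text: with `n = 2` the two short runs of the third alignment would also be credited.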

Results

General properties of Critic

Figure 1 presents the curves obtained with somatostatin (SOM: 116 amino acids) for N = 1, 2, 3 or 4. A limited number of clear-cut peaks was observed only with N ≥ 3, and this was the case for all the proteins we tested. Consequently, N = 3 was selected because of its apparently higher biological significance. On average, one or two major peaks per 100 amino acids occurred in the tested cases.

Critical sites and secondary structure

We tested our program on several proteins taken from the Protein Data Bank. The secondary structure of these proteins was assigned from the PDB coordinates using the programs DSSP [7], Define [17] and P-curve [22]. There was no specific correlation between the “critical” sites and any of the three main secondary structures: helix, beta strand or loop.

Application to well-known proteins

The active sites of many proteins have been determined either by mutagenesis experiments or through crystallographic studies with their ligands. The proteins considered in the present investigation are somatostatin, insulin, alpha interferon (αIFN), IL2, HIV tat and the BLV envelope gp51.

Somatostatin

The active site has been identified [23] as corresponding to the peak region of figure 1 (AA 108-111).

IL2

The binding region to the IL2 receptor has been located on the first helix of IL2, from residues 11 to 19 [24]. The peak is appropriately located at residues 17 to 19 (fig 2A).

Insulin

The insulin receptor binding site comprises mostly residues 23-25 [1], which also constitute the major peak site (fig 2B).

Fig 1. Histograms obtained by the program “Critic” (with the library RMIPS) for somatostatin with various values of the parameter N: A. N = 1, B. N = 2, C. N = 3, D. N = 4.

Fig 2. Histograms obtained by the program Critic (with the library RMIPS, and N = 3) for the following molecules: A. IL2, B. Insulin, C. Tat.

Tat

This main regulatory molecule of HIV has been thoroughly investigated by mutagenesis. Tat includes two main active regions: one for nuclear localization, the other for RNA binding [21]. These regions correspond to the two major peaks that we have observed (AA 27-32 and AA S-60) (fig 2C).

αIFN

The active suppressive site has been shown to correspond to known suppressive regions of retroviruses, especially HIV [19]. The peak (AA 122-125) falls just at the end of the suppressive site (fig 3A).

BLV gp51

A single peak (fig 3B) was found next to a main neutralizing region of the BLV envelope, as determined by neutralization assays (I Callebaut, personal communication). In general, neutralizing regions are biologically important regions for viral functions.

We have also successfully processed other proteins, such as IL1-beta, uteroglobin, trypsin, and somatotropin and its receptor. The critical sites determined correspond to known biologically active sites of these molecules or lie less than 4 amino acids away from them (presumptive role). In addition, proteins belonging to the same family or displaying parallel functions, such as glucocorticoid and corticosteroid receptors, apparently yield similar peaks at locations suspected to be binding sites.

Influence of the Fastp parameters

The Fastp parameters are mainly the library, the matrix and the score (“initial” or “optimized”). We worked both on the whole MIPS library and on its reduced non-redundant version RMIPS: changing the library influenced the curves in a few instances. We tested two matrices: the evolutionary matrix of Dayhoff and the matrix set by Risler [18]. However, at present, our program only gives results correlating with biological data when the Dayhoff evolutionary matrix is used. Pre-selecting alignments on the basis of the “optimized score” given by Fastp did not yield peaks correlating with biological data either. Changing the “initial score” window led to some modifications of the peaks, and we are now trying to evaluate the significance of such variations.

Statistical validity

Taking into consideration only those proteins whose active sites have been unambiguously identified experimentally, we performed the following computation. Let us suppose, as in the case of the tat molecule, that there are two functionally important sites in every 80 amino acids.
The probability that at least one of these two biologically active sites falls randomly within the major peak given by “Critic” (which spans 10 amino acids) is $1-(70/80)^2$, which is less than 0.25. The probability of such a random event occurring in 10 molecules would then be less than $(0.25)^{10}$, ie less than one in one million. Since in a few cases there are two equivalent major peaks, we can change the random probability of having one of the two biologically important sites within one of the two peaks to $1-(60/80)^2$ (which is less than 0.5) for one molecule. Under these conditions, this would give a probability of less than $(0.5)^{10}$, ie less than one in 1000, for ten molecules. These values support the non-random basis of our approach. In addition, it is worth mentioning that the frequency in the data bank of the peptides defined as “critical” is not higher than that of other randomly chosen peptides of the same size.
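The arithmetic of this estimate can be checked with a few lines of Python. The sketch assumes, as the argument above implicitly does, that the two active sites are placed independently and uniformly along the 80 residues (so each one misses a 10-residue peak with probability 70/80), that two peaks do not overlap (20 residues covered), and that the ten molecules are independent. The function name `p_at_least_one_hit` is illustrative.

```python
def p_at_least_one_hit(sites: int, length: int, window: int) -> float:
    """Probability that at least one of `sites` independent, uniformly placed
    positions falls inside a window of `window` residues out of `length`."""
    p_miss_one = (length - window) / length   # eg 70/80 for a 10-residue peak
    return 1 - p_miss_one ** sites


one_peak = p_at_least_one_hit(sites=2, length=80, window=10)   # 1 - (70/80)^2
two_peaks = p_at_least_one_hit(sites=2, length=80, window=20)  # 1 - (60/80)^2

print(f"one peak:  {one_peak:.4f} per molecule, {one_peak ** 10:.2e} for ten")
print(f"two peaks: {two_peaks:.4f} per molecule, {two_peaks ** 10:.2e} for ten")
```

The exact per-molecule values (15/64 ≈ 0.234 and 7/16 ≈ 0.438) lie below the rounded bounds of 0.25 and 0.5 used in the text, so $(0.25)^{10} \approx 9.5 \times 10^{-7}$ and $(0.5)^{10} \approx 9.8 \times 10^{-4}$ remain conservative upper bounds.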

Application to HIV-1

HIV-1, as a retrovirus, may have picked up important peptidic sites ensuring its survival [4] in the course of its evolution. The HIV-1 envelope (Env) shares homologous segments with immune regulatory molecules such as αIFN and IL2. The IL2 similitude (LERILL, residues 856-861 of HIV-1 and 14-19 of IL2) has been reported previously; it is located on the IL2 receptor binding site [16]. We have ourselves detected a similitude with αIFN (ILAVERY, residues 585-591 of HIV-1 and 118-124 of αIFN), which is located on a segment of HIV-1 known to be antiproliferative [19]. According to the Critic program, all these segments are critical sites in the IL2 and αIFN proteins. The program “Automat” also enabled us to identify segments common to both the HIV-1 envelope and the CD4 receptor: SLWDQ (residues 110-114 in Env and residues 60-64 in CD4) and CTASQK (residues 28-33 in Env and residues 16-21 in CD4). The first segment appeared to be perfectly conserved in all HIV-1 strains, and the second well conserved in only a few strains. Interestingly, SIVcpz, known to be closely related to HIV-1 [6], also carries the SLWDQ segment. These sites are critical for the CD4 molecule, the first with the RMIPS library (fig 3C), the second with the full MIPS library (not shown).
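A systematic, exhaustive search of the kind Automat performs can be illustrated by scanning every diagonal of a query-versus-subject comparison and reporting each maximal run of at least `min_len` matching residues, with optional classes of equivalent amino acids. This is a sketch under assumptions, not the authors' code: the brute-force diagonal scan, the function names, the choice `min_len = 5` and the toy subject sequence are all illustrative (the subject string is made up and is not the real CD4 sequence). The query is peptide 1 of Table I (env 105-117).

```python
from typing import Dict, List, Optional, Tuple

# Illustrative equivalence class from the text (Asp = Glu): E is read as D.
ASP_GLU: Dict[str, str] = {"E": "D"}


def normalize(seq: str, classes: Dict[str, str]) -> str:
    """Replace each residue by the representative of its equivalence class."""
    return "".join(classes.get(aa, aa) for aa in seq.upper())


def shared_segments(query: str, subject: str, min_len: int = 5,
                    classes: Optional[Dict[str, str]] = None) -> List[Tuple[int, int, str]]:
    """Return every maximal run of >= min_len matching residues on any diagonal
    as (query_start, subject_start, query_segment), with 0-based starts."""
    q = normalize(query, classes or {})
    s = normalize(subject, classes or {})
    hits: List[Tuple[int, int, str]] = []
    for offset in range(-(len(q) - 1), len(s)):   # offset = subject index - query index
        i = max(0, -offset)                        # first query index on this diagonal
        run_start: Optional[int] = None
        while True:
            j = i + offset
            matched = i < len(q) and j < len(s) and q[i] == s[j]
            if matched and run_start is None:
                run_start = i                      # a run begins
            elif not matched and run_start is not None:
                if i - run_start >= min_len:       # a run ends: keep it if long enough
                    hits.append((run_start, run_start + offset, query[run_start:i]))
                run_start = None
            if i >= len(q) or j >= len(s):
                break                              # walked off the end of the diagonal
            i += 1
    return hits


env_peptide = "HEDIISLWDQSLK"   # peptide 1 of Table I (env 105-117)
toy_subject = "AAKRSLWDQGNA"    # made-up sequence containing SLWDQ
print(shared_segments(env_peptide, toy_subject))           # [(5, 4, 'SLWDQ')]
print(shared_segments("DEKLM", "EEKLM", classes=ASP_GLU))   # [(0, 0, 'DEKLM')]
```

Because every diagonal is visited, no shared segment of the required length can be missed, which is the property the text contrasts with Fastp's heuristic search. The scan is quadratic per sequence pair; indexing all N-mers of the bank would be a natural speed-up for a whole data bank, but that is our assumption rather than a description of Automat.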

Fig 3. Histograms obtained by the program “Critic” (with the library RMIPS, and N = 3) for the following molecules: A. αIFN, B. BLV gp51, C. CD4 molecule.

In the 3-D structure of the CD4 molecule, these sites are readily accessible at the surface of the molecule (fig 4).

Fig 4. 3-D representation of the CD4 molecule obtained from the PDB coordinates [25]. B1: first loop, containing residues 17-21 (TASQK), suspected to interact with the SEB antigen or MHC. B2: third loop, containing residues 60-64 (SLWDQ), suspected to interact with MHC. B3: second loop, containing residues 41-45, suspected to interact with HIV-1 gp120 [20].

Experimental procedure

We designed experiments to evaluate the capacity of HIV-1 and CD4 peptides to interfere with the process of T4 cell immune activation. For this purpose, purified HLA-DR-negative T cells from normal human blood were stimulated with autologous macrophages presenting PPD or SEB antigens in the presence of various peptides. T-cell proliferation was assessed by 3H-thymidine incorporation. The peptides tested were HIV-1-derived peptides, some of which contained the SLWDQ segment or the CD4-binding region, and CD4 peptides surrounding the CTASQK region of CD4.

As shown in table I, peptides containing SLWDQ inhibited T-cell proliferation in the presence of both SEB and PPD, as did the classically defined CD4-binding region [9] (although to a lesser extent), whereas the CD4 peptides related to the CTASQK segment inhibited only the SEB-induced stimulation, and the other HIV-1 control peptides had no effect. These results are consistent with a role for these sites, as predicted by our program.

Discussion

This study was carried out to determine whether the immune cell suppression (cytostasis) observed in HIV-1 infection and AIDS [26] is induced by peptidic segments of HIV-1 proteins, as suggested by experiments showing the active suppressive effect of the gp120 Env protein [14] and of gp41 peptides [19]. For this purpose, we have introduced the program “Critic”, which defines “critical sites”, and the program “Automat”, which systematically locates all similitudes between protein sequences. “Critic” deals with sequences and is not suited to detecting fully functional sites involving conformational (three-dimensional) interactions. The critical sites we have described are mostly ligand-binding sites (for DNA or proteins). This is not surprising, considering that ligand-receptor binding often occurs at specific segments (sites) of proteins and that the Fastp pre-selection uses the Dayhoff evolutionary matrix, which is based on proteins with binding properties. We have observed that the matrix and the library can affect the shape of our curves; we are now working with other pre-selection matrices and more specific libraries (eg human sequences or enzymatic sequences) with the aim of improving our results. It is noteworthy that in rare instances, suspected functional sites of a tested protein, hypothesized but not experimentally determined, did not correspond to a “critical site”. For example, “Critic” did not detect the presumptive active site of IL8 [2], but on the 3-D representation the critical site lies on an exposed loop. This apparent discrepancy could be explained by the involvement of more than one site in the molecular function, as is the case for

CD4 (fig 4) or other molecules involving conformational interactions. Should the concept underlying our computer approach be confirmed, this would mean that there is an internal logic relating protein sequences which allows important functional sites to be determined. This would fit the experimentally deduced concept that the structures and properties of naturally produced polypeptides are determined by their primary structures [11]. Hence, if proteins are like sentences and amino acids are the alphabet, our approach enables us to determine the words (vocabulary). This semantic approach allowed us to select, among the numerous peptidic sites raised by “Fastp” and “Automat”, those which are putatively involved in biological functions. “Automat”, in contrast to the Fastp program, which gathers such similarities in a non-systematic manner, systematically locates all existing similitudes between proteins (but more stringently than Fastp). Its application to HIV-1 proteins allowed us to bring to light important and, until now, undetected similitudes between HIV-1 Env and CD4.

The biological investigation of this study was aimed at assessing the role of the newly selected HIV-1 peptides related to CD4 in T cell activation. These experiments partially allow the molecular dissection of the CD4-MHC interaction. The SLWDQ segment appears to play a major role during T cell activation, probably through the binding of CD4 to MHC, while the CTASQK site of CD4 seems to disturb the presentation of SEB antigens (and not PPD), perhaps by blocking a direct interaction between SEB and CD4 or between SEB and MHC.

Together with the other known HIV-1 Env segments [16, 19], these sites may play a role in the immunosuppressive effects of HIV-1. These sites are lacking in HIV-2, which is less pathogenic. The fact that HIV-1 peptides showing similarities with immune regulatory factors inhibit T cell proliferation in vitro helps to explain the cellular immune defect (cytostasis) associated with HIV-1 infection and AIDS. It also provides a rationale for the new strategy of an anti-AIDS vaccine directed against immunosuppressive factors [26].

Table I. Peptides inhibiting T-cell activation. PPD- and SEB-sensitized macrophages were co-cultured with autologous HLA-DR-negative T cells with or without HIV-1 peptides at an optimal concentration of 300 µg/ml, and 150 µg/ml for each peptide of the CD4 mix. T-cell proliferation was measured by thymidine incorporation. Env peptides are from HIV-1 strain LAI. Peptides 1 and 2 contain the SLWDQ segment, peptides 3a and 3b are the mixed CD4 peptides, peptide 4 is the HIV-1 CD4-binding peptide, and peptides 5 and 6 are control peptides. ++: strong inhibition (> 50%); +: slight inhibition (< 30%); -: no inhibition.

Peptide      Sequence           Origin              Inhibition of T-cell proliferation
                                                    PPD   SEB
1            HEDIISLWDQSLK      env 105-117         ++    ++
2            SLWDQSLKPCVKLTPL   env 110-125         ++    ++
3a + 3b      TASQK + KSIQ       CD4 17-21 + 22-25   -     ++
4            NMWQEVGKAMYA       env 430-441         +     ++
5            KAKRRVVQREKRAVG    env 505-519         -     -
6            SSGGDPEIVTHSFNC    env 369-383         -     -
No peptide                                          -     -

References

1 Brange J, Ribel U, Hansen JF, Dodson G, Hansen MT, Havelund S, Melberg SG, Norris F, Norris K, Snel L, Sorensen AR, Voigt HO (1988) Monomeric insulins obtained by protein engineering and their medical implications. Nature 333, 679
2 Clore GM, Appella E, Yamada M, Matsushima K, Gronenborn AM (1990) Three-dimensional structure of interleukin 8 in solution. Biochemistry 29, 1689
3 Feng DF, Johnson MS, Doolittle RF (1985) Aligning amino acid sequences: comparison of commonly used methods. J Mol Evol 21, 112
4 Fields BN (1990) Chapter Retroviridae. Virology (2nd edition). Raven Press Ltd, New York
5 Hopp TP, Woods KR (1981) Prediction of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci USA 78, 3824
6 Huet T, Cheynier R, Meyerhans A, Roelants G, Wain-Hobson S (1990) Genetic organization of a chimpanzee lentivirus related to HIV-1. Nature 345, 356
7 Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577
8 Kidera A, Konishi Y, Ooi T, Scheraga HA (1985) Statistical analysis of the physical properties of the 20 naturally occurring amino acids. J Protein Chem 4, 265
9 Lasky LA, Nakamura G, Smith DH, Fennie C, Shimasaki C, Patzer E, Berman P, Gregory T, Capon DJ (1987) Delineation of a region of the human immunodeficiency virus type 1 gp120 glycoprotein critical for interaction with the CD4 receptor. Cell 50, 975
10 Lemesle-Varloot L, Henrissat B, Gaboriaud C, Bissery V, Morgat A, Mornon JP (1990) Hydrophobic cluster analysis: procedures to derive structural and functional information from 2-D-representation of protein sequences. Biochimie 72, 555
11 Lim VI (1978) Polypeptide chain folding through a highly helical intermediate as a general principle of globular protein structure formation. FEBS Lett 89, 10
12 Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227, 1435
13 Nakai K, Kidera A, Kanehisa M (1988) Cluster analysis of amino acid indices for prediction of protein structure and function. Protein Eng 2, 93
14 Oyaizu N, Chirmule N, Kalyanaraman VS, Hall WW, Pahwa R, Shuster M, Pahwa S (1990) Human immunodeficiency virus type 1 envelope glycoprotein gp120 produces immune defects in CD4+ T lymphocytes by inhibiting interleukin 2 mRNA. Proc Natl Acad Sci USA 87, 4022
15 Pongor S (1987) The use of structural profiles and parametric sequence comparison in the rational design of polypeptides. Methods Enzymol 154, 450
16 Reiher WE, Blalock JE, Brunck TK (1986) Sequence homology between acquired immunodeficiency syndrome virus envelope protein and interleukin-2. Proc Natl Acad Sci USA 83, 9188
17 Richards FM, Kundrot C (1988) Identification of structural motifs from protein coordinate data: secondary structure and first-level supersecondary structure. Proteins 3, 71
18 Risler JL, Delorme MO, Delacroix H, Henaut A (1988) Amino acid substitutions in structurally related proteins. A pattern recognition approach. J Mol Biol 204, 1019
19 Ruegg CL, Monell CR, Strand M (1989) Inhibition of lymphoproliferation by a synthetic peptide with sequence identity to gp41 of HIV-1. J Virol 63, 3257
20 Silherman SL, Goldman SJ, Mitchell DB, Tong AT, Rosenstein Y, Diamond DC, Finberg RW, Schreiber SL, Burakoff SJ (1991) The interaction of CD4 with HIV-1 gp120. Sem Immunol 3, 187
21 Siomi H, Shida H, Maki M, Hatanaka M (1990) Effects of a highly basic region of HIV-1 tat protein on nucleolar localization. J Virol 64, 1803
22 Sklenar H, Etchebest C, Lavery R (1989) Describing protein structure: a general algorithm yielding complete helicoidal parameters and a unique overall axis. Proteins 6, 46
23 Tran VT, Flint Beal M, Martin JB (1985) Two types of somatostatin receptors differentiated by cyclic somatostatin analogs. Science 228, 492
24 Thomson A (1991) Chapter IL2. The Cytokine Handbook. Academic Press, New York
25 Wang J, Yan Y, Garrett TP, Liu J, Rodgers DW, Garlick RL, Tarr GE, Husain Y, Reinherz EL, Harrison SC (1990) Atomic structure of a fragment of human CD4 containing two immunoglobulin-like domains. Nature 348, 411
26 Zagury D, Bernard J, Halbreich A, Bizzini B, Carelli C, Achour A, Defer MC, Bertho JM, Lanneval K, Zagury JF, Salaun JJ, Lurhuma Z, Mbayo K, Aboud-Pirak E, Lowell G, Lebon P, Burny A, Picard G (1992) One year follow-up of vaccine therapy in HIV-infected immune deficient individuals: a new strategy. J AIDS 5, 676-681

Critical sites: a semantic approach to protein sequences. Application to the HIV-1 envelope molecule.
