Proc. Natl. Acad. Sci. USA Vol. 88, pp. 6800-6804, August 1991 Immunology

Prediction of optimal peptide mixtures to induce broadly neutralizing antibodies to human immunodeficiency virus type 1 (AIDS/vaccines/combinatorial optimization)

L. HOWARD HOLLEY*†, JAAP GOUDSMIT‡, AND MARTIN KARPLUS*

*Department of Chemistry, Harvard University, Cambridge, MA 02138; †IBM Cambridge Scientific Center, 101 Main Street, Cambridge, MA 02142; and ‡Human Retrovirus Laboratory, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands

Contributed by Martin Karplus, April 22, 1991

ABSTRACT Sequences of the principal neutralizing determinant (PND) of the external envelope protein, gp120, from 245 isolates of human immunodeficiency virus type 1 are analyzed. The minimal set of peptides that would elicit antibodies to neutralize a majority of U.S. and European isolates of human immunodeficiency virus type 1 is determined with the assumption that peptides of a given length including the central Gly-Pro-Gly triad are required. In spite of the hypervariability of the PND, 90% of these 245 sequences include peptides from a set of 7 pentapeptides, 13 hexapeptides, or 17 heptapeptides. Tests of these peptide sets on 78 additional PND sequences show that 95% are covered by the 7 pentapeptides, 94% by the 13 hexapeptides, and 86% by the 17 heptapeptides. To anticipate variants not yet observed, single amino acid mutation frequencies from the 245 isolates are used to calculate an expanded set of the 10,000 most probable PND sequences. These sequences cover 86% of the total distribution expected for the central portion of the PND. Peptide lists derived from this expanded set when tested on the 78 additional sequences show that 7 pentapeptides cover 95%, 13 hexapeptides cover 94%, and 17 heptapeptides cover 94%. These results suggest that peptide cocktails of limited size with the potential to cover a large fraction of PND sequence variation may be feasible vaccine candidates.

The principal binding site for human immunodeficiency virus type 1 (HIV-1)-neutralizing antibodies, the "principal neutralizing determinant" (PND), is contained in an approximately 36 amino acid disulfide-crossbridged loop (residues 303-338) in the third hypervariable domain of the external envelope protein, gp120 (1-4). Polyclonal antibodies, elicited by PND peptides, as well as PND-binding monoclonal antibodies, neutralize free-virus infectivity and prevent fusion of virus-infected lymphocytes with uninfected CD4-bearing cells (1, 2, 4-6). HIV-infected mothers with high-affinity antibodies to the PND are less likely to transmit HIV to neonates (7). Recent chimpanzee immunization/challenge experiments (8-10) suggest that PND-directed neutralizing antibodies prevent infection after intravenous virus inoculation. Thus, the production of a PND-directed antibody response may be one possible route to HIV-1 vaccines. Sequences of the PND from 245 HIV-1 isolates, selected largely from United States donors, have shown that, in spite of the hypervariability of the PND, the amino acid variability is limited at most positions (11). Further, a neural network analysis suggests that the PND has a conserved secondary structure motif (11). It consists of a β-strand, a type II turn, a β-strand, and an α-helix; the β-strand, type II turn, β-strand sequence may form a β-hairpin. A key portion of this motif is the turn tripeptide GPG (the standard one-letter symbols for amino acid residues are used throughout), which is contained in 92% of the 245 sequences. Epitope mapping in two virus strains shows that neutralizing antibodies are elicited by peptides containing this GPG and adjacent upstream sequences in one isolate and GPG and adjacent downstream sequences in another isolate (12). The hexapeptide GPGRAF occurs in 60% of the 245 HIV-1 isolates sequenced (11). Polyclonal antisera elicited by the synthetic peptide (GPGRAF)3C neutralize 4 of 4 HIV-1 isolates that contain that sequence at the tip of the loop and do not neutralize 3 of 3 isolates lacking this sequence (13). The 4 neutralized isolates have various sequences flanking the GPGRAF hexapeptide.

These observations suggest that it is worth examining the possibility of constructing vaccine candidates for HIV-1 from a cocktail of peptide immunogens involving portions of the PND. Because of the variability of the PND, it is important to determine the minimal number of peptides of given length that is required to include the sequences of a certain fraction of the known isolates. An analysis of this type can help in determining whether such a general cocktail is likely to be small enough not to overtax the immune system. This is a general problem in the development of vaccines, since the most antigenic regions of viruses are often the most variable. Because the GPG-containing tip of the PND is the binding site of neutralizing antibodies (12), only peptides that include this region are considered. Antibodies elicited by peptides of five, six, and seven amino acids have been shown to provide sequence-specific recognition of linear epitopes (14-17). The recently published structure for a cocrystal of a peptide antigen-antibody complex shows side-chain-specific interactions over a span of seven residues (18). This structure may be especially relevant since the peptide has a type II turn in the antibody binding site, which is the structure predicted for the central region of the PND (11).

In this report we present the results for such vaccine cocktails obtained from the central portion of the PND and based upon the 245 sequences previously reported (11); transcription errors in 9 of these sequences have been corrected in the present analysis. As we demonstrate, the problem of finding the smallest set of peptides is NP-complete (19, 20); therefore, the time to compute an exact solution grows exponentially with the number of peptide candidates. We display two approximate methods for solving the problem. One is a "greedy" algorithm (21) and the other makes use of simulated annealing (22). Comparison of the results from the two methods shows good, but not exact, agreement. Given the resulting cocktails, we test the coverage provided against a set of 78 European sequences not used in the original analysis. Finally, we construct an expanded sequence base of 10,000 sequences and demonstrate that small cocktails with reasonable coverage can still be constructed.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Abbreviations: HIV-1, human immunodeficiency virus type 1; PND, principal neutralizing determinant.


Immunology: Holley et al.


METHODS We can formulate the problem of choosing the minimum number of peptides as follows: MINIMUM PEPTIDES. Given a peptide length L and a set of amino acid sequences (e.g., the 245 PND sequences), find the smallest set of peptides, each of length L, such that at least one peptide of the set is contained within each sequence.
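To make the Minimum Peptides formulation concrete, the following Python sketch enumerates candidate peptides for one sequence and measures how much of a sequence collection a given peptide set covers. It is an illustration only: the function names, the `triad_start` parameter, and the example sequences are assumptions introduced here, and the sketch assumes that a peptide "overlaps the central GPG positions" when its window contains all three triad positions.

```python
from typing import Iterable, Optional, Set


def candidate_peptides(seq: str, length: int,
                       triad_start: Optional[int] = None) -> Set[str]:
    """Return the distinct windows of `seq` that are `length` residues long.

    If `triad_start` (0-based index of the first triad residue) is given,
    keep only windows that contain all three triad positions. This reading
    of the restriction is an assumption of the sketch.
    """
    windows = set()
    for i in range(len(seq) - length + 1):
        contains_triad = i <= triad_start and triad_start + 3 <= i + length \
            if triad_start is not None else True
        if contains_triad:
            windows.add(seq[i:i + length])
    return windows


def coverage(cocktail: Iterable[str], sequences: Iterable[str]) -> float:
    """Fraction of sequences that contain at least one cocktail peptide."""
    cocktail = list(cocktail)
    sequences = list(sequences)
    hit = sum(1 for s in sequences if any(p in s for p in cocktail))
    return hit / len(sequences)


# Hypothetical 9-residue fragments with the triad starting at index 3.
fragments = ["IHIGPGRAF", "IHIGPGKAF"]
print(sorted(candidate_peptides(fragments[0], 6, triad_start=3)))
print(coverage(["GPGRAF"], fragments))
```

A set of peptides solves Minimum Peptides for a collection when `coverage` returns 1.0; the fixed-percent variant considered in the text corresponds to a lower threshold.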

Even with the restriction of peptides to those that overlap the central GPG positions there are many possible peptides to consider. There are 265 unique heptapeptide candidates from the 245 sequences, 172 hexapeptides, and 95 pentapeptides. Consequently, we also consider peptide sets sufficient to cover a fixed percent of the isolates. The problem of finding the shortest list of peptides to cover a set of sequences is similar to the combinatorial optimization problem known as Minimum Hitting Set (23), which may be stated as follows: MINIMUM HITTING SET. Given a collection F of subsets of a finite set S, find the smallest subset S' ⊆ S such that S' contains at least one element from each subset in F. Minimum Hitting Set belongs to the set of computationally intractable problems known as NP-complete (19, 20). No algorithm is available for any of these problems with running time that can be bounded by a polynomial in the size of the problem. Indeed, if such an algorithm were available for one of these problems, then it can be shown that a polynomial time algorithm must exist for all of them. While the proof that no such algorithm exists remains an unsolved problem in computer science and mathematics, these problems have been thoroughly studied and most researchers are willing to conjecture that NP-complete problems are in fact intractable. Conventionally, NP-complete problems are formulated as decision problems, that is, problems with a "yes" or "no" answer. An optimization problem can be restated as a decision problem by asking whether a solution exists of size smaller than some fixed integer k. Since algorithms for exact solutions of these problems have exponential time bounds, it is important to know whether a problem belongs to this category before attempting to devise an algorithm to compute an exact solution for any but the smallest examples.
Minimum Peptides can be shown to be NP-complete if it can be demonstrated that a restricted version of this problem is equivalent to another NP-complete problem (23). It can be shown that a general instance of Minimum Hitting Set can be transformed into a restricted instance of Minimum Peptides (L.H.H., unpublished results). We outline the proof with the aid of an example. Consider a set S consisting of the amino acids. A collection F of subsets of S is illustrated on the left below. We assume each subset is specified by a list of amino acid one-letter symbols in a specific order. We establish a one-to-one correspondence between members of F and sequences as shown on the right below.

f1 = {A, R, W, T}          ↔   Seq1 = ARWT
f2 = {Q, N, A, P, V, T}    ↔   Seq2 = QNAPVT
f3 = {G, P}                ↔   Seq3 = GP
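An instance this small can be solved by exhaustive search, which makes the correspondence easy to check. The Python sketch below is an illustration with a hypothetical function name, not part of the unpublished proof: it treats each sequence as the set of its residues and tries residue subsets in order of increasing size until every sequence contains at least one chosen residue.

```python
from itertools import combinations
from typing import List, Tuple


def smallest_hitting_set(sequences: List[str]) -> Tuple[str, ...]:
    """Smallest set of single residues such that every sequence contains one.

    Exhaustive search over subsets of the residues that occur in the input,
    so it is only practical for toy instances.
    """
    alphabet = sorted(set("".join(sequences)))
    for size in range(1, len(alphabet) + 1):
        for subset in combinations(alphabet, size):
            if all(any(r in seq for r in subset) for seq in sequences):
                return subset
    return ()  # reached only if some sequence is empty or the list is empty


print(smallest_hitting_set(["ARWT", "QNAPVT", "GP"]))  # -> ('A', 'G')
```

No single residue occurs in all three sequences, so two residues are needed; `('A', 'G')` is simply the first minimal pair in alphabetical order, and pairs such as A with P work equally well.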

Now consider an instance of Minimum Peptides with peptide length 1 for the sequences on the right. Any solution of this instance of Minimum Peptides is also a solution for the instance of Minimum Hitting Set on the left. Thus, solving a general instance of Minimum Hitting Set is equivalent to solving an instance of Minimum Peptides restricted to sequences with no repeated amino acids and with peptide length 1; hence, Minimum Peptides is NP-complete.

In this case with peptide length 1 we need to consider in an exhaustive search at most all possible subsets of the 20 amino acids. There are $2^{20}$ of these subsets, a computationally feasible number. The smallest realistic case to be considered for vaccine cocktails, however, is for a peptide length of 5. In this case there are 95 pentapeptides to consider as candidates. If, as we show below, the vaccine cocktail has 21 peptides or fewer, then there are $95!/[(95 - 21)!\,21!] = 6.11 \times 10^{20}$ subsets of 21 peptides to consider, a computationally infeasible number. Thus, given the computational complexity of Minimum Peptides and the number of peptide sets to consider, we cannot expect to compute exact solutions in reasonable time. We consider instead approximate solutions which can be efficiently computed.

In the present application, we present results from two approaches. The first is a variant of a "greedy" algorithm (21). In this algorithm we first find the most common peptide and add it to the cocktail. Next we mark all sequences containing this peptide. We repeat this procedure on the unmarked sequences until all sequences are marked. One can construct counter-examples which demonstrate that peptide lists built in this fashion may not be the shortest possible lists. Hence we refine the algorithm to also search for solutions near the greedy solution; that is, we explore a tree of solutions where for each peptide to be added we consider not only the most frequent peptide but also other peptides of nearly the same frequency. (In these results, we have considered peptides of frequency at least 80% of the most frequent.) In most cases this results in peptide lists that are shorter than the greedy solution by one or two peptides.

In the second approach we perform the combinatorial optimization by simulated annealing (22). We divide the peptides into two sets, the cocktail peptides and the remaining peptides. An initial guess of cocktail size can be interactively changed as the program runs if it proves too small to provide the desired degree of coverage or larger than necessary. We make random exchanges between the two sets and evaluate the cocktail by computing a pseudo-energy that is the fraction of sequences not covered by the cocktail. Exchanges that improve the cocktail are accepted. Otherwise, exchanges are accepted with probability proportional to $\exp(-\Delta E/kT)$, where $\Delta E$ is the change in "energy" for the cocktail, $k$ is the Boltzmann constant, and $T$ is a pseudo-temperature. The "temperature" is gradually reduced until no further improvement in the cocktail is observed.
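Both approximate methods can be sketched compactly. The Python below is a minimal illustration under stated assumptions, not the authors' program: all names are hypothetical, the greedy routine omits the tree search over near-best peptides, and the annealing routine uses a fixed cocktail size, a geometric cooling schedule, and arbitrary parameter values, with the Boltzmann constant absorbed into the pseudo-temperature.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Set, Tuple


def uncovered_fraction(cocktail: Set[str], sequences: Sequence[str]) -> float:
    """Pseudo-energy: fraction of sequences containing no cocktail peptide."""
    missed = sum(1 for s in sequences if not any(p in s for p in cocktail))
    return missed / len(sequences)


def greedy_cocktail(peptides: Sequence[str], sequences: Sequence[str],
                    target: float = 1.0) -> List[str]:
    """Repeatedly add the peptide found in the most still-unmarked sequences."""
    unmarked = list(sequences)
    cocktail: List[str] = []
    while unmarked and 1 - len(unmarked) / len(sequences) < target:
        counts = Counter(p for s in unmarked for p in peptides if p in s)
        if not counts:
            break  # no candidate occurs in the remaining sequences
        # Highest count wins; ties are broken alphabetically (arbitrary choice).
        best = min(counts, key=lambda p: (-counts[p], p))
        cocktail.append(best)
        unmarked = [s for s in unmarked if best not in s]  # mark covered ones
    return cocktail


def anneal(peptides: Sequence[str], sequences: Sequence[str], size: int,
           t_start: float = 0.1, t_end: float = 1e-3, cooling: float = 0.95,
           swaps_per_t: int = 50, seed: int = 0) -> Tuple[Set[str], float]:
    """Simulated annealing over cocktails of fixed `size` (assumed schedule)."""
    if not 1 <= size <= len(peptides):
        raise ValueError("size must be between 1 and the number of peptides")
    rng = random.Random(seed)
    pool = list(peptides)
    rng.shuffle(pool)
    inside, outside = pool[:size], pool[size:]  # cocktail vs. remaining peptides
    e = uncovered_fraction(set(inside), sequences)
    best, best_e = set(inside), e
    t = t_start
    while t > t_end and outside:
        for _ in range(swaps_per_t):
            i, j = rng.randrange(len(inside)), rng.randrange(len(outside))
            inside[i], outside[j] = outside[j], inside[i]  # trial exchange
            e_new = uncovered_fraction(set(inside), sequences)
            d_e = e_new - e
            # Accept improvements; accept worse states with prob. exp(-dE/T).
            if d_e <= 0 or rng.random() < math.exp(-d_e / t):
                e = e_new
                if e < best_e:
                    best, best_e = set(inside), e
            else:
                inside[i], outside[j] = outside[j], inside[i]  # undo exchange
        t *= cooling  # gradually lower the pseudo-temperature
    return best, best_e


# Hypothetical toy data: five candidate hexapeptides, three sequence fragments.
peps = ["IGPGRA", "GPGRAF", "IHIGPG", "GPGKAF", "IFIGPG"]
seqs = ["IHIGPGRAF", "IFIGPGKAF", "IHIGPGKAF"]
print(greedy_cocktail(peps, seqs))
print(anneal(peps, seqs, size=2))
```

Passing `target=0.9` to `greedy_cocktail` gives the fixed-percent variant. The annealing routine returns the best cocktail visited rather than the final state, a common safeguard that the text neither requires nor rules out.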

RESULTS Table 1 gives the size of the shortest cocktails found as a function of peptide length. We consider the 245 HIV-1 sequences and require that the central GPG positions be included, although 8% of the 245 sequences have some sequence other than GPG at these positions (e.g., GLG). For peptide lengths 9 and 10 we were able to find peptide lists by the simulated annealing algorithm that are shorter by one peptide than those computed by the modified greedy algorithm. For peptide lengths 5-8 the two algorithms give lists of the same length, though the lists are not identical. As an example, full results for hexapeptides are given in Table 2 for both the greedy algorithm and simulated annealing. The most common sequences are identical, but slight differences occur for the less common sequences; e.g., in the greedy algorithm the fourth sequence is GPGRVY, which does not occur at all in the simulated annealing algorithm. Of the first 13 required for 90% coverage, 7 are the same, while for 100% coverage 14 of 30 are identical. It is striking that a cocktail of only 10 peptides, for example, yields better than 87% coverage and complete coverage is obtained with 30 peptides. Corresponding results are obtained for peptides of other lengths (see Table 1).

Table 1. Cocktail size as a function of peptide length

Peptide    No. of peptides    No. of peptides
length     (100% coverage)    (90% coverage)
 5         21                  7
 6         30                 13
 7         35                 17
 8         45                 25
 9         53                 29
10         61                 37

Only peptides that overlap the central GPG positions of the PND are considered.

Table 2. Hexapeptide cocktails derived from 245 HIV-1 sequences

              Greedy algorithm solution            Simulated annealing solution
No.  Peptide  Total    New      Cum.      Peptide  Total    New      Cum.
              hits, %  hits, %  hits, %            hits, %  hits, %  hits, %
 1   IGPGRA   63.67    63.67     63.67    IGPGRA   63.67    63.67     63.67
 2   GPGRAF   59.59     7.76     71.43    GPGRAF   59.59     7.76     71.43
 3   IHIGPG   37.96     3.67     75.10    IHIGPG   37.96     3.67     75.10
 4   GPGRVY    2.86     2.86     77.96    TMGPGR    2.45     2.45     77.55
 5   GPGKAF    2.45     2.45     80.41    GPGKAF    2.45     2.45     80.00
 6   IFIGPG    2.04     2.04     82.45    IFIGPG    2.04     2.04     82.04
 7   GPGRRF    1.63     1.63     84.08    GPGRRF    1.63     1.63     83.67
 8   GAGRAI    1.22     1.22     85.31    LGPGRV    1.22     1.22     84.90
 9   IHIAPG    1.22     1.22     86.53    TTGPGR    1.22     1.22     86.12
10   PIGLGQ    1.22     1.22     87.76    GAGRAI    1.22     1.22     87.35
11   TLGPGR    1.22     1.22     88.98    IAPGRA    1.22     1.22     88.57
12   YVGSGR    0.82     0.82     89.80    TPIGLG    1.22     1.22     89.80
13   IHMGLG    0.82     0.82     90.61    GQGRAI    0.82     0.82     90.61
14   GQGRAI    0.82     0.82     91.43    TKGPGR    0.82     0.82     91.43
15   GPRRAF    0.82     0.82     92.24    GPGKVI    0.82     0.82     92.24
16   ITKGPG    0.82     0.82     93.06    VGSGRK    0.82     0.82     93.06
17   GPGKVI    0.82     0.82     93.88    IRVGPG    0.82     0.82     93.88
18   GPGRVF    0.82     0.82     94.69    MGLGRT    0.82     0.82     94.69
19   IRVGPG    0.82     0.82     95.51    IGPRRA    0.82     0.82     95.51
20   GPGGAF    0.41     0.41     95.92    GPGRTL    0.41     0.41     95.92
21   IAIGPG    0.82     0.41     96.33    GPGQAL    0.41     0.41     96.33
22   IHFGPG    0.41     0.41     96.73    YQRGPG    0.41     0.41     96.73
23   EPGKAI    0.41     0.41     97.14    ITIGPG    4.08     0.41     97.14
24   GPGMAF    0.41     0.41     97.55    LGPGSA    0.41     0.41     97.55
25   GPERAF    0.41     0.41     97.96    GPERAF    0.41     0.41     97.96
26   IRIGPG    5.31     0.41     98.37    RVEPGK    0.41     0.41     98.37
27   GPGSAI    0.41     0.41     98.78    RPERAF    0.41     0.41     98.78
28   RPERAF    0.41     0.41     99.18    SIGPGR    4.49     0.41     99.18
29   IGPGRK    0.82     0.41     99.59    RIGPGK    0.41     0.41     99.59
30   TMGPGR    2.45     0.41    100.00    GPGMAF    0.41     0.41    100.00

Peptides are forced to overlap the central GPG positions of the PND. "Total hits" in this table and in the following tables gives the percentage of sequences containing this peptide. "New hits" counts new sequences containing this peptide that were not already covered by a peptide appearing earlier in the table. "Cumulative hits" is the percentage of sequences covered by the table to that point.

By requiring peptides to include the GPG positions of the PND we include the portion of the PND identified as a neutralizing antibody binding site (12) and we also force peptides to overlap the highly variable portions of the PND that flank these positions. If we relax this requirement somewhat by allowing any peptide within a 20-residue range centered on the GPG positions, lists shorter by one or two peptides can be found. Moving outside this 20-residue region leads to highly conserved flanking sequences (11) for which there is no evidence of antibody binding and virus neutralization.

An additional 78 sequences from 12 European patients (J.G., unpublished data), not available when these lists were constructed, are used as a blind test of the cocktails. Results for hexapeptides are given in Table 3. In all cases the cocktails initially perform slightly better on the new set of sequences, though coverage is not complete; e.g., the first seven sequences cover 93.6% of the new sequences instead of only 83.7% of the 245 sequences (Table 2). Performance of the cocktails on these sequences may reflect differences between European and U.S. isolates and/or the fact that these 78 sequences are from only 12 patients and are, therefore, less variable than the 245 sequences, which are from 133 individuals (11).

Table 3. Test of the hexapeptide vaccine cocktail derived from 245 HIV-1 sequences on 78 European sequences

No.  Peptide  Total    New      Cum.
              hits, %  hits, %  hits, %
 1   IGPGRA   66.67    66.67    66.67
 2   GPGRAF   75.64     8.97    75.64
 3   IHIGPG   53.85    16.67    92.31
 4   TMGPGR    0.00     0.00    92.31
 5   GPGKAF    7.69     0.00    92.31
 6   IFIGPG    0.00     0.00    92.31
 7   GPGRRF    2.56     1.28    93.59
 8   LGPGRV    0.00     0.00    93.59
 9   TTGPGR    0.00     0.00    93.59
10   GAGRAI    0.00     0.00    93.59
11   IAPGRA    0.00     0.00    93.59
12   TPIGLG    0.00     0.00    93.59
13   GQGRAI    0.00     0.00    93.59
14   TKGPGR    0.00     0.00    93.59
15   GPGKVI    0.00     0.00    93.59
16   VGSGRK    0.00     0.00    93.59
17   IRVGPG    0.00     0.00    93.59
18   MGLGRT    0.00     0.00    93.59
19   IGPRRA    0.00     0.00    93.59
20   GPGRTL    0.00     0.00    93.59
21   GPGQAL    0.00     0.00    93.59
22   YQRGPG    0.00     0.00    93.59
23   ITIGPG    0.00     0.00    93.59
24   LGPGSA    0.00     0.00    93.59
25   GPERAF    0.00     0.00    93.59
26   RVEPGK    0.00     0.00    93.59
27   RPERAF    0.00     0.00    93.59
28   SIGPGR    1.28     0.00    93.59
29   RIGPGK    0.00     0.00    93.59
30   GPGMAF    0.00     0.00    93.59

Results are shown for the simulated annealing solution in Table 2. Similar results are obtained for the greedy solution. Column headings are interpreted identically to those of Table 2 except that percentages are of the 78 new sequences.

Table 4 lists the amino acid frequencies found for the central portion of the PND from the 78 new sequences and the original 245 sequences. A key observation is that no different amino acid substitutions are observed in the 78 new sequences in the 13 central positions of the PND. Also, the distribution of amino acids is very similar. The conservation of substitutions suggests that the majority of the antigenic variation at each position of the central portion of the PND has been observed, though not necessarily all the possible sequences for this region. There may, for example, be structural constraints on this region (e.g., the secondary structure motif predicted for the original 245 sequences).

Table 4. Amino acid frequencies for the central portion of the PND

           Amino acid residues at PND position
Sequences  13       14       15       16       17       18       19       20       21       22       23       24       25
New        S (57)   I (77)   H (46)   I (69)   - (78)   - (78)   G (77)   P (78)   G (78)   R (66)   A (72)   F (78)   Y (70)
           G (18) M (1) N (15) T E (1) H (4) K (6) T (2) (8) R (3) P (7) M (3) Q (5) R (2) Y (7) L (2) G (1) K (2) S (2) R (1)
Original   S (124)  I (230)  H (113)  I (202)  - (234)  - (234)  G (240)  P (233)  G (241)  R (223)  A (204)  F (177)  Y (196)
           G (69) L (5) T (27) M (17) Q (10) R (10) A (3) L (5) E (2) K (11) V (18) I (29) V (17) R (40) M (3) R (28) L (10) R (1) G (1) E (1) A (3) R (2) Q (4) N (5) V (14) H (13) H (4) T (3) P (22) V (5) R (1) Q (2) (3) T (5) L (10) L (8) K (4) V (2) N (16) T S (2) S (2) R (4) Y (3) (7) F (5) A (3) E M (1) K (4) W (6) R (1) S (16) R (2) (3) (1) F (1) Y (13) K (2) G (1) P (3) T (1) S (1) F (5) Y (2) S (1) S (1) M (1) A (2) F W I (1) (1) (1) G (1) S (1) V (1) K (1)

The top part of the table is derived from 78 additional sequences (J.G., unpublished data) and the bottom part is from the 245 original sequences revised to correct 9 sequences (11). Numbering of PND positions is taken from figure 1 of ref. 11. A hyphen indicates a gap introduced in the alignment. Amino acids are designated by standard one-letter symbols followed by the number of sequences containing this substitution in parentheses.

Over 14 billion combinations of the observed amino acids are possible, but from the observed amino acid frequencies most would be extremely rare. This leads to the question of whether a cocktail can be designed that covers a significant portion of the distribution from this larger set. Under the simplifying assumptions that mutations at each position are independent and that the observed substitution frequencies are a good estimate of the expected frequencies, we may calculate the probability of any sequence. Using these assumptions, we have computed the 10,000 most probable sequences for the 13-amino acid region shown in Table 4. The cumulative probability for these 10,000 sequences is 86%. We have used these sequences to generate cocktails by weighting each sequence with its calculated probability. The cocktail of hexapeptides produced by the simulated annealing algorithm is given together with its performance on the test set of 78 sequences and original 245 sequences in Table 5; corresponding results are obtained for the pentapeptide and heptapeptides. The cumulative probability is a lower bound on the expected hit rate for these cocktails, since some fraction of the additional sequences not included in the most probable 10,000 will also be covered. As might be expected, these cocktails perform somewhat more poorly on the original 245 sequences than cocktails tailored to just those sequences. For example, 13 hexapeptides cover 80% (Table 5) versus 90% of the 245 sequences (Table 2). However, they perform significantly better on the test sequences as shown in Table 5. The 30 peptides in Table 5 cover 88% of the 323 (245 + 78) sequences.

Table 5. Hexapeptide vaccine cocktail derived with the simulated annealing algorithm on 10,000 generated HIV-1 sequences and test results on 78 European sequences and the original 245 sequences

              Expected hit rates          Test results (78 sequences)   Test results (245 sequences)
No.  Peptide  Total    New      Cum.      Total    New      Cum.        Total    New      Cum.
              hits, %  hits, %  hits, %   hits, %  hits, %  hits, %     hits, %  hits, %  hits, %
 1   GPGRAF   48.63    48.63    48.63     75.64    75.64     75.64      59.59    59.59    59.59
 2   IHIGPG   29.31    12.97    61.59     53.85    16.67     92.31      37.96     8.57    68.16
 3   GPGRAI    7.57     4.90    66.49      0.00     0.00     92.31       5.31     4.08    72.24
 4   GPGRVF    3.96     2.52    69.01      0.00     0.00     92.31       0.82     0.82    73.06
 5   GPGRAV    3.53     2.24    71.25      0.00     0.00     92.31       4.90     2.86    75.92
 6   IRIGPG    7.09     1.72    72.97      1.28     1.28     93.59       5.31     0.41    76.33
 7   ITIGPG    6.83     1.65    74.62      0.00     0.00     93.59       4.08     0.00    76.33
 8   IPIGPG    5.53     1.32    75.94      7.69     0.00     93.59       6.12     0.00    76.33
 9   GPGRAL    2.47     0.95    76.88      0.00     0.00     93.59       0.82     0.41    76.73
10   IGLGRA    0.94     0.94    77.82      0.00     0.00     93.59       0.00     0.00    76.73
11   GPGKAF    2.14     0.81    78.63      7.69     0.00     93.59       2.45     2.45    79.18
12   ISIGPG    3.98     0.69    79.32      1.28     0.00     93.59       4.08     0.41    79.59
13   GPGRAY    1.69     0.54    79.86      0.00     0.00     93.59       0.00     0.00    79.59
14   IGAGRA    0.53     0.53    80.39      0.00     0.00     93.59       0.82     0.82    80.41
15   IYIGPG    3.21     0.47    80.85      2.56     1.28     94.87       2.04     0.00    80.41
16   APGRAF    0.47     0.47    81.32      0.00     0.00     94.87       1.22     1.22    81.63
17   NIGPGR    3.87     0.44    81.76     19.23     1.28     96.15       6.53     0.00    81.63
18   IGQGRA    0.33     0.33    82.09      0.00     0.00     96.15       0.82     0.82    82.45
19   IGSGRA    0.33     0.33    82.42      0.00     0.00     96.15       0.00     0.00    82.45
20   IHMGPG    2.32     0.32    82.75      0.00     0.00     96.15       2.04     0.00    82.45
21   IGPERA    0.32     0.32    83.06      0.00     0.00     96.15       0.41     0.41    82.86
22   GPRRAF    0.29     0.29    83.36      0.00     0.00     96.15       0.82     0.82    83.67
23   GPGRAW    1.44     0.27    83.63      0.00     0.00     96.15       1.22     0.00    83.67
24   GPGRNF    1.00     0.17    83.80      0.00     0.00     96.15       0.00     0.00    83.67
25   GPGRTF    1.00     0.17    83.98      2.56     0.00     96.15       0.41     0.00    83.67
26   GPGQAF    0.71     0.15    84.13      6.41     2.56     98.72       0.00     0.00    83.67
27   IEPGRA    0.14     0.14    84.27      1.28     1.28    100.00       0.00     0.00    83.67
28   GLGRAF    0.86     0.13    84.40      0.00     0.00    100.00       0.00     0.00    83.67
29   GPGRKF    0.78     0.13    84.53      2.56     0.00    100.00       0.00     0.00    83.67
30   GPGRVI    0.55     0.08    84.62      0.00     0.00    100.00       0.82     0.82    84.49

The left third of this table is the expected performance based upon the summed probabilities of the covered subset of the 10,000 generated sequences.

Although the limited sizes of the cocktails calculated here are promising, it may be possible to further reduce the number of peptide immunogens required to elicit antibodies that neutralize a large percentage of isolates. For example, by using longer peptides it is possible to link the shorter sequences together. For example, the peptide IHIGPGRAF contains three of the six-amino acid peptides, IHIGPG, IGPGRA, and GPGRAF. However, this is counterbalanced by the fact that to produce an antibody response that will recognize these six amino acids with different flanking regions, it may be necessary to immunize separately with the hexapeptides or multimers of the hexapeptides linked to an appropriate carrier protein. It may also be possible to reduce the cocktail size if the antibodies elicited by one peptide immunogen cross-neutralize viruses with PND sequences not included in the immunizing cocktail.
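The expanded sequence set behind Table 5 rests on the independence assumption stated in the Results: the probability of a sequence is the product of the per-position residue frequencies. One way to list the most probable sequences under that assumption is a best-first search, sketched below in Python. The search strategy, the function name, and the two-position frequency table are illustrative assumptions; the text does not say how the 10,000 sequences were actually enumerated.

```python
import heapq
from typing import Dict, List, Tuple


def most_probable_sequences(freqs: List[Dict[str, float]],
                            n: int) -> List[Tuple[float, str]]:
    """Return up to n (probability, sequence) pairs, most probable first.

    freqs[i] maps each residue observed at position i to its frequency;
    positions are treated as independent, so probabilities multiply.
    """
    # Rank residues at each position by decreasing frequency.
    ranked = [sorted(col.items(), key=lambda kv: (-kv[1], kv[0])) for col in freqs]

    def prob(idx: Tuple[int, ...]) -> float:
        p = 1.0
        for col, k in zip(ranked, idx):
            p *= col[k][1]
        return p

    start = tuple(0 for _ in ranked)  # most frequent residue everywhere
    heap = [(-prob(start), start)]
    seen = {start}
    out: List[Tuple[float, str]] = []
    while heap and len(out) < n:
        neg_p, idx = heapq.heappop(heap)
        out.append((-neg_p, "".join(col[k][0] for col, k in zip(ranked, idx))))
        # Successors: move one position to its next most frequent residue.
        for pos in range(len(idx)):
            if idx[pos] + 1 < len(ranked[pos]):
                nxt = idx[:pos] + (idx[pos] + 1,) + idx[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-prob(nxt), nxt))
    return out


# Hypothetical two-position frequency table (not data from Table 4).
toy = [{"G": 0.9, "A": 0.1}, {"P": 0.8, "L": 0.2}]
top = most_probable_sequences(toy, 3)
print([s for _, s in top])
print(sum(p for p, _ in top))  # cumulative probability of the listed sequences
```

The sum printed on the last line plays the role of the 86% cumulative probability quoted for the 10,000 generated sequences, and the individual probabilities are the weights used when a cocktail is fitted to the generated set.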

CONCLUSION The limited number of required components in each of the proposed cocktails is very encouraging for the development of a broadly protective HIV-1 vaccine. To be effective, a vaccine based on one of the proposed cocktails will be required to elicit antibodies that recognize each of the component peptides and that neutralize all of the viruses possessing the corresponding sequences. The small size of the proposed cocktails should allow these questions to be addressed experimentally.

We gratefully acknowledge the assistance of Ray Grein and his staff at the IBM Cambridge Scientific Center in using the IBM 3090. We thank Greg LaRosa, Al Profy, Kashi Javaherian, and Scott Putney for helpful discussions, particularly on the implications of the present work for vaccine development. We thank Victor Milenkovic and Lee Nackman for a discussion of NP-completeness. This work was supported in part by a grant from the National Institutes of Health. We thank the IBM Corporation for providing computer time under the IBM Research Support Program.

1. Rusche, J. R., Javaherian, K., McDanal, C., Petro, J., Lynn, D. L., Grimaila, R., Langlois, A., Gallo, R. C., Arthur, L. O., Fischinger, P. J., Bolognesi, D. P., Putney, S. D. & Matthews, T. J. (1988) Proc. Natl. Acad. Sci. USA 85, 3198-3202.
2. Palker, T. J., Clark, M. E., Langlois, A. J., Matthews, T. J., Weinhold, K. J., Randall, R. R., Bolognesi, D. P. & Haynes, B. F. (1988) Proc. Natl. Acad. Sci. USA 85, 1932-1936.
3. Goudsmit, J., Debouck, C., Meloen, R. H., Smit, L., Bakker, M., Asher, D. M., Wolff, A. V., Gibbs, C. J. & Gajdusek, D. C. (1988) Proc. Natl. Acad. Sci. USA 85, 4478-4482.
4. Matsushita, S., Robert-Guroff, M., Rusche, J., Koito, A., Hattori, T., Hoshino, H., Javaherian, K., Takatsuki, K. & Putney, S. D. (1988) J. Virol. 62, 2107-2114.
5. Skinner, M. A., Langlois, A. J., McDanal, C. B., McDougal, J. S., Bolognesi, D. P. & Matthews, T. J. (1988) AIDS Res. Hum. Retroviruses 4, 187-197.
6. Thomas, E. K., Weber, J. N., McClure, J., Clapham, P. R., Singhal, M. C., Shriver, M. K. & Weiss, R. A. (1988) AIDS 2, 25-29.
7. Devash, Y., Calvelli, T. A., Wood, D. G., Reagan, K. J. & Rubinstein, A. (1990) Proc. Natl. Acad. Sci. USA 87, 3445-3449.
8. Berman, P. W., Gregory, T. J., Riddle, L., Nakamura, G. R., Champe, M. A., Porter, J. P., Wurm, F. M., Hershberg, R. D., Cobb, E. K. & Eichberg, J. W. (1990) Nature (London) 345, 622-625.
9. Emini, E. A., Nara, P. L., Schleif, W. A., Lewis, J. A., Davide, J. P., Lee, D. R., Kessler, J., Conley, S., Matsushita, S., Putney, S. D., Gerety, R. J. & Eichberg, J. W. (1990) J. Virol. 64, 3674-3678.
10. Girard, M., Kaczorek, M., Pinter, A., Nara, P., Barre-Sinoussi, F., Kieny, M., Muchmore, E., Yagello, M., Gluckman, J. & Fultz, P. (1990) J. Cell. Biochem., Suppl. 14D, 150.
11. LaRosa, G. J., Davide, J. P., Weinhold, K., Waterbury, J. A., Profy, A. T., Lewis, J. A., Langlois, A. J., Dreesman, G. R., Boswell, R. N., Shadduck, P., Holley, L. H., Karplus, M., Bolognesi, D. P., Matthews, T. J., Emini, E. E. & Putney, S. D. (1990) Science 249, 932-935, and erratum (1991) 251, 811.
12. Javaherian, K., Langlois, A. J., McDanal, C., Ross, K. L., Eckler, L. I., Jellis, C. L., Profy, A. T., Rusche, J. R., Bolognesi, D. P., Putney, S. D. & Matthews, T. J. (1989) Proc. Natl. Acad. Sci. USA 86, 6768-6772.
13. Javaherian, K., Langlois, A. J., LaRosa, G. J., Profy, A. T., Bolognesi, D. P., Herlihy, W. C., Putney, S. D. & Matthews, T. J. (1990) Science 250, 1590-1593.
14. Geysen, H. M., Tainer, J. A., Rodda, S. J., Mason, T. J., Alexander, H., Getzoff, E. D. & Lerner, R. A. (1987) Science 235, 1184-1190.
15. Getzoff, E. D., Geysen, H. M., Rodda, S. J., Alexander, H., Tainer, J. A. & Lerner, R. A. (1987) Science 235, 1191-1196.
16. Arnon, R., Shapira, M. & Jacob, C. O. (1983) J. Immunol. Methods 61, 261-273.
17. Lerner, R. A. (1984) Adv. Immunol. 36, 1-44.
18. Stanfield, R. L., Fieser, T. M., Lerner, R. A. & Wilson, I. A. (1990) Science 245, 712-719.
19. Karp, R. M. (1972) in Complexity of Computer Computations, eds. Miller, R. E. & Thatcher, J. W. (Plenum, New York), pp. 85-103.
20. Cook, S. A. (1971) Proc. 3rd Annu. ACM Symp. Theory Comput., Assoc. Comput. Mach. 151-158.
21. Hu, T. C. (1982) Combinatorial Algorithms (Addison-Wesley, Reading, MA), pp. 202-205.
22. Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. (1983) Science 220, 671-680.
23. Garey, M. R. & Johnson, D. S. (1979) Computers and Intractability: A Guide to the Theory of NP-Completeness (Freeman, San Francisco), p. 122.
