Composite Predictions of Secondary Structures of lac Repressor

SUZANNE BOURGEOIS, The Salk Institute, P.O. Box 85800, San Diego, California 92138; ROBERT L. JERNIGAN and SHOUSUN C. SZU, Laboratory of Theoretical Biology, Division of Cancer Biology and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20205; ELVIN A. KABAT, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20205, The Departments of Microbiology, Human Genetics and Development, and Neurology, Columbia University, New York, N.Y. 10032, and The Neurological Institute, Presbyterian Hospital, New York, N.Y. 10032; and TAI TE WU, The Departments of Engineering Sciences, and Biochemistry and Molecular Biology, Northwestern University, Evanston, Illinois 60201

Synopsis

The secondary structure of the lac repressor protein proposed by Chou et al. has been modified to include the recent revisions in sequence. In addition to the Chou and Fasman method, five other methods were used; they include those of (1) Lim, (2) Ptitsyn and Finkelstein, (3) Burgess et al., (4) Bunting et al., and (5) Wu and Kabat. Any two individual methods gave results differing sharply from one another. Three or more methods were in agreement for 91, 39, and 126 residues in helix, in β, and in combined coil plus turn conformations, respectively; there were such agreements for a total of 256 of the 360 residues. Agreements in the amino-terminal third of the molecule were found for 68% of the residues, whereas in the remainder of the molecule only 53% of the residues showed such agreements. Only two helix-breaking and two β-breaking tripeptides were inconsistent with the composite predictions by three or more methods. The large number of disagreements among the results for different methods indicates that only very limited information is provided by each method and that the basis on which they operate is not clear. There is no a priori reason for a composite prediction to be more reliable than any individual prediction, and such a procedure does not permit the determination of an unambiguous secondary structure. Since these methods were applied to lac repressor before any three-dimensional crystallographic structure was known, the methods may ultimately be evaluated should such a structure become available.

INTRODUCTION

The lac repressor of E. coli controls the expression of the lac operon by binding the operator region on the DNA. Binding of an inducer (usually β-galactosides) to the repressor lowers its affinity for operator, triggering the dissociation of the repressor-operator complex and thus allowing transcription of the operon. This repressor is the best characterized regulatory protein, and the properties of this system have been the subject of numerous reviews. (For a recent review, see Ref. 1.)

Biopolymers, Vol. 18, 2625-2643 (1979) © 1979 John Wiley & Sons, Inc.


The lac repressor is an acidic protein made up of four identical subunits each containing 360 amino acid residues. Beyreuther et al. (Ref. 2) found the repressor subunit to contain 347 residues and determined its amino acid sequence. More recently the nucleotide sequence of the gene coding for the repressor has been completed (Ref. 3), and this introduced a number of additions and corrections in the amino acid sequence. The DNA sequence calls for the insertion of one polypeptide 11 amino acids long and of one dipeptide (see their location in the footnote of Table I). The genetic analysis of nonsense mutations had suggested the presence of additional amino acids in those regions (Ref. 4), and the corrected sequence is in agreement with the genetic data (Ref. 5). Recently, Beyreuther (Ref. 6) has verified the presence of these inserted peptides in the lac repressor. The lac repressor is the first protein for which the complete primary structure has been verified and corrected by DNA sequencing.

Little information is available about the secondary structures in lac repressor. Matsuura et al. (Ref. 7) determined by CD a helix content of 38% and a β-structure content of 27%, while optical rotatory dispersion measurements yielded 33 and 18% for helix and β-structure, respectively. Chou et al. (Ref. 8) obtained 40% helix and 42% β-structure by CD. The estimates obtained by these optical techniques thus vary from 33 to 40% for helical content and from 18 to 42% for β-structure.

The tertiary structure of the lac repressor is as yet totally unknown because of difficulties encountered in several laboratories in obtaining crystals suitable for x-ray crystallography. Failure to obtain proper crystals may have been due to heterogeneity in the preparations or, possibly, to some flexibility in the protein. The repressor subunit is thought to consist of two domains which may interact. Genetic and biochemical evidence agree in identifying the amino-terminal region of the subunit as responsible for DNA binding. It is in this region that amino acid substitutions, whether introduced by missense mutations (Ref. 9) or by suppression of nonsense mutations (Ref. 10), affect operator binding. Moreover, under conditions of limited digestion by trypsin, cleavage occurs mainly between lysine-59 and glutamine-60. There is a large tetrameric repressor fragment, lacking the first 59 amino-terminal residues (Refs. 11, 12), which retains normal affinity for inducers but has lost its capacity to bind DNA (Ref. 13). Recently the amino-terminal fragment of the repressor, containing residues 1 to 59, which are necessary for DNA binding, has been isolated (Ref. 14). This N-terminal fragment has weak affinity for nonoperator DNA (Ref. 14) and also interacts with the lac operator region (Ref. 15). It has been observed that binding of inducer to the site present in the portion between residues 60 and 360 of intact repressor lowers its affinity for DNA. This indicates an influence of one domain on another. Geisler and Weber (Ref. 14) suggested the presence of a hinge region between residues 50 and 60.

Two approaches are currently being taken to elucidate the structure of lac repressor. One is an attempt to crystallize the two fragments separately. The other is to try to crystallize the repressor-operator complex formed by binding repressor to a synthetic lac operator DNA containing 21 base pairs (Refs. 16, 17).


In the absence of definitive structural information from x-ray scattering, potential sources of useful information are the methods for predicting secondary structures. In these empirical methods, protein sequences and their secondary structures, as determined by x-ray diffraction, are utilized to derive statistics on the tendencies of amino acids to exist in specific secondary conformations. Such methods are successful to widely varying degrees. In general, predictions of α-helix appear to be more reliable than those of other conformations such as β-strands and turns. The investigations by Black et al. (Ref. 18) of short-range structural correlations between pairs of amino acids indicate some of the problems inherent in such statistical studies of molecular conformations. Inadequacies in all of these methods arise from errors in the reported x-ray structures, the relatively small number of molecules in the data bank, and the classification of reported conformations in ways that may neglect some forms. For example, random coils actually include a number of specific conformations, which are not distinguished. Maxfield and Scheraga (Ref. 19) have recently concluded that no empirical method is likely to achieve reliable predictions if it is based on the number of crystal structures currently known.

The diversity manifested in the details of these methods is intriguing. Composite predictions have been published for more than 30 molecules. There is no a priori reason why a composite prediction should be more valid than an individual prediction. If such should prove to be the case for those regions where all predictions agree, one could search for common denominators which might improve the understanding of the basic principles involved.

The unknown structure of lac repressor may provide a unique challenge to the prediction methods. Hopefully a crystal structure will eventually be available, and then the methods and corresponding results presented in this paper can be tested. Such a structure would provide a definitive evaluation of the composite as well as the individual predictions, residue by residue. This could lead to the recognition of other parameters which might improve predictability.

METHODS

To predict lac repressor conformations we have chosen, from the large number of published methods, those of Chou and Fasman (Ref. 26), Lim (Ref. 27), Ptitsyn and Finkelstein (Ref. 28), and Burgess et al. (Ref. 29). In addition we have predicted turns according to Bunting et al. (Ref. 30) and helix- and β-strand-breaking tripeptides with the method of Kabat and coworkers.

Chou et al. (Ref. 8) have applied their method to predict secondary structures for lac repressor. They have compiled the tendencies for each of the 20 amino acids to be indifferent or to form or break helix or β-strand. Structured segments are terminated when four breakers or indifferent amino acids are found. There are substantial ambiguities in applying their method. Some of the ambiguities in their rules are demonstrated by the differences between their predictions for lac repressor and those reported by Patel (Ref. 33) for a fragment of this molecule. We have used their reported predictions (Ref. 8), except in the regions in which the sequence has been corrected by nucleotide sequencing. For these segments we have utilized their methods with the same statistics they used; helix and β parameters were those based on 29 proteins (Ref. 34), and turn parameters were based on 17 proteins (Ref. 8).

Lim (Ref. 27) based his method on steric and polarity effects. Each amino acid is classified as being either hydrophobic or hydrophilic and as having either a long or a short side chain. The helix regions usually predicted contain one or more hydrophobic pairs separated by two or three amino acids. In addition the backbone atoms can be protected from solvent molecules by long hydrophilic side chains at the ends of the helix regions. For the remaining nonhelical portions of the molecule, similar rules are applied to determine the locations of β-strand. Lim supplied us with his prediction for the original incomplete sequence (Ref. 2); we have applied his method to the corrected sequence, including insertions.

Ptitsyn and Finkelstein (Ref. 28) have tabulated amino acid occurrences within secondary structure regions for four proteins. They designated helix- and β-strand-forming potentials for each amino acid according to its size, hydrophobicity, dipole, and charge. The total potential for a proposed β-strand is taken to be a sum of individual β contributions for each residue. For helix potentials, the effects of two neighbors on each side of an amino acid are included by summing individual potentials over five adjacent amino acids. Then the total potential of a trial helix region is the sum of these neighbor-dependent potentials with additional contributions from the amino acids near the helix ends. Every combination of all possible segments of helix and β-strand is considered. The combination giving the best total potential for the entire molecule is chosen.
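The search over every combination of helix and β-strand segments can be organized as a dynamic program. The sketch below is only an illustration of that idea and is not the program used in this work: the per-residue potentials, the minimum segment lengths, and the single `end_bonus` term standing in for the helix-end contributions are all hypothetical inputs, and the helix potentials are assumed to have been summed over five adjacent residues beforehand.

```python
from typing import Dict, List, Tuple


def best_segmentation(
    potentials: Dict[str, List[float]],
    min_len: Dict[str, int],
    end_bonus: Dict[str, float],
) -> Tuple[float, str]:
    """Return (best total potential, per-residue state string).

    potentials[s][k] is the (assumed, precomputed) potential of residue k
    in state s, e.g. "H" for helix or "B" for beta-strand.  Residues left
    outside every segment are coil ("C") and contribute zero.
    """
    n = len(next(iter(potentials.values())))

    # Prefix sums so that any segment sum is available in constant time.
    prefix = {s: [0.0] * (n + 1) for s in potentials}
    for s, vals in potentials.items():
        for k, v in enumerate(vals):
            prefix[s][k + 1] = prefix[s][k] + v

    best = [0.0] * (n + 1)        # best[j]: optimum for residues 0..j-1
    back = [(0, "C")] * (n + 1)   # back[j]: (start, state) of the piece ending at j

    for j in range(1, n + 1):
        # Option 1: residue j-1 is coil.
        best[j], back[j] = best[j - 1], (j - 1, "C")
        # Option 2: a structured segment covering residues i..j-1 ends here.
        for s in potentials:
            for i in range(0, j - min_len[s] + 1):
                score = best[i] + prefix[s][j] - prefix[s][i] + end_bonus[s]
                if score > best[j]:
                    best[j], back[j] = score, (i, s)

    # Trace back from the end of the chain to recover the assignment.
    states = ["C"] * n
    j = n
    while j > 0:
        i, s = back[j]
        for k in range(i, j):
            states[k] = s
        j = i
    return best[n], "".join(states)


# Toy example with made-up potentials for an eight-residue chain.
toy = {
    "H": [1.0, 1.0, 1.0, 1.0, -5.0, -1.0, -1.0, -1.0],
    "B": [-1.0, -1.0, -1.0, -1.0, -5.0, 2.0, 2.0, 2.0],
}
total, assignment = best_segmentation(
    toy, min_len={"H": 4, "B": 3}, end_bonus={"H": 0.0, "B": 0.0}
)
```

Because every prefix of the chain is solved once and reused, no combination of segments is skipped, while the cost grows only quadratically with chain length for each state.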
To avoid missing some combinations, we have developed a scheme based on dynamic programming methods to perform a complete search.

Burgess et al. (Ref. 29) have developed a prediction method based on the x-ray data for 13 proteins. Each amino acid has been assigned α-helix and β-strand probabilities and position-dependent turn probabilities. Then for a given residue, the probability of forming an α-helix or a β-strand region is composed of a product of nine sequential individual α or β probabilities. These include a central amino acid and the four nearest neighbors on each side. The secondary structure is then chosen if these neighbor-dependent probabilities exceed certain average limits and at least four consecutive residues are in the same conformation. The total turn probability is the product of nine individual turn probabilities. A computer program for this method was provided to us by H. A. Scheraga.

The turn-prediction methods of Chou et al. (Ref. 8) and Bunting et al. (Ref. 30) are similar. The principal difference is the number of proteins on which they are based, 17 for Chou et al. and 6 for Bunting et al. Both have tabulated the probabilities of each amino acid at the four positions within a turn. The overall probability of a turn is the product of the four tabulated probabilities. Turns are predicted at sites for which these probabilities are larger than the specified cutoff values.

The statistics of the structure-breaking tripeptides of Kabat and coworkers (Refs. 23, 31, 32) are in the form of three or six values given as a 20 × 20 table of amino acids, representing the influence of the two neighbors on a central unspecified amino acid. The first entry is the number of instances in which the (φ,ψ) values of the middle amino acid were in the helical domain, minus any such values which were the second and third residues of β-turns, the turns being determined from the x-ray data; the second is the number of observed occurrences for which the (φ,ψ) values were in the β-sheet domain; and the third is the number of cases in which the (φ,ψ) values fell outside of these domains. When six values are given, the fourth, fifth, and sixth follow a semicolon. The fourth lists the number of instances in which the (φ,ψ) values of the middle amino acid were in the helical domain and occurred in a stretch of four or more residues with helical (φ,ψ) values (i.e., in a helical segment); the fifth gives the number of instances in which the middle amino acid was one of three or more successive residues with (φ,ψ) values in the β-sheet domain (i.e., potentially in a β-sheet); and the sixth gives the observed frequency with which the (φ,ψ) values were in the helical domain and were either the second or third residues of a β-turn.

Using the first three numbers in each entry of the 20 × 20 table (Table I of Ref. 23), α-helix-breaking tripeptides were defined as before (Ref. 31) to be those with a frequency of 0,0,3; 0,1,2; 0,2,1; 0,3,0 or greater, as well as those with a frequency of not less than 90% nonhelical occurrences. Other less probable helix breakers were omitted.
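The helix-breaker criterion just stated can be written as a small test on the first three counts of a table entry. This is a sketch of one reading of the rule, not the procedure of Ref. 31: the function name and the `column` parameter are hypothetical, and the requirement of at least three observations before the 90% rule applies is an assumption drawn from the listed minimum entries (0,0,3; 0,1,2; 0,2,1; 0,3,0).

```python
from typing import Tuple

HELIX = 0  # index of the helical count in a (helix, beta, other) triple


def is_breaker(counts: Tuple[int, int, int], column: int) -> bool:
    """True if a tripeptide entry marks its middle residue as a breaker.

    counts is (helical, beta-sheet, other) for the central residue.
    The entry qualifies when at least 90% of the observations fall
    outside the chosen conformation.  Assumption: a minimum of three
    observations is required, so sparse entries are never breakers.
    """
    total = sum(counts)
    if total < 3:
        return False
    outside = total - counts[column]
    # Integer form of outside / total >= 0.9, avoiding rounding issues.
    return 10 * outside >= 9 * total


# Entries of the kind quoted in the text and in Table I.
zero_helix = is_breaker((0, 0, 3), HELIX)      # no helical occurrences at all
ninety_pct = is_breaker((3, 5, 22), HELIX)     # 27 of 30 observations nonhelical
not_breaker = is_breaker((2, 0, 3), HELIX)     # 2 of 5 observations helical
```

Passing a different `column` index applies the same test to another conformation in the triple.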
The definition of β-sheet breakers in the 20 × 20 table was relaxed from those with a frequency of 1,0,2; 2,0,1; 3,0,0; 0,0,3 or greater to include, in addition, those with a frequency of not less than 90% non-β-strand occurrences. These values were computed according to Wu and Kabat (Ref. 31) using the most recent 20 × 20 table based on 19 proteins (Ref. 23). Breaking tripeptides are not taken to be contradictory to predictions if, for helix, they occur in the three terminal residues at either end, or if, for β-strand, they appear at either terminal residue.
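The exemption for terminal residues can be made concrete with a short check. The following is a minimal sketch under stated assumptions rather than the authors' procedure: predictions are given as a hypothetical per-residue string ("H" helix, "B" β-strand, anything else coil or turn), positions are zero-based, and adjacent residues in the same state are treated as one segment.

```python
from typing import List, Tuple

# Number of residues at each end of a segment where a breaker is tolerated:
# three for helix, one for beta-strand, as stated in the text.
EXEMPT_END = {"H": 3, "B": 1}


def segment_bounds(states: str, pos: int) -> Tuple[int, int]:
    """Return the first and last index of the run of equal states around pos."""
    start = pos
    while start > 0 and states[start - 1] == states[pos]:
        start -= 1
    end = pos
    while end + 1 < len(states) and states[end + 1] == states[pos]:
        end += 1
    return start, end


def contradictions(states: str, breakers: List[Tuple[int, str]]) -> List[int]:
    """List the breaker positions that contradict the predicted states.

    breakers holds (position, kind) pairs, kind being "H" or "B".
    A breaker only counts when the residue is predicted in the conformation
    it breaks and lies outside the exempt terminal residues of its segment.
    """
    found = []
    for pos, kind in breakers:
        if states[pos] != kind:
            continue
        start, end = segment_bounds(states, pos)
        margin = EXEMPT_END[kind]
        if start + margin <= pos <= end - margin:
            found.append(pos)
    return found


# Made-up prediction: an eight-residue helix followed by a four-residue strand.
example = "CHHHHHHHHCBBBBC"
hits = contradictions(
    example, [(0, "H"), (2, "H"), (4, "H"), (10, "B"), (11, "B")]
)
```

Here only the helix breaker at index 4 and the β breaker at index 11 are reported, since the others fall on coil or within the exempt ends.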

RESULTS AND DISCUSSION

Predictions with these six independent methods are given in Table I and Fig. 1 for each residue of the revised lac repressor sequence. There are numerous disagreements among the predictions by the several methods. From the totals given at the end of Table I, it can be determined that three or more of the methods agree for only 58% of the 360 residues. The extent of agreement varies for different portions of the molecule. In the joint predictions, for the first third of the molecule, 68% of the residues are predicted to be in the same conformation by three or more of the methods. Corresponding values for the middle and last third of the molecule are 53 and 52%. It is noteworthy that somewhat more agreement is observed in the portion of lac repressor which interacts with DNA. This tendency to

TABLE I
Predictions of Secondary Structures in the lac Repressor (a)

Columns of the original table: residue number and amino acid, with structure-breaking tripeptide counts, Wu et al. (Ref. 23); Chou et al. (Ref. 8); Lim (Ref. 27); Burgess et al. (Ref. 29); Bunting et al. (Ref. 30); Ptitsyn and Finkelstein (Ref. 28); Joint.

[Per-residue prediction entries and tripeptide counts not recoverable in column order; the amino acid sequence and the column totals follow.]

Amino acid sequence (ten residues per row, numbered from the first residue of each row)

  1  Met Lys Pro Val Thr Leu Tyr Asp Val Ala
 11  Glu Tyr Ala Gly Val Ser Tyr Gln Thr Val
 21  Ser Arg Val Val Asn Gln Ala Ser His Val
 31  Ser Ala Lys Thr Arg Glu Lys Val Glu Ala
 41  Ala Met Ala Glu Leu Asn Tyr Ile Pro Asn
 51  Arg Val Ala Gln Gln Leu Ala Gly Lys Gln
 61  Ser Leu Leu Ile Gly Val Ala Thr Ser Ser
 71  Leu Ala Leu His Ala Pro Ser Gln Ile Val
 81  Ala Ala Ile Lys Ser Arg Ala Asp Gln Leu
 91  Gly Ala Ser Val Val Val Ser Met Val Glu
101  Arg Ser Gly Val Glu Ala Cys Lys Ala Ala
111  Val His Asn Leu Leu Ala Gln Arg Val Ser
121  Gly Leu Ile Ile Asn Tyr Pro Leu Asp Asp
131  Gln Asp Ala Ile Ala Val Glu Ala Ala Cys
141  Thr Asn Val Pro Ala Leu Phe Leu Asp Val
151  Ser Asp Gln Thr Pro Ile Asn Ser Ile Ile
161  Phe Ser His Glu Asp Gly Thr Arg Leu Gly
171  Val Glu His Leu Val Ala Leu Gly His Gln
181  Gln Ile Ala Leu Leu Ala Gly Pro Leu Ser
191  Ser Val Ser Ala Arg Leu Arg Leu Ala Gly
201  Trp His Lys Tyr Leu Thr Arg Asn Gln Ile
211  Gln Pro Ile Ala Glu Arg Glu Gly Asp Trp
221  Ser Ala Met Ser Gly Phe Gln Gln Thr Met
231  Gln Met Leu Asn Glu Gly Ile Val Pro Thr
241  Ala Met Leu Val Ala Asn Asp Gln Met Ala
251  Leu Gly Ala Met Arg Ala Ile Thr Glu Ser
261  Gly Leu Arg Val Gly Ala Asp Ile Ser Val
271  Val Gly Tyr Asp Asp Thr Glu Asp Ser Ser
281  Cys Tyr Ile Pro Pro Leu Thr Thr Ile Lys
291  Gln Asp Phe Arg Leu Leu Gly Gln Thr Ser
301  Val Asp Arg Leu Leu Gln Leu Ser Gln Gly
311  Gln Ala Val Lys Gly Asn Gln Leu Leu Pro
321  Val Ser Leu Val Lys Arg Lys Thr Thr Leu
331  Ala Pro Asn Thr Gln Thr Ala Ser Pro Arg
341  Ala Leu Ala Asp Ser Leu Met Gln Leu Ala
351  Arg Gln Val Ser Arg Leu Glu Ser Gly Gln

Totals

Wu et al. (Ref. 23): H breaking, 32; β breaking, 66; H,β breaking, 8; total number of breakers, 106
Chou et al. (Ref. 8): H, 148; β, 108; C, 60; T, 44
Lim (Ref. 27): H, 139; β, 62; C + T, 159
Burgess et al. (Ref. 29): H, 62; β, 42; C, 151; T, 105
Bunting et al. (Ref. 30): T, 81
Ptitsyn and Finkelstein (Ref. 28): H, 222; β, 52; C + T, 86
Joint: H, 91; (H), 77; β, 39; (β), 29; C, 49; (C), 94; T, 28; (T), 35; uncertain, 4

(a) The amino acid sequence shown is that determined by Beyreuther et al. (Ref. 2) except for residue 215, later identified as glutamic acid rather than glutamine (J. G. Files, personal communication), two inserted regions (11 residues, 148-158, and 2 residues, 231 and 232), and a change of residue 164 from glutamine to glutamic acid as recently determined by Farabaugh (Ref. 3). Helix-breaking residues are identified by H, and β-strand-breaking residues by β, followed by three or six values. Each residue is indicated as being either in a helix, H, β-strand, β, coil, C, or turn, T, on the basis of the prediction methods of Lim (Ref. 27), Burgess et al. (Ref. 29), Bunting et al. (Ref. 30), or Ptitsyn and Finkelstein (Ref. 28). For turns, only the central two residues are designated by T. A horizontal bar represents a separation between two adjacent, but separate, regions having the same conformation. The predictions listed under Chou et al. (Ref. 8) are those from their Tables 3 and 4 except for the new residues and some changes due to the insertions. Residues 248-258 are predicted as helical (Ref. 8) on the basis of 29 known proteins, although this region was predicted as a β-structure region on the basis of 15 known proteins. The last column gives the structures most frequently predicted for each residue. If three or more of the methods agree, these are listed; but if only two methods agree, the results are given in parentheses. Coils are distinguished from turns in these joint predictions, although the prediction schemes of Lim and of Ptitsyn and Finkelstein do not allow distinctions between coil and turn structures, while the prediction method of Bunting et al. (Ref. 30) predicts only turn conformations. The totals shown at the end of the table represent the number of residues predicted as H, β, C, or T (C + T in the cases of Lim and of Ptitsyn and Finkelstein) by each prediction scheme. Asterisks indicate contradictions between predictions and the location of helix- or β-breaking tripeptides. Four breakers conflicting with the joint predictions are observed at residues 21, 149, 179, and 346.
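The voting rule behind the joint column, as described in the footnote, can be sketched in a few lines. This is an illustration only: the function name is hypothetical, an empty string stands for a method that makes no call at a residue (for example a turn-only method away from turns), "B" stands in for β, and the "-" returned when no two methods agree is a placeholder of this sketch for the entries counted as uncertain. It does not model the merging of coil and turn for the methods that cannot distinguish them.

```python
from collections import Counter
from typing import Iterable


def joint_entry(votes: Iterable[str]) -> str:
    """Combine the per-method calls for one residue into a joint entry.

    States backed by three or more methods are listed plainly; states
    backed by exactly two methods are listed in parentheses.
    """
    counts = Counter(v for v in votes if v)
    parts = []
    for state in sorted(counts):
        if counts[state] >= 3:
            parts.append(state)
        elif counts[state] == 2:
            parts.append(f"({state})")
    return ",".join(parts) if parts else "-"


# Made-up votes from five methods at four different residues.
clear_helix = joint_entry(["H", "H", "H", "C", "H"])
split_pair = joint_entry(["H", "H", "B", "B", "C"])
turn_with_helix = joint_entry(["T", "T", "T", "H", "H"])
no_agreement = joint_entry(["H", "B", "C", "T", ""])
```

With these inputs the entries come out as "H", "(B),(H)", "(H),T", and "-", the same shapes as the joint entries in the table.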


obtain better predictions for the amino portion of a protein has also been observed for other molecules (Refs. 20, 21). Differences among the predictions may indicate a lack of reliability for individual methods. There is no evidence to indicate that predictions for which there is substantial agreement among the methods would be subject to less uncertainty.

Because of the known operator binding to the amino-terminal portion, we have compared the findings of Chou et al. (Ref. 8) with the joint predictions in the range of residues 1-59. By discussing joint predictions we intend only to indicate regions of substantial consensus; there is no intention to lend support to the joint prediction as necessarily being better than any single method. There are joint predictions of two β regions at 4-9 and 19-24; this coincides almost completely with Chou et al. (Ref. 8), who gave β's at 4-7 and 17-24. In the latter region, there is one β breaker at residue 21; however, it is contradicted by β predictions by three other methods. Chou et al. (Ref. 8) predicted a helix region at 8-13, which was not predicted by any other method. In the helix region at 26-45, only the latter half at 34-45 agrees with other methods; the first portion from 26 to 33 is predicted to be helix by only one other method, namely, that of Ptitsyn and Finkelstein. Most of the helix at 52-57 is predicted by two other methods as well.

The joint predictions do not support the presence of an extensive β-structure region between residues 200 and 340 as proposed by Chou et al. (Ref. 8). In particular their β regions at 204-206, 209-211, 28&294, and 32S337 are not confirmed by any of the other methods.

The method of Wu and Kabat (Ref. 23) allows the identification of 32 α-helix-breaking tripeptides, 66 β-structure-breaking tripeptides, and 8 α- and β-breaking tripeptides.
Among these, 15 breaking tripeptides contradict the predictions of Chou et al. (Ref. 8): 4 α-breaking tripeptides are located in regions predicted as helical, and 11 β-breaking tripeptides are located in predicted β-structure regions. A comparison of the predictions of individual methods with the occurrences of structure-breaking tripeptides should give some indication of the nature of any overpredictions. The largest number of contradictions of this kind occurs for helix predictions in the method of Ptitsyn and Finkelstein and for β predictions in the results of Chou et al. (Ref. 8). Many contradictions disappear when the results of the joint predictions are considered. For the joint prediction there are only four contradicting cases: helix breakers at 179 and 346 and β breakers at 21 and 149. The method of Chou and Fasman is currently the most popular for attempting to predict secondary structures; however, it produces some uncertain results and might prove more reliable if its predictions were confirmed by some of the other available methods.

There are many regions which are predicted to have the same conformation by the four complete methods. Those regions demonstrating such uniform agreements are: C 1-3, β 5-7, H 35, H 38-45, (C or T) 48-50, H 81-86, (C or T) 88, β 94-98, H 108-116, H 131-139, (C or T) 151-155, H 179-181, (C or T) 207-208, (C or T) 274-285, (C or T) 299, (C or T) 332-333, and H 343-353, comprising 23% of the whole molecule. We have recently received from K. Nagano (personal communication) predictions for lac repressor with his method (Ref. 35). His results are generally in accord with these same residues; 22% of the residues show agreement by all five methods.

Several detailed models have been proposed for the interaction between the lac repressor protein and the operator DNA region. One of these (Ref. 36) requires, in particular, the presence of a helix approximately in the region of residues 13-33 of the repressor. According to composite predictions, this is not one of the regions which has a very high probability of being helical (see Fig. 1). In contrast, Gursky et al. (Ref. 37) have proposed two long β-strands from 14 to 32 and from 53 to 71; only 9 of these 38 residues are given as β in the composite predictions. Recently Jones and Olson (Ref. 38) have presented a model in which the double-stranded operator DNA is separated between base pairs 7 and 29 to form single-strand loops. However, this model is not consistent with the experimental extent of unwinding (Ref. 39) and the properties of mutants.

A large number of amino acid substitutions corresponding to specific mutations in the repressor have already been identified, and the use of different suppressors allows one to introduce five or six different substitutions at the location of nonsense codons (Refs. 4, 5). The properties of the altered repressor resulting from each specific substitution can either be deduced from the phenotype in vivo or examined in vitro by direct measurements of the affinity of the repressor for operator and inducer (Ref. 40). The correlation between specific substitutions and the resulting alterations in repressor properties, together with the location of probable secondary structures, could give useful clues about the three-dimensional structure of the repressor.
We used the present methods in trying to predict secondary conformations in a series of mutants; it was not possible to reach any conclusions about the effects of sequence changes on secondary conformations. Other hypothetical models will probably be proposed before the x-ray data become available; these should attempt to account for predictions of secondary structures and properties of substitution mutants. The composite method assigns a secondary conformation to a substantial fraction of the residues, and many of these assignments do not support previously proposed secondary structures. Neither the individual nor the composite methods are suitable for providing insight into the three-dimensional structure of the lac repressor. The predictions, however, especially since they were compiled in the absence of x-ray crystallographic studies, might, if ultimately compared with such a structure, give some insight into the principles on which the methods select secondary structure. This may lead to better understanding and ultimately to improved predictive methods.
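The composite rule referred to throughout (a conformation is assigned to a residue when three or more methods agree, with coil and turn pooled) can be illustrated with a minimal sketch. The function name, the one-letter state codes, the treatment of ties as unassigned, and the toy input are all assumptions made for this illustration; they are not taken from the programs actually used, and the toy strings are not lac repressor predictions.

```python
from collections import Counter


def composite_prediction(per_method, min_agree=3):
    """Per-residue consensus over several methods.

    per_method: equal-length strings, one per method, using the hypothetical
    codes H (helix), B (beta), C (coil), T (turn).
    Returns a string with H, B, C (coil or turn pooled) or "-" (no consensus).
    """
    if len({len(p) for p in per_method}) != 1:
        raise ValueError("all predictions must cover the same residues")

    consensus = []
    for states in zip(*per_method):
        # Pool coil and turn into a single class, as in the composite counts.
        pooled = ["C" if s in ("C", "T") else s for s in states]
        ranked = Counter(pooled).most_common()
        top_state, top_votes = ranked[0]
        # Assumption: a tie for first place is reported as "no consensus".
        tied = len(ranked) > 1 and ranked[1][1] == top_votes
        consensus.append(top_state if top_votes >= min_agree and not tied else "-")
    return "".join(consensus)


# Toy input: six hypothetical methods, five residues.
toy = ["HHBCT", "HHBTC", "HCBCC", "BHHTB", "CHHHB", "CBHHB"]
print(composite_prediction(toy))  # prints: HH-C-
```

In the toy input, residues 3 and 5 receive three votes for each of two classes, so the sketch leaves them unassigned; how such cases were resolved in the actual composite is not stated here.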

We thank H. A. Scheraga for sending a computer program for his method and V. I. Lim and K. Nagano for providing predictions with their methods. Support for this work was provided in part by grants from the National Science Foundation, PCM 72-02219 A04 and PCM 76-81029, to E.A.K.; a grant from the National Institutes of Health, 5-R01-GM21482, to T.T.W.; and grants from the National Institutes of Health, 5-R01-GM20868 and 5-R01-GM25617, to S.B. Also, T.T.W. is a Research Development Career Awardee, 5-K01-AI70497, from the National Institutes of Health.

References

1. Bourgeois, S. & Pfahl, M. (1976) in Adv. Protein Chem. 30, 1-99.
2. Beyreuther, K., Adler, K., Geisler, N. & Klemm, A. (1973) Proc. Natl. Acad. Sci. USA 70, 3576-3580.
3. Farabaugh, P. (1978) Nature 274, 765-769.
4. Miller, J. H., Ganem, D., Lu, P. & Schmitz, A. (1977) J. Mol. Biol. 109, 275-301.
5. Coulondre, C. & Miller, J. H. (1977) J. Mol. Biol. 117, 525-575.
6. Beyreuther, K. (1978) Nature 274, 767.
7. Matsuura, M., Oshima, Y. & Horiuchi, T. (1972) Biochem. Biophys. Res. Commun. 47, 1438-1443.
8. Chou, P. Y., Adler, A. J. & Fasman, G. D. (1974) J. Mol. Biol. 96, 29-45.
9. Pfahl, M., Stockter, C. & Gronenborn, B. (1974) Genetics 76, 669-679.
10. Miller, J. H., Coulondre, C., Schmeissner, U., Schmitz, A. & Lu, P. (1975) in Protein-Ligand Interactions, Sund, H. & Blauer, G., Eds., Walter de Gruyter, Berlin, pp. 238-252.
11. Platt, T., Files, J. G. & Weber, K. (1973) J. Biol. Chem. 248, 110-121.
12. Huston, J. S., Moo-Penn, W. F., Bechtel, K. C. & Jardetzky, O. (1974) Biochem. Biophys. Res. Commun. 61, 391-398.
13. Files, J. G. & Weber, K. (1976) J. Biol. Chem. 251, 3386-3391.
14. Geisler, N. & Weber, K. (1977) Biochemistry 16, 938-943.
15. Ogata, R. T. & Gilbert, W. (1978) Proc. Natl. Acad. Sci. USA 75, 5851-5854.
16. Goeddel, D. V., Yansura, D. G. & Caruthers, M. H. (1977) Proc. Natl. Acad. Sci. USA 74, 3292-3296.
17. Bahl, C. P., Wu, R., Stawinsky, J. & Narang, S. A. (1977) Proc. Natl. Acad. Sci. USA 74, 966-970.
18. Black, J. A., Harkins, R. N. & Stenzel, P. (1976) Int. J. Pept. Protein Res. 8, 125-130.
19. Maxfield, F. R. & Scheraga, H. A. (1976) Biochemistry 15, 5138-5153.
20. Schulz, G. E., Barry, C. D., Friedman, J., Chou, P. Y., Fasman, G. D., Finkelstein, A. V., Lim, V. I., Ptitsyn, O. B., Kabat, E. A., Wu, T. T., Levitt, M., Robson, B. & Nagano, K. (1974) Nature 250, 140-142.
21. Matthews, B. W. (1975) Biochim. Biophys. Acta 405, 442-451.
22. Wallace, D. G. (1976) Biophys. Chem. 4, 123-130.
23. Wu, T. T., Szu, S. C., Jernigan, R. L., Bilofsky, H. & Kabat, E. A. (1978) Biopolymers 17, 555-572.
24. Argos, P., Schwarz, J. & Schwarz, J. (1976) Biochim. Biophys. Acta 439, 261-273.
25. Lenstra, J. A. (1977) Biochim. Biophys. Acta 491, 333-338.
26. Chou, P. Y. & Fasman, G. D. (1974) Biochemistry 13, 211-223, 223-245.
27. Lim, V. I. (1974) J. Mol. Biol. 88, 857-872, 873-894.
28. Ptitsyn, O. B. & Finkelstein, A. V. (1970) Biophysics 15, 785-796.
29. Burgess, A. W., Ponnuswamy, P. K. & Scheraga, H. A. (1974) Israel J. Chem. 12, 239-286.
30. Bunting, J. R., Athey, T. W. & Cathou, R. E. (1972) Biochim. Biophys. Acta 285, 60-70.
31. Wu, T. T. & Kabat, E. A. (1971) Proc. Natl. Acad. Sci. USA 68, 1501-1506.
32. Kabat, E. A. & Wu, T. T. (1973) Proc. Natl. Acad. Sci. USA 70, 1473-1477.
33. Patel, D. J. (1975) Biochemistry 14, 1057-1059.
34. Chou, P. Y. & Fasman, G. D. (1977) in Peptides: Proceedings of the Fifth American Peptide Symposium, Goodman, M. & Meienhofer, J., Eds., Halsted Press, Wiley, New York, pp. 284-287.
35. Nagano, K. (1977) J. Mol. Biol. 109, 251-274.
36. Adler, K., Beyreuther, K., Fanning, E., Geisler, N., Gronenborn, B., Klemm, A., Müller-Hill, B., Pfahl, M. & Schmitz, A. (1972) Nature 237, 322-327.
37. Gursky, G. V., Tumanyan, V. G., Zasedatelev, A. S., Zhuze, A. L., Grokhovsky, S. L. & Gottikh, B. P. (1977) in Nucleic Acid-Protein Recognition, Academic, New York, pp. 189-217.
38. Jones, C. E. & Olson, M. O. J. (1977) J. Theor. Biol. 64, 323-332.
39. Wang, J. C., Barkley, M. D. & Bourgeois, S. (1976) Nature 251, 247-249.
40. Bourgeois, S. (1971) Methods Enzymol. 21, 491-500.

Received March 8, 1978
Accepted May 23, 1979
