Journal of Protein Chemistry, Vol. 11, No. 6, 1992

Identification of Tissue Proteins by Amino Acid Analysis After Purification by Two-Dimensional Electrophoresis

P. Jungblut,1,3,4 M. Dzionara,2 J. Klose,1 and B. Wittmann-Leibold2

Received March 31, 1992

Mouse brain proteins were separated by two-dimensional electrophoresis (2-DE). The proteins of a section of the 2-DE pattern were blotted onto hydrophobic membranes and 43 of them were excised and hydrolyzed by liquid-phase hydrolysis. The amino acid composition of these proteins was determined by orthophthaldialdehyde precolumn derivatization and compared with the compositions of known proteins stored in the NBRF sequence database. An identification program named ASA was developed for this purpose. The ASA program includes correction and weighting factors, data reduction by molecular weight windows, and exclusion or inclusion of certain organisms as desired. As a control, eight test proteins and five well-known proteins from mouse brain, all separated by 2-DE, were correctly identified by the program. Out of the 43 brain proteins selected, 19 were identified with high confidence.

KEY WORDS: Two-dimensional electrophoresis; blotting; amino acid analysis; databases; identification; proteins.

1. INTRODUCTION

High resolution two-dimensional electrophoresis (2-DE) has reached a stage of development that may allow the separation of a great part, if not all, of the proteins of a certain cell type or tissue. This high resolving power can be utilized to detect protein alterations which are characteristic or specific for a distinct stage of cell differentiation, for genetic changes, for the response to certain environmental agents, or for a particular disease. Using a subtractive analysis as described by Aebersold and Leavitt (1990), proteins showing an alteration in their amount or electrophoretic position can be detected among the hundreds or thousands of polypeptide spots of a 2-DE pattern. The problem then arising is the characterization or, if possible, the identification of the proteins of interest. Identification does not only mean finding the "name" of a protein in a protein sequence database; often it is already valuable to identify one protein spot with another spot closely located in the same pattern (possibly a modification of this protein) or at a corresponding site of another pattern.

Identification of proteins from 2-DE patterns can be performed by immunostaining (Anderson et al., 1982), N-terminal sequencing (Vandekerckhove et al., 1985; Aebersold et al., 1986; Matsudaira, 1987; Walsh et al., 1988; Eckerskorn et al., 1988), or internal sequencing (Aebersold et al., 1987; Kennedy et al., 1988; Bauw et al., 1989; Eckerskorn and Lottspeich, 1989; Hirano and Watanabe, 1990). Compared to amino acid sequencing, determination of the amino acid composition is easier to perform and requires less protein. Therefore, if amino acid analysis could be used for identifying proteins, this method would be particularly indicated when proteins present in large numbers but low amounts (a situation given in 2-DE patterns) have to be analyzed.

Different methods of amino acid derivatization in the pmol range have been described: orthophthaldialdehyde (OPA) (Turnell and Cooper, 1982), phenylisothiocyanate (PITC) (Watanabe and Imai, 1981), Dabsyl (Chang et al., 1983), and 9-fluorenylmethyl chloroformate (F-Moc) (Einarsson et al., 1983). Identification of proteins on the basis of their amino acid composition was reported by Eckerskorn et al. (1988a), who used a preprocessed database, and recently by Sibbald et al. (1991), whose strategy included an algorithm for searching fragments of sequences.

We have developed a search program named ASA that utilizes molecular mass windows and correction and weighting factors, and that might be further refined by the inclusion of other restriction factors. The program was tested with 13 well-known proteins which had been separated by 2-DE. Amino acid analysis and the ASA program were used for the identification of 43 proteins of a mouse brain 2-DE pattern.

1 Institut für Toxikologie und Embryonalpharmakologie, Institut für Humangenetik, Freie Universität Berlin, 1000 Berlin 33, Garystr. 5, Germany.
2 Max-Planck Institut für Molekulare Genetik, 1000 Berlin 33, Ihnestr. 63, Germany.
3 Present address: Deutsches Herzzentrum Berlin, 1000 Berlin 65, Augustenburger Platz 1-3, Germany.
4 To whom all correspondence should be addressed at: Institut für Toxikologie und Embryonalpharmakologie, Freie Universität Berlin, 1000 Berlin 33, Garystr. 5, Germany.

0277-8033/92/1200-0603$06.50/0 © 1993 Plenum Publishing Corporation

2. MATERIALS AND METHODS

2.1. Protein Sample Preparation

The protein sample preparation was the same as described earlier (Eckerskorn et al., 1988a) with slight modifications. Four adult female DBA/6J mice (Zentralinstitut für Versuchstierzucht, Hannover, Germany) were used to prepare solubilized cell proteins of whole brains. The excised brains were rinsed several times with 0.9% NaCl to remove blood. After homogenization in a glass homogenizer, the homogenate was centrifuged for 40 min at 225,000g. The protein concentration of the clear supernatant was 22.3 mg/ml, as determined by the method of Peterson (1977). Urea (Bio-Rad, Richmond, California), dithiothreitol (DTT, Bio-Rad), and carrier ampholytes pH 2-4 (Serva, Heidelberg, Germany) were added to the clear supernatant to obtain final concentrations of 9 M, 70 mM, and 2%, respectively. After carefully stirring this solution for 40 min at room temperature, the resulting protein sample was stored at -70°C. From this sample, 20 µl (223 µg protein) were applied to 2-DE.

2.2. Two-Dimensional Electrophoresis (2-DE) and Blotting

2-DE was performed by the combination of isoelectric focusing (first dimension) and sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE, second dimension) as developed by Klose (1975, 1983) and modified by Eckerskorn et al. (1988a). The isoelectric focusing gels contained 3.5% acrylamide, 0.3% bisacrylamide, and a total of 4% (w/v) carrier ampholytes composed as follows: Ampholines pH 3.5-10 (Pharmacia-LKB, Freiburg, Germany), Servalytes pH 2-11 (Serva), Pharmalytes pH 4-6.5 (Pharmacia-LKB), and Pharmalytes pH 5-8, at a ratio of 1 : 1 : 3 : 2, respectively. The protein sample was applied onto the anodic side of the gel and focused at 8870 Vh without cooling. After isoelectric focusing was stopped, the gels were equilibrated for 15 min in a buffer containing 125 mM Tris/phosphate pH 6.9, 40% glycerol, 65 mM DTT, and 3% SDS, and then stored frozen (-70°C). The isoelectric focusing gels were applied onto the SDS-PAGE gels containing 15% acrylamide (w/v) and 0.2% bisacrylamide. The SDS-PAGE system of Laemmli (1979) was used, omitting the stacking gel.

After 2-DE, an area of the SDS slab gels in the range of molecular masses 24-43 kD (11.5 cm) and apparent pI values of 5-8 (14 cm) was excised and immediately blotted onto Glassybond membranes (Biometra, Göttingen, Germany) or Immobilon membranes (Millipore, Eschborn, Germany) under semidry blotting conditions (Jungblut et al., 1990). The blotting buffer contained 50 mM sodium borate, pH 9.0, and 20% methanol. The blotting time was 3 hr and the current per area was 1 mA/cm² at room temperature. The proteins were stained with Serva Blue R (Coomassie blue, Serva) for 5 min, destained three times for 5 min in a solution containing 40% methanol/10% acetic acid, and air-dried.

2.3. Amino Acid Analysis

Within the selected gel section, 240 protein spots could be detected on the blotting membrane by Coomassie blue staining. The 60 spots with the highest staining intensity were excised with a scalpel. For amino acid analysis, single spots were used if the staining intensity was high and the spot area was above 10 mm². If the spots showed a relatively low staining intensity and an area of about 3 mm², up to six spots were used. To reduce any contamination of the protein spots by amino acids, the spots were washed on a D2 frit with 100 µl 80% acetonitrile in water (HPLC grade) and dried by water pump vacuum. The membrane pieces were placed into tempered hydrolysis glass tubes (50 × 6 mm) and 50 µl 5.7 M HCl (Sigma, for amino acid analysis) were added. Liquid-phase hydrolysis was performed for 24 hr at 110°C in flame-sealed evacuated glass tubes, which had been purged with nitrogen three times. After hydrolysis, the content of the tubes was dried in a desiccator for 2 hr. A buffer (30 µl) containing 66.7 mM sodium citrate, pH 2.2, 2% thiodiglycol, and 3% mercaptoethanol was used to dissolve the residue and to elute the amino acids from the membrane pieces. After vigorous stirring and a 10-min incubation, about 20 µl of the citrate buffer containing the amino acids were transferred into WISP applicator tubes (WISP sampler, Waters, Eschborn, Germany). Amino acid analysis was performed with precolumn OPA derivatization and reversed-phase HPLC (Waters) to separate the amino acid derivatives (Ashman and Bosserhof, 1985). After every five HPLC runs, a test run with a standard amino acid mixture was performed to test the accuracy of the quantitative determination and to correct the dataset, if necessary. The number of molecules of amino acid i in the complete protein molecule ($z_i$) was calculated by the following equation:

$$z_i = \frac{q_i \times M_r}{q_T \times \sum\limits_{i=A}^{Z} \dfrac{q_i \times (M_{ri} - 18)}{q_T}} \qquad (1)$$

in which $q_i$ is the picomole amount of amino acid i obtained by amino acid analysis, $q_T$ the sum of the picomole amounts of all amino acids obtained by the amino acid analysis, $M_r$ the molecular mass of the protein as estimated by 2-DE, $M_{ri} - 18$ the molecular mass of amino acid i minus the molecular mass of water, and A to Z represent the individual amino acids according to the one-letter code in alphabetic order, with A for alanine and Z for the sum of glutamine and glutamic acid. A computer program was developed for this calculation. For each protein, a mean value of each amino acid was calculated from 2-16 analysis runs and corrected if necessary (see below). The mean values obtained for the amino acid compositions were the input data when the ASA program was applied to the NBRF sequence database.
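The calculation of Eq. (1) can be illustrated with a minimal Python sketch. It is not the authors' program: the function name `residue_counts` and the dictionary layout are hypothetical, the amino acid masses are standard average values, and assigning Asx and Glx the masses of Asp and Glu is an assumption made here for illustration.

```python
# Average molecular masses (g/mol) of the free amino acids measured here.
# Assumption of this sketch: Asx and Glx are given the masses of Asp and Glu.
AMINO_ACID_MASS = {
    "Asx": 133.10, "Glx": 147.13, "Ser": 105.09, "His": 155.16,
    "Gly": 75.07, "Thr": 119.12, "Arg": 174.20, "Ala": 89.09,
    "Tyr": 181.19, "Val": 117.15, "Met": 149.21, "Phe": 165.19,
    "Ile": 131.17, "Leu": 131.17, "Lys": 146.19,
}
WATER_MASS = 18.0  # value used in Eq. (1)


def residue_counts(pmol, protein_mass):
    """Estimate z_i, the number of residues of each amino acid, via Eq. (1).

    pmol: measured picomole amounts q_i keyed by amino acid name.
    protein_mass: molecular mass M_r of the protein as estimated by 2-DE.
    """
    q_total = sum(pmol.values())
    if q_total <= 0:
        raise ValueError("no amino acids were measured")
    # Sum in the denominator of Eq. (1): the residue mass averaged over
    # the measured mole fractions.
    mean_residue_mass = sum(
        q * (AMINO_ACID_MASS[aa] - WATER_MASS) / q_total
        for aa, q in pmol.items()
    )
    return {
        aa: q * protein_mass / (q_total * mean_residue_mass)
        for aa, q in pmol.items()
    }


# Equal amounts of Gly and Ala in a 6408 Da chain give about 50 residues each.
counts = residue_counts({"Gly": 10.0, "Ala": 10.0}, 6408.0)
```

By construction, the estimated residue counts multiplied by their residue masses add up to the molecular mass supplied, so an error in the 2-DE mass estimate scales all counts by the same factor.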

2.4. ASA Program

Not all of the 20 proteinogenic amino acids can be quantified after hydrolysis by common amino acid analysis procedures. Fifteen amino acids can be determined by combining liquid-phase hydrolysis and OPA derivatization (Gln is calculated together with Glu, Asn with Asp; Pro, Trp, and Cys are not determined). For this reason, mole fractions of the 15 amino acids were used for comparison of 2-DE separated proteins with proteins in the sequence database. A measure S for the degree of relationship between two proteins was defined by the following equation:

[Eq. (2): of the typeset expression for S, only the upper limit $i = Z$ and the factor 100 survive; the expression combines $w_i$, $m_{iD}$, and $m_{iE}$ over the amino acids A to Z.]   (2)

in which $m_{iD}$ is the mole fraction of amino acid i in the protein of the sequence database, and $m_{iE}$ the mole fraction of amino acid i found for the protein of the 2-DE pattern; A and Z indicate the amino acids in alphabetic order (one-letter code); and $w_i$ is the weighting factor of amino acid i. The weighting factor was introduced to compensate for inaccuracies in amino acid analysis (see below).

The ASA program was developed on the basis of this equation and was used to compare the amino acid composition of an unknown protein with that of all the proteins in the NBRF sequence database (after reduction of the dataset, see below). The experimental data used for identification by the ASA program included the raw data of the amino acid composition and the molecular masses of the proteins as estimated by 2-DE. Windows were set for the molecular masses of the proteins (accepting 10% variations) to reduce the number of proteins in the database to be compared with the protein of interest. The number of candidate proteins identical or homologous to the unknown ones can be reduced further by excluding the proteins of certain organisms, or by restricting the search to the proteins of the organism from which the unknown protein was derived and of some related species.
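The search strategy described above (pooling to 15 amino acid classes, a 10% molecular mass window, and ranking by S) can be sketched in Python as follows. This is an illustration under stated assumptions, not the ASA program: the exact form of Eq. (2) is not reproduced here, so the score below is assumed to be a weighted Euclidean distance between mole fractions, and all function names as well as the `(name, mass, sequence)` database layout are hypothetical.

```python
import math

# The 15 classes quantified after hydrolysis and OPA derivatization:
# Asn is pooled with Asp (B), Gln with Glu (Z); Pro, Trp, and Cys are skipped.
POOL = {"N": "B", "D": "B", "Q": "Z", "E": "Z"}
CLASSES = "ABFGHIKLMRSTVYZ"


def mole_fractions_from_sequence(sequence):
    """Mole fractions of the 15 classes for a one-letter-code sequence."""
    counts = dict.fromkeys(CLASSES, 0)
    for residue in sequence.upper():
        key = POOL.get(residue, residue)
        if key in counts:  # P, W, C and unknown letters are not counted
            counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no quantifiable residues")
    return {aa: n / total for aa, n in counts.items()}


def s_value(observed, reference, weights=None):
    """ASSUMED score: weighted Euclidean distance (not the original Eq. 2)."""
    weights = weights or {}
    return math.sqrt(sum(
        weights.get(aa, 1.0) * (reference[aa] - observed[aa]) ** 2
        for aa in CLASSES
    ))


def in_mass_window(candidate_mass, observed_mass, tolerance=0.10):
    """Accept database entries within 10% of the mass estimated by 2-DE."""
    return abs(candidate_mass - observed_mass) <= tolerance * observed_mass


def rank_candidates(observed, observed_mass, database, weights=None, best=30):
    """Return the `best` lowest-scoring (S, name) pairs inside the window."""
    hits = [
        (s_value(observed, mole_fractions_from_sequence(seq), weights), name)
        for name, mass, seq in database
        if in_mass_window(mass, observed_mass)
    ]
    return sorted(hits)[:best]
```

Filtering by mass before scoring keeps the number of compositions to be compared small, which is the purpose of the molecular mass window; a restriction to certain organisms could be added as a further filter in the same list comprehension.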

2.4.1. Weighting Factor

Amino acid analysis in the pmol range from proteins immobilized on membranes has several limitations in accuracy, due to (i) contaminations in the membranes and chemicals used, (ii) pipetting errors, and (iii) variations in the hydrolysis and derivatization conditions. For a given combination of the five successive steps to be done (2-DE, blotting, hydrolysis, derivatization, and HPLC), each amino acid can only be determined with a certain accuracy. Therefore, in the ASA program a weighting factor "w" (ranging between 0 and 1) was given to each amino acid and is included in Eq. (2). For example, the lysine values resulting from the OPA derivatization are not very reliable, because incomplete derivatization frequently occurs. Therefore, $w_{Lys}$ has to be lower than 1. A procedure for estimating w-values is shown below.

Fig. 1. Two-dimensional electrophoretic (2-DE) protein pattern of mouse brain proteins. The pattern was calibrated according to molecular masses ($M_r$) and apparent pI ($pI_{app}$) by marker proteins. The sample was applied to the anodic side of the isoelectric focusing gel. (A) The complete 2-DE pattern (axis marks: $M_r$ 66, 43, 29, 25, and 14 kD; $pI_{app}$ 4.5, 5.9, and 8.3). (B) A blot (Immobilon membrane) of the section investigated. (C) A drawing of the pattern shown in B. The numbers indicate the spots which were investigated.

2.4.2. Correction Factor

Contaminations, oxidation of amino acids, or incomplete hydrolysis occasionally result in a reproducible over- or underestimation of the amino acid amount. We attempted to correct these reproducible deviations with a second factor, the correction factor. For example, glycine is always overestimated in a system in which Tris/glycine is used as electrode buffer. For our analysis system, the overestimation of glycine amounted to 14.4% (relative to the measured value; mean value of variations for five test proteins, as shown below).

Fig. 1, Continued. Panels (B) and (C), with $M_r$ marks at 40, 35, 30, and 25 kD.

Therefore, 14.4% of the pmol values of glycine has to be subtracted by the ASA program.

2.4.3. Determination of Weighting and Correction Factor

To obtain a reliable estimation of weighting and correction factors for the analyzed amino acids, we compared the amino acid composition data determined for five known proteins (internal standards) of the mouse brain 2-DE pattern (Eckerskorn et al., 1988a) with their true amino acid composition, given by the sequence data. Mean values of variations, standard deviations, and coefficients of variability (cv) were determined for each amino acid. The mean value of variation was used as correction factor. The following six amino acids showed overestimations (-) or underestimations (+) for all five proteins; the correction factors were: Glx -15.7%; Val +13.2%; Ile +20.6%; Gly -14.4%; His +25.6%; Tyr +25.0%. By definition, amino acids with a cv below 8.2% obtained a weighting factor of 1.0; cv values from 8.2-10%, 0.8; from 10-15%, 0.6; from 15-20%, 0.5; and above 20%, 0.4. Amino acids with weighting factors below 1 were: Gly 0.6; Asx 0.8; His 0.8; Ile 0.8; Lys 0.6; Arg 0.6; Ser 0.6; Thr 0.5; and Tyr 0.4. Correction factors and weighting factors depend on the gel system, buffers, and amino acid analyzer employed; they have to be determined experimentally with standard proteins, using the same electrophoresis, blotting, and analysis methods as for the unknown proteins.
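The two factors of this section can be sketched in Python as follows. The numerical values are those reported above; the function names are hypothetical, and two points are assumptions of this sketch rather than statements of the text: how cv values lying exactly on a class boundary are assigned, and that positive correction factors are added as a percentage of the measured value in the same way as the 14.4% is subtracted for glycine.

```python
# Correction factors (%) reported in Section 2.4.3; a negative value
# corrects an overestimation, a positive value an underestimation.
CORRECTION_PERCENT = {
    "Glx": -15.7, "Val": 13.2, "Ile": 20.6,
    "Gly": -14.4, "His": 25.6, "Tyr": 25.0,
}


def weighting_factor(cv_percent):
    """Map a coefficient of variability (%) to the weighting factor w.

    Boundary handling (e.g. a cv of exactly 10%) is an assumption here.
    """
    if cv_percent < 8.2:
        return 1.0
    if cv_percent < 10.0:
        return 0.8
    if cv_percent < 15.0:
        return 0.6
    if cv_percent <= 20.0:
        return 0.5
    return 0.4


def apply_corrections(pmol, corrections=CORRECTION_PERCENT):
    """Shift each measured pmol value by its correction percentage."""
    return {
        aa: q * (1.0 + corrections.get(aa, 0.0) / 100.0)
        for aa, q in pmol.items()
    }


# Glycine measured at 100 pmol is reduced by 14.4%; alanine is unchanged.
corrected = apply_corrections({"Gly": 100.0, "Ala": 50.0})
```

Keeping the factors in plain tables rather than in the scoring code reflects the point made above that they are specific to one analysis system and have to be redetermined for another.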

3. RESULTS

3.1. Two-Dimensional Electrophoresis and Blotting

A 2-DE pattern of mouse brain proteins stained with Serva Blue R is shown in Fig. 1A. The section of the pattern analyzed in this investigation is indicated in Fig. 1A. Figure 1B shows an enlargement of this section after blotting onto Immobilon membrane. Out of the 60 spots selected for analysis, 43 (spots are numbered in Fig. 1C) gave reliable amino acid data (i.e., data ranging at least 10-fold above the background level). The amount of amino acids of the other 17 proteins was not high enough to obtain reliable results. The protein amount of spot 51 (six spots used for analysis) and spot 202 (one spot used for analysis) was 10 pmol and 166 pmol, respectively, as calculated by amino acid analysis.

3.2. Analysis of Blanks

To investigate the amino acid contaminations of some of the materials used in amino acid analysis, two types of blanks were tested: (i) 50 µl pure 5.7 M HCl, and (ii) different blank areas (about 10 mm²) of a membrane which was used for blotting and which was washed in 100 µl 80% acetonitrile in water. The latter blanks also contained the HCl impurities, because the blank areas were subjected to the complete amino acid analysis procedure, including hydrolysis by HCl. The results are shown in Table I. The contamination by glycine (about 20 pmol) was higher than that by any other amino acid. For calculating the absolute value of contamination in 50 µl 5.7 M HCl or in a blank membrane piece, the values in Table I have to be multiplied by the factor 3, because only a third of the hydrolysate was applied to the analyzer. The contaminations of glutamic acid, serine, and lysine were in the range of 10 pmol, and zero for histidine and threonine; all other amino acids gave errors between 1.0 and 6.6 pmol. The high standard deviations and coefficients of variability (Table I) show that calculation of a correction factor on the basis of blank data is not possible. The contaminations increased, as expected, from pure HCl to membrane blanks, but only slightly. Therefore, the contamination of HCl, rather than the contaminations due to electrophoresis, blotting, and membrane, is the limiting factor for the sensitivity of amino acid analysis after blotting. From our analysis of blank contaminations, the practical conclusion follows that the OPA derivatization is sufficiently sensitive and accurate if the content of individual amino acids in the hydrolysate is in the 100 pmol range.

Table I. Amino Acid Contaminations in 5.7 M HCl Used for Hydrolysis of Proteins and in Glassybond Blanks After Blotting(a)

                        5.7 M HCl                       Glassybond blanks
Amino acid    mv (pmol)   s (pmol)   cv (%)     mv (pmol)   s (pmol)   cv (%)
Asx              2.18        4.8      220.2        5.73        6.4      111.7
Glx              3.60        3.9      108.3        8.01        6.8       84.9
Ser             10.05       11.8      117.4        8.84        9.3      105.2
His              0           0          0          0           0          0
Gly             19.83       17.8       89.8       23.1        15.7       68.0
Thr              0           0          0          0           0          0
Arg              0           0          0          5.45        7.7      141.3
Ala              3.49        4.0      114.6        4.90        3.5       71.4
Tyr              0.99        1.4      141.4        1.04        1.9      182.7
Val              1.40        2.0      142.9        4.8         5.1      106.3
Phe              9.00       12.1      134.4        6.59       11.2      170.0
Ile              1.00        1.5      150.0        1.63        2.3      141.1
Leu              2.90        3.9      134.5        4.04        4.5      111.4
Lys              8.73        7.0       80.2        9.84        8.6       87.4
mv               4.51                              6.00

(a) 50 µl of 5.7 M HCl (Sigma, for amino acid analysis) and Glassybond pieces similar in size to a protein spot (about 10 mm²) were subjected to amino acid analysis as described in Materials and Methods. Glassybond pieces were excised from a region of a 2-DE pattern which contained no Coomassie blue detectable protein spots. The amino acids were injected with citrate buffer. The values given correspond to 1/3 of the total amount. mv = mean value, s = standard deviation, cv = coefficient of variance; number of experiments = 8.

3.3. Analysis of Mouse Brain Proteins

The efficiency of the ASA program was checked by using eight test proteins (commercially purified proteins or N-terminally sequenced and identified proteins) and five well-known proteins of the mouse brain (identified by N-terminal sequencing, see below). All these proteins were purified by 2-DE, blotted onto Glassybond membranes, and subjected to amino acid analysis. The identity of these proteins was examined by screening the NBRF protein sequence database. The results for the eight test proteins are shown in Table II. In each case, the correct protein was found in position one, except for creatine kinase, which was placed at position two, whereas in position one a nonmammalian protein was indicated. The analysis of the eight test proteins was performed without correction factors and weighting factors. All S-values were below 0.05. The proteins placed in the first position often did not belong to the corresponding organism, but were homologous protein sequences included in the database. Obviously, the experimental variations of the analysis were in a similar range as the variations between functionally identical proteins of different organisms. This becomes an advantage of this identification method if a protein spot of an organism is analyzed whose corresponding protein is not available in the database. On the other hand, omission of the proteins of very unrelated organisms (different classes or phyla) may be an advantage.

Using the ASA program and the correction and weighting factors determined (see Materials and Methods), an attempt was made to identify 43 mouse brain proteins by screening the NBRF protein sequence database with regard to the amino acid composition found experimentally for these proteins. From each run, the 30 best matches were registered. Since mouse proteins were analyzed, matches with nonmammalian proteins were rejected, except in the cases where the S-values were below 0.03 or where other evidence for this protein was available. Furthermore, proteins of the database representing preforms and having, moreover, molecular masses at the lower end of the molecular mass window were omitted. In most cases, the protein with the lowest S-value belonged to a mammalian species homologous to that of the investigated protein. Table III shows the results obtained for the 43 proteins tested; Table IIIA presents the S-values of five known proteins (Eckerskorn et al., 1988a) and demonstrates that the S-values decreased when the correction and weighting factors were introduced. The remaining 38 proteins were weighted and corrected with the same factors.

The probability for a correct identification was considered to be high if the S-values were below 0.04 and the protein belonged to the group of mammalian proteins. Nineteen identifications fulfilled these criteria, as shown in Table IIIB. The criteria for a middle probability of identity were S-values between 0.04 and 0.05 and assignment to a mammalian protein. Nonmammalian assignments were also accepted if the S-values were below 0.05 and if other evidence supported the assignment. Thirteen proteins compatible with these criteria were found (Table IIIC). Finally, six proteins formed a class of low probability of identity (Table IIID). The criteria for this class of assignments were S-values between 0.05 and 0.065 and assignment to a mammalian protein. Additionally, a nonmammalian protein was assigned to this class because of a very low S-value. However, a homologous function of this protein in a mammalian organism is difficult to imagine.

Table II. Test of the Amino Acid Composition Identification Program ASA(a)

Protein                            Position no. of correct     S         Proteins in positions                  No. of
                                   protein found by ASA                  following no. 1                        analyses
α-Hemoglobin, mouse                1 (Flamingo)                0.02748   Several α-hemoglobins                  1
Carbonic anhydrase, bovine         1 (Bovine)                  0.03758   Carbonic anhydrases of other sources   4
Albumin, mouse                     1 (Rat)                     0.03924   Other proteins                         1
Myoglobin, horse                   1 (Mammal)                  0.04882   17 Myoglobins                          5
                                   6 (Horse)                   0.06055
Triosephosphate isomerase, mouse   1 (Human)                   0.03476   4 Triosephosphate isomerases           1
Creatine kinase, mouse             2 (Rabbit)                  0.04669   Other proteins                         1
Albumin, sheep                     1 (Human)                   0.04583   2 Albumins                             1
Cyclophilin, mouse                 1 (Rat)                     0.04592   Other cyclophilins                     2

(a) Test proteins were separated by two-dimensional electrophoresis (2-DE), blotted onto Glassybond membrane, and excised. The amino acid composition was determined after liquid-phase hydrolysis and OPA precolumn derivatization. The amino acid composition of each protein was compared with the database using the ASA program. A molecular mass window with a deviation of 10% of the molecular mass determined by 2-DE was set. Sequences of fragments, viruses, and bacteriophages were excluded from the analysis. The result was given as a list of proteins with decreasing probability of identity. The position of the authentic protein is shown in the table. S indicates the degree of identity, defined by Eq. (2).

Table III. Identified Proteins of Solubilized Mouse Brain Proteins Separated by 2-DE(a)

            Amino acid no.        S, correction
Spot no.    2-DE    Database      +          -          Identity

A. Known proteins
19          373     381           0.01589    0.02804    Creatine kinase
98          315     335           0.02477    0.04914    Glyceraldehyde-phosphate dehydrogenase
169         249     259           0.03298    0.03738    Carbonic anhydrase
200         223     248           0.02313    0.03859    Triose-phosphate isomerase
202         223     248           0.00990    0.03151    Triose-phosphate isomerase

B. Proteins identified with high probability
11          383     433           0.03447               α-Enolase
12          383     433           0.02145               α-Enolase
15          366     375           0.02772               Actin
16          369     375           0.02846               Actin
18          374     381           0.02346               Creatine kinase
47          348     338           0.03377               Fructose bisphosphatase
51          349     388           0.03127               Pyruvate dehydrogenase, α
52          349     390           0.03127               Pyruvate dehydrogenase, α
53          349     390           0.02886               Pyruvate dehydrogenase, α
64          340     363           0.02915               Fructose bisphosphate aldolase
67          336     362           0.03598               MHC class I molecule FLAA23
130         285     277           0.03563               Myo-inositol-1-monophosphatase
167         249     259           0.03844               Carbonic anhydrase
168         249     259           0.03478               Carbonic anhydrase
179         243     254           0.01805               Phosphoglycerate mutase B
182         232     223           0.01976               Ubiquitin-carboxyl-terminal hydrolase, isozyme L1
191         235     219           0.03301               GTP-binding protein Rab 3b
210         214     234           0.03467               Thymidine kinase
213         217     209           0.03216               Glutathione transferase

C. Proteins identified with middle probability
46          344     346           0.04839               Glycosyl asparaginase
68          337     379           0.03705               Wnt-5a protein
69          337     379           0.03779               Wnt-5a protein
100         312     314           0.04586               Malate dehydrogenase
108         301     307           0.02792               Histidine decarboxylase (Lactobacillus)
116         292     309           0.04773               Phosphoprotein phosphatase 2A, α
121         300     334           0.03251               Malate dehydrogenase (yeast)
178         243     254           0.04274               Phosphoglycerate mutase
190         236     247           0.04381               Serine esterase, precursor
197         227     248           0.04274               Cytotoxic T-lymphocyte proteinase 3, precursor
201         223     250           0.04819               Triose-phosphate isomerase (yeast)
218         207     193           0.04003               Transforming protein N-Ras
224         208     222           0.04087               Superoxide dismutase

D. Proteins identified with low probability
63          342     369           0.02852               Saccharopine dehydrogenase (yeast)
97          315     335           0.06275               Glyceraldehyde-phosphate dehydrogenase
184         232     248           0.05717               Cytotoxic T-lymphocyte proteinase 2
209         217     218           0.05291               Transforming protein R-Ras
226         201     190           0.06076               Glutathione peroxidase-related protein
230         205     255           0.05393               C-reactive protein, precursor

(a) Identification was performed by OPA amino acid analysis and search in the NBRF database (released July 1991) using the ASA program. 2-6 spots were used for one experiment and 2-16 analyses were performed for each protein. Mean values of the amino acid composition were used for the ASA search. Correction and weighting factors were estimated by mean value determination of five proteins with known identity. The resulting correction factors were: Glx -15.7%; Val +13.2%; Ile +20.6%; Gly -14.4%; His +25.6%; Tyr +25.0%. The weighting factors were estimated by the deviation of the experimental data from the theoretical data: Gly 0.6; Asx 0.8; His 0.8; Ile 0.8; Lys 0.6; Arg 0.6; Ser 0.6; Thr 0.5; Tyr 0.4. (A) Known proteins with and without correction and weighting, (B) the proteins with high probability of correct assignment (criteria: S < 0.04, mammalia), (C) the proteins with middle probability (criteria: 0.05 > S > 0.04, also nonmammalian proteins with low S values), (D) the proteins with low probability (criteria: mammalia S > 0.05, nonmammalian proteins with dubious function in mammalia). S is a measure of identity, defined by Eq. (2).

4. DISCUSSION

Amino acid analysis and the use of the ASA program yielded information about the identity of 43 proteins from a 2-DE pattern of mouse brain proteins. Out of these 43 proteins, the identity of five known proteins was confirmed and 19 proteins were identified with a high degree of confidence. Eight test proteins used as controls were identified correctly with one exception; here, no weighting or correction factors were used. The quality of identification was estimated by the S-value, the molecular mass, and the phylogenetic relationship to the mouse. The molecular mass may be determined more exactly by mass spectrometry. Isoelectric points (pI) were not used as a criterion because proteins separated in urea (see Materials and Methods) usually show pI values different from those determined under native conditions or on a theoretical basis (Ui, 1971). Weighting and correction factors could not be determined from the contamination data (Table I), because of their high variance after blotting and analysis.
More pragmatically, these factors were determined by analyzing known proteins blotted onto membranes. The factors obtained led to lower S-values for all of the analyzed proteins (Table III). An alternative procedure that was not tested here would be to derive an individual correction and weighting factor for each amino acid of each protein according to the pmol value determined for this amino acid.

The identification of proteins with the ASA program has some limitations: (i) proteins not included in the sequence database necessarily yield wrong identifications, which may, however, be recognized by high S-values; (ii) because only precursor sequences are listed in the sequence database, mature proteins derived from proteins with signal sequences cannot directly be found in the existing database, although a solution to this problem has recently been suggested by Sibbald et al. (1991); (iii) using the existing sequence databases, a reduction of proteins to those of a certain class or phylum of organisms (e.g., mammalia or chordata) is not possible, except manually; (iv) because contaminations rise with the number of membrane pieces, the number of protein spots analyzed in one experiment is limited and should not exceed eight membrane pieces.

Using precolumn OPA derivatization, a high sensitivity for amino acids may be reached. Quantities as low as 500 fmol (Tous et al., 1989) could be detected. However, as was shown in Table I, not the sensitivity, but an increase in amino acid contaminations in the low picomole range was the limiting factor in the amino acid analysis of blotted proteins. While amino acid analysis with the ion exchange ninhydrin method requires 200 pmol of a 25 kD protein to obtain an amino acid composition with a maximal deviation of 10% (Ozols, 1990), the OPA precolumn derivatization in combination with our amino acid analysis conditions required about 10 pmol (0.2 µg) of a 25 kD protein to obtain less than 10% deviation (estimated from the contamination values of Table I). Reduction of contaminations was achieved by washing the membranes (Jungblut et al., 1989) before hydrolysis. The effect is clearly seen in the contaminations of the blank membranes, including the contaminations of the HCl, which were only slightly higher than the contaminations contributed by the blank HCl alone.

Proteins on gels may be identified by coelectrophoresis with a purified protein, by immunostaining, or by peptide mapping.
All these methods are useful if a protein already known has to be found among the large number of unknown cell proteins. Often, however, another situation exists: a protein spot in a 2-DE gel is, for some reason, of particular interest, but no hypothesis about its identity is possible. In this case, microsequencing and amino acid analysis are the methods of choice, provided that the protein has already been sequenced and entered into a sequence database. Amino acid analysis in combination with the ASA program may strongly suggest the identity of a protein but cannot, in contrast to sequence analysis, indicate the "name" of the protein with 100% certainty. On the other hand, amino acid analysis is easier to perform and needs less protein; therefore, this method can also be used in a large-scale analysis of protein spots.

An example may show that the amino acid approach can even help to clarify a sequencing result. An unknown spot of a 2-DE pattern of human heart proteins was subjected to amino acid sequencing. Two internal sequences (FDDYM and LGVEFDETTA) were obtained (Jungblut et al., 1992). With the FASTA search program, both sequences gave two matches with 100% alignment: human cardiac fatty acid binding protein (FABP) (Fzhuc) and bovine mammary-derived growth inhibitor (A29466). These sequence data do not allow an unequivocal decision. Therefore, amino acid analysis and an ASA search were performed. The results showed S-values of 0.02848 for human cardiac FABP and 0.07119 for bovine mammary-derived growth inhibitor. The lower S-value is a clear indication that the protein in question was FABP.

When 2-DE patterns are to be evaluated, another situation often occurs: one has to determine whether two spots from different patterns represent the same protein, regardless of whether this protein is known or not. Usually, the position, size, contours, and intensity of the spots, and the formation of characteristic groups of spots, indicate that two spots from different patterns are identical. A comparison of the amino acid compositions of the two spots can be performed to confirm the identity. A similar situation may occur for spots of the same pattern. The demonstration that several spots of a pattern represent the same protein is advantageous if this protein is to be sequenced. All these spots can then be combined into one sample, which is an advantage particularly if the spot of interest alone contains too little protein. Our results and suggestions show that amino acid analysis in combination with the ASA program offers a valuable tool, in addition to others, for the identification of proteins.
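
The comparisons described above, whether ranking database candidates by S-value or checking that two spots share a composition, can be illustrated with a small sketch. It is written in Python rather than the Fortran 77 of ASA, and it assumes a simplified score: the sum of squared differences between mole fractions, without the correction and weighting factors of the actual program. The function names and the toy compositions are hypothetical and are not measured data.

```python
from typing import Dict, List, Tuple

# Amino acid one-letter code -> amount (any unit, e.g. pmol or residue count).
Composition = Dict[str, float]


def normalize(comp: Composition) -> Composition:
    """Scale a composition so that its values sum to 1 (mole fractions)."""
    total = sum(comp.values())
    if total <= 0:
        raise ValueError("composition must contain a positive total amount")
    return {aa: amount / total for aa, amount in comp.items()}


def score(measured: Composition, reference: Composition) -> float:
    """Sum of squared mole-fraction differences; lower means more similar.

    Assumed stand-in for an S-value, not the ASA formula. Only amino acids
    present in both compositions are compared.
    """
    shared = sorted(set(measured) & set(reference))
    if not shared:
        raise ValueError("no amino acids in common")
    m = normalize({aa: measured[aa] for aa in shared})
    r = normalize({aa: reference[aa] for aa in shared})
    return sum((m[aa] - r[aa]) ** 2 for aa in shared)


def rank(measured: Composition,
         candidates: Dict[str, Composition]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from best (lowest score) to worst."""
    scored = [(name, score(measured, comp)) for name, comp in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy data for illustration only (hypothetical, four amino acids).
measured = {"A": 10, "G": 8, "L": 12, "K": 6}
candidates = {
    "candidate_1": {"A": 10, "G": 9, "L": 11, "K": 6},
    "candidate_2": {"A": 4, "G": 15, "L": 6, "K": 11},
}

ranking = rank(measured, candidates)
for name, value in ranking:
    print(f"{name}: {value:.5f}")
```

With these toy values the first candidate ranks ahead of the second because its score is lower, in the same way that the lower S-value favored FABP over the growth inhibitor. Applied to two measured spots, the same `score` function gives a value near zero when the spots have the same composition.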

ACKNOWLEDGMENTS

The authors are indebted to Mr. Alfred Beck for his valuable help in accommodating the program on the VAX machine, Mrs. Ursula Scherm for performing the protein hydrolysis, Mr. Kamp for the OPA analysis, Dr. Jean Nowak for performing the ASA searches, and Mrs. Ursula Kobalz for her excellent technical assistance.

ADDENDUM

The program ASA, running on a VAX/VMS computer, is written in Fortran 77, is based on the procedure library of GCG-6.2, and is available on request.

REFERENCES

Aebersold, R., Teplow, D., Hood, L. E., and Kent, S. B. H. (1986). J. Biol. Chem. 261, 4229-4238.
Aebersold, R. H., Leavitt, J., Saavedra, R. A., Hood, L. E., and Kent, S. B. H. (1987). Proc. Natl. Acad. Sci. USA 84, 6970-6974.
Aebersold, R., and Leavitt, J. (1990). Electrophoresis 11, 517-527.
Anderson, N. L., Nance, S. L., Pearson, T. W., and Anderson, N. G. (1982). Electrophoresis 3, 135-142.
Ashman, K., and Bosserhoff, A. (1985). In Modern Methods in Protein Chemistry--Review Articles (Tschesche, H., ed.), Walter de Gruyter, Berlin, Vol. 2, pp. 155-171.
Bauw, G., Van Damme, J., Puype, M., Vandekerckhove, J., Gesser, B., Ratz, G. P., Lauridsen, J. B., and Celis, J. E. (1989). Proc. Natl. Acad. Sci. USA 86, 7701-7705.
Chang, J. Y., Knecht, R., and Braun, D. G. (1983). Methods Enzymol. 91, 41-48.
Eckerskorn, C., Jungblut, P., Mewes, W., Klose, J., and Lottspeich, F. (1988a). Electrophoresis 9, 830-838.
Eckerskorn, C., Mewes, W., Goretzki, H., and Lottspeich, F. (1988b). Eur. J. Biochem. 176, 509-519.
Eckerskorn, C., and Lottspeich, F. (1989). Chromatographia 28, 92-94.
Einarsson, S., Josefsson, B., and Lagerkvist, S. (1983). J. Chromatogr. 282, 609-618.
Hirano, H., and Watanabe, T. (1990). Electrophoresis 11, 573-580.
Jungblut, P., Choli, T., and Wittmann-Liebold, B. (1989). Biol. Chem. Hoppe-Seyler 370, 775.
Jungblut, P., Eckerskorn, C., Lottspeich, F., and Klose, J. (1990). Electrophoresis 11, 581-588.
Jungblut, P., et al. (1992). In preparation.
Kennedy, T. E., Gawinowicz, M. A., Barzilai, A., Kandel, E. R., and Sweat, J. D. (1988). Proc. Natl. Acad. Sci. USA 85, 7008-7012.
Klose, J. (1975). Humangenetik 26, 231-243.
Klose, J. (1983). In Modern Methods in Protein Chemistry--Review Articles (Tschesche, H., ed.), Walter de Gruyter, Berlin, pp. 49-78.
Laemmli, U. K. (1979). Nature 227, 680-685.
Matsudaira, P. (1987). J. Biol. Chem. 262, 10035-10038.
Ozols, J. (1990). Methods Enzymol. 182, 587-601.
Peterson, G. L. (1977). Anal. Biochem. 83, 346-356.
Sibbald, P. R., Sommerfeldt, H., and Argos, P. (1991). Anal. Biochem. 198, 330-333.
Tous, G. I., Fausnaugh, J. L., Akinyosoye, O., Lackland, H., Winter-Cash, P., Vitorica, F. J., and Stein, S. (1989). Anal. Biochem. 179, 50-55.
Turnell, D. C., and Cooper, J. D. H. (1982). Clin. Chem. 28, 527-531.
Ui, N. (1971). Biochim. Biophys. Acta 229, 567-581.
Vandekerckhove, J., Bauw, G., Puype, M., Van Damme, J., and Van Montagu, M. (1985). Eur. J. Biochem. 152, 9-19.
Walsh, M., McDougall, J., and Wittmann-Liebold, B. (1988). Biochemistry 27, 6867-6876.
Watanabe, Y., and Imai, K. (1981). Anal. Biochem. 136, 471-474.
