Use of Nonlinear Regression to Analyze Enzyme Kinetic Data: Application to Situations of Substrate Contamination and Background Subtraction

ANALYTICAL BIOCHEMISTRY 184, 274-278 (1990)

Robin J. Leatherbarrow¹

Department of Chemistry, Imperial College of Science, Technology & Medicine, South Kensington, London SW7 2AY, United Kingdom

Received June 27, 1989

In a recent publication, A. Lundin, P. Arner, and J. Hellmér [Anal. Biochem. 177, 125-131 (1989)] describe a method whereby kinetic substrate assays can be performed when the assay mixture includes significant contaminating levels of substrate. Their method requires various rearrangements of the data and involves three separate linear regression calculations. We show how the same data may be analyzed directly, and far more simply, by nonlinear regression. Unlike the linear regression method, nonlinear regression allows direct calculation of the actual values of Vmax, Km, and the concentration of contaminating substrate (as well as estimates of their standard errors); the former method gives only apparent values. Nonlinear regression is also a statistically more valid means of analysis, because the rearrangements required to give linearized equations considerably distort the error distribution and render simple unweighted linear regression inappropriate. The ease of incorporating extra parameters into standard equations when nonlinear regression is used is further illustrated by fitting enzyme reaction data that describe a first-order process in the presence of a significant nonspecific background. For this equation no simple rearranged linear plot is possible, but nonlinear regression is easily applied to determine the kinetic parameters. © 1990 Academic Press, Inc.

¹ To whom correspondence should be addressed.

0003-2697/90 $3.00
Copyright © 1990 by Academic Press, Inc.
All rights of reproduction in any form reserved.

Analysis of kinetic or binding data is a common feature of biochemical work. Often the data involve a nonlinear dependence of the observed parameter on the one that is varied. Examples are the dependence of the rate of an enzyme reaction on substrate concentration, or of the extent of reaction on time (if the full reaction profile is followed). There is a wealth of literature on methods for such data analysis. The method traditionally employed, however, is to rearrange the data to give a linear plot, which is then fitted by linear regression to give rate or equilibrium constants. There has been extensive discussion about which is the 'best' linear plot, particularly for enzyme kinetic work (1-3). It must be stressed that all such rearranged plots perturb the error distribution of the data, as is apparent if error bars are included in the rearranged plots (Fig. 1). Simple (unweighted) linear regression is therefore not appropriate for such data unless they are free of error. This fact was well understood by the originators of one of the most often criticized plots, the Lineweaver-Burk double-reciprocal plot for analyzing enzyme kinetic data (4, 5). These authors correctly applied 1/y⁴ weighting to their transformed data when performing regression analysis. Unfortunately, facilities to incorporate weighting are rarely available in the programs used to perform linear regression, so correct weighting is, in practice, usually not employed. Deriving and applying complex weighting functions for linear regression of rearranged data is one option. A more generally applicable approach is nonlinear regression on the original data. Nonlinear regression minimizes the sum of squared differences between experimental and calculated data for any equation of the form

$$ y = f(x, p_1, p_2, \ldots) \qquad [1] $$

i.e., y is a unique function of x and one or more unknown parameters, p1, p2, etc. The best-fit values for these parameters are determined by the calculations. The method is completely general, in that any equation can be used. For example, enzyme kinetic data would involve the equation

$$ v = \frac{V_{\max}[S]}{K_m + [S]} \qquad [2] $$

where the parameters to be determined are Km and Vmax, as the rate of reaction, v, is monitored versus the concentration of substrate, [S]. A first-order rate process is described by the equation

$$ A = A_\infty\left(1 - e^{-kt}\right) \qquad [3] $$

where the extent of reaction, A, is observed with time, t. The parameters describing the curve are the rate constant, k, and the maximal extent of reaction, A∞. The application of nonlinear regression to determine kinetic parameters is well established (6-11). The only disadvantage of nonlinear regression is that the calculations require a computer. However, a standard IBM PC is adequate to perform these calculations, and several programs have been published to allow calculations using the more common equations (12). In addition, commercial programs are available that allow the user simply to type in a new equation, to which the data are then fitted (13-15). All that is required is to translate the equation into a form slightly more akin to a computer language. For example, the right-hand side of Eq. [2] could become

Vmax*S/(Km + S)    [4]

where the symbols Vmax, S, and Km have been defined to represent the parameters Vmax, [S], and Km in Eq. [2]. Similarly, the right-hand side of Eq. [3] could be entered as

Ainf*(1 - exp(-k*t))    [5]

where Ainf, k, and t are defined to represent A∞, k, and t in Eq. [3]. It is therefore not only statistically more valid to use nonlinear regression. It is also far simpler in practice to enter new equations in their original nonlinear form into one of these programs than to manipulate them into a linear form.
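As a present-day illustration of the point that the 'computerized' forms [4] and [5] are all a fitting program needs, the sketch below writes them as Python functions and fits them by nonlinear least squares. It is not the GraFit or Enzfitter code. `fit_lm` is a hypothetical, minimal Levenberg-Marquardt routine in the spirit of the Marquardt method (17), and it rests on three assumptions:

- derivatives are taken by forward differences;
- residuals are unweighted, i.e., every rate is assumed to carry the same error, as in Fig. 1a;
- the user supplies the starting guesses.

The data in the usage section are simulated, noise-free values generated from the assumed parameters Vmax = 2.0 and Km = 1.5.

```python
import math


def michaelis_menten(S, p):
    """Eq. [4], Vmax*S/(Km + S), with parameters p = [Vmax, Km]."""
    Vmax, Km = p
    return Vmax * S / (Km + S)


def first_order(t, p):
    """Eq. [5], Ainf*(1 - exp(-k*t)), with parameters p = [Ainf, k]."""
    Ainf, k = p
    return Ainf * (1.0 - math.exp(-k * t))


def _solve(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def _ssq(model, xs, ys, p):
    """Unweighted sum of squared residuals between data and model."""
    return sum((y - model(x, p)) ** 2 for x, y in zip(xs, ys))


def fit_lm(model, xs, ys, p0, max_iter=200, tol=1e-12):
    """Minimal Levenberg-Marquardt fit of y = model(x, p); returns (p, sum of squares)."""
    p = [float(v) for v in p0]
    lam = 1e-3
    current = _ssq(model, xs, ys, p)
    for _ in range(max_iter):
        # Forward-difference Jacobian: J[i][j] = d model(x_i, p) / d p_j
        J = []
        for x in xs:
            f0 = model(x, p)
            row = []
            for j in range(len(p)):
                h = 1e-7 * max(abs(p[j]), 1.0)
                q = list(p)
                q[j] += h
                row.append((model(x, q) - f0) / h)
            J.append(row)
        r = [y - model(x, p) for x, y in zip(xs, ys)]
        m = len(p)
        JTJ = [[sum(row[a] * row[b] for row in J) for b in range(m)] for a in range(m)]
        JTr = [sum(row[a] * ri for row, ri in zip(J, r)) for a in range(m)]
        # Marquardt damping: inflate the diagonal of J^T J by a factor (1 + lam)
        A = [[JTJ[a][b] * (1.0 + lam) if a == b else JTJ[a][b] for b in range(m)] for a in range(m)]
        step = _solve(A, JTr)
        trial = [pi + si for pi, si in zip(p, step)]
        new = _ssq(model, xs, ys, trial)
        if new < current:  # accept the step and move toward Gauss-Newton
            done = current - new <= tol * (1.0 + current)
            p, current, lam = trial, new, lam / 10.0
            if done:
                break
        else:  # reject the step and move toward steepest descent
            lam *= 10.0
    return p, current


if __name__ == "__main__":
    S = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
    v = [michaelis_menten(s, [2.0, 1.5]) for s in S]  # simulated, noise-free
    p, ss = fit_lm(michaelis_menten, S, v, p0=[1.0, 1.0])
    print("Vmax = %.4f, Km = %.4f" % tuple(p))
```

With real data the fitted values should be accompanied by standard errors. These are usually estimated from the inverse of JᵀJ scaled by the residual variance; that step is omitted here for brevity.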

[Figure 1: (a) Rate versus [Substrate]; (b) double-reciprocal plot versus 1/[Substrate]; (c) Rate versus Rate/[Substrate]; data shown with error bars.]

FIG. 1. The effect on the error structure of transforming enzyme kinetic data to give linear plots. (a) Raw enzyme kinetic data. All data points in this experiment are assumed to be equally accurate, as indicated by the error bars. (b) Data and error bars from (a) rearranged to give the double-reciprocal Lineweaver-Burk plot. If linear regression is to be used to analyze these data, appropriate weighting must be applied to compensate for the distorted error distribution (4, 5). (c) Data and error bars from (a) arranged to give the Eadie-Hofstee plot. Error is present on both axes in this plot, which makes it inappropriate for standard regression analysis, where all error is assumed to be in the dependent observation.

APPLICATIONS

Substrate Contamination in Enzyme Kinetic Assays

This article was prompted by the publication by Lundin et al. (16) of a new linear plot to determine kinetic parameters for enzyme assays conducted in the presence of small amounts of contaminating substrate. In these circumstances, the normal enzyme kinetic equation becomes

$$ v = \frac{V_{\max}([S_{add}] + [S_{con}])}{K_m + [S_{add}] + [S_{con}]} \qquad [6] $$

[Figure 2: Rate versus [Substrate], with Km indicated.]

FIG. 2. The rate of reaction versus the concentration of added substrate, Sadd, in the presence of a contaminating substrate concentration, Scon. The data are simulated to represent an experimental situation. The line drawn through the points is the best fit to the data using Eq. [6], calculated by nonlinear regression. Bars represent the values of Scon and Km.

Typical data are shown in Fig. 2. These workers addressed how to analyze data that record the rate, v, versus the concentration of added substrate, [Sadd], so that further rate measurements could be read from the curve to give estimates of [Sadd]. Their solution illustrates the difficulty of extracting best-fit values from an inherently nonlinear equation when only linear regression is used. Their recipe is as follows:

(i) Perform linear regression on the initial portion of the data and back-extrapolate to obtain the background rate in the absence of added substrate, vbl.

(ii) Determine "apparent" Vmax and Km values by linear regression on a plot of v − vbl versus (v − vbl)/[Sadd].

(iii) Plot a standard curve of (v − vbl)/Vmax(app) versus [Sadd]/(Km(app) + [Sadd]). The rate given by an unknown sample can be read from this plot (or calculated after fitting these data by linear regression) to determine the concentration of substrate.

The rearrangements made to the data distort the error distribution in a manner similar to that shown in Fig. 1c, which makes the use of unweighted linear regression at each step questionable. In any case, such procedures are unnecessarily complicated compared with direct analysis by nonlinear regression. In addition, nonlinear regression gives the Vmax, Km, and [Scon] values directly, which the above recipe does not provide. Equation [6] was analyzed with the GraFit program (15), which uses the Marquardt method (17) to perform nonlinear fitting. To enter this equation into this program, the appropriate 'computerized' form of the equation is

Vmax*(Sadd + Scon)/(Km + Sadd + Scon).
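Solving Eq. [6] for the added substrate gives a direct conversion from a measured rate to [Sadd]. Rearranging v(Km + [Sadd] + [Scon]) = Vmax([Sadd] + [Scon]) gives [Sadd] = Km·v/(Vmax − v) − [Scon]. This is defined only for 0 < v < Vmax. The Python sketch below is minimal; the function names and parameter values are illustrative assumptions, not values taken from Fig. 2.

```python
def rate_eq6(S_add, Vmax, Km, S_con):
    """Eq. [6]: rate when contaminating substrate S_con is present."""
    S = S_add + S_con
    return Vmax * S / (Km + S)


def added_substrate_from_rate(v, Vmax, Km, S_con):
    """Invert Eq. [6]: [Sadd] = Km*v/(Vmax - v) - [Scon].

    Only meaningful for 0 < v < Vmax; a rate at or above Vmax has no finite solution.
    """
    if not 0.0 < v < Vmax:
        raise ValueError("rate must lie strictly between 0 and Vmax")
    return Km * v / (Vmax - v) - S_con


if __name__ == "__main__":
    # Illustrative (hypothetical) fitted parameters, not those behind Fig. 2.
    Vmax, Km, S_con = 1.0, 2.0, 0.5
    v_unknown = rate_eq6(1.5, Vmax, Km, S_con)
    print(added_substrate_from_rate(v_unknown, Vmax, Km, S_con))  # prints 1.5
```

Setting [Sadd] = 0 in Eq. [6] shows that the background rate is Vmax[Scon]/(Km + [Scon]). This is the quantity that step (i) of the linear recipe estimates by back-extrapolation.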

Fitting to this equation directly provides the values of Vmax, Km, and [Scon] (Vmax, Km, and Scon as defined above), together with estimates of their standard errors. Figure 2 shows data fitted to such an equation. Unknown rate values may be converted into substrate concentrations by direct calculation using the fitted parameters. Alternatively, and more simply, these values can be extracted with the option in this program that determines x values from a given set of y values using the fitted equation.

Background Correction in Progress Curves

The equation for a process that follows first-order kinetics was given in Eq. [3]. For the enzyme tyrosyl tRNA synthetase, the reaction between tyrosine and ATP proceeds to give enzyme-bound tyrosyl adenylate (18):

Tyr + ATP + E ⇌ E·TyrAMP + PPi

In the presence of pyrophosphatase the equilibrium is driven to the right to produce 1 mol of tyrosyl adenylate per mole of dimeric enzyme, an example of half-of-the-sites reactivity. For the native enzyme the rate constant for formation of tyrosyl adenylate is 38 s⁻¹. For many mutants produced by site-directed mutagenesis, however, this rate constant is reduced, often by many orders of magnitude (19). In these circumstances the enzyme can be assayed by observing the formation of enzyme-bound [14C]TyrAMP from ATP + [14C]Tyr. The enzyme-bound radioactivity can be separated from unreacted [14C]Tyr by rapid filtration through nitrocellulose filters (18). The amount of [14C]TyrAMP produced is given by Eq. [3]. However, as in many radioactive filter-binding assays, there is a small but significant background value due to nonspecific binding of the [14C]Tyr to the filters. Controls that measure this background accurately can be difficult to construct. It is therefore simpler to treat the background value as an experimental unknown, to be determined from the data analysis. Equation [3] therefore becomes

$$ A_{obs} = A_\infty\left(1 - e^{-kt}\right) + A_{back} \qquad [7] $$

where Aobs is the observed amount of radioactivity bound to the filters, and A∞ is the amount bound at infinite reaction time minus the background value, Aback. It must be stressed that before an equation such as [7] is used, it is necessary to be sure that the background is genuinely such, rather than the consequence of a more complex mechanism. Figure 3 shows a typical set of raw data from such an assay, in which the radioactivity bound to the filters is plotted as a function of the time of reaction. The data are analyzed in this form; for presentation purposes they can be rearranged to show moles of Tyr bound per mole of enzyme versus time. This equation is a particularly good example of the benefits of nonlinear regression analysis. It is difficult to rearrange the data to a linear form unless assumptions are made about the value of A∞ [for example, to allow plots of ln(A∞ − A) versus t]. Nonlinear fitting makes no such assumptions and involves no rearrangement of the data or of their error structure.

[Figure 3: radioactivity bound to filters (0-8000) versus Time/min (0-50), with the fitted limit and background levels marked.]

FIG. 3. Radioactivity bound to filters as a function of the time of reaction for formation of enzyme-bound [14C]tyrosyl adenylate from ATP and [14C]tyrosine by Thr → Gly40 tyrosyl tRNA synthetase (23) ([MgATP] = 2 mM, [tyrosine] = 20 µM, pH 7.8 (Tris-HCl, 50 mM), 10 mM MgCl2, 25°C). The line drawn through the points is the best fit to the data using Eq. [7], calculated by nonlinear regression. The limiting and background values from the analysis are indicated on the graph.

STATISTICAL SIGNIFICANCE OF THE ADDITIONAL VARIABLE

Even when used without regard to the statistical aspects of the fitting, nonlinear regression offers considerable practical advantages over analysis involving extensive rearrangements followed by linear regression. Moreover, the statistics generated during the fitting procedure give further insight into the significance of the data. In both of the examples described above, a background parameter is added to a more standard equation. It is therefore pertinent to question the significance of adding this extra variable, because a data set can always be fitted better by adding extra parameters to an equation. Neither the analysis suggested by Lundin et al. (16) nor simple graphical analysis of the data gives any indication of the significance of the values calculated. However, a statistic generated from the nonlinear regression, the χ² value, allows the significance of the extra variable to be assessed. χ² is defined as

$$ \chi^2 = \sum_{i=1}^{N} \left[ \frac{y_i - f(x_i, p_1, p_2, \ldots)}{\sigma_i} \right]^2 $$

where σi is the standard deviation of the individual data points. Consider the χ² values obtained by fitting the same data to two equations that differ only by a single parameter, for example, the equation pairs [2], [6] or [3], [7]. These values can be used to determine the significance of the extra parameter by an F test (20, 21). The F statistic is calculated as

$$ F = \frac{\chi^2_{n-1} - \chi^2_{n}}{\chi^2_{n} / (N - n - 1)} $$

where n is the number of variables in the equation and N is the number of data points used. This tests the probability that the additional parameter is zero. Tables of probability at various F values may be found in standard statistics texts (20, 22) or may be provided by the program (15). When the data shown in Fig. 3 are analyzed in this manner, the probability that the background value is zero is 0.6%. This probability is negligible, so inclusion of the background term is justified.

DISCUSSION
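The χ² and F calculations of the preceding section are simple enough to sketch directly. The Python below is illustrative only, not the routine in GraFit. Its function names are hypothetical, and the residual degrees of freedom follow the N − n − 1 denominator of the F equation. Instead of being read from statistical tables, the probability is computed from the identity that an F(1, d) variate is the square of a Student t variate with d degrees of freedom, using numerical integration of the t density. When the σi are unknown but assumed equal, χ² is simply proportional to the residual sum of squares, and the common factor cancels from F. The 0.6% figure quoted for Fig. 3 cannot be reproduced here, because the underlying χ² values are not reported.

```python
import math


def chi_squared(ys, fitted, sigmas):
    """Chi-squared: sum over data points of ((y_i - f(x_i)) / sigma_i)**2."""
    return sum(((y - f) / s) ** 2 for y, f, s in zip(ys, fitted, sigmas))


def extra_parameter_F(chi2_without, chi2_with, dof):
    """F for one added parameter: (chi2_without - chi2_with) / (chi2_with / dof).

    chi2_without: chi-squared for the equation lacking the parameter (e.g. Eq. [3])
    chi2_with:    chi-squared for the equation including it (e.g. Eq. [7])
    dof:          residual degrees of freedom, taken as N - n - 1 as in the text
    """
    return (chi2_without - chi2_with) / (chi2_with / dof)


def p_value_F1(F, dof, steps=2000):
    """Upper-tail probability P(F' > F) for F' distributed as F(1, dof).

    Uses F(1, d) = T**2 with T Student's t on d degrees of freedom, and
    integrates the t density over [0, sqrt(F)] with Simpson's rule.
    """
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    if F <= 0.0:
        return 1.0
    t = math.sqrt(F)
    d = float(dof)
    norm = math.exp(math.lgamma((d + 1) / 2) - math.lgamma(d / 2)) / math.sqrt(d * math.pi)

    def pdf(u):
        return norm * (1.0 + u * u / d) ** (-(d + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < t)
    return 1.0 - 2.0 * central  # two tails of T correspond to the upper tail of F


if __name__ == "__main__":
    # Hypothetical chi-squared values, NOT the unreported values behind Fig. 3.
    N, n = 12, 3  # data points; parameters in the larger equation
    F = extra_parameter_F(chi2_without=40.0, chi2_with=8.0, dof=N - n - 1)
    print(F, p_value_F1(F, N - n - 1))
```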

Nonlinear fitting allows data to be analyzed without further manipulation. Its results are therefore more likely to be statistically valid than those found by linear regression of rearranged data. The equations analyzed may be any of the form y = f(x), which covers essentially any equation describing an experimental process. Some programs allow the equation simply to be entered at the keyboard, without recourse to reprogramming. This makes it easy to extend standard equations to account for slightly differing experimental conditions. The examples described in this paper involve inclusion of a background substrate concentration in an enzyme assay and of background radioactivity in an enzyme progress curve. It would be equally simple to include a background drift (by incorporating an "mx + c" term into the equation), a delay before data were collected, or other factors that account for particular experimental conditions. Entering such equations into one of these programs requires no knowledge of programming and only basic mathematical skills. The only disadvantage of nonlinear fitting is that a computer is essential. However, basic laboratory computers can perform the calculations in seconds (the data in Fig. 3 are fitted in 4 s on a basic IBM AT computer).

REFERENCES

1. Walter, C. J. (1974) J. Biol. Chem. 249, 699-703.
2. Mannervik, B. (1975) Anal. Biochem. 63, 12-16.
3. Atkins, G. L., and Nimmo, I. A. (1975) Biochem. J. 149, 775-777.
4. Lineweaver, H., Burk, D., and Deming, W. E. (1934) J. Amer. Chem. Soc. 56, 225-230.
5. Lineweaver, H., and Burk, D. (1934) J. Amer. Chem. Soc. 56, 658-666.
6. Wilkinson, G. N. (1961) Biochem. J. 80, 324-332.
7. Cleland, W. W. (1967) Adv. Enzymol. 29, 1-32.
8. Duggleby, R. G. (1981) Anal. Biochem. 110, 9-18.
9. Duggleby, R. G. (1984) Comput. Biol. Med. 14, 447.
10. Canela, E. I. (1984) Int. J. Med. Comput. 15, 121.
11. Green, S., Field, J. K., Green, C. D., and Beynon, R. J. (1982) Nucl. Acids Res. 10, 1411-1421.
12. Atkins, G. L. (1985) Comp. Appl. Biosci. 1, 79-82.
13. Leatherbarrow, R. J. (1987) Enzfitter, Elsevier-Biosoft, Hills Road, Cambridge, UK.
14. Beynon, R. J. (1988) Curvefit, I. R. L. Press Ltd., Oxford.
15. Leatherbarrow, R. J. (1989) GraFit, Erithacus Software Ltd., Staines, UK.
16. Lundin, A., Arner, P., and Hellmér, J. (1989) Anal. Biochem. 177, 125-131.
17. Marquardt, D. W. (1963) J. Soc. Ind. Appl. Math. 11, 431-441.
18. Fersht, A. R., and Jakes, R. (1975) Biochemistry 14, 3350-3356.
19. Fersht, A. R., Leatherbarrow, R. J., and Wells, T. N. C. (1986) Phil. Trans. R. Soc. London A 317, 305-320.
20. Bevington, P. R. (1969) Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill, New York.
21. Ellis, K. J., and Duggleby, R. G. (1978) Biochem. J. 171, 513.
22. Davies, O. L., and Goldsmith, P. L. (1977) Statistical Methods in Research and Production, Longman, London.
23. Leatherbarrow, R. J., and Fersht, A. R. (1987) Biochemistry 26, 8524-8528.
