MAGNETIC RESONANCE IN MEDICINE

28, 214-236 (1992)

An Investigation of Tumor ¹H Nuclear Magnetic Resonance Spectra by the Application of Chemometric Techniques

S. L. HOWELLS,* R. J. MAXWELL, A. C. PEET, AND J. R. GRIFFITHS

CRC Biomedical Magnetic Resonance Research Group, Division of Biochemistry, St. George's Hospital Medical School, Cranmer Terrace, London SW17 0RE, United Kingdom

Received March 11, 1992; revised April 15, 1992; accepted June 13, 1992

¹H nuclear magnetic resonance (NMR) spectra of tumors and normal tissue include signals from all hydrogen-containing metabolites and can therefore be considered multicomponent multivariate mixtures. We have obtained ¹H spectra from perchloric acid extracts of three normal tissues (liver, kidney, and spleen) and five rat tumors (GH3 prolactinoma, Morris hepatomas 7777 and 9618a, LBDS1 fibrosarcoma, and Walker 256 carcinosarcoma). We have applied several different chemometric methods to analyze the data. First, we used principal component analysis, cluster analysis, and an optimized artificial neural network to develop a classification rule from a training set of samples of known origin or class. The classification rule was then assessed using a set of unknown samples. We were able to successfully determine the class of each unknown sample. Second, we used the chemometric techniques of factor analysis followed by target testing to investigate the underlying biochemical differences that are detected between the classes of samples. © 1992 Academic Press, Inc.

INTRODUCTION

Over the past 10 years nuclear magnetic resonance (NMR) has been widely used to study cancer. Changes in tumor metabolism can be followed noninvasively after various forms of therapy, and anticancer drugs and their metabolites can be detected in situ. However, the diagnostic use of NMR has been less successful. In certain cases it is possible to distinguish spectra of particular tumors from those of normal tissue, but there is no generally applicable feature that can be used in all cases. In principle, ¹H NMR of a tumor gives information about almost every metabolite in the tissue, but this results in extremely complicated spectra with considerable problems of peak overlap and assignment. The total information content of each spectrum cannot be analyzed easily by inspection. Therefore, in order to characterize and classify samples into groups or classes the conventional approach is either to simplify the spectra by editing or to identify and quantify chosen peaks. These methods involve assumptions as to the importance of the metabolites that are chosen, and discard a large amount of the information in the original spectrum. It is possible to analyze large numbers of very complex ¹H NMR spectra, so as, for instance, to differentiate between different classes of tumor or distinguish normal from

1992 SMRM Young Investigator's Award Finalist.

* To whom correspondence should be addressed.

0740-3194/92 $5.00
Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.



malignant tissue. Chemometric methods of analysis, in which each NMR spectrum is considered a multicomponent multivariate mixture, have been developed to solve such complex chemical problems. Chemometrics is defined as the "chemical discipline that uses mathematical and statistical methods for handling, analyzing, interpreting and predicting chemical data" (1). Two of the most important areas of chemometrics are factor analysis and pattern recognition.

Factor analysis, developed in the 1930s for use in the behavioral sciences, was applied to chemical problems in 1970. This type of analysis has many advantages for solving problems in analytical chemistry. Large quantities of complex data can be simplified to a small number of factors which can then be interpreted, and hence the data can be classified. The most widely used method for calculating these factors is principal component analysis (PCA). Early work in factor analysis and PCA was mainly focused on NMR data, in order to develop a procedure for predicting the shifts of simple solutes in a variety of solvents (2), mass spectrometry, and chromatography (3). Recently these methods have been applied to a wide variety of problems, including PCA of ¹³C NMR spectra for the identification and quantification of structures in petroleum distillates (4), the analysis of near-infrared spectra (5), the use of PCA as a digital filter for noise reduction in GC/MS data sets (6), and the analysis and identification of petroleum-related target compounds from highly contaminated extracts of fire debris (7).

Pattern recognition by unsupervised learning involves methods that make no a priori assumptions about class membership of the samples but instead attempt to uncover intrinsic patterns or clusters in the data. The goal is to find a characteristic property of a collection of samples via indirect measurements made on the samples.
Cluster analysis is an unsupervised learning technique based on the idea of finding clusters of points in the data. Samples are grouped together on the basis of their nearness or similarity. The output from cluster analysis is usually in the form of a dendrogram, a two-dimensional representation of the fusions of clusters at each level or stage of the analysis. A dendrogram allows the clustering and the relationships between the samples to be visualized. The data used for cluster analysis can be either the normalized experimental data (the raw data) or the scores calculated from PCA. The advantage of using the results from PCA is that the quantity of data is reduced and the experimental or residual error can be partially removed from the original data matrix. Cluster analysis has been used extensively in a variety of ways: for the detection of cross peaks in 2D NMR (8, 9), molecular modeling and drug design (10), the analysis of infrared spectra (11), the analysis and optimization of gas sensor arrays (12), and the investigation of metabolic pathways (13). We have previously used both PCA and cluster analysis to analyze ¹H NMR spectra of tumors and normal tissue and to characterize the samples into groups (14). In almost all cases we could separate the different tumor types from each other and from the normal tissues.

Supervised learning includes many different methods in which the overall aim is classification of unknown samples. A set of samples of known origin or class, known as the training set, is used to develop a classification rule. The purpose of the rule is to classify a set of unknowns into one of these defined classes. Initially the classification rule is evaluated by predicting the class membership of samples contained


in a test set. The actual origins of the test set samples are known, so the predicted classifications can be compared with the true classifications. If the predicted classes match the true classes then the rule is well developed and can be used further to predict the identity of samples whose true origin is unknown. An early reported use of nonparametric methods of pattern recognition was in the classification of archaeological artifacts based upon trace element data (15). These methods have since been extended to include the K-nearest-neighbor technique, among others, for predicting the biological activity of an anticancer drug (16) and the analysis of ¹H NMR spectra of urine for the classification of toxicological data (17). Recently, Goux (18) has applied PCA, K-nearest neighbor, and SIMCA class modeling to the classification of peracetylated mono- and oligosaccharides. Perkins et al. (19) classified infrared spectra of alcohols and nonalcohols by calculating PC scores for each sample within a training set and then obtaining a discriminant rule that assigns an unknown sample to one class or the other. The classification rule was derived by assuming that the samples are normally distributed about the class mean. A discriminant score was calculated from the distance of the unknown to the class mean, and the unknown was assigned to the class with the lowest discriminant score. We recently reported the use of two different methods to classify a set of unknown samples using a database containing seven different classes of sample (20). The first method involved cluster analysis to find the nearest neighbor for each unknown, and the second compared the distance of an unknown sample score from the mean scores for each class of sample. Neural network computing techniques have also been applied in pattern recognition for the identification and classification of unknown samples.
These programs are based on models of the ways in which neurones are thought to interact within the nervous system. Within the field of analytical chemistry, neural networks have been applied to a variety of situations (21, 22). In particular, Long et al. (23) demonstrated the use of a network as a method of classifying chromatographic data from various types of jet fuels. The network was trained to associate the patterns within the chromatograms with the class of fuel. After the learning stage a test set of samples was analyzed to validate the method. The first neural network application for the recognition of NMR spectra was described by Thomsen and Meyer (24): ¹H NMR spectra from six different sugar alditols were used to train a three-layered feed-forward neural network to recognize the class of alditol represented by a specific spectrum.

In this paper we present results using many of the aforementioned techniques to extend our previous characterization and classification studies. The aim is first to develop a model from which the class of a set of unknown samples can be identified and then to investigate the underlying metabolic and biochemical differences that are being detected by these methods.

THEORY

Factor Analysis

Factor analysis involves several major steps and only those relevant to this work are discussed in detail. The terminology and explanations are based on those described by Malinowski (1). The main steps in factor analysis are shown in Fig. 1.


FIG. 1. Main steps in factor analysis: data matrix → covariance matrix → PCA decomposition (eigenvectors) → abstract factors → reproduction → real factors.

FIG. 2. Main steps in classification analysis: training data → data scores; test data → test scores.

Any sample can be described by a number of variables or data points. Therefore, for a large number of samples a data matrix [D] can be constructed containing r rows (samples) and c columns (variables). The first step in factor analysis is the decomposition of [D] into two matrices, a row matrix [R] and a column matrix [C]:

[D] = [R][C].

This is a purely mathematical procedure, and [R] and [C] are called abstract matrices. However, in order to decompose [D], use is made of a correlation or covariance matrix [Z], calculated by premultiplying [D] by its transpose. Calculation of the abstract matrices involves eigenanalysis, which yields a set of eigenvalues and associated eigenvectors. Each eigenvalue is a measure of the importance of its eigenvector: a large eigenvalue is indicative of a major factor. Principal component analysis is one widely used method for decomposing [Z] and calculating the eigenvectors and eigenvalues. The aim of PCA is to represent the data in the minimum number of principal components (PCs), which can then be viewed graphically. Each successive PC (eigenvector) is calculated to account for as much of the variance in the data as possible. The first PC passes through the greatest concentration of data points, accounts for the major fraction of the variance, and hence is associated with the largest, most important eigenvalue. The second PC is orthogonal to the first and describes as much of the data set as possible that is not already accounted for. This process is repeated until all the variance in the data is described. A complete set of PCs therefore accounts for all of the original data, including any experimental error. Associated with each PC is a set of coefficients called the loadings, which can be considered a projection of each variable onto the principal component. The loadings


represent the importance of a variable on an eigenvector. The magnitude of the loading is related to the importance of the variable, regardless of its sign. The projection of a sample onto a PC is called a score. The sample scores can be calculated from the eigenvectors:

[Scores] = [Data][Eigenvectors]^T.

The reason for this manipulation is so that the scores matrix will have dimensions of samples by real factors. A detailed description of principal component analysis can be found in the literature (25).

The aim of factor analysis is to reproduce [D] within experimental error using the minimum number of eigenvectors. Therefore, starting with the first eigenvector (associated with the largest eigenvalue), the following matrix multiplication is performed:

[D]est = [Scores][Eigenvectors],

where [D]est is the recalculated or estimated data matrix that is compared with the original data matrix [D]. This process is repeated, adding the eigenvector with the next largest eigenvalue each time, until [D] is reproduced within experimental error. As each successive eigenvector is added to the calculation, the estimate of [D] becomes more accurate. When too many factors are introduced the experimental error is also reproduced, and this should obviously be avoided. The minimum number of eigenvectors required equals the dimensionality of factor space, i.e., the number of factors involved.

The theory of error in factor analysis is described fully by Malinowski (26). First, the resulting eigenvectors can be divided into two groups: a primary and a secondary set. The primary eigenvectors contain the factors along with a mixture of error, and it is from these eigenvectors that the original data can be regenerated. Malinowski defined three types of error associated with factor analysis, making it possible to calculate the number of primary eigenvectors without a priori knowledge of the error. The difference between the raw data matrix and the pure data matrix is defined as the real error (RE), which itself consists of two error terms: the imbedded error (IE) and the extracted error (XE). IE is the error imbedded or mixed into the factors; it cannot be removed and can be considered the difference between the pure and the factor-analyzed (reproduced) data. XE is the error that is removed or extracted when the secondary eigenvectors are eliminated from further analysis and can be considered the difference between the raw and the factor-analyzed data.

The final aim of factor analysis is to transform the principal factors (primary eigenvectors) into recognizable parameters. When applied to analytical chemistry problems the ultimate objective is to determine real chemical meaning. One method used to carry out the transformation step is known as target testing.
Using a test vector, known as the "target," it is possible to test for this vector within the entire multicomponent data set and to determine whether the target is a real factor. The assessment is made by the mathematical procedure

xpred = [R] t,

where xpred is the predicted vector, and t is the associated transformation vector.

CHEMOMETRIC TECHNIQUES FOR ‘H NMR SPECTRA

219

The transformation vector t is the result of a least-squares procedure involving the primary eigenvectors and the individual target vector being tested. The least-squares procedure minimizes the difference between the test vector xtarget and the predicted vector, producing the best possible transformation vector:

t = ([R]^T [R])^-1 [R]^T xtarget.

This transformation vector can then be used to produce a predicted vector that most closely matches the target. It is then possible to determine whether the predicted vector xpred is equal to the test vector within experimental error, i.e., whether xpred = xtarget. If the tested and predicted vectors are sufficiently similar then the target can be considered a real factor.
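As a minimal numerical illustration of the steps above (decomposition, reproduction from the primary eigenvectors, and target testing), the following sketch uses a synthetic two-factor data matrix; all array names are ours and are not those of the TFA program:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data matrix [D]: r = 20 samples x c = 8 variables, built from
# two underlying factors plus a little "experimental" noise
true_scores = rng.normal(size=(20, 2))
true_loadings = rng.normal(size=(2, 8))
D = true_scores @ true_loadings + 0.01 * rng.normal(size=(20, 8))

# Decomposition: eigenanalysis of the covariance matrix [Z] = [D]^T [D]
Z = D.T @ D
eigvals, V = np.linalg.eigh(Z)
order = np.argsort(eigvals)[::-1]        # largest eigenvalue = most important factor
eigvals, V = eigvals[order], V[:, order]

# Reproduction: [D]est from the first n primary eigenvectors
n = 2
scores = D @ V[:, :n]                    # projections of the samples onto the primary eigenvectors
D_est = scores @ V[:, :n].T              # [D]est; matches [D] to within the noise level
assert np.allclose(D_est, D, atol=0.05)

# Target testing: is a candidate vector a real factor?
x_target = D @ rng.normal(size=8)        # lies (nearly) in the primary factor space
t, *_ = np.linalg.lstsq(scores, x_target, rcond=None)  # least-squares transformation vector
x_pred = scores @ t
assert np.linalg.norm(x_pred - x_target) < 0.1 * np.linalg.norm(x_target)
```

A target vector orthogonal to the primary factor space would instead leave a large residual between xpred and xtarget, and would be rejected as a real factor.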

Neural Networks

A neural network is separated into several layers, and the set of neurones in one layer receives inputs from preceding layers. The outputs of the neurones lie between 0 (off) and 1 (on), depending on the magnitude of the input. Any number of layers of neurones may be used. The final layer has one neurone for each class, and the output is "on" if the sample lies in that class and "off" if it does not. Internal layers of neurones are called hidden layers and may have any number of neurones, called nodes. The characteristics of the neural network are determined by the nature of the threshold function, which turns the input of a neurone into an output, and by a series of coefficients used to weight the inputs from different neurones.

The procedure involves repeating a sequence of two passes through the network. Initially, in the forward step, a pattern of inputs is supplied to the first layer of a network with arbitrary weights and propagated through the network until the final outputs of the last layer are calculated. The generated output is compared with the desired output (known output or class) and an error is calculated. Second, in the backward step, the weights associated with the inputs to the last layer are altered to minimize the error. This minimization of error is propagated layer by layer backward through the network. The forward and backward propagation is continued until the weights obtained result in an acceptable approximation to the desired output. The technique therefore involves changing the values of the weights and the parameters in the threshold function in order to give the known outputs. This is called the learning stage and is carried out using a method called the generalized delta rule with back propagation of error. Once the network is constructed, a sample of unknown class may be used as the input, an output calculated, and the class of the unknown assigned or identified. This process is called forward propagation.
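The forward/backward procedure can be sketched as a minimal feed-forward network trained by the generalized delta rule. The toy data, layer sizes, and learning rate below are illustrative (they are not the paper's optimized conditions); targets use the 0.9 = "on" / 0.1 = "off" convention:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))  # threshold function: outputs between 0 (off) and 1 (on)

def add_bias(A):
    """Append a constant input of 1 to each sample (bias term)."""
    return np.hstack([A, np.ones((A.shape[0], 1))])

# Toy two-class problem: 2 inputs, one hidden layer of 8 nodes, 2 output neurones
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.9, 0.1], [0.1, 0.9], [0.1, 0.9], [0.9, 0.1]])

W1 = rng.normal(scale=0.5, size=(3, 8))       # weights into the hidden layer (arbitrary start)
W2 = rng.normal(scale=0.5, size=(9, 2))       # weights into the output layer

def forward(X):
    H = sigmoid(add_bias(X) @ W1)             # forward step through the hidden layer
    return H, sigmoid(add_bias(H) @ W2)       # forward step to the final outputs

initial_error = np.sum((forward(X)[1] - T) ** 2)
for _ in range(10000):                        # learning stage
    H, Y = forward(X)
    dY = (Y - T) * Y * (1 - Y)                # output error times sigmoid derivative
    dH = (dY @ W2[:-1].T) * H * (1 - H)       # error propagated backward, layer by layer
    W2 -= 0.5 * add_bias(H).T @ dY            # backward step: adjust weights to reduce error
    W1 -= 0.5 * add_bias(X).T @ dH
final_error = np.sum((forward(X)[1] - T) ** 2)
assert final_error < initial_error            # the error shrinks over the learning stage

# Forward propagation of an "unknown" input simply reuses forward()
```

After training, an unknown pattern is classified by a single forward pass and reading off which output neurone is closest to "on".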
MATERIALS AND METHODS

Animals and Tumors

The following tumors and rat strains were used: the Morris hepatoma 7777 (poorly differentiated, rapidly growing) and Morris hepatoma 9618a (well differentiated, slowly growing), both of which were grown in male and female Buffalo rats; the GH3 prolactinoma, grown in female Wistar-Furth rats; the Walker 256 carcinosarcoma, grown in female Wistar rats; a mammary adenocarcinoma, induced in female Ludwig-Olac


rats by three iv injections of N-methyl-N-nitrosourea; and the LBDS1 fibrosarcoma, grown in male BD9 rats. Normal tissues (livers, kidneys, and spleens) were obtained from each class of tumor-bearing animal. Thus a complete set of normal tissue associated with the different strains of tumor-bearing rats was acquired. Livers, kidneys, and spleens obtained from identical strains of non-tumor-bearing rats formed the set of "control" normal tissues. The range of tumor weights at the time of excision was 1-15 g.

Anesthesia

Prior to freeze-clamping of samples the animals were anesthetized with either pentobarbitone (60 mg/kg ip) or a halothane (2-3%), nitrous oxide (1 liter/min), oxygen (2 liter/min) mixture.

Sample Extraction

The anesthetized animal was killed by cervical dislocation and the time of death was taken as time zero. Samples were excised in the following order and within the following time ranges: liver, 20-60 s; spleen, 30-75 s; kidney, 50-120 s; and tumor, 80-250 s. Samples were immediately freeze-clamped in liquid nitrogen. Chemical extracts were prepared using perchloric acid. The preweighed ground sample was homogenized with 6% perchloric acid; the total volume of perchloric acid (in milliliters) was equal to five times the weight of tissue (in grams). The homogenate was left to stand in ice for 10 min and then centrifuged. The supernatant was neutralized to pH 7 (by universal indicator solution) using 20% KOH and left to stand in ice for several hours. The perchlorate precipitate was removed by centrifugation and the supernatant freeze-dried.

NMR Spectroscopy

Seventy milligrams of freeze-dried material was redissolved in 500 µl D2O, and 10 µl (2 mM) sodium 3-(trimethylsilyl)-1-propanesulfonate (TPS) was added as a chemical shift and quantitation reference. The sample was then centrifuged and the pH of the supernatant adjusted to pH 6.80 (±0.05) using DCl or NaOD. Each sample was prepared immediately before the NMR spectrum was recorded to reduce the possibility of further production of perchlorate precipitate. The following metabolites (purchased from Sigma) were also analyzed: lactate, choline, glycine, taurine, and valine. Each metabolite (10 mM) was prepared in 1 ml D2O and the pH adjusted to pH 6.8 (±0.05). A 10 mM solution of the TPS standard was also prepared. The NMR spectrum for each metabolite was obtained and analyzed in a manner identical to that of any tissue extract.

¹H NMR spectra were obtained at 25°C on a Bruker AM400 spectrometer with a 5-mm ¹H probe. The quality of shimming was determined from the water linewidth before presaturation; all spectra were recorded with a water/HOD linewidth of less than 4 Hz. Acquisition involved selective presaturation of the residual water signal, a 90° flip angle, a pulse repetition time of 10 s, a spectral width of 8 kHz, and 16K data points.


Data Processing (Spectrum Analysis)

A ¹H NMR spectrum was obtained by Fourier transformation with exponential weighting to give a total linewidth of 4 Hz. The spectrum was then digitized by dividing it into equally spaced ppm intervals and recording the maximum peak-height intensity in each interval. A list of peak heights (variables) which describe or artificially recreate the ¹H spectrum was then obtained. In order to compare one digitized spectrum directly with any other the data must be normalized. This is achieved by dividing each variable in the data set by the peak height of the standard TPS peak at 0 ppm, so that the peak height of the standard in every digitized spectrum is equal to 1.000. For each sample, the two sets of data obtained during the digitization procedure consisted of (a) 180 variables, 4.5 to 0 ppm at 0.025-ppm intervals, and (b) 501 variables, 4.5 to -0.5 ppm at 0.01-ppm intervals. The 180-variable data set was chosen for the classification analysis, mainly because of the constraints set by the Clustan software package, which allows a maximum of 200 variables to be analyzed, and also because an increase in the number of variables causes an exponential increase in the data processing time for the other classification methods. Because the spectra were all standardized to a 4-Hz linewidth, the smallest ppm interval that could be chosen for digitization was 0.01 ppm. The test vectors used in this paper are the digitized spectra of the individual metabolites present in many different concentrations within the tissue or tumor. Therefore, upon digitization of the tissue or tumor spectrum it is important not only that the signals from the metabolites be distinguished from each other but also that signal averaging be minimized, i.e., uniqueness of peaks is important.
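The digitization and normalization step can be sketched as follows, assuming the processed spectrum is available as arrays of ppm values and intensities (the function and argument names are illustrative, not those of the original software):

```python
import numpy as np

def digitize_spectrum(ppm, intensity, lo=0.0, hi=4.5, step=0.025):
    """Digitize a 1H spectrum: record the maximum peak height in each ppm
    interval, then normalize so the TPS reference peak at 0 ppm equals 1.000."""
    n = int(round((hi - lo) / step))          # 180 intervals for 0-4.5 ppm at 0.025 ppm
    edges = np.linspace(lo, hi, n + 1)
    heights = np.array([intensity[(ppm >= a) & (ppm < b)].max(initial=0.0)
                        for a, b in zip(edges[:-1], edges[1:])])
    ref = heights[0]                          # interval containing the TPS peak at 0 ppm
    return heights[::-1] / ref                # ordered 4.5 -> 0 ppm; reference scaled to 1.000

# Illustrative use: a synthetic spectrum with a TPS peak at 0 ppm (height 2)
# and a single metabolite peak at 1.33 ppm (height 6)
ppm = np.linspace(0.0, 4.5, 4500, endpoint=False)
intensity = np.zeros(4500)
intensity[0], intensity[1330] = 2.0, 6.0
digitized = digitize_spectrum(ppm, intensity)  # 180 variables; TPS interval equals 1.000
```

With the defaults above the output corresponds to data set (a); `hi=4.5, lo=-0.5, step=0.01` would give the 501-variable data set (b).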

Software Packages

SAS (SAS Institute Inc., Cary, NC). Version 6.03 under SunOS running on a Sun workstation was used for PCA and cluster analysis, using the PRINCOMP and CLUSTER procedures, respectively.

Clustan (Clustan Limited, Scotland). Version 3.2 running on a Sun workstation was used for the classification of unknown samples, using the CLASSIFY procedure.

TFA (27). This program, which is not commercially available, was used for factor analysis and target testing.

Classification: Supervised Learning

In this section only data digitized at 0.025-ppm intervals were considered for analysis. The digitized data were arranged in the form of a data matrix containing n cases and 180 variables. Two data sets were prepared. The first was a training set [D] containing 70 samples considered to be of known origin or class and including samples from all eight classes to be classified. Each sample in the training set was assigned a label (1-8) determined by the class to which it belongs. The second was a test set [T] containing 14 samples representative of all classes found in the training set. The 14 samples were labeled A-N and given no class assignment. The aim was to predict the class of each test set sample. The class and number of samples contained in [D] and [T] are shown in Tables 1 and 2.

TABLE 1
Class and Number of Samples in Training Set [D] and Test Set [T]

Classification label   Sample class                Samples in [D]   Samples in [T]
1                      LBDS1 fibrosarcoma          9                2
2                      GH3 prolactinoma            10               1
3                      Walker 256 carcinosarcoma   10               1
4                      Morris hepatoma 9618a       10               2
5                      Morris hepatoma 7777        10               2
6                      Spleen                      10               2
7                      Kidney                      4                2
8                      Liver                       7                2

Exploratory Examination of the Data

The first step in the classification procedure was an exploratory examination of the data. Initially the training set was analyzed using PCA to calculate a correlation matrix (correlations between variables) and to carry out eigenanalysis, to summarize and reduce the complexity of the data, and to demonstrate relationships between the samples. The PCs generated from this procedure were then analyzed further using cluster analysis and a dendrogram was produced. In this case PCA transformed the original data set into a new reduced data set containing 15 PCs, each with 180 variables. A score for each PC was then calculated and a new data matrix containing 70 cases (samples) and 15 variables (the PC scores)

TABLE 2
Actual Identity of Samples in Test Set [T]

Test set sample   Actual class of sample
A                 Liver
B                 GH3 prolactinoma
C                 Morris hepatoma 9618a
D                 Kidney
E                 Liver
F                 Spleen
G                 Morris hepatoma 7777
H                 Kidney
I                 Morris hepatoma 9618a
J                 Spleen
K                 Walker 256 carcinosarcoma
L                 Morris hepatoma 7777
M                 LBDS1 fibrosarcoma
N                 LBDS1 fibrosarcoma


was obtained. Following the techniques of multivariate data analysis, each sample is now characterized by a vector of 15 components, which may be represented by a single point in 15-dimensional space. Points representing similar samples lie close together in this space; conversely, points representing samples with greatly differing ¹H spectra lie far apart. Such properties can be investigated by cluster analysis. Cluster analysis was carried out using the sample scores obtained for the first 15 PCs, accounting for 95% of the original data, and the clustering algorithm used was Ward's method (28). The results generated from the cluster analysis were in the form of a dendrogram. The training and test sets [D] and [T] were then combined, and PCA and cluster analysis were carried out to test the validity of the initial category assumptions, to ensure that the training set spanned the data space sufficiently, and to examine the test set for outliers, which are undesirable when evaluating the classification rule.
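The exploratory sequence (PC scores followed by Ward's-method hierarchical clustering) can be sketched as follows, with synthetic scores standing in for the 70-sample by 15-PC score matrix:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
# Stand-in score matrix: three well-separated "classes" of 10 samples,
# each described by 15 PC scores
scores = np.vstack([rng.normal(m, 0.3, size=(10, 15)) for m in (0.0, 3.0, 6.0)])

# Ward's method merges, at each stage, the pair of clusters giving the
# smallest increase in within-cluster variance; `linkage` encodes the
# dendrogram (sample fusions vs. dissimilarity level)
Z = linkage(scores, method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 clusters

# Samples drawn around the same mean fall in the same cluster
assert len(set(labels[:10])) == 1 and len(set(labels[10:20])) == 1
```

In practice `scipy.cluster.hierarchy.dendrogram(Z)` would draw the two-dimensional tree described in the text, with samples on the x axis and the dissimilarity coefficient on the y axis.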

Classification Analysis

After establishing that the training set is representative of all types of samples to be classified, four different methods of classification were implemented. All methods utilize the PC scores calculated previously during PCA. The main steps in the classification analysis are summarized in Fig. 2.

(1) Nearest neighbor location in cluster analysis. The first method used was the CLASSIFY procedure contained in Clustan. This procedure was implemented to perform cluster analysis on the training set [D] using the first 15 PCs. The test set [T] was then input into the procedure, the samples (labeled A-N) were compared with the existing hierarchical classification, and a nearest cluster and nearest neighbor were found for each sample.

(2) Distance from class mean. Using the training set sample scores obtained from the first 15 PCs it was possible to calculate a mean score position for each class, so that each class was represented as a single point in 15-dimensional space. Using the 15 PCs generated from the training set it was also possible to calculate a score for each unknown sample in the test set. This method compares the unknown scores with the mean score for each class: for each unknown sample a distance from each class mean was obtained and the sample assigned to the class with the smallest distance.

(3) Probability of being in a given class. This method approximates the probability distribution as a product of a series of Gaussians, one in each of the 15 PCs. The means and standard deviations of the Gaussians are taken to be those of the training set samples. The method therefore assumes a Gaussian distribution of the data in 15-dimensional space and defines a volume at the 1-standard-deviation contour around each class, and hence takes into account the distribution of the samples within each given class in the training set.
By comparing the unknown sample scores with the mean score for each class of sample it was therefore possible to calculate the relative probabilities of a sample belonging to each particular class. An unknown is assigned to the class with the highest probability.
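Methods (2) and (3) can be sketched together, since both compare an unknown sample's scores with per-class statistics from the training set. The function and variable names below are illustrative, and the example uses two-dimensional scores for brevity:

```python
import numpy as np

def classify(train_scores, train_labels, unknown):
    """Assign an unknown sample's PC scores by (a) smallest distance from a
    class mean and (b) highest probability under per-PC independent Gaussians."""
    classes = sorted(set(train_labels))
    means = {c: train_scores[train_labels == c].mean(axis=0) for c in classes}
    stds  = {c: train_scores[train_labels == c].std(axis=0) + 1e-9 for c in classes}

    # Method (2): Euclidean distance from each class mean, smallest wins
    by_distance = min(classes, key=lambda c: np.linalg.norm(unknown - means[c]))

    # Method (3): product of one Gaussian per PC, evaluated as a log-probability
    def log_prob(c):
        z = (unknown - means[c]) / stds[c]
        return -0.5 * np.sum(z ** 2) - np.sum(np.log(stds[c]))
    by_probability = max(classes, key=log_prob)
    return by_distance, by_probability

# Two toy classes of training scores, far apart in score space
train_scores = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2],
                         [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
train_labels = np.array(["liver", "liver", "liver", "kidney", "kidney", "kidney"])
by_dist, by_prob = classify(train_scores, train_labels, np.array([0.1, 0.1]))
# both methods assign the unknown to the nearby "liver" class
```

The two methods differ when classes have very different spreads: method (3) will favor a tight, nearby class over a diffuse one even when the raw distances are comparable.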


(4) Neural network. The program used (29) considered the training set sample scores for the first 15 PCs to be the inputs to a set of neurones. There were eight possible outputs: five tumor types and three normal tissues. The learning stage was implemented using the generalized delta rule with back propagation of error. Once the learning stage was completed (i.e., the network constructed and a set of weights obtained that correctly defined the training set), the sample scores of the test set were used as the input and a set of outputs was generated, this time using forward propagation only. In order to implement the learning stage, class-representative output values had to be determined: a value of 0.9 was used to indicate class membership and a value of 0.1 to indicate nonmembership. The optimized conditions used in the network are given in Table 3.

Factor Analysis and Target Testing

The aim of this section is to investigate the basic underlying factors contained within the ¹H NMR spectra of tumors and to attempt to achieve real chemical understanding of the data. Samples digitized at 0.01-ppm intervals were used for this analysis to maximize the detail available. The digitized spectra from 72 tumor extracts were arranged in a data matrix of dimensions 72 samples by 501 variables. The number of each class of tumor analyzed is shown in Table 4. The data were analyzed using PCA to calculate a covariance matrix and to carry out eigenanalysis. By examination of the errors determined during this process it was possible to determine the number of factors required for further analysis. The individual metabolites were digitized, used as the test vectors, and analyzed using target transformation.

RESULTS AND DISCUSSION

The first step in the analysis is to validate the data contained in the training set, to check for any samples that may be outliers. This is achieved using PCA and cluster

TABLE 3
Optimized Parameters for Construction of the Neural Network

Parameter                       Optimized condition
Inputs                          70
Features per input              15
Outputs                         8
Hidden layers                   3
Nodes per layer 1               30
Nodes per layer 2               30
Nodes per layer 3               30
Maximum error                   0.01
Individual error                0.001
Maximum number of iterations    1000


TABLE 4
Tumor Samples Used for Factor Analysis

Sample class                Number of samples
LBDS1 fibrosarcoma          17
GH3 prolactinoma            11
Walker 256 carcinosarcoma   8
Morris hepatoma 9618a       11
Morris hepatoma 7777        12
Mammary adenocarcinoma      13

analysis for an exploratory examination of the training set. The results from PCA are shown in Table 5. The magnitude of an eigenvalue is indicative of the importance of the associated eigenvector. From the results shown in Table 5, it can be seen that after the first 15 PCs the magnitude of, and the difference between, successive eigenvalues, and hence the contribution made by each of the associated eigenvectors, become very small. By this point 95% of the variance contained within the original data is described.
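The eigenanalysis step can be sketched in a few lines. This is an illustrative reconstruction, not the package actually used: the correlation option reflects the equal variable weighting used for classification (a covariance matrix is used later for factor analysis), and the mock data merely stand in for the digitized spectra.

```python
import numpy as np

def pca_scores(D, n_pcs, use_correlation=True):
    """PCA by eigenanalysis: decompose the correlation matrix (equal
    weighting of variables, as for classification) or the covariance
    matrix (as for factor analysis), then project each spectrum onto
    the leading eigenvectors to obtain its PC scores."""
    Dc = D - D.mean(axis=0)                   # center each variable
    if use_correlation:
        Dc = Dc / Dc.std(axis=0, ddof=1)      # unit variance per variable
    C = np.cov(Dc, rowvar=False, ddof=1)
    evals, evecs = np.linalg.eigh(C)          # eigenvalues come out ascending
    order = np.argsort(evals)[::-1]           # sort descending
    evals, evecs = evals[order], evecs[:, order]
    cum_var = 100.0 * np.cumsum(evals) / evals.sum()
    return Dc @ evecs[:, :n_pcs], evals, cum_var

# Mock data matrix standing in for the training set (samples x variables).
D = np.random.default_rng(0).normal(size=(40, 12))
scores, evals, cum_var = pca_scores(D, n_pcs=5)
```

In the text the cutoff of 15 PCs was chosen where successive eigenvalues level off and roughly 95% of the cumulative variance is reached, which corresponds to reading down `cum_var` until that threshold is passed.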

TABLE 5
PCA Results for the First 20 PCs Calculated from the Training Set [D]

PC number    Eigenvalue    % Cumulative variance
 1           78.2557       43.48
 2           24.5928       57.14
 3           18.3411       67.33
 4           14.2267       75.23
 5            6.7366       78.97
 6            6.3164       82.48
 7            5.2789       85.42
 8            3.9585       87.62
 9            3.4781       89.55
10            3.1806       91.31
11            2.1496       92.51
12            1.5860       93.39
13            1.2768       94.10
14            1.1435       94.73
15            0.9717       95.27
16            0.9348       95.79
17            0.7133       96.19
18            0.6626       96.56
19            0.6139       96.90
20            0.5841       97.22


As the overall aim is to generate a classification rule that is able to categorize a large number of different samples it is necessary to include as much original information as possible and to eliminate as much experimental error as possible. For this reason, only the first 15 PCs were used for further analysis. To determine the validity of this simplification it was necessary to transform these 15 PCs into a more visual representation by the use of cluster analysis. The final results from cluster analysis can be represented in the form of a dendrogram, which is shown in Fig. 3. The samples are shown on the x axis and the dissimilarity coefficient is on the y axis. The larger the dissimilarity coefficient, the less significant the groupings or clusters. From the dendrogram it can be seen that there is a partial separation of samples into groups or clusters. The main points to note are the LBDS1 fibrosarcomas (with the exception of F14), the kidneys, and the livers (with the exception of L20), which all form separate isolated clusters; the GH3 prolactinomas and Walker 256 carcinosarcomas, which are very closely linked (in particular P8 and W4, which form nearest neighbors), indicating that classification between these two classes of tumors may prove slightly difficult; the two types of Morris hepatomas, which each form several small subclusters, although it is encouraging to note that there are no subclusters containing the two types, i.e., no misassignment between the two classes; and finally the spleens, which form two small subclusters. The remaining spleens are


FIG. 3. Dendrogram produced from cluster analysis of the training set [D] samples. The sample labels are as follows: F, LBDS1 fibrosarcoma; P, GH3 prolactinoma; W, Walker 256 carcinosarcoma; H, Morris hepatoma 7777; G, Morris hepatoma 9618a; S, spleen; K, kidney; L, liver.


widely distributed throughout the dendrogram and clearly this class of sample may be difficult to classify correctly. We have shown previously that it is possible to characterize each tumor type and tissue type into a separate cluster (14), but this requires slightly more rigorous optimization than is applied here. However, the aim of this present study was not to obtain a perfect picture but to have sufficient separation between the different sample classes so as to obtain a classification rule and identify an unknown. Therefore, this representation using 15 PCs seemed adequate. Cluster analysis repeated on a data set formed from a combination of training and test sets [D] and [T] (to ensure that the training set spans a sufficient range for each class and that the test set contains no outliers or spurious samples) resulted in a dendrogram with no obvious outliers. There was also complete separation between the GH3 prolactinomas and Walker 256 carcinosarcomas, indicating that it should be possible to discriminate between these two tumor types. The addition of extra spleens reduced the distribution of spleen samples throughout the dendrogram. After this preliminary examination it was decided that the samples chosen were reasonable and the classification analysis could proceed.
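The exploratory step above can be sketched with a standard hierarchical clustering routine. The use of Ward's method is an assumption, suggested by the citation of ref. (28); the original analysis used a commercial package, and the mock PC scores here merely stand in for the real training set.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Three mock "classes" of 15-dimensional PC scores standing in for
# the training set; centers and spreads are illustrative only.
rng = np.random.default_rng(0)
scores = np.vstack([rng.normal(c, 0.2, (6, 15)) for c in (-1.0, 0.0, 1.0)])

# Agglomerative clustering; Z encodes the dendrogram as a merge table.
Z = linkage(scores, method="ward")

# Cutting the tree into three clusters should recover the mock classes.
groups = fcluster(Z, t=3, criterion="maxclust")
```

Calling `scipy.cluster.hierarchy.dendrogram(Z)` draws the tree itself, with the dissimilarity on the y axis, as in Fig. 3.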

Classification Results

The aim is to develop a model that can be used to classify samples whose true origin is unknown. To achieve this, several different methods of classification must be assessed, first to determine which methods have the best prediction power and second to ascertain how reliable these methods actually are.

Nearest neighbor location in cluster analysis. For each sample in [T] this procedure searches the dendrogram produced from cluster analysis of the training set (as shown in Fig. 3) and finds the nearest neighbor from among all the cases. The simplest method of searching the tree is to compare the test sample with the two clusters at each branch within the dendrogram, find which cluster is most similar to the unknown, and then proceed down the tree until a possible nearest neighbor is found at the base. Therefore, an incorrect early assignment of a nearest neighbor leads to error propagation. The results from this method (Table 6) show the correct assignment of 10 out of 14 test samples. Several of these misassignments can be explained: Unknown sample C, actually a 9618a hepatoma, is classified as a 7777 hepatoma. Problems differentiating between the two types of Morris hepatomas can be explained by examination of the original dendrogram. Each class of hepatoma forms several subclusters which are all linked together at a higher dissimilarity value, so a bad branching decision at an early stage in the progression down the tree could result in misassignment. Samples F and J (both spleens) are wrongly classified as a kidney and a Walker 256 carcinosarcoma, respectively. This is not surprising, as the spleens are widely distributed throughout the original dendrogram. The assignment of the true nearest neighbor is often problematic because of incorrect branching decisions made during the stepwise progression down the tree. However, false nearest neighbors can sometimes be recognized by their weak similarity with the unknown.
Examination of the criterion value may indicate when a false nearest neighbor has been found as it gives an indication of how well the unknown case fits into

TABLE 6
Results from Nearest Neighbor Location in Cluster Analysis for 14 Test Set Samples

Test sample    Nearest       Criterion    Classification
(unknown)      case in [D]   value        label            Class assignment
A              L20           0.02661      8                Liver
B              P6            0.00104      2                GH3 prolactinoma
C              H18           0.01137      5                Morris hepatoma 7777*
D              K16           0.00520      7                Kidney
E              H8            0.00807      5                Morris hepatoma 7777*
F              K8            0.00466      7                Kidney*
G              H11           0.00821      5                Morris hepatoma 7777
H              K14           0.00824      7                Kidney
I              G13           0.00535      4                Morris hepatoma 9618a
J              W4            0.01160      3                Walker 256 carcinosarcoma*
K              W7            0.00160      3                Walker 256 carcinosarcoma
L              H9            0.00623      5                Morris hepatoma 7777
M              F5            0.00576      1                LBDS1 fibrosarcoma
N              F12           0.00279      1                LBDS1 fibrosarcoma

* Indicates an incorrect class assignment.

the existing classification. The smaller the criterion value, the better the goodness of fit and the more similar the unknown is to its neighbor. In this example the criterion value is a poor indicator. Sample A has the highest criterion value of all the unknowns yet is classified correctly. This effect may be attributed to the unknown joining the correct cluster, but at a high dissimilarity value. Cluster analysis is invaluable in obtaining an overall impression of whether or not samples can be characterized together by 'H NMR. However, it is not ideal as a tool for predicting the identity of an unknown sample, as the clustering pattern tends to change significantly when an additional sample is added.

Mean distance and probability methods. Both these methods assume that a cluster is made up of one class of sample and that there is no overlap of the sample classes. The major difference between the two methods is that the mean distance measurement is based on the distance from a centroid within each class, whereas the probability method takes into account the distribution of samples within each class. Thus one would expect the probability method to be more reliable. The results from these two methods are shown in Table 7. The mean distance method correctly classified 12 of the 14 unknowns. Assignment of an unknown sample score to the closest class mean score may result in misassignments if the members of two close-lying classes are of greatly differing spread. The unknown will be assigned to the class with the closest mean even if this class is so tightly clustered that the distance of the unknown from this mean makes its membership of the class unlikely. Therefore, using this method there may be a tendency for underassignment to classes that are widely spread. From the exploratory examination of the training set, it would be expected that this method would have the greatest difficulty in classification of the spleens contained


TABLE 7
Classification of Unknown Test Set Samples Using the Mean Distance and Probability Methods

Test      Mean distance                 Probability method
sample    class assignment              class assignment
A         Kidney*                       Kidney*
B         GH3 prolactinoma              GH3 prolactinoma
C         Morris hepatoma 9618a         Morris hepatoma 9618a
D         Kidney                        Kidney
E         Spleen*                       Liver
F         Spleen                        Spleen
G         Morris hepatoma 7777          Morris hepatoma 7777
H         Kidney                        Kidney
I         Morris hepatoma 9618a         Morris hepatoma 9618a
J         Spleen                        Spleen
K         Walker 256 carcinosarcoma     Walker 256 carcinosarcoma
L         Morris hepatoma 7777          Morris hepatoma 7777
M         LBDS1 fibrosarcoma            LBDS1 fibrosarcoma
N         LBDS1 fibrosarcoma            LBDS1 fibrosarcoma

* Indicates an incorrect class assignment.

within the test set. However, this is not the case. The incorrectly assigned samples A and E are actually both livers. In a previous study (14) it was found that livers form diffuse clusters and as a class are more widely spread than other types of tissue. This may account for the two incorrectly classified unknowns. The probability method correctly assigned 13 of the 14 test set samples. Sample A was incorrectly identified as a kidney. Perhaps the volume defined for the livers was not large enough to contain this sample, and the unknown lies on the extremity of the samples contained in the training set. Clearly it is preferable to represent each class by the volume it occupies rather than just its mean position. In reality, this volume will be infinitely large but will have some probability distribution associated with it. This estimated probability is likely to be a poor representation of the real one. However, a more realistic model would require significantly more data than are available. These two methods easily differentiate between normal and malignant tissue but clearly have problems when dealing with samples from diffusely spread classes. However, as expected, the probability method is more reliable.

Neural networks. Construction of the neural network involved repeating the learning stage with varying numbers of nodes and hidden layers and assessing its ability to classify the test set samples. For each of the unknown samples to be assigned to one class with an output value of approximately 0.9 required that the network be optimized to include three hidden layers with 30 nodes per layer. In effect the extra nodes and hidden layers allow the program to model more and more complex shapes in the 15-dimensional space and so define the clusters of classes of samples more precisely. The


optimized learning phase required 118 iterations for the correct assignment of all 14 test samples and the results are given in Table 8. Each unknown was assigned to a single class by an output >0.89, and for the other seven possible classes no output was >0.2. It is interesting to note that this method has no difficulty in classifying the spleen samples contained in the test set: both spleen samples F and J have output values >0.92. Therefore, this approach is a more robust method for defining patterns for all samples within the training set and is probably the most suitable and reliable method for classification. The normal tissues contained in the training and test sets were obtained from tumor-bearing rats and the concern was that the state of health of the animal might result in extra variability between samples. For this reason the normal tissues in [D] and [T] were replaced by livers, kidneys, and spleens from non-tumor-bearing rats and the whole analysis repeated. The new training set was analyzed to produce PC scores. The test set was analyzed using all four methods. Once again, the mean distance, probability, and neural network approaches gave the correct classification of 12, 13, and 14 samples, respectively. The nearest neighbor location procedure, however, only assigned all 14 test samples to a nearest neighbor of the correct class when the unknowns were input into the analysis in one particular order. This result appears to be a feature of one particular routine in this commercial package and, therefore, this method cannot be considered reliable on its own. In conclusion, the neural network approach proved to be the most successful and probably the most reliable method. However, when carrying out classification analysis it is important that more than one method be used.
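The contrast between the two parametric classifiers discussed above can be sketched as follows. This is an illustration under stated simplifications, not the commercial implementation: the probability method here models each class as an independent Gaussian along each PC, and the toy data deliberately construct the tight-versus-diffuse situation the text describes.

```python
import numpy as np

def classify_mean_distance(x, class_scores):
    """Assign x to the class whose mean score (centroid) is nearest."""
    means = {c: s.mean(axis=0) for c, s in class_scores.items()}
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

def classify_probability(x, class_scores):
    """Assign x to the class under which it is most probable, modeling
    each class as an independent Gaussian per PC (a simplification)."""
    def log_lik(s):
        mu, sd = s.mean(axis=0), s.std(axis=0, ddof=1)
        return -np.sum(np.log(sd)) - 0.5 * np.sum(((x - mu) / sd) ** 2)
    return max(class_scores, key=lambda c: log_lik(class_scores[c]))

# A tightly clustered class and a diffuse class: the unknown lies nearer
# the tight class's mean, yet implausibly far outside its spread.
rng = np.random.default_rng(2)
classes = {
    "tight":   rng.normal(0.0, 0.05, (20, 3)),
    "diffuse": rng.normal(1.5, 1.0, (20, 3)),
}
x = np.array([0.3, 0.3, 0.3])

by_mean = classify_mean_distance(x, classes)   # nearest centroid
by_prob = classify_probability(x, classes)     # most probable class
```

The disagreement between the two answers mirrors the text's point: the mean distance method favors the nearby but tightly clustered class, while the probability method recognizes that the unknown is a far more plausible member of the widely spread class.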

Factor Analysis and Target Testing

The successful classification of unknown spectra into the tumor types or normal tissue from which they originate implies that there must be hidden patterns in the

TABLE 8
Classification Results of Unknown Test Set Samples Using an Optimized Neural Network

Test sample    Output value    Classification label    Class assignment
A              0.89            8                       Liver
B              0.89            2                       GH3 prolactinoma
C              0.90            4                       Morris hepatoma 9618a
D              0.91            7                       Kidney
E              0.89            8                       Liver
F              0.92            6                       Spleen
G              0.90            5                       Morris hepatoma 7777
H              0.90            7                       Kidney
I              0.90            4                       Morris hepatoma 9618a
J              0.93            6                       Spleen
K              0.91            3                       Walker 256 carcinosarcoma
L              0.90            5                       Morris hepatoma 7777
M              0.93            1                       LBDS1 fibrosarcoma
N              0.91            1                       LBDS1 fibrosarcoma


mixtures of metabolites giving rise to the overall spectra. We therefore attempted to determine the underlying biochemical properties that are responsible for the differences between the samples. In such a study it is important to retain the integrity of the original spectrum, which is achieved by decomposing the data matrix via a covariance matrix. This is a different approach from that used in our previous paper (14), in which the analysis utilized a correlation matrix in order to obtain the PCs. Previously, we were using the chemometric techniques of PCA and cluster analysis to characterize samples into groups determined by their overall pattern. Therefore, it was important to give each variable equal weighting, hence the use of correlation. However, in this case, the first step in the factor analysis is decomposition of the covariance matrix to generate the important eigenvectors (or PCs). The initial results from PCA are shown in Table 9. The use of a covariance matrix places a statistical bias on the analysis. Each variable is weighted in proportion to its magnitude; therefore, a larger variable is given more statistical importance. Thus, by examination of the eigenvector loadings it may be possible to assign some biochemical significance to the vectors. The loading plots for the first four PCs are shown in Fig. 4. The first PC accounts for 82% of the variance within the data and by definition passes through the greatest concentration of data points. This PC effectively describes an average tumor spectrum. Individual metabolites can be easily assigned, and this loading accounts for the information contained in each tumor spectrum. The second PC is by definition orthogonal to the first and describes the major source of variation between the samples. In this case the source of variation, indicated by a large negative loading value, can be attributed to choline-containing compounds.
After the first two PCs, the information or variance contained in further PCs can be thought of as describing uniqueness within the samples. A large positive loading for lactate and a large negative loading for choline-containing compounds in PC3 indicate that these metabolites are important factors in the data. Likewise for PC4, glycine, choline-containing compounds, and possibly taurine are of importance in this vector.

TABLE 9
Results from Factor Analysis of 72 Tumor Samples

PC number    Eigenvalue    % Cumulative variance    RE (×10⁻²)    IE (×10⁻²)
 1           4845.8        82.35                    17.09         2.01
 2            450.26       90.00                    12.95         2.16
 3            310.31       95.27                     8.97         1.83
 4             80.90       96.65                     7.61         1.79
 5             48.17       97.47                     6.67         1.76
 6             29.64       97.97                     6.01         1.74
 7             22.16       98.36                     5.45         1.70
 8             19.90       98.69                     4.90         1.63
 9             14.52       98.94                     4.44         1.57
10             10.19       99.11                     4.10         1.53


FIG. 4. PC loadings from factor analysis of 72 tumor samples. (A) Variable loadings on PC1. The major peaks are assigned as follows: (1) TPS standard, (2) valine, (3) lactate, (4) choline-containing compounds, (5) taurine, (6) glycine. (B) Variable loadings on PC2. (C) Variable loadings on PC3. (D) Variable loadings on PC4.

Regardless of the complexity of the data it is possible to test potential factors individually using target testing. Using the process of target transformation it is possible to extract basic factors from within a multicomponent mixture. Therefore, target testing can help to evaluate the nature of the factors and can aid in the development of real biochemical meaning. If a metabolite is a basic factor then it should be possible to extract the individual spectrum of the metabolite of interest from the data matrix of tumor spectra. From the error calculations obtained during the PCA it is possible to evaluate the number of eigenvectors to be used in any further analysis. Figure 5 is a graph of the real error RE (the difference between the raw experimental data and the pure data) and the embedded error IE (the difference between the pure data and the factor-analyzed data) against the number of PCs. The point at which RE and IE converge indicates the number of important factors in the data, and in this case is found to be 55. This may seem a rather large number of factors, but when the nature of the samples is considered it does not seem unreasonable. The samples contain a large number of metabolites; the addition of a standard will also be described within a factor; each class of tumor is obtained from a different strain of rat; and there will also be a contribution from background, possibly due to the intrinsic differences between the tissue types.
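Target testing itself reduces to a projection: a candidate spectrum is reproduced from the retained abstract factors, and a good fit marks it as a basic factor. The sketch below is a minimal reconstruction, with mock Gaussian "peaks" standing in for real metabolite spectra; the fit criterion used here (relative residual norm) is an assumption, not the error statistic of the original program.

```python
import numpy as np

def target_test(D, target, n_factors):
    """Project a candidate target onto the space spanned by the leading
    abstract factors of the data matrix D (samples x variables). If the
    predicted vector reproduces the target, it is a basic factor."""
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    V = Vt[:n_factors].T                 # leading right singular vectors
    predicted = V @ (V.T @ target)       # least-squares reproduction
    fit = np.linalg.norm(predicted - target) / np.linalg.norm(target)
    return predicted, fit                # small fit -> basic factor

# Mock spectra: every sample is a mixture of two Gaussian "metabolite"
# peaks plus noise, so each pure peak should test as a basic factor.
points = np.arange(50)
met_a = np.exp(-0.5 * ((points - 15) / 1.5) ** 2)
met_b = np.exp(-0.5 * ((points - 35) / 1.5) ** 2)
rng = np.random.default_rng(3)
conc = rng.uniform(0.2, 1.0, (40, 2))
D = conc @ np.vstack([met_a, met_b]) + rng.normal(0.0, 0.01, (40, 50))

_, fit_real = target_test(D, met_a, n_factors=2)        # true component
_, fit_fake = target_test(D, rng.normal(size=50), 2)    # arbitrary vector
```

Here `fit_real` comes out near zero while `fit_fake` stays large, the same qualitative behavior seen in Fig. 6, where genuine metabolite spectra are reproduced cleanly from the tumor data.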


FIG. 5. A plot of error against the number of PCs. (A) Real error RE; (B) embedded error IE.

The results from target testing of individual metabolites are shown in Fig. 6. The following were used as test vectors: the standard TPS, lactate, choline, glycine, taurine, and valine. Figure 6A shows the average spectrum of all 72 tumor samples. The metabolites tested have been assigned. At this level of digitization (0.01 ppm), the resolution is not quite good enough to fully reproduce the original NMR spectrum. This may lead to problems when testing for metabolites at low concentration within the spectrum. Figure 6B shows the predicted spectrum of the standard extracted from the data matrix, and it can be seen that all four standard peaks are easily distinguished and there is only a very small contribution from baseline noise. Figure 6C shows the predicted spectrum of lactate, which is also extracted with a good fit to the spectrum of the individual metabolite and, of course, also includes the standard. Both the large doublet and the small unresolved quartet are easily distinguished, along with the spectrum of the standard. Similar results are obtained for choline (Fig. 6D), where all three peaks are successfully isolated from the tumor spectra. It appears that the presence of a large dominating peak aids in the distinction of the smaller signals and that the baseline effects are significantly reduced in this case. The predicted vectors of glycine and taurine (Figs. 6E and 6F, respectively) also produce positive results, implying that all of these metabolites can be considered basic factors within the data matrix. The effect of using a covariance matrix can be clearly seen by comparing the predicted vectors of choline and glycine. Metabolites with large dominating peaks are easily extracted and the predicted vectors are a very good fit to the original individual metabolite spectrum. Also, if the test contains a large peak then the fitting of smaller peaks is simplified, with a smaller contribution from baseline noise.
The signal due to glycine is smaller in magnitude in a tumor spectrum than that of choline, and the contribution of noise or baseline is increased significantly. Therefore, dynamic range will be a problem in this type of analysis. This is illustrated further by examination of the predicted vector of valine, Fig. 6G. The contribution from valine in a tumor spectrum is very small and the predicted vector is dominated by baseline and noise. It is more difficult to distinguish the signal from the noise.


FIG. 6. Results from target testing of individual metabolites. (A) Average spectrum of 72 tumor samples; (B) predicted spectrum of TPS standard; (C) predicted spectrum of lactate; (D) predicted spectrum of choline; (E) predicted spectrum of glycine; (F) predicted spectrum of taurine; and (G) predicted spectrum of valine.


CONCLUSIONS

By applying several different chemometric techniques, along with an optimized neural network, to the analysis of 'H NMR spectra, it was possible to classify a set of unknown samples. The most reliable method was the neural network; however, for classification purposes more than one method should always be used. Increasing the size of the database is also desirable to improve the versatility and reliability of the methods. From the target test results it can be concluded that it is possible to apply this method to the analysis of biological NMR data, and certain individual metabolites can be considered basic factors influencing the data. The potential of this method is wide ranging. The vectors chosen as targets have so far been known to exist in tumors and were really chosen for this preliminary investigation to develop the method. It is now possible to extend the study. For example, it is not necessary to have a pure spectrum of the compound of interest; the test can be artificially created, resulting in the possible assignment of unknown signals in a spectrum. Overall, this method may result in new assignments and in a better understanding of underlying factors of biological significance that are not obvious from initial examination. All methods of analysis used in this paper are available in many commercial software packages and can therefore be easily repeated. The choice of software is purely personal and the CPU time required is entirely dependent on the software chosen. However, the overall CPU time requirements in this study were as follows: the initial PCA was of the order of 6-7 s, and cluster analysis approximately 10 s. In the classification section, the learning stage of the neural network was the rate-determining step, with an optimized learning time of 30 min. For the other classification methods, once the training set PC scores were calculated, classification of the unknowns was almost instantaneous.
The rate-determining step for the TFA program was decomposition of the covariance matrix, which for the 72 tumor samples required an overnight run. This was a consequence of using a 386 personal computer for the analysis. This package was chosen as it is dedicated to the analysis of analytical chemical data and hence the graphical representation of results is superior to that of other packages. However, identical results can be obtained using SAS with a CPU time requirement of the order of minutes. There are several potential applications for the classification methods developed. The method as it stands may be usable for identifying tumor types from biopsies, and in principle it would be cheaper and easier to run a 1D NMR spectrum of a biopsy extract and use a neural network program than to perform conventional histology. This technique could also be applied to the analysis of many different types of samples, for example blood plasma or urine. We are currently evaluating these methods for application to the analysis of time-domain data. It may also be possible to classify spectra obtained in vivo and thus make completely noninvasive diagnoses. In vivo spectra have much lower resolution and are prone to artifacts, but in the simple cases that often arise in medicine (normal brain versus several classes of tumor), sufficient information may be present for classification.

ACKNOWLEDGMENTS

This work was supported by the Cancer Research Campaign. We are grateful to the MRC Biomedical NMR Centre for the provision of NMR spectroscopy facilities. We also thank Tim Brockwell (Thames Polytechnic) for his help and advice and for the use of his target factor analysis program.

REFERENCES

1. E. R. Malinowski, "Factor Analysis in Chemistry," 2nd ed., Wiley, New York, 1991.
2. P. H. Weiner, E. R. Malinowski, and A. R. Levinstone, J. Phys. Chem. 74(26), 4537 (1970).
3. E. R. Malinowski, Anal. Chim. Acta 134, 129 (1982).
4. T. Brekke, T. Barth, O. M. Kvalheim, and E. Sletten, Anal. Chem. 62(1), 49 (1990).
5. I. A. Cowe, J. W. McNicol, and D. C. Cuthbertson, Anal. Proc. 27(3), 61 (1990).
6. T. A. Lee, L. M. Headley, and J. K. Hardy, Anal. Chem. 63(4), 357 (1991).
7. R. O. Keto and P. L. Wineman, Anal. Chem. 63(18), 1964 (1991).
8. Z. Madi, B. U. Meier, and R. R. Ernst, J. Magn. Reson. 72, 584 (1987).
9. K. P. Neidig, R. Saffrich, M. Lorenz, and H. R. Kalbitzer, J. Magn. Reson. 89, 543 (1990).
10. D. J. Livingstone, Anal. Proc. 28(8), 247 (1991).
11. M. J. Adams, Anal. Proc. 28(5), 147 (1991).
12. A. D. Walmsley, S. J. Haswell, and E. Metcalfe, Anal. Chim. Acta 242, 31 (1991).
13. A. M. Massart-Lean and D. L. Massart, Biochem. J. 196, 611 (1981).
14. S. L. Howells, R. J. Maxwell, and J. R. Griffiths, NMR Biomed. 5, 59 (1992).
15. B. R. Kowalski, T. F. Schatzki, and F. H. Stross, Anal. Chem. 44(13), 2176 (1972).
16. B. R. Kowalski and C. F. Bender, J. Amer. Chem. Soc. 96(3), 916 (1974).
17. K. P. R. Gartland, S. M. Sanins, J. K. Nicholson, B. C. Sweatman, C. R. Beddell, and J. C. Lindon, NMR Biomed. 3(4), 166 (1990).
18. W. J. Goux, J. Magn. Reson. 85, 457 (1989).
19. J. H. Perkins, E. J. Hasenoehrl, and P. R. Griffiths, Anal. Chem. 63(17), 1738 (1991).
20. S. L. Howells, R. J. Maxwell, A. C. Peet, and J. R. Griffiths, "Proceedings, SMRM 10th Ann. Meeting," p. 600, 1991.
21. B. J. Wythoff, S. P. Levine, and S. A. Tomellini, Anal. Chem. 62(4), 2702 (1990).
22. J. R. Long, V. G. Gregoriou, and P. J. Gemperline, Anal. Chem. 62(17), 1791 (1990).
23. J. R. Long, H. T. Mayfield, M. V. Henley, and P. R. Kromann, Anal. Chem. 63(13), 1256 (1991).
24. J. U. Thomsen and B. Meyer, J. Magn. Reson. 84, 212 (1989).
25. S. Wold, K. Esbensen, and P. Geladi, Chemometrics Intell. Lab. Syst. 2, 37 (1987).
26. E. R. Malinowski, Anal. Chem. 49(4), 606 (1977).
27. T. Brockwell, Kent (commercially unavailable).
28. J. H. Ward, J. Amer. Statist. Assoc. 58, 236 (1963).
29. Y.-H. Pao, "Adaptive Pattern Recognition and Neural Networks," Addison-Wesley, New York, 1989.
