NMR IN BIOMEDICINE, VOL. 5, 59-64 (1992)

Classification of Tumour ¹H NMR Spectra by Pattern Recognition

S. L. Howells, R. J. Maxwell and J. R. Griffiths*

CRC Biomedical Magnetic Resonance Research Group, St George's Hospital Medical School, Division of Biochemistry, Cranmer Terrace, London SW17 0RE, UK

¹H spectra of tumours or normal tissues, which include signals from all hydrogen-containing metabolites, are too complex for the human eye to interpret. We have studied 58 ¹H spectra from perchloric acid extracts of three normal tissues (liver, kidney and spleen) and five rat tumours (GH3 pituitary, fibrosarcoma, Morris hepatomas 7777 and 9618A and Walker carcinosarcoma). Instead of editing them or quantifying individual metabolites, we have used statistical pattern recognition techniques to classify them into groups. This automatic, objective method differentiated spectra from normal and malignant rat tissue biopsies, and from different types of cancer. It seems likely that this technique can be applied to human tissues and thus used for cancer diagnosis.

INTRODUCTION

The profusion of hydrogen-containing chemical compounds in biological tissues gives enormously complex, and potentially very informative, ¹H NMR spectra. However, only a fraction of this information can be utilized in practice, because of the difficulty of interpreting the spectra. Even a molecule as simple as α-D-glucose gives a spectrum consisting of six multiplets, so the overall spectrum obtained from a biological tissue is too complex to be analysed by eye. Consider, for example, the spectra in Fig. 1, obtained from extracts of normal and malignant tissues. It is clear that they resemble one another, and yet they differ in detail. Would these differences enable us to categorize spectra as arising from cancerous or normal tissue, or from one cancer rather than another?

The conventional solution to this problem is to identify the substances giving rise to the peaks one by one (peak assignment) and then to see whether the concentration of any of them correlates with malignancy. It is also possible to simplify spectra by editing them, exploiting the coupling or relaxation characteristics of certain peaks. Both these methods inevitably lose an enormous amount of the information in the original spectrum. They also involve assumptions as to the importance of the metabolites that are chosen for editing or assignment.

Another weakness of the conventional approach is its concentration on one or a few metabolites. Perhaps the spectral properties that would enable us to differentiate normal from malignant tissues involve a pattern of many metabolites. Given the complexity of individual ¹H NMR spectra it is improbable that the human eye could recognize such patterns, and to assign every peak and develop suitable editing methods for them would be immensely laborious.

We have adopted a different approach. Instead of editing the spectra or assigning the peaks we have used computer-based pattern recognition methods to classify the spectra into groups.
Our aim was to differentiate the tumours from the normal tissues and, ideally, to differentiate one tumour type from another. This is analogous to other problems in analytical chemistry in which mixtures characterized by many measurements have been classified into categories by the use of pattern recognition methods.¹ The general objective of pattern recognition is to predict an obscure property of a sample (its origin or the class to which it belongs) on the basis of a set of indirect measurements. A number of examples of pattern recognition methods, including principal component analysis and cluster analysis, have appeared in the NMR literature.²⁻⁴ Gartland et al.⁵ discussed the use of pattern recognition in the analysis of ¹H NMR spectra from the urine of rats treated with a variety of toxins. However, their method involves manually scoring the size of certain previously assigned peaks, whereas in our technique the spectrum is analysed automatically without subjective interpretation, editing or peak assignment.

*Author to whom correspondence should be addressed.
Abbreviations used: TPS, sodium 3'-(trimethylsilyl)-1-propanesulphonate; PC, principal component.

0952-3480/92/020059-06 $05.00
© 1992 by John Wiley & Sons, Ltd.

Received 8 May 1991; accepted (revised) 8 August 1991

EXPERIMENTAL

Animals and tumours. All the studies were performed on biopsies of tumours implanted subcutaneously into the flanks of rats. The following tumours and rat strains were used: the Morris hepatoma 7777 (poorly differentiated, rapidly growing) and Morris hepatoma 9618A (well differentiated, slowly growing), both of which were grown in male or female Buffalo rats; the GH3 prolactinoma, grown in female Wistar-Furth rats (see Stubbs et al.⁶ for further details of the hepatomas and prolactinomas used); the Walker 256 carcinosarcoma, grown in female Wistar rats;⁷ and the LBDS1 fibrosarcoma, grown in male BD9 rats, as described by Tozer and Morris.⁸ Normal tissues were obtained from the tumour-bearing animals. The range of tumour weights at time of excision was 2-10 g. There was no evidence of metastatic spread of any of the tumours into any of the normal tissues.

Sample extraction and NMR spectroscopy. Rat tissues (liver, spleen, kidney) and tumours (fibrosarcomas, Morris hepatomas 7777 and 9618A, Walker sarcomas and pituitary tumours) were excised within 2 min of death from animals that had been anaesthetized with pentobarbitone and then killed by cervical dislocation. Samples were immediately freeze-clamped in liquid nitrogen. Extracts were prepared using 6% perchloric acid and neutralized with KOH; the precipitate was removed by centrifugation and the extract was then freeze-dried. ¹H NMR spectra were obtained at 25°C on a Bruker AM-400 spectrometer after dissolution in 0.5 mL D₂O, addition of sodium 3'-(trimethylsilyl)-1-propanesulphonate (TPS, to a concentration of 2 mM) as chemical shift and quantitation reference, and readjustment to pH 6.8. The quality of shimming was determined for each sample from the water linewidth (before presaturation). Acquisition involved selective presaturation of the residual water signal (7 s), a 90° flip angle, a pulse repetition time of 10 s, a spectral width of 8 kHz and 16 k data points.

Figure 1. 400 MHz ¹H NMR spectra of perchloric acid tumour extracts: (a) LBDS1 fibrosarcoma; (b) Walker carcinosarcoma; (c) GH3 pituitary tumour. [Horizontal axis: chemical shift, 4.0 to 0.0 ppm.]

Spectrum analysis. Processing of the water-suppressed data involved exponential weighting to give a total water linewidth of 4 Hz. To standardize the spectra, peak heights were obtained at 0.025 ppm intervals in the range of 4.5-0 ppm and normalized with respect to the TPS peak at 0 ppm. The data matrix thus consisted of 180 variables and 58 cases.

Pattern recognition. This analysis was performed using SAS version 5.19 on MVS. The clustering algorithm used was Ward's method.⁹ The cluster analysis was repeated using a range of numbers of principal components (PCs), from 3 to 18.
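The digitization step described under "Spectrum analysis" can be sketched as follows. This is an illustration under stated assumptions, not the authors' processing code: the function names are hypothetical, peak heights are taken by linear interpolation at each grid position, and the 180-point grid is assumed to run from 4.5 ppm down to 0.025 ppm (the paper does not say which end point is omitted to give 180 rather than 181 variables).

```python
def ppm_grid(start=4.5, stop=0.0, step=0.025):
    """Sampling positions in ppm, from `start` down towards `stop` (exclusive)."""
    n = round((start - stop) / step)          # 180 positions for the defaults
    return [round(start - i * step, 3) for i in range(n)]


def interpolate(ppm_axis, intensities, ppm):
    """Linearly interpolate a spectrum (descending ppm axis) at position `ppm`."""
    for k in range(len(ppm_axis) - 1):
        hi, lo = ppm_axis[k], ppm_axis[k + 1]
        if lo <= ppm <= hi:
            frac = (hi - ppm) / (hi - lo)
            return intensities[k] + frac * (intensities[k + 1] - intensities[k])
    raise ValueError(f"{ppm} ppm lies outside the spectrum")


def digitize(ppm_axis, intensities, tps_ppm=0.0):
    """Return one data-matrix row: heights on the grid divided by the TPS height."""
    tps_height = interpolate(ppm_axis, intensities, tps_ppm)
    return [interpolate(ppm_axis, intensities, p) / tps_height for p in ppm_grid()]


# Synthetic spectrum (made-up numbers) on a 0.005 ppm raster: a flat baseline
# of 1.0, a TPS point of height 4.0 at 0 ppm and a point of height 2.0 at 1.3 ppm.
axis = [round(5.0 - 0.005 * i, 3) for i in range(1001)]   # 5.0 ... 0.0 ppm
spectrum = [1.0] * len(axis)
spectrum[axis.index(0.0)] = 4.0
spectrum[axis.index(1.3)] = 2.0

row = digitize(axis, spectrum)
print(len(row), row[ppm_grid().index(1.3)])
```

Stacking one such row per extract would give a matrix of cases by variables with the shape described in the text (58 by 180).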

RESULTS

¹H spectra

Visual inspection of the spectra suggests some differences between the various samples (Fig. 1), and it is possible to assign several resonances. The most easily identifiable resonances are as follows: a doublet at 1.3 ppm due to -CH₃ of lactate, a doublet at 1.45 ppm due to -CH₃ of alanine, peaks at 3.02 and 3.92 ppm that are due to the -CH₃ and -CH₂ groups, respectively, of creatine plus phosphocreatine, and finally the peak at 3.2 ppm which is due to choline-containing compounds. It is not possible from these assignments alone to distinguish between samples as the differences are too subtle and complex. It is therefore desirable to reduce the complexity of the data set and then to look for patterns within the data that will categorize similar samples. Because of the complexity of the spectra it is necessary to use multivariate (pattern recognition) data analysis techniques.

Pattern recognition

The data matrix, consisting of digitized spectra from the following samples: kidney (7), liver (19), spleen (7), pituitary tumour (5), fibrosarcoma (8), hepatoma 7777 (3), hepatoma 9618A (5) and Walker sarcoma (4), was analysed using pattern recognition techniques, as summarized in Fig. 2.

Figure 2. Scheme for processing ¹H NMR data. [Flow chart starting from the data matrix.]

Principal component analysis

Initially the complexity of the data was reduced using principal component analysis, yielding a set of eigenvectors and eigenvalues. A large eigenvalue indicates an important vector, or PC. The first three eigenvector loadings are shown in Fig. 3(a)-(c) together with the mean of the 58 extract spectra (Fig. 3(d)). These loadings consist of a complex combination of spectral regions. The larger the loading value (positive or negative) the greater the correlation between the variable and the PC. It is difficult to make a biological interpretation of the eigenvectors at this stage since each PC evidently includes information from a large proportion of the data points.

Figure 3. Eigenvector loadings for principal components 1 (a), 2 (b) and 3 (c). The magnitudes of the loadings show the importance of each spectral region (digitized in 0.025 ppm steps) in a given PC. The mean of the 58 digitized sample spectra is shown for comparison (d).

One way of representing the data is to calculate a score for each sample with a given PC (the projection of the data from that sample onto the eigenvector axis). It is then possible to display these scores graphically in two or three dimensions. Figure 4 is a scatter diagram showing the partial characterization of the sample classes according to the scores from PCs 3 and 4. It can be seen that the kidneys are separated but there is considerable overlap between the other samples. A larger number (n) of PCs is required to fully describe the data but the scores from all of these can no longer be represented in scatter diagrams. It is found that the important information content of a data set can usually be described in the first few PCs, the remainder arising from residual error. The number of physically important PCs was estimated using the indicator function described by Malinowski.¹⁰ In the present case principal component analysis yielded 13 important PCs (Table 1), successfully reducing the dimensions of the data set from 180 to 13.

The final step is to transform these PCs into a more visual representation. This can be achieved using cluster analysis, a technique which attempts to determine similarities within the data and organize them into groups or clusters.¹¹ Comparison of dendrograms obtained from cluster analysis using a range of PCs showed that the best separation of tissue types was given when 13 PCs were used, in accordance with the number of PCs estimated from Malinowski's indicator function.

Table 1. Results from principal component analysis

PC no.   Eigenvalue   Cumulative variance (%)
1        68.6828      38.2
2        41.4875      61.2
3        12.5530      68.2
4        10.9156      74.2
5         8.7001      79.1
6         6.1345      82.5
7         4.3933      84.9
8         3.7751      87.0
9         3.2033      88.8
10        2.6079      90.3
11        2.1113      91.4
12        1.9768      92.5
13        1.6154      93.4
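The relationship between the eigenvalues and the cumulative variance column of Table 1 can be checked with a short calculation. The sketch below assumes that the PCA was carried out on the correlation matrix of the 180 variables, so that the eigenvalues of all PCs sum to 180; the paper does not state this explicitly, but the tabulated percentages are consistent with it. The second eigenvalue is read as 41.4875, which is the value that reproduces the tabulated 61.2% and 68.2%.

```python
# Eigenvalues of the first 13 PCs, copied from Table 1.
eigenvalues = [68.6828, 41.4875, 12.5530, 10.9156, 8.7001, 6.1345, 4.3933,
               3.7751, 3.2033, 2.6079, 2.1113, 1.9768, 1.6154]

# Assumption: PCA of the correlation matrix of 180 standardized variables,
# so the total variance (sum of all eigenvalues) equals 180.
TOTAL_VARIANCE = 180.0


def cumulative_variance(values, total):
    """Cumulative percentage of `total` accounted for by the leading eigenvalues."""
    running, out = 0.0, []
    for value in values:
        running += value
        out.append(100.0 * running / total)
    return out


cumulative = cumulative_variance(eigenvalues, TOTAL_VARIANCE)
for pc, (eig, cum) in enumerate(zip(eigenvalues, cumulative), start=1):
    print(f"PC {pc:2d}  eigenvalue {eig:8.4f}  cumulative {cum:5.1f}%")
```

Under this assumption the 13 retained PCs account for a little over 93% of the total variance, with the remaining 167 components sharing the rest.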

Figure 4. Scatter diagram showing partial characterization of the sample classes according to the scores from PCs 3 and 4 (axes: principal component 3 and principal component 4). The samples are kidney, K (n = 7); liver, L (n = 19); spleen, S (n = 7); pituitary tumour, P (n = 5); fibrosarcoma, F (n = 8); Morris hepatoma 7777, H (n = 3); Morris hepatoma 9618A, G (n = 5); and Walker carcinosarcoma, W (n = 4).

Figure 5. Dendrogram showing cluster analysis of ¹H spectra of extracts of rat tissues and tumours. Sample labels are as given in Fig. 4. Analysis performed using SAS version 5.19 on MVS. The clustering algorithm used was Ward's method.⁹

Cluster analysis

All clustering methods attempt to determine similarities or characteristics of the data set by organizing each point within the data into groups or clusters. Hierarchical clustering is a widely used method, in which each observation begins as a cluster by itself. The distance between each pair of clusters is then measured and the two closest clusters are merged to form a new cluster, replacing the two old clusters. Merging of the two closest clusters is repeated until only one cluster is left. Various algorithms are commonly used to determine the clusters in the data; they differ in the way they compute the distance between clusters.

Cluster analysis was used on this reduced (13-dimensional) data set and generated clusters of samples which had the most similar spectra. This is shown in the dendrogram (Fig. 5), in which each tumour type and tissue type forms a separate cluster. The only exceptions are one of the hepatomas, which was incorporated in the liver cluster, and two spleens, which form a cluster away from the main spleen group. The spectrum from one of these outlying spleens was markedly different from the other spleen spectra in that it was dominated by peaks at 1.2 and 3.7 ppm. The spectrum of the other outlying spleen has a relatively large peak at 3.7 ppm (compared to the other spleen spectra). One feature of cluster analysis is that an outlying sample can be paired up with the sample most similar to it, and the resultant cluster (having the average properties of its components) will be dragged away from the rest of the samples of the same class.

The 'normal' tissues are from rats of several different strains bearing a variety of tumours, so it is not surprising that variations occur. However, the subclusters do not show a clear pattern based on the strain or tumour carried by particular rats. Several of the other clusters can be divided into subclusters (fibrosarcomas, kidneys and livers), whereas the pituitary tumours and Walker sarcomas each form very tight separate clusters which then merge to form a relatively compact cluster. The formation of subclusters within a tumour type was not related to tumour size.

The livers form a diffuse cluster made up of several subclusters. Hepatoma 9618A forms a tight cluster but hepatoma 7777 shows more variability, one of these samples being incorporated into the liver cluster. This difference may be due to the characteristics of the two tumour types. Hepatoma 9618A is relatively slow growing, with a high degree of differentiation and a low level of necrosis. Hepatoma 7777 is fast growing, poorly differentiated and has more necrosis.¹² On the one hand, the more rapidly growing tumour (H7777) might be expected to show more variability; on the other hand, the better differentiated tumour (H9618A) might be expected to be the most 'liver-like'.
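The agglomerative procedure described in this section, with Ward's criterion as the between-cluster distance, can be sketched in a few lines. This is a naive illustration, not the SAS implementation used in the study: the function names are hypothetical and the six two-dimensional points are made-up stand-ins for PC scores. At each step the pair of clusters whose merger gives the smallest increase in within-cluster sum of squares is merged.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_cost(c1, c2):
    """Increase in within-cluster sum of squares if c1 and c2 are merged."""
    n1, n2 = len(c1["members"]), len(c2["members"])
    return n1 * n2 / (n1 + n2) * sq_dist(c1["centroid"], c2["centroid"])


def merge(c1, c2):
    """New cluster holding all members, with the size-weighted mean centroid."""
    n1, n2 = len(c1["members"]), len(c2["members"])
    centroid = [(n1 * x + n2 * y) / (n1 + n2)
                for x, y in zip(c1["centroid"], c2["centroid"])]
    return {"members": c1["members"] + c2["members"], "centroid": centroid}


def ward_clustering(points, n_clusters):
    """Merge the cheapest pair repeatedly until `n_clusters` clusters remain."""
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    history = []                      # (members, members, cost) for each merger
    while len(clusters) > n_clusters:
        pairs = [(ward_cost(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        cost, i, j = min(pairs)
        history.append((clusters[i]["members"], clusters[j]["members"], cost))
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history


# Hypothetical scores: samples 0-2 lie close together, as do samples 3-5.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
clusters, history = ward_clustering(points, n_clusters=2)
groups = sorted(sorted(c["members"]) for c in clusters)
print(groups)
```

Running the loop down to a single cluster and recording the cost of each merger gives the information needed to draw a dendrogram such as Fig. 5.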

pH

The pH of the extracts was standardized at 6.80, but we were concerned that the patterns observed might have been dependent, to some extent, on this arbitrary choice. We therefore investigated the sensitivity of our cluster analysis strategy to pH variations. The freeze-dried material from one kidney extraction was divided into seven aliquots of 40 mg each, dissolved in 0.5 mL D₂O and the pH adjusted to values in the range of 6.03-7.52. Cluster analysis of the spectra from these samples showed that there was little variation within the pH range of 6.75-7.26. Samples with pH values outside this range were less similar. Subsequent cluster analysis was performed on a data set including this variable-pH kidney extract and all other kidney extracts (at pH 6.80 ± 0.05). This showed that the dissimilarity coefficient between the variable-pH samples in the range 6.75-7.26 was small compared to the inter-kidney dissimilarity coefficient. These data suggest that the pH of the extracts was not a major factor in determining the clustering of the spectra.

Lactate

Recent ¹H NMR experiments in vivo have shown that lactate is one of the most important peaks differentiating tumours from normal tissues. If lactate were the main component responsible for the characterization of tumours from normal tissues then the use of multivariate analysis techniques would be unnecessary. In order to see whether the lactate signals in the spectra were critical to the clustering pattern we eliminated them from the analysis (by removal of four peak height values corresponding to the peaks at 4.11 and 1.3 ppm). There was no change in the clustering pattern of kidneys, spleens, pituitary tumours and Walker sarcomas. The main change was that some of the hepatomas (9618A) formed a subcluster within the large liver cluster. This suggests that lactate is not a major factor in the general pattern that distinguishes normal tissue from tumours, although it appears to be important in distinguishing this well-differentiated hepatoma from its tissue of origin.

Time of freeze clamping

All samples were freeze-clamped within 2 min of death but we wished to determine the sensitivity of the technique to the exact time of freeze clamping. The spectra from 20 kidney extracts, frozen at various times in the range of 15-120 s, were analysed in three different ways. Cluster analysis showed that although there was a slight tendency for samples to cluster together if they had been freeze-clamped at a similar time, this effect was considerably less than the differences between these kidney samples and a set of liver samples (freeze-clamped at 20-45 s). Also, the data matrix was transposed (i.e., to give a matrix consisting of 180 cases and 20 variables) and the correlation matrix (Fig. 2) calculated. This showed a high correlation (>0.93) between all of these 20 kidney spectra, implying that there was very little difference between them. Finally, the time of freeze clamping was added to the original data matrix and the resultant correlation matrix showed that the correlation between time and each of the peak height variables was relatively low. The highest correlation coefficients (0.66 and 0.71) were found for the lactate doublet at 1.3 ppm. This suggests that the main change with time was in lactate but that the spectra were not significantly changed as a whole.

It therefore seems likely that, within this time range, the effect of variations in time of freeze clamping will be relatively small compared to intrinsic differences between tissue types.
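The two correlation checks described above (correlation between whole spectra, and correlation between freeze-clamping time and each peak height variable) can be illustrated with a small sketch. The numbers below are made up for illustration and are not the study's data; the matrix has four cases and five variables rather than 20 and 180, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def transpose(matrix):
    """Swap cases and variables."""
    return [list(col) for col in zip(*matrix)]


# Rows are samples (cases), columns are peak heights (variables).
# Column 1 plays the role of a lactate peak that grows with time.
data = [
    [1.0, 2.0, 0.50, 3.0, 1.00],
    [1.1, 2.4, 0.52, 3.1, 0.98],
    [0.9, 3.0, 0.49, 2.9, 1.03],
    [1.0, 3.4, 0.51, 3.0, 0.99],
]
times = [15, 45, 90, 120]             # seconds between death and freeze clamping

# Correlation between spectra: after transposition the samples act as the
# variables, which is equivalent to correlating the rows of `data` pairwise.
spectra_corr = [[pearson(a, b) for b in data] for a in data]

# Correlation between time and each peak height variable.
time_corr = [pearson(times, column) for column in transpose(data)]
most_time_dependent = max(range(len(time_corr)), key=lambda j: abs(time_corr[j]))
print(most_time_dependent, round(time_corr[most_time_dependent], 3))
```

In this toy example only one variable tracks time closely, which mirrors the situation reported in the text, where the largest coefficients belonged to the lactate doublet.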

DISCUSSION

Tumours could be distinguished from normal tissues by cluster analysis of ¹H NMR data. In almost all cases the cluster analysis could separate the different tumour types and tissue types from each other. This suggests that the approach is likely to be useful in characterizing tumours. We are presently expanding the data base to include other tumour and tissue types. It will be important to relate tumour spectra to the spectra obtained from their tissue of origin. The next stage in this analysis is to determine which spectral features are characteristic of particular tumour types and to perform assignment studies so that these can be related to specific metabolites. We are presently applying our method to ¹H NMR spectra of human tumour and normal tissue biopsies.

In the longer term several other developments are possible. The technique could be adapted for use in a predictive mode, so that an unknown spectrum could be assigned to one of the groups. Preliminary results with the data set used in the present study are encouraging, but larger numbers of samples will be needed in each group to make the method adequately robust.

The most obvious practical application of this method, assuming it can be used successfully with human tissues, would be in the analysis of biopsies of suspected tumours, some of which are hard to distinguish from benign growths by conventional, labour-intensive, histological methods. Delikatny et al.¹³ have argued that 2D ¹H NMR followed by visual examination of the spectra could be a more cost-effective way of distinguishing between invasive and non-invasive carcinoma of the cervix than histopathology. Pattern analysis could be the basis of an automatic method for analysis of 1D ¹H spectra of tumour extracts that might be still more cost-effective.

A major application of pattern analysis could arise in the interpretation of ¹H spectra of tumours (or other pathological tissues) obtained from patients in vivo. This non-invasive method has been extensively developed in recent years¹⁴ and could, in principle, permit tumour diagnosis without biopsy. The spectra have much poorer resolution than is obtained in the spectra of the extracts that we have studied so far, and (unless editing techniques are used) they tend to be dominated by signals from tissue lipids. However, the latter signals may, in fact, contain important information. An additional problem with spectra obtained in vivo will be the use of localization methods, which may introduce artifacts. In the present study we have adhered to a precise protocol for sample extraction and NMR spectroscopy. The method is tolerant of a limited variability in the pH at which spectra are obtained and in the time that tissues are freeze-clamped after death. However, the variability that may be introduced in vivo by localized water-suppressed ¹H techniques may present more difficult problems. In principle, any loss of discriminatory power resulting from poor spectral resolution could be compensated for by introducing information from other MRS methods (e.g., ³¹P MRS) or data obtained by classical techniques, clinical observations, etc.

The present results also suggest that there are many unsuspected metabolic differences between normal tissues and tumours, between one tumour type and another, and indeed between one normal tissue and another. A deeper understanding of these metabolic idiosyncrasies of tumours may permit the development of novel treatment strategies.
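The "predictive mode" mentioned in the Discussion is not specified further in this paper. One simple possibility, shown here purely as an assumption, is a nearest-centroid rule: each group is summarized by the mean of its PC scores, and an unknown spectrum is assigned to the group whose mean is closest. The scores, labels and function names below are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(rows):
    """Mean score vector of a group of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_centroids(scores, labels):
    """Group the training scores by label and return one centroid per group."""
    groups = {}
    for row, label in zip(scores, labels):
        groups.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in groups.items()}


def predict(centroids, row):
    """Assign `row` to the group with the nearest centroid."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], row))


# Made-up PC scores (two PCs only) for two groups of known origin.
train_scores = [(0.1, 0.0), (0.0, 0.2), (0.2, 0.1), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
train_labels = ["kidney", "kidney", "kidney", "liver", "liver", "liver"]

centroids = fit_centroids(train_scores, train_labels)
unknown = (1.8, 2.2)                  # scores of a hypothetical unknown sample
print(predict(centroids, unknown))
```

As the text notes, any such rule would need larger numbers of samples in each group before it could be considered robust.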

Acknowledgements

This work was supported by the Cancer Research Campaign. We are also grateful to the MRC Biomedical NMR Centre for provision of NMR spectroscopy facilities.

REFERENCES

1. Jurs, P. C. Pattern recognition used to investigate multivariate data in analytical chemistry. Science 232, 1219-1224 (1986).
2. Gaux, W. J. NMR pattern recognition of peracetylated mono- and oligosaccharide structures. Classification of residues using principal component analysis, K-nearest neighbour analysis and SIMCA class modelling. J. Magn. Reson. 85, 457-469 (1989).
3. Brekke, T., Barth, T., Kvalheim, O. M. and Sletten, E. Multivariate analysis of carbon-13 nuclear magnetic resonance spectra. Identification and quantification of average structures in petroleum distillates. Anal. Chem. 62, 49-56 (1990).
4. Madi, Z., Meier, B. U. and Ernst, R. R. Detection of cross peaks in two-dimensional NMR by cluster analysis. J. Magn. Reson. 72, 584-590 (1987).
5. Gartland, K. P. R., Sanins, S. M. and Nicholson, J. K. Pattern recognition analysis of high resolution ¹H NMR spectra of urine. A nonlinear mapping approach to the classification of toxicological data. NMR Biomed. 3, 166-172 (1990).
6. Stubbs, M., Rodrigues, L. M. and Griffiths, J. R. Potential artefacts from overlaying tissues in ³¹P NMR spectra of subcutaneously implanted rat tumours. NMR Biomed. 1, 165-170 (1989).
7. McSheehy, P. M. J., Prior, M. J. W. and Griffiths, J. R. Prediction of 5-fluorouracil cytotoxicity towards the Walker carcinosarcoma using peak integrals of fluoronucleotides measured by MRS in vivo. Br. J. Cancer 60, 303-309 (1989).
8. Tozer, G. M. and Morris, C. C. Blood flow and blood volume in a transplanted rat fibrosarcoma: comparison with various normal tissues. Radiother. Oncol. 17, 153-165 (1990).
9. Ward, J. H. Hierarchical grouping to optimise an objective function. J. Am. Stat. Assoc. 58, 236-244 (1963).
10. Malinowski, E. R. Determination of the number of factors and the experimental error in a data matrix. Anal. Chem. 49, 612-617 (1977).
11. Everitt, B. Cluster Analysis, 2nd Edn. Heinemann Educational Books Ltd, London (1981).
12. Morris, H. P. and Wagner, B. P. Induction and transplantation of rat hepatomas with different growth rates, in Methods in Cancer Research, ed. by H. Busch, Vol. 4, pp. 125-132. Academic Press, New York (1968).
13. Delikatny, E. J., Russell, P., Dyne, M., Holmes, K. T., Atkinson, K., van Haaften-Day, C., Hunter, C. and Mountford, C. E. ¹H MRS distinguishes between invasive carcinoma of the cervix and non-invasive carcinoma in situ. Proc. 9th Annual Meeting, Soc. Magn. Reson. in Med. 2, 837 (1990).
14. Ross, B. D. ed. Proton spectroscopy in clinical medicine. NMR Biomed. 4, 47-116 (1991).
