Cell Oncol. DOI 10.1007/s13402-014-0172-x

ORIGINAL PAPER

Automated classification of oral premalignant lesions using image cytometry and Random Forests-based algorithms

Jonathan Baik & Qian Ye & Lewei Zhang & Catherine Poh & Miriam Rosin & Calum MacAulay & Martial Guillaud

Accepted: 24 April 2014
© International Society for Cellular Oncology 2014

Abstract
Purpose A major challenge for the early diagnosis of oral cancer is the ability to differentiate oral premalignant lesions (OPL) at high risk of progressing into invasive squamous cell carcinoma (SCC) from those at low risk. Our group has previously used high-resolution image analysis algorithms to quantify the nuclear phenotypic changes occurring in OPLs. This approach, however, requires a manual selection of nuclei images. Here, we investigated a new, semi-automated algorithm to distinguish OPLs at high risk of progressing into invasive SCC from those at low risk using Random Forests, a tree-based ensemble classifier.
Methods We trained a sequence of classifiers using morphometric data calculated on nuclei from 29 normal, 5 carcinoma in situ (CIS) and 28 SCC specimens. After automated discrimination of nuclei from other objects (i.e., debris, clusters, etc.), a nuclei classifier was trained to discriminate abnormal nuclei (8,841) from normal nuclei (5,762). We extracted voting scores from this trained classifier and created an automated nuclear phenotypic score (aNPS) to identify OPLs at high risk of progression.
Results The new algorithm showed a correct classification rate of 80 % (80.6 % sensitivity, 79.3 % specificity) at the cellular level for the test set, and a correct classification rate of 75 % (77.8 % sensitivity, 71.4 % specificity) at the tissue level, with a negative predictive value of 76 % and a positive predictive value of 74 % for predicting progression among 71 OPLs. This performance was on par with that of the manual method in our previous study.
Conclusions We conclude that the newly developed aNPS algorithm serves as a crucial asset in the implementation of high-resolution image analysis in routine clinical pathology practice to identify lesions that require molecular evaluation or more frequent follow-up.
Keywords Oral premalignant lesions · Random forests · Automated classification · Image cytometry · Quantitative pathology · Malignant progression

J. Baik · Q. Ye · C. Poh · C. MacAulay · M. Guillaud (*)
Department of Integrative Oncology, British Columbia Cancer Research Centre, 675 West 10th Avenue, Vancouver, BC, Canada
e-mail: [email protected]

L. Zhang · C. Poh
Faculty of Dentistry, University of British Columbia, 2199 Wesbrook Mall, Vancouver, BC, Canada

L. Zhang
Vancouver Hospital and Health Sciences Centre, British Columbia, 895 West 10th Avenue, Vancouver, BC, Canada

M. Rosin
Department of Biomedical Physiology and Kinesiology, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada

M. Rosin
Department of Cancer Control Research, British Columbia Cancer Research Centre, 675 West 10th Avenue, Vancouver, BC, Canada

1 Introduction
Currently, histology remains the most reliable way of predicting cancer risk when oral premalignant (pre-invasive) lesions (OPLs) show high-grade changes (i.e., severe dysplasia or carcinoma in situ, CIS). It is, however, a poor predictor of cancer risk for OPLs with no or low-grade (mild/moderate) dysplasia (termed LGOPL). Previously, we found that there are subtle histological differences between progressing LGOPLs and non-progressing LGOPLs and, in a retrospective study, we have shown that the nuclear phenotypic score (NPS) as measured by a computer-driven microscope imaging system could serve as an adjunct tool to assist pathologists in judging the progression risk of LGOPLs [1]. To assess the full

J. Baik et al.

clinical potential of this new technology, we have correlated NPS with pathology grade, genetic profile and disease outcome, the ultimate metric for judging the validity of this new technology. We found that NPS monotonically increased with the severity of the pathology diagnosis, and that there was a significant increase in NPS between the low-histology risk group (hyperplasia or mild/moderate dysplasia) and the high-histology risk group (severe dysplasia or CIS). The NPS values were significantly higher for samples with LOH at most of the chromosome regions examined (e.g. 3p, 4q, 9p and 11q), and were strongly associated with the presence of a high-risk LOH pattern (e.g. LOH at 3p and/or 9q plus LOH at any of the arms 4q, 8p, 11q, 13q or 17p). Progressing cases showed significantly higher NPS compared to non-progressing cases, and with a cut-off value of 4.5, above which one could predict an increased cancer risk for an OPL, there was a 10-fold increase in the relative risk of progression to cancer for oral lesions with a high NPS (>4.5) compared to those with a low NPS (0.1). On average, the non-progressing cases were monitored over twice the duration of the progressing cases (median [range]: 71 [2.1–133.7] months vs 20 [6.3–115.5] months) to ensure that progression did not occur.

Automated classification of oral pre-malignant lesions

2.2 Sample preparation
All samples were formalin fixed and paraffin embedded. The histological diagnoses of the samples were confirmed by two oral pathologists (C.P. and L.Z.). Serial sections, 4 μm in thickness, were prepared from each sample and placed on two glass slides, of which one was stained with H&E and the other with Feulgen-Thionin [10]. The area on the H&E slide selected for conventional histopathology diagnosis was the same area that was selected on the Feulgen-Thionin stained slide to be examined by the QTP imaging system.

2.3 Quantitative tissue phenotype (QTP) analysis
The imaging system used for QTP was a modified version of the Cyto-Savant automated quantitative system (Cancer Imaging, BC Cancer Agency; [10]). The illumination wavelength was 600±5 nm, corresponding to the absorption peak of the Thionin stain. The effective pixel sampling spacing within the plane of the sample was 0.34 μm, and the effective pixel sampling area was 0.116 μm². The software that was specifically designed for interactive semi-automatic cellular and architectural analysis was implemented to ensure the stability of the imaging system for each analysis [11]. The imaging system characteristics were in conformity with the recommendations of the European Society of Analytical Cellular Pathology [12].

Using a reference image of the H&E region of interest (ROI), an experienced technician delineated the same ROI on the Feulgen image. For normal epithelium and OPLs, the full width of the epithelium had to be present. The nuclei were obtained from all layers of the epithelium. The regions used in each case of SCC were selected using the following three criteria: (i) the region of the tumor selected needed to show tumor differentiation that was representative of the case, (ii) the region had to exhibit a minimum amount of inflammation and epithelial tissue and (iii) the tumor region needed to be reasonably large, i.e., not a single line or layer of tumor cells.

The semi-automated algorithm used in this study consisted of 5 consecutive steps (Fig. 1): (1) manual delineation of the ROI defined on the image displayed on the monitor (in-focus image), (2) automatic thresholding followed by segmentation using our proprietary algorithm [13]; this step was repeated five times, on one in-focus image, two images collected at 1 and 2 microns above the in-focus image, and two images collected at 1 and 2 microns below the in-focus image (an example of the segmentation algorithm is shown in Fig. 2), (3) automated classification of objects and identification of in-focus and intact objects (nuclei), (4) calculation of the new nucleus phenotype score (NPS), and (5) calculation of the automated tissue nuclear phenotype score (aNPS).

Fig. 1 Flow diagram illustrating the consecutive steps of our methodology

For each object, 110 features were calculated for measuring the size, the shape, the nuclear DNA amount and the chromatin distribution [14]. Only a subset of these features was retained and used in the final analyses. An exhaustive list and description of these features is available from the authors and can be found in Doudkine et al. [14].

2.4 Statistical classifications
The automated QTP analysis employs two classifications: (1) object classification, separating in-focus and intact objects (nuclei) from all the objects obtained from the five images and (2) nuclei classification, classifying the nuclei selected in the previous step as originating from normal tissue or from abnormal (cancerous) tissue (CIS or SCC). Here, we used Random Forests, a tree-based classifier. This is an ensemble classifier that makes classification decisions based on a majority vote from a set of classification or regression trees that are generated from random subsets of data, using randomly selected

predictors for each subset. The Random Forests classifier has been reported to outperform many other classifiers (e.g. linear discriminant analysis) and to be robust against over-fitting [15].

Fig. 2 Image analysis of oral premalignant lesions. A) H&E stained diagnostic area, B) delineation of the region of interest on the Feulgen-stained section, C) nuclear segmentation

2.4.1 Object classification
Object classification based on Random Forests was used to automate the collection of in-focus and intact nuclei images. To train the object selection classifiers, two technicians labelled each individual object originating from the training samples into three groups: (1) in-focus, intact nuclei (good objects, i.e., nuclei), (2) clustered or overlapping objects (merged into a group of “cluster” objects), and (3) “out of focus” or “fragmented” objects and other debris (merged into a group of “junk” objects). The initial groups were “good”, “cluster”, “out of focus”, “bad mask” and “junk”. Since we were only interested in the groups “nuclei” and “cluster”, we decided for this first implementation to merge “out of focus”, “bad mask” and “junk” into one single group, called “junk”. In total, the training sample group encompassed 8,072 nuclei, 1,699 cluster objects and 129,180 junk objects. These objects were randomly split into training (70 %), validation (20 %), and test (10 %) sets (Fig. 3).

Fig. 3 Allocation of OPLs in the study. Splitting of the training samples into training, validation and test sets is at the object level, rather than specimen level

A series of five binary Random Forests classifiers was trained using these training samples and arranged in a hierarchical classification scheme

(Fig. 4). This hierarchical scheme was designed to satisfy two objectives: (1) to retain the maximum number of cluster and nuclei objects and (2) to recover as many good objects from this composite group as possible. For each classifier, the number of features used was tuned using 10-fold cross-validation, and the subsample sizes were initially set to be equal to the smallest class to address class imbalance issues [16]. The subsample sizes were iteratively tuned to optimize the recovery of desirable objects in the training and validation sets. The number of random trees generated was set at 500, and sub-sampling was done without replacement (Table 1). Once acceptable levels of correctly selected and classified objects were present in the validation set, the selection and classification procedures were tested on the test set and no further modifications were made to the object selection and classification procedure.

Fig. 4 Objects classification algorithm. Each split corresponds to a Random Forest

2.4.2 Nuclei classification
Objects classified as nuclei were used to train a new Random Forests classifier, a nuclei classifier, to classify nuclei into two distinct groups: (i) a normal group containing nuclei collected from normal specimens and (ii) an abnormal (cancerous) group containing nuclei collected from abnormal (cancerous) specimens. It is worth noting that, whereas all nuclei in a normal tissue can be considered normal, the nuclei originating from an abnormal tissue are certainly not all malignant/abnormal. The proportion of abnormal (cancerous) and non-cancerous nuclei (i.e., normal, dysplastic or any intermediate phenotype) within each abnormal (cancerous) tissue specimen varies greatly. Since it is difficult to assess the phenotype of each individual nucleus within each tissue specimen (sectioning effect), a nucleus classified as abnormal (cancerous) signifies that this nucleus is more likely to have been collected from an abnormal (cancerous) tissue than from a normal tissue. Nuclei collected from normal specimens were mostly epithelial in nature since, by definition, they were located inside the region of interest (ROI). In contrast, abnormal (cancerous) specimens were composed of a mixture of epithelial cells, stromal cells, fibroblasts, lymphocytes etc. Only epithelial nuclei from abnormal (cancerous) areas were selected and classified as “nuclei”.
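The object-level split described in Sect. 2.4.1 (70 % training, 20 % validation, 10 % test) can be illustrated with a short sketch. It is only an illustration in Python, not the authors' R code: the function name `split_objects`, the seed and the toy object identifiers are assumptions. It shows that, because the split is made per object, objects from one specimen can end up in different sets.

```python
import random

def split_objects(object_ids, fractions=(0.7, 0.2, 0.1), seed=0):
    """Randomly assign objects to training, validation and test sets."""
    ids = list(object_ids)
    random.Random(seed).shuffle(ids)          # shuffle a copy, reproducibly
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return {
        "training": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],        # whatever remains
    }

# Hypothetical identifiers: (specimen, object index), 5 specimens x 20 objects.
objects = [(f"specimen_{s}", i) for s in range(5) for i in range(20)]
sets = split_objects(objects)
print({name: len(members) for name, members in sets.items()})
# {'training': 70, 'validation': 20, 'test': 10}
```

A specimen-level split would instead shuffle the specimen names and keep all objects of a specimen together.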

Table 1 Parameters of the objects classifiers

#  Initial group          Classification groups   # of trees  # of features per tree  Sampling size                       Sampling
1  All objects            Cluster, non-clusters   500         30                      Cluster: 1,000, Non-cluster: 1,200  Without replacement
2  Clusters (From 1)      Clusters, non-Clusters  500         30                      Cluster: 1,000, Non-cluster: 750    Without replacement
3  Non-clusters (From 1)  Good, not good          500         30                      Good: 3,000, Not good: 3,300        Without replacement
4  Non-clusters (From 2)  Good, not good          500         30                      Good: 600, Not good: 600            Without replacement
5  Good (From 3)          Good, not good          500         30                      Good: 3,000, Not good: 1,400        Without replacement
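One way to read the routing in Table 1 and Fig. 4 is as a cascade in which each binary classifier only sees the objects passed on by an earlier one. The sketch below is an illustration under stated assumptions, not the authors' implementation: the five classifiers are stand-in callables returning True or False, the name `route_object` and the toy features `area` and `focus` are hypothetical, and the rule that objects rejected by classifiers 3, 4 or 5 end up as “junk” is inferred from the three final groups rather than stated in the text.

```python
from typing import Callable, Dict

Predicate = Callable[[dict], bool]

def route_object(obj: dict, clf: Dict[int, Predicate]) -> str:
    """Send one object through the five-step cascade of Table 1.

    clf[1], clf[2]: True means "cluster".
    clf[3], clf[4], clf[5]: True means "good".
    Assumption: anything rejected at steps 3, 4 or 5 is treated as junk.
    """
    if clf[1](obj):                      # 1: all objects -> cluster / non-cluster
        if clf[2](obj):                  # 2: re-check the clusters from step 1
            return "cluster"
        # 4: non-clusters recovered at step 2 -> good / not good
        return "good" if clf[4](obj) else "junk"
    if clf[3](obj):                      # 3: non-clusters from step 1 -> good / not good
        # 5: re-check the good objects from step 3
        return "good" if clf[5](obj) else "junk"
    return "junk"

# Toy stand-ins for the trained forests (hypothetical thresholds).
toy = {
    1: lambda o: o["area"] > 200,
    2: lambda o: o["area"] > 300,
    3: lambda o: o["focus"] > 0.5,
    4: lambda o: o["focus"] > 0.7,
    5: lambda o: o["focus"] > 0.6,
}

print(route_object({"area": 350, "focus": 0.9}, toy))   # cluster
print(route_object({"area": 250, "focus": 0.9}, toy))   # good (recovered at step 4)
print(route_object({"area": 100, "focus": 0.55}, toy))  # junk (rejected at step 5)
```

The second call shows the recovery path: an object first flagged as a cluster is released by classifier 2 and then accepted as a good object by classifier 4.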

Table 2 Parameters of the nuclei classifier

Initial group                               Classification groups  # of trees  # of features per tree  Sampling size                 Sampling
Nuclei identified by object classification  Normal, cancer         500         27                      Normal: 4,000, Cancer: 3,000  Without replacement
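The “Sampling size” and “Sampling” columns of Tables 1 and 2 describe a per-class draw of fixed size, made without replacement, for each tree. A minimal sketch of one such draw is given below. It is an illustration in Python rather than the R code used in the study; the function name `balanced_subsample` and the seed are assumptions, and the class counts are the training-set totals that follow from Table 4A (4,014 normal and 6,092 cancer/CIS nuclei).

```python
import random
from collections import defaultdict

def balanced_subsample(labels, sizes, rng):
    """Draw a fixed number of indices per class, without replacement."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for label, n in sizes.items():
        # random.sample never returns the same element twice,
        # which is what "without replacement" means here.
        chosen.extend(rng.sample(by_class[label], n))
    return chosen

# Training-set class counts implied by Table 4A.
labels = ["normal"] * 4014 + ["cancer"] * 6092
rng = random.Random(42)

# One draw with the per-class sizes of Table 2; a forest of 500 trees
# would repeat this draw once per tree.
draw = balanced_subsample(labels, {"normal": 4000, "cancer": 3000}, rng)
print(len(draw), len(set(draw)))  # 7000 7000
```

Fixing the per-class sizes decouples the class ratio seen by each tree from the ratio in the data, which is the usual motivation for this kind of sub-sampling under class imbalance.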

The parameters of the nuclei classifier were tuned in the same way as those of the object classifiers (Table 2). The results of the nuclei classifier were validated by testing the classifier on the nuclei in the validation and test sets.

2.4.3 Nucleus phenotype score (NPS) calculation
An NPS was calculated for each nucleus by extracting the voting score from the decision trees within the Random Forests-based nuclei classifier. The voting score refers to the proportion of the randomly generated decision trees in the Random Forests classifier that voted to classify the nucleus as abnormal. A score of 0 can be interpreted as a unanimous vote (100 % of decision trees in the Random Forests classifier) to classify a nucleus as “normal-like”, and a score of 1 as a unanimous vote to classify a nucleus as “abnormal-like”. NPS represents a continuous metric between 0 and 1, in which intermediate values indicate an “intermediate” phenotype between a normal and an abnormal (cancerous) phenotype.

2.4.4 Automated tissue nuclear phenotype scores (aNPS)
The aNPS assigned to each specimen was calculated by averaging the NPS of all nuclei found within the specimen. The aNPS may range from 0 to 1, where a value of 0 corresponds to a specimen in which all nuclei were classified as “normal-like”, whereas a value of 1 corresponds to a specimen in which all nuclei were classified as “abnormal-like”. An intermediate value indicates an “intermediate” phenotype between a normal and an abnormal (cancerous) phenotype.

2.5 Statistical analyses
All analyses were performed using the R statistical software (R version 2.15.1). The Random Forest implementation was provided by the “randomForest” package in R (randomForest version 4.6-2) [17, 18]. In addition, the caret package in R (caret version 4.90) was used to perform parameter tuning for the Random Forest classifiers [19]. For comparison of continuous data between groups, a nonparametric Wilcoxon rank-sum test was used. Comparisons of frequencies between groups were performed using Fisher’s exact test. Kernel density estimation with a Gaussian kernel was used in the density distribution plots. The Kaplan-Meier estimator was used to estimate the time to progression curves. Differences in survival curves between groups were examined using the log-rank test. Differences in relative hazards between groups were examined using the Cox proportional hazards model. All P values were two-sided and 0.05 was considered as the significance level.

Table 3 Accuracy of the Random Forests algorithm to classify objects into “Good”, “Cluster” or “Junk” groups in training (A), validation (B) and test (C) sets

                   Predicted
Actual             Good    Cluster  Junk    Correct classification rate (%)
A: Training set
Good               4,690   96       857     85.4
Cluster            50      920      250     76.8
Bad mask           1,445   332      3,092   61.6
Out of focus       1,977   142      11,896  85.6
Junk               1,944   1,994    67,580  94.2
B: Validation set
Good               1,410   27       203     86.0
Cluster            12      243      69      75.0
Bad mask           398     103      904     64.3
Out of focus       602     32       3,283   83.8
Junk               606     588      19,310  94.2
C: Test set
Good               674     10       105     85.4
Cluster            5       119      31      76.8
Bad mask           221     55       442     61.6
Out of focus       250     30       1,659   85.6
Junk               321     271      9,703   94.2

Correctly classified objects are those in the Good column for Good objects, in the Cluster column for Cluster objects, and in the Junk column for Bad mask, Out of focus and Junk objects (shown in bold in the original table)

Fig. 5 Test of the accuracy of objects classification on the test set. Among the “Good” group, 88 % of objects are correctly classified, 2 % are classified into “Cluster”, and 10 % are classified into “Junk”. Among the “Cluster” group, 78 % of objects are correctly classified, 1 % are classified into “Good”, and 21 % are classified into “Junk”. Among the “Junk” group, 90 % of objects are correctly classified, 7 % are classified into “Good”, and 3 % are classified into “Cluster”
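The two scores defined in Sects. 2.4.3 and 2.4.4 reduce to a vote fraction per nucleus and a mean per specimen. The sketch below illustrates that arithmetic only; it is written in Python, the function names and the 10-tree toy votes are assumptions (the study used 500 trees), and it does not reproduce how the votes were extracted from the R classifier.

```python
from statistics import mean

def nucleus_phenotype_score(tree_votes):
    """NPS: fraction of trees voting 'abnormal' (1) rather than 'normal' (0)."""
    return sum(tree_votes) / len(tree_votes)

def tissue_score(nps_values):
    """aNPS: mean NPS over all nuclei of one specimen."""
    return mean(nps_values)

# Hypothetical votes from a 10-tree forest for three nuclei of one specimen.
votes = [
    [0] * 10,           # unanimous "normal-like"   -> NPS 0.0
    [1] * 10,           # unanimous "abnormal-like" -> NPS 1.0
    [1] * 6 + [0] * 4,  # intermediate phenotype    -> NPS 0.6
]
nps = [nucleus_phenotype_score(v) for v in votes]
anps = tissue_score(nps)
print(nps)              # [0.0, 1.0, 0.6]
print(round(anps, 3))   # 0.533
```

Because the aNPS is a plain average, a specimen with a few strongly abnormal nuclei among many normal ones and a specimen with uniformly intermediate nuclei can receive similar scores.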

3 Results

3.1 Object classification performance
The performance of the object classification in assigning all objects to three groups (i.e., nuclei, clusters and junk) is presented in Table 3 and Fig. 5 for the training, validation and test sets. Overall, a total of 83.9 % of the 8,072 good objects, 75.5 % of the 1,699 cluster objects, and 91.2 % of the 129,180 junk objects were correctly classified. On average, for each specimen in the test sample group, about 100 nuclei were identified by this classifier. Objects not classified as nuclei were removed from subsequent analyses. The numbers of objects classified as nuclei in this first step were 5,762 for the normal group and 8,841 for the abnormal (cancerous) group.

3.2 Nucleus phenotype score (NPS) performance
Based on the mean decrease in the Gini impurity [20] in the Random Forests classifier, the five features that were the most discriminating to separate normal samples from abnormal (cancerous) samples were: (a) Fractal_area1, a measurement of the area of the three-dimensional surface created by the nuclear optical density function, (b) DNA_Index, a measurement of the integrated optical density of the nucleus, (c) Max_radius, maximum value of the length of the measurement of the smoothness of image intensity, large for nuclei with slight and spatially smooth grey level variations, (e) Run_length1, mean of the measurements for the non-uniformity of the run lengths across the four principal directions [14]. Table 4 shows the performance of the nuclei classification in assigning the nuclei to a normal-like group and an abnormal-like group for the training, validation and test sets. A total of 80.2 % of the 8,841 cancer-like nuclei and 78.1 % of the 5,762 normal-like nuclei were correctly classified. Figure 6 shows the density distributions of the nucleus phenotype scores (NPS) for nuclei originating from normal tissues and SCC/CIS tissues in the training set, in which the NPS for the SCC/CIS tissues are skewed to the left, whereas the scores for the normal tissues are skewed to the right. Figure 7 shows the density distributions of the NPS for all OPLs in this study, grouped by different diagnostic grades (i.e., normal, hyperplasia, D1, D2, CIS, carcinoma). A bimodal distribution was observed in the intermediate levels of dysplasia. As the diagnosis increased towards a higher risk, the distribution of the NPS became more skewed to the left, reflecting that higher risk tissues are more likely to have a larger proportion of abnormal-like nuclei. The NPS of each individual nucleus extracted from the Random Forest was averaged across the selected nuclei to generate the aNPS for each specimen in the 71 test samples.

Table 4 Accuracy of classification of nuclei as originating from normal or cancer/CIS in training (A), validation (B) and test (C) sets

                   Predicted
Actual             Cancer/CIS  Normal  Correct classification rate (%)
A: Training set
Cancer/CIS         4,907       1,185   80.5
Normal             854         3,160   78.7
B: Validation set
Cancer/CIS         1,464       383     79.3
Normal             290         889     75.4
C: Test set
Cancer/CIS         727         175     80.6
Normal             118         451     79.3

Correctly classified nuclei are those in the Cancer/CIS column for Cancer/CIS nuclei and in the Normal column for Normal nuclei (shown in bold in the original table)

Fig. 6 NPS distribution for nuclei from specimens diagnosed as normal and specimens diagnosed as SCC/CIS, in the training set

Fig. 7 NPS distribution of nuclei for all 133 OPLs in the study, grouped by diagnosis

3.3 Automated tissue nuclear phenotype score (aNPS) performance
To assess the validity of the aNPS as a measure of the progression potential of OPLs with hyperplasia or mild/moderate dysplasia, we evaluated the correlation of aNPS with outcome for the

OPLs in the test samples. The aNPS values were compared between progressing lesions and non-progressing lesions. As shown in Fig. 8a, progressing lesions are more likely to have higher aNPS values than non-progressing lesions [aNPS median (range)]: 0.50 (0.44–0.58) for non-progressing cases versus 0.64 (0.56–0.74) for progressing cases (P
