Segmentation of Trabecular Jaw Bone on Cone Beam CT Datasets

Olivia Nackaerts, MSc, PhD;* Maarten Depypere, MSc, PhD;† Guozhi Zhang, MSc, PhD;‡ Bart Vandenberghe, DDS, MSc, PhD;§ Frederik Maes, MSc, PhD;¶ Reinhilde Jacobs, DDS, MSc, PhD;** SEDENTEXCT Consortium

ABSTRACT

Background: The term bone quality is often used in a dentomaxillofacial context, for example in implant planning, as bone density and bone structure have been linked to primary implant success.

Purpose: This research aimed to investigate the performance of adaptive thresholding of trabecular bone in cone beam CT (CBCT) images. The segmentation quality was assessed for different imaging devices and upper and lower jaws.

Materials and Methods: Four jaws were scanned with eight CBCT scanners and one micro-CT device. Images of the jaws were spatially aligned with the micro-CT images. Two volumes of interest for each jaw were manually delineated. Trabecular bone in the volumes of interest in the micro-CT images was segmented so that the micro-CT images could serve as high-resolution ground truth images. The volumes of interest in the CBCT images were segmented using both global and adaptive thresholding.

Results: Segmentation was significantly better for the lower jaw than for the upper jaw. Differences in performance between the scanners were significant for both jaws. Adaptive thresholding performed significantly better in segmenting the bone structure out of CBCT images.

Conclusions: When assessing jaw bone structure, the observer should always choose adaptive thresholding. It remains a challenge to identify the optimal threshold selection for the structural assessment of jaw bone.

KEY WORDS: bone density, CBCT imaging, micro-CT

*Postdoctoral fellow, OMFS-IMPATH Research Group, Department of Imaging & Pathology, KU Leuven, Leuven, Belgium; †postdoctoral fellow, Medical Image Computing, Center for Processing Speech and Images, Department of Electrical Engineering, KU Leuven, Leuven, Belgium; ‡postdoctoral fellow, Leuven University Centre for Medical Physics in Radiology, University Hospitals Leuven, Leuven, Belgium; §postdoctoral fellow, Department of Oral Health Sciences, KU Leuven, Leuven, Belgium; ¶senior professor, Medical Image Computing, Center for Processing Speech and Images, Department of Electrical Engineering & iMinds, KU Leuven, Leuven, Belgium; **senior professor, OMFS-IMPATH Research Group, Department of Imaging & Pathology, KU Leuven, Leuven, Belgium

Corresponding Author: Prof. Reinhilde Jacobs, OMFS-IMPATH Research Group, Department of Imaging & Pathology, KU Leuven, Kapucijnenvoer 7, Leuven 3000, Belgium; e-mail: reinhilde.jacobs@med.kuleuven.be

© 2014 Wiley Periodicals, Inc. DOI 10.1111/cid.12217

INTRODUCTION

The term bone quality is often used in a dentomaxillofacial context in the clinical evaluation of disease processes affecting the jawbone. The most frequent application is bone quality assessment prior to implant treatment, as bone density and bone structure have been linked to primary implant success.1,2 Nevertheless, the definition of jaw bone quality remains a matter of debate.3 Bone quality can be assessed through X-ray-based radiographic imaging, either by subjective evaluation with or without reference images or by objective quantification of jaw bone. Objective parameters of bone quality typically relate to bone density as deduced from the X-ray attenuation (gray values or Hounsfield units) or to an image-based description of the bone structure. The structure of trabecular jaw bone is indeed of relevance, as it can provide information on bone metabolism and biomechanical properties. Some well-known parameters for describing the trabecular jaw bone structure or morphology are bone volume percentage (bone volume/total volume, BV/TV), trabecular thickness, trabecular number, and trabecular separation. The reliable evaluation of these and other morphological parameters from radiographic images strongly


Clinical Implant Dentistry and Related Research, Volume *, Number *, 2014

depends on the segmentation of the trabecular bone in the images. In turn, segmentation performance depends on a range of factors, including hardware, patient positioning and imaging stability, reconstruction software, postprocessing algorithms, and others. For implant placement planning, clinicians often opt for cone beam CT (CBCT) imaging.4,5 In previous research, it has been shown that the intensity values of CBCT images are not as uniform or reproducible as those of multislice CT images,6,7 making mere densitometric analysis inappropriate for assessment of bone quality with CBCT. When using segmentation-derived morphometric measures, it is important that the postprocessing of CBCT images be adapted to this inconsistency in intensity values. Adaptive thresholding algorithms that allow the segmentation threshold to vary in every pixel of the image can be expected to cope reasonably well with the nonuniformity of intensity values in CBCT images.8,9 A thorough evaluation of global and adaptive thresholding of CBCT images is therefore of interest. Both thresholding methods are simple and fast, but they require manual selection of appropriate threshold parameters. Comparing segmentation methods that rely on user input can be delicate, as it cannot be guaranteed that the user will select optimal thresholds in practice. Moreover, different CBCT devices or acquisition parameters may require different threshold selections for optimal performance. In fact, the entire imaging chain, including scanned sample, CBCT device, and segmentation method, should be evaluated, irrespective of the user input. Receiver operating characteristic (ROC) curves measure performance on classification tasks, provided that the ground truth is known. While such curves are

often used in the medical imaging context to measure the performance of a radiologist in detecting and classifying image patterns of disease, they can also serve as a useful evaluation of image segmentation performance through analysis of pixel classification accuracy. For segmentation algorithms that require user input, every possible parameter value generates a different point on the ROC curve. When an algorithm is evaluated by its ROC characteristics, all possible user-selected inputs are considered.10 ROC curves can also be used to compare the final segmentation accuracy of images obtained by various imaging devices or the accuracy obtained for different bone samples independently of the user-selected parameters.

This research aims to investigate the performance of adaptive thresholding of trabecular bone in images of human jawbones acquired by CBCT. To account for the varying user-selected thresholds in the segmentation algorithms, ROC curves are used for evaluating the performance of global and adaptive segmentation methods. The ROC approach additionally enables assessment of segmentation qualities for different imaging devices and for upper and lower jaws. The impact of different approaches for threshold selection on morphometric indices is also investigated.

MATERIALS AND METHODS

This study received ethical approval from the medical ethics committee of the University Hospitals Leuven (number B32220083749). Table 1 shows an overview of the steps that were followed in order to assess the performance of adaptive thresholding of trabecular bone in images of human jawbones acquired by CBCT. In the paragraphs below, each step will be further clarified.

TABLE 1 Consecutive Steps in Assessing Segmentation Quality

| Scanning | Image processing | Image analysis: bone morphometry | Image analysis: segmentation quality | Statistical analysis |
| --- | --- | --- | --- | --- |
| 4 jaws | Scan registration | Percentage bone volume | ROC curves | One-way ANOVA: scanner comparison |
| 8 CBCT scanners | Scan normalisation | Trabecular thickness | Global thresholding versus adaptive thresholding | Paired t-test: thresholding method comparison, upper versus lower jaw |
| 1 micro-CT scanner | Volume of interest (VOI) selection | Trabecular number | Manual thresholding versus thresholding based on sample properties versus thresholding based on optimal overlap | |
| | Segmentation of trabecular bone, micro-CT (ground truth) | Trabecular separation | | |
| | Segmentation of trabecular bone, CBCT | | | |

Segmentation of Jaw Bone on CBCT Datasets

Imaging

Four formalin-fixed human jaws, two upper and two lower, including soft tissues, were scanned with eight different CBCT devices at standard clinical settings (Table 2) and with a micro-CT device (Skyscan 1173, Skyscan, Kontich, Belgium; Table 2). The instructions for patient positioning were followed for all CBCT devices when the samples were placed. In the micro-CT device, samples were firmly attached to the rotating pin with polystyrene blocks and tape and were wrapped in a humid paper towel in order to avoid dehydration and subsequent movement of the jaws during the long scanning time. The CBCT devices tested were the following: Galileos (Sirona Dental Systems, New York, NY, USA), i-CAT (Imaging Sciences International, Hatfield, PA, USA), Illuma (3M, St. Paul, MN, USA), Newtom (QR, Verona, Italy), Picasso Trio (E-WOO, Yongin, Korea), Promax 3D (Planmeca, Helsinki, Finland), 3D Scanora (Soredex, Tuusula, Finland), and Skyview (MyRay, Imola, Italy). For reporting the results, we have used codes, as the aim was to focus on the postprocessing phase rather than the absolute differences between the scanners.

Image Processing

After reconstruction of the images using vendor-specific reconstruction software and parameter settings, further processing of the images was needed to segment the bony structures of interest, that is, the jawbone. This segmentation step can be performed using commonly used image analysis tools such as CT-Analyser (Skyscan).

To facilitate comparison of the image datasets, all CBCT images were first spatially aligned to the micro-CT scans by image registration using maximization of mutual information,11 which is fully automated and well suited for multimodal registration. Following registration, the CBCT images of all scanners for each sample were resampled by trilinear interpolation to an identical voxel resolution of 0.2 × 0.2 × 0.2 mm, which is assumed to be sufficient for comparative quality evaluation. The intensity ranges of the images were subsequently globally normalised by linear rescaling, such that the average intensity within two selected regions in soft tissue and cortical bone was identical in all datasets. Next, in CT-Analyser, two volumes of interest (VOI) for each jaw were manually delineated on the micro-CT images. These volumes were selected to contain trabecular bone only and in a continuous way, thus avoiding data distortion due to cortical bone areas in the VOI. Due to the processing (i.e., registration and resampling of all images), these regions were identical for all eight CBCT scans. Trabecular bone in the volumes of interest in the micro-CT images was segmented using adaptive thresholding. The threshold was visually determined by an examiner. Prior to this manual process, the examiner segmented four different but comparable VOIs in micro-CT images of human jaw bone samples in consensus with two researchers who regularly use bone segmentation in their work. The research setup thus provided a thorough training session for the examiner. The resulting segmentations served as high-resolution ground truth images. The volumes of interest in the CBCT images were segmented using both global and

TABLE 2 CBCT Devices and Imaging Parameters

| Device | Code | Field of view (cm × cm) | Voltage (kV) | Current (mA) | Scan time (s) | Voxel size (mm) |
| --- | --- | --- | --- | --- | --- | --- |
| Skyscan 1173 (micro-CT) | - | - | - | - | - | 0.04 |
| Promax 3D | A | 8 × 8 | 84 | 14 | 18 | 0.16 |
| Galileos Comfort | B | 15 × 15 | 85 | 7 | 4 | 0.3 |
| Scanora 3D | C | 10 × 7.5 | 85 | 7 | 15 | 0.13 |
| Newtom VGi | D | 12 × 8 | 110 | 4 | 18 | 0.3 |
| Picasso Trio | E | 12 × 7 | 85 | 4 | 15 | 0.2 |
| SkyView | F | 17 × 17 | 90 | 6.5 | 15 | 0.16 |
| Illuma Elite | G | 14 × 21 | 120 | 3.8 | 30 | 0.2 |
| i-CAT Next Generation | I | 16 × 13 | 120 | 5 | 14.7 | 0.2 |


adaptive thresholding. In global thresholding, one single intensity threshold value was selected to classify all voxels as either background or bone. This method is typically applied in CBCT data, as bone is bright in these image sets. In adaptive thresholding, the average intensity within a sphere around each voxel was calculated and used as local threshold for that voxel. For the current analysis, the radius of the sphere was kept constant at 3 voxels (i.e., 0.6 mm) based on an estimate of the trabecular thickness derived from micro-CT morphological analysis. This approach implicitly assumes that each sphere contains bone as well as soft tissue and defines the threshold as the average of those two tissues. Adaptive thresholding enables the selection of a range of intensity values outside of which voxels are excluded from local thresholding. This range provides a correction for regions that contain no bone or only bone. Through selection of a lower threshold, a minimum average intensity is defined, under which none of the voxels within the sphere are considered bone; through selection of an upper threshold, a maximum average intensity is defined, above which none of the pixels are considered soft tissue. For the current analysis, no upper threshold was selected, as no cortical bone was involved in the analysis and therefore we did not expect to find spheres with only bone. As a result, both segmentation methods, global and adaptive, had a single parameter that needed to be set manually.

Image Analysis: Bone Morphometry

Standard morphometric parameters were calculated with CT-Analyser, including BV/TV, trabecular thickness, trabecular number, and trabecular separation. For precise definitions of these quantities and their interpretation, please refer to Parfitt and colleagues.12

Image Analysis: Segmentation Quality

ROC Curves.
To incorporate all the possible choices of the threshold values, a ROC curve can be generated for each combination of VOI and segmentation method. Given the ground truth, every voxel within the VOI in the segmented image can be categorised in one of four possible classes. If the voxel is bone and it is segmented as bone, it is classified as true positive; if it is segmented as background it is classified as false negative. If the voxel is background in the ground truth and segmented as background, it is classified as true negative; if it is segmented as bone it is classified as false positive. The false

positive rate (FPR) is then defined as the ratio of false positives to the amount of background voxels in the ground truth, while the true positive rate (TPR) is given by the ratio of true positives to the amount of bone voxels in the ground truth. A point on the ROC curve plots the TPR versus the FPR in the VOI of a segmented image. Figure 1 illustrates how global segmentations of the same VOI in the same image with five different thresholds can result in a ROC curve. The segmentation at the lower left has a global threshold value higher than the highest intensity in the image, resulting in no bone in the segmentation. As no voxel is assigned to bone, there are no positives, and both TPR and FPR are 0. Upon lowering of the threshold, some voxels are segmented as bone, some correctly and some incorrectly, leading to a specific combination of TPR and FPR. The computation of a ROC point for each threshold value results in a ROC curve. To obtain an average ROC curve, such as the average of ROC curves obtained for different images acquired with the same CBCT device, these curves are averaged by threshold averaging, that is, averaging the points on the curve based on the thresholds that created the points.10 In general, a ROC curve that lies more northwesterly in a ROC graph provides higher accuracy, with a high TPR and a low FPR. This feature can be characterised by the area under the curve (AUC), which will be larger for curves lying more northwesterly. Note that for the adaptive thresholding scheme, not all voxels can be segmented as bone, resulting in a curve that does not reach the coordinate (1,1). Furthermore, the most extreme threshold values are very unlikely to be chosen by a user and are not of interest for our evaluation. Therefore, we computed the partial area under the curve (pAUC) as the AUC between an FPR of 0.1 and an FPR of 0.4. 
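The ROC construction described above can be illustrated with a short sketch. It is not the authors' implementation: the function names, the edge padding at the volume border, and the choice to apply the lower threshold to the centre voxel of each sphere only are assumptions made for illustration. The partial area is obtained by linear interpolation of the curve onto a regular FPR grid; where an adaptive curve stops before an FPR of 0.4, the interpolation simply holds the last TPR value.

```python
import numpy as np

def fpr_tpr(seg, truth):
    """FPR and TPR of a boolean segmentation against a boolean ground truth."""
    seg, truth = np.asarray(seg, bool), np.asarray(truth, bool)
    n_bone = np.count_nonzero(truth)
    n_background = truth.size - n_bone
    tpr = np.count_nonzero(seg & truth) / n_bone
    fpr = np.count_nonzero(seg & ~truth) / n_background
    return fpr, tpr

def global_segment(image, threshold):
    """Global thresholding: one intensity threshold for every voxel."""
    return np.asarray(image) > threshold

def sphere_mean(image, radius):
    """Mean intensity in a sphere of `radius` voxels around each voxel of a 3D image."""
    image = np.asarray(image, float)
    r = int(radius)
    padded = np.pad(image, r, mode="edge")  # assumption: replicate border voxels
    nz, ny, nx = image.shape
    total, count = np.zeros(image.shape), 0
    for dz in range(-r, r + 1):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if dz * dz + dy * dy + dx * dx <= r * r:
                    total += padded[r + dz:r + dz + nz,
                                    r + dy:r + dy + ny,
                                    r + dx:r + dx + nx]
                    count += 1
    return total / count

def adaptive_segment(image, lower, radius=3):
    """Adaptive thresholding: a voxel is bone if it exceeds its local sphere mean,
    unless that mean lies below `lower` (simplified, centre-voxel rule)."""
    image = np.asarray(image, float)
    mean = sphere_mean(image, radius)
    return (image > mean) & (mean >= lower)

def roc_points(segment, image, truth, params):
    """One (FPR, TPR) point per user-selectable parameter value."""
    return np.array([fpr_tpr(segment(image, p), truth) for p in params])

def partial_auc(fpr, tpr, lo=0.1, hi=0.4, n=301):
    """Area under the ROC curve restricted to lo <= FPR <= hi."""
    fpr, tpr = np.asarray(fpr, float), np.asarray(tpr, float)
    order = np.lexsort((tpr, fpr))  # sort by FPR, ties broken by TPR
    grid = np.linspace(lo, hi, n)
    tpr_grid = np.interp(grid, fpr[order], tpr[order])
    return float(np.sum(0.5 * (tpr_grid[1:] + tpr_grid[:-1]) * np.diff(grid)))
```

Sweeping the global threshold from above the maximum intensity down to below the minimum traces the curve from (0,0) to (1,1), whereas sweeping `lower` in `adaptive_segment` ends at the point reached by pure local thresholding, which is why the adaptive curve stops short of (1,1).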
Manual Thresholding versus Thresholding Based on Sample Properties or Optimal Overlap

In clinical practice, a threshold is typically selected by visual inspection by an examiner. Alternatively, some information about the sample might be available, such as the percentage bone volume (BV/TV), and a threshold might then be chosen in a way that the segmented volume has equivalent BV/TV values. As stated before, the pAUC is based on the TPR and FPR, which can be related to the overlap, defined as the


Figure 1 ROC curve construction. Each coordinate on the curve represents a threshold for segmentation. CBCT E: see Table 2.

proportion of correctly segmented pixels. In evaluating and comparing segmentation methods using the pAUC, we implicitly assumed that higher pAUC values resulted in more accurate morphometric indices, meaning a more accurate description of the bone microstructure. To validate this assumption, the threshold value on the ROC curve that resulted in the highest overlap of CBCT bone segmentation with the micro-CT ground truth was determined for both global and adaptive thresholding. To proceed with the previously mentioned assessments, we compared the morphometric indices obtained with the different thresholds: the thresholds determined by an examiner (first), the thresholds determined by optimal congruence with bone volume fraction (second), and the thresholds determined by maximal overlap (third). Results are reported as percentage error from the ground truth value.

Statistical Analysis

Statistical tests were performed on the pAUCs obtained on each VOI using Matlab (MathWorks, Natick, MA, USA). Different CBCT scanners were compared by one-way ANOVA, while global versus adaptive thresholding and upper versus lower jaw performance were compared with paired t-tests. Differences were considered significant at p < .05.
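The second and third threshold-selection strategies compared in this section (congruence with a known bone volume fraction, and maximal overlap with the ground truth) can be sketched as a search over candidate thresholds. This is an illustrative sketch for global thresholding only; the function names and the discrete list of candidate thresholds are assumptions, not part of the study's software. The percentage error follows the reporting convention used for the morphometric indices.

```python
import numpy as np

def bv_tv(seg):
    """Bone volume fraction: bone voxels divided by all voxels in the VOI."""
    seg = np.asarray(seg, bool)
    return np.count_nonzero(seg) / seg.size

def overlap(seg, truth):
    """Proportion of voxels classified identically in segmentation and ground truth."""
    seg, truth = np.asarray(seg, bool), np.asarray(truth, bool)
    return np.count_nonzero(seg == truth) / truth.size

def threshold_by_bvtv(image, target_bvtv, candidates):
    """Candidate threshold whose segmentation best matches a known BV/TV."""
    errors = [abs(bv_tv(np.asarray(image) > t) - target_bvtv) for t in candidates]
    return candidates[int(np.argmin(errors))]

def threshold_by_overlap(image, truth, candidates):
    """Candidate threshold giving maximal overlap with the ground truth."""
    scores = [overlap(np.asarray(image) > t, truth) for t in candidates]
    return candidates[int(np.argmax(scores))]

def percent_error(measured, reference):
    """Signed error of a morphometric index relative to its ground truth value."""
    return 100.0 * (measured - reference) / reference
```

Both searches return the first candidate in case of a tie, so the order of `candidates` matters when several thresholds perform equally well.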

RESULTS

Table 3 summarizes the statistical test results. In the following paragraphs, these results will be discussed in more detail with regard to significance values. Figure 2 shows the average ROC curves for global thresholding for the different CBCT scanners. In the segmentation results, AUCs are significantly larger for the lower jaw than for the upper jaw (p = .04). Differences in performance between the scanners were significant for upper jaw segmentation (p = .005) and for lower jaw segmentation (p < .001). Figure 3 shows the ROC curves for adaptive thresholding. There is a significant difference between global and adaptive thresholding, where adaptive thresholding performs better in segmenting the bone structure out of CBCT images (p < .001). Figure 4 and Figure 5 show the difference between global (full curves) and adaptive (dotted curves) thresholding for four CBCT scanners as an example. All other scanners showed a similar increase in AUC. The difference was significant between the upper and lower jaws. Based on the paired t-test, we identified a significant difference between global and adaptive thresholding for the procedures’ sensitivity to threshold changes (p = .03), where segmentation based on adaptive thresholding was less sensitive to varying thresholds.


TABLE 3 Summary of the Statistical Results

| Investigated difference | Test used | Result | p Value |
| --- | --- | --- | --- |
| AUC, global thresholding, upper versus lower jaw | Paired t-test | AUC lower > AUC upper | .04 |
| AUC, global thresholding, difference between scanners for upper jaw | ANOVA (single-factor) | Significant differences for the upper jaw | .005 |
| AUC, global thresholding, difference between scanners for lower jaw | ANOVA (single-factor) | Significant differences for the lower jaw | |
| AUC, global versus adaptive thresholding | Paired t-test | AUC adaptive > AUC global | |
| Sensitivity to threshold changes for global versus adaptive thresholding | Paired t-test | Sensitivity global > adaptive | |

We can conclude from Table 4 that for all methods of threshold selection, adaptive segmentation generally produces smaller errors in morphometric indices than global thresholding, which is translated into a smaller mean absolute error over the morphometric indices. Manual threshold selection results in smaller errors for trabecular thickness than other threshold selection methods, but other indices are estimated less accurately. A threshold based on the ground truth bone volume percentage (BV/TV as derived from segmented micro-CT images) improves the results for this individual morphometric index. When the threshold is based on maximal overlap (aiming for maximal overlap with the segmented micro-CT images), the result is a compro-
