CME JOURNAL OF MAGNETIC RESONANCE IMAGING 00:00–00 (2014)

Original Research

Automatic Model-Based Semantic Registration of Multimodal MRI Knee Data Ning Xue, MSc,1,2* Michael Doellinger, PhD,1 Jurgen Fripp, PhD,3 Charles P. Ho, MD, PhD,4 Rachel K. Surowiec, MSc,4 and Raphael Schwarz, PhD2 THE KNEE JOINT is a common site of various chronic musculoskeletal diseases, such as osteoarthritis (OA), rheumatoid arthritis, and osteochondritis dissecans. Among them OA is the most common form of arthritis and one of the major causes of disability (1), affecting over 25 million people in the United States, and is increasing in frequency and severity in all aging populations (2). MRI is commonly used to diagnose knee pathologies and evaluate the curative effect of treatment. In addition to the qualitative assessment of cartilage anatomy and other structures of the joint, quantitative MR imaging offers the possibility to assess the biochemical constitution of the cartilage tissue noninvasively. This is a promising approach in the early detection and diagnosis of cartilage damage before the onset of macroscopic changes associated with diseases such as OA (3). To evaluate the knee joint using quantitative MRI, the morphological and biochemical information must be extracted (4). However, the morphological and biochemical information are commonly acquired separately under different MR acquisition techniques and parameters. Thus, it is necessary to bring the morphological MR datasets (target) and the corresponding quantitative biochemical MR datasets (reference) into a common reference frame to evaluate cartilage health in a single session (5). This is usually performed through various methods of registration, which refers to establishing spatial correspondences between different sets of multimodal MR image data. Recently, many MR image registration techniques have been proposed. Among them voxel-based rigid and elastic registration algorithms are most commonly used (6). Both algorithms are implemented as a maximization or minimization procedure of a cost function, such as mutual information (MI), in a deformation field. The rigid registration is simpler to implement and demonstrates lower computational costs, but is not capable of dealing with relative motions between knee bones, such as flex and twist (6,7). In comparison, the voxel-based elastic registration method is more complex to implement but provides a high degree of flexibility in co-registering MR images with different grades of knee movements (8–13). To optimize performance of voxel-based elastic registration methods, some application-specific parameters

Purpose: To propose a robust and automated modelbased semantic registration for the multimodal alignment of the knee bone and cartilage from three-dimensional (3D) MR image data. Materials and Methods: The movement of the knee joint can be semantically interpreted as a combination of movements of each bone. A semantic registration of the knee joint was implemented by separately reconstructing the rigid movements of the three bones. The proposed method was validated by registering 3D morphological MR datasets of 25 subjects into the corresponding T2 map datasets, and was compared with rigid and elastic methods using two criteria: the spatial overlap of the manually segmented cartilage and the distance between the same landmarks in the reference and target datasets. Results: The mean Dice Similarity Coefficient (DSC) of the overlapped cartilage segmentation was increased to 0.68 6 0.1 (mean 6 SD) and the landmark distance was reduced to 1.3 6 0.3 mm after the proposed registration method. Both metrics were statistically superior to using rigid (DSC: 0.59 6 0.12; landmark distance: 2.1 6 0.4 mm) and elastic (DSC: 0.64 6 0.11; landmark distance: 1.5 6 0.5 mm) registrations. Conclusion: The proposed method is an efficient and robust approach for the automated registration between morphological knee datasets and T2 MRI relaxation maps. Key Words: model-based semantic registration; landmark detection; knee joint diseases J. Magn. Reson. Imaging 2014;00:000–000. C 2014 Wiley Periodicals, Inc. V

1 Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Germany. 2 Imaging & Therapy Division, Healthcare Sector, Siemens AG, Erlangen, Germany. 3 CSIRO Digital Productivity and Services Flagship, CSIRO Computational Informatics, The Australian e-Health Research Centre, Brisbane, Australia. 4 Steadman Philippon Research Institute, Vail, Colorado, USA. Contract grant sponsor: Healthcare Sector, Siemens AG. *Address reprint requests to: N.X., Allee am R€ othelheimpark 2, 91052, Erlangen, Germany. E-mail: [email protected] Received October 20, 2013; Accepted February 11, 2014. DOI 10.1002/jmri.24609 View this article online at wileyonlinelibrary.com. C 2014 Wiley Periodicals, Inc. V

1

2

Xue et al.

Figure 1. Twenty-three landmarks defined on the surfaces of three bones in the knee joint (one landmark posterior central patella is not shown because of the limit of views). The underlined landmarks are used to indicate the location of the bonecartilage interface and the other landmarks are used to describe knee kinematics.

require tuning to define the range of the deformation field (9,10). In some cases, several manual preprocessing steps are also required (11–13). The iterative derivation of the transformations in the registration procedure is based on the similarity measure between the both datasets. Thus, the transform matrices derived from the image regions of different bones are likely different, especially when there are relative motions between bones. In this case, using the entire knee joint as input would cause the registration procedure to remain affixed in a local minimum position and reduce the accuracy of the registration. Thus, to avoid the influences of the other bones while iteratively computing the optimized transform matrix, it would be more appropriate to use each bone as the input for the voxel-based registration and to rigidly co-register the three bones independently. Additionally, the anatomic information regarding the knee joint is rarely used in the voxel-based elastic registration. The MR dataset is only registered as a set of variables with random intensities with each image voxel transformed merely according to statistical similarity criteria, regardless of which structures or tissues the voxel belongs. This can lead to nonanatomical deformations of bone and cartilage following the elastic registration and in most cases these deformations do not actually occur between the both datasets. The deformation and the degeneration of the bone and cartilage are important symptoms of many joint diseases, such as OA, spur, hyperosteogeny, hyperostosis, Paget’s disease. Therefore, the nonanatomical deformations of structures and soft tissues caused by the image processing techniques are highly undesired for diagnosing knee joint diseases in a single session (14–17). The purpose of this work is to present a new registration scheme which is capable of dealing with MR images with different health conditions of the knee joint, knee postures, patient positioning and various

contrasts, and also causes minimal nonanatomical deformation of bone and cartilage. The proposed semantic registration method makes use of anatomic information regarding the knee joint and aligns the knee MR image data by simulating the actual movements of the knee joint. MATERIALS AND METHODS Materials This study was approved by the Institutional Review Board and all subjects provided informed consent before inclusion. To describe the mean shape of each knee bone, a Statistical Shape Model (SSM) was built by the CSIRO Computational Informatics (18) (see Appendix A). To align each SSM to the knee bone in the target datasets using a landmark-based registration, we propose eight landmarks on the femur SSM, ten landmarks on the tibia SSM and six landmarks on the patella SSM (Fig. 1) (19). The training and testing data used in this work are listed in Table 1. Training data for the automated landmark detection were obtained from the Osteoarthritis Initiative (OAI) database, which is a natural history database for osteoarthritis (OA) and available for public access at http://oai.epi-ucsf.org/datarelease/. The participants of the OAI database are 45–79 years of age with approximately equal numbers of men and women for all datasets in the database. In addition, The OAI database provides the Kellgren-Laurence (KL) grade of each subject. Twelve subjects with different KL grades (Grades 0–4) were selected as training data. The training data were acquired at 3.0 Tesla (T) (Magnetom Trio, Siemens Healthcare, Erlangen, Germany) and consisted of a fat-suppressed sagittal three-dimensional (3D) double-echo steady-state (DESS) dataset and a T2 map dataset for each subject. For training the landmark detection algorithm, the 24 predefined landmarks were manually placed by experienced technologists in all training datasets.

Erlangen (4 subjects)

SPRI (6 subjects)

The Kellgren-Lawrence (KL) grading system is a radiological classification of knee OA. It is based on X-rays and consists of Grade 0 to 4. Grade 0 demonstrates normal knee joint with no signs of OA and Grade 4 denotes multiple large osteophytes, severe joint space narrowing, marked sclerosis and definite deformation of the bone ends. b The Whole-Organ Magnetic Resonance Imaging Score (WORMS) is a semiquantitative scoring method for multifeature, whole-organ evaluation of the knee in osteoarthritis based on magnetic resonance imaging findings.

a

384384 256256 256256 512512 256256 0.3 0.6 0.6 0.3 0.6 3.5 0.7 2 0.6 3.6 180 120 180 25 180 10,20,. . .70 45 14,28,. . .96 7 10,20,. . .70 T2 map SPACE T2 map DESS T2 map

2700 1200 2570 19.4 1811

120 150 150 150 150

384384 0.36 0.7 25 4.7 OAI (15 subjects) Testing

Health: KL¼0 (5 subjects) At risk: KL¼0–1 (5 subjects) Pathological: KL¼2–4 (5 subjects) Health: WORMSb ¼0 (3 subjects) Pathological: KL¼1–2 (3 subjects) Asymptomatic

DESS

16.3

140

384384 0.3 3.5 180 10,20,. . .70 T2 map

2700

120

384384 0.36 0.7 25 140 4.7 16.3 DESS

Health: KL ¼0 (4 subjects) At risk: KL¼0–1 (4 subjects) Pathological: KL¼2–4 (4 subjects) OAI (12 subjects) Training

a

Health condition

Contrast

TE (ms) Source

Table 1 Training and Testing Data in This Work

3

TR (ms)

FOV (mm)

Flip angle

Slice thickness (mm)

Resolution (mm2)

Matrix size

Automatic Semantic Registration

The proposed registration method was validated using 25 subjects (Table 1). Fifteen subjects with various KL grades (0–4) and spanning all stages of degeneration/ OA severity were obtained from the OAI database and consisted of DESS and T2 map datasets. Six subjects (3 asymptomatic, 3 patients scheduled for surgery) were acquired by the Steadman Philippon Research Institute (SPRI). The health condition of the subjects was evaluated through subjective self-reported questionnaires, a physical examination performed by the orthopedic surgeon, and by morphological MRI scoring. Each subject consisted of a 3D “Sampling Perfection with Application optimized Contrast using different flip angle Evolutions” (SPACE) dataset, a T2 map dataset and binary masks of the patellar, femoral and tibial cartilage. The binary masks were obtained by manually segmenting on a slice-by-slice basis all slices of the SPACE and T2 map datasets by expert raters at SPRI. To evaluate the proposed system under different knee postures, the remaining four test subjects were acquired at 1.5T (Magnetom Avanto, Siemens Healthcare, Erlangen, Germany) in Erlangen using a 15channel knee coil. Each subject consisted of a DESS dataset and a T2 map dataset. When measuring the datasets, volunteers were asked to flex and twist the  knee joint approximately 5 from the neutral alignment between different measurements. To validate the automated landmark detection and the proposed registration method, the predefined landmarks were manually placed in all test datasets by experienced technologists with 1 year of training. Registration Methodology The proposed semantic registration method is based on the fact that a knee joint is comprised of three bones (the femur, tibia, and patella). Any movements of the knee joint, including (a) the rigid movement of the whole joint caused by different patient positioning and (b) the relative motions between knee bones caused by the different knee postures, can be semantically simplified into a combination of three rigid movements, each describing the movement of one knee bone. The principle of the registration method is to localize each knee bone in the target datasets with automated landmark detection and co-register the entire knee joint by aligning each bone of the knee joint, respectively. An overview of the registration scheme is outlined in Figure 2. The morphological DESS/SPACE datasets and the biochemical T2 map datasets were regarded as the target datasets and the reference datasets, respectively. To localize the three bones in the target datasets, the 24 predefined landmarks were first localized with an automated landmark detection method (Fig. 2a). Second, a 3D rigid landmark-based registration was implemented to align the SSM into the target datasets and the positions of each bone were consequently estimated (Fig. 2b). Next, each bone was respectively aligned to that in the reference datasets using a multimodal rigid registration (Fig. 2c). Given the three local transforms of each bone, we wanted to generate a dense transformation to map the entire knee joint.

4

Xue et al.

Figure 2. Flow chart for the modelbased semantic registration. The rounded rectangles denote the inputs for the registration workflow. The rectangles indicate the processes implemented in the workflow. The parallelograms are the intermediate data generated in the workflow. Each gray block denotes one essential step of the registration workflow.

This was achieved by a weighted combination of three local transforms, using distance maps derived from the SSM (Fig. 2d). The combined transformation field coregistered bones and cartilage in the target datasets to those in the reference datasets. All image processing steps were performed with the Insight Segmentation and Registration Toolkit (ITK) (http://www.itk.org/) on an Intel(R) Core(TM) i5 2.4 GHz CPU. The additional image analysis and the assessment of results were performed using MeVisLab (http://www.mevislab.de/).

generated by computing the gradient magnitude and orientation at each image sample point in its surrounding region (see Appendix C). Interest points with minimal SIFT distance to training landmarks were chosen as landmarks (Fig. 3c). Finally, to achieve high localization accuracy and improve detection robustness, a multi-classifier boosting system (22) was implemented to combine results derived from the different training datasets (Fig. 3d) (see Appendix D). The automatically detected landmarks were used to align the SSM into the target datasets and localize the three joint bones in the target datasets.

Step 1: Automated Landmark Detection To register the three knee bones independently, each bone should be first localized in the target MR datasets (DESS or SPACE). This is done through an automated landmark detection technique. The proposed scheme of landmark detection is outlined in Figure 3. The method is a learning-based technique using a set of manually placed landmarks (Fig. 3a) to describe the feature of predefined landmarks. Because the landmarks were predefined on the knee bone surface, it would be time-consuming to compute the feature descriptor of all image points in target datasets. To reduce the computational cost, a simplified Difference of Gaussian (DoG) detector (20) was implemented to choose a minority of interest points as inputs for the feature description (Fig. 3b) (see Appendix B). The interest point refers to an image point with high image gradient, i.e., the image point located on the interface among different tissues. All detected interest points were considered as candidates for landmarks. Next, to evaluate which interest point is the best candidate, both training landmarks and interest points in target datasets were characterized using a “coarse to fine” Scale Invariant Feature Transform (SIFT) descriptor (21). This SIFT descriptor was

Step 2: Landmark-based Registration Based on the detected landmarks in the target dataset, the knee bones can be localized in the target datasets separately by aligning the SSM to the target datasets using a landmark-based registration algorithm (23–25). The algorithm iteratively revises a 3D rigid transform to minimize the Euclidean distance between the landmarks on the SSM and the same landmarks in the target datasets. After the landmark-based registration, each SSM was aligned to the corresponding bone in the target datasets and three surface models were generated to describe the outlines of the three bones. Step 3: Rigid Registration of Each Bone Region of Interest Using the surface models, the image data of each bone and its surrounding soft tissues (obtained using a 10 mm Bounding Box around the bone) were extracted from the target datasets and defined as the region of interest (ROI) of this bone. Each ROI was rigidly registered to the reference dataset separately to avoid the influences of image information of the other bones in the registration (10,13,26). A 3D linear interpolation

Automatic Semantic Registration

5

Figure 3. Pipeline of the proposed automated landmark detection. a: In the training datasets, predefined landmarks (blue points) are manually placed. b: As landmark candidates, interest points (red points) are detected in target datasets using DoG detector. c: Training landmarks and interest points are characterized with the same SIFT descriptor. The SIFT distance between training landmarks and interest points are applied as classifiers for selecting landmark candidates. d: The candidates (yellow points) chosen by classifiers trained from different training datasets are combined with a boosting system to be the final detected landmark (green point).

was performed to make sure that the both datasets are in the same scale space before the registration. Because the both datasets were acquired using different MR sequences and with different contrasts, the MI was used as the similarity measure function for the registration (see Appendix E). Two datasets are considered registered when the mutual information between them is maximized. Our registration uses a regular step gradient descent algorithm to optimize the transformation parameters iteratively (see Appendix F). Step 4: Combining Registration Results Using Distance Maps Because each bone in the target datasets is respectively aligned to that in the reference datasets, there is a possibility for some reduplicated voxels in the contact area between femur ROI and tibia ROI and between femur ROI and patella ROI. This causes a sharp edge in the contact area (left in Fig. 4). In addition, there are also many voxels located outside the ROIs of all three bones (black region, left in Fig. 4). To solve these problems, a weighted combination based on distance maps was performed after the registration to assign every voxel in the target dataset to the correct position. The distance map was computed from the surface model by means of ( 0 if a 2 Mj Dj ðaÞ ¼ [1] Mj Mj max ðGs Þ=ðGs ðaÞ þ dÞ if a 2 = Mj where a is a voxel in the dataset A, M is a binary mask of the joint bone and j denotes the three different bones. d is an infinitesimal constant to make the denominator not be zero. M is derived from the surface model of each bone. In the binary mask all the voxels inside the bone are set to 1 and the remaining

voxels are set to 0. GsM is a Gaussian-filtered mask. After the Gaussian filtering the mask is no longer binary and the voxels in the neighborhood of the bone surface have a value between 0 and 1. All the voxels inside the bone in the Gaussian filtered mask are set to 0 again. In the generated distance map, only the voxels close-to and outside the joint bone have a positive value (Fig. 5). The closer the voxels are located to the bone surface, the smaller value they have. The combination of the registration results of the three different joint bones was implemented by means of 3 X T ðaÞ ¼ wj ðaÞTj j¼1

1 Dj ðaÞ wj ðaÞ ¼ X 1 3 i¼1 D ðaÞ i

[2]

where T ðaÞ is the final 3D rigid transformation vector for the voxel a and Tj is the 3D rigid transformation vector obtained from the registration of the bone j. wj ðaÞ is the weight of each transformation vector, which is derived from the corresponding distance map. In this way, the final transformation field for the input image is continuous and there are no more reduplicated voxels in the registered image (right in Fig. 4). Performance Evaluation Methodology Landmark Detection Validation The automated landmark detection method was validated by detecting 24 predefined landmarks in 25 test subjects (DESS/SPACE MR datasets). The performance of landmark detection was evaluated according

6

Xue et al.

Figure 4. The combination of the ROIs of the registered bones. Left: Before the combination: the circles indicate the reduplicated voxels and the consequent sharp edge in the contact area between different bones. In the black periphery are the voxels outside the ROI of all bones. Right: After the combination: the reduplicated voxels are assigned to the right position and the contact area becomes smooth.

to two statistical criteria: the mean distance between manually placed landmarks and automatically detected landmarks and the standard deviation of the landmark distance. To validate the inter-observer variability, the predefined landmarks were manually placed by two experienced technologists. The robustness of landmark detection was evaluated by comparing detection accuracies in test subjects with the different features (e.g., DESS versus SPACE, healthy versus pathological, rigid movements versus relative motions between bones, 3T versus 1.5T, technologist 1 versus technologist 2). Registration Validation The morphological datasets (SPACE/DESS) of the 25 test subjects were co-registered to the corresponding biochemical datasets (T2 map) using the proposed registration workflow. For the comparison of the

different registration methods, a rigid registration and an elastic registration introduced in (8,9) were implemented in this work and tested on the same datasets. The elastic registration was performed by maximization of similarity criteria in a variational framework and using the corresponding gradients to drive a flow of diffeomorphisms allowing large deformation, which was proven to be an accurate and robust approach for the elastic registration. All registration methods used the mutual information between the reference and target datasets as the statistical similarity criteria. The registration results were evaluated by different methods for different test subjects:  Dice Similarity Coefficient (DSC). For the six subjects with the manually segmented cartilage in both SPACE and T2 map datasets, the registration accuracy was quantitatively evaluated using the DSC (18) (27) of the overlapping cartilage volume in the SPACE and T2 map datasets:

Figure 5. Left: SSM is aligned to the target image and transferred to the binary mask. Middle: The Gaussian filtered mask. Right: The distance map of the femur is generated using Eq. [1]. The voxel value in the distance map indicates the distance of this voxel from the surface of the bone.

Automatic Semantic Registration

7

Figure 6. Box plot of automated landmark detection results. The 25 test subjects are divided into different subgroups according to various characteristics (e.g., different contrasts, resolution, healthy condition). Each box represents the mean value and the standard deviation of the landmark detection errors in millimeters for one subgroup, respectively. The 95% percentile and 5% percentile cut off the top and bottom 5% of detection errors for each subgroup.

DSC ¼

2TP 2TP þ FP þ FN

[3]

where TP is true positive, FP is false positive and FN is false negative count for the overlapping cartilage voxels.  The mean of the distance between the same landmarks in the both datasets after the registration. The criterion was evaluated on the remaining 19 test subjects (DESS datasets and T2 map datasets). To compare the registration performance under different patient positioning and poses, the 19 subjects were divided into two groups: 15 normal subjects without relative motions between knee bones and four subjects with flex and twist.  The spatial distance among the landmarks in the 19 target datasets was used to evaluate the nonanatomical deformation of the bone after the registration. As the landmarks are located on the bone surface, the nonanatomical deformation of the bone will change the spatial distance among the landmarks in the registered target datasets compared with that in the unregistered target datasets. In addition, to evaluate the deformation of the cartilage, the change of cartilage volume after the registration was computed in the 6 subjects with manually segmented cartilage. Paired t-tests were used to perform the data analysis of all registration results. Differences of DSC and the landmark distance between different registration methods with a P < 0.05 were considered statistically significant.

RESULTS Landmark Detection The performance of landmark detection in different groups of test subjects is shown in Figure 6. The distance between the manually placed and automatically detected landmarks in 25 test datasets was found with a mean value of 1.6 mm and a standard deviation of 1.0 mm. The mean sensitivity for all test datasets was 97%. There were no observed significant differences (P < 0.05) in detection accuracy among the test groups with various features such as image

contrast, image resolution, health conditions, patient positioning, and using different sets of training landmarks placed by different technologists. Registration Performance The DSC of the overlapping cartilage voxels before and after the different registration methods are presented in Figure 7a. The DSC increases to 0.68 6 0.1 (mean 6 SD) after the proposed registration method. In comparison, the rigid registration improves the DSC to 0.59 6 0.12 and the elastic registration increases the DSC to 0.64 6 0.11. The changes of the landmarks distance after different registration methods are shown in Figure 7b. The landmark distance in the 15 normal test subjects (i.e., without relative motions between knee bones) decreases from 3.9 6 0.9 mm to 1.3 6 0.3 mm following the proposed registration method. For these normal test subjects, the mean accuracy of the semantic registration is improved by 11% (P ¼ 0.023) compared with the rigid registration and increased by 3% (P ¼ 0.041) compared with the elastic registration. For the four test subjects with flex and twist, the registration accuracy of the proposed method is improved by 80% (P < 0.001) compared with the rigid registration and increased by 70% (P < 0.001) compared with the elastic registration. To display the accuracy of different registration methods more intuitively, an example of registering a locally obtained subject with flex and twist is presented in Figure 8. The changes of the spatial distance among landmarks in target datasets and the cartilage volume after the registration are shown in Figure 7c. The spatial distance among landmarks after the elastic registration is changed on average by 2.5% compared with that before the registration. The mean change of cartilage volume after the elastic registration is 14.5%. In comparison, the maximum change of spatial landmark distance and cartilage volume after the semantic registration is 0.7%. The mean computational cost of the proposed semantic registration is 4.79 6 0.32 min per subject. In comparison, the rigid and elastic registration methods take 3.12 6 0.41 min and 4.09 6 0.63 min per subject, respectively.

8

Xue et al.

Figure 7. Results of three registration methods. a: DSC of the overlapping cartilage volume of 6 subjects before and after the registration. The accuracy of registration is evaluated by the increase of the DSC after different registration methods. b: Distances between the same landmarks of 19 subjects before and after the registration. The accuracy of different registration methods is evaluated by the decrease of the landmark distances. c: The deformation of the bone and cartilage after different registration methods. After the elastic registration there are visible changes of the spatial distance among landmarks and the cartilage volume. In comparison, the change after the semantic registration is insignificant.

DISCUSSION The aim of our semantic registration is to simulate and reconstruct the movement of the joint as it occurs in reality. Our results show that this approach reduces the complexity of aligning relative motions and demonstrates an improved accuracy, especially for subjects with relative motions between knee bones. In addition, unlike other methods in introduces minimal nonanatomical deformation of the objective structures and tissues following the registration.

It should be acknowledged, that evaluating registration error and comparing methods is challenging as it depends on application-specific information regarding (a) the size of the regions of interest to be registered; (b) the imaging parameters of the testing data; (c) the implementation and optimization details; (d) the type of the performance evaluation methodology; (e) the power of the analysis (7). Thus, a direct and fair comparison of the results of the proposed method with other studies is difficult. In this study we sought to reduce this problem by implementing and reporting

Figure 8. The checkerboard between the reference T2 map dataset and the registered DESS dataset. The neighboring blocks are from different datasets. For an optimized registration, the structures and tissues between neighboring blocks should be aligned with each other, namely, there are no visible edges between the neighboring blocks. The red circles indicate the shape edges which mean the registration does not work well. The green circles denote the well-registered edges. Upper left: before the registration. Upper right: after rigid registration. Lower left: after elastic registration. Lower right: after modelbased semantic registration.

Automatic Semantic Registration

the results of one rigid voxel-based registration method reported in (6) and one elastic voxel-based registration reported in (8,9) using our data within the same evaluation framework. Although, significant effort was allocated in optimizing each approach, implementation and optimization biases could still exist between the three methods. The DSC of the overlapping cartilage volume between both datasets after the semantic registration was improved by 15% in comparison to that after the rigid registration and by 5% in comparison to that after the elastic registration. This reveals that the semantic method possesses registration accuracy superior to the rigid registration and at least comparable performance compared with the elastic registration that was used in the present study. The DSC following registration appears low. This was a result of different definitions of manual cartilage segmentation found on the SPACE and T2 sequences and resolving this requires further research. The mean distance between the same landmarks in both datasets was reduced by all three registration methods. The results in normal subjects obtained similar performance. However, when the knee was flexed and twisted between scans, the advantages of the semantic registration was demonstrated compared with both the rigid and elastic registration. This is because the flex and twist are relative motions between knee bones. Global rigid transformations cannot handle this problem. Elastic registration is capable of registering relative motions between bones; however, parameters require tuning to define the capture range of the transformation. In some cases of flex and twist, the ranges of the movements in proximal and distal femur are not the same. In our experience, it is difficult to find generalizable and accurate parameters to define the capture range for both proximal and distal femur. Unlike the elastic registration, the semantic registration treats the relative motion between bones as a combination of several rigid movements. This reduces the complexity of aligning relative motions between knee bones. Due to this advantage, the proposed semantic registration (including the automated landmark detection) can still be used in kinematic analyses of the knee joint (28–30). In the elastic registration, the transformation vector is calculated for each voxel. This causes the deformation of the joint bone and cartilage after the registration, especially for subjects with relative motions between bones. This is confirmed by the results regarding the deformation of bone and cartilage after the registration. After elastic registration, the distance between the same landmarks was changed on average by 2.5%, in extreme cases 8.7%. The cartilage volume was strongly influenced by the elastic registration, due to its long thin morphology next to the bones. To coregister the relative motions between bones, the cartilage is likely stretched and widened by the elastic registration. In comparison, the semantic registration rarely changes the form of bones and cartilage because they are treated as rigid structures within the registration. The small change of the landmark distance and the cartilage volume is due to the interpolation

9

involved while computing the new position of the voxel after the semantic registration. The automated landmark detection method used in the registration scheme obtained promising accuracy and robustness at localizing anatomical landmarks in MR images with various contrasts, patient positioning and health conditions of the knee joint. The accuracy of the detected landmarks was 1–3 mm higher than that other reported findings for automated knee MR landmarking (31–34). The high landmark detection accuracy will improve the performance of subsequent image processing techniques like segmentation or kinematics. For instance, Morton et al (35) found that a standard deviation of 2 mm in landmark placements resulted in a vari ability of up to 6.5 in tibiofemoral kinematics. In this study, the detection of some landmarks failed in two cases with errors over 5 mm. However, there were no statistically significant differences (P > 0.05) of the final registration results compared with other cases. This was because the landmarks were only used in the subsequent registration to localize the ROI of each bone using a 10-mm Bounding Box. The ROI of each bone was used as inputs for the registration. Several limitations were evident in this study: First, only three MR sequences (DESS, SPACE, and T2 map) were used in the validation of the method. In addition, all data used in this work were acquired in sagittal orientation. However, the image processing treated the data as a 3D datasets: (a) in the landmark detection a 3D feature descriptor was used to describe landmarks; (b) in the rigid registration a 3D rigid transform was optimized based on the mutual information between both 3D datasets. Hence, theoretically the method is not limited to sagittal MRI data. Further validation is needed to demonstrate that the proposed method can be applied to other MR sequences. Second, for comparing the proposed method with other registration techniques, only one rigid registration and one elastic registration methods were implemented and other elastic registration techniques may offer better performance. Another limitation is that the efficiency of the proposed method still requires improvement for some applications requiring higher computational speeds. One possible approach to achieve this is using a parallelized implementation, as all the results we reported used only a single CPU for the computation. In conclusion, we present the results of a 3D automated semantic registration for aligning the knee bone and cartilage in multimodal MR image data. The method makes use of automated landmark detection to align a SSM of the knee joint into the target datasets. The three bones in the target datasets are separately aligned into the reference datasets with a multimodal rigid registration. The three obtained rigid-transformation fields are combined to generate the final transformation field for the entire knee joint. Experimental results reveal that the proposed method offers promising accuracy and robustness in comparison with those obtained by the rigid registration and the elastic registration, especially when a flex and/or twist of the knee joint occurs between scans. Unlike the elastic registration, the semantic registration introduces minimal clinically undesirable

10

Xue et al.

nonanatomical deformations of the bone or cartilage while achieving high registration performance.

APPENDIX A : STATISTICAL SHAPE MODEL The Statistical Shape Model (SSM) was built from a set of 118 training labeled shapes. Each labeled shape consisted of 23046 points sampled on its surface. These shapes are aligned by means of Generalized Procrustes Alignment of Gower (18,36). Principal component analysis allows each shape to be written as a combination of mean shape and variation matrix. The SSM is used to describe the new shape in the target datasets. The SSM is divided into three parts to describe three joint bones.

As a result a 16-dimensional vector Vcoarse is generated. 5: R is refined to a 2 3 2 array of n 3 n subregions. For each subregion an orientation histogram with 8 bins is computed: Vfine ¼ ðv1 ; v2 ;    v8 Þ;

[8]

where   n X n X 2p  ðb  1Þ f ðx; yÞ  tan uðx; yÞ  vb ¼ 8 x¼0 y¼0

[9] 2p  b 2p  ðb  1Þ < uðx; yÞ < : if 8 8 As a result a 32-dimensional vector Vfine is obtained for the 2 3 2 array of subregions. 6: return Vcoarse and Vfine as feature descriptors of P.

APPENDIX B : INTEREST POINT DETECTION A simplified Difference of Gaussian (DoG) (21) detector was implemented for the interest point detection. DoG is a feature enhancement algorithm to detect edges and other details present in digital images. It involves the subtraction of one blurred version of an original image from another less blurred version of the original image. A simplified DoG detector Dsimplified is produced by means of: Dsimplified ¼ D1 \ D2

[4]

D2 ¼ D ðs2 ; s3 Þ ¼ where D1 ¼ D ðs1 ; s2 Þ ¼ Is1  Is2 , Is2  Is3 , Is1 is a Gaussian blurred image. The pixels with non-zero value in Dsimplified are regarded as interest points.

4: An orientation histogram H is computed from G.  H has 16 bins covering 360 of orientation. The histogram H is created by adding the gradient magnitude of each pixel in the adjacent bins close to the orientation of the gradient vector of this pixel: [6]

where

if

2p  a 2p  ða  1Þ < uðx; yÞ < : 16 16

S ðpi Þ

[10]

S ðpi Þ  S ðpi Þ ı¼1

Algorithm 2D Feature Descriptor 1: input An image point P in the image I. 2: A region of interest R with 2n 3 2n pixels is set in the neighborhood of P. 3: A gradient map G with gradient magnitude f ðx; yÞ and orientation uðx; yÞ is computed by means of: qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi f ðx; yÞ ¼ ðI ðx þ 1; yÞ  I ðx; yÞÞ2 þ ðI ðx; y þ 1Þ  I ðx; yÞÞ2   ðI ðx; y þ 1Þ  I ðx; yÞÞ uðx; yÞ ¼ arctan ðI ðx þ 1; yÞ  I ðx; yÞÞ [5]

  2n X 2n X 2p  ða  1Þ f ðx; yÞ  tan uðx; yÞ  va ¼ 16 x¼0 y¼0

The principle of a boosting system is that a strong classifier in the probably approximately correct (PAC) sense can be generated by combining multiple weak classifiers (22). In this work, a comparator where the SIFT feature descriptors of a candidate landmark and a training landmark are compared was registered as a weak classifier. For each weak classifier, a majority voting was performed to select interest points which have minimum SIFT distance to this classifier. A weighted majority voting was implemented to combine the results of all weak classifiers. The weighting factor of each detected interest point which is derived from its SIFT distance to training landmarks: wi / XN

APPENDIX C : FEATURE DESCRIPTOR

Vcoarse ¼ ðv1 ; v2 ;    v16 Þ;

APPENDIX D : BOOSTING SYSTEM

[7]

where S ðpi Þ computes the SIFT distance of pi to the predefined landmark. The sum of weighting factors is normalized to 1. APPENDIX E : MUTUAL INFORMATION The mutual information is defined in terms of entropy (37), which measures how much information one random variable (image intensity in one dataset) gives about another random variable (image intensity in another dataset). As long as there is an overlapping region in the field of view (FOV) of both datasets, the MI can be estimated. Thus, the FOV of the both datasets does not need to be the same at the time of registration. One way to simplify the computation of the mutual information is to normalize the statistical distribution of the two input datasets. A normalization filter was implemented to rescale the intensities of the input datasets to produce output datasets with zero mean and unit variance. The MI similarity measure requires several parameters to be selected, including the standard deviation of the Gaussian kernel for the probability density estimate and the number of samples used to compute the densities. Experience has shown that a kernel standard deviation of 0.4 works well for datasets with zero

Automatic Semantic Registration

11

mean and unit variance. Increasing the number of samples improves the smoothness of the similarity measures from one iteration to another. The trade-off is that a larger number of samples results in longer computation costs. To reach a compromise between computational cost and performance, the number of spatial samples is usually set as 1% of the total number of voxels in the dataset. APPENDIX F : REGULAR DESCENT ALGORITHM

STEP

GRADIENT

The Gradient-based optimizers develop a model function derived from the local gradient of the similarity measures (38). At each iteration, the transformation parameters are adjusted along the direction of MI derivatives. Each time the direction of the derivative changes, the optimizer assumes that a local extremum has been passed and reduces the step size by a relaxation factor. The lower value of the relaxation factor (e.g., 0.5) has been proven to be inadequate for noisy similarity measures because it tends to induce unstable movements and results in many directional changes on the optimizers. In this case, the optimizer will quickly shrink the step size despite the fact that it is still far from the location of the extremum of the similarity measure. Thus, the relaxation rate was set to a higher value (0.8) to prevent the premature shrinkage of the step size. The iteration process is terminated when the step size is below a predefined threshold (

Automatic model-based semantic registration of multimodal MRI knee data.

To propose a robust and automated model-based semantic registration for the multimodal alignment of the knee bone and cartilage from three-dimensional...
652KB Sizes 1 Downloads 3 Views