Quantitative Analysis Tools and Digital Phantoms for Deformable Image Registration Quality Assurance.

Original Article

Technology in Cancer Research & Treatment 1–12 © The Author(s) 2014 Reprints and permission: sagepub.com/journalsPermissions.nav DOI: 10.1177/1533034614553891 tct.sagepub.com

Haksoo Kim, PhD1, Samuel B. Park, PhD2, James I. Monroe, PhD1,3, Bryan J. Traughber, MD1,4, Yiran Zheng, PhD1,4, Simon S. Lo, MD1,4, Min Yao, MD1,4, David Mansur, MD1,4, Rodney Ellis, MD1,4, Mitchell Machtay, MD1,4, and Jason W. Sohn, PhD1,4

Abstract
This article proposes quantitative analysis tools and digital phantoms to quantify intrinsic errors of deformable image registration (DIR) systems and to establish quality assurance (QA) procedures for clinical use of DIR systems, utilizing local and global error analysis methods with clinically realistic digital image phantoms. Landmark-based image registration verifications are suitable only for images with significant feature points. To address this shortfall, we adapted a deformation vector field (DVF) comparison approach with new analysis techniques to quantify the results. Digital image phantoms are derived from data sets of actual patient images (a reference image set, R, and a test image set, T). Image sets from the same patient taken at different times are registered with deformable methods, producing a reference DVFref. Applying DVFref to the test image T deforms it into a new image R′. The data set (R′, T, and DVFref) is a realistic truth set and can therefore be used to analyze any DIR system and expose intrinsic errors by comparing DVFref and DVFtest. For quantitative error analysis, that is, calculating and delineating differences between DVFs, 2 methods were used: (1) a local error analysis tool that displays deformation error magnitudes with color mapping on each image slice and (2) a global error analysis tool that calculates a deformation error histogram, which describes a cumulative probability function of errors for each anatomical structure. Three digital image phantoms were generated from 3 patients with head and neck, lung, and liver cancer. The DIR QA was evaluated using the head and neck case.

Keywords
deformable image registration, quality assurance

Abbreviations
CT, computed tomography; CTV, clinical target volume; DEH, deformation error histogram; DICOM, digital imaging and communications in medicine; DIR, deformable image registration; DVF, deformation vector field; FEM, finite element model; GTV, gross tumor volume; kVCT, kilovoltage computed tomography; MI, mutual information; MSE, mean square error; PCA, principal component analysis; PET, positron emission tomography; QA, quality assurance; SGD, stochastic gradient descent; SOI, structure of interest

Introduction
Several commercial software packages provide deformable image registration (DIR) tools to enhance target delineation in the era of intensity-modulated radiation therapy, image-guided radiation therapy, and image-guided adaptive radiation therapy.1-3 Moreover, researchers continue to develop new DIR algorithms (diffeomorphic demons,4-9 diffeomorphic morphons,10 optical flow,11,12 finite element model [FEM],13-15 small deformation inverse consistent linear elastic,16 thin plate spline, free form deformation,17-21 and Markov random field22), reflecting the growing interest in deformable contours for adaptive radiation therapy and composite dose visualization of multiple treatment plans.

1 Department of Radiation Oncology, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
2 National Cancer Center, Goyang-si Gyeonggi-do, Republic of Korea
3 St Anthony’s Medical Center, St Louis, MO, USA
4 University Hospitals of Cleveland, Cleveland, OH, USA

Received: February 19, 2014; Revised: June 11, 2014; Accepted: June 16, 2014. Corresponding Author: Jason W. Sohn, Department of Radiation Oncology, School of Medicine, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106, USA. Email: [email protected]

Figure 1. Diagram of deformable image registration process. Components shown: reference image (R), test image (T), image generator (interpolator), transformed test image (TR), metric (similarity), optimizer, DVF transform parameters, initial DVF, and final DVF.

In order to introduce these DIR tools into clinical practice, established quality assurance (QA) and acceptance test procedures are essential. Although there have been many research efforts15,23-36 to devise a quantitative evaluation method for DIR, robust QA and acceptance test procedures are still lacking. In this article, we implemented a localized error analysis tool proposed by Wang et al,35 which displays a color-keyed map of deformation registration errors on each image slice. In addition, a global error analysis tool is presented that calculates deformation errors per anatomical structure. This new tool is called a deformation error histogram (DEH). These tools are useful for quantifying intrinsic errors of DIR systems, but a truth set consisting of a reference image set, a test image set, and a deformation vector field (DVF) is needed as a benchmark. Previous efforts have not translated into realistic clinical use. In this regard, a novel truth set reconstruction method is proposed. A truth set created with the proposed method is called a “digital image phantom” and consists of a reference image set, a test image set, and a DVF. A “phantom” refers to a known object (or set of files, thus “digital”) that is used to benchmark a system. In this application, the digital phantom is created from actual patient images for QA of multiple anatomical sites so that a generated truth set may be used in clinics and made available to the community for analyzing DIR errors.

Overview of DIR
Deformable image registration is a process to find the best-estimated DVF, which forms the voxel correspondence between 2 different image sets. In other words, DIR finds a matrix that represents how individual voxels of 1 image are “deformed” (moved, etc) so that they optimally line up with corresponding voxels from another image. Figure 1 shows a schematic of the DIR process. For a given spatial transformation (a DVF), the interpolator applies the transformation to a test image to produce a transformed test image. The metric evaluates the degree of similarity between the reference image and the transformed test image. The optimizer then estimates the next candidate DVF, which the interpolator uses in the next iteration. Iteration continues until the similarity metric satisfies the given criteria or threshold.

The performance and accuracy of DIR depend on the configuration of each component: similarity measure, interpolator, and optimizer. Multiple DIR algorithms are available, such as B-spline, demons, and FEM registration. Although the different methods follow the same general iterative process described in Figure 1, they can arrive at different results, given distinct similarity measures such as mutual information (MI)37 or statistical approaches.38-40 Additionally, the algorithm’s criterion for deciding when an image has been successfully deformed to another image determines how it arrives at the final iteration. Furthermore, different methods for image interpolation after the transformation (linear, cubic spline, and sinc interpolation) result in different end points. The selection of the interpolation method can affect the calculation time and accuracy of the registration.37,41 Regular step gradient descent, stochastic gradient descent (SGD), and limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) are some of the available choices for optimizers. Regular step gradient descent is a generally well-known optimizer.42 The L-BFGS can find a good registration for most DIRs but consumes a lot of calculation time.42 Stochastic gradient descent can find a comparable result in reduced calculation time.43

Components outside the core DIR process can also affect the characteristics and/or effectiveness of a registration. Image modality, allowed duration of the optimization process (iteration time), tissue type focus (soft tissue vs bone, for example), and the registration’s purpose (image registration vs transformed contour) can affect the results. Moreover, most DIR implementations utilize randomly sampled pixels to reduce calculation time.
An accurate evaluation of systematic and random errors in a DIR system is essential before utilizing a DIR package for clinical applications.
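The iterative loop of Figure 1 can be illustrated with a deliberately small sketch. It is not the registration method of any system discussed here: it assumes a 1-dimensional signal, a single global shift in place of a full DVF, mean square error as the metric, and a step-halving search standing in for the optimizer. All function names are hypothetical.

```python
import numpy as np

def interpolate(test, shift):
    """Interpolator: resample the test signal at x + shift (linear interpolation)."""
    x = np.arange(test.size, dtype=float)
    return np.interp(x + shift, x, test)

def metric(reference, transformed):
    """Similarity metric: mean square error (lower means more similar)."""
    return float(np.mean((reference - transformed) ** 2))

def register(reference, test, step=1.0, min_step=1e-3, max_iter=200):
    """Optimizer loop: try a step in either direction, halve the step when neither helps."""
    shift = 0.0
    best = metric(reference, interpolate(test, shift))
    for _ in range(max_iter):
        if step < min_step:  # stopping criterion (assumed for this toy)
            break
        candidates = [shift + step, shift - step]
        scores = [metric(reference, interpolate(test, c)) for c in candidates]
        i = int(np.argmin(scores))
        if scores[i] < best:
            shift, best = candidates[i], scores[i]
        else:
            step *= 0.5
    return shift, best

# Toy data: the reference is the test signal resampled with a known shift of 3.5 samples.
x = np.arange(100, dtype=float)
test_signal = np.exp(-((x - 50.0) / 8.0) ** 2)
reference = interpolate(test_signal, 3.5)
estimated_shift, final_metric = register(reference, test_signal)
```

Swapping the metric, interpolator, or search strategy in this toy changes how quickly, and whether, the known shift is recovered, which mirrors the dependence on component configuration described above.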

Previous Approaches to DIR QA
Image subtraction approach. The traditional method of DIR system evaluation involves a paired image set composed of a reference image, R, and that same reference image deformed using an artificial DVF (DVFartificial) to produce the test image, Tartificial.35 To test the DIR, Tartificial is registered to R. The DVF created (DVFtest) should match DVFartificial. There are 2 analysis methods that can be used to quantify the accuracy of the registration. The image subtraction method compares the intensity difference between the reference image and the registered test image after DIR. A shortfall of image difference analysis is that it cannot measure registration errors in regions where neighboring pixels have the same intensity values. Two voxels may have the same value and yet should not be aligned to each other. A difference image would show no differences in such a region, although the voxels are not registered to the correct location.

Artificial DVF comparison. Artificial DVF comparison is the second approach,34,44 which compares the given DVFartificial and the new DVFtest. Although a DVF comparison can overcome the shortfall of image subtraction analysis, both traditional approaches utilize an artificial DVF. An artificial DVF may be unrealistic if it is assumed to apply to images of the full range of human anatomy. A single DIR setting may be unsuitable for the unique characteristics of deformations at different anatomical sites. For example, head and neck, lung, and liver anatomies may deform in different ways, demanding unique solutions and QA. Although some DIR software packages do provide anatomy-specific parameters, optimal settings are unknown since quantifiable QA of the results is missing.24,25,28,29,45

Figure 2. An illustration demonstrating the shortfall of landmark-based approaches (left: initial plan image; right: follow-up image). The points A and A′ have significant image features, so we can visually confirm correspondence. However, the points B and B′ have no significant image features, and no method is available to evaluate the robustness of the deformation there.
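The shortfall of image subtraction in uniform regions can be shown with a small constructed example. The 1-dimensional "image", the `warp` helper, and the displacement values below are illustrative assumptions, not data from this study.

```python
import numpy as np

def warp(image, dvf):
    """Resample the image at x + dvf[x] with linear interpolation (hypothetical helper)."""
    x = np.arange(image.size, dtype=float)
    return np.interp(x + dvf, x, image)

# A 1D "image" whose middle region (indices 3 to 7) has a uniform intensity of 5.
reference = np.array([0.0, 1.0, 2.0, 5.0, 5.0, 5.0, 5.0, 5.0, 2.0, 1.0, 0.0])

dvf_true = np.zeros(reference.size)   # no deformation is the correct answer
dvf_wrong = np.zeros(reference.size)
dvf_wrong[4:7] = [1.0, -1.0, 1.0]     # wrong displacements inside the uniform region

# Image subtraction sees nothing, because every wrongly mapped voxel has the same intensity.
diff_image = reference - warp(reference, dvf_wrong)

# Comparing the DVFs directly exposes the 1-voxel errors.
dvf_error = np.abs(dvf_wrong - dvf_true)
```

Here `diff_image` is zero everywhere while `dvf_error` is nonzero at three voxels, which is the situation that motivates comparing DVFs rather than intensities.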

Landmark-based approach. Previous efforts to evaluate image registration errors utilize patient landmarks that appear within the image and quantify their deformation and registration. Researchers typically employ visual checks that compare manually designated landmarks such as points of interest or organ contours between 2 image sets. Brock and Consortium24 and Castillo et al.25 developed a software tool to generate large landmark point sets automatically. They produced a large number (>1000) of matching landmark point sets for lung image sets. Similarly, Brock and Consortium24 utilized manually chosen landmark point sets to compare DIR results among multiple institutions. In these landmark-based approaches, 2 different image sets from the same patient are utilized. An expert physician finds matching landmarks between the 2 image sets; after DIR, the landmarks on the test image set should match the corresponding landmarks on the reference image set. By measuring the distance between the reference landmark point and the registered landmark point on the test image set, the error magnitude of the image registration system can be measured as a maximum and an average distance to agreement. These approaches quantify the correlation between the computerized image registration and human visual judgment.

Vaman et al.33 devised a way to reduce the number of landmarks since entering a large number of landmark points is clinically tedious and impractical. To quantify the errors in a DIR for 4-dimensional computed tomography (CT) image sets, they applied principal component analysis (PCA) to landmark motion due to patient respiration. The PCA can estimate the fundamental eigenmodes of human respiration, and the difference between the eigenmodes from the landmarks and the eigenmodes from the deformation vectors was measured. They showed the efficacy of the method by comparing eigenmodes from randomly selected subsets of landmarks. They also found that validation through a small number of selected landmarks can lead to unrepresentative results. Murphy et al.30 recently utilized a similar scheme using a small number of landmarks to estimate the uncertainty in daily dose mapping due to DIR error. Although these efforts made significant progress, landmark-based approaches have significant limitations. Visual verification of landmarks can only be performed where significant image features exist (ie, at limited sites). In addition, intrinsic uncertainties of DIR in regions with no image features cannot be measured with these techniques.
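The maximum and average distance to agreement used by landmark-based approaches can be computed as in the following sketch. The function name and the coordinates are hypothetical; the cited studies may define their landmark metrics differently.

```python
import numpy as np

def landmark_errors(reference_points_mm, registered_points_mm):
    """Return (maximum, average) Euclidean distance between paired landmarks, in mm."""
    ref = np.asarray(reference_points_mm, dtype=float)
    reg = np.asarray(registered_points_mm, dtype=float)
    distances = np.linalg.norm(ref - reg, axis=1)  # one distance per landmark pair
    return float(distances.max()), float(distances.mean())

# Two illustrative landmark pairs (x, y, z in mm); the values are made up for the example.
reference_points = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
registered_points = [[3.0, 4.0, 0.0], [10.0, 0.0, 1.0]]
max_error, mean_error = landmark_errors(reference_points, registered_points)
```

The result describes only the voxels where landmarks were placed, so featureless regions contribute nothing to either number.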
Figure 2 illustrates the shortfall of landmark-based approaches. The left graphic is an initial planning CT image, and the right graphic is a follow-up CT image. The clinical target volume (CTV) contour is shown in red, and 2 corresponding points are marked on both images. The points A and A′ have significant image features, so we can confirm their correspondence visually. However, the points B and B′ have no significant image features nearby, and therefore, their correspondence may be incorrect. Without a method to evaluate the magnitude of deformation error in regions of similar image features, quantitative analysis of deformation accuracy remains limited.

Unbalanced energy approach. Zhong et al.36 utilized unbalanced energy analysis, which compares the DVFs between DIRs using FEM, B-spline, and demons registrations. The group found that deformation vectors calculated by the FEM and the B-spline methods showed a 2-mm average difference near organ edges. This result is in accordance with previous landmark-based approaches. However, in regions with no significant gradient features, deformation vectors from various DIR methods demonstrated much larger differences, up to 10 mm.36 Although this method provides a DVF comparison between DIRs, 2 numerical phantoms were used: one was fabricated using artificial bladder, prostate, rectum, and femoral head structures, and the other was created from a patient with lung cancer using a known DVF. This indicates that the numerical phantoms used do not reflect a realistic clinical environment.

Figure 3. A, Process to generate truth data with R′, T, and DVFref: images R and T are registered with DIR to produce DVFref, which is then used to generate image R′. B, Process to evaluate deformation errors using the truth data: images R′ and T are registered by the DIR system under test to produce DVFtest. DVFtest should be equal to DVFref when there is no error in the deformable image registration (DIR) system. Intrinsic errors are measured by calculating vector differences between DVFref and DVFtest.

Other approaches. Physical deformable phantoms were proposed to validate the accuracy of DIR.46-48 These efforts have the strong advantage of exact voxel matching after DIR. However, it is very difficult, and not realistic, to mimic all anatomical structures encountered in a clinical environment. Varadhan et al34 proposed a framework for DIR validation using the ImSimQA (Oncology Systems Limited, Shrewsbury, Shropshire, United Kingdom) and 3DSlicer (Open Source Software Package, http://www.slicer.org) tools. Two image sets were created as validation data for DIR by applying a deformation with ImSimQA. After DIR was performed between the 2 created image sets, the 2 deformation fields, anatomical correspondence, and image quality were analyzed using 3DSlicer. Although the validation scheme of this research is reasonable, this technique also used artificial image sets rather than clinical ones.

New proposed approach. As an alternative to the DIR QA solutions described in the previous sections, we propose a new DIR QA procedure for practical clinical use. This work was partially introduced in our previous research.49 The approach was also shown to be an indicator of DIR accuracy using patients with liver and lung cancer.50 Pukala et al used a similar digital image phantom concept for kVCT volumetric image sets of the head and neck.51 Nie et al utilized the same histogram to quantify deformation errors of DIR systems, but they did not consider the error histogram for each anatomical structure.44 Digital image phantoms generated from deidentified clinical cases, each of which consists of a reference image set (R′), a test image set (T), and a reference DVF, are made available for various anatomical sites to clinics via the Web as downloadable content. Further, a set of QA tools composed of local and global error analysis systems analyzes clinically registered images. The local error analysis tool displays deformation errors via color maps on each image slice, and the global error analysis tool can review errors per voxel and/or structure of interest (SOI) using a DEH. Tools and digital image phantoms are utilized locally to evaluate individual clinical systems.


Figure 4. Quality assurance (QA) evaluation of a DIR system. The user can select among various anatomical image sets to simulate the clinical situation. After running the deformable image registration using the selected image sets, a deformation vector field (DVF) is exported and compared to the truth DVF. In the diagram, the digital phantom supplies the reference image (R′), the test image (T), and DVFref; the DIR tool in the clinic exports DVFtest; and the amount of error is the difference between DVFref and DVFtest.

Materials and Methods
Theoretical Process of Generating a Truth Image Set
Our proposed QA process requires 2 image sets and a true DVFref. We start with 2 image sets (R and T) for the same patient taken at different times; the initial image set R and the later image set T are set as the reference image set and the test image set, respectively. Figure 3 shows the process used to generate a digital image phantom set. At completion of the DIR, the algorithm finds a DVF, which is a map of deformation vectors from pixels or grid points in the test image to those in the reference image. This DVF is assigned as DVFref, and the test image T is deformed with DVFref to generate the image R′. Therefore, the DVFref between the image set R′ and the image set T becomes the “true” deformation. In summary, we applied a realistic DVF, instead of an artificial one, to an image set to generate a deformed image set for QA image data. When DIR is performed between the image sets R′ and T, it should generate a DVF identical to the truth deformation DVFref if the DIR system has no intrinsic errors, as shown in Figure 3B. The deformation errors are composed of random errors and systematic errors. The random errors arise from noise in the image sets and sampling processes in the DIR systems. The systematic errors arise from limitations of the optimizers or the characteristics of the similarity metrics. By comparing DVFref and DVFtest using the local and global error analysis tools, we can characterize the error in a DIR system.
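The construction of the truth set (R′, T, DVFref) can be sketched in 1 dimension as follows. This is an illustration under stated assumptions rather than the in-house implementation: the `apply_dvf` helper, the pull-back convention (each output voxel samples T at its own position plus the displacement), linear interpolation, and the toy intensities are all assumed for the example.

```python
import numpy as np

def apply_dvf(image, dvf):
    """Deform an image with a DVF: output[x] = image[x + dvf[x]] (assumed convention)."""
    x = np.arange(image.size, dtype=float)
    return np.interp(x + dvf, x, image)  # positions outside the image clamp to the edge value

# T: a toy test image. DVFref: a stand-in for the visually approved registration result.
test_image = np.array([0.0, 0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 0.0, 0.0])
dvf_ref = np.full(test_image.size, 2.0)

# R' is generated from T and DVFref, so DVFref is the exact deformation between R' and T.
reference_prime = apply_dvf(test_image, dvf_ref)
digital_phantom = {"R_prime": reference_prime, "T": test_image, "DVF_ref": dvf_ref}
```

Because R′ is produced by the DVF itself, any difference between DVFref and the DVF that a DIR system recovers from (R′, T) can be attributed to that system.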

Theory for Quantitative Error Analysis Tools
Local error analysis tool: Visualization of errors on each image slice with color mapping. In order to compare DVFref and DVFtest, the vector difference was taken at each voxel with the same coordinate. There is a vector $\overrightarrow{RV}_{(x,y,z)} = (R_x, R_y, R_z)$ at a voxel coordinate (x, y, z) in the DVFref and a vector $\overrightarrow{TV}_{(x,y,z)} = (T_x, T_y, T_z)$ at voxel coordinate (x, y, z) in the DVFtest. The vector difference between the 2 vectors is $\overrightarrow{RV}_{(x,y,z)} - \overrightarrow{TV}_{(x,y,z)}$. This resulting vector difference is assigned to the same coordinate. In this article, a colored map is used to visualize the magnitude of vector differences from the deformation errors. Equation 1 calculates the difference between the vectors, and Equation 2 calculates the magnitude of the vector difference found in Equation 1.

$$\overrightarrow{RV}_{(x,y,z)} - \overrightarrow{TV}_{(x,y,z)} = \overrightarrow{VD}_{(x,y,z)} = (R_x - T_x,\; R_y - T_y,\; R_z - T_z). \tag{1}$$

$$\left|\overrightarrow{RV}_{(x,y,z)} - \overrightarrow{TV}_{(x,y,z)}\right| = \sqrt{(R_x - T_x)^2 + (R_y - T_y)^2 + (R_z - T_z)^2}. \tag{2}$$
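Equations 1 and 2 translate directly into array operations. The sketch below assumes that both DVFs are stored on the same grid as arrays of shape (slices, rows, columns, 3) in millimeters; this layout and the function name are assumptions made for illustration.

```python
import numpy as np

def dvf_error_magnitude(dvf_ref, dvf_test):
    """Per-voxel magnitude of the vector difference between two DVFs."""
    diff = np.asarray(dvf_ref, dtype=float) - np.asarray(dvf_test, dtype=float)  # Equation 1
    return np.sqrt((diff ** 2).sum(axis=-1))                                     # Equation 2

# Tiny example: the two fields agree everywhere except at one voxel.
dvf_ref = np.zeros((2, 2, 2, 3))
dvf_test = np.zeros((2, 2, 2, 3))
dvf_test[0, 0, 0] = (3.0, 4.0, 0.0)

error_mm = dvf_error_magnitude(dvf_ref, dvf_test)
slice_map = error_mm[0]  # 2D error map for one slice, ready to be color mapped
```

Each `slice_map` can be passed to any plotting tool with a color map and overlaid on the corresponding image slice.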

Global error analysis tool: DEH. For a quantitative analysis of global deformation errors, a DEH was developed from vector differences between DVFtest and DVFref. The DEH quantifies the deformation errors per anatomical structure in a form that can be graphically displayed. The DEH graph indicates a cumulative distribution of errors per organ or SOI. The DEH concept utilizes an approach similar to the dose–volume histogram.52 A frequency analysis was applied to the vector difference magnitudes $D_1, D_2, \ldots, D_N$ measured at voxels with the same coordinate between the DVFref and the DVFtest. When a reference value $D_r$ is a target histogram bin, the cumulative frequency $M_{D_r}$ of $D_r$ is the number of measured distances that are greater than or equal to $D_r$, as shown in Equation 3.

$$M_{D_r} = \mathrm{number}\{k \mid D_k \ge D_r\} = \sum_{k=1}^{N} I(D_k \ge D_r), \tag{3}$$

where N is the number of vectors, and I is the characteristic function that takes the value 1 if $D_k$ is greater than or equal to $D_r$ and the value 0 otherwise. A histogram of the raw cumulative frequency is difficult to interpret because the distribution of vector differences is not normalized. For simplification, the relative probability of the cumulative frequency is used, as shown in Equation 4.

$$P_c(D_r) = \frac{M_{D_r}}{N} \times 100\ (\%). \tag{4}$$

The relative probability ranges from 0% to 100% and can be estimated for each anatomical site. By using Equation 4, a DEH analysis generates a meaningful distribution for each SOI that is measured.

Figure 5. A, Rigid registration of the R′ and R image sets, demonstrating that DVFref is realistic. The R′ transformed from the image set T does not show erroneous distortion compared to the original R. The sagittal image highlights an endotracheal tube only present in image T. B, Intensity difference between R′ and R; the gray scale represents the difference value. The intensity difference along edge lines of bone, skin, and anatomical structures is higher than elsewhere.
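A minimal sketch of the DEH computation in Equations 3 and 4 follows. The function names are hypothetical, and reading the confidence range as a percentile of the error magnitudes is one plausible interpretation rather than a definition given in this article.

```python
import numpy as np

def deformation_error_histogram(errors_mm, reference_values_mm):
    """Relative cumulative probability P_c(D_r), in percent, for each reference value D_r."""
    errors = np.asarray(errors_mm, dtype=float)
    d_r = np.asarray(reference_values_mm, dtype=float)
    # Equation 3: M_{D_r} counts the errors D_k that are greater than or equal to D_r.
    counts = (errors[None, :] >= d_r[:, None]).sum(axis=1)
    # Equation 4: normalize by the number of vectors N and express as a percentage.
    return 100.0 * counts / errors.size

def confidence_range(errors_mm, level_percent=95.0):
    """Error value below which level_percent of the errors fall (assumed interpretation)."""
    return float(np.percentile(errors_mm, level_percent))

# Illustrative error magnitudes for one structure of interest (not measured data).
soi_errors = np.array([0.0, 1.0, 2.0, 3.0])
deh = deformation_error_histogram(soi_errors, [0.0, 1.0, 2.0, 3.0, 4.0])
largest_error = confidence_range(soi_errors, 100.0)
```

To obtain a per-structure DEH, the error magnitudes would first be restricted to the voxels inside the structure's contour mask, for example `error_mm[mask]` for a boolean mask on the same grid.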

Case Study: Head and Neck DIR
The proposed QA approach was applied to our in-house DIR system using a head and neck case. To provide the image sets R′ and T for the test registration, we exported the image sets in DICOM format. The in-house DIR system utilizes a B-spline image registration algorithm. The testing process is summarized as a schematic diagram in Figure 4. After 2 image sets (a reference and a test) are imported, the DIR software under test performs the registration process as explained earlier. As a result of the registration, a DVFtest is produced. This DVFtest is compared to the DVFref, which is the truth DVF between the R′ and T image sets. The vector difference between DVFtest and DVFref is used to quantify deformation errors in the DIR. In cases where the test DVF has a grid resolution different from that of the DVFref, it should be interpolated to match the DVFref grid resolution.

Figure 6. Visualization of the magnitude of deformation errors for the head and neck case using the in-house deformable image registration (DIR) system. A, The gray and green images are the reference image from the reference image set R′ and the transformed test image, respectively. B, Local error analysis performed by DVF comparison, showing significant errors at various locations although the visual check was satisfactory. The magnitude of vector differences between the DVFref and the DVFtest was calculated with a 2-mm grid resolution and displayed as a color map on the image set R′.
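When the test DVF is stored on a different grid from DVFref, it must be resampled before the vector differences are taken. The article does not state which interpolation was used; the sketch below assumes separable linear (trilinear) interpolation, displacement vectors expressed in millimeters, grids that share an origin, and hypothetical function names.

```python
import numpy as np

def resample_axis(field, old_coords, new_coords, axis):
    """Linearly interpolate the field along one axis from old_coords to new_coords."""
    return np.apply_along_axis(
        lambda values: np.interp(new_coords, old_coords, values), axis, field
    )

def resample_dvf(dvf, old_spacing_mm, new_spacing_mm):
    """Resample a (Z, Y, X, 3) DVF to a new grid spacing, one spatial axis at a time."""
    out = np.asarray(dvf, dtype=float)
    for axis in range(3):  # the last axis holds the vector components and is left alone
        old = np.arange(out.shape[axis]) * old_spacing_mm[axis]
        new = np.arange(0.0, old[-1] + 1e-9, new_spacing_mm[axis])
        out = resample_axis(out, old, new, axis)
    return out

# Toy DVF on a 4-mm grid whose first component grows linearly along the first axis.
coarse = np.zeros((3, 3, 3, 3))
coarse[..., 0] = (np.arange(3) * 4.0)[:, None, None]
fine = resample_dvf(coarse, old_spacing_mm=(4.0, 4.0, 4.0), new_spacing_mm=(2.0, 2.0, 2.0))
```

Because the vectors are assumed to be in millimeters, only the sampling positions change; a DVF stored in voxel units would also need its components rescaled.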

Generating a head and neck QA image data set. This section describes the generation of a head and neck digital QA phantom using the in-house DIR software.53 A patient with head and neck cancer who had 2 radiation therapy planning CT data sets acquired within a single month was used to create the truth set. After the initial CT simulation, the patient had a tracheotomy and required a new CT scan and immobilization mask. In addition to the 2 data sets, manual contours were delineated by board-certified radiation oncologists. The first treatment planning image set was selected as the reference image set R. The second treatment planning image set was selected as the test image set T. The reference image set was 133 image slices of 512 × 512 pixels. The pixel resolution was 1.17 × 1.17 mm, and the slice thickness was 3 mm. The test image set was 131 image slices of 512 × 512 pixels, with slice thickness and pixel resolution identical to those of the reference image set. In order to generate the initial DVFref (see Figure 4), a DIR was performed using our in-house DIR software. B-spline DIR with the MI metric was applied instead of the mean square error (MSE) metric, because the MSE would produce a large error due to the endotracheal tube being present only in the test image set T. The L-BFGS optimizer was utilized, with randomly sampled points used to compare pixels between the 2 image sets. The sinc interpolator was utilized to generate the final R′ image set from the test image set T using the DVFref. The sinc interpolator54 is slow but provides the least image distortion compared to other methods.

Test of deformable registration systems using the head and neck digital phantom. We applied our test procedure to evaluate DIR system errors from the head and neck image registrations using our in-house DIR software. The configuration of our in-house DIR system was as follows: B-spline transform, linear interpolator, SGD optimizer, and MI similarity metric. One thousand pixels were randomly sampled per iteration for the similarity calculation, and the maximum number of iterations was set to 700. We utilized a multi-resolution approach55 in which the 512 × 512 × 133 and 512 × 512 × 131 images were initially downsampled to 128 × 128 × 34 and 128 × 128 × 33, respectively. The second-pass resolutions were raised to 256 × 256 × 67 and 256 × 256 × 66, respectively. The B-spline grid size was set to 21 mm. The number of histogram bins for the MI calculation was 50.
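The multi-resolution sizes quoted above are consistent with downsampling by factors of 4 and 2 and rounding up. The rounding rule is not stated in the article; ceiling division is assumed here only because it reproduces the reported sizes.

```python
import math

def downsampled_shape(shape, factor):
    """Grid size after downsampling each axis by `factor`, rounding up (assumed rule)."""
    return tuple(math.ceil(n / factor) for n in shape)

reference_shape = (512, 512, 133)  # reference image set
test_shape = (512, 512, 131)       # test image set

# Map each downsampling factor to the (reference, test) grid sizes at that level.
pyramid = {
    factor: (downsampled_shape(reference_shape, factor),
             downsampled_shape(test_shape, factor))
    for factor in (4, 2)
}
```

With these inputs the first level is 128 × 128 × 34 and 128 × 128 × 33, and the second level is 256 × 256 × 67 and 256 × 256 × 66, matching the configuration described above.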

Results
Creation of a Head and Neck Digital Image Phantom
The DIR between the original reference image and the test image was performed using a random sample of 50% of the pixels from the reference image. Execution of the DIR with the L-BFGS optimizer took 3 hours on an AMD Opteron 6136 (2.4 GHz) processor (Advanced Micro Devices, Semiconductor Company, Sunnyvale, CA, United States). Visual checks such as the “checkerboard visualization” were performed to verify that the DIR was accurate and acceptable. Once it was approved, a DVFref was generated from the DIR result. We then created a new reference image set R′ transformed from the test image T using the DVFref. Figure 5 illustrates that DVFref generates a clinically realistic deformation by comparing R′ to R, which shows no erroneous image distortion. Figure 5A visualizes the alignment using the checkerboard test. Figure 5B shows the intensity difference following image subtraction. R′ is similar to R, showing small differences near the high-gradient edges of bone, skin, and anatomical structures. The DIR between the image set R′ and the image set T produced DVFtest. This registration was assessed by comparing DVFref and DVFtest. By repeating the steps mentioned earlier, we created 3 truth digital image phantoms: head and neck, liver, and lung. Since the DIR QA of all the test cases follows the same process, we only present the experimental results of the head and neck case.

Figure 7. Deformation error histogram (DEH) for the head and neck example. The cumulative histogram of deformation errors visually shows the confidence range of errors.

Deformable Image Registration System Test Results
Local deformation error analyses. Figure 6 shows the results of DIR using the in-house system as applied to the digital phantom. Figure 6A illustrates visual registration checks showing reasonable matching between the reference image R′ (gray image) and the transformed test image (green image). Figure 6B illustrates deformation errors of 2 to 3 mm around the skull surface, jaw, and posterior neck using the in-house DIR software. Vector differences are calculated over a 2-mm grid resolution. The largest deformation error was found at the shoulders.

Global deformation error analyses. To analyze the global deformation errors, we generated DEHs for the primary CTV, brain stem, shoulders, and normal tissues (Figure 7). It is important to note that this histogram is generated from the registered DVFs and not the image differences. The DEH demonstrates the confidence range of deformation errors for each selected SOI. The DEH for the primary CTV shows that 95% of deformation errors were less than 0.72 mm. Those for the other SOIs (shoulders, spinal cord, and brain stem) were less than 3.32, 1.25, and 1.87 mm, respectively. Deformation errors are also analyzed using conventional statistical methods, taking the average and standard deviation of the errors. We summarized the statistical as well as the DEH analyses for the selected SOIs in Table 1. The average error for the partial brain in Table 1 is 1.23 mm and the standard deviation is 0.46 mm. Therefore, the 2σ range is up to 2.15 mm. However, the measured 95% confidence range from the DEH is 1.97 mm. As Table 1 shows, analysis using the average and the standard deviation may not accurately convey the magnitude of deformation errors. In addition, the DEH graph shows the confidence range of the error in the DVF for each organ.

Table 1. After Performing DIR Using the In-House DIR System, the Confidence Ranges of Deformation Errors Were Calculated Using Traditional Statistical and DEH Analyses for the Head and Neck Case.a

                        Traditional Analysis                Deformation Error Histogram Analysis
Structure of Interest   Average   Standard Deviation (σ)    Confidence Range (95%)   Confidence Range (68%)
CTV (Primary)           0.38      0.18                      0.72                     0.45
CTV (Lymph node)        0.85      0.41                      1.60                     1.03
Left parotid            0.93      0.45                      1.74                     1.16
Mandible                1.17      0.46                      2.05                     1.33
Esophagus               0.71      0.28                      1.12                     0.89
Oral structure          1.27      0.89                      2.92                     1.70
Partial brain           1.23      0.46                      1.97                     1.49
Posterior neck          1.10      0.64                      2.36                     1.24
Brain stem              1.38      0.31                      1.87                     1.56
Spinal cord             0.79      0.27                      1.25                     0.90
Normal tissue           0.72      0.37                      1.50                     0.84
Shoulders               1.60      0.86                      3.32                     1.83

Abbreviations: CTV, clinical target volume; DEH, deformation error histogram; DIR, deformable image registration.
a All values are in millimeters (mm).
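The comparison between the 2σ estimate and the DEH confidence range can be reproduced from the Table 1 values. The short sketch below uses 4 rows of the table; the helper name is hypothetical, and treating average + 2σ as the traditional estimate follows the partial brain example in the text.

```python
# (average, standard deviation, DEH 95% confidence range) in mm, copied from Table 1.
table_1 = {
    "CTV (Primary)": (0.38, 0.18, 0.72),
    "Partial brain": (1.23, 0.46, 1.97),
    "Brain stem": (1.38, 0.31, 1.87),
    "Shoulders": (1.60, 0.86, 3.32),
}

def two_sigma_upper(average, sd):
    """Upper end of the 2-sigma range, rounded to 2 decimals like the table."""
    return round(average + 2.0 * sd, 2)

# Per structure: (2-sigma estimate, DEH 95% range, difference between the two).
comparison = {
    name: (two_sigma_upper(avg, sd), deh95, round(two_sigma_upper(avg, sd) - deh95, 2))
    for name, (avg, sd, deh95) in table_1.items()
}
```

For the partial brain this gives 2.15 mm against a measured 1.97 mm, the same gap noted in the text; the size of the gap differs from structure to structure.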

Discussions

These digital image phantoms and quantitative tools can be used to measure local and global magnitudes of errors during commissioning of a DIR system for clinical applications. An analogy can be found in the use of gamma analysis, rather than simple dose differences, to evaluate the performance of clinical dose delivery systems. In the same manner, digital phantoms and the DEH can be used for QA of DIR systems in clinical procedures. Furthermore, if a specific DIR system allows a user to select a set of parameters, then our process can be used to identify the parameter set that produces minimal errors for a specific anatomical site.

Composite doses constructed from DIRs are routinely used in making medical treatment decisions. To assess the radiation toxicity in particular organs at risk, we need an accurate composite dose, which requires DIR. Furthermore, the successes and failures of these registrations can be delineated by locality. Park et al.56 investigated a fuzzy composite dose representation to deal with the uncertainty in DIR. It can generate composite dose plans displaying locality-based uncertainties. By utilizing our proposed test procedure along with the fuzzy composite dose representation, we will be able to collect the data for modeling the deformation vector errors for a specific SOI, which reflects the DIR uncertainty in the composite dose at specific anatomical locations.
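The parameter selection mentioned in the discussion could be sketched as follows. This is an illustration under stated assumptions, not a procedure taken from the article: each candidate parameter set is assumed to have already been run against the digital phantom, yielding a one-dimensional array of deformation error magnitudes (mm) inside one SOI, and the set with the smallest 95% confidence range is chosen. The function names and the selection criterion are hypothetical.

```python
import numpy as np


def confidence_range(errors, level=0.95):
    """Error (mm) below which a fraction `level` of the samples fall."""
    return float(np.percentile(np.asarray(errors, dtype=float), 100.0 * level))


def select_parameter_set(errors_by_params, level=0.95):
    """Pick the parameter-set label with the smallest confidence range.

    errors_by_params maps a label (any hashable) to the error magnitudes
    measured inside one SOI for that parameter set.
    Returns (best_label, {label: confidence_range}).
    """
    if not errors_by_params:
        raise ValueError("no parameter sets supplied")
    ranges = {
        label: confidence_range(errors, level)
        for label, errors in errors_by_params.items()
    }
    best = min(ranges, key=ranges.get)
    return best, ranges
```

Other criteria, such as the worst confidence range over several SOIs, would fit the same structure; the choice depends on the clinical site.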

To utilize our proposed QA procedure, the user should retrieve the deformation vector data from their image registration system. Most in-house systems are able to export their deformation vector data since the users have the program source code. However, many commercial systems use proprietary data formats to store the deformation vectors, although the DICOM RT format is recommended for DVFs. For example, only the newer version of MIM Maestro (MIM Software, Cleveland, Ohio) supports the DICOM format to store the deformation vectors. Otherwise, users of commercial systems can retrieve the deformation vector data only if the vendor provides adequate technical support. In addition, most DIR systems have adjustable parameters to optimize the DIR algorithm, which may affect quality and performance. An optimization process to find a suitable set of parameters for a specific anatomical site may be required.

In this research, we presented multiple digital image phantoms using only CT image sets. Further work is needed for DIR between different diagnostic modalities such as magnetic resonance and CT or ultrasound and CT. Many clinics are utilizing positron emission tomography (PET)/CT, diagnostic PET, or single-photon emission computed tomography images to delineate the gross tumor volume (GTV) and/or CTV. Based on the preliminary results of the QA test proposed in this research, adding a margin for DIR uncertainty may be necessary when a GTV and/or CTV is targeted on deformed data sets.
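For the export step, the following minimal sketch shows how a raw vector stream might be turned into an array suitable for DVF comparison. It assumes the layout used by the DICOM Deformable Spatial Registration object, in which the Vector Grid Data attribute holds little-endian 32-bit floats as (dx, dy, dz) triplets with the x index varying fastest, and Grid Dimensions gives (nx, ny, nz). Reading the attribute from a file is left to whichever DICOM toolkit is in use, the function name is hypothetical, and a proprietary export may use a different layout.

```python
import numpy as np


def decode_vector_grid(raw, grid_dimensions):
    """Convert a raw vector stream into an array of shape (nz, ny, nx, 3).

    raw: bytes of little-endian float32 triplets (dx, dy, dz), x fastest.
    grid_dimensions: (nx, ny, nz), as in the assumed DICOM layout.
    """
    nx, ny, nz = (int(n) for n in grid_dimensions)
    values = np.frombuffer(raw, dtype="<f4")
    expected = nx * ny * nz * 3
    if values.size != expected:
        raise ValueError(
            f"expected {expected} values for grid {nx}x{ny}x{nz}, got {values.size}"
        )
    # Copy to float64 so later arithmetic does not touch the read-only buffer.
    return values.reshape(nz, ny, nx, 3).astype(np.float64)
```

Two fields decoded this way can be subtracted directly only if they share the same grid; otherwise one of them has to be resampled first.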

Conclusions

In this research, we implemented multiple digital image phantoms, based on real patients, and a local and a global error analysis tool for QA of DIR systems. We successfully built a DVF comparison software tool and downloadable digital image phantoms for the DIR QA procedures. Each digital image phantom consists of a reference image set, a test image set, and a truth DVF created through the DIR between 2 image sets of real patients (for clinical relevance). The local error analysis tool displays the magnitude of deformation errors on each 2D image slice, and the global error analysis tool generates a deformation confidence range per anatomical site in a histogram. The DEH proved to be a useful analysis tool and should be included in future QA commissioning of DIR systems. Three digital image phantoms (head and neck, lung, and liver), each consisting of a reference image set, a test image set, and a deformation vector field, are made available for public access through the Web link at http://rophys.case.edu/dip/. The DEH analysis software is also available. Using the proposed QA procedure, an in-house plan review system was shown to have an acceptable range of deformation vector errors.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

1. Kessler ML. Image registration and data fusion in radiation therapy. Br J Radiol. 2006;79(spec no 1):S99-S108. doi:10.1259/Bjr/70617164
2. Yan D. Developing quality assurance processes for image-guided adaptive radiation therapy. Int J Radiat Oncol Biol Phys. 2008;71(1 suppl):S28-S32. doi:10.1016/j.ijrobp.2007.08.082
3. Sarrut D. Deformable registration for image-guided radiation therapy. Z Med Phys. 2006;16(4):285-297.
4. Bricault I, Ferretti G, Cinquin P. Registration of real and CT-derived virtual bronchoscopic images to assist transbronchial biopsy. IEEE Trans Med Imaging. 1998;17(5):703-714. doi:10.1109/42.736022
5. Guimond A, Roche A, Ayache N, Meunier J. Three-dimensional multimodal brain warping using the demons algorithm and adaptive intensity corrections. IEEE Trans Med Imaging. 2001;20(1):58-69. doi:10.1109/42.906425
6. Pennec X, Cachier P, Ayache N. Understanding the "Demon's algorithm": 3D non-rigid registration by gradient descent. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI); 1999 September 19-22;1679:597-605; Cambridge, UK. doi:10.1007/10704282_64
7. Thirion JP. Image matching as a diffusion process: an analogy with Maxwell's demons. Med Image Anal. 1998;2(3):243-260. doi:10.1016/S1361-8415(98)80022-4
8. Vercauteren T, Pennec X, Perchant A, Ayache N. Diffeomorphic demons: Efficient non-parametric image registration. Neuroimage. 2009;45(1 suppl):S61-S72. doi:10.1016/j.neuroimage.2008.10.040
9. Vercauteren T, Pennec X, Perchant A, Ayache N. Nonparametric diffeomorphic image registration with the demons algorithm. Med Image Comput Comput Assist Interv. 2007;10(pt 2):319-326.
10. Wrangsjo A, Pettersson J, Knutsson H. Non-rigid registration using morphons. Image Analysis Lecture Note in Computer Science. 2005;3540:501-510. doi:10.1007/11499145_51
11. Zhang G, Huang TC, Guerrero T, et al. Use of three-dimensional (3D) optical flow method in mapping 3D anatomic structure and tumor contours across four-dimensional computed tomography data. J Appl Clin Med Phys. 2008;9(1):2738. doi:10.1120/jacmp.v9i1.2738
12. Zhang GG, Huang TC, Forster KM, et al. Dose mapping: validation in 4D dosimetry with measurements and application in radiotherapy follow-up evaluation. Comput Methods Programs Biomed. 2008;90(1):25-37. doi:10.1016/j.cmpb.2007.11.015
13. Ferrant M, Nabavi A, Macq B, Jolesz FA, Kikinis R, Warfield SK. Registration of 3-D intraoperative MR images of the brain using a finite-element biomechanical model. IEEE Trans Med Imaging. 2001;20(12):1384-1397. doi:10.1109/42.974933
14. Xuan J, Wang Y, Freedman MT, Adali T, Shields P. Nonrigid medical image registration by finite-element deformable sheet-curve models. Int J Biomed Imaging. 2006;2006:73430. doi:10.1155/IJBI/2006/73430
15. Zhong HL, Peters T, Siebers JV. FEM-based evaluation of deformable image registration for radiation therapy. Phys Med Biol. 2007;52(16):4721-4738. doi:10.1088/0031-9155/52/16/001
16. Christensen GE, Song JH, Lu W, El Naqa I, Low DA. Tracking lung tissue motion and expansion/compression with inverse consistent image registration and spirometry. Med Phys. 2007;34(6):2155-2163. doi:10.1118/1.2731029
17. Abolhassani N, Samani A. Non-rigid registration using free-form deformation for prostate images. In: Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS); 2005 June 26-28;51-54; Detroit, United States. doi:10.1109/Nafips.2005.1548506
18. Jacobson TJ, Murphy MJ. Optimized knot placement for B-splines in deformable image registration. Med Phys. 2011;38(8):4579-4582. doi:10.1118/1.3609416
19. Kybic J, Unser M. Fast parametric elastic image registration. IEEE Trans Image Process. 2003;12(11):1427-1442. doi:10.1109/Tip.2003.813139
20. Loeckx D, Maes F, Vandermeulen D, Suetens P. Nonrigid image registration using free-form deformations with a local rigidity constraint. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI); 2004 September 26-29;3216:639-646; Saint-Malo, France. doi:10.1007/978-3-540-30135-6_78
21. Rueckert D, Sonoda LI, Hayes C, Hill DLG, Leach MO, Hawkes DJ. Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Trans Med Imaging. 1999;18(8):712-721. doi:10.1109/42.796284
22. Glocker B, Sotiras A, Komodakis N, Paragios N. Deformable medical image registration: setting the state of the art with discrete methods. Annu Rev Biomed Eng. 2011;13:219-244. doi:10.1146/annurev-bioeng-071910-124649


23. Boldea V, Sharp GC, Jiang SB, Sarrut D. 4D-CT lung motion estimation with deformable registration: quantification of motion nonlinearity and hysteresis. Med Phys. 2008;35(3):1008-1018. doi:10.1118/1.2839103
24. Brock KK, Consortium DRA. Results of a multi-institution deformable registration accuracy study (Midras). Int J Radiat Oncol. 2010;76(2):583-596. doi:10.1016/j.ijrobp.2009.06.031
25. Castillo R, Castillo E, Guerra R, et al. A framework for evaluation of deformable image registration spatial accuracy using large landmark point sets. Phys Med Biol. 2009;54(7):1849-1870. doi:10.1088/0031-9155/54/7/001
26. Dalah EZ, Nisbet A, Reise S, Bradley D. Evaluating commercial image registration packages for radiotherapy treatment planning. Appl Radiat Isotopes. 2008;66(12):1948-1953. doi:10.1016/j.apradiso.2008.06.003
27. Gu XJ, Pan H, Liang Y, et al. Implementation and evaluation of various demons deformable image registration algorithms on a GPU. Phys Med Biol. 2010;55(1):207-219. doi:10.1088/0031-9155/55/1/012
28. Latifi K, Zhang G, Stawicki M, van Elmpt W, Dekker A, Forster K. Validation of three deformable image registration algorithms for the thorax. J Appl Clin Med Phys. 2013;14(1):3834. doi:10.1120/jacmp.v14i1.3834
29. Loi G, Dominietto M, Manfredda I, et al. Acceptance test of a commercially available software for automatic image registration of computed tomography (CT), magnetic resonance imaging (MRI) and (99m)Tc-methoxyisobutylisonitrile (MIBI) single-photon emission computed tomography (SPECT) brain images. J Digit Imaging. 2008;21(3):329-337. doi:10.1007/s10278-007-9042-7
30. Murphy MJ, Salguero FJ, Siebers JV, Staub D, Vaman C. A method to estimate the effect of deformable image registration uncertainties on daily dose mapping. Med Phys. 2012;39(2):573-580. doi:10.1118/1.3673772
31. Shen JK, Matuszewski BJ, Shark LK, Skalski A, Zielinski T, Moore CJ. Deformable Image Registration - A Critical Evaluation: Demons, B-Spline FFD and Spring Mass System. In: Fifth International Conference BioMedical in Medical and Biomedical Informatics; 2008 July 9-11;77-82; London, UK. doi:10.1109/MediVis.2008.11
32. Skerl D, Likar B, Pernus F. A protocol for evaluation of similarity measures for non-rigid registration. Med Image Anal. 2008;12(1):42-54. doi:10.1016/j.media.2007.06.001
33. Vaman C, Staub D, Williamson J, Murphy MJ. A method to map errors in the deformable registration of 4DCT images. Med Phys. 2010;37(11):5765-5776. doi:10.1118/1.3488983
34. Varadhan R, Karangelis G, Krishnan K, Hui S. A framework for deformable image registration validation in radiotherapy clinical applications. J Appl Clin Med Phys. 2013;14(1):4066. doi:10.1120/jacmp.v14i1.4066
35. Wang H, Dong L, O'Daniel J, et al. Validation of an accelerated "demons" algorithm for deformable image registration in radiation therapy. Phys Med Biol. 2005;50(12):2887-2905. doi:10.1088/0031-9155/50/12/011
36. Zhong HL, Kim J, Chetty IJ. Analysis of deformable image registration accuracy using computational modeling. Med Phys. 2010;37(3):970-979. doi:10.1118/1.3302141
37. Pluim JPW, Maintz JBA, Viergever MA. Mutual-information-based registration of medical images: a survey. IEEE Trans Med Imaging. 2003;22(8):986-1004. doi:10.1109/Tmi.2003.815867
38. Pluim JPW, Maintz JBA, Viergever MA. f-information measures in medical image registration. IEEE Trans Med Imaging. 2004;23(12):1508-1516. doi:10.1109/Tmi.2004.836872
39. Chiang MC, Dutton RA, Hayashi KM, et al. Fluid registration of medical images using Jensen-Renyi Divergence reveals 3D profile of brain atrophy in HIV/AIDS. In: 3rd IEEE International Symposium on Biomedical Imaging: Nano to Macro; 2006 April 6-9;193-196; Arlington, VA, USA. doi:10.1109/ISBI.2006.1624885
40. Friston KJ, Ashburner J, Frith CD, Poline JB, Heather JD, Frackowiak RSJ. Spatial registration and normalization of images. Hum Brain Mapp. 1995;3(3):165-189. doi:10.1002/hbm.460030303
41. Thevenaz P, Unser M. Optimization of mutual information for multiresolution image registration. IEEE Trans Image Process. 2000;9(12):2083-2099. doi:10.1109/83.887976
42. Ibáñez L, Consortium IS. The ITK Software Guide: Updated for ITK Version 2.4. 2nd edn. New York, NY: Kitware; 2005.
43. Klein S, Staring M, Pluim JPW. Evaluation of optimization methods for nonrigid medical image registration using mutual information and B-splines. IEEE Trans Image Process. 2007;16(12):2879-2890. doi:10.1109/Tip.2007.909412
44. Nie K, Chuang C, Kirby N, Braunstein S, Pouliot J. Site-specific deformable imaging registration algorithm selection using patient-based simulated deformations. Med Phys. 2013;40(4):041911. doi:10.1118/1.4793723
45. Kadoya N, Fujita Y, Katsuta Y, et al. Evaluation of various deformable image registration algorithms for thoracic images. J Radiat Res. 2014;55(1):175-182. doi:10.1093/jrr/rrt093
46. Kashani R, Hub M, Kessler ML, Balter JM. Technical note: a physical phantom for assessment of accuracy of deformable alignment algorithms. Med Phys. 2007;34(7):2785-2788. doi:10.1118/1.2739812
47. Kirby N, Chuang C, Pouliot J. A two-dimensional deformable phantom for quantitatively verifying deformation algorithms. Med Phys. 2011;38(8):4583-4586. doi:10.1118/1.3597881
48. Serban M, Heath E, Stroian G, Collins DL, Seuntjens J. A deformable phantom for 4D radiotherapy verification: design and image registration evaluation. Med Phys. 2008;35(3):1094-1102. doi:10.1118/1.2836417
49. Park SB, Kim H, Yao M, Ellis R, Machtay M, Sohn JW. Building deformation error histogram and quality assurance of deformable image registration. Med Phys. 2012;39(6):3672-3672. doi:10.1118/1.4734922
50. Kim H, Monroe J, Yao M, et al. Use of deformation error histogram as an accuracy indicator for deformable image registration. Med Phys. 2013;40(6):169-169. doi:10.1118/1.4814296
51. Pukala J, Meeks SL, Staton RJ, Bova FJ, Mañon RR, Langen KM. A virtual phantom library for the quantification of deformable image registration uncertainties in patients with cancers of the head and neck. Med Phys. 2013;40(11):111703. doi:10.1118/1.4823467


52. Drzymala RE, Mohan R, Brewster L, et al. Dose-volume histograms. Int J Radiat Oncol. 1991;21(1):71-78. doi:10.1016/0360-3016(91)90168-4
53. Park SB, Monroe JI, Brindle J, Sohn JW. Developing a Universal Treatment Plan Review System. Med Phys. 2009;36(6):2663-2664. doi:10.1118/1.3182103
54. Meijering EHW, Niessen WJ, Pluim JPW, Viergever MA. Quantitative comparison of sinc-approximating kernels for medical image interpolation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI); 1999 September 19-22;1679:210-217; Cambridge, UK. doi:10.1007/10704282_23
55. Pluim JPW, Maintz JBA, Viergever MA. Mutual information matching in multiresolution contexts. Image Vision Comput. 2001;19(1):45-52. doi:10.1016/S0262-8856(00)00054-8
56. Park SB, Monroe JI, Yao M, Machtay M, Sohn JW. Composite radiation dose representation using Fuzzy Set theory. Inform Sci. 2012;187:204-215. doi:10.1016/j.ins.2011.10.025

