
Real-time computer-generated integral imaging and 3D image calibration for augmented reality surgical navigation

Junchen Wang a,∗, Hideyuki Suenaga b, Hongen Liao c,∗∗, Kazuto Hoshi b, Liangjing Yang a, Etsuko Kobayashi a, Ichiro Sakuma a

a Graduate School of Engineering, The University of Tokyo, Tokyo, Japan
b Department of Oral-Maxillofacial Surgery, Dentistry and Orthodontics, The University of Tokyo Hospital, Tokyo, Japan
c Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, China

∗ Corresponding author. Tel.: +81 0358417480. ∗∗ Corresponding author. E-mail addresses: [email protected] (J. Wang), [email protected] (H. Liao).

Article history: Received 22 April 2014; Received in revised form 3 September 2014; Accepted 3 November 2014.

Keywords: Integral imaging; 3D image calibration; Augmented reality; Surgical navigation; Stereo tracking

Abstract

Autostereoscopic 3D image overlay for augmented reality (AR) based surgical navigation has been studied and reported many times. For the purpose of surgical overlay, the 3D image is expected to have the same geometric shape as the original organ and to be transformable to a specified location for image overlay. However, how to generate a 3D image with high geometric fidelity, and how to quantitatively evaluate its geometric accuracy, have not been addressed. This paper proposes a graphics processing unit (GPU) based computer-generated integral imaging pipeline for real-time autostereoscopic 3D display, and an automatic closed-loop 3D image calibration paradigm for displaying undistorted 3D images. Based on the proposed methods, a novel AR device for 3D image surgical overlay is presented, which mainly consists of a 3D display, an AR window, a stereo camera for 3D measurement, and a workstation for information processing. The evaluation of the 3D image rendering performance with a 2560 × 1600 elemental image resolution shows rendering speeds of 50–60 frames per second (fps) for surface models and 5–8 fps for large medical volumes. The evaluation of the undistorted 3D image after the calibration yields submillimeter geometric accuracy. A phantom experiment simulating oral and maxillofacial surgery was also performed to evaluate the proposed AR overlay device in terms of the image registration accuracy, the 3D image overlay accuracy, and the visual effects of the overlay. The experimental results show satisfactory image registration and image overlay accuracy, and confirm the system's usability. © 2014 Elsevier Ltd. All rights reserved.

1. Introduction

1.1. Background

Augmented reality (AR) based surgical navigation allows virtual organs to be overlaid on real organs, providing surgeons with an immersive, visualized surgical environment. Compared with virtual reality based surgical navigation, where a flat 2D monitor is used to display a virtual surgical scene, AR navigation, in which the virtual surgical scene is further registered to reality, can provide enhanced realism and more intuitive information for surgical guidance [1]. There are three types of AR visualization technologies in current surgical navigation systems: video-based display [2–6], see-through display [7,8], and projection-based AR [9–11]. Video-based display superimposes virtual organs on a (stereo) video stream captured


by endoscopic cameras or head-mounted displays (HMDs). Because camera videos cannot reproduce all the information obtained by the human visual system, see-through display and projection-based AR were proposed for overlay on the user's direct view using translucent mirrors or projectors. The virtual organs used in most direct-view overlay systems are 2D-projected computer graphics (CG) models of organs derived from preoperative medical images. Compared with visual perception of a 3D object, a 2D projection lacks two important visual cues that give a viewer the perception of depth: stereo parallax and motion parallax [12]. Depth perception in image-guided surgery enhances the safety of the surgical operation. Therefore, for applications which superimpose virtual organs directly on a real surgical site, a 3D image is preferred, so that consistent and correct parallax is maintained when the scene is observed from different locations. Depth perception can be obtained from the parallax encoded in the 3D image, giving surgeons a sense of distance to surgical targets. "3D image" in this paper refers to an image encoding parallax information which can be extracted by an optical device or by the human eyes to give a viewer the depth perception: the third dimension.


Among 3D image display technologies are stereoscopy [13], integral imaging (or integral photography) [14], and holography [15]. Stereoscopy creates depth perception using two view images, like the left and right images seen by the human visual system. The disparities between the two images encode the parallax information, which can be extracted for display using, for example, a parallax barrier or polarized glasses. However, stereoscopy has very limited data bandwidth (only two images); the parallax information is provided at only two predefined view points. That is why we cannot see more of the scene even if we shift our eyes in a 3D film cinema. At the other extreme, holography directly records the wavefront of the light from a 3D object. Since all optical data irradiated from the object can be captured and reproduced with minimal loss, the parallax information can be encoded in a complete way. However, the data bandwidth is too large to be handled in real time by currently available devices with satisfactory resolution and viewing angle. In addition, holography requires very complicated and expensive devices to capture and reproduce 3D images, which limits its application. Integral imaging lies between stereoscopy and holography. It has medium data bandwidth and provides stereo parallax and full motion parallax within its viewing angle. The devices needed to extract the encoded parallax information are very simple: only a high-resolution liquid crystal display (LCD) and a lens array in front of the LCD are required to display 3D images. The resulting 3D image presents stereo parallax and continuous motion parallax and can be observed directly by viewers without special glasses. These advantages keep integral imaging under active research. Although integral imaging was first invented by Lippmann more than 100 years ago [16], it began to draw great attention in the last decade as high-resolution digital cameras and LCDs became available. Thorough reviews of integral imaging can be found in [17] and [18]. Current studies of integral imaging focus on 3D image pickup methods [19], image reconstruction methods [20,21], and viewing quality enhancement [22,23].

1.2. Related work and limitations

Our group first introduced integral imaging into surgical navigation for AR visualization [24–26], where an autostereoscopic 3D image is displayed by a lens array monitor (3D display) using computer-generated integral imaging (CGII) and is overlaid on the surgical site by a translucent mirror (AR window). The spatial registration between the 3D image and the surgical site is performed manually using a point-based registration method involving a commercial optical tracking device. In our recent publication [27], an AR solution for dental surgery using 3D image overlay was proposed, in which a stereo camera replaces the commercial optical tracker for automatic real-time image registration and instrument tracking. For surgical overlay, the overlaid 3D image is expected to have the same geometric shape as the original organ. However, the resulting 3D image suffers from distortion owing to inconsistent parameters between the digital recording and the optical reconstruction. In our previous work, the 3D image deformation issue was not well addressed and there was no quantitative evaluation of the geometric accuracy of the 3D images. To reduce the distortion caused by the mismatch between the lens array and the LCD monitor, a lot of time was spent manually adjusting the device itself. However, there was no guarantee that the resulting distortion could be kept below a certain level, owing to the lack of real-time feedback on the current distortion. The manual adjustment was performed in a trial-and-error fashion, which was very time consuming.

1.3. Improvement and contribution

This paper proposes an automatic closed-loop 3D image calibration algorithm that uses a stereo camera to measure the distortion of 3D images as real-time feedback. The final distortion error can be kept within a small tolerance, and the whole calibration procedure can be completed within several seconds. To the best of our knowledge, there is no relevant work on compensating 3D image distortion. The second novelty of this paper is the new design of the graphics pipeline for computer-generated integral imaging, which achieves better rendering performance than our previous work. We have also made substantial improvements to the AR device for surgical overlay. Unlike the prototype in our previous work, the new design is compact, integrated and flexible. We performed comprehensive phantom experiments to evaluate the new AR device in terms of the accuracy and the visual effects of the 3D image overlay.

The rest of the paper is organized as follows. Section 2 describes the principle of integral imaging and introduces the new 3D image rendering pipeline together with its implementation. Section 3 presents the 3D image calibration algorithm. Section 4 describes the new AR overlay device based on the proposed methods. Section 5 presents the evaluation experiments and their results. Finally, Section 6 concludes the paper.

2. CGII rendering

The basic principle of integral imaging is illustrated in Fig. 1. In the pickup procedure shown in Fig. 1(a), the 3D object is captured by a lens array as individual elemental images recorded on the imaging plane (e.g., a charge coupled device, CCD) located behind the lens array. For display, an LCD shows the previously recorded elemental images to reproduce the original routes of the rays irradiated from the 3D object, causing a 3D image to be formed (Fig. 1(b)). The pickup procedure can be simulated using computer graphics (CG) techniques by tracing the desired rays if the 3D object is represented by 3D data (i.e., surface models or volumes), so that only an LCD and a lens array are needed to display 3D images. The problem of generating elemental images then becomes a CG rendering problem. Assuming the LCD used for the 3D display has a pixel resolution of nx × ny with pixel pitches of px × py mm, we want to synthesize an image (referred to as the composite elemental image, CEI) of nx × ny pixels, where each pixel value represents the color of the ray connecting that pixel and its assigned elemental lens. The process of synthesizing such a CEI is called CGII rendering.

2.1. Pixel assignment

Pixel assignment associates each pixel of the LCD screen with its corresponding lens on the lens array. The assignment is determined by the shape and pitches of the lens array. A hexagonal lens array is preferred because it provides a denser lens arrangement (smaller pitches) than a rectangular one, as shown in Fig. 2(a). Fig. 2(b) shows the pixel assignment when using a hexagonal lens array. Assuming the distance between the screen plane (u−v) and the lens array plane (s−t) is g, we project the lens array plane orthogonally onto the screen plane along the normal direction of the screen, and a pixel is assigned to the lens whose center is nearest to that pixel. This can be expressed mathematically as

PA(u, v) = (s, t) ⇔ (u, v) ∈ Vor(s, t)   (1)

where PA(u, v) is an R² → R² mapping function that maps the pixel at (u, v) to its corresponding lens center, and Vor(s, t) represents the Voronoi region associated with the lens centered at (s, t).


Fig. 1. Basic principle of integral imaging.

Because a pixel can be addressed by its zero-based integer indices (i, j), a 2D lookup table PAT[i, j] with two channels (the s and t components) is built to store the pixel assignment results as follows:

u = (i + 0.5 − 0.5 nx) px   (2)
v = (j + 0.5 − 0.5 ny) py   (3)
PAT[i, j] = fd(u, v | L)   (4)

where fd(u, v | L) is a function that returns the point in the point set L which is nearest to the point (u, v), and L is the point set consisting of all the lens centers. Eqs. (2) and (3) are nothing more than the discrete–continuous conversion of pixel coordinates. The lookup table can be implemented as, e.g., a 2D texture object in OpenGL, and the mathematical representation of the ray passing through a pixel can be extracted by a texture fetch from PAT.
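As a concrete illustration of Eqs. (2)–(4), the following Python sketch builds a small pixel assignment table by nearest-lens-center lookup. The hexagonal layout assumed here (lens rows spaced ly apart, every other row shifted by lx/2) and the function names are our own assumptions for the sketch, not details taken from the paper.

```python
import numpy as np

def nearest_lens_center(u, v, lx, ly):
    """Nearest hexagonal lens center to the point (u, v), in mm.

    Assumes lens rows spaced ly apart vertically, centers spaced lx apart
    horizontally, and every other row shifted by lx/2 (hexagonal packing);
    the exact layout of the physical lens array may differ.
    """
    best, best_d2 = (0.0, 0.0), float("inf")
    row0 = int(np.floor(v / ly))
    for row in (row0, row0 + 1):
        t = row * ly
        offset = 0.5 * lx if row % 2 else 0.0
        col0 = int(np.floor((u - offset) / lx))
        for col in (col0, col0 + 1):
            s = col * lx + offset
            d2 = (u - s) ** 2 + (v - t) ** 2
            if d2 < best_d2:
                best, best_d2 = (s, t), d2
    return best

def build_pat(nx, ny, px, py, lx, ly):
    """Pixel assignment table PAT[i, j] -> (s, t) following Eqs. (2)-(4)."""
    pat = np.empty((ny, nx, 2), dtype=np.float32)
    for j in range(ny):
        v = (j + 0.5 - 0.5 * ny) * py              # Eq. (3)
        for i in range(nx):
            u = (i + 0.5 - 0.5 * nx) * px          # Eq. (2)
            pat[j, i] = nearest_lens_center(u, v, lx, ly)   # Eq. (4): nearest point of L
    return pat

# Tiny example with the pitches of Table 1 (reduced resolution to keep the loop short).
pat = build_pat(nx=64, ny=40, px=0.0847, py=0.0847, lx=1.001, ly=0.876)
print(pat.shape)        # (40, 64, 2)
```

In the actual pipeline this table is computed once and uploaded as a two-channel 2D texture, so that the fragment shader can fetch the assigned lens center per pixel.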

2.2. Spatial transformation

To display a 3D image at a specified location with respect to the physical device, the spatial transformations between different coordinate systems must be addressed during the CGII rendering process. They are described by the model matrix and the view matrix. The model matrix is the transformation matrix from the world coordinate system (WCS) to the model coordinate system (MCS), in which the 3D object is defined. WCS is defined on the lens array plane (s−t), with its x and y axes parallel to the width and height directions of the screen, respectively, and its origin located at the lens array center. The view matrix is the transformation matrix from WCS to the camera coordinate system (CCS), which is located on WCS's z axis at a distance of f from WCS's origin. This distance is called the referential distance and is determined by

f = max( nx px g / (2 lx), ny py g / (2 ly) )   (5)

where lx and ly are the lens pitches defined in Fig. 2(a). The camera looks at the screen center (the camera's viewing direction is opposite to its z axis) with its x and y axes parallel to those of WCS. Fig. 3 illustrates the model-view configuration. With this configuration, we can freely place the 3D object with respect to the 3D display by changing the model matrix, which is usually obtained by image registration.

2.3. Ray sampling

In the rendering process, every pixel (i, j) of the screen is processed in parallel on the GPU to sample the ray stored in PAT[i, j]. With the gap distance g, we can uniquely determine the ray in WCS and calculate its color using a shading model.
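To make the viewing geometry concrete, the sketch below computes the referential distance f of Eq. (5) and constructs the WCS ray sampled by one pixel, given a PAT such as the one built in the previous sketch. Placing the LCD plane at z = −g and using an example gap of g = 3 mm are our assumptions for illustration; the paper does not state these conventions explicitly.

```python
import numpy as np

def referential_distance(nx, ny, px, py, lx, ly, g):
    """Referential distance f of Eq. (5)."""
    return max(nx * px * g / (2.0 * lx), ny * py * g / (2.0 * ly))

def pixel_ray(i, j, pat, nx, ny, px, py, g):
    """Ray (origin, unit direction) in WCS sampled by pixel (i, j).

    Assumes the LCD plane lies at z = -g and the lens array plane at z = 0,
    so the ray starts at the pixel and passes through its assigned lens center.
    """
    u = (i + 0.5 - 0.5 * nx) * px
    v = (j + 0.5 - 0.5 * ny) * py
    s, t = pat[j, i]
    origin = np.array([u, v, -g])
    direction = np.array([s - u, t - v, g], dtype=float)
    return origin, direction / np.linalg.norm(direction)

# With the Table 1 parameters and an assumed gap g = 3 mm (the true g is only
# known after the calibration of Section 3), the camera sits roughly 325 mm away.
print(round(referential_distance(2560, 1600, 0.0847, 0.0847, 1.001, 0.876, g=3.0), 1))
```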

Fig. 2. (a) Lens array shape. (b) Pixel assignment.


Fig. 3. Coordinate systems for CGII rendering.

2.3.1. Direct raycasting

If the 3D object is a medical volume, i.e., a scalar field s(x) with x being a 3D position vector, direct raycasting is applied to calculate the ray color. As shown in Fig. 4(a), the two intersection points of the currently processed ray and the volume bounding box can be easily calculated in MCS, denoted by t1 and t2. Assuming that the ray is parameterized by the distance t from t1, we can calculate the color of the ray after it interacts with the volume according to the volume rendering integral [28]:

I = ∫₀^d c̃(s(t)) exp( −∫₀^t τ(s(t′)) dt′ ) dt   (6)

where I is the final composite color; d is the distance between t1 and t2; s(t) is the scalar value of the volume at position t along the ray; and c̃(s) and τ(s) are the color density and extinction density, respectively. With the substitution c̃(s) = c(s)τ(s), we can write the numerical integration of Eq. (6) as

I = Σ_{k=1}^{N} αk Ck ∏_{k′=1}^{k−1} (1 − αk′)   (7)

where αk = τ(s(kΔt)) and Ck = c(s(kΔt)) are the opacity and color of the kth segment of the ray, and Δt is the integration step. τ(s) and c(s) are given in the form of opacity and color transfer functions with respect to the scalar s. αk Ck is also called the opacity-weighted color, which represents the total light contributed by the kth segment of the ray. Eq. (7) can also be written in a front-to-back iteration manner:

Ik = Ik−1 + αk Ck Ak−1,   Ak = (1 − αk) Ak−1   (8)

with I0 = 0, A0 = 1 and k = 1, 2, . . ., N. Note that Ck may depend not only on ck ≡ c(s(kΔt)) but also on other parameters, for example, the gradient of s(x), the light source, and the ambient and diffuse lighting parameters. We call ck the primary color, which comes from the color transfer function, and Ck the shading color, which is shaded according to some lighting model. The Phong shading model is used to calculate Ck given the primary color ck, the gradient of s(x), and the lighting parameters, with the light source located at the camera center.
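The front-to-back iteration of Eq. (8) maps directly onto a short compositing loop. The sketch below composites a single ray through a scalar volume with a fixed step Δt; the nearest-neighbor sampler and the toy transfer functions are illustrative stand-ins for the trilinear sampling, transfer functions and Phong shading used in the actual GPU shader.

```python
import numpy as np

def composite_ray(volume, entry, direction, d, dt, color_tf, opacity_tf):
    """Front-to-back compositing of Eq. (8) along one ray.

    volume     : 3D numpy array of scalar values (sampled at the nearest voxel here)
    entry      : ray entry point t1 in voxel coordinates
    direction  : unit ray direction in voxel coordinates
    d          : distance between the entry (t1) and exit (t2) points
    dt         : integration step (Delta t)
    color_tf   : maps a scalar to an RGB primary color c(s)
    opacity_tf : maps a scalar to an opacity alpha(s)
    """
    I = np.zeros(3)     # accumulated color, I_0 = 0
    A = 1.0             # accumulated transparency, A_0 = 1
    t = 0.0
    while t <= d and A > 1e-3:          # early termination once nearly opaque
        p = entry + t * direction
        idx = tuple(np.clip(np.round(p).astype(int), 0, np.array(volume.shape) - 1))
        s = volume[idx]                  # nearest-neighbor sample (trilinear on the GPU)
        alpha = opacity_tf(s)
        C = color_tf(s)                  # Phong shading would modulate C here
        I = I + alpha * C * A            # I_k = I_{k-1} + alpha_k * C_k * A_{k-1}
        A = (1.0 - alpha) * A            # A_k = (1 - alpha_k) * A_{k-1}
        t += dt
    return I

# Toy example: a constant-density cube with a gray-scale transfer function.
vol = np.full((32, 32, 32), 0.5)
rgb = composite_ray(vol,
                    entry=np.array([0.0, 16.0, 16.0]),
                    direction=np.array([1.0, 0.0, 0.0]),
                    d=31.0, dt=1.0,
                    color_tf=lambda s: np.array([s, s, s]),
                    opacity_tf=lambda s: 0.05 * s)
print(rgb)
```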

2.3.2. View point moving

If the 3D object is a surface model extracted from a medical volume, directly sampling the desired rays is difficult. Instead, we sample the blue dot rays shown in Fig. 4(b) by moving the view point on the x−y plane of the camera (referred to as the view point plane or x−y plane). The number and spacing of the view points are given by

nvx = lx / px,   nvy = ly / py   (9)
pvx = f lx / g,   pvy = f ly / g   (10)

where nvx (nvy) and pvx (pvy) are the number and spacing of the view points in the x (y) direction. Performing off-axis projection rendering towards the screen (u−v) produces a series of viewpoint images, denoted by Ix,y(i, j), where (x, y) are the coordinates of the view point on the x−y plane and (i, j) are the pixel coordinates on the u−v plane. This process is equivalent to sampling the blue dot rays in Fig. 4(b). The off-axis projection rendering obtains samples of the rays emitted from the object surface facing the camera; therefore, the pseudoscopic–orthoscopic conversion is performed automatically. Finally, as demonstrated in Fig. 4(c), the pixel color

Fig. 4. (a) Direct raycasting. (b) View point moving. (c) Bilinear interpolation on the view point plane.


Fig. 5. CGII rendering pipeline for (a) volumetric data and (b) surface models.

of the CEI is calculated using bilinear interpolation on the x−y plane as follows:

CEI(i, j) = I11 (x2 − x)(y2 − y) + I21 (x − x1)(y2 − y) + I12 (x2 − x)(y − y1) + I22 (x − x1)(y − y1)   (11)

Imn = I_{xm,yn}(i, j) / [(x2 − x1)(y2 − y1)]   (12)

where (x, y) are the coordinates of the intersection point of the x−y plane and the ray connecting the pixel (i, j) and its nearest lens center, and (xm, yn), {m, n} ∈ {1, 2} × {1, 2}, are the four neighboring view points of (x, y).
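The following sketch ties Eqs. (9)–(12) together: it computes the number and spacing of the view points and then blends the four neighboring viewpoint images bilinearly to obtain one CEI pixel. The dictionary of viewpoint images and the toy values are placeholders for the texture-array lookup performed in the fragment shader.

```python
import numpy as np

def viewpoint_grid(lx, ly, px, py, f, g):
    """Number (Eq. 9) and spacing (Eq. 10) of the view points."""
    nvx, nvy = lx / px, ly / py
    pvx, pvy = f * lx / g, f * ly / g
    return int(round(nvx)), int(round(nvy)), pvx, pvy

def cei_pixel(i, j, x, y, pvx, pvy, viewpoint_images):
    """Bilinear blend of Eqs. (11)-(12).

    (x, y)           : intersection of the pixel's ray with the view point plane
    pvx, pvy         : view point spacing
    viewpoint_images : dict mapping a view point (xm, yn) to its image I_{xm,yn}
    """
    x1, y1 = np.floor(x / pvx) * pvx, np.floor(y / pvy) * pvy   # lower-left neighbor
    x2, y2 = x1 + pvx, y1 + pvy
    area = (x2 - x1) * (y2 - y1)
    I = {(m, n): viewpoint_images[(xm, yn)][j, i] / area        # Eq. (12)
         for m, xm in ((1, x1), (2, x2))
         for n, yn in ((1, y1), (2, y2))}
    return (I[(1, 1)] * (x2 - x) * (y2 - y) + I[(2, 1)] * (x - x1) * (y2 - y)
            + I[(1, 2)] * (x2 - x) * (y - y1) + I[(2, 2)] * (x - x1) * (y - y1))  # Eq. (11)

# Toy example: four 8x8 single-channel viewpoint images with constant values.
imgs = {(0.0, 0.0): np.full((8, 8), 10.0), (1.0, 0.0): np.full((8, 8), 20.0),
        (0.0, 1.0): np.full((8, 8), 30.0), (1.0, 1.0): np.full((8, 8), 40.0)}
print(cei_pixel(i=3, j=4, x=0.25, y=0.75, pvx=1.0, pvy=1.0, viewpoint_images=imgs))  # 27.5
```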

2.3.3. Graphics pipeline

The CGII rendering process described above is implemented by taking advantage of the programmable OpenGL graphics pipeline and the highly parallel computing performance of a GPU. Modern OpenGL has a programmable graphics pipeline which can be executed in parallel on a many-core GPU. We customize the pipeline to achieve the integral imaging functionality by providing the vertex program and fragment program (also called the vertex shader and fragment shader), which are executed in parallel by the GPU's vertex and fragment processors, respectively.

For volumetric data, it is a one-pass rendering process. The volumetric data together with its gradient data is loaded into OpenGL as a 3D texture with four channels, and then a proxy rectangle is drawn so that every pixel of the screen is processed by the fragment processor. In the fragment shader, the ray to be sampled is extracted by a texture fetch from the pre-calculated PAT, and the iterative color compositing described by Eq. (8) is then performed. Fig. 5(a) illustrates the graphics pipeline of the direct raycasting.

For a surface model, it is a multi-pass rendering process. As illustrated in Fig. 5(b), the surface model is loaded into OpenGL as a vertex buffer object (VBO), and then nvx × nvy regular surface renderings at the predefined view point locations are performed. The resulting nvx × nvy viewpoint images are stored as a 2D texture array object. Finally, the last pass is triggered by drawing a proxy rectangle so that every pixel of the screen is processed by the fragment processor, where the bilinear interpolation is performed to synthesize the CEI. The pipeline dynamically loads the corresponding shaders according to the type of input data, switching flexibly between the direct raycasting and view point moving modes.

3. 3D image calibration

3.1. Necessity of 3D image calibration

We want the 3D image to have high geometric fidelity to (i.e., the same physical dimensions as) the original object for the purpose of surgical overlay. Theoretically, the 3D image has the same physical dimensions as the original 3D object if the parameters (px, py, lx, ly, g) are exactly the same in the rendering and display phases. Current commercial LCDs have accurate pixel pitches, and the lens array pitches can be guaranteed by precise manufacturing. However, the distance from the lens array to the LCD (i.e., g) cannot be accurately measured on the physical device, and a small change of g leads to a large distortion of the 3D image in the depth dimension. In addition, assembly errors between the lens array and the LCD in the x (horizontal) and y (vertical) directions lead to horizontal and vertical errors of the 3D images. Fig. 6(a) explains the image distortion caused by changing the gap and shifting the lens array. Therefore, 3D image calibration must be performed before surgical overlay.

3.2. Calibration algorithm


Fig. 6. (a) 3D image distortion caused by parameter mismatch. (b) Closed-loop 3D image calibration paradigm. (c) Virtual calibration rig. (d) Flowchart of the 3D image calibration algorithm.

We can cancel the misalignment of the lens array by adding a small offset vector (Δs, Δt) to all the entries of the lookup table PAT, slightly rectifying the pixel assignment results. Let I(g, Δs, Δt) represent the CEI rendered with the parameter g and the pixel assignment rectification (Δs, Δt), where Δs and Δt satisfy

−lx/2 ≤ Δs ≤ lx/2   (13)
−ly/2 ≤ Δt ≤ ly/2   (14)

Eqs. (13) and (14) follow from the fact that the distortion of the 3D image caused by a lens array shift is periodic with respect to the relative shift. Next, we find the optimal (g, Δs, Δt) so that the 3D image has the correct physical dimensions.

Fig. 6(b) illustrates the proposed calibration paradigm. The 3D image of a calibration rig (a surface model) is displayed and captured by a stereo camera. The stereo camera consists of two complementary metal oxide semiconductor (CMOS) cameras (1280 × 1024 pixels, 60 fps, USB3.0 interface), and is calibrated using a 7 × 7 dot array pattern on a 100 × 100 mm plate. The left and right video streams are then undistorted and rectified so that only horizontal parallax exists between the left and right cameras, which simplifies the stereo matching process to a linear search. The geometric information of the 3D image is reconstructed by the camera and fed back to the CGII rendering pipeline to update the 3D image. This closed-loop "visual servoing" process continues until the stopping conditions are satisfied.

The calibration rig consists of five spatial spheres (A, B, C, D, E) with known geometries, as shown in Fig. 6(c). The 3D coordinates of the spherical centers can be measured by the stereo camera. The measured geometries are compared with the theoretical ones to update (g, Δs, Δt) by minimizing the difference. For a 3D image of the calibration rig, Δs (Δt) affects the relative position of E with respect to A (C) and B (D), while g affects the distance between E and the ABCD plane. A three-step strategy is applied to find the optimal (g, Δs, Δt) by independently optimizing one parameter at each step while fixing the other two. In step 1, I(g0, 0, 0) is generated with an initial estimate g0. Fixing g = g0 and Δt = 0, we consider the target function Lx(Δs) = |AE| − |BE|, which is monotonic near its root. The root of Lx(Δs) is the optimal Δs, and it can be found easily using a linear search with a fixed small step until |Lx(Δs)| ≤ ε (e.g., ε = 0.5 mm). In step 2, similarly, with Δs and g0 fixed, we find Δt satisfying |Ly(Δt)| ≤ ε, where Ly(Δt) = |DE| − |CE|. In step 3, the optimal g is found with (Δs, Δt) fixed. Let l(g) denote the distance from E to the ABCD plane measured on the 3D image of I(g, Δs, Δt), and let L be the true distance (20 mm). The goal is to find the g that minimizes |l(g) − L|. Starting at g0, the linear search is quite simple: if l(g) − L < 0 (l(g) − L > 0), then increase (decrease) g by a small step until l(g) − L ≥ 0 (l(g) − L ≤ 0). This linear search converges quickly, within a dozen iterations, because g0 can be guaranteed to be close to the true g.
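A minimal sketch of the three-step linear search is given below. The function measure(g, ds, dt) is a hypothetical stand-in for rendering I(g, Δs, Δt) and measuring the displayed calibration rig with the stereo camera; in the real system this measurement is the automatic sphere-center reconstruction described above, and the search ranges follow Eqs. (13) and (14).

```python
def calibrate(measure, g0, lx, ly, eps=0.5, step_s=0.02, step_t=0.02, step_g=0.1, L=20.0):
    """Three-step linear search for the optimal (g, ds, dt) of Section 3.2.

    measure(g, ds, dt) must return (|AE|, |BE|, |CE|, |DE|, l), where l is the
    measured distance from sphere E to the ABCD plane of the displayed rig.
    """
    def Lx(ds):                      # Lx(ds) = |AE| - |BE|, with g = g0 and dt = 0
        ae, be, _, _, _ = measure(g0, ds, 0.0)
        return ae - be

    def Ly(ds, dt):                  # Ly(dt) = |DE| - |CE|, with g = g0 and ds fixed
        _, _, ce, de, _ = measure(g0, ds, dt)
        return de - ce

    # Step 1: search ds over [-lx/2, lx/2] until |Lx(ds)| <= eps.
    ds = -lx / 2.0
    while ds <= lx / 2.0 and abs(Lx(ds)) > eps:
        ds += step_s

    # Step 2: search dt over [-ly/2, ly/2] until |Ly(dt)| <= eps.
    dt = -ly / 2.0
    while dt <= ly / 2.0 and abs(Ly(ds, dt)) > eps:
        dt += step_t

    # Step 3: adjust g until the measured depth l crosses the true distance L.
    g = g0
    l = measure(g, ds, dt)[4]
    direction = 1.0 if l - L < 0 else -1.0       # increase g if too shallow, else decrease
    while direction * (l - L) < 0:
        g += direction * step_g
        l = measure(g, ds, dt)[4]
    return g, ds, dt
```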


Fig. 7. (a) System configuration. (b) Physical setup. (c) Display-camera registration.

Fig. 6(d) demonstrates the workflow of the three-step strategy. The optimal parameters (g, Δs, Δt) are used to generate I(g, Δs, Δt), which has the highest geometric fidelity to the original 3D object. The stereo measurement on a 3D image is performed automatically by image processing; therefore, the whole calibration procedure is carried out in a fully automatic way within several seconds.

4. AR overlay device

So far, we can display an undistorted 3D image of either a surface model or a medical volume at a specified location with respect to the 3D display. For the purpose of surgical overlay, we design a novel AR device for 3D image surgical overlay based on the proposed methods.

4.1. System configuration

Fig. 7(a) illustrates the system configuration. A stereo camera is integrated with the 3D display for 3D measurement. A translucent mirror is used to superimpose the 3D image on the surgical site for intraoperative visualization; surgeons look through the translucent mirror (AR window) to observe the AR scene. A workstation with a consumer-level GPU performs the real-time CGII rendering and information processing. The physical setup of the system is shown in Fig. 7(b). The AR window together with the 3D display can be dragged and positioned freely during surgery.

4.2. Display-camera registration

After the 3D image calibration, the undistorted 3D image of a 3D object can be displayed at a specified location with respect to the 3D display. Next, we have to associate the 3D display coordinate system with that of the stereo camera for surgical overlay. The aim is to obtain the transformation matrix T_D^C from the reflected 3D display frame T_D to the stereo camera frame T_C, as shown in Fig. 7(c). For this purpose, the 3D image of the calibration rig (Fig. 6(c)) is displayed, and stereo images of the 3D image are captured by the stereo camera through the AR window. Then, the 3D coordinates of the spherical centers in the camera frame are reconstructed and matched to the known ones in T_D using a point-based registration method. In this way, the 3D display is associated with the stereo camera, and a virtual organ can be overlaid at the correct place using the matrix T_D^I (i.e., the model matrix described in Section 2.2) once the image registration is completed:

T_D^I = T_D^C T_C^I   (15)

where T_C^I is the image registration matrix representing the transformation from the camera frame T_C to the preoperative image frame T_I.
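The paper matches the reconstructed sphere centers to their known coordinates with a point-based registration method without naming a specific algorithm; the sketch below uses the standard SVD-based least-squares rigid fit as one possible choice. The numeric rig coordinates in the example are made up for illustration; the resulting matrix plays the role of T_D^C in Eq. (15).

```python
import numpy as np

def rigid_registration(src, dst):
    """Least-squares rigid transform mapping src points onto dst points.

    src, dst : (N, 3) arrays of corresponding 3D points.
    Returns a 4x4 homogeneous matrix T such that dst ~= R @ src + t.
    """
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)            # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                       # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

# Example: five sphere centers in the display frame (hypothetical coordinates, mm)
# and their reconstructions in the camera frame (a pure rotation plus translation).
rig_D = np.array([[0, 0, 0], [40, 0, 0], [40, 40, 0], [0, 40, 0], [20, 20, 20]], float)
R_true = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
rig_C = rig_D @ R_true.T + np.array([5.0, -10.0, 250.0])
T_DC = rigid_registration(rig_D, rig_C)            # display frame -> camera frame
print(np.round(T_DC, 3))
```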

5. Experiments and results

5.1. CGII rendering performance

The specifications of the 3D display used for the rendering performance evaluation are listed in Table 1. A CT volume of a head with 512 × 512 × 388 voxels and an MR volume of a heart with 512 × 512 × 186 voxels were used to test the direct raycasting method. A surface model of a lower jaw with 205,138 triangles, segmented and reconstructed from a CT volume, was used to test the view point moving method. For each dataset, CEIs at two different resolutions were synthesized using the proposed rendering method. The rendering speeds are summarized in Table 2. Fig. 8(a) and (b) shows the common volume rendering result and the corresponding CEI of the head volume. Fig. 8(c) and (d) shows the 3D image of the head observed through the lens array at two different vertical locations. Fig. 8(e) and (f) shows the 3D image of the heart observed at two different

Table 1
Specifications of the 3D display.

LCD                              Lens array (hexagonal)           Computer
x-pitch (px): 0.0847 mm          x-pitch (lx): 1.001 mm           CPU: Intel Core i7 (2.93 GHz)
y-pitch (py): 0.0847 mm          y-pitch (ly): 0.876 mm           GPU: GeForce GTX 780
Resolution: 2560 × 1600          Lens number: 210 × 160           Memory: 8 GB
Screen size: 10 in.              Dimensions: 217 × 150 mm         Shading language: Cg (NVIDIA)
Color mode: RGB                  Lens diameter: 1.05 mm           Graphics API: OpenGL 4.4


Fig. 8. (a) Common volume rendering of the head volume. (b) CEI of the head volume. (c)–(h) 3D images observed from different locations (captured by a digital camera).

horizontal locations. Fig. 8(g) and (h) shows the resulting 3D jaw image observed from two different vertical locations.

Table 2
Rendering performance.

Dataset         Description            CEI resolution 2560 × 1600   CEI resolution 1024 × 768
Head volume     512² × 388 voxels      5.5 fps                      15.6 fps
Heart volume    512² × 186 voxels      8.2 fps                      29.3 fps
Jaw model       205,318 triangles      58.5 fps                     56.7 fps

5.2. 3D image calibration evaluation

3D image calibration was performed to remove the image distortion by automatically adjusting the rendering parameters to compensate for the mismatch between the LCD and the lens array. The current values of Lx(Δs), Ly(Δt) and l(g) were calculated in real time. The step lengths of the linear searches for Δs, Δt, and g were set to 0.02 mm, 0.02 mm, and 0.1 mm, respectively. The tolerance ε was set to 0.5 mm. Because the search space for each parameter is tightly bounded and the target functions are monotonic, convergence to the optimal parameters was very fast (within several seconds). Fig. 9(a) shows a screenshot of the real-time stereo measurement on the 3D image of the calibration rig. The spheres were automatically segmented according to the color information in the left and right images, and each spherical contour was approximated by an ellipse whose center was taken as the 2D projection of the spherical center. Fig. 9(b) shows the reconstructed calibration rig in the stereo camera frame. Fig. 10 shows the measured curves of the target functions Lx(Δs), Ly(Δt) and l(g) around the optimal values. Note that for Lx(Δs) and Ly(Δt), the x-axis was shifted in the negative direction by the found optimal value for better visualization (centered at zero).

5.3. Geometric accuracy evaluation

After the 3D image calibration, we obtained the optimal parameters which enable us to display an undistorted 3D image at a specified location with respect to the 3D display. Using the optimal parameters, the calibration rig was displayed again and the display-camera registration was performed as described in Section 4.2 to get the transformation matrix from the stereo camera frame to the

3D display frame. Then, a sphere (3D image) 4 mm in diameter was displayed at different depths with respect to the 3D display. At each position, the 3D coordinates of the sphere were measured by the stereo camera and transformed into the 3D display frame using the previously calculated registration matrix. An error was calculated by comparison with its canonical coordinates to evaluate the target registration error (TRE) [29], which reflects the geometric accuracy of the 3D image. Fig. 11(a) shows the space within which the TREs were evaluated. The target positions were distributed every 5 mm from z = −50 mm to z = 50 mm. The fiducial registration error (FRE) of the display-camera registration was 0.10 mm, and the TREs at different depths are plotted in Fig. 11(b). The maximum TRE was 0.72 mm, at the plane z = −50 mm.

5.4. AR device evaluation

To evaluate the accuracy and usability of the proposed AR overlay device, a phantom application simulating oral and maxillofacial surgery was performed. Oral and maxillofacial surgery often involves drilling holes and cutting bone in the craniomaxillofacial region. These operations should be performed without damaging critical structures, for example, tooth roots or nerve channels. However, these critical structures are hidden by other tissue and cannot be observed directly by surgeons. It is therefore helpful to overlay 3D images of these critical structures on the patient intraoperatively to enhance visualization.

5.4.1. Experimental setup

Fig. 12(a) shows the experimental scene. Lower and upper jaw models were fabricated with a 3D printer from real patient data, as shown in Fig. 12(b). They were assembled with a head phantom (with rubber skin) to constitute a patient phantom, as shown in Fig. 12(c). Fig. 12(d) and (e) shows the left and right images captured through the AR window by the stereo camera. The stereo camera has a baseline of 100 mm and a resolution of 1280 × 1024 pixels at a 60 fps frame rate. It was calibrated using the HALCON software with a final reprojection error of 0.05 pixel. The 3D display consists of a 200 ppi LCD with 1024 × 768 resolution and a hexagonal lens array with lens pitches of 1.001 × 0.876 mm. The 3D image calibration was performed in advance to obtain the optimal rendering parameters. Then the display-camera registration was performed, yielding a fiducial registration error of 0.15 mm.


Fig. 9. (a) Screenshot of the real-time stereo measurement on the 3D image of the calibration rig. (b) Reconstructed calibration rig in the stereo camera frame.

Fig. 10. Measured curves of the target functions.

Fig. 11. (a) Target position distribution. (b) TREs with respect to depth.


Fig. 12. Experimental setup. (a) Experimental scene. (b) Jaw models by 3D printing. (c) Patient phantom. (d) and (e) Left and right views of the stereo camera.

5.4.2. Image registration

In order to achieve correct 3D image overlay, image registration must be performed to obtain T_C^I in Eq. (15). We proposed an automatic marker-free image registration method using teeth contour tracking for oral and maxillofacial surgery in our previous work [27]. The teeth contour of either the upper or lower jaw (depending on the surgical site) is reconstructed in real time by the stereo camera and is registered to the corresponding contour extracted from the surface model (reconstructed from, e.g., a CT image) using the iterative closest point (ICP) algorithm to obtain T_C^I. Because the registration is performed in real time, the registration matrix is updated automatically to maintain correct image overlay even if the patient moves. Fig. 13(a)–(d) shows the image registration results using the front teeth contour and the molar contour, respectively. The surface models of the teeth were overlaid on the stereo camera views using the calculated registration matrix T_C^I.

In order to evaluate the image registration error, several fiducial points were made as target points on the surface of the jaw model, as shown in Fig. 13(e) and (f). The coordinates of the fiducial points in the image space (ground truth) were already known (we placed the fiducial points on the surface model of the jaw and then printed it out with a 3D printer). After the image registration was completed, the coordinates of the fiducial points were reconstructed using the stereo camera and transformed into the image space by the resulting registration matrix. Target registration errors (TREs) could then be measured by comparing the transformed coordinates with the ground truth.
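As a concrete illustration of this evaluation, the sketch below transforms fiducial points reconstructed in the camera frame into the image space with a 4 × 4 registration matrix and reports the per-point TREs and their RMS. The coordinates used here are placeholders, not the values behind Table 3.

```python
import numpy as np

def target_registration_errors(T_CI, measured_C, ground_truth_I):
    """Per-point TREs (mm) and their RMS for fiducials measured in the camera frame.

    T_CI           : 4x4 matrix from the camera frame to the preoperative image frame
    measured_C     : (N, 3) fiducial coordinates reconstructed by the stereo camera
    ground_truth_I : (N, 3) known fiducial coordinates in the image frame
    """
    homog = np.hstack([measured_C, np.ones((len(measured_C), 1))])
    mapped = (homog @ T_CI.T)[:, :3]                     # transform into the image space
    tre = np.linalg.norm(mapped - ground_truth_I, axis=1)
    return tre, float(np.sqrt(np.mean(tre ** 2)))

# Placeholder example with an identity registration and three fiducials.
gt = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
meas = gt + np.array([[0.3, 0.0, 0.0], [0.0, 0.4, 0.0], [0.0, 0.0, 0.5]])
tre, rms = target_registration_errors(np.eye(4), meas, gt)
print(np.round(tre, 2), round(rms, 2))   # [0.3 0.4 0.5] 0.41
```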

Fig. 13. Image registration results. (a) and (b) Image registration using the front teeth contour. (c) and (d) Image registration using the molar contour. (e) and (f) Fiducial points in the front teeth and molar areas. (g) and (h) 2D projections of the ground truth using the inverse of the registration matrix.


Table 3
Target registration errors (mm).

              Index:  1     2     3     4     5     6     7     8     9     RMS
Front teeth   TRE:    0.53  0.55  0.71  0.20  1.41  0.67  0.41  1.39  0.48  0.81
Molar         TRE:    0.19  0.82  1.02  0.30  0.42  0.90  0.42  1.12  0.85  0.74

We evaluated the TREs of nine fiducial points in the front teeth area (Fig. 13(e), using front teeth contour tracking) and nine in the molar area (Fig. 13(f), using molar contour tracking). The results are summarized in Table 3. The root mean square (RMS) errors were 0.81 mm in the front teeth area and 0.74 mm in the molar area. Fig. 13(g) and (h) shows the 2D projections of the fiducial points obtained by transforming the ground truth into the camera space using the inverse of the registration matrix.

5.4.3. 3D image overlay

Once the image registration was completed, the obtained T_C^I, together with the display-camera transformation T_D^C, was used to superimpose 3D images of interest on the patient for visualization. Fig. 14(a) and (b) shows the overlay of the tooth roots, nerve channels and a wisdom tooth, observed through the AR window. Surgeons could perform operations safely, avoiding damage to the tooth roots and nerve channels, with the help of the in situ 3D image overlay.

The image overlay accuracy was assessed by overlaying the preoperative surgical plan intraoperatively. In real surgery, surgeons often perform preoperative surgical simulation on a 3D model

from the patient's image data. One problem is the accurate delivery of the preoperative plan to the surgery. In our experiment, four entry points for fixing a titanium plate, which is commonly used in orthognathic surgery, were preplanned on the left molar area of the lower jaw model. The preplanned entry points were then overlaid intraoperatively, as shown in Fig. 14(c). Under the visual guidance of the 3D images, we used a marker pen to mark the entry points on the surface of the jaw model and compared the marked positions with the ground truth. Fig. 14(d) shows the deviations: Point 4 agrees well with the ground truth, and the errors of Points 1–3 are approximately 1.01 mm, 0.91 mm, and 0.95 mm, respectively.

While user perception is subjective, we asked a surgeon from the Department of Oral and Maxillofacial Surgery, The University of Tokyo Hospital, to participate in the experiment using the AR overlay device. The overlay results shown in Fig. 14 were consistent with his professional expertise. We also received positive feedback from the surgeon that it is useful to overlay the hidden critical structures and/or preoperative surgical simulation results on the patient intraoperatively in the form of glasses-free 3D images. The current system is easy to set up, and it takes little time to make it ready for use.

Fig. 14. 3D image overlay evaluation. (a) and (b) Critical structure overlay in the front teeth and molar areas. (c) Intraoperative delivery of preplanned surgical plan. (d) Overlay error evaluation.


6. Discussion and conclusion

For 3D image surgical overlay, three important problems must be addressed: first, how to display a 3D image given the patient's medical data; second, how to generate a 3D image with high geometric fidelity to the original; and third, how to overlay a 3D image at a specified location with respect to a tracking device. To answer these three questions, a CGII rendering method, together with its OpenGL pipeline implementation, was presented to generate 3D images for both surface models and medical volumes; a closed-loop automatic 3D image calibration paradigm was proposed to find the optimal parameters for generating undistorted 3D images; and a virtual spatial calibration rig was used to associate the 3D display with the tracking device for spatial transformation. Experimental results show that the proposed rendering pipeline achieves real-time 3D imaging performance, and that the 3D image after calibration has good geometric accuracy of less than 1 mm within a depth range of 100 mm. In addition, a phantom application simulating oral and maxillofacial surgery was performed to evaluate the proposed new AR device in terms of the system accuracy and the visual effects of the 3D image overlay. We obtained satisfactory image registration and image overlay accuracy, and positive feedback from surgeons.

The direct raycasting method for 3D image rendering provides richer visual effects than the view point moving method and does not require preoperative image segmentation, but it suffers from a lower rendering speed. The view point moving method achieved fully real-time rendering performance, and its rendering speed mainly depends on the number of the model's geometric primitives. Compared with projector-based or stereoscopy-based AR techniques, the integral imaging based 3D image overlay can handle image overlay in a small surgical site (versus projector-based) without bulky viewing devices (versus stereoscopy-based).

The 3D image calibration is important because it removes the distortion of 3D images caused by mismatched rendering parameters. Serious distortion would cause large image overlay errors, hampering the clinical delivery of the 3D image overlay device. Our proposed 3D image calibration method is straightforward and effective, and requires no physical calibration device. The undistorted 3D image has good geometric accuracy. The AR overlay device is low cost, compact, flexible and easy to set up. In this study it was used for oral and maxillofacial surgery, because we have developed the corresponding image registration method. The system can also be used in other kinds of surgery, as long as the image registration problem can be well addressed. The stereo camera can serve as a 3D tracking device, like the NDI Polaris optical tracking system; therefore, common fiducial-marker-based registration methods are also available in our system. Moreover, it is also possible to use the stereo camera to perform surface reconstruction or other kinds of 3D measurement for image registration. We believe that the proposed methods will promote the application of autostereoscopic 3D image overlay in AR surgical navigation.

Conflict of interest

None.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.compmedimag.2014.11.003.

References

[1] Nicolau S, Soler L, Mutter D, Marescaux J. Augmented reality in surgical laparoscopic oncology. Surg Oncol 2011;20(3):189–201, http://dx.doi.org/10.1016/j.suronc.2011.07.002.
[2] Shekhar R, Dandekar O, Bhat V, Philip M, Lei P, Godinez C, et al. Live augmented reality: a new visualization method for laparoscopic surgery using continuous volumetric computed tomography. Surg Endosc 2010;24(8):1976–85, http://dx.doi.org/10.1007/s00464-010-0890-8.
[3] Su L-M, Vagvolgyi BP, Agarwal R, Reiley CE, Taylor RH, Hager GD. Augmented reality during robot-assisted laparoscopic partial nephrectomy: toward real-time 3D-CT to stereoscopic video registration. Urology 2009;73(4):896–900, http://dx.doi.org/10.1016/j.urology.2008.11.040.
[4] Ferrari V, Megali G, Troia E, Pietrabissa A, Mosca F. A 3-D mixed-reality system for stereoscopic visualization of medical dataset. IEEE Trans Biomed Eng 2009;56(11):2627–33, http://dx.doi.org/10.1109/TBME.2009.2028013.
[5] Figl M, Ede C, Hummel J, Wanschitz F, Ewers R, Bergmann H, et al. A fully automated calibration method for an optical see-through head-mounted operating microscope with variable zoom and focus. IEEE Trans Med Imaging 2005;24(11):1492–9, http://dx.doi.org/10.1109/TMI.2005.856746.
[6] Teber D, Guven S, Simpfendörfer T, Baumhauer M, Güven EO, Yencilek F, et al. Augmented reality: a new tool to improve surgical accuracy during laparoscopic partial nephrectomy? Preliminary in vitro and in vivo results. Eur Urol 2009;56(2):332–8, http://dx.doi.org/10.1016/j.eururo.2009.05.017.
[7] Masamune K, Sato I, Liao H, Dohi T. Non-metal slice image overlay display system used inside the open type MRI. In: Dohi T, Sakuma I, Liao H, editors. Medical imaging and augmented reality. Lecture notes in computer science, vol. 5128. Berlin/Heidelberg: Springer; 2008. p. 385–92, http://dx.doi.org/10.1007/978-3-540-79982-5_42.
[8] Fichtinger G, Deguet A, Masamune K, Balogh E, Fischer G, Mathieu H, et al. Image overlay guidance for needle insertion in CT scanner. IEEE Trans Biomed Eng 2005;52(8):1415–24, http://dx.doi.org/10.1109/TBME.2005.851493.
[9] Wen R, Chui C-K, Ong S-H, Lim K-B, Chang S-Y. Projection-based visual guidance for robot-aided RF needle insertion. Int J Comput Assist Radiol Surg 2013;8(6):1015–25, http://dx.doi.org/10.1007/s11548-013-0897-4.
[10] Osorio A, Galan J-A, Nauroy J, Donars P. Real time planning, guidance and validation of surgical acts using 3D segmentations, augmented reality projections and surgical tools video tracking. Proc SPIE 2010;7625, http://dx.doi.org/10.1117/12.844039.
[11] Krempien R, Hoppe H, Kahrs L, Daeuber S, Schorr O, Eggers G, et al. Projector-based augmented reality for intuitive intraoperative guidance in image-guided 3D interstitial brachytherapy. Int J Radiat Oncol Biol Phys 2008;70(3):944–52, http://dx.doi.org/10.1016/j.ijrobp.2007.10.048.
[12] Dodgson NA. Autostereoscopic 3D displays. Computer 2005;8:31–6.
[13] Lambooij M, Fortuin M, Heynderickx I, IJsselsteijn W. Visual discomfort and visual fatigue of stereoscopic displays: a review. J Imaging Sci Technol 2009;53(3), http://dx.doi.org/10.2352/J.ImagingSci.Technol.2009.53.3.030201.
[14] Martinez-Cuenca R, Saavedra G, Martinez-Corral M, Javidi B. Progress in 3-D multiperspective display by integral imaging. Proc IEEE 2009;97(6):1067–77, http://dx.doi.org/10.1109/JPROC.2009.2016816.
[15] Collier R. Optical holography. Elsevier; 2013.
[16] Lippmann G. Epreuves reversibles donnant la sensation du relief. J Phys Theor Appl 1908;7(1):821–5.
[17] Park J-H, Hong K, Lee B. Recent progress in three-dimensional information processing based on integral imaging. Appl Opt 2009;48(34):H77–94, http://dx.doi.org/10.1364/AO.48.000H77.
[18] Stern A, Javidi B. Three-dimensional image sensing, visualization, and processing using integral imaging. Proc IEEE 2006;94(3):591–607, http://dx.doi.org/10.1109/JPROC.2006.870696.
[19] Okano F, Hoshino H, Arai J, Yuyama I. Real-time pickup method for a three-dimensional image based on integral photography. Appl Opt 1997;36(7):1598–603, http://dx.doi.org/10.1364/AO.36.001598.
[20] Arimoto H, Javidi B. Integral three-dimensional imaging with digital reconstruction. Opt Lett 2001;26(3):157–9, http://dx.doi.org/10.1364/OL.26.000157.
[21] Hong S-H, Jang J-S, Javidi B. Three-dimensional volumetric object reconstruction using computational integral imaging. Opt Exp 2004;12(3):483–91, http://dx.doi.org/10.1364/OPEX.12.000483.
[22] Liao H, Iwahara M, Katayama Y, Hata N, Dohi T. Three-dimensional display with a long viewing distance by use of integral photography. Opt Lett 2005;30(6):613–5, http://dx.doi.org/10.1364/OL.30.000613.
[23] Kim Y, Choi H, Kim J, Cho S-W, Kim Y, Park G, et al. Depth-enhanced integral imaging display system with electrically variable image planes using polymer-dispersed liquid-crystal layers. Appl Opt 2007;46(18):3766–73, http://dx.doi.org/10.1364/AO.46.003766.
[24] Liao H, Hata N, Nakajima S, Iwahara M, Sakuma I, Dohi T. Surgical navigation by autostereoscopic image overlay of integral videography. IEEE Trans Inform Technol Biomed 2004;8(2):114–21, http://dx.doi.org/10.1109/TITB.2004.826734.
[25] Liao H, Inomata T, Sakuma I, Dohi T. 3-D augmented reality for MRI-guided surgery using integral videography autostereoscopic image overlay. IEEE Trans Biomed Eng 2010;57(6):1476–86, http://dx.doi.org/10.1109/TBME.2010.2040278.
[26] Liao H, Ishihara H, Tran HH, Masamune K, Sakuma I, Dohi T. Precision-guided surgical navigation system using laser guidance and 3D autostereoscopic image overlay. Comput Med Imaging Graph 2010;34(1):46–54, http://dx.doi.org/10.1016/j.compmedimag.2009.07.003.
[27] Wang J, Suenaga H, Hoshi K, Yang L, Kobayashi E, Sakuma I, et al. Augmented reality navigation with automatic marker-free image registration using 3D image overlay for dental surgery. IEEE Trans Biomed Eng 2014;61(4):1295–304, http://dx.doi.org/10.1109/TBME.2014.2301191.
[28] Engel K, Kraus M, Ertl T. High-quality pre-integrated volume rendering using hardware-accelerated pixel shading. In: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on graphics hardware, HWWS '01. New York, NY, USA: ACM; 2001. p. 9–16, http://dx.doi.org/10.1145/383507.383515.
[29] Fitzpatrick J, West J. The distribution of target registration error in rigid-body point-based registration. IEEE Trans Med Imaging 2001;20(9):917–27, http://dx.doi.org/10.1109/42.952729.