
Empirical Study on Designing of Gaze Tracking Camera Based on the Information of User's Head Movement

Weiyuan Pan, Dongwook Jung, Hyo Sik Yoon, Dong Eun Lee, Rizwan Ali Naqvi, Kwan Woo Lee and Kang Ryoung Park *

Division of Electronics and Electrical Engineering, Dongguk University, 30 Pildong-ro 1-gil, Jung-gu, Seoul 100-715, Korea; [email protected] (W.P.); [email protected] (D.J.); [email protected] (H.S.Y.); [email protected] (D.E.L.); [email protected] (R.A.N.); [email protected] (K.W.L.)
* Correspondence: [email protected]; Tel.: +82-10-3111-7022; Fax: +82-2-2277-8735

Academic Editor: Vittorio M. N. Passaro
Received: 17 June 2016; Accepted: 25 August 2016; Published: 31 August 2016

Sensors 2016, 16, 1396; doi:10.3390/s16091396

Abstract: Gaze tracking is the technology that identifies the region in space that a user is looking at. Most previous non-wearable gaze tracking systems use a near-infrared (NIR) camera with an NIR illuminator. Depending on the kind of camera lens used, the viewing angle and depth-of-field (DOF) of a gaze tracking camera differ, which affects the performance of the gaze tracking system. Nevertheless, to the best of our knowledge, most previous research implemented gaze tracking cameras without ground truth information for determining the optimal viewing angle and DOF of the camera lens. Eye-tracker manufacturers might also use such ground truth information, but they do not make it public. Therefore, researchers and developers of gaze tracking systems cannot refer to such information when implementing a gaze tracking system. We address this problem by providing an empirical study in which we design an optimal gaze tracking camera based on experimental measurements of the amount and velocity of users' head movements. Based on our results and analyses, researchers and developers might be able to more easily implement an optimal gaze tracking system. Experimental results show that our gaze tracking system achieves high performance in terms of accuracy, user convenience and interest.

Keywords: gaze tracking; optimal viewing angle and DOF of camera lens; empirical study; accuracy; user convenience and interest

1. Introduction

Gaze tracking has recently become a very active research field with applications in many different areas, including human-computer interfaces (HCI), virtual reality (VR), driver monitoring, eye disease diagnosis, and intelligent machine interfaces. In detail, various applications of gaze tracking systems have been demonstrated, such as user-computer dialogs in desktop environments [1], head-mounted gaze tracking for the study of visual behavior in the unconstrained real world [2-5] and for the study of oculomotor characteristics and abnormalities in addition to input devices for HCI [6], eye-typing systems for users with severe motor impairments [7], and HCI in desktop environments [8-20]. Gaze tracking systems find the position a user is looking at by using image processing and computer vision technologies. Various kinds of gaze tracking devices, wearable and non-wearable, with single or multiple cameras [1-15], have been researched. Most non-wearable gaze tracking systems are designed for computer users in a desktop environment and use a near-infrared (NIR) camera with an NIR illuminator.


2. Related Works

When a gaze tracking system is working properly in a desktop environment, focused images of the user's eyes should be captured by the gaze tracking camera. Therefore, it is critical to determine the appropriate viewing angle and depth-of-field (DOF) of the gaze tracking camera. Here, DOF indicates the range of z-distances between the user's eyes and the camera within which the camera can acquire a focused eye image. The viewing angle of a gaze tracking camera should be neither very small nor very large, as either extreme reduces the system's performance. If the gaze tracking system adopts a camera lens with a small viewing angle, the eye size in the image is large, which guarantees accurate detection of the pupil and corneal specular reflection (SR). However, the user's head movement becomes limited, which can degrade the convenience of the system for the user. Conversely, a camera lens with a large viewing angle allows for large head movements by the user, enhancing the convenience of using the system. However, in this case, the size of the eye region in the captured image decreases, which may degrade the detection accuracy of the pupil and corneal SR, and consequently, the gaze detection accuracy. In addition, when using a lens with a small DOF, the eye region in the captured image is easily blurred by movement of the user's head in the z-direction (approaching or receding from the camera). Blurred eye images can reduce pupil and corneal SR detection accuracies, and thus, the overall gaze detection accuracy. In order to increase the DOF, a lens with a large f-number should be used. In general, the f-number is inversely proportional to the iris size of the lens, or the lens diameter [21]. Therefore, a lens with a large f-number has a small diameter, which decreases the brightness of captured images, and the eye region in the image can become too dark to be useful for gaze detection. Therefore, determining the optimal viewing angle and DOF of lenses designed for gaze tracking cameras is a difficult procedure and a prerequisite for realizing high-performance gaze tracking systems.

Nevertheless, most research has focused on algorithm design for improving the accuracy of gaze tracking systems in situations with limited head movement; therefore, previous work implemented gaze tracking cameras without ground truth information for the optimal viewing angle and DOF of the camera lens [1-20]. Previous research [22] proposed a long-range gaze tracking system that allows for large head movements. However, this work requires gaze tracking cameras with complicated panning and tilting mechanisms in addition to two wide-view cameras. In addition, this gaze tracking camera was also designed without ground truth information for the optimal viewing angle and DOF of the camera lens. There are also commercial gaze trackers [23,24]. However, they do not present ground truth information (experimental measurements of the amount and velocity of users' head movements) in public either, although this information is necessary for designing an optimal gaze tracking camera. Therefore, we address this problem by providing an empirical study to design an optimal gaze tracking camera based on experimental measurements of the amount and velocity of users' head movements. Compared to previous works, our research is novel in the following three ways:

1. To the best of our knowledge, most previous research implemented gaze tracking cameras without ground truth information for determining the optimal viewing angle and DOF of the camera lens [1-20,22]. Eye-tracker manufacturers might also use such ground truth information, but they do not make it public [23,24]. Therefore, we address this problem by providing an empirical study for designing an optimal gaze tracking camera based on experimental measurements of the amount and velocity of users' head movements.
2. Accurate measurements of the amount and velocity of users' head movements are made using a web-camera and an ultrasonic sensor while each user plays a game, surfs the web, types, and watches a movie in a desktop environment.
3. Based on the amount and velocity of user head movements, we determine the optimal viewing angle and DOF of the camera lens. We design a gaze tracking system with such a lens, and the accuracy and user convenience of our gaze tracking method are compared with those of previous methods.


Recently, various depth-cameras have been commercialized, and Kinect (versions 1 and 2) sensors (named "Kinect 1.0" and "Kinect 2.0") have been widely used due to their superior performance. However, the minimum z-distance that can be measured by Kinect 1.0 is about 70 cm between the object and the sensor [25]. The z-distance ranges from 40 to 100 cm when people are doing various tasks in desktop computer environments. Therefore, accurate z-distances in the range from 40 to 70 cm cannot be measured by Kinect 1.0 in our experiments. In addition, because Kinect 1.0 provides a depth map that includes the areas of the head, neck and body, the face region used for z-distance measurement must be separated from these areas through further processing using the visible light camera image of Kinect 1.0. Moreover, Kinect 1.0 measures the z-distance by trigonometry using the patterned light produced by a laser projector, which travels in straight lines [25]. Therefore, when the depth map of a face is obtained at near z-distances (from 40 to 70 cm), holes occur in the depth map due to the occlusion of light by protruding facial features such as the nose. Thus, further hole-filling processing must be performed on the face depth map for z-distance measurement [26-28]. To overcome these problems, Kinect 2.0 was recently commercialized; it measures the z-distance using a time-of-flight (TOF) camera. However, Kinect 2.0 shows low accuracy (with large variation) of z-distance measurement at the near range of 50-100 cm, and the authors of [29] also reported that less reliable z-distances are measured by Kinect 2.0 at this near range. Therefore, it cannot be used for z-distance measurement in our experiment.

Alternatively, the Kinect devices could be placed behind the monitor screen. Considering the z-distance range of 40-100 cm in our experiments, Kinect 1.0 would have to be placed at a position 30 cm behind the monitor screen. In the case of Kinect 2.0, it would have to be placed at a position at least 60 cm behind the monitor. In this case, the Kinect device would have to be placed behind the monitor at its upper, left, or right side, because the monitor must not occlude the depth camera of the Kinect device. Therefore, additional complicated calibration (among the monitor, our web-camera, and the depth camera of the Kinect device) would have to be performed due to the disparities among the coordinate systems of the monitor, web-camera, and depth camera. We could also consider using the RGB camera of the Kinect device. However, the measurement resolution of head movement by the RGB camera becomes too small due to the large z-distance between the camera and the user, and a distorted image of the user's face can be captured, because the Kinect device is placed behind the monitor at its upper, left, or right side. In our system, in contrast, the ultrasonic sensor is close to the web-camera, and they are placed in the same plane as the monitor screen, as shown in Figure 1. Therefore, additional complicated calibration is not necessary in our experiments. In the case of the SoftKinetic device [30], its speed for measuring depth data becomes slower (lower than 15 frames/s) as the z-distance increases, so it cannot measure the natural head movement of users (our system can measure it at 30 frames/s).


Figure 1. Experimental environment of our method: (a) conceptual diagram; and (b) example of setup of web-camera and ultrasonic sensor on the 19-inch monitor.


In addition, the prices of these systems are high: more than $300 in the case of the Intel RealSense F200 [31], more than $150 for the SoftKinetic device, more than $270 for Kinect 2.0, and about $150-$200 for Kinect 1.0. In contrast, our system (including the web-camera and ultrasonic sensor) costs less than $50, and researchers can easily use it to obtain the ground-truth information needed for implementing an optimal gaze tracking camera. A Polhemus position tracking sensor (Patriot sensor [32]) could also be considered for measuring the z-distance of the user's face. However, the sensor would have to be attached to the user's face during the experiments (performing the four tasks of playing a game, surfing the web, typing, and watching a movie for about 15 min each), which is very inconvenient for the participants and prevents the natural head movement of each user. Table 1 compares the previous and proposed methods.

Table 1. Comparison of previous and proposed methods for designing a gaze tracking camera.

Category: Without ground truth information for determining the viewing angle and DOF of the camera lens [1-20,22]
Method: Wearable gaze tracking systems [2-5], non-wearable gaze tracking systems with a single camera [1,6-10,16,18-20], and with multiple cameras [11-15,17,22]
Advantages: Implementation time is short because the additional procedures of measuring the amount and velocity of the user's head movements are not necessary
Disadvantages: Without determining the optimal viewing angle and DOF of the camera lens through an empirical study, gaze tracking accuracy can be reduced or the user's head movement can be limited

Category: With ground truth information for determining the viewing angle and DOF of the camera lens (Proposed method)
Method: The accurate amount and velocity of the user's head movements are measured with a web-camera and an ultrasonic sensor for designing a non-wearable gaze tracking system with a single camera
Advantages: Gaze tracking accuracy can be enhanced without limiting the user's head movements by determining the optimal viewing angle and DOF of the camera lens
Disadvantages: Additional procedures for measuring the amount and velocity of the user's head movements are required

The remainder of this paper is organized as follows: In Section 3, the proposed system and method for measuring the amount and velocity of a user's head movement are described. In Section 4, experimental results are presented and the performance of our gaze tracking method is evaluated. Finally, the conclusions of this paper are presented in Section 5.

3. Proposed System and Method for Determining the Optimal Viewing Angle and DOF of a Camera Lens

3.1. Experimental Environment and Overall Procedures of the Proposed Method

This section gives an overview of our process for detecting a user's head movement information. The experimental environment of our method is shown in Figure 1. The web-camera and the ultrasonic sensor were placed in the same plane, as shown in the right image of Figure 1b. As shown in Figures 1 and 2, each participant in our experiments performs four assigned tasks that are common for desktop computer users: playing a game, surfing the web, typing, and watching a movie. While each participant performs each task for about 15 min (for a total of about 60 min of participation per user), images of the participant and the z-distance (the distance between the user and the gaze tracking system) are acquired by the web-camera and the ultrasonic sensor, respectively. The user's head position in an image is located using the adaptive boosting (AdaBoost) method [33,34]. The x- and y-axis coordinates of the user's head in 3D space are measured based on a pinhole camera model [35] and the z-distance measured by the ultrasonic sensor [36]. With this information, the amount of the user's head movement in 3D space can be measured while the user is performing the four tasks. To measure the velocity of the user's head movement, the acquisition time of each image frame is recorded. The velocity of the user's head movement in 3D space can then be calculated from the user's head position and the acquisition time of each image frame.


Figure 2. Overall procedure of the proposed method.

With measurements of the amount and velocity of users' head movements in 3D space, the minimum viewing angle of the camera lens needed for our gaze tracking system can be calculated based on the maximum amount of head movement along the x- and y-axes. In addition, the optimal DOF of the camera lens for our gaze tracking camera can be determined based on the measured z-distances. With this information, we design a lens for our gaze tracking camera. In addition, we evaluate the accuracy, user convenience and interest of our gaze tracking system compared to those using other camera lenses and a commercial gaze tracking system.

3.2. Acquisition of the z-Distance of the User's Head by Ultrasonic Sensor

The z-distance, the distance between the gaze tracking system and the user's face, is measured using an SRF04 ultrasonic sensor [36] installed under the web-camera, as indicated in Figure 1. The sensor is composed of two parts, the receiver and transmitter, and a control board. The receiver and transmitter are connected to the control board via an I2C interface, and the control board is connected to a desktop computer via a universal serial bus (USB) 2.0 interface. The z-distance is measured by the receiver and is seamlessly transmitted to the desktop computer by the control board at 40 kHz, a frequency at which the z-distance can be measured even in the case of rapid head movements.

The measurable range of the ultrasonic sensor is from 3 cm to 3 m. The cone angle (θ in Figure 3) of the ultrasonic beam is from about 80° to 140°. As illustrated in Figure 3, the z-distance of a user's head is measured using a transmitted pulse of sound whose frequency (40 kHz) is outside the audible range of humans (about 20 Hz-20 kHz [37]). The pulse of ultrasonic sound is generated by the transmitter and reflected back from the surface of the user's face, in the path of the ultrasonic wave, to the transducer. The time between transmission and receipt of the wave can be measured and used to calculate the z-distance.

Before the experiments for measuring the z-distance of the user's face, we measured the accuracy of the ultrasonic sensor with a calibration board and with the actual faces of five people, respectively. When using the calibration board (perpendicular to the ultrasonic sensor) at z-distances from 40 to 100 cm (from the ultrasonic sensor), the error between the ground-truth z-distance and the one measured by the sensor was less than 1 cm. Here, the ground-truth z-distance was measured by a laser distance measurement device [38]. In addition, with the actual faces of five people (who did not participate in the experiments of Section 4) at z-distances from 40 to 100 cm, the error between ground-truth and measured z-distances was about 2 cm.


This means that the z-distance measured by the ultrasonic sensor is smaller by about 2 cm than the ground-truth z-distance in most cases. Therefore, we used the compensated z-distance (z_comp), based on Equation (1), in our experiments of Section 4.

z_comp = z + 2   (1)

Figure 3. The measurement of z-distance with an ultrasonic sensor.

If an extended arm falls within the beam range of 80° to 140°, an incorrect z-distance can be measured. In order to solve this problem, we set the beam range so as to include only the user's head (not other parts of the body) by manually adjusting the direction of the ultrasonic sensor according to the sitting height of each user. During this adjustment, we asked the participant to keep their head still, although it takes little time (less than 10 s). After that, the other parts of the body (even with the arms extended forward) were not included in the beam range. This adjustment was performed just once (per participant) before the experiments, and each participant could freely move his or her head after this step. In conclusion, during the experiments, we did not ask the participants to keep their heads still, and they could move their heads freely.

Nevertheless, in the case that the user raises a hand close to his face, an incorrect z-distance from the hand can be measured, which causes an incorrect measurement of the user's head position and movement. To solve this problem, we manually deleted the data in this case by observing the captured video of the user in our experiment.

In addition, we compensated the measured z-distance by the height of the nose (using the manually measured height of the nose as the offset value for compensation) for a more accurate measurement of the z-distance of the user's face.

3.3. Acquisition of User Face Images and Measurement of the Amount of Head Movement

An unconstrained experimental environment is required for measuring the natural head movements of users [33]. Thus, images of users' faces were captured at 30 frames/s using a web-camera (Logitech C600 [39]) installed at the top-center position of the monitor, as indicated in Figure 1. An example of a captured image is shown in Figure 4. The size of the image is 1600 × 1200 pixels and the image is captured with RGB colors in three channels.
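A minimal sketch of this acquisition step is given below, using the OpenCV 2.4 capture interface stated in Section 4.1; the camera index, the number of captured frames, and the use of OpenCV tick counts as acquisition timestamps are assumptions of the sketch rather than details reported in the paper.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <vector>

int main() {
    cv::VideoCapture cap(0);  // camera index 0 is an assumption
    if (!cap.isOpened()) return -1;

    // Request 1600 x 1200 frames at 30 frames/s, matching the capture settings in the text.
    cap.set(CV_CAP_PROP_FRAME_WIDTH, 1600);
    cap.set(CV_CAP_PROP_FRAME_HEIGHT, 1200);
    cap.set(CV_CAP_PROP_FPS, 30);

    std::vector<cv::Mat> frames;
    std::vector<double> timestamps;  // acquisition time of each frame in seconds

    for (int i = 0; i < 100; ++i) {  // short illustrative sequence
        cv::Mat frame;
        if (!cap.read(frame)) break;
        frames.push_back(frame.clone());
        timestamps.push_back((double)cv::getTickCount() / cv::getTickFrequency());
    }
    return 0;
}
```

The recorded timestamps are later used to compute the head movement velocity, as described in Section 3.4.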

Figure 4. Example of a captured image.



It is not simple to directly measure the user's head position in 3D space; therefore, as a first step, we detect the user's face position in the acquired images using AdaBoost, implemented with the OpenCV library (version 2.4.2, Intel Corporation, Santa Clara, CA, USA [40]). Faces are detected using the provided Haar classifier cascade with a scale factor of 1.2. The face detection results of the AdaBoost method were manually checked again, and incorrectly detected results were excluded, in order to measure accurate head movement in our experiments. From that, we can enhance the reliability of the measured head movement. Then, we calculate the head position in 3D space based on a pinhole camera model and the z-distance measured by the ultrasonic sensor.

An example of the detected face region is shown in Figure 5. The center position of the face region is used as the user's head position in the image. With the head position information in continuous image frames, we can measure the amount of head movement in images.
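A minimal sketch of this detection step with the OpenCV 2.4 Haar cascade interface is shown below; the scale factor of 1.2 follows the text, whereas the cascade file, minNeighbors value, minimum face size, and the choice of the largest detection are assumptions of the sketch, not details reported in the paper.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/objdetect/objdetect.hpp>
#include <vector>

// Detect a face in a frame and return its center, which is used as the user's head
// position in the image. Returns false if no face is found.
bool detectHeadCenter(const cv::Mat& frame, cv::CascadeClassifier& cascade, cv::Point2f& center) {
    cv::Mat gray;
    cv::cvtColor(frame, gray, CV_BGR2GRAY);
    cv::equalizeHist(gray, gray);

    std::vector<cv::Rect> faces;
    // Scale factor 1.2 as stated in the text; minNeighbors and minimum size are assumed values.
    cascade.detectMultiScale(gray, faces, 1.2, 3, 0, cv::Size(80, 80));
    if (faces.empty()) return false;

    // Keep the largest detection (assumed to correspond to the user's face).
    cv::Rect best = faces[0];
    for (size_t i = 1; i < faces.size(); ++i)
        if (faces[i].area() > best.area()) best = faces[i];

    center = cv::Point2f(best.x + best.width / 2.0f, best.y + best.height / 2.0f);
    return true;
}
```

In practice, the classifier would first be loaded, for example with cascade.load("haarcascade_frontalface_alt.xml") (the exact cascade file used in the paper is not specified), and incorrect detections would still be removed by the manual check described above.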

Figure 5. Example of the detected face region (z-distance: 61 cm; center coordinate of face region: (859, 627)).

The amount of the user's head movement in 3D space can then be calculated based on a pinhole camera model and the z-distance measured by the ultrasonic sensor. The pinhole camera model is as follows. As illustrated in Figure 6, the amount of head movement in the image sensor (l) is measured from the movement of the face box detected by the AdaBoost method in the input images.

Figure 6. Measuring the amount of head movement in 3D space based on the pinhole camera model and z-distance.

With the focal length of the camera lens f, which is obtained by camera calibration, we can obtain the amount of head movement in 3D space (L of Figure 6) based on Equations (2) and (3). The z-distance of the user's face is measured using the ultrasonic sensor, as explained in Section 3.2.

L / z = l / f   (2)

L = (l × z) / f   (3)

The focal length of the camera lens f is measured as 1362 pixels and the amount of head movement in the image sensor (l) is measured from the center point of the face region in pixels. In addition, the z-distances measured by the ultrasonic sensor are in centimeters. Thus, the amount of the user's head movement in 3D space is also measured in centimeters based on Equation (3). From this and the maximum amount of the user's head movement in 3D space measured while performing the four tasks (playing a game, surfing the web, typing, and watching a movie), the range of camera viewing angles (A) can be calculated using Equation (4).

A = 2 tan⁻¹(|L| / z)   (4)
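A small worked sketch of Equations (3) and (4) is given below; the focal length of 1362 pixels is the calibrated value mentioned above, while the in-image movement and z-distance are illustrative inputs only, not measurements from the paper.

```cpp
#include <cmath>
#include <cstdio>

const double PI = 3.14159265358979323846;

// Equation (3): head movement in 3D space (cm) from the movement in the image (pixels),
// the z-distance (cm) and the focal length of the lens (pixels).
double headMovement3D(double l_pixels, double z_cm, double f_pixels) {
    return l_pixels * z_cm / f_pixels;
}

// Equation (4): camera viewing angle (degrees) needed to cover a head movement L at z-distance z.
double viewingAngleDeg(double L_cm, double z_cm) {
    return 2.0 * std::atan(std::fabs(L_cm) / z_cm) * 180.0 / PI;
}

int main() {
    const double f = 1362.0;  // focal length in pixels, obtained by camera calibration
    double l = 300.0;         // head movement in the image (pixels); illustrative value
    double z = 60.0;          // z-distance from the ultrasonic sensor (cm); illustrative value

    double L = headMovement3D(l, z, f);  // about 13.2 cm for these inputs
    double A = viewingAngleDeg(L, z);    // viewing angle required for this movement
    std::printf("L = %.2f cm, A = %.2f degrees\n", L, A);
    return 0;
}
```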

3.4. Measurement of Head Movement Velocity

After measuring the amount of head movement in images and in 3D space, as explained in the previous section, we can also obtain the velocity of the user's head movement in images and in 3D space, respectively. As depicted in Equation (5), the user's head movement velocity in images (v_image) is calculated based on the amount of head movement in images (l) and the time (t) of each image acquisition.

v_image = l / t   (5)

In addition, the user's head movement velocity in 3D space (v_3Dspace) can be calculated using the amount of head movement in 3D space (L) and the time (t) of each image acquisition, as presented in Equation (6).

v_3Dspace = L / t   (6)
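A minimal sketch of Equations (5) and (6) follows; taking the in-image movement l as the Euclidean distance between the face centers of two successive frames is an assumption of this sketch, and the numeric inputs are illustrative only.

```cpp
#include <cmath>
#include <cstdio>

// Equation (5): head movement velocity in the image (pixels/s) between two frames,
// where l is taken here as the Euclidean distance between the detected face centers.
double velocityInImage(double x1, double y1, double t1,
                       double x2, double y2, double t2) {
    double l = std::sqrt((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1));
    return l / (t2 - t1);
}

// Equation (6): head movement velocity in 3D space (cm/s), where L is the movement
// in 3D space obtained from Equation (3).
double velocityIn3D(double L_cm, double t1, double t2) {
    return L_cm / (t2 - t1);
}

int main() {
    // Two successive frames roughly 1/30 s apart; face-center coordinates are illustrative.
    double v_img = velocityInImage(859.0, 627.0, 0.000, 866.0, 630.0, 0.033);
    double v_3d  = velocityIn3D(0.33, 0.000, 0.033);  // 0.33 cm over one frame interval gives ~10 cm/s
    std::printf("v_image = %.1f pixels/s, v_3Dspace = %.1f cm/s\n", v_img, v_3d);
    return 0;
}
```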

In order to remove outliers from the measured velocity data, we use a Gaussian function, as presented in Equation (7), for graph fitting of the measured data. In Equation (7), µ and σ are the mean and standard deviation of the distribution, respectively, and K is the amplitude factor.

f(x) = (K / (σ√(2π))) · e^(−(x − µ)² / (2σ²))   (7)

4. Experimental Results

4.1. Experimental Environment and Description of Database

Ten participants performed four tasks (playing a game, surfing the web, typing, and watching a movie [41]) for about 15 min each in a desktop environment. During the experiments, successive images were captured at 30 frames/s using a web-camera (Logitech C600 [39]); the total number of images captured is 110,129. Detailed descriptions of the collected images are shown in Table 2. Experiments were carried out on a desktop computer with an Intel Core2 Quad Q6600 CPU of 2.4 GHz and 4 GB of RAM, with a 19-inch monitor. The program was implemented in C++ with the OpenCV library (version 2.4.2).

Table 2. The numbers of images during the experiments.

Tasks               Numbers of Images
Playing a game      28,579
Surfing the web     28,742
Typing              26,529
Watching a movie    26,279
Total               110,129


4.2. Analysis of the z-Distance of the User's Head

With the images collected, the amount of head movement in the z direction is analyzed. As explained in Section 3.2, we acquired z-distance data from the ultrasonic sensor for each image acquisition; a total of 110,129 z-distance data points were obtained. Most of the z-distances range from 40 to 80 cm, as shown in Figure 7, with a range interval of 10 cm. The probability of z-distance in Figure 7 means that the frequency of each z-distance range is represented as a probability. For example, assume that the measured frequencies of z-distance are 2 (40-49 cm), 5 (50-59 cm), and 3 (60-69 cm), respectively. Then, the probabilities of z-distance are 20% (=2/10) (40-49 cm), 50% (=5/10) (50-59 cm), and 30% (=3/10) (60-69 cm), respectively. These probabilities according to z-distances are shown in Figure 7.
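A small sketch of this frequency-to-probability conversion with 10-cm bins is given below; the sample values simply reproduce the illustrative 2/5/3 example above and are not measured data.

```cpp
#include <cstdio>
#include <vector>

// Convert z-distance samples (cm) into a probability distribution (in percent) over 10-cm bins,
// e.g., 40-49, 50-59, ..., as used in Figure 7.
std::vector<double> zDistanceProbabilities(const std::vector<double>& z_cm,
                                           double min_cm, double max_cm) {
    int num_bins = static_cast<int>((max_cm - min_cm) / 10.0);
    std::vector<int> counts(num_bins, 0);
    int total = 0;
    for (size_t i = 0; i < z_cm.size(); ++i) {
        int bin = static_cast<int>((z_cm[i] - min_cm) / 10.0);
        if (bin >= 0 && bin < num_bins) { counts[bin]++; total++; }
    }
    std::vector<double> prob(num_bins, 0.0);
    for (int b = 0; b < num_bins; ++b)
        if (total > 0) prob[b] = 100.0 * counts[b] / total;
    return prob;
}

int main() {
    // Illustrative samples only: 2, 5 and 3 samples in three consecutive bins.
    std::vector<double> z;
    z.push_back(42); z.push_back(45);
    z.push_back(51); z.push_back(52); z.push_back(55); z.push_back(57); z.push_back(58);
    z.push_back(61); z.push_back(63); z.push_back(68);
    std::vector<double> p = zDistanceProbabilities(z, 40.0, 100.0);
    for (size_t b = 0; b < p.size(); ++b)
        std::printf("%d-%d cm: %.0f%%\n", 40 + 10 * (int)b, 49 + 10 * (int)b, p[b]);
    return 0;
}
```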


Figure 7. z-distance distributions of subjects measured by the ultrasonic sensor when subjects were: (a) playing a game; (b) surfing the web; (c) typing; and (d) watching a movie; (e) the average distribution of all the z-distances from (a) to (d).

As shown in Figure 7a, when subjects were playing a game, the most frequent z-distances were in the range of 60 cm to 69 cm (56.48%), and almost 100% of z-distances were in the range of 40 cm to 79 cm. As presented in Figure 7b, when subjects were surfing the web, the two ranges of 50 cm to 59 cm and 60 cm to 69 cm contained a total of 69.93% of the z-distance distribution. In addition, 99.83% of z-distances were in the range of 40 cm to 79 cm.

When subjects were typing, as shown in Figure 7c, the most frequent z-distances were in the range of 60 cm to 69 cm (42.27%), and 99.72% of z-distances were in the range of 40 cm to 79 cm. As presented in Figure 7d, when subjects were watching a movie, the most frequent z-distances were 70 cm to 79 cm (48.5%), and 94.78% of z-distances ranged from 40 cm to 79 cm.

We did not set the initial z-distance of the participants' heads by force. Instead, we requested them to sit in front of the monitor naturally at the z-distance where they wanted to be. The reason why the measured z-distances of Figure 7 are different for each task is the characteristics of each task. For example, by comparing Figure 7a through Figure 7d, the z-distances when subjects were watching a movie were the largest of all cases. This is because the subjects tended to watch the movie from a greater distance. Figure 7e shows the average distribution of all the z-distances from Figure 7a through Figure 7d. The most frequent z-distances range from 60 cm to 69 cm (39.64%), and the range from 40 cm to 79 cm included 98.58% of z-distances.

Therefore, if the DOF of the camera for gaze tracking could support the range from 40 cm to 79 cm, about 98.58% of users' z-distances could be covered. However, in order to make the DOF of the camera lens cover this wide range of 40 cm (40 cm to 79 cm), the lens diameter would have to be greatly reduced, which makes the image darker [21] and makes pupil detection more difficult. Therefore, in our research, the lens DOF for the gaze tracking camera was designed to work in the range from 50 cm to 80 cm; based on the results shown in Figure 7e, about 88.3% of users' z-distances can be covered by our gaze tracking camera.

4.3. Analysis of the Amount of the User's Head Movement on the x- and y-Axes

The distributions of subjects' head movements on the x- (horizontal direction) and y- (vertical direction) axes are presented in Figure 8a through Figure 8d. The users' head movements were acquired while users were performing the four tasks, as explained in Section 3.3. The origin of Figure 8 is the x- and y-position (in 3D space) through which the camera axis (passing through the lens center and the image sensor center of the web-camera of Figure 1b) passes.



Figure 8. Subjects' head movements on the x- and y-axes in 3D space when: (a) playing a game; (b) surfing the web; (c) typing; and (d) watching a movie; (e) total distribution result that contains data from (a) to (d).

When users were playing a game, the maximum amount of head movement on the x- and y-axes was 11.65 cm and 11.84 cm, respectively, as shown in Figure 8a. When users were surfing the web, the maximum amount of head movement on the x- and y-axes was 19.66 cm and 21.32 cm, respectively, as shown in Figure 8b. When users were typing, the maximum amount of head movement on the x- and y-axes was 17.8 cm and 16.28 cm, respectively, as shown in Figure 8c. When users were watching a movie, the maximum amount of head movement on the x- and y-axes was 14.1 cm and 17.1 cm, respectively, as shown in Figure 8d.

The result of the total distribution from Figure 8a to Figure 8d is presented in Figure 8e; the maximum amount of head movement on the x- and y-axes was 22.19 cm and 21.32 cm, respectively. As shown in Figure 6, the angle of camera view can be calculated using the amount of head movement (L) and the z-distance (z) with Equation (4), presented in Section 3.3. For example, the angle of camera view (A) is calculated as 90° in the case that L and z are both 60 cm, based on Equation (4).

Based on that equation, the calculated maximum angles of camera view required for the four cases are described in Table 3, and the maximum angle of camera view over all the cases was determined as 20.72° (when subjects were web surfing). Therefore, the lens of our gaze tracking camera was designed so as to cover the viewing angle of 20.72°. In addition, all the measured results of Figures 7 and 8 are summarized in Table 3.

Table 3. The summarized results of Figures 7 and 8 (in °).

Tasks             Maximum Angle of View on x-Axis    Maximum Angle of View on y-Axis
Game              13.73                               17.69
Web surfing       20.72                               18.84
Typing            19.42                               19.09
Movie watching    16.54                               20.39

We then measured the angle of view of three other cameras in addition to the web-camera (Logitech C600 [39]), with a variety of camera lenses. We used three high-speed cameras (Point Grey [42]): the Gazelle, Grasshopper3, and Flea3 models. Lenses with focal lengths from 9 mm to 50 mm were applied to the four cameras. The angles of camera view for each camera and lens are shown in Table 4. In general, the smaller the focal length of the lens, the larger the resulting angle of camera view, as shown in Table 4.
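The angle of view of a camera-lens combination in Table 4 is determined by the focal length of the lens and the physical width of the camera's image sensor. As a hedged sketch (the sensor width used below is an assumed example value, not the specification of any of the cameras listed), the horizontal angle of view can be estimated as follows.

```cpp
#include <cmath>
#include <cstdio>

// Estimate the horizontal angle of view (degrees) of a camera from the physical
// sensor width (mm) and the focal length of the lens (mm).
double angleOfViewDeg(double sensor_width_mm, double focal_length_mm) {
    const double PI = 3.14159265358979323846;
    return 2.0 * std::atan(sensor_width_mm / (2.0 * focal_length_mm)) * 180.0 / PI;
}

int main() {
    double sensor_width_mm = 4.8;  // assumed sensor width for illustration; real cameras differ
    double focal_lengths[] = { 50.0, 35.0, 25.0, 22.0, 17.0, 9.0 };
    for (int i = 0; i < 6; ++i)
        std::printf("%.0f mm lens: %.2f degrees\n", focal_lengths[i],
                    angleOfViewDeg(sensor_width_mm, focal_lengths[i]));
    return 0;
}
```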


Based on the above results of Figure 8 and Table 3, the angle of camera view should be at least 20.72°. However, if the angle of the camera view is much larger than 20.72°, the camera captures a larger area in the image, which means the size of an object becomes smaller in the captured image. Consequently, the size of the user's eye is also smaller and the image resolution of the user's eyes in the captured image decreases. With a smaller eye image, the image sizes of the pupil and corneal SR also become smaller, and the accuracy of detecting the pupil and corneal SR is inevitably decreased. Because the gaze position is calculated based on the relative position of the pupil center and the center of the corneal SR (see details in Section 4.5), the final gaze detection accuracy is also decreased. Considering these factors, we prefer an angle of view close to 20.72° for our gaze tracking camera. Based on this criterion, we can confirm that the 25-mm lens applied to the Gazelle and Grasshopper3 cameras provides a sufficient angle of view for our gaze tracking system. In the case of the Flea3 camera and the C600 web-camera, the 9-mm lens is sufficient to cover the user's head movement.

Table 4. The angles of camera view for each camera and lens (in °).

Focal Length of Lens    Gazelle    Grasshopper3    Flea3    C600 Web-Camera
50 mm                   10.63      10.96           3.89     4.51
35 mm                   16.5       16.83           6.01     6.52
25 mm                   23.83      24.16           8.58     9.27
22 mm                   25.25      25.59           9.44     10.57
17 mm                   32.5       32.85           12.24    13.56
9 mm                    61.12      61.47           22.91    25.21

4.4. Analysis of the User's Head Movement Velocity

Subjects' head movement velocity was calculated using Equation (5). Velocities in successive captured images and in 3D space are shown in Figures 9 and 10, respectively. The probabilities of velocities in the images are shown in Figure 9a through Figure 9d for subjects playing a game, surfing the web, typing, and watching a movie, respectively. The average distribution over Figure 9a through Figure 9d is also shown in Figure 9e. Figure 10 shows the amount (magnitude) of the velocity of the user's head movement in 3D space (not a 3D plot), whereas Figure 9 shows that in the 2D image. Therefore, the unit of the x-axis of Figure 10 is cm/s, whereas that of Figure 9 is pixels/s.


Figure 9. Subjects' head movement velocities in successive captured images while: (a) playing a game; (b) surfing the web; (c) typing; and (d) watching a movie; (e) the average velocity distribution from (a) to (d).

Figure 10. Subjects’ head movement velocities in 3D space while: (a) playing a game; (b) surfing the web; (c) typing; and (d) watching a movie; (e) the average velocity distribution from (a) to (d).

Based on the pinhole camera model in Section 3.3, the velocity of subjects’ head movement in 3D space can be calculated. The probabilities of velocities in 3D space are shown in Figure 10a through Figure 10d as subjects performed the four tasks. The average distribution of Figure 10a through Figure 10d is also shown in Figure 10e.

As shown in Figure 8, we obtain the x- and y-movement (in cm) of the user’s head in 3D space based on the x- and y-movement in the image (measured by the AdaBoost method) and the z-distance (measured by the ultrasonic sensor). Because the time interval between two successive images in the experiments of Figure 8 is about 0.033 (=1/30) s (from the image capturing speed of 30 frames/s), we can obtain the amount (magnitude) of velocity (in cm/s) of the user’s head movement in 3D space. For example, if the x- and y-movement of the user’s head in 3D space is 0.33 cm, the magnitude of the velocity of the user’s head movement in 3D space becomes 10 (=0.33 cm/0.033 s) cm/s. From this, Figure 10a–e were obtained.

The probability of velocity means that the frequency of each velocity is presented as a probability. For example, assume that the measured frequencies of velocity are 2 (at the velocity of 10 pixels/s), 5 (at the velocity of 20 pixels/s), and 3 (at the velocity of 30 pixels/s), respectively. Then, the probabilities of velocity are 20% (=2/10) (at the velocity of 10 pixels/s), 50% (=5/10) (at the velocity of 20 pixels/s), and 30% (=3/10) (at the velocity of 30 pixels/s), respectively. These probabilities according to velocities are shown in Figures 9 and 10.

The reason for expressing the frequency of velocity measurements as a probability is as follows. In the case of expressing it as a frequency, the total sum of all the frequencies is the total number of images. For example, if expressed as a frequency, the total sum of all the frequencies of Figure 9a (or Figure 10a) equals the total number of images (28,579) in the case of playing a game in Table 2.
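The sketch below illustrates the steps just described under stated assumptions: converting an image-plane displacement into a 3D displacement with the pinhole relation and the measured z-distance, turning it into a velocity magnitude at 30 frames/s, and normalizing the velocity histogram into probabilities. The focal-length value is a hypothetical placeholder, not the calibrated value used in the paper.

```python
# Hedged sketch: 3D head-movement velocity (cm/s) and its probability distribution.
# focal_length_pixels is a hypothetical calibration value used only for illustration.
from collections import Counter

FRAME_INTERVAL_S = 1.0 / 30.0  # 30 frames/s capture rate

def image_to_3d_cm(displacement_pixels: float, z_distance_cm: float,
                   focal_length_pixels: float = 1200.0) -> float:
    """Pinhole relation l = f * L / z  =>  L = l * z / f (lateral movement)."""
    return displacement_pixels * z_distance_cm / focal_length_pixels

def velocity_cm_per_s(displacement_pixels: float, z_distance_cm: float) -> float:
    """Velocity magnitude of lateral head movement in 3D space."""
    return image_to_3d_cm(displacement_pixels, z_distance_cm) / FRAME_INTERVAL_S

def velocity_probabilities(velocities, bin_width=1.0):
    """Normalize binned velocity frequencies so that they sum to 1 (i.e., probabilities)."""
    bins = Counter(round(v / bin_width) * bin_width for v in velocities)
    total = sum(bins.values())
    return {v: count / total for v, count in sorted(bins.items())}

# Example matching the text: 0.33 cm of lateral movement per frame -> about 10 cm/s.
print(0.33 / FRAME_INTERVAL_S)
```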


As shown in Table 2, there is a small difference among the total numbers of images in each task (playing a game, surfing the web, typing, and watching a movie), so the total sums of the frequencies can differ slightly. Therefore, we express the frequencies as probabilities in order to make the total sum the same (equal to 1) in all of Figure 9a–d (or Figure 10a–d). From that, a more accurate analysis of head velocity is possible without the bias caused by the differences in the total number of images.

For statistical analysis and outlier removal, we fit a Gaussian function, as in Equation (7), to each velocity distribution. Tables 5 and 6 show the head movement velocities in successive images and in 3D space as analyzed with Gaussian fitting. As explained in Section 3.4, based on the 3σ principle, values less than µ + σ, µ + 2σ, and µ + 3σ can be regarded as having confidence levels of 68.27%, 95.45%, and 99.73%, respectively [43]. A minimal fitting sketch is given after Table 5.

Table 5. The velocities of head movement in successive images through the analysis based on Gaussian fitting (in pixels/s).

Tasks | µ | µ + σ (68.27%) | µ + 2σ (95.45%) | µ + 3σ (99.73%)
Game | 11 | 49.39 | 87.78 | 126.18
Web surfing | 11 | 47.57 | 84.15 | 120.72
Typing | 21 | 60.13 | 99.25 | 138.38
Movie watching | 19 | 43.91 | 68.82 | 93.73
Average | 11 | 49.66 | 88.32 | 126.98
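As announced above, a minimal sketch of the Gaussian-fitting step follows. It assumes that Equation (7) is a scaled Gaussian of the form K·exp(−(x − µ)²/(2σ²)) and uses scipy’s curve_fit, which may differ from the exact fitting procedure used in the paper.

```python
# Hedged sketch: fit a scaled Gaussian to a velocity probability distribution and
# report the mu + k*sigma thresholds used in Tables 5 and 6 (68.27%/95.45%/99.73%).
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, mu, sigma, K):
    return K * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def fit_velocity_distribution(velocity_bins, probabilities):
    """velocity_bins, probabilities: 1D arrays describing the measured distribution."""
    velocity_bins = np.asarray(velocity_bins, dtype=float)
    probabilities = np.asarray(probabilities, dtype=float)
    p0 = (np.average(velocity_bins, weights=probabilities),
          np.std(velocity_bins),
          probabilities.max())
    (mu, sigma, K), _ = curve_fit(gaussian, velocity_bins, probabilities, p0=p0)
    sigma = abs(sigma)
    return {"mu": mu, "sigma": sigma, "K": K,
            "mu+1s": mu + sigma, "mu+2s": mu + 2 * sigma, "mu+3s": mu + 3 * sigma}
```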

In this research, we measured only the amount (magnitude) of head-movement velocity (not its direction), based on the amounts of horizontal and vertical head movement of Figure 8, as shown in Figures 9 and 10. That is because only the magnitude of head-movement velocity is necessary for determining the size of the eye-searching region in successive images. As shown in Figure 8, the direction of head movement is random (without any directional trend). Therefore, only the magnitude of head-movement velocity is used for the determination of the eye-searching region. Based on the results in Figures 9 and 10 and Tables 5 and 6, we determined the size of the region to be searched for eye detection in successive images using the average velocity value at the position µ + 3σ, which we can expect to contain the eye area in almost 99.73% of cases. A short sketch of how this µ + 3σ velocity can be converted into a per-frame search margin is given after Table 6. As explained in Section 3.4, the parameters of the Gaussian fitting of Equation (7) were adaptively determined for each of the distributions in Figures 9 and 10. For example, in the case of Figure 10a, µ, σ, and K are 0.55, 1.62, and 0.19, respectively, and in the case of Figure 10c, µ, σ, and K are 0.95, 1.58, and 0.15, respectively.

Table 6. Head movement velocities in 3D space based on Gaussian fitting analysis (in cm/s).

Tasks | µ | µ + σ (68.27%) | µ + 2σ (95.45%) | µ + 3σ (99.73%)
Game | 0.55 | 2.17 | 3.78 | 5.40
Web surfing | 0.45 | 1.94 | 3.44 | 4.94
Typing | 0.95 | 2.53 | 4.11 | 5.69
Movie watching | 0.55 | 1.97 | 3.39 | 4.81
Average | 0.55 | 2.14 | 3.74 | 5.33
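As mentioned before Table 6, one way the µ + 3σ velocity could be converted into a per-frame search margin around the previous eye position is sketched below; the numbers and the box-expansion scheme are illustrative assumptions, not the exact values used in the paper.

```python
# Hedged sketch: derive an eye-search region for the next frame from the mu + 3*sigma
# head-movement velocity (pixels/s) at a 30 frames/s capture rate.
FRAME_INTERVAL_S = 1.0 / 30.0

def search_region(prev_eye_box, velocity_mu_plus_3sigma_px_per_s=126.98):
    """prev_eye_box = (x, y, w, h) of the eye region detected in the previous frame."""
    margin = velocity_mu_plus_3sigma_px_per_s * FRAME_INTERVAL_S  # max expected shift (pixels)
    x, y, w, h = prev_eye_box
    return (x - margin, y - margin, w + 2 * margin, h + 2 * margin)

# Example: with the average mu + 3*sigma of Table 5 (126.98 pixels/s), the expected
# inter-frame shift is about 4.2 pixels, so the search box is enlarged by that amount per side.
print(search_region((100, 80, 60, 40)))
```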

The z-directional velocity of head movement was not measured in our research because it has less effect on the size of the eye-searching region than the velocity of horizontal and vertical head movement. This can be shown as follows. Figure 11a,b, respectively, present the two images (before and after head movement in the vertical direction) captured by our gaze tracking camera. In addition, Figure 11c,d, respectively, present the two images (before and after head movement in the z-direction, approaching the camera) captured


by our gaze tracking camera. The movement velocity between Figure 11a,b is the same as that between Figure 11c,d. As shown in Figure 11, the amount of eye movement in the image between Figure 11a,b is much larger than that between Figure 11c,d. This is due to the principle of the pinhole camera model (l = fL/z, based on Equation (2)). That is, even with the same amount of head movement, the amount of movement in 3D space (L) has a larger effect on the measured movement on the image sensor (l) than movement in the z-direction (z), based on this equation. Therefore, the eye-searching region should be much larger in the case of horizontal or vertical head movement compared to that of z-direction movement.
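The following numerical sketch, using hypothetical values for the focal length, z-distance, and eye offset from the optical axis, illustrates why a given displacement in the x–y plane shifts the eye much more in the image than the same displacement along the z-axis.

```python
# Hedged sketch: compare image displacement caused by lateral vs. z-direction head movement
# under the pinhole relation l = f * L / z. All numbers below are illustrative assumptions.
f_px = 1200.0      # focal length in pixels (hypothetical)
z_cm = 65.0        # working distance (cm)
L_cm = 3.0         # lateral offset of the eye from the optical axis (cm)
move_cm = 2.0      # identical head displacement applied laterally and along z

# Lateral movement: L changes by move_cm at constant z.
lateral_shift_px = f_px * ((L_cm + move_cm) / z_cm - L_cm / z_cm)

# z-direction movement: z changes by move_cm at constant L.
z_shift_px = f_px * (L_cm / (z_cm - move_cm) - L_cm / z_cm)

print(f"lateral: {lateral_shift_px:.1f} px, z-direction: {z_shift_px:.1f} px")
# Typical output: a lateral shift of tens of pixels vs. only a few pixels for z movement.
```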

Figure 11. Images captured by our gaze tracking camera: (a) before head movement in the vertical direction; (b) after movement in the vertical direction; (c) before head movement in the z-direction; and (d) after movement in the z-direction (approaching the camera).

4.5. Performance Evaluation of Our Gaze Tracking System with Analyses

We designed our gaze detection camera and system based on the results in Sections 4.2–4.4. Since our gaze tracking systems utilized two cameras, the Gazelle and Grasshopper3, with 25- and 35-mm lenses, we compared gaze tracking accuracies for the different hardware.

In our research, the user’s gaze position was calculated based on the detected pupil center and corneal SR center using a geometric transform [44]. The pupil center is located based on image binarization, morphological processing, and ellipse fitting. The corneal SR center is detected based on image binarization, component labeling, and calculation of the geometric center of the SR area. The user’s head movement is compensated based on the relative positional information between the pupil center and corneal SR center. Then, based on the information of the user’s calibration (each user gazes at nine calibration points on the monitor at the initial stage), four matrices of geometric transform are obtained, and the user’s final gaze position on the monitor is calculated using these matrices. More detailed explanations of the gaze tracking method are included in our previous paper [44]. Because our research is not focused on the gaze tracking method itself, we refer the reader to [44] for more detailed explanations of the gaze tracking system and method.

The lens DOF for our gaze tracking camera was designed to work in the range from 50 cm to 80 cm, as explained in Section 4.2, based on the results shown in Figure 7e. Therefore, the distances between the user’s eyes and the eye tracker were from 50 cm to 80 cm. Those between the user’s eyes and the monitor were also 50 cm to 80 cm because the eye tracker was set below the monitor, as shown in Figure 12. As shown in Figure 12, we did not use a chin-rest.
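To make the pupil and corneal SR detection steps described above more concrete, the sketch below shows one possible OpenCV realization of binarization, morphology, and ellipse fitting; the thresholds and kernel sizes are assumptions, and this is a sketch rather than the authors’ exact implementation (see [44]).

```python
# Hedged sketch: locate the pupil center (binarization + morphology + ellipse fitting) and
# the corneal SR center (bright-spot binarization + largest-component centroid) in an eye image.
# Threshold values and kernel sizes are illustrative assumptions.
import cv2
import numpy as np

def pupil_center(eye_gray):
    # The pupil is dark under NIR illumination: threshold, clean up, then fit an ellipse.
    _, dark = cv2.threshold(eye_gray, 60, 255, cv2.THRESH_BINARY_INV)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    dark = cv2.morphologyEx(dark, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(dark, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if len(largest) < 5:          # cv2.fitEllipse needs at least 5 points
        return None
    (cx, cy), _, _ = cv2.fitEllipse(largest)
    return (cx, cy)

def corneal_sr_center(eye_gray):
    # The corneal specular reflection is a small bright spot: threshold and take its centroid.
    _, bright = cv2.threshold(eye_gray, 230, 255, cv2.THRESH_BINARY)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(bright)
    if n < 2:
        return None
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))  # skip background label 0
    return tuple(centroids[largest])
```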


Figure 12. Experimental setup of our gaze tracking system.

Figure 12. Experimental setup of our gaze tracking system. Figure 12. Experimental setup of our when gaze tracking system. looked at 20 reference Experiments were performed with images captured 10 participants

positions on a were 19-inch monitor with a resolution of 1280 ×when 960 pixels; each participant didat five Experiments performed with images captured 10 participants looked 20trials. reference Experiments were performed with images captured when 10 participants lookedgaze at 20tracking reference They had no experience with the system of eye-tracking and gaze input. The average positions on a 19-inch monitor with a resolution of 1280 × 960 pixels; each participant did five trials. positions onof a 19-inch monitor(male: with a5,resolution 1280 ×age: 960 pixels; each participant did five1.71) trials. the 10 subjects female: 5, of average deviation of gaze age: They accuracy had no experience with the system of eye-tracking and 26.4, gazestandard input. The average tracking They had no experience the in system eye-tracking andgaze gaze input. The average gaze tracking over five each arewith shown Table of 7. 5, Figure 13 shows tracking across different accuracy of thetrials 10 subjects (male: 5, female: average age: 26.4, standardaccuracies deviation of age: 1.71) over accuracy ofand thelenses. 10 subjects (male: 5, female: 5, average age: standard deviation age: 1.71) cameras As shown in Table 7 and Figure 13, we can26.4, confirm that gaze trackingof accuracies five trials each areeach shown in Tablein7.Table Figure 13 shows gaze tracking accuracies across across different cameras over are shown 7. Figure 13 shows gaze tracking different byfive our trials system (designed based on the results of Sections 4.2–4.4) are high,accuracies and the average error of and lenses. As shown in Table 7 and Figure 13, we can confirm that gaze tracking accuracies by our cameras and lenses. As shown in Table anderror Figure we can confirm tracking accuracies gaze tracking is about less than 0.9°.7The is 13, calculated based onthat the gaze disparity between the system (designed based on the results of Sections 4.2–4.4) are high, and the average error of gaze by reference our system (designed based on the gaze results of Sections 4.2–4.4) are high, and the average error of position and the calculated position. ◦ tracking is aboutisless than 0.9than . The error calculated based on the disparity between the reference gaze tracking about less 0.9°. Theis error is calculated based on the disparity between the position and the calculated gaze position. reference position and the calculated gaze position.
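The angular error can be derived from this disparity roughly as sketched below; the screen pixel pitch and viewing distance used here are hypothetical values for illustration, not the calibrated values from the experiments.

```python
# Hedged sketch: convert the on-screen disparity between the reference point and the
# estimated gaze point into an angular gaze error (degrees).
import math

def gaze_error_deg(reference_px, estimated_px, pixel_pitch_cm=0.03, z_distance_cm=65.0):
    dx = (estimated_px[0] - reference_px[0]) * pixel_pitch_cm
    dy = (estimated_px[1] - reference_px[1]) * pixel_pitch_cm
    disparity_cm = math.hypot(dx, dy)
    return math.degrees(math.atan2(disparity_cm, z_distance_cm))

# Example: a 20-pixel disparity at 65 cm corresponds to roughly 0.5 degrees.
print(f"{gaze_error_deg((640, 480), (652, 496)):.2f} deg")
```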


Figure 13. Comparisons of gaze tracking accuracies using: (a) Gazelle camera with 35 mm lens; (b) Gazelle camera with 25 mm lens; (c) Grasshopper3 camera with 35 mm lens; and (d) Grasshopper3 camera with 25 mm lens.


As shown in Table 4, the 25-mm lens is required to cover the user’s head movements. However, gaze tracking accuracies with this lens were slightly lower than those obtained with the 35-mm lens, as shown in Table 7. In general, with a camera lens of shorter focal length, the angle of camera view increases and a larger area can be captured by the camera [45]. Therefore, a larger area is captured with the 25-mm lens than with the 35-mm lens on the same camera, which means that the size of an object becomes smaller in the captured image with the 25-mm lens. Consequently, the size of the user’s eye (iris) is also smaller, and the image resolution of the user’s eyes in the captured image with the 25-mm lens decreases compared to the 35-mm lens on the same camera, as shown in Table 7. With the smaller eye image, the image sizes of the pupil and corneal SR also become smaller, and the accuracy of detecting the pupil and corneal SR is inevitably decreased. Because the gaze position is calculated based on the relative position of the pupil center and the center of the corneal SR, the final gaze detection accuracy is also lower with the lens of shorter focal length than with the longer focal length (if all other conditions are assumed to be the same). To show this irrespective of the kind of camera, we report the results for two different cameras in Table 7. Therefore, we posit that lenses whose focal lengths are between 25 and 35 mm are suitable for use in our gaze tracking camera.

Table 7. Comparisons of gaze tracking accuracies and iris diameter in the captured image.

Various Cameras with Lenses | Gaze Detection Accuracy (◦ ) | Iris Diameter (Pixels)
Gazelle with 35 mm lens | 0.57 | 117
Gazelle with 25 mm lens | 0.84 | 82
Grasshopper3 with 35 mm lens | 0.56 | 121
Grasshopper3 with 25 mm lens | 0.9 | 87

In addition, the sizes of the users’ irises are also compared in Table 7. Because a camera with a 35-mm lens has a narrower angle of view than one with a 25-mm lens, the size of the iris in an image captured with a 35-mm lens is larger than with a 25-mm lens.

4.6. Comparative Subjective Tests on Usability and Analyses

As a final experiment, we compared the usability of our gaze tracking system and a commercial gaze tracking system, TheEyeTribe [23], in terms of user convenience and interest. For this experiment, 10 users (who participated in the experiment of Section 4.5) did web surfing for 15 min with our gaze tracking system and with TheEyeTribe [23], respectively. For fair comparison, we requested them to do web surfing using the same website (http://www.naver.com) on the same date. Each user moved his or her gaze position using our gaze tracking system and TheEyeTribe, respectively, and selected links on the website by clicking the mouse button. That is, the participants used a multimodal selection technique combining the gaze position on screen with a manual click of the left mouse button. Our system and TheEyeTribe provide feedback about the current gaze position on screen by generating the Microsoft Windows operating system (OS) mouse-movement message. Then, we asked the 10 users to rate the convenience of use and their interest in our system and TheEyeTribe on a 5-point scale (5: very high, 4: high, 3: normal, 2: low, 1: very low) by questionnaire. Here, the notion of interest is linked to the amusement the user experiences when using one of the systems. Each person tried this procedure (web surfing for about 15 min and the subjective test) five times. As explained in Section 4.5, the 10 users were composed of five males and five females with an average age of 26.4 (standard deviation of age: 1.71). Three users wore glasses, three users wore contact lenses, and the other four users wore neither glasses nor contact lenses. The y-axis of Figure 14 represents these points (from 1 to 5).



Figure 14. Comparisons of user convenience and interest with our system and the conventional system: (a) user convenience; and (b) user interest.

As shown in Figure 14a, the users thought that our gaze tracking system is more convenient than TheEyeTribe. It means that people regard our gaze tracking system as more convenient and easier to use when moving their gaze position with mouse-button clicking in order to do web surfing. In addition, as shown in Figure 14b, the users thought that our gaze tracking system is more interesting than TheEyeTribe. It means that people regard our system as more interesting when moving their gaze position with mouse-button clicking in order to do web surfing.

The lens DOF for our gaze tracking camera was designed to work in the range from 50 to 80 cm; based on the results shown in Figure 7e, about 88.3% of users’ z-distances can be covered by our gaze tracking camera. However, according to the specification of TheEyeTribe [23], its operating range is from 45 cm to 75 cm, which is closer to the monitor (by 5 cm) compared to our system. As shown in Figure 7e, the probability of a user’s z-distance is higher than 13% in the range from 75 to 79 cm. Nevertheless, TheEyeTribe cannot cover this z-distance range, which lowers its rating scores for convenience and interest. Indeed, we asked the participants after the experiments, and they responded that this was the main cause of the low rating scores.

In addition, in order to show that the user convenience with our system is statistically higher than that with the commercial system, we performed a t-test [46]. The t-test has been widely used for statistical analysis of the difference between two measured values. Although the resolution of the five-point rating scale for each system is not very high and the database is rather small for a significance test, we performed the t-test. The t-test was performed using two independent samples: user convenience with our system (µ = 3.87, σ = 0.69) and with the commercial system (µ = 2.8, σ = 0.5). The calculated p-value was about 9.2 × 10−4 , which is smaller than the 99% (0.01) significance level. In general, the null hypothesis for the t-test, that there is no difference between the two independent samples, is defined, and the null hypothesis is rejected in the case that the calculated p-value is smaller than the significance level [46]. The rejection means that there exists a significant difference between the two independent samples. Therefore, we can conclude that there is a significant difference between user convenience with our system and the commercial system at the significance level of 99%.

In addition, the t-test was performed using two independent samples: user interest with our system (µ = 4.13, σ = 0.72) and with the commercial system (µ = 3.0, σ = 0.59). The calculated p-value was about 1.2 × 10−3 , which is smaller than the 99% (0.01) significance level. Therefore, we can conclude that there is a significant difference between user interest with our system and the commercial system at the significance level of 99%.
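The reported p-values, and the effect sizes discussed below, can be approximately reproduced from these summary statistics, assuming one averaged rating per participant (n = 10 per group) and equal-variance samples; the sketch below uses scipy for the two-sample t-test and computes Cohen’s d from the pooled standard deviation.

```python
# Hedged sketch: two-sample t-test from summary statistics and Cohen's d (pooled SD).
# Assumes n = 10 independent ratings per group, which is an interpretation of the setup.
import math
from scipy.stats import ttest_ind_from_stats

def compare(mean1, std1, mean2, std2, n=10):
    t, p = ttest_ind_from_stats(mean1, std1, n, mean2, std2, n, equal_var=True)
    pooled_sd = math.sqrt((std1 ** 2 + std2 ** 2) / 2.0)
    cohens_d = (mean1 - mean2) / pooled_sd
    return t, p, cohens_d

print(compare(3.87, 0.69, 2.80, 0.50))  # convenience: p ~ 9e-4, d ~ 1.78
print(compare(4.13, 0.72, 3.00, 0.59))  # interest:    p ~ 1e-3, d ~ 1.72
```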


In addition, we performed a Cohen’s d analysis, by which the size of the difference between the two groups can be expressed as an effect size [47]. Cohen’s d analysis has also been widely used for analyzing the difference between two measured values. In general, Cohen’s d is classified as small at about 0.2–0.3, as medium at about 0.5, and as large at greater than or equal to 0.8. If the calculated Cohen’s d is closer to 0.2–0.3 than to 0.5 and 0.8, the difference between the measured values has a small effect size; if it is closer to 0.8 than to 0.2–0.3 and 0.5, the difference has a large effect size [47]. The calculated Cohen’s d for user convenience was about 1.78 (closer to 0.8 than to 0.2–0.3 and 0.5), from which we can conclude that the difference in user convenience between our system and the commercial system is a large effect. The calculated Cohen’s d for user interest was about 1.72 (closer to 0.8 than to 0.2–0.3 and 0.5), from which we can conclude that the difference in user interest between our system and the commercial system is also a large effect. From the t-test and Cohen’s d analysis, we can conclude that there is a significant difference in user convenience and interest between our system and the commercial system.

5. Conclusions

In this study, we introduced an empirical design procedure for the optimal gaze tracking camera lens based on experimental measurements of the amount and velocity of users’ head movement. Accurate measurements of the amount and velocity of head movement were made with a web-camera and an ultrasonic sensor while each user played a game, surfed the web, typed, and watched a movie in a desktop environment. The user’s head position in successive images is automatically located using AdaBoost. The amount and velocity of head movement in 3D space are obtained using a pinhole camera model and the z-distance measured by the ultrasonic sensor. Based on these measurements, we determine the optimal viewing angle and DOF of the camera lens. We designed our gaze tracking system using a lens meeting these specifications, and the accuracy of our gaze tracking system was compared with those obtained using various other cameras and lenses. In addition, the user convenience of our system was compared to that of a commercial system. The results of our empirical study should be helpful for developers of gaze tracking cameras and systems.

In future work, we will extend our empirical study to other kinds of gaze tracking systems, such as wearable systems or those that include panning, tilting, and auto-focusing functionality. In addition, we will perform comparative experiments using our ultrasonic sensor and a commercial depth camera. Moreover, although we tried our best to obtain generalized conclusions from our experiments, there exist various factors not considered in our experiments. Other factors related to users’ preferences and activities can affect the results, and we will also consider these factors in future research.
Acknowledgments: This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2015R1D1A1A01056761), in part by the Bio & Medical Technology Development Program of the NRF funded by the Korean government, MSIP (NRF-2016M3A9E1915855), and in part by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2016-H8501-16-1014) supervised by the IITP (Institute for Information & communications Technology Promotion).

Author Contributions: Weiyuan Pan and Kang Ryoung Park designed and implemented the overall gaze tracking system with experiments. Dongwook Jung, Hyo Sik Yoon, Dong Eun Lee, Rizwan Ali Naqvi, and Kwan Woo Lee helped with the experiments and the collection of data.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Sibert, L.E.; Jacob, R.J.K. Evaluation of Eye Gaze Interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Hague, The Netherlands, 1–6 April 2000; pp. 281–288.


2. Noris, B.; Keller, J.-B.; Billard, A. A wearable gaze tracking system for children in unconstrained environments. Comput. Vis. Image Underst. 2011, 115, 476–486. [CrossRef]
3. Franchak, J.M.; Kretch, K.S.; Soska, K.C.; Adolph, K.E. Head-mounted eye tracking: A new method to describe infant looking. Child Dev. 2011, 82, 1738–1750. [CrossRef] [PubMed]
4. Ryan, W.J.; Duchowski, A.T.; Birchfield, S.T. Limbus/Pupil Switching for Wearable Eye Tracking Under Variable Lighting Conditions. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications, Savannah, GA, USA, 26–28 March 2008; pp. 61–64.
5. Lukander, K.; Jagadeesan, S.; Chi, H.; Müller, K. OMG!—A New Robust, Wearable and Affordable Open Source Mobile Gaze Tracker. In Proceedings of the 15th International Conference on Human-Computer Interaction with Mobile Devices and Services, Munich, Germany, 27–30 August 2013; pp. 408–411.
6. Gneo, M.; Schmid, M.; Conforto, S.; D’Alessio, T. A free geometry model-independent neural eye-gaze tracking system. J. NeuroEng. Rehabil. 2012, 9. [CrossRef] [PubMed]
7. Agustin, J.S.; Skovsgaard, H.; Mollenbach, E.; Barret, M.; Tall, M.; Hansen, D.W.; Hansen, J.P. Evaluation of a Low-Cost Open-Source Gaze Tracker. In Proceedings of the 2010 Symposium on Eye Tracking Research & Applications, Austin, TX, USA, 22–24 March 2010; pp. 77–80.
8. Lee, J.M.; Lee, H.C.; Gwon, S.Y.; Jung, D.; Pan, W.; Cho, C.W.; Park, K.R.; Kim, H.-C.; Cha, J. A new gaze estimation method considering external light. Sensors 2015, 15, 5935–5981. [CrossRef] [PubMed]
9. Morimoto, C.H.; Mimica, M.R.M. Eye gaze tracking techniques for interactive applications. Comput. Vis. Image Underst. 2005, 98, 4–24. [CrossRef]
10. Ohno, T.; Mukawa, N.; Yoshikawa, A. FreeGaze: A Gaze Tracking System for Everyday Gaze Interaction. In Proceedings of the 2002 Symposium on Eye Tracking Research & Applications, New Orleans, LA, USA, 25–27 March 2002; pp. 125–132.
11. Yoo, D.H.; Chung, M.J. A novel non-intrusive eye gaze estimation using cross-ratio under large head motion. Comput. Vis. Image Underst. 2005, 98, 25–51. [CrossRef]
12. Chen, J.; Tong, Y.; Gray, W.; Ji, Q. A Robust 3D Eye Gaze Tracking System Using Noise Reduction. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications, Savannah, GA, USA, 26–28 March 2008; pp. 189–196.
13. Zhu, Z.; Ji, Q. Novel eye gaze tracking techniques under natural head movement. IEEE Trans. Biomed. Eng. 2007, 54, 2246–2260. [PubMed]
14. Guestrin, E.D.; Eizenman, M. Remote Point-of-Gaze Estimation with Single-Point Personal Calibration Based on the Pupil Boundary and Corneal Reflections. In Proceedings of the 24th Canadian Conference on Electrical and Computer Engineering, Niagara Falls, ON, Canada, 8–11 May 2011; pp. 000971–000976.
15. Nagamatsu, T.; Sugano, R.; Iwamoto, Y.; Kamahara, J.; Tanaka, N. User-Calibration-Free Gaze Tracking with Estimation of the Horizontal Angles Between the Visual and the Optical Axes of Both Eyes. In Proceedings of the 2010 Symposium on Eye Tracking Research & Applications, Austin, TX, USA, 22–24 March 2010; pp. 251–254.
16. Lu, F.; Okabe, T.; Sugano, Y.; Sato, Y. A Head Pose-free Approach for Appearance-based Gaze Estimation. In Proceedings of the 22nd British Machine Vision Conference, University of Dundee, Dundee, UK, 29 August–2 September 2011; pp. 1–11.
17. Zhu, Z.; Ji, Q. Eye Gaze Tracking under Natural Head Movements. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; pp. 918–923.
18. Coutinho, F.L.; Morimoto, C.H. Free Head Motion Eye Gaze Tracking Using a Single Camera and Multiple Light Sources. In Proceedings of the 19th Brazilian Symposium on Computer Graphics and Image Processing, Manaus, Brazil, 8–11 October 2006; pp. 171–178.
19. Coutinho, F.L.; Morimoto, C.H. Improving head movement tolerance of cross-ratio based eye trackers. Int. J. Comput. Vis. 2013, 101, 459–481. [CrossRef]
20. Kim, K.-N.; Ramakrishna, R.S. Vision-Based Eye-gaze Tracking for Human Computer Interface. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Tokyo, Japan, 12–15 October 1999; pp. 324–329.
21. f-Number. Available online: https://en.wikipedia.org/wiki/F-number (accessed on 11 May 2016).
22. Cho, D.-C.; Kim, W.-Y. Long-range gaze tracking system for large movements. IEEE Trans. Biomed. Eng. 2013, 60, 3432–3440. [CrossRef] [PubMed]
23. TheEyeTribe. Available online: https://theeyetribe.com/ (accessed on 1 June 2016).


24. Tobii. Available online: http://www.tobii.com (accessed on 22 July 2016).
25. Kinect. Available online: https://en.wikipedia.org/wiki/Kinect (accessed on 22 July 2016).
26. Mora, K.A.F.; Odobez, J.-M. Gaze Estimation from Multimodal Kinect Data. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 25–30.
27. Mallick, T.; Das, P.P.; Majumdar, A.K. Characterizations of noise in Kinect depth images: A review. IEEE Sens. J. 2014, 14, 1731–1740. [CrossRef]
28. Chen, L.; Lin, H.; Li, S. Depth Image Enhancement for Kinect Using Region Growing and Bilateral Filter. In Proceedings of the 21st International Conference on Pattern Recognition, Tsukuba, Japan, 11–15 November 2012; pp. 3070–3073.
29. Lachat, E.; Macher, H.; Mittet, M.-A.; Landes, T.; Grussenmeyer, P. First Experiences with Kinect V2 Sensor for Close Range 3D Modelling. In Proceedings of The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Avila, Spain, 25–27 February 2015; pp. 93–100.
30. SoftKinetic. Available online: http://www.softkinetic.com/Products/DepthSenseCameras (accessed on 4 August 2016).
31. Intel RealSense. Available online: https://en.wikipedia.org/wiki/Intel_RealSense (accessed on 4 August 2016).
32. Patriot. Available online: http://www.polhemus.com/?page=Motion_Patriot (accessed on 26 July 2016).
33. Al-Rahayfeh, A.; Faezipour, M. Eye tracking and head movement detection: A state-of-art survey. IEEE J. Transl. Eng. Health Med. 2013, 1, 2100212-1–2100212-12. [CrossRef] [PubMed]
34. Viola, P.; Jones, M.J. Robust real-time face detection. Int. J. Comput. Vis. 2004, 57, 137–154. [CrossRef]
35. Pinhole Camera Model. Available online: https://en.wikipedia.org/wiki/Pinhole_camera_model (accessed on 26 July 2016).
36. SRF04 Ultrasonic Sensor. Available online: http://www.cornerstonerobotics.org/curriculum/lessons_year2/erii24_ultra_sonic_sensor.pdf (accessed on 11 May 2016).
37. Audio Frequency. Available online: https://en.wikipedia.org/wiki/Audio_frequency (accessed on 24 May 2016).
38. Laser Rangefinder DLE70 Professional. Available online: http://www.bosch-pt.com/productspecials/professional/dle70/uk/en/start/index.htm (accessed on 26 July 2016).
39. Webcam C600. Available online: http://support.logitech.com/ko_kr/product/2-mp-webcam-c600 (accessed on 11 May 2016).
40. Introduction to Programming with OpenCV. Available online: http://www.cs.iit.edu/~agam/cs512/lectnotes/opencv-intro/opencv-intro.html (accessed on 11 May 2016).
41. Dying of the Light (film). Available online: https://en.wikipedia.org/wiki/Dying_of_the_Light_(film) (accessed on 3 June 2016).
42. Point Grey Cameras. Available online: http://www.ptgrey.com/Cameras (accessed on 11 May 2016).
43. 68-95-99.7 Rule. Available online: https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule (accessed on 2 June 2016).
44. Jung, D.; Lee, J.M.; Gwon, S.Y.; Pan, W.; Lee, H.C.; Park, K.R.; Kim, H.-C. Compensation method of natural head movement for gaze tracking system using an ultrasonic sensor for distance measurement. Sensors 2016, 16, 110. [CrossRef] [PubMed]
45. Angle of View. Available online: https://en.wikipedia.org/wiki/Angle_of_view (accessed on 26 July 2016).
46. Student’s t-Test. Available online: http://en.wikipedia.org/wiki/Student’s_t-test (accessed on 1 June 2016).
47. Cohen, J. A power primer. Psychol. Bull. 1992, 112, 155–159. [CrossRef] [PubMed]

© 2016 by the authors; licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
