Feature Article

Nonrigid-Deformation Recovery for 3D Face Recognition Using Multiscale Registration

Liang Cai and Feipeng Da, Southeast University

Face recognition is highly preferable to other biometrics for many applications, including surveillance, access control, and human-machine interaction, mainly because it's nonintrusive. However, traditional 2D recognition systems don't work well when handling variations in pose, lighting conditions, and expression. For face recognition to achieve high accuracy, it must employ different capture modalities. With the development of scanning technology, acquisition of 3D shapes is becoming more accurate and less intrusive. This allows for low-cost 3D facial recognition that measures the face's geometry and can potentially overcome traditional methods' sensitivity to pose and illumination.

Even though matching local regions on faces can achieve good performance under certain conditions,1,2 we believe that using global features is promising because it can handle more characteristics. Our method relies on nonrigid deformation, an important global feature, to differentiate interpersonal disparities. We divide face recognition into two phases: enrollment and authentication.3 During enrollment, each face is registered to a reference model and its deformation is stored. During authentication, our method directly compares the deformations retrieved from the database with those extracted from the scanned faces, using a distance metric (see Figure 1). (For a look at other methods for deformation extraction, see the sidebar.)

We propose a novel way to find the mapping between two shapes. Our method provides an extended tool for point-set correspondence and nonrigid-deformation extraction. To increase accuracy, it employs a multiscale approach incorporating manifold harmonics, a variation of spectral-geometry analysis.

The proposed method extracts nonrigid deformation by finding the mapping between two shapes. It integrates geometric shape decomposition and nonrigid point-set registration to improve registration accuracy. Employing manifold harmonics, the method decomposes shapes into low-frequency and high-frequency parts. Then, it applies the modified registration algorithm to obtain deformation parameters.

Background

Spectral-geometry analysis, which Gabriel Taubin first introduced,4 is a theoretical tool to characterize the classical approximations of filters. It's based on the similarity between the eigenvectors of the graph Laplacian and the basis functions used in discrete Fourier transforms. Basically, you compute the Laplacian's eigenfunctions and eigenvalues on a general manifold surface.

Given a mesh with n vertices, its graph Laplacian operator L ∈ ℝn×n is a matrix where Li,j = wi,j > 0 whenever (i, j) is an edge; otherwise, Li,j = 0 and Li,i = −Σj wi,j. The coefficients wi,j are weights associated with the graph edges. You can choose the uniform weight wi,j = 1 or more elaborate weights computed from the embedding of the graph (for example, a distance weight, or a cotangent weight with wi,j = cot(ai,j) + cot(bi,j)). To avoid the influence of an irregular mesh, we select a fully mesh-independent symmetric weight:5

$$w_{i,j} = \frac{\cot a_{i,j} + \cot b_{i,j}}{\sqrt{A_i\,A_j}},$$

where ai,j and bi,j denote the two angles opposite edge (i, j), and Ai and Aj are the areas of the Voronoi cells of vertices i and j.
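To make this construction concrete, here is a minimal sketch (ours, not the authors' code) that assembles the symmetric Laplacian from a triangle mesh with NumPy and SciPy. It approximates the Voronoi cell areas Ai with barycentric vertex areas, a common simplification; the function name and that shortcut are our assumptions.

```python
import numpy as np
from scipy.sparse import coo_matrix, diags

def symmetric_laplacian(verts, faces):
    """Sketch: L_ij = w_ij = (cot a_ij + cot b_ij) / sqrt(A_i * A_j) for edges,
    L_ii = -sum_j w_ij. verts: (n, 3) floats; faces: (m, 3) vertex indices.
    Barycentric vertex areas stand in for the Voronoi cell areas."""
    n = len(verts)
    area = np.zeros(n)
    rows, cols, vals = [], [], []
    for tri in faces:
        p = verts[tri]
        face_area = 0.5 * np.linalg.norm(np.cross(p[1] - p[0], p[2] - p[0]))
        area[tri] += face_area / 3.0              # barycentric area estimate
        for k in range(3):
            # The angle at corner o is opposite edge (i, j); each interior
            # edge thus accumulates cot(a_ij) + cot(b_ij) over its two faces.
            o, i, j = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
            u, v = verts[i] - verts[o], verts[j] - verts[o]
            cot = np.dot(u, v) / np.linalg.norm(np.cross(u, v))
            rows += [i, j]; cols += [j, i]; vals += [cot, cot]
    W = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
    d = diags(1.0 / np.sqrt(area))                # mesh-independent normalization
    W = d @ W @ d
    return W - diags(np.asarray(W.sum(axis=1)).ravel())
```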


[Figure 1 diagram: a test scan passes through face segmentation, rigid registration (scale, rotation, translation) and nonrigid registration against the reference face, deformation recovery, and feature extraction, and is then classified against the nonrigid-deformation database.]
Figure 1. Nonrigid-deformation recovery for 3D face recognition during authentication. Our method directly compares the deformations retrieved from the database with the scanned faces, using a distance metric.

Figure 2. Manifold harmonics on a face. The images show the eigenfunctions Hk whose indices k are (a) 1, (b) 5, (c) 10, and (d) 100. The harmonic behavior of these eigenfunctions resembles trigonometric polynomials in classical Fourier analysis, which implies that any application that uses the Fourier transform can also be applied on the manifold.

The solution to this eigenproblem, after mapping the eigenvectors of L to the canonical basis, yields a series of eigenpairs (Hk, λk) called the manifold-harmonics bases (MHBs; see Figure 2). MHBs have received much attention recently because a manifold-harmonics transform (MHT) performs a generalization of the Fourier transform on irregular 3D meshes. In particular, on a sphere, the eigenfunctions correspond to spherical harmonics, which are often used in face recognition to represent illumination under unknown lighting.6

The MHBs are orthogonal; that is, the functional inner product ⟨Hi, Hj⟩ = 0 if i ≠ j. We also ensure that an MHB is orthonormal by dividing each basis vector Hk by its functional norm √⟨Hk, Hk⟩. Smaller eigenvalues of the spectrum correlate to low-frequency signals, which account for global features; larger eigenvalues correlate to high-frequency signals, which represent the details.

Using MHBs, we can define an MHT to convert a signal on the manifold surface into the spectral domain. Consider a function defined on the vertices of a triangulated surface,

$$X = \sum_{j=1}^{n} x_j\,\phi_j,$$

where xj denotes the function value at vertex j and φj is the piecewise linear hat function defined on vertex j and its 1-ring, with φj(xk) = δj,k (the Kronecker delta). We can reconstruct X at vertex j using the first S frequencies:

$$\tilde{x}_j = \sum_{k=1}^{S} \tilde{X}_k\,H_k^{\,j},$$

where $\tilde{X} = [\tilde{X}_1, \ldots, \tilde{X}_S]$ is a coefficient vector with each $\tilde{X}_k$ corresponding to frequency basis Hk:

$$\tilde{X}_k = X^{T} D\, H_k.$$

D is a diagonal matrix determined by the mesh structure, and the Hk (k = 1, 2, …, n) are the orthogonal MHBs on the surface. In practice, X could be a geometry coordinate or a color. Figure 3 illustrates reconstruction for shape and appearance using S low-frequency bases. This process basically applies an ideal low-pass filter to the signal on a manifold, removing high-frequency information.

Figure 3. Filtering on a manifold by (a) shape and (b) appearance. The first image in each row is the original signal. The other images are the reconstructed results for S = 900, 300, and 100, where S is the number of low-frequency bases. This process removes high-frequency information.
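Continuing the sketch above, the ideal low-pass filter that Figure 3 illustrates can be written in a few lines. As a simplification, we project onto the orthonormal eigenbasis of the symmetric Laplacian instead of applying the D-weighted transform; the resulting split into X_L and the residual X_H is exactly the decomposition the multiscale registration uses later.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def manifold_harmonics(L, S):
    """First S eigenpairs of -L; the smallest eigenvalues are the lowest
    frequencies. Columns of H are the bases H_k. 'SM' is simple but slow;
    shift-invert (sigma near 0) is the usual speedup in practice."""
    lam, H = eigsh(-L, k=S, which='SM')
    return lam, H

def lowpass_split(X, H):
    """Reconstruct a per-vertex signal X (n x d: geometry or color) from the
    first S bases and split off the high-frequency residual."""
    coeff = H.T @ X            # simplified MHT: plain projection onto the H_k
    X_low = H @ coeff          # reconstruction from S low frequencies
    return X_low, X - X_low    # X_L and X_H
```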

Multiscale Registration

Figure 4 shows our method's basic framework. First, using manifold harmonics, we decompose each face into two parts. One represents the low-frequency component; the other preserves the high-frequency information. Next, we apply a modified registration to the low-frequency parts to obtain deformation parameters. Finally, we add detailed versions of the faces to calculate the nonrigid deformation.

3D Face Preprocessing

This stage crops the facial region from an input point cloud acquired by a 3D scanner. We remove data unnecessary for deformation recovery (such as the shoulders, hair, and ears). Our two-step segmentation exploits the fact that the nose is the face's most prominent part and is at the face's center (see Figure 5). First, we detect the nose tip on the basis of depth information. Second, we crop the face region on the basis of an ellipse around the nose.
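Here is a minimal sketch of this procedure, under the assumptions that depth is the z coordinate (so the nose tip is the point of maximum z for a face turned toward the sensor) and that the ellipse has fixed axes; the semi-axis values are illustrative, not taken from the paper.

```python
import numpy as np

def segment_face(points, a=50.0, b=65.0):
    """Two-step segmentation sketch. points: (n, 3) array in mm.
    Step 1: nose-tip detection from depth. Step 2: elliptical crop."""
    nose = points[np.argmax(points[:, 2])]       # most prominent (closest) point
    dx = points[:, 0] - nose[0]
    dy = points[:, 1] - nose[1]
    keep = (dx / a) ** 2 + (dy / b) ** 2 <= 1.0  # illustrative semi-axes a, b
    return points[keep], nose
```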

Multiscale Decomposition

Assume we have a target facial point set X = {xi} (i = 1, …, n) and a model Y = {yi} (i = 1, …, m), where n and m are the numbers of points. We'd like to build a function that approximates the mapping between X and Y. Unlike Andriy Myronenko and Xubo Song, who considered the mapping as X = Y + v′,7 we employ both rigid and nonrigid motions to model the mapping:

X = sR(Y + v) + t, (1)

where s is a scale factor, R is a rotation matrix, and t is a translation vector. Compared with v′, which denotes the extrinsic difference between X and Y, the nonrigid deformation v captures more of their intrinsic discrimination.

[Figure 4 diagram: each of X and Y is decomposed into low- and high-frequency components; registering the low-frequency parts, XL = sR(YL + GW) + t, yields s, R, t, and GW; adjusting the high-frequency parts yields XH/(sR) − YH, and summing with GW reconstructs the complete nonrigid deformation Δnonrigid.]
Figure 4. Multiscale registration. First, we decompose each face into two parts. One represents the low-frequency component; the other preserves the high-frequency information. Next, we apply a modified registration to the low-frequency parts to obtain deformation parameters. Finally, we add detailed versions of the faces to calculate the nonrigid deformation. X is the target facial point set, Y is the model, s is the scale factor, R is a rotation matrix, t is a translation vector, G is a Gaussian radial basis function, and W is a coefficient matrix.

Dividing X, Y, and v into two distinct frequency components, we rewrite the mapping in Equation 1 as

XH + XL = sR(YH + YL + vH + vL) + t,

where the subscripts H and L denote the high- and low-frequency parts. According to the discussion in the "Background" section, we obtain these equations:

XL = sR(YL + vL) + t,
XH = sR(YH + vH). (2)

Figure 5. Face segmentation. (a) The nose tip and the ellipse around the nose. (b) The segmented face based on the depth image. Our two-step segmentation exploits the fact that the nose is the face's most prominent part and is at the face's center.

Nonrigid Registration

The mapping between YL and XL is a probability density estimation problem.7 We treat YL as the Gaussian mixture model (GMM) centroids; XL is the observation data the GMM generated. Besides these Gaussian clusters, we must consider an outlier component with uniform distribution. Under these conditions, the likelihood of observation xn,L belonging to a cluster m has a Gaussian distribution with mean μm = sR(ym,L + vm,L) + t and covariance σ², so the conditional distribution is

$$p(x_{n,L} \mid m) = \frac{1}{(2\pi\sigma^2)^{D/2}}\exp\!\left(-\frac{\|x_{n,L}-\mu_m\|^2}{2\sigma^2}\right).$$

The likelihood that observation xn,L is an outlier is a uniform distribution p(xn,L | M + 1) = 1/N, where M + 1 denotes the outlier cluster. Denoting the uniform distribution's weight as w, 0 ≤ w ≤ 1 (w = 0.1 in our experiments), the mixture model takes the form

$$p(x_{n,L}) = w\,\frac{1}{N} + (1-w)\sum_{m=1}^{M}\frac{1}{M}\,p(x_{n,L}\mid m).$$


Related Work in Deformation Extraction

Many methods for deformation extraction exist. Volker Blanz and Thomas Vetter's well-known 3D morphable model uses a statistical method to estimate the deformation of 3D face shapes.1 Faisal Al-Osaimi and his colleagues' method learns deformations in the expression deformation model, a principal-component-analysis subspace.2

Researchers have also developed nonstatistical methods. Xiaolei Huang and her colleagues proposed a global-to-local deformation framework to deform a shape to a new one of the same class.3 They also applied their framework to 3D faces. Ioannis Kakadiaris and his colleagues4 and Georgios Passalis and his colleagues5 deformed the Annotated Face Model to scan data. They applied multistage alignment algorithms to perform model fitting and applied mesh parametrization to analyze geometry information.

Rigid registration algorithms such as ICP (Iterative Closest Point) are common for deformation extraction, especially when both alignment and correspondence are required. However, because ICP just aligns two point sets as least-squares optimizations under rigid motions, the established correspondence isn't accurate enough and the obtained deformation is often imprecise. These disadvantages hamper recognition performance.

References
1. V. Blanz and T. Vetter, "Face Recognition Based on Fitting a 3D Morphable Model," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 9, 2003, pp. 1063–1074.
2. F. Al-Osaimi, M. Bennamoun, and A. Mian, "An Expression Deformation Approach to Non-rigid 3D Face Recognition," Int'l J. Computer Vision, vol. 81, no. 3, 2009, pp. 302–316.
3. X. Huang, N. Paragios, and D. Metaxas, "Shape Registration in Implicit Spaces Using Information Theory and Free Form Deformations," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 8, 2006, pp. 1303–1318.
4. I. Kakadiaris et al., "Three-Dimensional Face Recognition in the Presence of Facial Expressions: An Annotated Deformable Model Approach," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 4, 2007, pp. 640–649.
5. G. Passalis, I. Kakadiaris, and T. Theoharis, "Intraclass Retrieval of Nonrigid 3D Objects: Application to Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 2, 2007, pp. 218–229.

We define the correspondence probability between two points as the posterior probability of the GMM centroid, on the basis of Bayes' theorem:

$$P(m \mid x_{n,L}) = \frac{P(m)\,p(x_{n,L}\mid m)}{p(x_{n,L})},$$

where P(m) = 1/M for all GMM components.
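A small sketch of this mixture and the resulting posteriors, which form the E-step of the algorithm described next; the array shapes and the vectorized distance computation are our choices.

```python
import numpy as np

def e_step_posteriors(X, mu, sigma2, w):
    """X: (N, D) low-frequency observations; mu: (M, D) transformed centroids
    sR(y_m + v_m) + t. Returns the (N, M) posteriors P(m | x_n)."""
    N, D = X.shape
    M = len(mu)
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # (N, M) squared distances
    gauss = np.exp(-d2 / (2 * sigma2)) / (2 * np.pi * sigma2) ** (D / 2)
    # Mixture density: uniform outlier with weight w, Gaussians with P(m) = 1/M.
    mix = (1 - w) * gauss.mean(axis=1) + w / N
    return (1 - w) * gauss / (M * mix[:, None])            # Bayes' theorem
```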

Assuming that observations are independent and identically distributed, we obtain the log likelihood of all the point sets:

$$E(\theta,\sigma^2) = -\sum_{n=1}^{N}\log\sum_{m=1}^{M+1} P(m)\,p(x_{n,L}\mid m), \qquad (3)$$

where θ is the set of transformation parameters. So, we can formulate deformation-parameter fitting as the maximization of the log likelihood in Equation 3. If we approximate vL by vL = GW, where G is a Gaussian radial basis function containing the inner structure of YL and W is a coefficient matrix, the objective function is

$$E = \frac{N_p D}{2}\log\sigma^2 + \frac{\lambda}{2}\,\mathrm{tr}\!\left(s^2 R^{T} W^{T} G W R\right) + \frac{1}{2\sigma^2}\sum_{n,m=1}^{N,M} P^{\mathrm{old}}(m \mid x_{n,L})\,\bigl\|x_{n,L} - sR\,(y_{m,L} + G(m,\cdot)\,W) - t\bigr\|^2, \qquad (4)$$

where P^old denotes the posterior probabilities, $N_p = \sum_{n=1}^{N}\sum_{m=1}^{M} P^{\mathrm{old}}(m \mid x_{n,L})$, D is the dimension of the points, and λ is a weighting constant.

We use the expectation-maximization (EM) algorithm to find the optimal parameters. First, we find the correspondence between XL and YL. Then, we get the optimal rigid and nonrigid motion parameters (s, R, t, v) that minimize the objective function in Equation 4. The process repeats until convergence occurs.
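The corresponding M-step can be sketched as follows. The paper obtains (s, R, t, W) by minimizing Equation 4; as a simplifying assumption, this sketch alternates a weighted Procrustes update of the rigid part with the CPD-style regularized linear solve for W, and it omits the σ² re-estimation that a full EM iteration would include.

```python
import numpy as np

def m_step(X, Y, G, P, W, sigma2, lam):
    """One simplified M-step for Equation 4. X: (N, D) observations; Y: (M, D)
    model points; G: (M, M) Gaussian kernel; P: (N, M) E-step posteriors."""
    Np = P.sum()
    p1, pt1 = P.sum(axis=0), P.sum(axis=1)            # per-centroid / per-point mass
    V = Y + G @ W                                     # currently deformed model
    mu_x, mu_y = pt1 @ X / Np, p1 @ V / Np
    A = (X - mu_x).T @ P @ (V - mu_y)                 # weighted cross-covariance
    U, Sig, Vt = np.linalg.svd(A)
    C = np.eye(X.shape[1]); C[-1, -1] = np.linalg.det(U @ Vt)  # proper rotation
    R = U @ C @ Vt
    s = np.trace(np.diag(Sig) @ C) / (p1 @ ((V - mu_y) ** 2).sum(axis=1))
    t = mu_x - s * R @ mu_y
    # Nonrigid solve in the model frame: back-transform X, then solve
    # (diag(p1) G + lam * sigma2 * I) W = P^T X' - diag(p1) Y.
    Xp = (X - t) @ R / s                              # rows are R^T (x - t) / s
    W = np.linalg.solve(np.diag(p1) @ G + lam * sigma2 * np.eye(len(Y)),
                        P.T @ Xp - p1[:, None] * Y)
    return s, R, t, W
```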





Deformation Recovery

By introducing filtering into point-set registration, we obtain the rigid parameters s, R, and t and the low-frequency part of the deformation, vL, as the solution of this minimization problem. Once we've computed the best rigid parameters, we can solve for the nonrigid deformation of the high-frequency part in terms of Equation 2. Finally, we acquire the complete nonrigid deformation by summing the two deformation components.
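In code the recovery is a one-liner per component; this sketch assumes the registration has already put the rows of X_H and Y_H in correspondence and that v_L = GW comes from the registration stage.

```python
import numpy as np

def recover_deformation(X_high, Y_high, s, R, G, W):
    """Complete nonrigid deformation. Low-frequency part: v_L = G W.
    High-frequency part from Equation 2, X_H = sR(Y_H + v_H), so
    v_H = R^T X_H / s - Y_H (the X_H/(sR) - Y_H adjustment of Figure 4)."""
    v_low = G @ W
    v_high = (X_high @ R) / s - Y_high   # rows are R^T x / s
    return v_low + v_high
```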

Experiments and Discussion

To verify our method's performance, we conducted comprehensive experiments on the Face Recognition Grand Challenge (FRGC) face database.8 FRGC v2 includes FRGC v1 and images acquired in fall 2003 and spring 2004. It contains 4,007 3D face images of 466 distinct human subjects, with from 1 to 22 images per subject. For each subject, it has neutral expressions and nonneutral expressions such as smiling, astonished, puffy cheeked, and angry.


Figure 6. Deformation extraction. (a) Four reconstructed faces. (b) The color-coded deformations on the reference face (the unit of measurement is millimeters). Deformation is crucial to reconstruction and recognition.


Deformation Extraction

Deformation, a displacement field on the reference face obtained by our method, is crucial to reconstruction and recognition. Figure 6 shows four reconstructed faces and their deformations.

To evaluate the extracted deformation quantitatively, we introduced two registration-error measurements. One was the repetition rate, which is related to the number of points having more than one correspondence. The other was the distance error, which is the difference between the reconstructed face and its scan. (A sketch of both measurements follows this list.) To see how filtering affects registration performance, we compared results for S = 3 for

■ CPD-MM, our version of Closest-Point Distance that uses a mixture model;
■ ICP (Iterative Closest Point), a rigid-registration method; and
■ a single-scale version of CPD (without filtering).
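The exact formulas for the two measurements aren't given in the text, so the nearest-neighbor reading below is our assumption; it is one straightforward way to compute both.

```python
import numpy as np
from scipy.spatial import cKDTree

def registration_errors(recon, scan):
    """Repetition rate: fraction of scan points claimed by more than one
    reconstructed point under nearest-neighbor correspondence.
    Distance error: mean distance from reconstructed points to the scan."""
    dist, idx = cKDTree(scan).query(recon)        # nearest scan point per point
    counts = np.bincount(idx, minlength=len(scan))
    repetition = (counts > 1).sum() / len(scan)
    return repetition, dist.mean()
```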

The test faces fell into six groups based on the number of points in the point set, which ranged from 500 to 3,000. Figure 7 summarizes the registration errors. Both nonrigid registration methods were more accurate than ICP. CPD-MM achieved the lowest registration repetition rate and distance error. Clearly, filtering is important for enhancing registration accuracy.


In our multiscale approach, the choice of Laplacian weights is just as important as the smoothing itself. We compared a symmetric weight (manifold harmonics) with cotangent, uniform, and distance weights (see Figure 8). After filtering without manifold harmonics, the registration errors were high. After filtering with manifold harmonics, the repetition rate and distance error decreased greatly. On the basis of these results, we conclude that the best choice is a symmetric weight.

Time consumption changed with the filter parameter S. Table 1 reports the number of iterations for different values of S; as S increased, the registration algorithm took longer to converge. This also indicates that filtering for nonrigid registration is necessary because it can reduce processing time.

Recognition Results

We compared the recognition performance of rigid registration, CPD, and CPD-MM. We extracted the model-based deformation for each probe with the different registration methods to characterize an individual. (In FRGC parlance, a probe is one or more images with unknown identities, and a gallery is one or more images with known identities.) Then, we converted the deformations into geometry images.9 Applying Haar and pyramid transforms, we obtained two sets of coefficients. We compared the two subjects (gallery and probe) using the distance metrics that Ioannis Kakadiaris and his colleagues defined.3

Figure 7. Performance with and without filtering, for Iterative Closest Point (ICP), Closest-Point Distance (CPD), and Closest-Point Distance with a mixture model (CPD-MM). (a) The repetition rate versus the number of points in a point set. (b) The distance error per point. The nonrigid registration methods (CPD and CPD-MM) were more accurate than ICP. CPD-MM achieved the lowest registration repetition rate and distance error.

Figure 8. Comparing different weights for filtering. On the basis of our results, we conclude that the best choice is a symmetric weight.

Finally, the system ranked the most similar faces on the basis of the similarity distance metrics, and we used cumulative match characteristic (CMC) curves to measure identification performance. Figure 9 shows the CMC curves for the probe set. The nonrigid methods performed significantly better. CPD's rank-1 identification rate was approximately 2.2 percent lower than CPD-MM's. CPD-MM achieved an identification rate higher than 99 percent after rank 3. This excellent performance verifies our method's efficiency.

Figure 9. Cumulative-match-characteristic curves comparing rigid registration, CPD, and CPD-MM. CPD-MM achieved an identification rate higher than 99 percent after rank 3. The results verify our method's efficiency.
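A minimal sketch of the CMC computation from a similarity-distance matrix; it assumes smaller distances mean more similar faces and that every probe identity appears in the gallery.

```python
import numpy as np

def cmc_curve(dist, probe_ids, gallery_ids, max_rank=20):
    """dist: (n_probe, n_gallery) similarity-distance matrix.
    Returns identification rates at ranks 1..max_rank."""
    order = np.argsort(dist, axis=1)            # gallery sorted best-first per probe
    ranked = gallery_ids[order]                 # identities in ranked order
    first_hit = (ranked == probe_ids[:, None]).argmax(axis=1)
    return np.array([(first_hit < r).mean() for r in range(1, max_rank + 1)])
```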

The Effects of Expression Variation

FRGC v2 categorizes facial expressions, allowing an easy division into two datasets: neutral and nonneutral facial expressions. In this test, we computed a 4,007 × 4,007 similarity matrix and normalized it by matching all the 3D scans.



Table 1. Time consumption.

No. of eigenvectors (S):   3    30    60    90
No. of iterations:        18    31    35    59

Using the masks provided by the FRGC protocol, we obtained receiver operating characteristic (ROC) curves to evaluate verification performance. Figure 10 compares the two subsets' performance with that of the full database. The verification rates at a 0.001 FAR (false-acceptance rate) were 97.4 percent for the full database, 96.2 percent for nonneutral expressions, and 98.7 percent for neutral expressions. The 2.5 percent decrease from the best to the worst performance is modest. We conclude that our method is robust to expression variation.

Figure 10. ROC (receiver operating characteristic) III curves for three probe sets. The results indicate that our method is robust to expression variation.



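For completeness, a sketch of the verification rate at a target FAR, given the similarity matrix and a boolean mask of genuine (same-subject) pairs; treating scores as distances (accept when below threshold) is our assumption about the convention.

```python
import numpy as np

def verification_rate(dist, genuine_mask, far=1e-3):
    """Pick the distance threshold at which the impostor acceptance rate
    equals `far`, then measure how many genuine pairs fall below it."""
    impostor = dist[~genuine_mask]
    threshold = np.quantile(impostor, far)      # far-quantile of impostor distances
    return (dist[genuine_mask] <= threshold).mean()
```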

Comparison with Other Methods

We compared our method with ICP and three state-of-the-art methods:

■ AFM (Annotated Face Model), a nonstatistical method based on rigid registration;3
■ EDM (Expression Deformation Model), a statistical deformable-model method that focuses on expression variation;10 and
■ MMH (Multimodal Hybrid), a 2D-3D hybrid method.1

Each method used all the data in FRGC v2. Table 2 compares the rank-1 identification rates and the verification rates at a 0.001 FAR; the results are from the original papers. It clearly indicates that our method performed the best. It also shows that the nonstatistical methods performed better. This is mainly because a statistical method is just a linear approximation of nonrigid deformation; it's robust to noise and missing data but less efficient.

Table 2. A performance comparison.

Method                            Rank-1 recognition (%)    Verification rate at a 0.001 false-acceptance rate (%)
Iterative Closest Point           75.7                      Not given
Annotated Face Model3             97.00                     97.00
Expression Deformation Model10    96.10                     94.05
Multimodal Hybrid1                96.20                     Not given
Our method                        98.20                     97.40

Our method has several advantages over previous statistical methods. First, it's efficient because it doesn't need to register several expressions for learning. Second, it can match faces without considering the generalization problem. Compared with other nonstatistical methods based on rigid alignment, it has better accuracy and robustness.

Although these promising results have validated our method's efficiency, we need to address three limitations. First, we must find a way to handle missing or noisy data. A smoothing filter based on spectral-geometry analysis can't handle data with a low signal-to-noise ratio, such as outliers. If we treat missing data as an outlier-noise problem, we can design a surface reconstruction algorithm that's robust to outliers and noise in scanned facial data. Then, we could interpolate the missing data and outlier-noise data on the basis of a surface reconstruction equation.

Second, our face segmentation is based on nose-tip detection, but such a depth-information-based approach isn't robust. To deal with this, we could employ a multiscale feature-point-detection algorithm.

Finally, in some situations intensive expressions (for example, a wide-open mouth) reduce the recognition rate. The main reason is that it's difficult to distinguish deformations specific to a person from deformations common to all people across different expressions. To solve this problem, we could use a frequency-domain feature containing most of the discriminative personal information.

Acknowledgments

We thank the Face Recognition Grand Challenge organizers (Jonathon Phillips, Kevin Bowyer, and Patrick Flynn) for providing the face data.

References


1. A. Mian, M. Bennamoun, and R. Owens, "An Efficient Multimodal 2D-3D Hybrid Approach to Automatic Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 11, 2007, pp. 1927–1943.
2. C. Samir, A. Srivastava, and M. Daoudi, "Three-Dimensional Face Recognition Using Shapes of Facial Curves," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 11, 2006, pp. 1858–1863.


3. I. Kakadiaris et al., "Three-Dimensional Face Recognition in the Presence of Facial Expressions: An Annotated Deformable Model Approach," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 4, 2007, pp. 640–649.
4. G. Taubin, "A Signal Processing Approach to Fair Surface Design," Proc. Siggraph, 1995, pp. 351–358.
5. B. Vallet and B. Levy, "Spectral Geometry Processing with Manifold Harmonics," Computer Graphics Forum, vol. 27, no. 2, 2008, pp. 251–260.
6. Y. Wang et al., "Face Relighting from a Single Image under Arbitrary Unknown Lighting Conditions," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 11, 2009, pp. 1968–1984.
7. A. Myronenko and X. Song, "Point-Set Registration: Coherent Point Drift," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 12, 2010, pp. 2262–2275.
8. K. Bowyer, K. Chang, and P. Flynn, "A Survey of Approaches and Challenges in 3D and Multi-modal 3D+2D Face Recognition," Computer Vision and Image Understanding, vol. 101, no. 1, 2006, pp. 1–15.
9. G. Passalis, I. Kakadiaris, and T. Theoharis, "Intraclass Retrieval of Nonrigid 3D Objects: Application to Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 2, 2007, pp. 218–229.

10. F. Al-Osaimi, M. Bennamoun, and A. Mian, "An Expression Deformation Approach to Non-rigid 3D Face Recognition," Int'l J. Computer Vision, vol. 81, no. 3, 2009, pp. 302–316.

Liang Cai is a PhD candidate in Southeast University's Department of Automation. His research interests include pattern recognition and digital geometry processing, with applications to biometrics and computer vision. Cai has a BS in physics from Lanzhou University. Contact him at [email protected].

Feipeng Da is a professor in Southeast University's Department of Automation. His research interests include 3D information acquisition and recognition, with applications to biometrics, computer vision, and intelligent control. Da has a PhD in control theory and control engineering from Southeast University. Contact him at [email protected].
