
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 12, DECEMBER 2014

A Fast Adaptive Parameter Estimation for Total Variation Image Restoration

Chuan He, Changhua Hu, Wei Zhang, and Biao Shi

Abstract— Estimation of the regularization parameter, which strikes a balance between data fidelity and regularity, is essential for successfully solving ill-posed image restoration problems. Based on the classical total variation (TV) model and the prevalent alternating direction method of multipliers (ADMM), we develop a fast algorithm that simultaneously estimates the regularization parameter and restores the degraded image. By applying the variable splitting technique to both the regularization term and the data fidelity term, we overcome the nondifferentiability of TV and obtain a closed form for updating the regularization parameter in each iteration. The solution is guaranteed to satisfy Morozov's discrepancy principle. Furthermore, we present a convergence proof for the proposed algorithm under the premise of a variable regularization parameter. Experimental results demonstrate that the proposed algorithm is superior in speed and competitive in accuracy compared with several state-of-the-art methods. Moreover, the proposed method extends smoothly to multichannel image restoration.

Index Terms— Total variation (TV), adaptive parameter estimation (APE), alternating direction method of multipliers (ADMM), discrepancy principle, variable splitting.

I. INTRODUCTION

IMAGE blurring is a frequently occurring phenomenon: many uncertainties, including human faults (e.g., hand shake when taking pictures), instrumental errors (e.g., defocusing of the lens system and sensor noise), and environmental factors (e.g., atmospheric turbulence and relative motion between the camera and the scene), may result in blurred and noisy images. Practical requirements have made image restoration, which aims at recovering an estimate of the original scene from the degraded observation, one of the most fundamental tasks in imaging. So far, this task has spread into many areas, such as medical imaging, surveillance, astronomy, and remote sensing [1].

Manuscript received August 11, 2013; revised January 26, 2014, May 18, 2014, and August 27, 2014; accepted September 12, 2014. Date of publication September 24, 2014; date of current version October 13, 2014. This work was supported in part by the National Natural Science Foundation of China under Grant 61174030, Grant 61203189, and Grant 61374120, and in part by the National Science Fund for Distinguished Young Scholars of China under Grant 61025014. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Damon M. Chandler. The authors are with the High-Tech Institute of Xi'an, Xi'an 710025, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2014.2360133

Generally speaking, the degradation process of an image can be modeled as a linear shift-invariant system in the following discrete form

    f = K u_clean + n,                                          (1)

where K is the matrix representation of the blurring/convolution operator, which is assumed to be known; f and u_clean are the observed and original images, respectively, both expressed in vector form; and n is a vector of zero-mean Gaussian white noise. As is common, we adopt the vector expression for images, where the pixels of an m × n image are stacked into an mn-vector in lexicographic order; thus the (i, j)th pixel of an image becomes the ((i − 1)n + j)th entry of the vector.

For most scenarios of practical interest, estimating u_clean from f is a notoriously ill-posed linear inverse problem (IPLIP) [2]. The solution of (1) is highly sensitive to noise or perturbation in the observed image, and directly solving (1) via the inverse filter usually leads to an unacceptable result. The key to solving this ill-posed inverse problem is to properly incorporate prior knowledge about the original image into the restoration process through regularization [1], [3]. In general, a large class of regularization methods leads to a minimization problem of the form

    min_u  J(u) + (λ/2) ‖Ku − f‖₂².                             (2)

In the above formulation, u denotes the estimate of u_clean. The quadratic term is the so-called data fidelity term, whereas J(u) represents the regularization term, which stabilizes the problem numerically and encourages the result to have desirable properties. The regularization parameter λ balances the relative weight between the data fidelity and regularization terms. With this model, Galatsanos and Katsaggelos [4] justified the regularization by mean square error (MSE) analysis.

The choice of the regularizer J(u) is a central issue in image restoration. The most popular regularizers now are L1-based, i.e., in the context of wavelet frames [5]–[7] or variational frameworks such as total variation (TV) [7]–[17]. Compared with the earlier Tikhonov regularization [2], they are famed for their better detail-preservation ability.
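As a concrete illustration of model (1), the degradation can be simulated as a circular (periodic-boundary) convolution followed by additive Gaussian noise. The following NumPy sketch is ours, not the authors' code; `blur_periodic` and `degrade` are illustrative names, and the periodic boundary assumption matches the one adopted later in the paper.

```python
import numpy as np

def blur_periodic(u, kernel):
    """Apply the blur operator K as a circular convolution via the FFT
    (periodic boundary condition), so K is diagonal in Fourier space."""
    m, n = u.shape
    km, kn = kernel.shape
    pad = np.zeros((m, n))
    pad[:km, :kn] = kernel
    # Centre the kernel at the origin (wrap-around) so the blur is not shifted.
    pad = np.roll(pad, (-(km // 2), -(kn // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(pad) * np.fft.fft2(u)))

def degrade(u_clean, kernel, sigma, rng):
    """Model (1): f = K u_clean + n, with zero-mean Gaussian noise of std sigma."""
    return blur_periodic(u_clean, kernel) + sigma * rng.standard_normal(u_clean.shape)
```

A kernel whose entries sum to 1 leaves a constant image unchanged, which is exactly the property K1 = 1 invoked in Section II.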
Besides, statistical methods for determining J(u) are also a current research hotspot [18], [19]. In this paper, we focus on the solution of the TV-regularized restoration problem

    min_u  TV(u) + (λ/2) ‖Ku − f‖₂²,                            (3)

1057-7149 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


i.e., the regularizer in (2) is the TV semi-norm of u. The main advantage of TV regularization is its distinguished edge-preservation ability [8]. However, the solution of (3) suffers from the nondifferentiability and nonlinearity of TV. Up to now, several numerical methods have been proposed to solve (3), including time-marching schemes [8], primal-dual based algorithms [20]–[23], and variable splitting approaches [24]–[33].

Another essential issue in the TV-regularized restoration problem is the selection of the regularization parameter λ. By appropriately adjusting λ, a compromise is achieved between noise suppression and detail preservation in the restored image. Although there exist various approaches to parameter selection for Tikhonov-regularized problems [2], most works in the literature only consider a fixed, predetermined λ for the TV-based restoration problem [8], [24]–[28]. Nevertheless, a few recent works have focused on adaptive parameter estimation (APE) for the TV-regularized restoration problem. Consequently, several strategies for choosing the parameter λ in (3) are available, e.g., Morozov's discrepancy principle [22], [32]–[35], the generalized cross-validation (GCV) method [36], [37], the L-curve method [38], [39], the unbiased predictive risk estimator (UPRE) method [40], the majorization-minimization (MM) approach [41], and the variational Bayesian approach [42], [43].

If the noise variance σ² is available, Morozov's discrepancy principle is a good choice for the selection of λ. According to this principle, a good solution of the TV restoration problem should lie in the set

    S = {u : ‖Ku − f‖₂² ≤ c},                                   (4)

where c = τ mn σ², and τ is a predetermined, noise-dependent parameter [22], [32], [33]. Under the discrepancy principle, the TV-regularized image restoration problem is equivalent to the following constrained optimization problem

    min_u  TV(u)   s.t.   u ∈ S.                                (5)

By Lagrangian duality, the constrained problem (5) can be converted to the unconstrained problem (3): if u is a solution of (5), it is also a solution of (3) for a particular λ ≥ 0. Given λ, let u*(λ) denote the corresponding optimal solution of (3). Then, for the minimization of (5), we have either u*(0) ∈ S for λ = 0, or

    ‖Ku*(λ) − f‖₂² = c                                          (6)

for λ > 0. In fact, if λ = 0, minimizing (3) is equivalent to minimizing TV(u), whose solution is a constant image; obviously, this case never happens in a real-world application. Thus, the discrepancy principle seeks a λ* > 0 such that the minimizer of (5) is not a constant image. Note that a closed-form solution of (3) does not exist, and it is therefore difficult to determine whether a given solution falls into the feasible set S. Obtaining fast algorithms to solve (5) is an active research frontier. In order to make use of existing methods for the unconstrained problem (3), Blomgren and Chan [44]

provided a modular solver to update λ. Although an appropriate solution can be found, the computational cost of their approach is high, since problem (3) must be solved many times for a sequence of λs. Ng et al. [32] adopted an instance of the classical alternating direction method of multipliers (ADMM) to solve (5). In each iteration, the current estimate is projected onto the feasible set S by solving a constrained least squares problem. By employing the periodic or Neumann boundary condition, the blur matrix K can be diagonalized easily, and consequently the least squares problem can be solved efficiently; however, updating λ in each iteration requires a Newton iterative scheme. Afonso et al. [33] proposed another ADMM-based algorithm to handle the constrained problem (5). In their algorithm, an auxiliary variable is introduced to represent Ku in (4); thus, the TV restoration problem is decomposed into a Moreau proximal denoising problem [45] and an inverse filtering problem. In each iteration, the corresponding proximal mapping is computed with several inner iterations of Chambolle's denoising algorithm [46]. Based on the primal-dual model [47] of TV, Wen and Chan [22] derived an efficient algorithm for the constrained problem (5). The solution of (5) is first transformed into finding the saddle point of the primal-dual problem; then a primal-dual proximal point method [21], [48] is adopted to compute the solution. Newton's iterative method is also employed in their algorithm to calculate λ so as to guarantee that the solution satisfies condition (4). All of the above methods for problem (5) involve an inner iterative scheme to guarantee a feasible solution, and only [22] presents a convergence proof.

In this paper, we propose an instance of ADMM to solve the constrained TV regularization problem (5). Our contributions are threefold.
First, unlike results that focus on unconstrained problems with a fixed λ [5]–[8], [24]–[28], our primary aim is to handle the constrained problem (5) and to find the optimal λ adaptively without manual intervention. Second, unlike existing methods for the constrained problem (5) [22], [32]–[34], we propose a more compact method that solves (5) without any inner iteration. By adopting the variable splitting technique, we introduce two auxiliary variables, one for Ku and one for the TV term. Consequently, the constrained problem (5) is efficiently solved by the alternating direction method (ADM) with a separable structure and linear constraints. Profiting from this, λ is updated in each iteration in closed form so that the solution stays in the feasible set S; thus, our method has a lower computational cost and achieves a faster speed. Finally, a convergence proof based on convex analysis is provided. The proof differs from other proofs for ADMM in the literature [28], [49], [50] due to the premise of a variable regularization parameter λ. Besides, our method extends smoothly to multichannel image processing. Experiments indicate that our method finds the optimal λ efficiently, and is superior both in speed and accuracy compared with some state-of-the-art methods. Due to the equivalence between ADMM,


the split Bregman method, and the Douglas–Rachford splitting method [51], our algorithm can also be seen as an instance of the latter two famous methods.

The rest of this paper is organized as follows. In Section II, we present some basic notations and the augmented Lagrangian model for the TV-regularized problem. Section III contains the derivation of the proposed algorithm and the convergence analysis. Section IV reports the experimental results, which demonstrate the effectiveness of our algorithm in choosing λ and restoring images. Finally, Section V ends this paper with a succinct conclusion.

II. AUGMENTED LAGRANGIAN MODEL FOR THE TV-REGULARIZED PROBLEM

A. Basic Notations

Without loss of generality, we denote the Euclidean space R^{mn} as V and define Q = V × V. For u ∈ V, u_{i,j} ∈ R denotes the ((i − 1)n + j)th component of u. For y ∈ Q, y_{i,j} = (y_{i,j,1}, y_{i,j,2}) ∈ R² denotes the ((i − 1)n + j)th component of y. Define the inner products and Euclidean norms of V and Q as

    ⟨u, v⟩_V = Σ_{i,j}^{m,n} u_{i,j} v_{i,j},          ‖u‖₂ = √⟨u, u⟩_V;
    ⟨y, q⟩_Q = Σ_{i,j}^{m,n} Σ_{k=1}^{2} y_{i,j,k} q_{i,j,k},   ‖y‖₂ = √⟨y, y⟩_Q.      (7)
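For arrays stored in the natural m × n (and 2 × m × n) layout, the inner products and norms of (7) reduce to elementwise sums; a minimal sketch (function names are ours, for illustration):

```python
import numpy as np

def inner_V(u, v):
    """<u, v>_V for u, v in V = R^{mn}, stored as m-by-n arrays."""
    return float(np.sum(u * v))

def inner_Q(y, q):
    """<y, q>_Q for y, q in Q = V x V, stored as (2, m, n) arrays."""
    return float(np.sum(y * q))

def norm_V(u):
    """Euclidean norm on V induced by <., .>_V."""
    return np.sqrt(inner_V(u, u))

def norm_Q(y):
    """Euclidean norm on Q induced by <., .>_Q."""
    return np.sqrt(inner_Q(y, y))
```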



In addition, at each pixel (i, j), we denote |y_{i,j}| = √(y_{i,j,1}² + y_{i,j,2}²). For y ∈ Q, denote ‖y‖_I = Σ_{i,j}^{m,n} |y_{i,j}|. The two-dimensional first-order difference operator is a mapping ∇ : V → Q, and ∇u ∈ Q is given by (∇u)_{i,j} = ((∇_X u)_{i,j}, (∇_Y u)_{i,j}). Then we have the isotropic TV(u) = ‖∇u‖_I. If we assume the periodic boundary condition, then using the inner products of V and Q, we find that the adjoint operator of −∇ equals the discrete divergence operator div : Q → V, i.e., div = −∇^T. Therefore, for any u ∈ V and y ∈ Q, it holds that ⟨−div y, u⟩_V = ⟨y, ∇u⟩_Q.

B. Augmented Lagrangian Model of the TV-Based Problem

Notice that K is a blur matrix generated by some point-spread function (PSF), and we have K1 = 1 [22], [47], where 1 is the vector of all 1s. Thus the null space of K contains no constant vector besides 0, whereas the null space of ∇ is the set of all constant vectors. Therefore, 0 is the only common element of the null spaces of ∇ and K, i.e., NULL(∇) ∩ NULL(K) = {0}. Under this condition, the functional in (3) is convex, proper, coercive, and continuous. According to Fermat's rule [48] and the generalized Weierstrass theorem, we have the following lemma.

Lemma 1 [28], [47]: Problem (3) has at least one solution u*, which satisfies

    0 ∈ λ K^T (Ku* − f) − div ∂‖∇u*‖_I,                         (8)

where ∂‖∇u*‖_I represents the subdifferential [48] of TV(u) at ∇u*. In order to free ∇u from the nondifferentiable term and to simplify the updating of the regularization parameter λ, we resort to the variable splitting technique [25].
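The discrete operators introduced above are easy to realize with periodic forward differences, and the adjoint identity ⟨−div y, u⟩_V = ⟨y, ∇u⟩_Q can be checked numerically. A NumPy sketch (our function names, not the authors' implementation):

```python
import numpy as np

def grad(u):
    """Forward differences with periodic boundary: returns (grad_X u, grad_Y u)."""
    gx = np.roll(u, -1, axis=0) - u
    gy = np.roll(u, -1, axis=1) - u
    return np.stack((gx, gy))

def div(y):
    """Discrete divergence, satisfying div = -grad^T (backward differences)."""
    dx = y[0] - np.roll(y[0], 1, axis=0)
    dy = y[1] - np.roll(y[1], 1, axis=1)
    return dx + dy

def tv(u):
    """Isotropic TV(u) = ||grad u||_I, the sum of per-pixel gradient magnitudes."""
    g = grad(u)
    return float(np.sum(np.sqrt(g[0] ** 2 + g[1] ** 2)))
```

Note that TV of a constant image is zero, which is why the null space of ∇ is the set of constant vectors.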

Concretely, an auxiliary variable y ∈ Q is introduced to represent ∇u (i.e., y_{i,j} ∈ R² for (∇u)_{i,j}), and another auxiliary variable x ∈ V is introduced for Ku. We then obtain the equivalent constrained form of problem (3):

    min_{u,x,y}  ‖y‖_I + (λ/2) ‖x − f‖₂²   s.t.   Ku = x,  y = ∇u.      (9)

The augmented Lagrangian (AL) functional for the constrained optimization problem (9) is defined as

    L_A(u, x, y; μ, ξ; λ) = (λ/2) ‖x − f‖₂² + ‖y‖_I − μ^T (x − Ku) − ξ^T (y − ∇u)
                            + (β₁/2) ‖x − Ku‖₂² + (β₂/2) ‖y − ∇u‖₂²,      (10)

where μ ∈ V and ξ ∈ Q are the Lagrange multipliers, and β₁ and β₂ are positive penalty parameters. For the AL functional (10), we consider the following saddle-point problem:

Find (u*, x*, y*; μ*, ξ*) ∈ V × V × Q × V × Q such that

    L_A(u*, x*, y*; μ, ξ; λ) ≤ L_A(u*, x*, y*; μ*, ξ*; λ) ≤ L_A(u, x, y; μ*, ξ*; λ).      (11)

The relationship between the saddle point in (11) and the solution of (3) is stated in Theorem 1, whose proof can be found in the Appendix. We first present a lemma that is critical to the proof of Theorem 1.

Lemma 2 [52]: Assume that F = F₁ + F₂, where F₁ and F₂ are lower semi-continuous (l.s.c.) convex functions from R^N into R, and F₁ is Gâteaux-differentiable with differential F₁′. Then for p* ∈ R^N, the following two conditions are equivalent: 1) p* is a solution of inf_p F(p); 2) ⟨F₁′(p*), p − p*⟩ + F₂(p) − F₂(p*) ≥ 0 for all p ∈ R^N.

Theorem 1: u* ∈ V is a solution of (3) if and only if there exist x*, μ* ∈ V and y*, ξ* ∈ Q such that (u*, x*, y*; μ*, ξ*) is a solution of (11).

Lemma 1, together with Theorem 1, shows that problem (11) has at least one solution, and each u* is a minimizer of problem (3). The augmented Lagrangian method (ALM) [53] solves the saddle-point problem (11) in the following iterative framework:

    (u^{k+1}, x^{k+1}, y^{k+1}) = argmin_{u,x,y} L_A(u, x, y; μ^k, ξ^k; λ);
    μ^{k+1} = μ^k − β₁ (x^{k+1} − K u^{k+1});
    ξ^{k+1} = ξ^k − β₂ (y^{k+1} − ∇u^{k+1}).                    (12)

In (12), solving the joint problem for (u^{k+1}, x^{k+1}, y^{k+1}) exactly in each iteration is not trivial, since it involves inner iterations. Here, we employ ADM to solve this problem, in which just one alternating sweep over u, x, and y is performed in each iteration. The convergence analysis in Section III clarifies its rationality.
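The structure of framework (12) can be seen on a toy scalar problem, min x²/2 s.t. x = b, whose joint subproblem has a closed form, so no inner iterations are needed. This sketch only illustrates the alternating primal/multiplier updates; it is not the image-restoration algorithm itself:

```python
def alm_toy(b, beta=1.0, iters=60):
    """Toy scalar instance of framework (12): minimize x^2/2 subject to x = b.
    AL: x^2/2 - mu*(x - b) + (beta/2)*(x - b)^2."""
    x, mu = 0.0, 0.0
    for _ in range(iters):
        x = (mu + beta * b) / (1.0 + beta)   # exact argmin of the AL in x
        mu = mu - beta * (x - b)             # multiplier update, as in (12)
    return x, mu
```

At the saddle point x* = b and, by stationarity x − μ = 0, μ* = b; each sweep contracts the error by the factor 1/(1 + β).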


III. PROPOSED METHOD

In this section, we resort to ADM to solve the constrained TV-regularized problem (5). The regularization parameter λ is updated in closed form in each iteration and finally converges to the optimal value in accordance with the discrepancy principle. Our ADMM iterative scheme for solving (5) is as follows:

    u^{k+1} = argmin_u L_A(u, x^k, y^k; μ^k, ξ^k; λ^k);                 (13)
    y^{k+1} = argmin_y L_A(u^{k+1}, x^k, y; μ^k, ξ^k; λ^k);             (14)
    x^{k+1} = argmin_x L_A(u^{k+1}, x, y^{k+1}; μ^k, ξ^k; λ^{k+1});     (15)
    μ^{k+1} = μ^k − β₁ (x^{k+1} − K u^{k+1});                           (16)
    ξ^{k+1} = ξ^k − β₂ (y^{k+1} − ∇u^{k+1}).                            (17)

In (15), λ^{k+1} is the regularization parameter updated according to the discrepancy principle in the (k + 1)th iteration. From the definition of L_A in (10) and the above five iterative equations, we observe that only the variable x, not the variable u, is related to λ; in other words, only the updating of x is restricted by the discrepancy principle. In the following, we describe in detail how to solve subproblems (13)–(15).

A. Derivation of the Proposed Algorithm

The minimization subproblem with respect to u is quadratic:

    u^{k+1} = argmin_u { (β₁/2) ‖Ku − x^k + μ^k/β₁‖₂² + (β₂/2) ‖∇u − y^k + ξ^k/β₂‖₂² }.      (18)

Therefore, we have

    u^{k+1} = (β₁ K^T K − β₂ Δ)^{-1} [ K^T (β₁ x^k − μ^k) − div(β₂ y^k − ξ^k) ],      (19)

where Δ = div · ∇ denotes the Laplace operator. Under the periodic boundary condition, the operators K and ∇ are block-circulant matrices, so they can be diagonalized by the fast Fourier transform (FFT). Therefore, equation (19) can be solved by two FFTs and one inverse FFT in O(mn log(mn)) multiplicative operations for an m × n image [25]. Alternatively, under the Neumann boundary condition, the FFT should be replaced by the DCT.
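Under the periodic boundary condition, the linear system in (19) is diagonal in the Fourier domain, so the u-update costs only a few FFTs. A NumPy sketch (our code, not the authors' MATLAB; `k_hat` is assumed to be the 2-D DFT of the zero-padded, origin-centred blur kernel):

```python
import numpy as np

def u_update(x, y, mu, xi, k_hat, beta1, beta2):
    """u-subproblem (19): solve (beta1*K^T K - beta2*Laplacian) u = rhs by FFT
    diagonalization, where rhs = K^T(beta1*x - mu) - div(beta2*y - xi)."""
    m, n = x.shape
    # Fourier symbols of the periodic forward-difference operators.
    dx_hat = (np.exp(2j * np.pi * np.fft.fftfreq(m)) - 1.0)[:, None]
    dy_hat = (np.exp(2j * np.pi * np.fft.fftfreq(n)) - 1.0)[None, :]
    denom = beta1 * np.abs(k_hat) ** 2 + beta2 * (np.abs(dx_hat) ** 2
                                                  + np.abs(dy_hat) ** 2)
    # -div(.) = grad^T(.), whose symbol is the conjugate forward-difference symbol.
    rhs_hat = (np.conj(k_hat) * np.fft.fft2(beta1 * x - mu)
               + np.conj(dx_hat) * np.fft.fft2(beta2 * y[0] - xi[0])
               + np.conj(dy_hat) * np.fft.fft2(beta2 * y[1] - xi[1]))
    return np.real(np.fft.ifft2(rhs_hat / denom))
```

Because the denominator is β₁|K̂|² + β₂(|D̂x|² + |D̂y|²) and K1 = 1, the DC entry equals β₁ > 0, so the division is well defined.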

For the y subproblem, from (14), we have

    y^{k+1}_{i,j} = argmin_{y_{i,j}} { |y_{i,j}| + (β₂/2) ‖y_{i,j} − (∇u^{k+1})_{i,j} − (ξ^k)_{i,j}/β₂‖₂² }.      (20)
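Subproblem (20) is solved by the two-dimensional shrinkage stated in (21) below; a NumPy sketch of that operator (illustrative name, with the convention 0 × (0/0) = 0 handled explicitly):

```python
import numpy as np

def shrink2d(z, r):
    """Isotropic 2-D shrinkage: per pixel, max(|z_ij| - r, 0) * z_ij / |z_ij|,
    with z stored as a (2, m, n) array and 0 * (0/0) taken to be 0."""
    mag = np.sqrt(z[0] ** 2 + z[1] ** 2)
    scale = np.maximum(mag - r, 0.0) / np.where(mag > 0.0, mag, 1.0)
    return z * scale
```

The y-update is then `shrink2d(grad(u) + xi / beta2, 1.0 / beta2)`, whose cost is linear in mn.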

Functional (20) is a proximal minimization (PM) problem [48], and its solution is given explicitly by the two-dimensional shrinkage [25]:

    y^{k+1}_{i,j} = max{ ‖(∇u^{k+1})_{i,j} + (ξ^k)_{i,j}/β₂‖ − 1/β₂, 0 } · ((∇u^{k+1})_{i,j} + (ξ^k)_{i,j}/β₂) / ‖(∇u^{k+1})_{i,j} + (ξ^k)_{i,j}/β₂‖,      (21)

where the convention 0 × (0/0) = 0 is assumed. The computational cost of (21) is therefore linear in mn.

According to (15), the subproblem for x can be written as

    x^{k+1} = argmin_x { (λ^{k+1}/2) ‖x − f‖₂² + (β₁/2) ‖x − a^{k+1}‖₂² },      (22)

where a^{k+1} = K u^{k+1} + μ^k/β₁. Functional (22) shows that the variable x is related to λ, and from (9) we observe that x plays the role of Ku. Therefore, in the following, we just examine whether x meets the discrepancy principle, i.e., whether ‖x − f‖₂² ≤ c holds. From (22), the minimization subproblem with respect to x is quadratic and has the closed-form solution

    x^{k+1} = (λ^{k+1} f + β₁ a^{k+1}) / (λ^{k+1} + β₁).                (23)

The choice of λ in each iteration falls into two cases according to a^{k+1}. On the one hand, if

    ‖a^{k+1} − f‖₂² ≤ c,                                        (24)

we can simply set λ^{k+1} = 0 with x^{k+1} = a^{k+1}, and obviously this x^{k+1} satisfies the discrepancy principle. On the other hand, if ‖a^{k+1} − f‖₂² > c, then according to the discrepancy principle, we should solve the equation

    ‖x^{k+1} − f‖₂² = c.                                        (25)

Substituting x^{k+1} in (25) with (23), we obtain

    λ^{k+1} = β₁ ‖f − a^{k+1}‖₂ / √c − β₁.                      (26)
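The two cases (24)–(26) combine into one small update; a NumPy sketch (our helper, not the authors' code), which either returns λ = 0 or picks λ so that ‖x − f‖₂² = c holds exactly:

```python
import numpy as np

def x_and_lambda_update(a, f, c, beta1):
    """Joint x / lambda update of (23)-(26); a plays the role of
    a^{k+1} = K u^{k+1} + mu^k / beta1."""
    r2 = float(np.sum((a - f) ** 2))
    if r2 <= c:                                # case (24): already feasible
        return a, 0.0
    lam = beta1 * (np.sqrt(r2 / c) - 1.0)      # closed form (26)
    x = (lam * f + beta1 * a) / (lam + beta1)  # closed form (23)
    return x, lam
```

No inner Newton iteration is needed, which is the point of introducing the auxiliary variable x.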

From the above discussion we learn that, by introducing the auxiliary variable x, Ku is freed from the restriction of the discrepancy principle. Profiting from this, a closed-form solution for λ is obtained in each iteration without additional conditions. This is the major difference between our method and that of [32], where the authors also used ADMM to solve the constrained TV-regularized restoration problem but introduced an auxiliary variable only for the TV regularizer. Unlike equation (26), a closed-form solution of λ for equation (6) does not exist, so a Newton iterative method was employed to compute the parameter λ. Inevitably, inner iterations were brought in, and additional constraints had to be imposed on (6) to guarantee the existence and uniqueness of λ [22]; our method is free from these constraints.

The resulting algorithm is summarized in Algorithm 1 (APE-ADMM). In Algorithm 1, Step 4, corresponding to the u subproblem, consumes the most computation: in each iteration, we perform three FFT/IFFT operations. Therefore, if APE-ADMM runs L loops, the computational cost is about that of 3L FFT operations.

B. Convergence Analysis

In this subsection, we show that the sequence {u^k} generated by APE-ADMM converges to a minimizer of the constrained TV-regularized problem (5), whereas the sequence {λ^k} tends to λ* corresponding to the constraint u ∈ S, i.e., the solution of (5) is also a solution of (3) for λ = λ*.

Algorithm 1: Adaptive Parameter Estimation for ADMM (APE-ADMM).

According to Step 6–Step 9 of APE-ADMM, when k → +∞, the sequence λ^k tends to a nonnegative λ†, and we have either λ† = 0 with u ∈ S, or λ† > 0 with ‖Ku − f‖₂² = c (in fact, only the latter case occurs for a natural image). The saddle-point condition is still valid for L_A(u, x, y; μ, ξ; λ†), and we denote by (u*, x*, y*; μ*, ξ*) a saddle point of it in the following discussion. Lemma 3 and Lemma 5 show the contractive and convergent properties of the sequences generated by APE-ADMM; their proofs are given in the Appendix.

Lemma 3: Let {(u^k, x^k, y^k; μ^k, ξ^k)} be the sequence generated by APE-ADMM. Then it holds that

    {x^k}, {y^k}, {μ^k}, and {ξ^k} are bounded;
    lim_{k→+∞} ‖x^{k+1} − x^k‖₂ = 0,   lim_{k→+∞} ‖y^{k+1} − y^k‖₂ = 0;
    lim_{k→+∞} ‖x^k − K u^k‖₂ = 0,     lim_{k→+∞} ‖y^k − ∇u^k‖₂ = 0;
    lim_{k→+∞} ‖μ^{k+1} − μ^k‖₂ = 0,   lim_{k→+∞} ‖ξ^{k+1} − ξ^k‖₂ = 0.      (27)

Denote v = (√β₁ x, √β₂ y, μ/√β₁, ξ/√β₂). Then it holds that ‖v^{k+1} − v*‖₂ ≤ ‖v^k − v*‖₂, i.e., {v^k} is Fejér monotone with respect to the set of all v*s.

Lemma 4 [48]: Let {p^k} be a sequence in R^N and let C be a nonempty subset of R^N. Suppose that {p^k} is Fejér monotone with respect to C (‖p^{k+1} − p*‖ ≤ ‖p^k − p*‖) and that every cluster point of {p^k} belongs to C. Then {p^k} converges to a point in C.

Lemma 5: Let {(u^k, x^k, y^k; μ^k, ξ^k)} be the sequence generated by APE-ADMM. Then it converges to a saddle point of L_A(u, x, y; μ, ξ; λ†). In particular, {u^k} converges to a solution of problem (3) with λ = λ†.

Lemma 5 shows that {u^k} converges to a solution of the unconstrained problem (3) with λ = λ†. On the other hand, λ† is obtained according to the discrepancy principle. Therefore, λ† is exactly the λ* that makes u* a solution of the constrained problem (5). From the above discussion, we conclude the following convergence theorem for APE-ADMM.

Theorem 2: Suppose that the sequences {u^k} and {λ^k} are generated by APE-ADMM. Then {u^k} converges to a solution of the constrained problem (5), and λ^k tends to the optimal regularization parameter λ* corresponding to the constraint u ∈ S.

From Lemma 5 and Theorem 2, the solution of problem (5) may be nonunique when the blur matrix K is singular, but each solution attains the same minimum functional value. Thus, each solution can be seen as a rational estimate of the original image, and APE-ADMM finds such a solution.

C. Parameter Setting Tricks

The upper bound c in (4) is noise-dependent [22], [32], [33], and a good choice of c should minimize the error between the original and restored images. In this paper, we adopt the commonly used wavelet-transform-based median rule [22], [54] to estimate the noise variance σ². Once σ² is available, c = τ mn σ² is generally used to estimate c. Setting τ = 1 has been the conventional choice. However, as mentioned in [22], this choice may result in an oversmoothed solution when the noise level is not high, which implies that λ is too small; we should then set τ < 1. Indeed, there exists no uniform criterion for choosing τ, and it remains an open problem deserving further study. For Tikhonov-regularized methods, one feasible strategy for choosing τ is the equivalent degrees of freedom (EDF) method [4], which estimates τ by solving the equation ‖Ku*(λ) − f‖₂² = EDF · σ². However, the EDF method is hard to transplant directly to TV-regularized methods, since a closed-form solution of u*(λ) does not exist for them. Another simple but practical strategy for choosing τ is to adjust its value slightly according to the blurred signal-to-noise ratio (BSNR), defined by

    BSNR = 10 log₁₀( var(f) / σ² )  (dB),

where var(f) denotes the variance of f. In this paper, by fitting experimental data with a straight line, we suggest setting τ = −0.006 BSNR + 1.09 for deblurring problems and τ = −0.03 BSNR + 1.09 for denoising problems. Although the line parameters must be adjusted for different problem types, abundant experiments on natural images indicate that, for a given problem type, this parameter setting is fairly robust to variations in image type and image size. Similar strategies can be found in other works dealing with the constrained TV restoration problem (5) [32]–[34].

Another issue we wish to expand on is the selection of the parameters β₁ and β₂. In fact, simply setting β₁ = β₂ > 0 is sufficient for the convergence of APE-ADMM. However, from the definition of L_A in (10), we observe that λ penalizes the distance between x and f, β₁ penalizes the distance between x and Ku, and β₂ plays the same role between y and ∇u. When the BSNR of the observed image becomes higher, Ku gets closer to f, and in this case λ should be larger. Thus, if we want Ku to approach f faster, we should set β₁ larger.
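The empirical settings of this subsection (τ fitted linearly in the BSNR, c = τ mn σ², and the rule β₁ = 10^(0.1 BSNR − 1), β₂ = 1 suggested in this subsection) can be collected into one helper; a hedged sketch with illustrative names:

```python
import numpy as np

def bsnr(f, sigma2):
    """Blurred signal-to-noise ratio: 10*log10(var(f)/sigma^2), in dB."""
    return 10.0 * np.log10(np.var(f) / sigma2)

def suggested_parameters(f, sigma2, problem="deblurring"):
    """Empirical settings from the paper: tau fitted linearly in BSNR,
    c = tau*m*n*sigma^2, beta1 = 10^(0.1*BSNR - 1), beta2 = 1."""
    b = bsnr(f, sigma2)
    tau = -0.006 * b + 1.09 if problem == "deblurring" else -0.03 * b + 1.09
    c = tau * f.size * sigma2
    beta1 = 10.0 ** (0.1 * b - 1.0)
    return tau, c, beta1, 1.0
```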


From this point of view, a higher BSNR means a larger β₁. Through a large number of experiments, we find that with β₁ = 10^(0.1 BSNR − 1) and β₂ = 1, APE-ADMM produces a reasonable solution rapidly. With differently weighted β₁ and β₂, our method becomes more flexible.

IV. EXPERIMENTAL RESULTS

In this section, four experiments are performed to illustrate the effectiveness of the proposed method, each with a specified goal as follows.

1) The first goal is to show the significance of adaptive regularization parameter selection, by comparing APE-ADMM with two well-known TV-based image restoration algorithms, namely the fast total variation deconvolution algorithm (FTVD-v4, http://www.caam.rice.edu/~optimization/L1/ftvd/) [24], [25] and the fast iterative shrinkage/thresholding algorithm (FISTA, http://iew3.technion.ac.il/~becka/papers/tv_fista) [55]. FTVD-v4 uses variable splitting, together with ADM, to recover images, whereas FISTA is a gradient-based forward-backward splitting algorithm. Both are distinguished by their speed. In these two algorithms, the regularization parameter λ is obtained manually by trial and error, so λ is fixed throughout the run. We demonstrate that, in general, APE-ADMM automatically finds the optimal λ and converges faster than FTVD-v4 and FISTA. Besides, we show that the restoration results of both FTVD-v4 and FISTA are sensitive to the selection of λ.

2) The second goal is to compare APE-ADMM with three state-of-the-art adaptive TV-regularized algorithms, namely Wen–Chan [22], C-SALSA (http://cascais.lx.it.pt/~mafonso/salsa.html) [33], and Ng–Weiss–Yuan [32]. The results show that our algorithm outperforms the competitors in terms of speed, and usually attains restorations with higher peak signal-to-noise ratio (PSNR) than its counterparts. Since we employ images of different sizes in this experiment, the variation of the speed with image size is also well illustrated.
3) The third goal is to show that, when applied to denoising problems, APE-ADMM is competitive with the famed Chambolle projection algorithm [46], [56] and the adaptive TV-based split Bregman algorithm [26], [31].

4) The last goal is to generalize APE-ADMM to the restoration of multichannel images.

The above four goals are elaborated in detail in Subsections IV-A to IV-D. The results presented below are obtained by running a MATLAB implementation of APE-ADMM on a PC with an Intel Core i5 CPU (3.20 GHz) and 8 GB of RAM under Windows 7. We use the PSNR to assess the quality of observed and restored images (with pixel values assumed in [0, 255]). It is defined as

    PSNR = 10 log₁₀( 255² · mn / ‖u − u_clean‖₂² )  (dB),       (28)

where u is the restored image and u_clean is the original image. One 256 × 256 (Lena), two 512 × 512 (Boat and Barbara), and one 1024 × 1024 (Man) test images, shown in Fig. 1, are used in the experimental studies of Sections IV-A to IV-C. For simplicity, we denote the average blur kernel with square blurring size size by A(size), the Gaussian blur kernel with square blurring size size and standard deviation sigma by G(size, sigma), and the motion blur kernel with length len and angle theta (in the counterclockwise direction) by M(len, theta).

Fig. 1. Lena (256 × 256), Boat (512 × 512), Barbara (512 × 512), and Man (1024 × 1024) images.

TABLE I: Comparing APE-ADMM, FTVD-v4, and FISTA with their own optimal λs.

A. Experiment 1 – Why a Variable λ?

We first explain why a variable regularization parameter λ is attractive in image restoration, and demonstrate that our algorithm succeeds in finding the optimal λ automatically. Here, "optimal" means achieving the best restoration as judged by the PSNR. Two competitors, FTVD-v4 and FISTA, and two images, Boat and Man, are involved in the comparison. In FISTA, the regularization parameter appears as a multiplier on the regularizer, whereas in FTVD-v4 and APE-ADMM the parameter multiplies the data fidelity term; for ease of comparison, the regularization parameter of FISTA used here is its reciprocal. After blurring the Boat and Man images with the average blur kernel A(9) and the linear motion blur kernel M(30, 30), respectively, we add Gaussian noise of variances σ² = 4 and σ² = 20 to obtain the degraded images. The PSNRs of the degraded Boat and Man images are 23.30 dB and 21.64 dB, respectively. We use a uniform stopping criterion for all the tested algorithms, i.e., ‖u^{k+1} − u^k‖₂ / ‖u^k‖₂ ≤ 10⁻⁶ or the iteration counter exceeding 1000, where u^k denotes the restored image in the kth iteration. Table I shows the results of APE-ADMM, as well as the results of FTVD-v4 and FISTA under their own optimal λs,


Fig. 2. Plots of PSNRs over CPU time for APE-ADMM, FTVD-v4, and FISTA. (a) Plots for image Boat (512 × 512) with average blur and noise (σ 2 = 4). (b) Plots for image Man (1024 × 1024) with motion blur and noise (σ 2 = 20).

Fig. 4. Plots of PSNRs corresponding to eleven λ values for FTVD-v4 and FISTA. (a) Plots for image Boat. (b) Plots for image Man.

TABLE II DETAILS OF THE IMAGE DEBLURRING EXPERIMENT

Fig. 3. Evolutions of (a) λs and (b) objective function values over iterations for APE-ADMM.

in terms of PSNR, iterations, and CPU time. From Table I we observe that, for each restoration problem, the final parameter of APE-ADMM is close to the optimal parameters of the other two algorithms; the PSNRs of APE-ADMM, FTVD-v4, and FISTA are basically on the same level; and, comparing iterations and CPU time, our method claims superiority over the challengers. Fig. 2 presents the plots of PSNRs over CPU time for the three algorithms. For the Boat and Man images, the PSNRs of APE-ADMM rise and converge faster than those of the other two algorithms. Fig. 3 (a) shows that, as the iteration number increases, APE-ADMM finds the optimal λ rapidly, though some fluctuations may occur in the early part of the iterative procedure. This guarantees the fast convergence of our algorithm. Note that λ may fall to zero during convergence, which is allowable in our method. Fig. 3 (b) demonstrates the evolution of the objective function value over iterations for APE-ADMM. Notice that, unlike that of algorithms with a fixed λ, the objective function value of APE-ADMM does not decrease monotonically; it follows the oscillation of the variable λ. It is noteworthy that APE-ADMM finds its optimal λ in the above-mentioned adaptive way, whereas the optimal λs of FTVD-v4 and FISTA must be obtained by trial-and-error. In fact, we experimentally found the optimal λs of FTVD-v4 and FISTA with the assistance of APE-ADMM. Since all three algorithms are TV-based and share the same minimization functional form, it is well-founded that their optimal λs are close. Taking the nearest integer of the optimal parameter λ* of APE-ADMM (denoted round(λ*)) as a reference point, we choose eleven equally spaced points in the interval [0.5 round(λ*), 2 round(λ*)]

(including the endpoints) as candidate optimal λs for FTVD-v4 and FISTA. Then we calculate and plot the approximate curves of PSNR over λ for these two algorithms. Fig. 4 (a) and (b) display these curves for images Boat and Man, respectively. After slight tuning, we obtain the optimal λs of FTVD-v4 and FISTA shown in Table I. From Fig. 4 we learn that the PSNR of either FTVD-v4 or FISTA is sensitive to the variation of λ. Both FTVD-v4 and FISTA obtain reasonable results only near the optimal λ; when λ moves away from the optimal one, their PSNRs drop quickly. Fig. 5 shows the restored images of the different algorithms for image Man with different λs. When using the optimal λ, either FTVD-v4 or FISTA obtains a restoration competitive with that of APE-ADMM, concerning both PSNR and visual impression. However, when setting λ = 0.5 round(λ*), both FTVD-v4 and FISTA produce over-smoothed restorations, e.g., the textures of the hair in the Man image remain blurred. On the contrary, when employing λ = 2 round(λ*), both obtain restorations with noise points. Indeed, as the image size increases, the procedure of capturing the optimal λ becomes more troublesome for algorithms without adaptive parameter estimation. Besides, the optimal λ is sensitive to variations of image size, image type, and blur kernel.
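For reference, the PSNR measure of (28) and the relative-change stopping rule used throughout these experiments can be sketched in a few lines of NumPy (a minimal illustration with our own variable names, not the paper's MATLAB code):

```python
import numpy as np

def psnr(u, u_clean):
    # PSNR per (28): 10*log10(255^2 * m * n / ||u - u_clean||_2^2),
    # assuming pixel values in [0, 255]
    err2 = np.sum((u - u_clean) ** 2)
    return 10.0 * np.log10(255.0 ** 2 * u.size / err2)

def stopped(u_new, u_old, tol=1e-6):
    # uniform stopping criterion: ||u^{k+1} - u^k||_2 / ||u^k||_2 <= tol
    return np.linalg.norm(u_new - u_old) / np.linalg.norm(u_old) <= tol
```

In the experiments the loop also terminates once the iteration counter exceeds 1000, whichever comes first.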

B. Experiment 2 – Comparison with Some Other Adaptive Algorithms

In this subsection, we compare our algorithm in terms of both speed and accuracy with three other adaptive TV-based image restoration algorithms: one primal-dual model based method denoted by Wen-Chan [22], and two ADMM based methods denoted by C-SALSA [33] and Ng-Weiss-Yuan [32]. The description of these three methods has been presented in Section I. We employ the five image deblurring problems adopted in [33] as the background problems, which are shown in Table II. In problems 3A and 3B,

HE et al.: FAST APE FOR TV IMAGE RESTORATION


TABLE III COMPARISON BETWEEN DIFFERENT METHODS IN TERMS OF λ, PSNR, ITERATIONS AND CPU TIME (IN SECONDS)

Fig. 5. Man image restorations of APE-ADMM, FTVD-v4, and FISTA with different λs. The degraded image is obtained by blurring the original image with M(30, 30) (the 30-length linear motion blur kernel with a 30-degree angle) and corrupting the result with Gaussian noise of variance 20.

i, j = −7, . . . , 7. Three test images of different sizes, i.e., Lena (256 × 256), Boat (512 × 512), and Man (1024 × 1024), are used for comparison. We use the uniform stopping criterion suggested in the previous subsection for all the tested algorithms. The other settings comply with the original references. Table III reports the comparison results in terms of PSNR, total iterations, CPU time, and the final regularization parameter λ. The best result of each test item is highlighted in boldface; "–" means the algorithm does not report that value. The Ng-Weiss-Yuan algorithm achieves its best results when applied to images with pixel values in [0, 1]. Hence we first divide the pixel values of the test images by 255 and then apply it to the rescaled images. Besides, the input noise variances shown in Table II are transformed so that the PSNRs of the degraded observations stay fixed for Ng-Weiss-Yuan.
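The rescaling just described is benign because PSNR is invariant under a common scaling of image and peak value: dividing pixel values by 255 divides the noise standard deviation by 255 and the variance by 255^2, leaving the PSNR of the observation unchanged. A small sketch (our hypothetical helpers, not Ng-Weiss-Yuan's code):

```python
import numpy as np

def psnr_peak(u, u_clean, peak):
    # PSNR with an explicit peak value, so both pixel ranges can be compared
    mse = np.mean((u - u_clean) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def rescale_problem(g, sigma2):
    # map a [0, 255] observation to [0, 1]; the noise variance scales by 1/255^2
    return g / 255.0, sigma2 / 255.0 ** 2
```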

The λs given by Ng-Weiss-Yuan correspond to the pixel value range [0, 1] rather than [0, 255], and it is difficult to derive a transform that directly relates the λs of the two pixel value ranges. Therefore, we only take the λs obtained by APE-ADMM and Wen-Chan for comparison. In Fig. 6, the evolutions of the PSNRs and λs of the different algorithms over CPU time are displayed for the Boat image under problem 2B. In Fig. 7, the corresponding degraded and restored images are shown. Several observations can be made from the results in Table III, Fig. 6, and Fig. 7. First, compared with the challengers, APE-ADMM achieves the best PSNRs while consuming the least CPU time and the fewest iterations. Dividing CPU time by iteration number, we find that our algorithm also costs the least CPU time per iteration.
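The per-iteration cheapness rests on the closed-form λ update. The exact substeps belong to Section III, but their discrepancy-principle logic is visible in the case analysis of the Appendix (steps 6–9): with c denoting the discrepancy bound on ‖x − f‖_2^2 (proportional to the noise variance), either the residual of the auxiliary estimate a already satisfies ‖a − f‖_2^2 ≤ c and λ is set to 0, or λ > 0 is chosen so that ‖x − f‖_2^2 = c. A sketch consistent with that behavior, under the assumption (ours) that the x-subproblem has the quadratic form min_x (λ/2)‖x − f‖^2 + (β1/2)‖x − a‖^2:

```python
import numpy as np

def lambda_update(a, f, c, beta1):
    # Discrepancy-principle update (sketch): the minimizer of
    #   (lam/2)||x - f||^2 + (beta1/2)||x - a||^2
    # is x = (lam*f + beta1*a)/(lam + beta1), hence
    #   ||x - f||^2 = (beta1/(lam + beta1))^2 * ||a - f||^2.
    r2 = np.sum((a - f) ** 2)
    if r2 <= c:
        # case 1): residual already within the noise level, no fidelity weight needed
        return 0.0, a
    # case 2): solve (beta1/(lam + beta1))^2 * r2 = c for lam > 0 in closed form
    lam = beta1 * (np.sqrt(r2 / c) - 1.0)
    x = (lam * f + beta1 * a) / (lam + beta1)
    return lam, x
```

Either branch keeps the iterate on the feasible side of Morozov's discrepancy principle without any inner iteration.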


TABLE IV COMPARISON IN PSNR OF THE IMAGE DENOISING EXPERIMENT

Fig. 6. The (a) PSNR evolutions of different algorithms over CPU time, and (b) λ evolutions of APE-ADMM and Wen-Chan over CPU time, for the Boat image under problem 2B.

This conclusion matches our expectation: by adaptively updating the regularization parameter λ in closed form, APE-ADMM achieves fast computation without inner iterations. Besides, the per-iteration cost of the other three algorithms is apt to be affected by the parameter settings of their inner iterations. Benefiting from the parameter setting strategy for β1 and β2, our algorithm accomplishes the restoration tasks within fewer iterations. In particular, Fig. 6 (a) shows that the PSNR of APE-ADMM rises and converges faster than those of its counterparts. Second, from Table III we observe that the superiority of our algorithm is well kept as the background problems and images vary. As expected, the larger the test image, the greater the superiority in speed. Third, in most cases, APE-ADMM and Wen-Chan finally achieve similar λs. The higher PSNR indicates that APE-ADMM generally obtains the more reasonable λ. Fig. 6 (b) demonstrates that the evolution of λ for APE-ADMM reaches convergence earlier than that for Wen-Chan. Fig. 7 displays that, for image Boat under problem 2B, the restored image of APE-ADMM looks sharper in detail than that of Wen-Chan, since a smaller value of λ tends to produce an over-smoothed result.

C. Experiment 3 – Comparison in Denoising Problems

The above studies indicate that APE-ADMM is fast and effective for image deblurring problems. In this subsection, we compare it with two other distinguished adaptive TV-based image denoising algorithms, i.e., Chambolle's projection algorithm [46], [56] and the adaptive split Bregman denoising algorithm [26], [31], both adopting the regularization parameter update strategy for denoising proposed in [46] and having public online implementations at "http://www.ipol.im/". However, unlike the proposed method, their adaptive parameter update strategy is designed for denoising only; whether it can be generalized to deconvolution is unknown. The test images used here are Barbara and Boat.
We first add Gaussian noise of variances 100, 400, and 900 to these two images. Then, the three algorithms are employed to remove the noise. Table IV compares their PSNRs. Since both Chambolle's algorithm and the adaptive split Bregman algorithm used here are online test programs, we do not include CPU time as a comparison item. From Table IV, we observe that APE-ADMM achieves higher PSNRs compared

with the competitors. Fig. 8 shows the restorations of the Barbara image by the different algorithms under noise of variance 400, together with the difference images (differences between the restored and original images) of the three algorithms, representing relative errors of 9.58%, 10.04%, and 10.29%, respectively. For better visualization, we stretch the dynamic range of the difference images onto [0, 255] by an affine contrast change. Besides the gain in PSNR, we observe from Fig. 8 that APE-ADMM does better in preserving details than the other two algorithms, with relatively less loss of textures. The higher restoration quality indicates that, under the same TV regularizer, our method finds a better λ than the two competitors.

D. Experiment 4 – Extension to Multichannel Deblurring

In this subsection, we generalize our method to multichannel image deblurring. Although some nonadaptive restoration algorithms, such as FTVD-v4, possess multichannel versions, few adaptive deconvolution algorithms have been extended to multichannel image processing. In multichannel image deblurring, cross-channel blurs lead to a blur matrix that cannot be wholly diagonalized by the FFT, and this severely weakens the efficiency of some commonly used inner iterative schemes, such as Newton's method. In contrast, our algorithm can be smoothly extended to multichannel cases following the strategy for FTVD-v4 [24], since each substep of our method admits a closed-form solution.
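The relative-error figures and the affine stretch used for the difference images in Experiment 3 are straightforward computations; a minimal sketch (our helper names):

```python
import numpy as np

def relative_error(u, u_clean):
    # ||u - u_clean||_2 / ||u_clean||_2, reported as a percentage in the text
    return np.linalg.norm(u - u_clean) / np.linalg.norm(u_clean)

def stretch_to_255(d):
    # affine contrast change mapping the dynamic range of a
    # difference image onto [0, 255] for display
    lo, hi = float(d.min()), float(d.max())
    if hi == lo:
        return np.zeros_like(d, dtype=float)
    return (d - lo) * (255.0 / (hi - lo))
```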
To test the applicability of our method to multichannel image restoration, we run one test with an RGB image, Peppers (512 × 512), and cross-channel blurs generated in the following manner [27]: 1) Generate 9 kernels: {A(13), A(15), A(17), G(11, 9), G(21, 11), G(31, 13), M(21, 45), M(41, 90), M(61, 135)}; 2) Randomly assign the above 9 kernels to {H11, H12, H13; H21, H22, H23; H31, H32, H33}, where Hii, i = 1, 2, 3, are within-channel kernels and the rest are cross-channel kernels; 3) Multiply the corresponding blur kernels by the relative weights {0.7, 0.15, 0.15; 0.1, 0.8, 0.1; 0.2, 0.2, 0.6}. After blurring the Peppers image with the above blur kernels, we add Gaussian noise of variance 4 to obtain the observed image. We again use the stopping criterion suggested in Experiment 1 for the proposed algorithm. After 166 iterations, APE-ADMM reaches convergence with a CPU time


Fig. 7. Degraded and restored Boat images of APE-ADMM, Wen-Chan, C-SALSA, and Ng-Weiss-Yuan under problem 2B described in Table II.

Fig. 8. First row: denoising results for Barbara image of APE-ADMM, Chambolle’s algorithm, and split Bregman algorithm under noise of variance 400; second row: difference images (difference between the restored and the original images with stretched dynamic onto [0, 255] by an affine contrast change) of APE-ADMM with a 9.58% relative error, Chambolle’s algorithm with a 10.04% relative error, and split Bregman algorithm with a 10.29% relative error.

of 82.30 s and a regularization parameter λ = 10.4. Fig. 9 shows the ground truth, degraded, restored, and error images, respectively. APE-ADMM reconstructs the edges in Peppers well, with a relative error restricted to 7.74%.

V. CONCLUDING REMARKS

We have proposed a fast ADMM-based algorithm for the constrained TV image restoration problem. Unlike other ADMM based algorithms in the literature, our algorithm further exploits the potential of ADMM by introducing auxiliary variables for both the data fidelity and TV regularization terms. Since the noise variance of the observed image can usually be estimated, the discrepancy principle is applied to update the regularization parameter in closed form in each iteration. We present a convergence proof for our algorithm, and experimental results indicate that the developed algorithm outperforms some state-of-the-art methods in both speed and accuracy. In fact, the idea of this paper can be extended to some other L1-based (e.g., frame based [5]–[7], [57] or total generalized variation (TGV) based [12]) image restoration problems, and the convergence can also be guaranteed.

Fig. 9. Ground truth, degraded, restored, and error images for image Peppers.

APPENDIX
PROOF OF THEOREM 1

Proof: Suppose that (u*, x*, y*; μ*, ξ*) satisfies the saddle-point condition. From the first inequality in (11), we have

μ^T (x* − Ku*) + ξ^T (y* − ∇u*) ≥ (μ*)^T (x* − Ku*) + (ξ*)^T (y* − ∇u*), ∀(μ, ξ) ∈ V × Q.  (29)

Taking ξ = ξ* in (29), we obtain

μ^T (x* − Ku*) ≥ (μ*)^T (x* − Ku*), ∀μ ∈ V.  (30)


Inequality (30) indicates x* − Ku* = 0. Following the same way, we have y* − ∇u* = 0. Therefore, it holds that

x* − Ku* = 0,
y* − ∇u* = 0.  (31)

Equation (31), together with the second inequality in (11), indicates

‖∇u*‖_I + (λ/2)‖Ku* − f‖_2^2 ≤ ‖y‖_I + (λ/2)‖x − f‖_2^2 − (μ*)^T (x − Ku) − (ξ*)^T (y − ∇u) + (β1/2)‖x − Ku‖_2^2 + (β2/2)‖y − ∇u‖_2^2, ∀(u, x, y) ∈ V × V × Q.  (32)

Taking x = Ku and y = ∇u in (32), we have

‖∇u*‖_I + (λ/2)‖Ku* − f‖_2^2 ≤ ‖∇u‖_I + (λ/2)‖Ku − f‖_2^2.  (33)

Inequality (33) shows that u* is a solution of problem (3). Conversely, assuming that u* ∈ V is a solution of (3), we take x* = Ku* and y* = ∇u*. According to Lemma 1, there exist μ* and ξ* such that μ* = λ(Ku* − f), ξ* ∈ ∂‖∇u*‖_I, and div ξ* = λK^T (Ku* − f). Then we verify that (u*, x*, y*; μ*, ξ*) is a saddle-point in (11). Since x* = Ku* and y* = ∇u*, the first inequality in (11) holds. In the following, we show that, ∀(u, x, y) ∈ V × V × Q, we have

LA(u*, x*, y*; μ*, ξ*; λ) ≤ LA(u, x, y; μ*, ξ*; λ).  (34)

From (10) we find that LA(u, x, y; μ*, ξ*; λ) is convex, proper, coercive, and continuous with respect to u, x, and y, respectively. Thus, according to Lemma 2, it has a minimizer (ū, x̄, ȳ) over V × V × Q if and only if

⟨K^T μ* + ∇^T ξ*, u − ū⟩ + β1⟨K^T (Kū − x̄), u − ū⟩ + β2⟨∇^T (∇ū − ȳ), u − ū⟩ ≥ 0, ∀u ∈ V,  (35)

‖y‖_I − ‖ȳ‖_I − ⟨ξ*, y − ȳ⟩ + β2⟨ȳ − ∇ū, y − ȳ⟩ ≥ 0, ∀y ∈ Q,  (36)

(λ/2)‖x − f‖_2^2 − (λ/2)‖x̄ − f‖_2^2 − ⟨μ*, x − x̄⟩ + β1⟨x̄ − Kū, x − x̄⟩ ≥ 0, ∀x ∈ V.  (37)

On the one hand, if we take ū = u*, x̄ = x*, and ȳ = y* in (35), according to the assumption on μ* and ξ* above, we have (u*, x*, y*) satisfying (35). On the other hand, according to Lemma 1, we have 0 ∈ λK^T (Ku* − f) − div ∂‖y*‖_I (y* = ∇u*). Taking ū = u* and ȳ = y* in (36), we find that the third term becomes zero. With div ξ* = λK^T (Ku* − f) and the nonnegative property of the Bregman distance, inequality (36) is equivalent to

‖y‖_I − ‖y*‖_I − ⟨∂‖y*‖_I, y − y*⟩ ≥ 0, ∀y ∈ Q.

Following the same way, inequality (37) holds by taking ū = u* and x̄ = x*. Therefore, (u*, x*, y*) satisfies (35), (36), and (37). Thus the second inequality in (11) holds, and the proof is completed.

PROOF OF LEMMA 3

Proof: Suppose that (u*, x*, y*; μ*, ξ*) is a saddle-point of LA(u, x, y; μ, ξ; λ†). Define û^{k+1} = u^{k+1} − u*, and, following the same way, define x̂^{k+1}, ŷ^{k+1}, μ̂^{k+1}, and ξ̂^{k+1}. According to the first inequality of (11) with λ = λ†, we have x* = Ku* and y* = ∇u*. This result, together with (16) and (17), shows

μ̂^{k+1} = μ̂^k − β1(x̂^{k+1} − Kû^{k+1}),
ξ̂^{k+1} = ξ̂^k − β2(ŷ^{k+1} − ∇û^{k+1}).  (38)

It follows that

(‖μ̂^k‖_2^2 − ‖μ̂^{k+1}‖_2^2)/β1 = 2⟨μ̂^k, x̂^{k+1} − Kû^{k+1}⟩ − β1‖x̂^{k+1} − Kû^{k+1}‖_2^2,
(‖ξ̂^k‖_2^2 − ‖ξ̂^{k+1}‖_2^2)/β2 = 2⟨ξ̂^k, ŷ^{k+1} − ∇û^{k+1}⟩ − β2‖ŷ^{k+1} − ∇û^{k+1}‖_2^2.  (39)

On the one hand, from the proof of Theorem 1 and the second inequality of (11) with λ = λ†, we obtain

⟨K^T μ* + ∇^T ξ*, u − u*⟩ + β1⟨K^T (Ku* − x*), u − u*⟩ + β2⟨∇^T (∇u* − y*), u − u*⟩ ≥ 0, ∀u ∈ V,  (40)

‖y‖_I − ‖y*‖_I − ⟨ξ*, y − y*⟩ + β2⟨y* − ∇u*, y − y*⟩ ≥ 0, ∀y ∈ Q,  (41)

(λ†/2)‖x − f‖_2^2 − (λ†/2)‖x* − f‖_2^2 − ⟨μ*, x − x*⟩ + β1⟨x* − Ku*, x − x*⟩ ≥ 0, ∀x ∈ V.  (42)

On the other hand, since u^{k+1}, x^{k+1}, and y^{k+1} are the solutions of their corresponding subproblems in each iteration, according to Lemma 2, we also have

⟨K^T μ^k + ∇^T ξ^k, u − u^{k+1}⟩ + β1⟨K^T (Ku^{k+1} − x^k), u − u^{k+1}⟩ + β2⟨∇^T (∇u^{k+1} − y^k), u − u^{k+1}⟩ ≥ 0, ∀u ∈ V,  (43)

‖y‖_I − ‖y^{k+1}‖_I − ⟨ξ^k, y − y^{k+1}⟩ + β2⟨y^{k+1} − ∇u^{k+1}, y − y^{k+1}⟩ ≥ 0, ∀y ∈ Q,  (44)

(λ^{k+1}/2)‖x − f‖_2^2 − (λ^{k+1}/2)‖x^{k+1} − f‖_2^2 − ⟨μ^k, x − x^{k+1}⟩ + β1⟨x^{k+1} − Ku^{k+1}, x − x^{k+1}⟩ ≥ 0, ∀x ∈ V.  (45)

Taking u = u^{k+1} in (40) and u = u* in (43), by addition, we obtain

⟨μ̂^k, Kû^{k+1}⟩ + β1⟨Kû^{k+1} − x̂^k, Kû^{k+1}⟩ + ⟨ξ̂^k, ∇û^{k+1}⟩ + β2⟨∇û^{k+1} − ŷ^k, ∇û^{k+1}⟩ ≤ 0.  (46)

Similarly, taking y = y^{k+1} in (41) and y = y* in (44), by addition, we have

⟨ξ̂^k, −ŷ^{k+1}⟩ + β2⟨∇û^{k+1} − ŷ^{k+1}, −ŷ^{k+1}⟩ ≤ 0.  (47)


From steps 6–9 of APE-ADMM, we have either 1) ‖a^{k+1} − f‖_2^2 ≤ c and λ^{k+1} = 0, or 2) ‖a^{k+1} − f‖_2^2 > c, λ^{k+1} > 0, and ‖x^{k+1} − f‖_2^2 = c. Since ‖x* − f‖_2^2 = c, for both case 1) and case 2), similarly to (47), it always holds that

⟨μ̂^k, −x̂^{k+1}⟩ + β1⟨Kû^{k+1} − x̂^{k+1}, −x̂^{k+1}⟩ ≤ 0.  (48)

Adding (46), (47), and (48) together, it follows that

⟨μ̂^k, x̂^{k+1} − Kû^{k+1}⟩ + ⟨ξ̂^k, ŷ^{k+1} − ∇û^{k+1}⟩ ≥ β1⟨x̂^{k+1} − x̂^k, Kû^{k+1}⟩ + β2⟨ŷ^{k+1} − ŷ^k, ∇û^{k+1}⟩ + β1‖x̂^{k+1} − Kû^{k+1}‖_2^2 + β2‖ŷ^{k+1} − ∇û^{k+1}‖_2^2.  (49)

Combining (39) and (49), we obtain

(‖μ̂^k‖_2^2 − ‖μ̂^{k+1}‖_2^2)/β1 + (‖ξ̂^k‖_2^2 − ‖ξ̂^{k+1}‖_2^2)/β2 ≥ 2β1⟨x̂^{k+1} − x̂^k, Kû^{k+1}⟩ + 2β2⟨ŷ^{k+1} − ŷ^k, ∇û^{k+1}⟩ + β1‖x̂^{k+1} − Kû^{k+1}‖_2^2 + β2‖ŷ^{k+1} − ∇û^{k+1}‖_2^2.  (50)

Then we estimate the first two terms on the right-hand side of (50). By the constructions of x^k and y^k from u^k, we have

‖y‖_I − ‖y^k‖_I − ⟨ξ^{k−1}, y − y^k⟩ + β2⟨y^k − ∇u^k, y − y^k⟩ ≥ 0,  (51)

(λ^k/2)‖x − f‖_2^2 − (λ^k/2)‖x^k − f‖_2^2 − ⟨μ^{k−1}, x − x^k⟩ + β1⟨x^k − Ku^k, x − x^k⟩ ≥ 0.  (52)

Adding (44) with y = y^k and (51) with y = y^{k+1}, we obtain

β2‖ŷ^{k+1} − ŷ^k‖_2^2 − ⟨ξ̂^k − ξ̂^{k−1}, ŷ^{k+1} − ŷ^k⟩ − β2⟨∇û^{k+1} − ∇û^k, ŷ^{k+1} − ŷ^k⟩ ≤ 0.  (53)

From steps 6–9 of APE-ADMM, it always holds that

λ^k ‖x^k − f‖_2^2 ≥ λ^k ‖x^{k+1} − f‖_2^2,
λ^{k+1} ‖x^{k+1} − f‖_2^2 ≥ λ^{k+1} ‖x^k − f‖_2^2.  (54)

Thus, adding (45) with x = x^k and (52) with x = x^{k+1}, we obtain

β1‖x̂^{k+1} − x̂^k‖_2^2 − ⟨μ̂^k − μ̂^{k−1}, x̂^{k+1} − x̂^k⟩ − β1⟨Kû^{k+1} − Kû^k, x̂^{k+1} − x̂^k⟩ ≤ 0.  (55)

From (38) with k replaced by k − 1, (53), and (55), we have

⟨x̂^{k+1} − x̂^k, Kû^{k+1} − x̂^k⟩ ≥ ‖x̂^{k+1} − x̂^k‖_2^2,
⟨ŷ^{k+1} − ŷ^k, ∇û^{k+1} − ŷ^k⟩ ≥ ‖ŷ^{k+1} − ŷ^k‖_2^2.  (56)

Equation (56), together with

⟨x̂^{k+1} − x̂^k, x̂^k⟩ = (1/2)(‖x̂^{k+1}‖_2^2 − ‖x̂^k‖_2^2 − ‖x̂^{k+1} − x̂^k‖_2^2),
⟨ŷ^{k+1} − ŷ^k, ŷ^k⟩ = (1/2)(‖ŷ^{k+1}‖_2^2 − ‖ŷ^k‖_2^2 − ‖ŷ^{k+1} − ŷ^k‖_2^2),  (57)

results in

2⟨x̂^{k+1} − x̂^k, Kû^{k+1}⟩ ≥ ‖x̂^{k+1}‖_2^2 − ‖x̂^k‖_2^2 + ‖x̂^{k+1} − x̂^k‖_2^2,
2⟨ŷ^{k+1} − ŷ^k, ∇û^{k+1}⟩ ≥ ‖ŷ^{k+1}‖_2^2 − ‖ŷ^k‖_2^2 + ‖ŷ^{k+1} − ŷ^k‖_2^2.  (58)

Then, combining (50) and (58), we obtain

(‖μ̂^k‖_2^2 − ‖μ̂^{k+1}‖_2^2)/β1 + (‖ξ̂^k‖_2^2 − ‖ξ̂^{k+1}‖_2^2)/β2 + β1(‖x̂^k‖_2^2 − ‖x̂^{k+1}‖_2^2) + β2(‖ŷ^k‖_2^2 − ‖ŷ^{k+1}‖_2^2) ≥ β1‖x̂^{k+1} − Kû^{k+1}‖_2^2 + β2‖ŷ^{k+1} − ∇û^{k+1}‖_2^2 + β1‖x̂^{k+1} − x̂^k‖_2^2 + β2‖ŷ^{k+1} − ŷ^k‖_2^2.  (59)

The above inequality indicates that the sequence

{ ‖μ̂^k‖_2^2/β1 + ‖ξ̂^k‖_2^2/β2 + β1‖x̂^k‖_2^2 + β2‖ŷ^k‖_2^2 }

is nonnegative, bounded, and nonincreasing, so that it achieves a limit. Therefore, as k → +∞, the left-hand side of inequality (59) tends to 0, which indicates that the limit of the right-hand side is also 0. This result, together with equations (16), (17), and (38), establishes Lemma 3.

PROOF OF LEMMA 5

Proof: Taking u = u* in (43), y = y* in (44), and x = x* in (45), by addition, we obtain

‖y*‖_I + (λ^{k+1}/2)‖x* − f‖_2^2 ≥ ‖y^{k+1}‖_I + (λ^{k+1}/2)‖x^{k+1} − f‖_2^2 + β1‖Ku^{k+1} − x^{k+1}‖_2^2 + β1⟨x^{k+1} − x^k, Ku^{k+1} − Ku*⟩ + β2‖∇u^{k+1} − y^{k+1}‖_2^2 + β2⟨y^{k+1} − y^k, ∇u^{k+1} − ∇u*⟩ − ⟨μ^k, x^{k+1} − Ku^{k+1}⟩ − ⟨ξ^k, y^{k+1} − ∇u^{k+1}⟩.  (60)

According to Lemma 3 and the above inequality, we have

‖y*‖_I + (λ†/2)‖x* − f‖_2^2 ≥ lim_{k→+∞} { ‖y^{k+1}‖_I + (λ†/2)‖x^{k+1} − f‖_2^2 }.  (61)

Note that (u*, x*, y*; μ*, ξ*) is a saddle-point of LA(u, x, y; μ, ξ; λ†). Thus it follows from (11) that

‖y*‖_I + (λ†/2)‖x* − f‖_2^2 ≤ lim_{k→+∞} { ‖y^{k+1}‖_I + (λ†/2)‖x^{k+1} − f‖_2^2 }.  (62)


Inequalities (61) and (62), together with x* = Ku*, y* = ∇u*, and Lemma 3, indicate that

‖∇u*‖_I + (λ†/2)‖Ku* − f‖_2^2 = ‖y*‖_I + (λ†/2)‖x* − f‖_2^2 = lim_{k→+∞} { ‖y^k‖_I + (λ†/2)‖x^k − f‖_2^2 } = lim_{k→+∞} { ‖∇u^k‖_I + (λ†/2)‖Ku^k − f‖_2^2 }.  (63)

Furthermore, according to Lemma 3, we have

lim_{k→+∞} LA(u^k, x^k, y^k; μ^k, ξ^k; λ†) = lim_{k→+∞} { ‖y^k‖_I + (λ†/2)‖x^k − f‖_2^2 } = ‖y*‖_I + (λ†/2)‖x* − f‖_2^2 = LA(u*, x*, y*; μ*, ξ*; λ†).  (64)

Equation (64) asserts that every cluster point of the sequence {u^k, x^k, y^k; μ^k, ξ^k} is a saddle-point of LA(u, x, y; μ, ξ; λ†). Rewrite LA(u, x, y; μ, ξ; λ†) as L̃A(λ†) = L̃A(u, √β1 x, √β2 y; μ/√β1, ξ/√β2; λ†). It follows that (u*, √β1 x*, √β2 y*; μ*/√β1, ξ*/√β2), or every cluster point of {u^k, √β1 x^k, √β2 y^k; μ^k/√β1, ξ^k/√β2}, is a saddle-point of L̃A(λ†). Treating all the saddle-points of L̃A(λ†) as one set, and according to Lemma 3 and Lemma 4, {u^k, √β1 x^k, √β2 y^k; μ^k/√β1, ξ^k/√β2} converges to a saddle-point of L̃A(λ†) (the convergence of u^k comes from the convergence of {√β1 x^k, √β2 y^k; μ^k/√β1, ξ^k/√β2} and (19)). Thus, {u^k, x^k, y^k; μ^k, ξ^k} converges to a saddle-point of LA(u, x, y; μ, ξ; λ†).

ACKNOWLEDGMENT

The authors sincerely thank the editors and anonymous referees for their valuable comments. Their help has greatly enhanced the quality of this paper. The authors also thank Prof. Y. W. Wen and Prof. P. Weiss for providing the source codes of [22] and [32], respectively, as well as the authors of FTVD-v4 and C-SALSA for making their codes available online. C. He acknowledges the helpful discussions and suggestions from Prof. Licheng Jiao, Prof. Yingliang Zhang, Prof. Xiangyu Kong, and Dr. Xiaosheng Si.

REFERENCES

[1] P. Campisi and K. Egiazarian, Blind Image Deconvolution: Theory and Applications. Boca Raton, FL, USA: CRC Press, 2007.
[2] A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-Posed Problems. Washington, DC, USA: Holt, Rinehart and Winston, 1977.
[3] C. R. Vogel, Computational Methods for Inverse Problems. Philadelphia, PA, USA: SIAM, 2002.
[4] N. P. Galatsanos and A. K. Katsaggelos, "Methods for choosing the regularization parameter and estimating the noise variance in image restoration and their relation," IEEE Trans. Image Process., vol. 1, no. 3, pp. 322–336, Jul. 1992.
[5] M. Fornasier, Y. Kim, A. Langer, and C.-B. Schönlieb, "Wavelet decomposition method for L2/TV-image deblurring," SIAM J. Imag. Sci., vol. 5, no. 3, pp. 857–885, 2011.
[6] Y. Zhang and N. Kingsbury, "Improved bounds for subband-adaptive iterative shrinkage/thresholding algorithms," IEEE Trans. Image Process., vol. 22, no. 4, pp. 1373–1381, Apr. 2013.
[7] J.-F. Cai, B. Dong, S. Osher, and Z. Shen, "Image restoration: Total variation, wavelet frames, and beyond," J. Amer. Math. Soc., vol. 25, no. 4, pp. 1033–1089, 2012.

[8] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Phys. D, Nonlinear Phenomena, vol. 60, nos. 1–4, pp. 259–268, 1992.
[9] Z. Yang and M. Jacob, "Nonlocal regularization of inverse problems: A unified variational framework," IEEE Trans. Image Process., vol. 22, no. 8, pp. 3192–3203, Aug. 2013.
[10] Y. Hu and M. Jacob, "Higher degree total variation (HDTV) regularization for image recovery," IEEE Trans. Image Process., vol. 21, no. 5, pp. 2559–2571, May 2012.
[11] S. Lefkimmiatis, J. P. Ward, and M. Unser, "Hessian Schatten-norm regularization for linear inverse problems," IEEE Trans. Image Process., vol. 22, no. 5, pp. 1873–1888, May 2013.
[12] K. Bredies, K. Kunisch, and T. Pock, "Total generalized variation," SIAM J. Imag. Sci., vol. 3, no. 3, pp. 492–526, 2010.
[13] K. Bredies, Y. Dong, and M. Hintermüller, "Spatially dependent regularization parameter selection in total generalized variation models for image restoration," Int. J. Comput. Math., vol. 90, no. 1, pp. 109–123, 2013.
[14] Y. Hu, G. Ongie, S. Ramani, and M. Jacob, "Generalized higher degree total variation (HDTV) regularization," IEEE Trans. Image Process., vol. 23, no. 6, pp. 2423–2435, Jun. 2014.
[15] A. Chambolle, S. E. Levine, and B. J. Lucier, "An upwind finite-difference method for total variation-based image smoothing," SIAM J. Imag. Sci., vol. 4, no. 1, pp. 277–299, 2011.
[16] P. Getreuer, "Contour stencils: Total variation along curves for adaptive image interpolation," SIAM J. Imag. Sci., vol. 4, no. 3, pp. 954–979, 2011.
[17] R. H. Chan and J. Ma, "A multiplicative iterative algorithm for box-constrained penalized likelihood image restoration," IEEE Trans. Image Process., vol. 21, no. 7, pp. 3168–3181, Jul. 2012.
[18] G. Chantas, N. Galatsanos, A. Likas, and M. Saunders, "Variational Bayesian image restoration based on a product of t-distributions image prior," IEEE Trans. Image Process., vol. 17, no. 10, pp. 1795–1805, Oct. 2008.
[19] E. Vera, M. Vega, R. Molina, and A. K. Katsaggelos, "Iterative image restoration using nonstationary priors," Appl. Opt., vol. 52, no. 10, pp. D102–D110, Apr. 2013.
[20] M. Goldman, "Continuous primal-dual methods for image processing," SIAM J. Imag. Sci., vol. 4, no. 1, pp. 366–385, 2011.
[21] A. Chambolle and T. Pock, "A first-order primal-dual algorithm for convex problems with applications to imaging," J. Math. Imag. Vis., vol. 40, no. 1, pp. 120–145, 2011.
[22] Y. W. Wen and R. H. Chan, "Parameter selection for total-variation-based image restoration using discrepancy principle," IEEE Trans. Image Process., vol. 21, no. 4, pp. 1770–1781, Apr. 2012.
[23] B. He and X. Yuan, "Convergence analysis of primal-dual algorithms for a saddle-point problem: From contraction perspective," SIAM J. Imag. Sci., vol. 5, no. 1, pp. 119–149, 2012.
[24] M. Tao, J. Yang, and B. He, "Alternating direction algorithms for total variation deconvolution in image reconstruction," Dept. Math., Nanjing Univ., Jiangsu, China, Tech. Rep. 09-18, Aug. 2009.
[25] Y. Wang, J. Yang, W. Yin, and Y. Zhang, "A new alternating minimization algorithm for total variation image reconstruction," SIAM J. Imag. Sci., vol. 1, no. 3, pp. 248–272, 2008.
[26] T. Goldstein and S. Osher, "The split Bregman method for L1-regularized problems," SIAM J. Imag. Sci., vol. 2, no. 2, pp. 323–343, 2009.
[27] J. Yang, W. Yin, Y. Zhang, and Y. Wang, "A fast algorithm for edge-preserving variational multichannel image restoration," SIAM J. Imag. Sci., vol. 2, no. 2, pp. 569–592, 2009.
[28] C. Wu and X.-C. Tai, "Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models," SIAM J. Imag. Sci., vol. 3, no. 3, pp. 300–339, 2010.
[29] N. B. Brás, J. Bioucas-Dias, R. C. Martins, and A. C. Serra, "An alternating direction algorithm for total variation reconstruction of distributed parameters," IEEE Trans. Image Process., vol. 21, no. 6, pp. 3004–3016, Jun. 2012.
[30] P. Getreuer, "Total variation deconvolution using split Bregman," Image Process. Line, vol. 2, pp. 158–174, Jul. 2012.
[31] P. Getreuer, "Rudin–Osher–Fatemi total variation denoising using split Bregman," Image Process. Line, vol. 2, pp. 74–95, May 2012.
[32] M. K. Ng, P. Weiss, and X. Yuan, "Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods," SIAM J. Sci. Comput., vol. 32, no. 5, pp. 2710–2736, Aug. 2010.


[33] M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo, "An augmented Lagrangian approach to the constrained optimization formulation of imaging inverse problems," IEEE Trans. Image Process., vol. 20, no. 3, pp. 681–695, Mar. 2011.
[34] Y. Wen and A. M. Yip, "Adaptive parameter selection for total variation image deconvolution," Numer. Math., Theory, Methods, Appl., vol. 2, no. 4, pp. 427–438, 2009.
[35] V. A. Morozov, Methods for Solving Incorrectly Posed Problems, A. B. Aries and Z. Nashed, Eds. New York, NY, USA: Springer-Verlag, 1984.
[36] G. H. Golub, M. Heath, and G. Wahba, "Generalized cross-validation as a method for choosing a good ridge parameter," Technometrics, vol. 21, no. 2, pp. 215–223, 1979.
[37] H. Liao, F. Li, and M. K. Ng, "Selection of regularization parameter in total variation image restoration," J. Opt. Soc. Amer. A, Opt., Image Sci., Vis., vol. 26, no. 11, pp. 2311–2320, Nov. 2009.
[38] P. C. Hansen, "Analysis of discrete ill-posed problems by means of the L-curve," SIAM Rev., vol. 34, no. 4, pp. 561–580, Dec. 1992.
[39] H. W. Engl and W. Grever, "Using the L-curve for determining optimal regularization parameters," Numer. Math., vol. 69, no. 1, pp. 25–31, Nov. 1994.
[40] Y. Lin, B. Wohlberg, and H. Guo, "UPRE method for total variation parameter selection," Signal Process., vol. 90, no. 8, pp. 2546–2551, Aug. 2010.
[41] J. M. Bioucas-Dias, M. A. T. Figueiredo, and J. P. Oliveira, "Adaptive total variation image deblurring: A majorization–minimization approach," in Proc. EUSIPCO, Florence, Italy, 2006.
[42] S. D. Babacan, R. Molina, and A. K. Katsaggelos, "Parameter estimation in TV image restoration using variational distribution approximation," IEEE Trans. Image Process., vol. 17, no. 3, pp. 326–339, Mar. 2008.
[43] S. D. Babacan, R. Molina, and A. K. Katsaggelos, "Variational Bayesian blind deconvolution using a total variation prior," IEEE Trans. Image Process., vol. 18, no. 1, pp. 12–26, Jan. 2009.
[44] P. Blomgren and T. F. Chan, "Modular solvers for image restoration problems using the discrepancy principle," Numer. Linear Algebra Appl., vol. 9, no. 5, pp. 347–358, Jul./Aug. 2002.
[45] P. L. Combettes and J. Pesquet, "A Douglas–Rachford splitting approach to nonsmooth convex variational signal recovery," IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 564–574, Dec. 2007.
[46] A. Chambolle, "An algorithm for total variation minimization and applications," J. Math. Imag. Vis., vol. 20, nos. 1–2, pp. 89–97, 2004.
[47] T. F. Chan and J. Shen, Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods. Philadelphia, PA, USA: SIAM, 2005.
[48] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. New York, NY, USA: Springer-Verlag, 2011.
[49] T. Goldstein, B. O'Donoghue, S. Setzer, and R. Baraniuk, "Fast alternating direction optimization methods," Dept. Comput. Appl. Math., Univ. California, Los Angeles, Los Angeles, CA, USA, Tech. Rep. 12-35, May 2012.
[50] W. Deng and W. Yin, "On the global and linear convergence of the generalized alternating direction method of multipliers," Dept. Comput. Appl. Math., Rice Univ., Houston, TX, USA, Tech. Rep. 12-14, 2012.
[51] S. Setzer, "Operator splittings, Bregman methods and frame shrinkage in image processing," Int. J. Comput. Vis., vol. 92, no. 3, pp. 265–280, 2011.
[52] I. Ekeland and R. Témam, Convex Analysis and Variational Problems (Classics in Applied Mathematics). Philadelphia, PA, USA: SIAM, 1999.
[53] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed. New York, NY, USA: Springer-Verlag, 2006.
[54] S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed. San Diego, CA, USA: Academic, 1999.
[55] A. Beck and M. Teboulle, "Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems," IEEE Trans. Image Process., vol. 18, no. 11, pp. 2419–2434, Nov. 2009.
[56] J. Duran, B. Coll, and C. Sbert, "Chambolle's projection algorithm for total variation denoising," Image Process. Line, vol. 3, pp. 311–331, Dec. 2013.
[57] S. Xie and S. Rahardja, "Alternating direction method for balanced image restoration," IEEE Trans. Image Process., vol. 21, no. 11, pp. 4557–4567, Nov. 2012.


Chuan He received the B.S. and M.S. degrees from the High-Tech Institute of Xi’an, Xi’an, China, in 2008 and 2010, respectively, where he is currently pursuing the Ph.D. degree. His research interest covers signal/image processing and remote sensing.

Changhua Hu received the B.S. and M.S. degrees from the High-Tech Institute of Xi’an, Xi’an, China, in 1987 and 1990, respectively, and the Ph.D. degree from Northwestern Polytechnic University, Xi’an, in 1996. He is currently a Cheung Kong Professor with the Hi-Tech Institute of Xi’an. He was a Visiting Scholar with the University of Duisburg-Essen, Essen, Germany, in 2008. He has authored or co-authored two books, and about 100 articles. His research interest covers signal processing, fault diagnosis and prediction, life prognosis, and fault-tolerant control.

Wei Zhang received the M.S. degree from the Shaanxi University of Science and Technology, Xi’an, China, in 2002, and the Ph.D. degree from Xidian University, Xi’an, in 2008. She is currently an Associate Professor with the Hi-Tech Institute of Xi’an, Xi’an. Her research interest covers signal processing and communication technology.

Biao Shi received the Ph.D. degree from the Xi'an University of Technology, Xi'an, China, in 2011. He currently holds a post-doctoral position with the Hi-Tech Institute of Xi'an, Xi'an. His research interest covers signal/image processing and computational intelligence.
