
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 1, JANUARY 2015

Total Variation Regularization via Continuation to Recover Compressed Hyperspectral Images

Duncan T. Eason and Mark Andrews, Member, IEEE

Abstract—In this paper, we investigate a low-complexity scheme for decoding compressed hyperspectral image data. We exploit the simplicity of the subgradient method by modifying a total variation-based regularization problem to include a residual constraint, employing convex optimality conditions to provide equivalence between the original and reformulated problem statements. A scheme that exploits spectral smoothness by calculating informed starting points to improve the rate of convergence is introduced. We conduct numerical experiments, using both synthetic and real hyperspectral data, to demonstrate the effectiveness of the reconstruction algorithm and the validity of our method for exploiting spectral smoothness. Evidence from these experiments suggests that the proposed methods have the potential to improve the quality and run times of future compressed hyperspectral image reconstructions.

Index Terms—Compressed sensing, hyperspectral imaging, inverse problems.

I. INTRODUCTION

HYPERSPECTRAL imaging (HSI) is an extension of monochrome or RGB imaging that captures spatial information over many contiguous spectral bands [1]. Many practical problems in image enhancement, object recognition, and target detection have resisted solution with conventional image sources, but there is great promise that the spectral information provided by HSI will lead to progress in these areas. HSI data are combinations of emitted and reflected light that, when combined with the spectral response of the detector, provide spectral signatures covering the visible and near-infrared parts of the spectrum [2]. In principle, matching the measured reflectance functions with known spectral signatures allows materials to be identified automatically [3]. However, due to the complexity and cost of collecting HSI data, practical applications are limited.

A technique that could potentially aid HSI is a recently emerging acquisition technique called compressed sensing (CS) [4]. An alternative to traditional Shannon-Nyquist sampling, CS allows a signal to be reconstructed from a relatively small number of random linear combinations of samples [4], [5]. Provided the signal has a sparse representation in a known basis (i.e., few non-zero coefficients), the number of measurements required can be far fewer than required by traditional acquisition techniques [6].

Signal reconstruction relies on exploiting structure present in the original signal by promoting this sparsity [7]. Many successful CS image reconstruction schemes rely on total variation (TV) regularization [8], which penalizes variation between adjacent pixels, in accordance with the structure of most natural images [9]. TV methods applied to HSI reconstruction have also met with success [1], [10]. In this paper, we refer to CS applied to HSI as compressed hyperspectral imaging (CoHSI, pronounced "cozy").

One of the crucial advantages of CoHSI is that processing during acquisition is minimized, with expensive computation delayed to a computer off-field. Unfortunately, due to the size of the data involved and the complexity of current state-of-the-art recovery algorithms, reconstruction of whole HSI volumes remains a slow process. Early image recovery algorithms recast TV regularization as second-order cone programs [11], which provided strong theoretical guarantees but were very computationally expensive and impractical for large images with multiple spectral bands. Techniques that minimize computational requirements and maintain memory efficiency are therefore crucial in creating a feasible CoHSI recovery algorithm. Additionally, when extending standard image reconstruction techniques to perform CoHSI reconstruction, it would be prudent to consider the spectral structure inherent in HSI data. Currently, there are algorithms that exploit inter-spectral correlation [10], [12], and others that rely on the linear-mixing model (LMM) (see [1]), with these techniques focusing on improving reconstruction quality rather than computational speed.

We propose a CoHSI reconstruction algorithm that uses spatial TV regularization to recover HSI data from CS measurements, additionally exploiting the similarity between contiguous spectral bands. A focus of the algorithm is to minimize complexity and hence improve speed, whilst maintaining comparable reconstruction quality. A subgradient method provides this simplicity, and effort has been made to provide theoretical guarantees similar to those of more computationally intensive methods. The main contributions in this sense are the investigation of conjugate subgradient methods, suggested by Wolfe in 1975 [13] and seldom revisited in the optimization literature since [14], and the equating of the necessary constrained optimization problem with an unconstrained optimization problem using optimality conditions, in order to automatically determine appropriate regularization parameters, which are currently chosen heuristically by many methods [1], [15], [16]. We exploit spectral smoothness by recovering bands in stages and interpolating between recovered bands to generate starting points for the optimization routines of the remaining bands.


This paper is organized as follows: Section II formalizes the form of the reconstruction problem; Section III provides the reader with some of the necessary convex optimization background; Section IV gives an overview of our reconstruction algorithm, both generally and specifically; Section V discusses the estimation of starting points and the manipulation of other data; and Sections VI and VII offer results and conclusions, respectively.

II. PROBLEM FORMULATION

Let $x_b \in \mathbb{R}^n$ represent the column-wise stacking of the data at the $b$-th spectral band of an $h \times h \times p$ hyperspectral volume (with $p$ spectral bands, $h^2 = n$ pixels per band, and $np = N$ total voxels). Furthermore, let $X = [x_1, \ldots, x_p] \in \mathbb{R}^{n \times p}$ denote a matrix representing the hyperspectral cube. Since each column of $X$ corresponds to a 2D image of a particular spectral band, linear combinations of samples can be taken from separate bands using the same measurement matrix $\Phi \in \mathbb{R}^{m \times n}$, where $m < n$ is the number of measurements for each spectral band. This provides $Y = [y_1, \ldots, y_p] \in \mathbb{R}^{m \times p}$ (where the total number of measurements is $mp = M$), and CoHSI acquisition can be viewed mathematically as

$$Y = \Phi X \qquad (1)$$

Assuming acquisition noise $E \in \mathbb{R}^{m \times p}$ is present, it may be modeled as white Gaussian noise with variance $\sigma^2$, which approximates the numerous disparate noise sources in an image acquisition system [17]. CS acquisition becomes

$$Y = \Phi X + E \qquad (2)$$

With $Y$, $\Phi$, and $E$ known, (2) is not sufficient to determine the original signal: when $m < n$ there are infinitely many candidate images $X$ for which $Y = \Phi X + E$ [5], [6], [18]. Something must be assumed about the signal, and many compressed sensing recovery algorithms have used $\ell_1$ regularization, promoting sparsity in a particular basis [19]. This has typically been done by finding the solution with the smallest $\ell_1$ norm out of all the solutions with a residual (the normed difference between $Y$ and $\Phi\hat{X}$) less than a predetermined amount related to the noise. However, it has been empirically shown that TV regularization is more effective on image problems since it better preserves the edges and boundaries that are essential in characterizing images [1], [8]. Hyperspectral images typically identify different regions of a small number of materials, so low spatial variation is a reasonable assumption [1]. Therefore, we propose to recover hyperspectral images from CoHSI data by solving

$$\hat{X} = \underset{X}{\arg\min}\, \|X\|_{TV} \quad \text{s.t.} \quad \|\Phi X - Y\|_F \le \varepsilon \qquad (3)$$

where $\|\cdot\|_F$ is the Frobenius norm, $\varepsilon = \|E\|_F$, and $\|X\|_{TV} \triangleq \|D_v X\|_S + \|D_h X\|_S$ is the spatial anisotropic TV norm ($D_v, D_h \in \mathbb{R}^{n \times n}$ are discrete gradient operators that provide vertical and horizontal differences respectively, and $\|\cdot\|_S \triangleq \|\mathrm{vec}(\cdot)\|_1$) [20]. Alternatively, (3) can be expressed as

$$\hat{X} = \underset{X \in \mathcal{C}}{\arg\min}\, \|X\|_{TV} \qquad (4)$$

where $\mathcal{C} = \{X \mid \|\Phi X - Y\|_F \le \varepsilon\}$ hereafter defines the convex feasible set. $\|\cdot\|_{TV}$ is a convex operator [10], and may be minimized using convex optimization techniques.
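As a concrete illustration of the spatial anisotropic TV norm used in (3) and (4), the sketch below computes $\|X\|_{TV}$ for a band-stacked matrix $X$ by forming vertical and horizontal first differences within each band. It is a minimal reading of the definition above, not the authors' implementation; the function name, the use of NumPy, and the boundary handling are our own assumptions.

```python
import numpy as np

def anisotropic_tv(X, h):
    """Spatial anisotropic TV norm ||X||_TV = ||Dv X||_S + ||Dh X||_S.

    X : (n, p) array; each column is one spectral band, stacked column-wise
        from an h-by-h image (n = h*h).
    Returns the sum over all bands of the l1 norms of the vertical and
    horizontal first differences (Section II definition; boundary handling
    here is an assumed convention).
    """
    n, p = X.shape
    assert n == h * h
    tv = 0.0
    for b in range(p):
        img = X[:, b].reshape(h, h, order="F")   # undo column-wise stacking
        dv = np.diff(img, axis=0)                # vertical differences
        dh = np.diff(img, axis=1)                # horizontal differences
        tv += np.abs(dv).sum() + np.abs(dh).sum()
    return tv
```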

III. CONSTRAINED OPTIMIZATION

Consider a class of inequality-constrained minimization problems of the form

$$\hat{x} = \underset{x}{\arg\min}\, f_0(x) \quad \text{s.t.} \quad f_1(x) \le e \qquad (5)$$

where neither $f_0(x)$ nor $f_1(x)$ need be differentiable, but both must have valid subgradients; $f_0(x)$ must be convex; and $f_1(x) \le e$ (where $e$ is a constant) defines a convex set $S = \{x \mid f_1(x) \le e\}$. To highlight the minimization over a convex set, (5) can be viewed as

$$\hat{x} = \underset{x \in S}{\arg\min}\, f_0(x) \qquad (6)$$

A. Gradient Descent and the Subgradient Method

For unconstrained convex optimization problems, gradient descent (GD) is an iterative method capable of minimizing differentiable functions. It involves applying steps to the optimization variable in the direction of the negative gradient, i.e., $\Delta x = -\nabla f_0(x)$. The algorithm can be characterized by

$$x_{k+1} = x_k - \alpha_k \nabla f_0(x_k) \qquad (7)$$

where $\alpha_k$ is the step size for the $k$-th iteration and is typically chosen through a line search [21]. GD is a descent method, typified by $f(x_{k+1}) < f(x_k)$ for all $k \in \mathbb{N}$. The algorithm can therefore be halted when $\|f(x_k) - f(x_{k+1})\|_2 / \|f(x_{k+1})\|_2$ becomes small. However, GD requires a differentiable objective function.

A similar method that optimizes nondifferentiable convex functions is the subgradient method. It operates similarly to GD, except that: (a) search directions are calculated using the subgradient, referred to henceforth as $g \in \partial f_0(x)$; (b) the step sizes are commonly fixed ahead of time, and certain step size rules can be proven to provide convergence [22]; and (c) it is not a descent method: the function value can and does increase, and the stopping criterion used for GD is not appropriate, meaning other stopping criteria must be chosen [23]. One extension of the subgradient method is the projected subgradient method, which solves (6) and is given by $x_{k+1} = P_S(x_k - \alpha_k g_k)$, where $P_S(z)$ is the Euclidean projection of $z$ onto $S$. It can be shown that projection does not affect convergence guarantees [22].
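To make the update concrete, the following sketch runs the projected subgradient method with a constant step size on a generic problem. The objective, subgradient oracle, and projection are supplied by the caller; this is an illustration of the iteration just described, not code from the paper, and the small example problem at the end is our own choice.

```python
import numpy as np

def projected_subgradient(f0, subgrad, project, x0, alpha=0.01, iters=200):
    """Projected subgradient method: x_{k+1} = P_S(x_k - alpha * g_k).

    f0(x)      : convex (possibly nondifferentiable) objective.
    subgrad(x) : returns any g in the subdifferential of f0 at x.
    project(z) : Euclidean projection of z onto the feasible set S.
    Since this is not a descent method, the best iterate found so far is
    tracked and returned, as recommended in [22].
    """
    x = np.asarray(x0, dtype=float).copy()
    x_best, f_best = x.copy(), f0(x)
    for _ in range(iters):
        x = project(x - alpha * subgrad(x))
        fx = f0(x)
        if fx < f_best:
            x_best, f_best = x.copy(), fx
    return x_best

# Illustrative use: minimize ||x||_1 subject to ||x - c||_2 <= 1
c = np.array([3.0, -2.0])
f0 = lambda x: np.abs(x).sum()
subgrad = lambda x: np.sign(x)                 # a valid subgradient of ||x||_1
project = lambda z: c + (z - c) / max(1.0, np.linalg.norm(z - c))
x_hat = projected_subgradient(f0, subgrad, project, x0=c)
```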

B. Objective Function Augmentation

An alternative form of optimizing constrained problems using unconstrained methods involves the Lagrangian function. The idea of the Lagrangian function is to take the constraints into account by augmenting the objective function with a weighted sum of the constraint functions [21]. Given a Lagrange multiplier $\mu$, (5) can be optimized by solving

$$\hat{x} = \underset{x}{\arg\min}\, \{f_0(x) + \mu f_1(x)\} \qquad (8)$$


Problems (5) and (8) are equivalent in the sense that once the value of $e$ or $\mu$ is fixed, there is a value for the other such that both problems provide the same solution [17]. However, it can be shown that solving (8) with any gradient method converges faster when $\mu$ is low [24]. Continuation schemes (see [15], [17]) propose a sequence of problems with increasing values of $\mu$, and use each intermediate solution as a "warm-start" for the next problem. Such schemes are particularly effective for large-scale problems, but do not utilize any relationship between $\mu$ and $e$, instead defining the set of multipliers more arbitrarily [24].

C. Conjugate Gradient Methods

The linear conjugate gradient method (CG) is an iterative method for solving large systems of linear equations, given by $y = Ax$, where $A$ is a square, symmetric, positive-definite matrix. It operates by recasting the problem in the form $\hat{x} = \arg\min_x \frac{1}{2} x^T A x - y^T x$ and is similar to GD, except that each search direction is conjugate with all the previous search directions. (See [25] for a detailed explanation of conjugacy and a proof of convergence.) CG provides a specific method for selecting $\alpha_k$, and search directions are calculated using

$$d_k = \nabla f_0(x_k) + \beta_k d_{k-1} \qquad (9)$$

where $\beta_k > 0$. CG methods are similar to bundle methods in that each search direction contains a linear combination of previous search directions [22]. CG concepts can also be applied to minimize any differentiable function: $d_k$ can replace $\nabla f_0(x_k)$ in (7) to provide nonlinear CG. Moreover, replacing $f_0(x_k)$ in (9) with the Lagrangian described in Section III-B can solve constrained optimization problems. However, nonlinear CG does not provide a method for calculating the step sizes $\alpha_k$ [25].

D. Convex Optimality

It is well known that confirming that a solution of a convex function $\phi(x)$ is minimal is equivalent to finding a zero in the subgradient, i.e., $\hat{x}$ is optimal if $0 \in \partial\phi(\hat{x})$ [26]. $\phi(x)$ may be split into two convex functions, $\phi(x) = f_0(x) + \mu f_1(x)$ (as in (8)), and the optimality condition is

$$\hat{x} \in \mathcal{X}^* \;\Leftrightarrow\; 0 \in \partial f_0(\hat{x}) + \mu\, \partial f_1(\hat{x}) \qquad (10)$$

where $\mathcal{X}^*$ is the set of optimal solutions (which may contain one or more solutions) [17]. To be optimal, the zero vector must lie in the set formed by the sum of the two subdifferentials.

IV. PROPOSED ALGORITHM

We propose an algorithm called the Projected Augmented Nonlinear Conjugate Subgradient method (PANCoS), which incorporates a continuation-like scheme using multiple regularization parameters and, being a descent-like method, is well suited to the optimization of large signals [27]. This is a crucial quality when performing CoHSI due to the dimension of the data. In order to solve (3) we first recast it as an unconstrained optimization problem using the Lagrangian $L(X, \mu) = \|X\|_{TV} + \frac{\mu}{2}\|\Phi X - Y\|_F^2$ (which maintains convexity [28]), giving

$$\hat{X} = \underset{X}{\arg\min}\, \left\{ \|X\|_{TV} + \frac{\mu}{2}\|\Phi X - Y\|_F^2 \right\} \qquad (11)$$

Problem (11) is not solvable by GD due to the $\ell_1$ norm's inclusion in the TV norm, which makes the objective function nondifferentiable. However, the TV norm does have a valid subgradient, given by $G_{TV} = D_v^T \mathrm{sgn}(D_v X) + D_h^T \mathrm{sgn}(D_h X)$, where $\mathrm{sgn}(\cdot)$ is the element-wise sign function [29]. The second term in $L(X, \mu)$ has no discontinuities and can be shown to have the derivative $\mu \Phi^T(\Phi X - Y)$. Hence the subgradient associated with (11) is

$$G = G_{TV} + \mu \Phi^T(\Phi X - Y) \qquad (12)$$

This gives an immediate course for optimization: setting the search direction as the negative of the subgradient, i.e.,

$$X_{k+1} = X_k - \alpha_k G_k \qquad (13)$$

and search directions can be made conjugate using (9):

$$D_k = G_k + \beta_k D_{k-1} \qquad (14)$$

When $D_k$ in (14) replaces $G_k$ in (13), it gives the Augmented Nonlinear Conjugate Subgradient method.

Euclidean projection ensures that every solution is within the feasible set as defined in (4). If a solution is already feasible, projection does nothing; if it lies outside the feasible set, Euclidean projection generates the feasible solution with the smallest Frobenius-norm difference from the original solution. We denote the projection of $Z$ onto the set $\mathcal{C}$ by the operator $X = P_{\mathcal{C}}(Z)$, which can be shown to have the closed-form solution

$$P_{\mathcal{C}}(Z) = \begin{cases} Z, & \|\Phi Z - Y\|_F \le \varepsilon \\ Z + \gamma\, \Phi^{\dagger}(Y - \Phi Z), & \|\Phi Z - Y\|_F > \varepsilon \end{cases} \qquad (15)$$

where $\gamma = 1 - \varepsilon / \|\Phi Z - Y\|_F$ and $\Phi^{\dagger} = \Phi^T(\Phi\Phi^T)^{-1}$ is the Moore-Penrose pseudoinverse of $\Phi$. Employing both the Lagrangian and the projection operator is reasonable, as the projection step can be seen as maintaining feasibility, with the Lagrangian preventing search directions from pointing away from the feasible set.

To select $\mu$, we employ a continuation scheme in which (11) is first solved with a particular $\mu = \mu_0$ (calculated using convex optimality conditions and expanded upon in Section IV-A). A continuation-like scheme is then applied to (11), routinely increasing $\mu$ and "warm-starting" from previous solutions (see [15]). However, most continuation schemes run a fixed number of values of $\mu$ [24]. Our algorithm runs until a particular $\mu$ provides a pre-projected feasible solution and then terminates. (After projection, the solution will always be feasible. The pre-projected solution is examined because it indicates better whether $\mu$ is appropriate.) Stopping earlier is advantageous in two ways: (a) if $\mu$ grows too large, less weight is placed on minimizing the TV, meaning the solution is unlikely to be consistent with (3); and (b) unnecessary iterations are avoided.

Current proposed stopping criteria for the subgradient method have strong theoretical guarantees but take effect very slowly.


For this reason, the subgradient method is usually used without any formal stopping criterion [22]. We propose a method whereby the value of the Lagrangian is averaged over a certain number of iterations and compared with the previous averaged Lagrangian function values. If this comparison indicates a proportional decrease of less than some pre-determined threshold, PANCoS terminates. As recommended in [22], the final solution is taken from the iteration with the lowest function value, i.e., $\hat{X}_k = \underset{X_i}{\arg\min}\, L(X_i, \mu)$, $i = 1, \ldots, k$.

PANCoS can be summarized as follows:
1) Calculate the subgradient $G_k$.
2) Use the subgradient and the previous search direction to calculate the conjugate search direction $D_k$, or use just the subgradient if it is the first iteration.
3) Update $X_{k+1}$.
4) If $X_{k+1}$ is infeasible, project it onto the feasible set.
5) Examine the averaged Lagrangians; if they are still decreasing by more than some threshold, return to step 1).
6) $X$ is the lowest-TV $X_k$ generated by steps 1) to 5).
7) If the pre-projected $X$ is infeasible, increase $\mu$ by a predetermined factor $\eta$ (i.e., $\mu_r = \eta^r \mu_0$) and return to step 1). Otherwise, $\hat{X} \leftarrow X$ and terminate.

Steps 1) to 5) will hereafter be referred to as the inner PANCoS loop, with the inclusion of steps 6) and 7) describing the outer loop. $k$ stores the number of inner-loop iterations run, with $r$ denoting the number of completed outer loops. Expanded pseudocode can be seen in Table I.

TABLE I. PANCoS Algorithm. The inner loop is delineated with dotted lines, the outer loop with dot-dashed lines.
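Since the pseudocode of Table I is not reproduced here, the following sketch gives one possible reading of the PANCoS loops, following the steps and equations above (eqs. (12)-(15), the $\mu_r = \eta^r \mu_0$ ladder, and the averaged-Lagrangian test of Section IV). Every detail not stated in the text (array shapes, the simple iteration cap, the cold-start $\Phi^{\dagger}Y$, helper names) is our own assumption rather than the authors' code.

```python
import numpy as np

def tv_subgradient(X, Dv, Dh):
    """G_TV = Dv^T sgn(Dv X) + Dh^T sgn(Dh X), first term of eq. (12)."""
    return Dv.T @ np.sign(Dv @ X) + Dh.T @ np.sign(Dh @ X)

def tv_norm(X, Dv, Dh):
    return np.abs(Dv @ X).sum() + np.abs(Dh @ X).sum()

def project_C(Z, Phi, Phi_pinv, Y, eps):
    """Euclidean projection onto C = {X : ||Phi X - Y||_F <= eps}, eq. (15)."""
    r = np.linalg.norm(Phi @ Z - Y, "fro")
    if r <= eps:
        return Z
    gamma = 1.0 - eps / r
    return Z + gamma * (Phi_pinv @ (Y - Phi @ Z))

def pancos(Phi, Y, Dv, Dh, eps, eta=2.0, q=15, alpha=0.01, a=4, tol=1e-4, max_inner=500):
    """Sketch of the PANCoS inner/outer loops (Table I), under assumed conventions."""
    m, n = Phi.shape
    N = n * Y.shape[1]
    sigma = np.sqrt(np.linalg.eigvalsh(Phi @ Phi.T).min())  # sqrt(lambda_min), Theorem IV.1
    mu_star = 4.0 * np.sqrt(N) / (sigma * eps)               # eq. (23)
    mu = mu_star / eta ** q                                  # mu_0 = mu*/tau
    Phi_pinv = Phi.T @ np.linalg.inv(Phi @ Phi.T)
    X = Phi_pinv @ Y                                         # simple cold-start (Section V, step 5)

    while True:                                              # outer loop over mu_r
        D = np.zeros_like(X)
        g_prev = None
        lagr, best_X, best_L = [], X.copy(), np.inf
        for _ in range(max_inner):                           # inner loop, steps 1)-5)
            G = tv_subgradient(X, Dv, Dh) + mu * (Phi.T @ (Phi @ X - Y))   # eq. (12)
            g = G.ravel()
            if g_prev is None or not g_prev.any():
                beta = 0.0                                   # first step: D_0 = G_0
            else:                                            # Polak-Ribiere with restart
                beta = max(float(g @ (g - g_prev)) / float(g_prev @ g_prev), 0.0)
            D = G + beta * D                                 # eq. (14)
            X_pre = X - alpha * D                            # eq. (13), pre-projected iterate
            X = project_C(X_pre, Phi, Phi_pinv, Y, eps)      # step 4)
            g_prev = g
            L = tv_norm(X, Dv, Dh) + 0.5 * mu * np.linalg.norm(Phi @ X - Y, "fro") ** 2
            lagr.append(L)
            if L < best_L:
                best_L, best_X = L, X.copy()
            if len(lagr) > a:                                # averaged-Lagrangian rule, step 5)
                if np.mean(lagr[-a:]) >= (1.0 - tol) * np.mean(lagr[-a - 1:-1]):
                    break
        X = best_X                                           # step 6): best inner-loop iterate
        if np.linalg.norm(Phi @ X_pre - Y, "fro") <= eps or mu >= mu_star:
            return X                                         # step 7): pre-projected iterate feasible
        mu *= eta                                            # otherwise raise mu and warm-start
```

In practice $\Phi$, $D_v$, and $D_h$ would be sparse or implicit operators for images of this size; dense matrices are used here only to keep the sketch short.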

A. PANCoS Formulation

Recall that $L(X, \mu)$ is convex. Considering (10), a solution $\hat{X}$ is optimal for (11) if $0 \in \partial L(\hat{X}, \mu)$, i.e.,

$$\hat{X} \in \mathcal{X}^* \;\Leftrightarrow\; 0 \in \hat{G}_{TV} + \mu \Phi^T(\Phi\hat{X} - Y) \qquad (16)$$

where $0 \in \mathbb{R}^{n \times p}$ is the zero matrix, $\mathcal{X}^*$ is the set of optimal solutions, and $\hat{G}_{TV}$ is the subgradient of $\|\hat{X}\|_{TV}$.

Lemma IV.1: $\hat{X}$ can only be an optimal solution of (11) if $\mu \le 4 / \|\Phi^T(\Phi\hat{X} - Y)\|_{\max}$.

Proof: Let us examine an arbitrary $G_{TV}$: considering only the $D_v$ term, it can be shown that

$$\mathrm{sgn}(D_v X) \in \mathbb{R}^{n \times p}, \quad [\mathrm{sgn}(D_v X)]_{j,s} = \begin{cases} 1, & (D_v X)_{j,s} > 0 \\ 0, & (D_v X)_{j,s} = 0 \\ -1, & (D_v X)_{j,s} < 0 \end{cases} \qquad (17)$$

for $j = 1, \ldots, n$ and $s = 1, \ldots, p$, where $(D_v X)_{j,s}$ is the discrete vertical gradient of $X$ at pixel $j$ and spectral band $s$. Examination of $D_v^T$ reveals that each row can contain one of four things: (a) all zeros, (b) $-1$ at entry $(j, j)$, (c) $1$ at entry $(j, j-1)$, or (d) the addition of (b) and (c). This, combined with (17), which implies that the entries of $\mathrm{sgn}(D_v X)$ lie in $\{-1, 0, 1\}$, tells us that the entries of $D_v^T \mathrm{sgn}(D_v X)$ lie in $\{-2, \ldots, 2\}$. The same can be said for all of the elements of $D_h^T \mathrm{sgn}(D_h X)$, thus

$$G_{TV} \in \{-4, \ldots, 4\} \qquad (18)$$

for any $X$. It readily follows from (16) that $\hat{X}$ can only be optimal if all elements of $\mu \Phi^T(\Phi\hat{X} - Y)$ are within the same range as $\hat{G}_{TV}$. (Optimality requires that the two terms must be capable of summing to $0$.) Therefore, (16) and (18) imply that $\mu \Phi^T(\Phi\hat{X} - Y) \in \{-4, \ldots, 4\}$. Consequently,

$$\hat{X} \in \mathcal{X}^* \;\Leftrightarrow\; \mu \le \frac{4}{\|\Phi^T(\Phi\hat{X} - Y)\|_{\max}} \qquad (19)$$

This completes the lemma.

Theorem IV.1: If $\mu = \mu^* = 4\sqrt{N}/(\sigma\varepsilon)$, where $\sigma^2 = \lambda_{\min}(\Phi\Phi^T)$, solving (11) satisfies the inequality constraint in (3).

Proof: The objective in (11) is convex, so $\hat{X}$ will always satisfy the inequality in (19). Since $\|\cdot\|_F \le \sqrt{N}\,\|\cdot\|_{\max}$ and $\sigma^2 \|\Phi\hat{X} - Y\|_F^2 \le \|\Phi^T(\Phi\hat{X} - Y)\|_F^2$ [30], it can be easily determined that

$$\mu^2 \le \frac{16N}{\sigma^2 \|\Phi\hat{X} - Y\|_F^2} \qquad (20)$$

Rearranging, the solution of (11) will always satisfy

$$\|\Phi\hat{X} - Y\|_F^2 \le \frac{16N}{\sigma^2 \mu^2} \qquad (21)$$

Recall the constraint in (3):

$$\|\Phi X - Y\|_F^2 \le \varepsilon^2 \qquad (22)$$


By equating the right-hand sides of (21) and (22) and rearranging, an appropriate value for the regularization parameter $\mu$ can be calculated as

$$\mu^* = \frac{4\sqrt{N}}{\sigma\varepsilon} \qquad (23)$$

Solving (11) with $\mu = \mu^*$ as in

$$\hat{X} = \underset{X}{\arg\min}\, \left\{ \|X\|_{TV} + \frac{\mu^*}{2}\|\Phi X - Y\|_F^2 \right\} \qquad (24)$$

thus always satisfies (22). This completes the proof.

Note that Theorem IV.1 guarantees that solutions of (24) are in the feasible set, but does not necessarily provide the solution equivalent to (3). The above inequalities are tight in theory [30]; however, in practice this is often not the case. The solution to (24) is likely the minimum-TV solution within a subset of $\mathcal{C}$, $\mathcal{C}_s = \{X \mid \|\Phi X - Y\|_F^2 \le \rho\varepsilon^2\}$, where $\rho < 1$. The value of $\rho$ depends on the tightness of the inequalities, experimentally measured for our simulations to be $\rho \approx 0.20$, but seemingly impossible to determine by analytic means.

Our continuation scheme starts PANCoS with $\mu_0 = \mu^*/\tau$, where $\tau = \eta^q$ with $\eta > 1$ and $q \in \mathbb{N}$. Recall $\mu_r = \eta^r \mu_0$; the set of values of $\mu$ used could contain all of $\{\mu_0, \eta\mu_0, \ldots, \eta^{q-1}\mu_0, \mu^*\}$, but Theorem IV.1 guarantees that more will never be required. Knowing $\mu^*$ and $\mu_0$ provides the benefit of identifying the maximum number of continuation stages required. Performing continuation starting with $\mu_0 = \mu^*/\tau$ affords two advantages: (a) faster convergence [15]; and (b) with a lower $\mu_0$ and less emphasis on minimizing the residual, the larger subset (represented by a larger $\rho$) in which the optimal solution is likely to fall allows a lower-TV solution to be found. Finally, in regard to the selection of $\mu$, our scheme cycles through the set of values of $\mu$ until a feasible solution is found (instead of running the gamut of regularization parameters), potentially increasing the size of the subset $\mathcal{C}_s$ and providing a lower-TV solution whilst avoiding unnecessary iterations. Experiments with $\tau = 2^{15}$ (i.e., $\eta = 2$, $q = 15$) indicate that this typically occurs when $\mu \in [\mu^*/2^4, \mu^*/2^2]$ (confirming that the inequalities in Theorem IV.1 are not tight). Additionally, $\hat{X}$ is farther from the noisy compressed observations, implying that the solution could be less subject to acquisition noise.

As intimated in Section III-A, $\alpha_k$ is typically set ahead of time for each iteration. We have chosen a constant step size, where $\alpha_k = \alpha$ is a positive constant independent of $k$. Convergence can be guaranteed for any value of $\alpha$, but the rate of convergence will change [22]. Fig. 1 shows that larger step sizes converge faster but settle at higher objective function values. We have developed an approach where $\alpha$ is selected when $k = 0$ using a literature-standard backtracking line search, with coarse adjustments to $\alpha$ [21]. Recall that $G_k$ is a combination of the TV subgradient and the residual derivative (the residual is differentiable). While there is no guarantee that the first step will be a descent direction, in practice the combination of $G_{TV}$ and the derivative of a large residual commonly results in one. If the backtracking line search fails, $\alpha$ defaults to 0.01. (Note that projection does not affect the convergence guarantees provided by a constant step size [22].)

Fig. 1. $TV_{best}$ vs. iteration for the first 20 iterations of PANCoS with a constant step length $\alpha$.

In linear CG, there are several equivalent expressions for calculating the conjugate weightings $\beta_k$. In nonlinear CG, these expressions are no longer equivalent, and research continues into determining which is best. We have employed the Polak-Ribière formula $\beta_k^{PR} = g_k^T(g_k - g_{k-1}) / (g_{k-1}^T g_{k-1})$ (where $g_k = \mathrm{vec}(G_k)$), due to the strong theoretical guarantees and fast convergence it provides. However, this method can occasionally collapse into a situation where search directions cycle infinitely without converging. Fortunately, if calculated using $\beta_k = \max\{\beta_k^{PR}, 0\}$, it is equivalent to "restarting" CG whenever $\beta_k^{PR} < 0$, forgetting the past search directions and starting anew in the direction of the subgradient. This prevents search directions from incorporating previous contradictory ones [25]. Note that the convergence guarantees provided by a constant $\alpha$ are not maintained when search directions are made conjugate; we have sacrificed some theoretical certainty because empirical evidence suggests that conjugating search directions causes the algorithm to converge in fewer iterations.

Step 5) of PANCoS (Section IV) instructs that the inner loop only terminates if the averaged Lagrangian stops decreasing. Let $L_k = L(X_k, \mu)$; starting from $k = a - 1$, where $a$ is the number of values averaged, $\bar{L}_k = \frac{1}{a}\sum_{i=k-a+1}^{k} L_i$ and $\bar{L}_{k+1}$ are compared. Our choice of $a = 4$ is largely heuristic, but experiments show a 4-iteration fluctuating cycle (exhibited by the TV only; the precise mechanism is still under investigation, although it is suspected to be due to the subgradient steps) present in the objective function values. Averaging over 4 iterations removes this fluctuation and provides a much clearer interpretation of the objective, as seen in Fig. 2. After choosing a tolerance (we have found $tol \in [10^{-6}, 10^{-3}]$ to be effective), if $\bar{L}_{k+1} \ge (1 - tol)\,\bar{L}_k$ then the inner PANCoS loop is terminated. The outer PANCoS loop terminates when the pre-projected solution generated by the inner loop is feasible.

A slight modification is applied to PANCoS: after each PANCoS inner loop terminates, $\alpha$ is decreased and the procedure is restarted from the final estimate. This only occurs once, and helps to ensure appropriate step sizes and improves robustness to varying initial conditions. The whole process (excluding the aforementioned $\alpha$-decreasing modification) is summarized in Table I, with the computational complexity of each iteration dominated by matrix multiplications with cost $O(mN)$.
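The averaged-Lagrangian test above can be stated compactly in code. The sketch below is one way to implement it, assuming the Lagrangian values are collected in a list; the function name and default arguments are ours, not the paper's.

```python
def inner_loop_should_stop(lagrangian_values, a=4, tol=1e-4):
    """Return True when the a-iteration moving average of L(X_k, mu) has
    stopped decreasing by more than the proportional tolerance `tol`,
    i.e. when L_bar_{k+1} >= (1 - tol) * L_bar_k (Section IV)."""
    if len(lagrangian_values) < a + 1:
        return False                                  # not enough iterations yet
    curr = sum(lagrangian_values[-a:]) / a            # L_bar_{k+1}: latest window
    prev = sum(lagrangian_values[-a - 1:-1]) / a      # L_bar_k: window one step back
    return curr >= (1.0 - tol) * prev
```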


TABLE II. ISP-PANCoS Algorithm.

Fig. 2. $L_k$ and $\bar{L}_k$ (with 4 successive values averaged) vs. iteration for the first 60 iterations of a single PANCoS inner loop. The algorithm terminates at $k = 22$.

V. INFORMED STARTING POINTS

PANCoS is supplemented with a CoHSI-specific scheme, hereafter referred to as Informed Starting Points (ISP), where only a small number of spectral bands are recovered from a "cold-start" (where the starting points have little or no association with the optimization problem [15]), and all the other optimization routines start from points informed by the reconstructed bands. Our ISP scheme uses PANCoS to recover a subset of the spectral bands, and then interpolates between them to provide starting points for the optimization routines that reconstruct the remaining bands. Spectral smoothness suggests that materials have similar reflectance values in adjacent bands, so the starting points from this scheme should be similar to the optimal solution, greatly decreasing the required number of iterations. Taking this further, "cold-start" optimization can be performed on just a small number of spectral bands, with reconstruction performed in levels, gradually filling in wavelengths until the whole hyperspectral image has been recovered. This tactic benefits from few spectral bands being reconstructed uninformed, and the ISPs become more accurate as the process moves forward, because starting points will be calculated from more proximal spectral bands. One may view ISP as assisting continuation, as the starting points, coming from recovered spectral bands, will already have low TV. Therefore, runs of PANCoS with very low $\mu$ (which heavily weights minimizing TV) are unnecessary, and each level of ISP may be run with a different starting $\mu_0$. This algorithm, referred to as ISP-PANCoS, is summarized below (a sketch of the level scheduling and interpolation follows the list):
1) Delegate which spectral bands are recovered in which levels: $b_i$ denotes the bands recovered by level $i = 0, \ldots, l-1$, and $w_i$ is the number of bands in level $i$.
2) Due to each level potentially reconstructing a different number of spectral bands (with varying amounts of data), each level has a different noise power $\varepsilon$. Calculate $\varepsilon_i$ for each level.
3) Set $\mu_{0,i}$ for each level. ($\mu_{0,i}$ represents the starting $\mu$ for the $i$-th ISP-PANCoS level.)
4) Isolate batches of measured data, $Y_{b_i}$ (the columns of $Y$ specified by $b_i$).
5) Calculate $X_{0,b_0} \leftarrow \Phi^{\dagger} Y_{b_0}$ for the first "cold-start" level, where $\Phi^{\dagger}$ is the pseudoinverse of $\Phi$. ($X_{0,b_i}$ represents the starting point for the $i$-th ISP-PANCoS level.)
6) Recover the bands specified by $b_0$ (represented by $\hat{X}_{b_0}$) from a "cold-start" using PANCoS.
7) For each subsequent level, reconstruct the corresponding bands from "warm-starts" by interpolating between recovered bands to generate ISPs before regularizing using PANCoS.
8) Arrange $\hat{X}_{b_i}$ for $i = 0, \ldots, l-1$ to provide $\hat{X}$.
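The sketch below illustrates the level scheduling and starting-point interpolation used by the list above and detailed in Section V-A: level 0 holds the first and last bands, level 1 the central band, later levels bisect the remaining gaps, and ISPs are linear interpolations between the nearest recovered bands. It is a plausible reading of that description, not the authors' code, and the helper names are ours.

```python
import numpy as np

def schedule_levels(p):
    """Assign spectral bands (0-indexed) to reconstruction levels (Section V-A)."""
    levels = [[0, p - 1]]                    # level 0: first and last bands
    scheduled = set(levels[0])
    mid = (p - 1) // 2
    if mid not in scheduled:
        levels.append([mid])                 # level 1: central band
        scheduled.add(mid)
    while len(scheduled) < p:
        known = sorted(scheduled)
        new = []
        for lo, hi in zip(known[:-1], known[1:]):
            m = (lo + hi) // 2               # band in the middle of each gap
            if m not in scheduled and m != lo:
                new.append(m)
        if not new:                          # only direct neighbours remain
            new = [b for b in range(p) if b not in scheduled]
        levels.append(new)
        scheduled.update(new)
    return levels

def informed_starting_points(X_hat, recovered, bands):
    """Linear interpolation between the two nearest recovered bands.

    X_hat     : (n, p) current estimate of the hyperspectral cube.
    recovered : band indices already reconstructed.
    bands     : band indices that need starting points.
    """
    X0 = {}
    rec = np.array(sorted(recovered))
    for b in bands:
        lo = rec[rec <= b].max() if (rec <= b).any() else rec.min()
        hi = rec[rec >= b].min() if (rec >= b).any() else rec.max()
        t = 0.0 if hi == lo else (b - lo) / (hi - lo)
        X0[b] = (1 - t) * X_hat[:, lo] + t * X_hat[:, hi]
    return X0
```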

A. ISP Considerations

In order to assist PANCoS in reconstruction, the ISP scheme must manipulate the CoHSI observations and information derived from them; details are given in this section, concluding with pseudocode for the whole algorithm.

To decide which spectral bands are recovered in which levels of reconstruction, $b_0$ is set to $[1, p]$ (the first and last spectral bands). After recovering $\hat{X}_{b_0}$, $b_1$ is the band in the center of the spectral range, i.e., $b_1 = \lfloor p/2 \rfloor$. By keeping track of the bands that have been recovered, $b_i$ for $i = 2, \ldots, l-1$ can be set as the spectral bands in the middle of the gaps between already reconstructed bands. The number of bands recovered by each level grows by a factor of 2 for $2 \le i \le l-2$, with the final level filling in what is left (therefore, knowing $p$, $l$ can be calculated a priori). We already know that ISPs become more accurate the higher the level; this scheme matches the most accurate ISPs with the levels in which the most spectral bands are reconstructed.

The power of the acquisition noise has thus far been represented by $\varepsilon = \|E\|_F$. However, ISP-PANCoS requires a different $\varepsilon_i$ for each level of reconstruction because of the varying number of spectral bands being recovered.


Fig. 3. Simulated hyperspectral image using Shepp-Logan Phantom. (a) 256 × 256 Shepp-Logan Phantom. Key indicates spatial locations of each material. (b) Normalized spectral signatures of chosen materials.

Fig. 4. Comparisons of PANCoS and TVAL3—with and without ISP—recovering simulated hyperspectral data. (a) Reconstruction quality vs. SR; error bars indicate minima and maxima. (b) Average run-time to recover each spectral band vs. SR (the legend and error bars are the same as in (a)). (c) Single spectral band of original hyperspectral image (λ = 0.475μm). (d) Reconstruction of (c) using ISP-PANCoS; SR = 0.1. (e) Reconstruction of (c) using TVAL3; SR = 0.1.

If the noise matrix $E$ is not explicitly available but the variance $\sigma^2$ is known, the statistical model of the noise (modeled as AWGN) implies that $\varepsilon_i$ can be estimated using

$$\varepsilon_i \cong \sqrt{\sigma^2 m w_i} \qquad (25)$$

where $m w_i$ is the number of elements in $Y_{b_i}$. Using (25) to calculate $\varepsilon_i$, each level of ISP-PANCoS attempts

$$\hat{X}_{b_i} = \underset{X}{\arg\min}\, \|X\|_{TV} \quad \text{s.t.} \quad \|\Phi X - Y_{b_i}\|_F^2 \le \varepsilon_i^2 \qquad (26)$$

Because the feasible set defined by each level of ISP-PANCoS is unique, $\varepsilon_i$ and $Y_{b_i}$ should replace $\varepsilon$ and $Y$ in (15) in order to perform each projection. Additionally, using the same techniques as in Theorem IV.1, it is possible to show that appropriate regularization parameters for $i = 0, \ldots, l-1$ can be calculated using $\mu_i^* = 4\sqrt{n w_i}/(\sigma \varepsilon_i)$. Moreover, as noted earlier, $\mu_{0,i}$ can be different for each ISP-PANCoS level (lower for early levels and higher for later levels), introducing the level-specific calculations $\mu_{0,i} = \mu_i^*/\tau_i$. With a lack of theoretical guidance, we have empirically selected $\tau_0 = \eta^5$ and $\tau_{l-1} = \eta^2$, and interpolated between these values for the middle levels.
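A small sketch of these per-level calculations follows. Note that the symbol $\sigma$ appears both as the noise standard deviation in (25) and, following Theorem IV.1, as $\sqrt{\lambda_{\min}(\Phi\Phi^T)}$ in $\mu_i^*$; the two are separated explicitly below as our reading, and the geometric interpolation of $\tau_i$ between $\eta^5$ and $\eta^2$ is likewise our interpretation of "interpolated between these values".

```python
import numpy as np

def level_parameters(m, n, w, sigma_noise, sigma_min, eta=2.0):
    """Per-level noise bounds and regularization parameters for ISP-PANCoS.

    m, n        : measurements per band and pixels per band.
    w           : list with the number of bands w_i in each level i.
    sigma_noise : noise standard deviation, used in eq. (25).
    sigma_min   : sqrt(lambda_min(Phi Phi^T)), as in Theorem IV.1.
    """
    l = len(w)
    eps = [np.sqrt(sigma_noise ** 2 * m * wi) for wi in w]        # eq. (25)
    mu_star = [4.0 * np.sqrt(n * wi) / (sigma_min * ei)           # Section V-A
               for wi, ei in zip(w, eps)]
    exponents = np.linspace(5, 2, l)          # tau_i from eta^5 down to eta^2 (assumed geometric)
    tau = [eta ** e for e in exponents]
    mu0 = [ms / t for ms, t in zip(mu_star, tau)]
    return eps, mu_star, mu0
```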


Fig. 5. False-color ground-truth of Salinas scene (legend omitted for clarity).

Finally, calculating the ISPs is simply a linear interpolation using the two nearest reconstructed bands.

ISP-PANCoS is presented in Table II, where interp($\hat{X}$) interpolates between already recovered spectral bands of the estimated hyperspectral image to provide the ISPs. Most steps can be performed in advance, so ISP has a limited effect on computational complexity. Only the interpolation (to generate ISPs) adds to it, with a cost of $O(n w_i)$; however, this occurs only once per ISP level, generally having little effect on the overall algorithm. The computational complexity of each PANCoS iteration now varies with $w_i$, and is given by $O(m n w_i)$.

A final note: although not incorporated into these experiments, it is possible to calculate new starting points for the early levels from the later levels and re-estimate those bands, using PANCoS, from more advantageous "warm-starts". Experiments in which bands from levels $i = 0, \ldots, l-2$ are re-estimated show that this can increase fidelity by between 0.5 and 1.0 dB, but more than doubles the computation time. Since speed is a focus of this paper, this technique is omitted.

VI. NUMERICAL RESULTS

In order to establish the potential benefits of ISP, we compare ISP-PANCoS to PANCoS where all bands are reconstructed simultaneously (no ISP), with the worth of PANCoS verified by comparing it to TVAL3 [8], a state-of-the-art CS image reconstruction algorithm, which we ran with the recommended parameters. Experiments with TVAL3 have been performed both with and without the aid of ISP. We compare PANCoS with a monochrome image reconstruction algorithm in order to show that ISP can provide benefits to any reconstruction algorithm.

Experiments were performed using simulated and real hyperspectral data with 256 × 256 pixel spatial planes and 224 spectral bands. All measurements were acquired with an SNR of 30 dB, and each trial was repeated seven times and then averaged, with the lowest and highest values disregarded. All four algorithms were run with a range of sampling ratios (SR), the ratio of measurements to signal length, $M/N$. The metrics compared are the quality of the reconstruction, measured by PSNR, and the time taken to perform the reconstruction, in seconds per spectral band. Both are common metrics for evaluating CS reconstruction algorithms.
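For reference, the PSNR figure of merit used in these comparisons can be computed as below; taking the peak value as the maximum of the reference cube is an assumption on our part, since the paper does not state the exact convention.

```python
import numpy as np

def psnr(X_ref, X_rec):
    """Peak signal-to-noise ratio in dB between a reference hyperspectral
    cube and its reconstruction (peak = max of the reference, an assumed
    convention)."""
    mse = np.mean((X_ref - X_rec) ** 2)
    peak = np.max(np.abs(X_ref))
    return 10.0 * np.log10(peak ** 2 / mse)
```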


Fig. 6. Three bands of the Salinas scene (λ = {0.465, 0.690, 1.135} μm) shown in the red, green, and blue channels, respectively.

Fig. 7. The Salinas scene as recovered by ISP-PANCoS (λ = {0.465, 0.690, 1.135} μm, chosen as in Fig. 6); SR = 0.1.

In all experiments we simulate a single-pixel hyperspectral camera (SPHC) developed by Rice University [31]. The measurement matrix is defined as in [32], and is created and stored explicitly in memory. Unlike other CS reconstruction schemes, ISP-PANCoS does not rely on properties or simplifications of the measurement matrix in order to perform efficient reconstruction (such as [24]); therefore, any arbitrary measurement matrix can be employed, and we employ one compatible with the SPHC scheme we simulate. All experiments were carried out in MATLAB 7.11.0 (64-bit) on a 3.33 GHz quad-core desktop computer with 8 GB of RAM.

A. Simulated Hyperspectral Data

We selected four disparate spectral signatures from the ASTER Spectral Library [33], which we normalized and then assigned to different regions of a 256 × 256 Shepp-Logan phantom, determined by the original intensity. (Two small regions of different intensities have been combined with adjacent regions.) The signatures of the materials and their locations can be seen in Fig. 3. The purpose of these experiments is to validate the choice of TV regularization by recovering a contrived image with extremely low TV, while still legitimately demonstrating the effectiveness of ISP by using real spectral signatures.


Fig. 8. Comparisons of PANCoS and TVAL3—with and without ISP—recovering real hyperspectral data. (a) Reconstruction quality vs. SR; error bars indicate minima and maxima. (b) Average run-time to recover each spectral band vs. SR (the legend and error bars are the same as in (a)).

Fig. 4 shows run-time and reconstruction quality against SR. We observe that the run-times of both ISP experiments are shorter than those of their "cold-start" counterparts, with speed improvements of up to 305% for PANCoS and 70% for TVAL3. Moreover, ISP-PANCoS takes as little as 20% of the time required by ISP-TVAL3. PANCoS run-times also (generally) improve for experiments with higher SRs, because the extra information provided by the additional samples causes PANCoS to converge faster, more than compensating for the increased complexity of each iteration. Conversely, each TVAL3 experiment requires more iterations at higher SRs, which combines with the more complex iterations to increase run-times. While TVAL3 provides similar reconstruction quality at very low SRs, Fig. 4(a) reveals the importance of correctly selecting $\mu$, with both TVAL3 experiments producing reconstructions with worse PSNRs at higher SRs. Both PANCoS experiments, with self-selected $\mu$, show significant improvement with increasing SR, with ISP-PANCoS producing the highest-fidelity reconstructions at almost all SRs, with improvements of up to 1.85 dB over PANCoS and 8.9 dB over TVAL3. Fig. 4 also provides an additional visual comparison of reconstruction quality between ISP-PANCoS and ISP-TVAL3; notice the smoother reconstruction provided by ISP-PANCoS. A side note: TVAL3 experiments with noise-free measurements show the expected trend of improving PSNR with increasing SR; however, we feel that assuming no noise is unrealistic.

B. Real Hyperspectral Data

Data of Salinas Valley, California, collected using the 224-band AVIRIS sensor [34] and obtained from [35], were used to test our algorithm on real hyperspectral data. We resized each spatial plane to 256 × 256 pixels. The ground truth is shown in Fig. 5, and a representation of three different bands (λ = {0.465, 0.690, 1.135} μm) in the red, green, and blue channels, respectively, is shown in Fig. 6. An ISP-PANCoS reconstruction of the same wavelengths is shown in Fig. 7.

Fig. 8 reinforces the observations made in the simulated-data case: for the majority of the experiments, ISP-PANCoS produces superior-quality reconstructions when compared to the other algorithms, demonstrating the importance of automated regularization parameter selection. ISP once more offers large run-time improvements for both algorithms, with fidelity improvements of up to 0.7 dB for PANCoS. (Quality differences between TVAL3 and ISP-PANCoS appear to be statistically non-significant.)

VII. CONCLUSION

HSI data processing has the potential to provide solutions to problems that conventional image processing techniques cannot solve. However, the added data volume limits current applications due to complications in acquisition and storage. CS, a technique for undersampling data whilst retaining the information needed for reconstruction, can alleviate some HSI difficulties and make it a more viable technique. However, CS itself poses some unique challenges, not least of which is reconstructing the original data from an ill-posed set of equations. Current techniques rely on regularizing the solution with some kind of parameter, with computational demands increasing with the dimension of the problem. Spectral smoothness of the underlying image can also be exploited, with varying approaches and results.

This work, based on the Lagrangian function, introduces a technique for recovering images from CoHSI data using TV regularization. Computational simplicity is the main focus, with our algorithm employing a subgradient-like scheme. Speed is improved by conjugating search directions and projecting back onto the feasible set after each step, and the equivalence of the original problem and the Lagrangian formulation is established using convex optimality conditions. Adjacent spectral bands are used to inform starting points for the majority of the required optimization routines. Experimental results, using synthetic and real hyperspectral data, have validated ISP-PANCoS, with significant speed improvements over the state-of-the-art TVAL3. Additionally, the importance of rigorously selecting regularization parameters has been highlighted. Our approach assumes low spatial TV and little change between contiguous spectral bands, but it may be desirable to assume the LMM in an effort to further reduce complexity and improve reconstruction quality and run-times.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their insightful and useful comments.


R EFERENCES [1] C. Li, T. Sun, K. F. Kelly, and Y. Zhang, “A compressive sensing and unmixing scheme for hyperspectral data processing,” IEEE Trans. Image Process., vol. 21, no. 3, pp. 1200–1210, Mar. 2012. [2] H. Liang, “Advances in multispectral and hyperspectral imaging for archaeology and art conservation,” Appl. Phys. A, vol. 106, no. 2, pp. 309–323, 2012. [3] R. Bowmaker, R. J. Dunn, K. B. Moynihan, T. J. Roper, and M. Andrews, “Construction of a practical hyperspectral image acquisition system,” in Proc. 26th Int. Conf. Image Vis. Comput. New Zealand, 2011, pp. 1–6. [4] E. Candès and J. Romberg, “Practical signal recovery from random projections,” in Proc. SPIE Int. Symp. Electron. Imag., Comput. Imag. III, San Jose, CA, USA, Jan. 2005. [5] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006. [6] E. J. Candès and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21–30, Mar. 2008. [7] R. Baraniuk, M. Davenport, M. Duarte, and C. Hegde. (Apr. 2, 2011). An Introduction To Compressive Sensing. [Online]. Available: http://cnx.org/content/col11133/1.5/ [8] C. Li, “An efficient algorithm for total variation regularization with applications to the single pixel camera and compressive sensing,” M.S. thesis, Dept. Comput. Appl. Math., Rice Univ., Houston, TX, USA, 2009. [9] D. Needell and R. Ward, “Near-optimal compressed sensing guarantees for total variation minimization,” IEEE Trans. Image Process., vol. 22, no. 99, pp. 3941–3949, Oct. 2013. [10] M. Golbabaee and P. Vandergheynst, “Joint trace/TV norm minimization: A new efficient approach for spectral compressive imaging,” in Proc. 19th IEEE Int. Conf. Image Process. (ICIP), Sep./Oct. 2012, pp. 933–936. [11] E. Candès and J. Romberg. (Oct. 2005). 1 -Magic: Recovery of Sparse Signals Via Convex Programming. [Online]. Available: http://users.ece.gatech.edu/~justin/l1magic/downloads/l1magic.pdf. [12] A. Abrardo, M. Barni, C. M. Carretti, S. K. Kamdem, and E. Magli, “A compressive sampling scheme for iterative hyperspectral image reconstruction,” in Proc. 19th Eur. Signal Process. Conf. (EUSIPCO), 2011, pp. 1120–1124. [13] P. Wolfe, “A method of conjugate subgradients for minimizing nondifferentiable functions,” in Nondifferentiable Optimization, vol. 3. Amsterdam, The Netherlands: North Holland, 1975, pp. 145–173. [14] E. A. Nurminski, “Conjugate subgradient method revisited,” unpublished. [15] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 586–597, Dec. 2007. [16] T. Goldstein and S. Osher, “The split Bregman method for L1-regularized problems,” SIAM J. Imag. Sci., vol. 2, no. 2, pp. 323–343, 2009. [17] E. T. Hale, W. Yin, and Y. Zhang, “A fixed-point continuation method for 1 -regularized minimization with applications to compressed sensing,” Dept. Comput. Appl. Math., Rice Univ., Houston, TX, CAAM Tech. Rep. TR07-07, Jul. 2007. [18] E. J. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Commun. Pure Appl. Math., vol. 59, no. 8, pp. 1207–1223, 2006. [19] R. G. Baraniuk, “Compressive sensing [lecture notes],” IEEE Signal Process. Mag., vol. 24, no. 4, pp. 118–120, Jul. 2007. [20] Y. Shi and Q. 
Chang, “Efficient algorithm for isotropic and anisotropic total variation deblurring and denoising,” J. Appl. Math., vol. 2013, pp. 1–14, Jan. 2013. [21] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004. [22] S. Boyd and A. Mutapcic, “Subgradient methods,” Dept. Elect. Eng., Stanford Univ., Stanford, CA, USA, Tech. Rep. EE364b, Apr. 2008.


[23] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, MA, USA: Athena Scientific, 1999. [24] S. Becker, J. Bobin, and E. J. Candès, “NESTA: A fast and accurate first-order method for sparse recovery,” SIAM J. Imag. Sci., vol. 4, no. 1, pp. 1–39, 2011. [25] J. R. Shewchuk, “An introduction to the conjugate gradient method without the agonizing pain,” Comput. Sci. Division, Carnegie Mellon Univ., Pittsburgh, PA, USA, Tech. Rep. CMU-CS-TR-94-125, Aug. 1994. [26] R. T. Rockafellar, Convex Analysis. Princeton, NJ, USA: Princeton Univ. Press, 1970. [27] A. Beck and M. Teboulle, “Mirror descent and nonlinear projected subgradient methods for convex optimization,” Oper. Res. Lett., vol. 31, no. 3, pp. 167–175, 2003. [28] D. Goldfarb, S. Ma, and K. Scheinberg, “Fast alternating linearization methods for minimizing the sum of two convex functions,” Math. Program., vol. 141, nos. 1–2, pp. 349–382, 2013. [29] K. L. Clarkson, “Subgradient and sampling algorithms for 1 regression,” in Proc. 16th Annu. ACM-SIAM Symp. Discrete Algorithms, 2005, pp. 257–266. [30] S. Boyd, “Symmetric matrices, quadratic forms, matrix norm, and SVD,” Dept. Elect. Eng., Stanford Univ., Stanford, CA, USA, Tech. Rep. EE263, 2007. [31] M. F. Duarte et al., “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 83–91, Mar. 2008. [32] L. Gan, T. Do, and T. D. Tran, “Fast compressive imaging using scrambled block Hadamard ensemble,” to be published. [33] A. M. Baldridge, S. J. Hook, C. I. Grove, and G. Rivera, “The ASTER spectral library version 2.0,” Remote Sens. Environ., vol. 113, no. 4, pp. 711–715, 2009. [34] S. Lundeen. (May 8, 2013). AVIRIS—Airborne Visible/Infrared Imaging Spectrometer. [Online]. Available: http://aviris.jpl.nasa.gov/, accessed Jun. 4, 2013. [35] Hyperspectral Remote Sensing Scenes—GIC. [Online]. Available: http://www.ehu.es/ccwintco/index.php/Hyperspectral_Remote_Sensing_ Scenes, accessed Nov. 14, 2012.

Duncan T. Eason was born in Auckland, New Zealand, in 1988. He received the B.E. (Hons.) degree in electrical and electronic engineering from the University of Auckland, Auckland, in 2010, where he is currently pursuing the Ph.D. degree in electrical and electronic engineering. From 2009 to 2010, he was a Summer research student with the Signal Processing Group, University of Auckland.

Mark Andrews (M’86) received the B.E. (Hons.) degree in electrical and electronic engineering and the Ph.D. degree in engineering from the University of Auckland, Auckland, New Zealand, in 1985 and 1990, respectively, where he has been with the Department of Electrical and Computer Engineering since 1990. His research interests include signal processing, hyperspectral image processing, and computer vision. He has served as a Committee Member and the Chairman of the IEEE New Zealand North Branch.
