IEEE TRANSACTIONS ON CYBERNETICS, VOL. 44, NO. 11, NOVEMBER 2014

Learning From Errors in Super-Resolution

Yi Tang and Yuan Yuan

Abstract—A novel framework for learning-based super-resolution is proposed by employing a process of learning from estimation errors. The estimation errors generated by different learning-based super-resolution algorithms are statistically shown to be sparse and uncertain. The sparsity of the estimation errors means that most estimation errors are small; their uncertainty means that the locations of the pixels with larger estimation errors are random. Exploiting this prior information about the estimation errors, a nonlinear boosting process of learning from these errors is introduced into the general framework of learning-based super-resolution. Within this framework, a low-rank decomposition technique is used to share the information of different super-resolution estimations and to remove the sparse estimation errors produced by different learning algorithms or training samples. The experimental results show the effectiveness and the efficiency of the proposed framework in enhancing the performance of different learning-based algorithms.

Index Terms—Boosting, learning-based super-resolution, low-rank decomposition, sparsity.

I. Introduction

SUPER-RESOLUTION is the problem of reconstructing a high-resolution image from one or more observed low-resolution images. Generally, super-resolution techniques are divided into two categories, multiframe super-resolution and single-image super-resolution, according to the number of observed low-resolution images. Multiframe super-resolution can be traced back to the seminal work of Tsai and Huang [1] in 1984. They noticed that complementary information can be found in a sequence of observed low-resolution images because of the subpixel translations that occur during the imaging process. Thus, a higher-resolution image can be reconstructed by collecting all of the complementary information. Indeed, their idea is connected with meta-learning [2], [3], which means boosting all available information for a specific application. Their idea has been verified in the Satellite Probatoire pour l'Observation de la Terre 5 (SPOT5) application, where two 5 m resolution images with half-pixel translation are used to generate a single 2.5 m resolution image [4]. For more details about multiframe super-resolution, please see [5]–[8].

Manuscript received January 28, 2013; revised July 2, 2013 and December 27, 2013; accepted January 8, 2014. Date of publication August 1, 2014; date of current version October 13, 2014. This work was supported in part by the State Key Program of National Natural Science of China under Grant 61232010, and in part by the National Natural Science Foundation of China under Grant 61172143 and Grant 61105051. This paper was recommended by Associate Editor F. Hoffmann. Y. Tang is with the School of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming 650500, China (e-mail: [email protected]). Y. Yuan is with the Center for Optical Imagery Analysis and Learning, State Key Laboratory of Transient Optics and Photonics, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCYB.2014.2301732

Fig. 1. Sparsity of the estimation error generated by linear-regression-based algorithm. Left: low-resolution image. Middle: upper left, high-resolution image; upper right, interpolation image; lower left, difference image of the super-resolution image and the high-resolution image; lower right, binarization of the difference image. Right: the curve of sparsity of estimation error.

Single-image super-resolution emerged around 2000 with the work of Baker and Kanade [9]. They showed theoretically that images satisfying the subpixel-translation hypothesis cannot provide enough useful information for super-resolving with larger magnification factors. To overcome this limitation of multiframe super-resolution, information beyond the observed low-resolution images should be introduced into super-resolution. Therefore, an additional training set consisting of high- and low-resolution image pairs is employed for super-resolving an observed low-resolution image. The problem of super-resolving an observed low-resolution image with the help of a training set is now commonly referred to as single-image super-resolution, example-based super-resolution, or learning-based super-resolution [10], [11]. Learning-based super-resolution techniques are more convenient than multiframe super-resolution in applications because they impose fewer constraints on the observed low-resolution images. Meanwhile, learning from image pairs is also a challenge in machine learning because of the complexity of the images themselves. Therefore, we focus on the problem of learning-based super-resolution in this paper.

The learning-based super-resolution techniques can be roughly divided into two categories from the viewpoint of machine learning, namely, the supervised learning framework and the unsupervised learning framework. Within the supervised learning framework, learning-based super-resolution is treated as a problem of supervised regression. A regression function is learned from the training image pairs to represent the relation between a low-resolution image and its high-resolution counterpart. Then, the high-resolution estimation can be obtained from an observed low-resolution image by using the regression function. For example, kernel-based regression techniques are used by Ni et al. [12] and Kim et al. [13]; local regression techniques are used by Tang et al. [14] and Lu et al. [15]; greedy regression in a sparse coding space is used by Tang et al. [16], [17]; and semi-supervised regression techniques are used by Tang et al. [18], [19]. Within the framework of unsupervised learning, learning-based super-resolution is treated as a problem of feature learning. A set of high- and low-resolution template pairs is learned from the training set by using unsupervised learning techniques. Then,

2168-2267 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

the representing feature vector is generated by approximating the observed low-resolution image with a linear combination of the learned low-resolution templates. A high-resolution estimation is then obtained by combining the learned high-resolution templates, with the representing feature vector used as a weight vector. For example, nonlocal template pairs and Gaussian random fields are used by Freeman et al. [10]; nearest-neighbor template pairs and manifold learning are used by Chang et al. [20], Zhang et al. [21], and Gao et al. [22], [23]; and learned template pairs and sparse representation techniques are used by Yang et al. [24], Lu et al. [25], Dong et al. [26], Mu et al. [27], and Lu et al. [28].

We notice that the estimation error changes if the learning-based super-resolution algorithm or the training set is changed. Following the idea of multiframe super-resolution, the observation that differences exist among estimation errors motivates us to consider whether super-resolution could benefit from information learned from the estimation errors. It should be noticed that this problem differs from traditional multiframe super-resolution because the differences among these estimation errors are random. This randomness makes our problem more challenging than multiframe super-resolution. Meanwhile, the idea of learning from errors also distinguishes it from ordinary learning-based super-resolution because it is an indirect model of using training samples.

We find that two basic properties of the estimation errors, namely, sparsity and uncertainty, are helpful for learning from the super-resolution estimation errors. The sparsity of an estimation error means that the major estimation error appears on a limited part of an estimated high-resolution image. This phenomenon is firmly connected with the fact that it is more difficult to estimate high-frequency information given only a low-resolution image. Thus, the major estimation error often appears in the high-frequency part of a natural image. Since the high-frequency part is generally small for almost all natural images, the estimation error of super-resolution can be regarded as a sparse error for natural images. Uncertainty of the estimation error means that the estimation error changes as the training data and the learning algorithm vary. For given training data, different learning algorithms introduce different estimation errors into the super-resolution estimations. Similarly, by changing the training data, the same learning algorithm will also introduce different estimation errors. Theoretically, the latter is strictly connected with the theory of algorithmic stability [29], [30].

Uncertainty of the estimation errors implies that complementary information is contained in different super-resolution estimations, and the sparsity of the estimation errors makes it possible to benefit from this complementary information. In fact, any super-resolution estimation can be regarded as a combination of the real high-resolution image and a sparse estimation error. Given a set of super-resolution estimations, they can be divided into a set of copies of the high-resolution image and a set of different sparse estimation errors. Therefore, these estimations could be improved if they can be well decomposed. By vectorizing the images, the decomposition of the super-resolution estimations can be modeled as a problem of low-rank and sparse matrix decomposition. The low-rank constraint is a natural result of the uniqueness of the real high-resolution image. The sparsity constraint is from the sparsity

Fig. 2. Sparsity of the estimation error of different super-resolution algorithms. Each group of images is arranged as Fig. 1. Left: low-resolution image. Middle: upper left, high-resolution image; upper right, interpolation image; lower left, difference image of the super-resolution image and the high-resolution image; lower right, binarization of the difference image. Right: the curve of sparsity of estimation error. (a) Bilinear interpolation algorithm. (b) NE algorithm [20]. (c) Kernel-regression-based algorithm [13]. (d) Dictionary-based algorithm [24].

of the estimation errors. Meanwhile, the uncertainty of the estimation errors prevents the sparse error matrix from being low-rank. Thus, the estimation errors correspond to the sparse matrix, and the enhanced super-resolution estimations correspond to the low-rank matrix. Fortunately, the low-rank and sparse matrix decomposition problem can be efficiently solved by many matrix decomposition algorithms, such as robust principal component analysis (RPCA) [31]. Therefore, learning from the estimation errors is an effective and efficient way to improve super-resolution performance.

The rest of this paper is organized as follows. In Section II, we briefly review the problem of learning-based super-resolution and some related super-resolution algorithms. In Section III, statistical results are presented to show the sparsity and uncertainty of the super-resolution estimation error. In Section IV, a novel boosting-style framework for improving the performance of learning-based super-resolution algorithms is proposed according to the sparsity and uncertainty of the estimation error. In Section V, experimental results are reported to verify the effectiveness and efficiency of the proposed super-resolution framework. Finally, some discussion on the novel learning-from-error model is presented in Section VI.
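This low-rank/sparse intuition can be checked numerically: stacking vectorized copies of a single truth image yields a rank-one matrix, while sparse errors at uncertain (random) locations form a full-rank matrix. The following toy NumPy sketch is only an illustration (the image and error patterns are made up, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
I = rng.random((16, 16))   # stand-in "truth" image
p = 6                      # number of super-resolution estimations
n = I.size

# Stacking p vectorized copies of the unique truth image gives a rank-one matrix.
L = np.stack([I.ravel()] * p, axis=1)          # n x p, identical columns
assert np.linalg.matrix_rank(L) == 1

# Sparse errors at uncertain (random) locations form a full-rank matrix
# with probability one, so they cannot hide inside the low-rank part.
S = np.zeros((n, p))
for j in range(p):
    support = rng.choice(n, size=10, replace=False)
    S[support, j] = rng.standard_normal(10)
assert np.linalg.matrix_rank(S) == p
```

The rank-one structure of the stacked copies is exactly what makes the decomposition of the observed matrix L + S well posed.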

Fig. 3. Statistical results about sparsity. (a) NE algorithm [20]. (b) Linear-regression-based algorithm [14]. (c) Kernel-regression-based algorithm [13]. (d) Dictionary-based algorithm [24].

Fig. 4. Ideal distribution model. A truth image and its two basic types of estimations are mapped into a 2-D space. The truth image appears at the center, the estimations with the same normalized estimation error $\delta_0$ distribute along the same radial line, and the estimations with different normalized estimation errors but the same error energy $\|\epsilon_i\|_2 = \gamma$ distribute on the same circle.

II. Related Work

Learning-based super-resolution is a technique for super-resolving low-resolution images with the help of a training set of high- and low-resolution image pairs. Mostly, a training set consists of images similar to the observed low-resolution images; for example, natural image pairs are used for super-resolving a natural image. The prior knowledge contained in the natural image pairs is the basis for super-resolving a natural image. Recently, self-sampling techniques have been reported in [32]–[35] for generating training image pairs from the observed images themselves. It is clear that self-sampling techniques make the training and test data more closely related than traditional training sets do.

Given a training set, different learning strategies yield different learning-based super-resolution models. As mentioned in Section I, learning-based super-resolution is treated as a regression problem when a supervised learning strategy is employed. By using a regression function or a greedy function, the prior about high- and low-resolution training image pairs is explicitly represented. However, a regression function is usually too general to focus on the needs of super-resolving a particular observed low-resolution image. Therefore, the main task for a supervised super-resolution algorithm is to select a matching natural image prior. For example, the edge prior is used in [12] and [13], and the sparse representation prior is used in [16].

The unsupervised learning model is also used in learning-based super-resolution. Within the unsupervised learning model, the critical assumption is that high- and low-resolution image patches share the same representation vector in a well-chosen feature space. Therefore, the core of unsupervised super-resolution algorithms is learning a good feature space. Mostly, a feature space is defined by some template sets and the matched coding methods, where the template sets are usually learned with unsupervised methods. For example, nearest-neighbor templates are chosen according to the Euclidean distance between low-resolution image patches, and locally linear embedding (LLE) [36] is used to encode the low-resolution image patches in [20]. It is clear that the local feature space

Fig. 5. Uncertainty of learning-based super-resolution estimations. All super-resolution estimations distribute around the truth image with different directions and different distances. The differences among the directions and the distances visualize the uncertainty of the estimation errors of these estimations. (a) Super-resolution with a common training set and different algorithms. (b) Super-resolution with different training sets and a common algorithm.

is learned without the information of the high-resolution training image patches. Different from the algorithm reported in [20], Yang et al. [24] employ a pair of high- and low-resolution dictionaries and sparse coding techniques to define the feature space. The pair of dictionaries is simultaneously learned with an unsupervised dictionary-learning technique [37].

Much effort has been devoted to enhancing the performance of learning-based super-resolution algorithms by generating more related training samples and by designing more ingenious methods of analyzing training data. However, it is impossible to design an optimal super-resolution algorithm because of the one-sidedness of any given training set and any chosen method of data analysis. Fortunately, the diversity of training sets and super-resolution algorithms makes it possible to boost the super-resolution estimations by themselves.

III. Properties of Super-Resolution Estimation Error

A. Sparsity of Estimation Error

The sparsity of an estimation error concerns the location and intensity of the error. Freeman et al. [11] pointed out that low-resolution images contain the low- and mid-frequency information of natural images, which means the low- and mid-frequency parts can be well estimated. However, it is a tough task to estimate the high-frequency part of an image because of the absence of high-frequency information in the low-resolution image. Thus, it can be expected that the major estimation error appears in the high-frequency part of a super-resolution estimation.

As a special case, the linear-regression-based algorithm for super-resolution is considered. Using this algorithm, the low-resolution image (left in Fig. 1) is super-resolved with a magnification factor of 3. The high-resolution image and the super-resolution image are, respectively, shown on the upper left and upper right in the middle of Fig. 1. The difference image of the two is shown on the lower left in the middle of Fig. 1. According to the difference image, larger estimation errors appear in the high-frequency part, and smaller estimation errors appear in the smooth regions. By binarizing the difference image, all pixels with large enough estimation

TABLE I List of Super-Resolution Algorithms

TABLE II PSNRs/SSIMs for First Setting

Fig. 6. Samples of test and training images. (a) Test samples. (b) Training samples.

errors are shown in a binarization image (lower right in the middle of Fig. 1). Based on the binarization image, it is clear that the pixels with larger estimation errors are all located around the high-frequency part of the test image. Notice that the high-frequency part is always a small part of a natural image, so the major estimation error is obviously sparse. In fact, the statistical results shown on the right of Fig. 1 indicate that the percentage of pixels with an estimation error larger than 30 is less than 22%. Therefore, the sparsity of the estimation error is a result of the difficulty of estimating high-frequency information.

A similar phenomenon can be found for other popular learning-based algorithms. As shown in the middle column of Fig. 2, the major estimation errors appear in the high-frequency part of the test image. Meanwhile, the curves in the right column of Fig. 2 show that large errors are rare. Thus, the estimation errors of the NE algorithm, the kernel-regression-based algorithm, and the dictionary-based algorithm are also sparse. Moreover, 1000 natural images collected from the Internet are used to verify the sparsity of the estimation error in a statistical sense. As shown in Fig. 3, the sparsity of the estimation error holds for NE algorithms, regression-based algorithms, and dictionary-based algorithms. Therefore, it is safe to assume that the estimation error of a learning-based super-resolution algorithm is sparse.

B. Uncertainty of Estimation Error

A super-resolution estimation is the result of the interaction between the learning algorithm and the training data. The performance of the estimation changes as the learning algorithm and the training data vary. Therefore, the estimation error is uncertain.
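The sparsity curves of Figs. 1–3 (Section III-A) reduce to a per-threshold pixel count. The following minimal NumPy sketch is illustrative only; the function and array names are ours, not the paper's:

```python
import numpy as np

def error_sparsity_curve(truth, estimation, thresholds=range(0, 256, 10)):
    """Fraction of pixels whose absolute estimation error exceeds each threshold.

    truth, estimation: 2-D arrays holding the ground-truth image and the
    super-resolution estimation (e.g., 8-bit grayscale values)."""
    err = np.abs(truth.astype(float) - estimation.astype(float))
    return {t: float((err > t).mean()) for t in thresholds}
```

Applied to the difference images of Fig. 1, such a curve reports, for every threshold, how small the fraction of badly estimated pixels is.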

Fig. 7. Different information of Toy data. Raw estimations are in line 1, refined estimations and their sparse error estimations are respectively in lines 2–3, and the difference images of raw and refined estimations are, respectively, in lines 4–5. (a) First type of estimation errors. (b) Second type of estimation errors.

Given a set of super-resolution estimations $\hat{\mathcal{I}} = \{\hat{I}_i\}_{i=1}^{p}$, these estimations can be decomposed as

$$\hat{I}_i = I + \epsilon_i, \quad i = 1, 2, \cdots, p \qquad (1)$$

where $I$ is the real high-resolution image, $\epsilon_i$ is the $i$th sparse estimation error, and $p$ is the number of estimations. Let $\epsilon_i = \gamma_i \delta_i$, $i = 1, 2, \cdots, p$, where $\gamma_i = \|\epsilon_i\|_2 \in \mathbb{R}$ and $\|\delta_i\|_2 = 1$. The value of $\gamma_i$ is the intensity of the $i$th estimation error, and the normalized estimation error $\delta_i$ is connected with the locations of the $i$th estimation error. A difference in the intensity or location of the estimation error is the expression of the uncertainty of the estimation error.

As an ideal example, two sets of estimations are considered. Let the first set of estimations be

$$\hat{\mathcal{I}}^1 = \left\{\hat{I}_i^1 = I + \epsilon_i^1\right\} \qquad (2)$$

where $\epsilon_i^1 = \gamma_i \delta_0$, $\gamma_i \in \mathbb{R}$, $i = 1, 2, \cdots, s_1$. Let the second set of estimations be

$$\hat{\mathcal{I}}^2 = \left\{\hat{I}_i^2 = I + \epsilon_i^2\right\} \qquad (3)$$

where $\epsilon_i^2 = \gamma \delta_i$, $\gamma \in \mathbb{R}$, $i = 1, 2, \cdots, s_2$. For $\hat{\mathcal{I}}^1$, the difference among these estimations is just the difference of

Fig. 8. From left to right, the results obtained by algorithms including manifold based algorithm [20], linear and heat kernel regression algorithms [13], and dictionary based algorithm [24] are shown. From line 1 to line 3, the raw and refined super-resolution estimations, and the difference images between both estimations are displayed. (a) Girl. (b) Butterfly. (c) Building. (d) Flower. TABLE III Average PSNR/SSIM for Mixing Setting

intensity. For $\hat{\mathcal{I}}^2$, the difference among these estimations is just the difference of location. By using a dimension reduction technique, namely locality preserving projections (LPP) [38], the uncertainty, i.e., the difference within each set of estimations, can be visualized. As shown in Fig. 4, the estimations in the set $\hat{\mathcal{I}}^1$ appear on a line through the truth image, while the estimations in the set $\hat{\mathcal{I}}^2$ distribute on a circle centered at the truth image. Therefore, the type of difference among a group of estimations can be distinguished by visualizing their distribution.

In Fig. 5, the uncertainty of super-resolution estimations is shown. In Fig. 5(a), the super-resolution estimations are generated by varying only the super-resolution algorithm. In Fig. 5(b), all of the super-resolution estimations are generated by Chang's NE algorithm [20] with different training data. According to Fig. 5(a) and (b), few estimations distribute on a line through the truth image. Therefore, the difference in

location of the estimation error is the main difference when the super-resolution algorithms or training samples are changed. Since the uncertainty of the estimation error is mainly expressed as a difference in the location of the error, it can be expected that, for a series of super-resolution estimations, the estimation error at a particular pixel can be refined by sharing the information of all of these estimations.

IV. Nonlinear Boosting Framework for Learning-Based Super-Resolution

In this section, the problem of learning from estimation errors for super-resolution is connected with a problem of low-rank and sparse matrix decomposition based on the uncertainty and the sparsity of the super-resolution estimation errors. Then, an efficient nonlinear boosting framework is proposed by using an efficient low-rank and sparse matrix decomposition technique, RPCA [31].
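The two ideal estimation sets in (2) and (3) are easy to synthesize numerically. The following toy NumPy sketch (made-up sizes and error patterns, for illustration only) verifies that the first set varies only in error intensity and the second only in error location:

```python
import numpy as np

rng = np.random.default_rng(1)
I = rng.random((8, 8))                        # stand-in truth image

# First type: one unit-norm error pattern delta0, varying intensities gamma_i.
delta0 = np.zeros((8, 8))
delta0[2, 3] = 1.0
gammas = (0.1, 0.5, 1.0)
set1 = [I + g * delta0 for g in gammas]

# Second type: one intensity gamma, error location shifted along the diagonal.
gamma = 0.5
set2 = []
for r in (1, 3, 5):
    d = np.zeros((8, 8))
    d[r, r] = 1.0                              # unit-norm, location varies
    set2.append(I + gamma * d)

# In the first set the normalized errors coincide; only intensities differ.
for est, g in zip(set1, gammas):
    e = est - I
    assert np.isclose(np.linalg.norm(e), g)
    assert np.allclose(e / np.linalg.norm(e), delta0)

# In the second set every error has the same intensity but a different support.
for est in set2:
    assert np.isclose(np.linalg.norm(est - I), gamma)
```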

TABLE IV Computation Time (s)

Fig. 9. Experimental results of the first test image when the second setting is considered. The images in line 1 are the samples of the raw estimations, the images in line 2 are the samples of the refined estimations, and the images in line 3 are the difference images of both estimations. (a) Alg. #1. (b) Alg. #2. (c) Alg. #3. (d) Alg. #4. (e) Alg. #5. (f) Alg. #6.

For a set of super-resolution estimations $\hat{\mathcal{I}}$, the super-resolution estimations can be decomposed according to (1), that is

$$\hat{I}_i = I + \epsilon_i, \quad i = 1, 2, \cdots, p. \qquad (4)$$

By vectorizing all of the images in (4), there is a matrix-based version of (4)

$$\left(\mathrm{vec}(\hat{I}_i)\right)_{i=1}^{p} = \left(\mathrm{vec}(I)\right)_{i=1}^{p} + \left(\mathrm{vec}(\epsilon_i)\right)_{i=1}^{p} \qquad (5)$$

where $\mathrm{vec}$ is an operator for vectorizing a matrix.

Denote $M = (\mathrm{vec}(\hat{I}_i))_{i=1}^{p}$, $L^* = (\mathrm{vec}(I))_{i=1}^{p}$, and $S^* = (\mathrm{vec}(\epsilon_i))_{i=1}^{p}$. The matrix $L^*$ is clearly low-rank because all of its columns are the same. The matrix $S^*$ is sparse because, as shown in Section III-A, all super-resolution estimation errors are sparse. Therefore, the truth high-resolution image $I$ can be found from a set of super-resolution estimations by ideally decomposing the estimation matrix $M$ into a low-rank matrix $L^*$ and a sparse estimation error matrix $S^*$, that is

$$L^* = M - S^*. \qquad (6)$$

Following the idea of (6), the task of boosting a set of super-resolution estimations can be understood as a problem of low-rank and sparse matrix decomposition

$$M = L + S \qquad (7)$$

where $L$ is a low-rank matrix and $S$ is a sparse matrix. Ideally, the low-rank matrix $L$ corresponds to the matrix $L^*$ and the sparse matrix $S$ to the matrix $S^*$. Therefore, each column of $L$ can be expected to correspond to a refined super-resolution estimation which integrates the useful information of all original super-resolution estimations, while the information distorting the truth is isolated in the sparse matrix $S$. Therefore, the task of boosting all of these estimations is connected with the following optimization problem:

$$\min_{L,S} \ \mathrm{rank}(L) + \mu \|S\|_0, \quad \text{s.t. } M = L + S. \qquad (8)$$

Here, $\mathrm{rank}(L)$ is the rank of the matrix $L$, $\|S\|_0$ is the number of nonzero elements of $S$, and $\mu$ is a regularization parameter. It should be noticed that the optimization problem (8) cannot be solved if the sparse matrix $S$ is also low-rank [31]. Fortunately, the uncertainty of the super-resolution estimation errors ensures that the sparse matrix $S$ is always full rank. Therefore,

Fig. 10. Experimental results of the second test image when the second setting is considered. The images in line 1 are the samples of the raw estimations, the images in line 2 are the samples of the refined estimations, and the images in line 3 are the difference images of both estimations. (a) Alg. #1. (b) Alg. #2. (c) Alg. #3. (d) Alg. #4. (e) Alg. #5. (f) Alg. #6.

the low-rank and sparse matrix decomposition model can be employed for learning from the super-resolution estimation errors. Following the steps of RPCA [31], the nonlinear boosting framework for learning-based super-resolution can be designed based on the convex relaxation of (8)

$$\min_{L,S} \ \|L\|_* + \mu \|S\|_1, \quad \text{s.t. } M = L + S \qquad (9)$$

where $\|\cdot\|_*$ is the trace norm, $\|\cdot\|_1$ is the $\ell_1$-norm, and $\mu = 1/\sqrt{n}$, with $n$ the number of pixels of the high-resolution truth image. The nonlinear boosting framework for learning-based super-resolution is summarized in Algorithm 1. The operator $\mathrm{mat}(\cdot)$ in Algorithm 1 transforms a vector back into a matrix with the same size as the ground-truth image.

V. Experiment

To show the performance of Algorithm 1, 50 test images and 200 training images are used. Samples of the test and training images are shown in Fig. 6. Each of the training and test images is represented by a set of image patches, where 3 × 3 patches are used to represent low-resolution images and 9 × 9 patches are used to represent high-resolution images. For training the learning algorithms, all low-resolution image patches are represented as gradient vectors consisting of horizontal and vertical directional gradients.
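As a concrete illustration of this patch representation (the exact feature layout is our assumption; the paper gives no code), the following sketch extracts all 3 × 3 low-resolution patches and encodes each one by its horizontal and vertical gradients:

```python
import numpy as np

def lr_patch_gradient_features(img, size=3):
    """Encode every size x size low-resolution patch by the concatenation of
    its horizontal and vertical gradients (2 * size**2 features per patch)."""
    gy, gx = np.gradient(img.astype(float))    # vertical, horizontal gradients
    h, w = img.shape
    feats = []
    for r in range(h - size + 1):
        for c in range(w - size + 1):
            fx = gx[r:r + size, c:c + size].ravel()
            fy = gy[r:r + size, c:c + size].ravel()
            feats.append(np.concatenate([fx, fy]))
    return np.asarray(feats)
```

Each 3 × 3 patch thus becomes an 18-dimensional gradient vector, which is the input representation used when training the learning algorithms.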

A. Toy Data

In this section, toy data are used to show the difference between the two basic types of estimation errors within our framework. For the first type of estimation errors, the locations of the errors are common but the intensities differ. For the second type, the locations of the errors differ but the intensities are the same. As an example, a square block error is used to generate the set of estimations with the first type of errors [Fig. 7(a)]. By shifting a square block, a set of square block errors is used to generate the second type of errors [Fig. 7(b)].

As shown in Fig. 7(a), the estimations with the first type of errors (line 1) are decomposed into the refined estimations (line 2) and the sparse errors (line 3). The estimation errors (line 4) are estimated by RPCA as shown in line 5 of Fig. 7(a). Therefore, errors of the first type cannot be removed completely, which is connected with the inherent errors of learning-based super-resolution. The inherent errors correspond to the inherent gap between the information of low- and high-resolution images. Different from the first type of errors, the uncertainty of the estimation errors connected with the second type is helpful for improving the quality of the estimations [Fig. 7(b)]. In an ideal setting where the square block errors are disjoint, the differences between the refined estimations and the truth image are extremely small. This example shows that refined

Fig. 11. Experimental results of the third test image when the second setting is considered. The images in line 1 are the samples of the raw estimations, the images in line 2 are the samples of the refined estimations, and the images in line 3 are the difference images of both estimations. (a) Alg. #1. (b) Alg. #2. (c) Alg. #3. (d) Alg. #4. (e) Alg. #5. (f) Alg. #6.

Fig. 12. Experimental results of the fourth test image when the second setting is considered. The images in line 1 are the samples of the raw estimations, the images in line 2 are the samples of the refined estimations, and the images in line 3 are the difference images of both estimations. (a) Alg. #1. (b) Alg. #2. (c) Alg. #3. (d) Alg. #4. (e) Alg. #5. (f) Alg. #6.

estimations will benefit from the uncertainty of the estimation errors. Therefore, the uncertainty of the estimation errors is helpful in boosting the performance of the estimations.

B. Real Data

To generate raw super-resolution estimations, six learning-based super-resolution algorithms are used. All algorithms are numbered and listed in Table I. Each raw super-resolution estimation is generated by one of the above algorithms and 2000 training examples. Four test images are used to show the effectiveness of Algorithm 1, and all test images are used to show the stability of Algorithm 1.

Two basic settings are considered. One is training different learning-based super-resolution algorithms with a common training set, and the other is training a common learning-based super-resolution algorithm with different training sets. For the first setting, experimental results are shown in Fig. 8 and Table II. In Fig. 8, the raw super-resolution estimations (line 1) are respectively generated by Alg. #1–Alg. #6 from left to right. Comparing these estimations with the refined estimations (line 2, Fig. 8), some artifacts around the canthus and the nose are partly removed. The images in line 3 of Fig. 8 directly show the difference between a raw estimation and a refined estimation. The light part of each of these

Fig. 13. Ratios related with PSNR and SSIM of the first test image when the second setting is considered. The ratios are larger than 1, which shows both image quality indexes are enhanced by Algorithm 1. (a) Alg. #1. (b) Alg. #2. (c) Alg. #3. (d) Alg. #4. (e) Alg. #5. (f) Alg. #6.

Algorithm 1 Nonlinear Boosting Framework for Learning-Based Super-Resolution

Require: Test set T, the ith training set S_i, and the ith learning algorithm A_i, i = 1, 2, ..., p.
Generating Raw Estimations
1: for i = 1, 2, ..., p do
2:   Generate the ith operator P_i : X → Y, P_i = A_i(S_i);
3:   Generate the ith raw estimation Î_i = P_i(T);
4:   Transform the ith raw estimation into a vector Î_i = vec(Î_i);
5: end for
Boosting Raw Estimations
6: Combine all raw estimations M = (Î_i)_{i=1}^{p};
7: Decompose the matrix M into a low-rank matrix L and a sparse matrix S with RPCA: M = L + S;
8: for each column vector L_i of L do
9:   Transform L_i into a refined estimation Ĩ_i = mat(L_i);
10: end for
Ensure:
11: return the set of refined estimations Ĩ_i, i = 1, 2, ..., p.

images are the part refined by the boosting process of Algorithm 1. Statistical results, including the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM), are reported in Table II. These results indicate that the estimations are substantially enhanced by the boosting process, with the exception of those generated by Alg. #6. This shows that information is exchanged among the estimations, and that the exchange benefits most of them.
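The PSNR values in Table II, and the PSNR ratios plotted in Figs. 13–16, can be computed as sketched below. This is an illustrative helper, not code from the paper; `truth`, `raw`, and `refined` stand for hypothetical ground-truth, raw-estimation, and refined-estimation image arrays.

```python
import numpy as np

def psnr(truth, estimate, peak=255.0):
    """Peak signal-to-noise ratio (dB) between a ground-truth image and an estimate."""
    err = truth.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(err ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def improvement_ratio(truth, raw, refined):
    """Ratio plotted in Figs. 13-16: a value larger than 1 means the
    refined estimation has a higher PSNR than the raw estimation."""
    return psnr(truth, refined) / psnr(truth, raw)
```

SSIM ratios would be formed the same way, e.g. with `skimage.metrics.structural_similarity` in place of `psnr`.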

However, the negative information shared by the weaker estimations makes the best estimation weaker. We believe that these negative effects result from weaker uncertainty: a smaller number of estimations and the limited size of the training set weaken the uncertainty of the estimations. In the following experiments on the mixing setting, we will find that all algorithms benefit from each other because of stronger uncertainty.

For the second setting, twenty training sets are used to train Alg. #1–Alg. #6. For each test image, the experimental results, including all estimation images and the corresponding PSNRs and SSIMs, are summarized in two figures. As an example, the experimental results for the test image Girl are shown in Fig. 9. Each subfigure of Fig. 9 corresponds to a particular learning-based super-resolution algorithm. Fig. 9(a) reports the experimental results of Alg. #1; its images consist of the raw estimations (line 1), the refined estimations (line 2), and the difference images between the two (line 3). Fig. 13 shows the changes in the statistical results between the raw and refined super-resolution estimations, expressed as the ratio of the refined-estimation values to the raw-estimation values. The experimental results of Alg. #2–Alg. #6 are shown in Fig. 9(b)–(f) and Fig. 13(b)–(f), respectively. The experimental results of the other test images are shown in Figs. 10–12 and Figs. 14–16.

For all raw estimations, different artifacts are observed because different training sets are used to train a common algorithm. Comparing the raw and the refined estimations shows that most of the artifacts in the raw estimations have been removed, which indicates that different artifacts are adaptively found and removed by Algorithm 1.
This is a result of exchanging useful information among the different raw super-resolution estimations: by consulting the other super-resolution estimations, it is easier to distinguish the artifacts from the true image. Therefore, Algorithm 1 provides an effective boosting method for sharing information among different super-resolution estimations. The difference images show the change from the raw estimations to the refined estimations. The major changes appear almost entirely in the parts with manifest artifacts and in the high-frequency parts of the test images. This verifies that Algorithm 1 focuses on rectifying the sparse errors of the super-resolution estimations. As a byproduct of using Algorithm 1, some grid texture errors can be observed in all

Fig. 14. Ratios of PSNR and SSIM for the second test image when the second setting is considered. The ratios are larger than 1, showing that both image quality indexes are enhanced by Algorithm 1. (a) Alg. #1. (b) Alg. #2. (c) Alg. #3. (d) Alg. #4. (e) Alg. #5. (f) Alg. #6.

Fig. 15. Ratios of PSNR and SSIM for the third test image when the second setting is considered. The ratios are larger than 1, showing that both image quality indexes are enhanced by Algorithm 1. (a) Alg. #1. (b) Alg. #2. (c) Alg. #3. (d) Alg. #4. (e) Alg. #5. (f) Alg. #6.

of the difference images. These imperceptible errors result from reconstructing the super-resolution images with averaging tricks. The averaging tricks make the estimations smoother, but they also lead to a block effect. Interestingly, the block effect can be reduced by Algorithm 1 without any additional image prior or information about the patch size. Algorithm 1 therefore works by revealing the different sparse estimation errors, which means that the sparsity and uncertainty of the estimations are its foundation.

The statistical results for PSNR and SSIM also support the above observations. As shown in Figs. 13–16, the ratios between the PSNRs/SSIMs of the refined and the raw estimations are significantly larger than one, which means that the boosting process of Algorithm 1 improves both image quality indexes. These statistical results thus provide positive evidence for the effectiveness of Algorithm 1.

Beyond the two basic settings, a mixing setting is considered to show the relationship between the uncertainty of the raw estimations and the performance of Algorithm 1. In the mixing setting, the raw estimations generated by different algorithms are pooled together, and different degrees of uncertainty are generated by varying the number of

raw estimations. Based on the experimental results of the second setting, the number of raw estimations is set from 10 to 120 in steps of 10. Table III shows that both image quality indexes, PSNR and SSIM, improve as the number of raw estimations increases. This shows that higher uncertainty helps enhance the performance of Algorithm 1. Combined with the observation about the second type of estimations in Section V-A, we believe that the uncertainty of the raw super-resolution estimations is the information source of Algorithm 1. For the boosting process of Algorithm 1, higher uncertainty means more information for enhancing super-resolution performance; therefore, more raw estimations are welcome if the computational budget permits.

Table IV also reports the computation time of Algorithm 1, which grows with the size and the number of the raw super-resolution estimations. It should be noted that this computation time is well spent, because the boosting process of Algorithm 1 is an indirect way to exploit a huge number of training samples. In fact, most known super-resolution algorithms are very time consuming when a large training set is used; for example, the training time of Algs. #3 and #4 exceeds 24 h when ten thousand training samples are used. Therefore, Algorithm 1 is an efficient way to super-resolve with a large amount of training data.
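The boosting core of Algorithm 1, stacking the vectorized raw estimations as columns of M and splitting M = L + S with RPCA, can be sketched as below. This is a minimal sketch, assuming the standard inexact-ALM solver for RPCA (the paper only specifies RPCA, not a particular solver); the per-iteration SVD of M is also why the runtime in Table IV grows with the size and number of the raw estimations.

```python
import numpy as np

def rpca(M, lam=None, tol=1e-7, max_iter=500):
    """Robust PCA via inexact ALM: decompose M into low-rank L plus sparse S."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    norm_M = np.linalg.norm(M, 'fro')
    two_norm = np.linalg.norm(M, 2)
    Y = M / max(two_norm, np.abs(M).max() / lam)  # dual variable initialization
    mu = 1.25 / two_norm
    mu_bar, rho = mu * 1e7, 1.5
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # L-step: singular value thresholding with threshold 1/mu
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-step: entrywise soft thresholding with threshold lam/mu
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        Z = M - L - S
        Y = Y + mu * Z
        mu = min(mu * rho, mu_bar)
        if np.linalg.norm(Z, 'fro') < tol * norm_M:
            break
    return L, S

def boost(estimations):
    """Boosting step of Algorithm 1: stack vec'd raw estimations as the
    columns of M, split M = L + S, and un-vec the columns of L."""
    shape = estimations[0].shape
    M = np.stack([e.ravel() for e in estimations], axis=1)
    L, _ = rpca(M)
    return [L[:, i].reshape(shape) for i in range(L.shape[1])]
```

With p raw estimations of d pixels each, M is d x p, and each iteration costs one thin SVD of M, so the total cost scales with both d and p, consistent with the timings discussed above.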

Fig. 16. Ratios of PSNR and SSIM for the fourth test image when the second setting is considered. The ratios are larger than 1, showing that both image quality indexes are enhanced by Algorithm 1. (a) Alg. #1. (b) Alg. #2. (c) Alg. #3. (d) Alg. #4. (e) Alg. #5. (f) Alg. #6.

Fig. 17. Statistical results for the mixing setting. Horizontal axis: indexes of the test images. Vertical axis: PSNR or SSIM, respectively. (a) PSNRs for the mixing setting. (b) SSIMs for the mixing setting.

Finally, 50 test images downloaded from the Internet are used to test the stability of Algorithm 1 under the mixing setting. The statistical results for PSNR and SSIM are shown in Fig. 17(a) and (b), respectively. In each figure, an error bar is calculated from 80 estimations generated by different super-resolution algorithms with different training sets. For each pair of error bars in the same column, the left one corresponds to the raw estimations and the right one to the refined estimations. According to Fig. 17, the lower bounds of the statistical indexes for the refined estimations are all higher than the upper bounds for the raw estimations. The variances of the statistical indexes for the refined estimations

are also smaller than those of the raw estimations. Therefore, the results in Fig. 17 clearly show that our framework is helpful in boosting the performance of different super-resolution algorithms.

VI. Conclusion

In this paper, we have shown that super-resolution estimation errors are generally sparse and uncertain. These two priors on the estimation errors make it possible to enhance the image quality of super-resolution estimations. Specifically, we adopt a low-rank and sparse matrix decomposition model to represent the process of boosting the super-resolution estimations. Experimental results show that Algorithm 1 is efficient

and effective in boosting the information of super-resolution estimations.

Beyond its good performance in enhancing super-resolution estimations, Algorithm 1 offers a way to learn from large data. Algorithm 1 follows the idea of boosting algorithms such as AdaBoost [39], [40]: a better estimation can be generated from a set of weaker estimations. Weaker estimations are learned on small training sets, and a better estimation is then generated by the low-rank and sparse decomposition technique. This back-end integration reduces the burden of learning with large data.

Lastly, Algorithm 1 is an adaptive process for generating robust estimations. The low-rank and sparse decomposition technique makes the refined estimations more similar to each other because the different sparse estimation errors are removed; thus, the refined estimations are more robust than the raw estimations. However, there is still room to improve the robust estimation process of Algorithm 1. It should be noted that we only consider the algebraic structure of the raw estimations, because RPCA is used in Algorithm 1; information about the density distribution of the raw estimations is ignored. As a result, Algorithm 1 degrades when fewer estimations and more outliers are involved. Recent work on robust estimation, such as [41] and [42], will be helpful for improving Algorithm 1. In this work, the reliability of the raw estimations is measured using information about a density distribution related to the raw estimations. By incorporating the reliability of the raw estimations, Algorithm 1 will suffer less interference from outliers, which will help it detect the meaningful information of a true image. We believe that these robust estimation methods will greatly improve the performance of Algorithm 1, especially when fewer raw estimations are used.

References

[1] R. Y. Tsai and T. S. Huang, "Multiframe image restoration and registration," in Advances in Computer Vision and Image Processing, vol. 1. Greenwich, CT, USA: JAI Press, 1984, pp. 317–339.
[2] V. Boyarshinov and M. Magdon-Ismail, "Efficient optimal linear boosting of a pair of classifiers," IEEE Trans. Neural Netw., vol. 18, no. 2, pp. 317–328, Mar. 2007.
[3] J. Lu, K. N. Plataniotis, A. N. Venetsanopoulos, and S. Z. Li, "Ensemble-based discriminant learning with boosting for face recognition," IEEE Trans. Neural Netw., vol. 17, no. 1, pp. 166–178, Jan. 2006.
[4] C. Latry and B. Rouge, "Super resolution: Quincunx sampling and fusion processing," in Proc. IEEE Geosci. Remote Sens. Symp., vol. 1, 2003, pp. 315–317.
[5] X. Li, Y. Hu, X. Gao, D. Tao, and B. Ning, "A multi-frame image super-resolution method," Signal Process., vol. 90, no. 2, pp. 405–414, Feb. 2010.
[6] M. Protter, M. Elad, H. Takeda, and P. Milanfar, "Generalizing the non-local-means to super-resolution reconstruction," IEEE Trans. Image Process., vol. 18, no. 1, pp. 36–51, Jan. 2009.
[7] H. Takeda, P. Milanfar, M. Protter, and M. Elad, "Super-resolution without explicit subpixel motion estimation," IEEE Trans. Image Process., vol. 18, no. 9, pp. 1958–1975, Sep. 2009.
[8] X. Gao, Q. Wang, X. Li, D. Tao, and K. Zhang, "Zernike-moment-based image super resolution," IEEE Trans. Image Process., vol. 20, no. 10, pp. 2738–2747, Oct. 2011.
[9] S. Baker and T. Kanade, "Limits on super-resolution and how to break them," in Proc. IEEE CVPR, vol. 2, 2000, pp. 372–379.
[10] W. Freeman, E. Pasztor, and O. Carmichael, "Learning low-level vision," Int. J. Comput. Vision, vol. 40, no. 1, pp. 25–47, 2000.
[11] W. Freeman, T. Jones, and E. Pasztor, "Example-based super-resolution," IEEE Comput. Graph. Applicat., vol. 22, no. 2, pp. 56–65, Mar./Apr. 2002.
[12] K. Ni and T. Nguyen, "Image superresolution using support vector regression," IEEE Trans. Image Process., vol. 16, no. 6, pp. 1596–1610, Jun. 2007.
[13] K. Kim and Y. Kwon, "Single-image super-resolution using sparse regression and natural image prior," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 6, pp. 1127–1133, Jun. 2010.

[14] Y. Tang, P. Yan, Y. Yuan, and X. Li, "Single-image super-resolution via local learning," Int. J. Mach. Learn. Cybern., vol. 2, no. 1, pp. 15–23, 2011.
[15] X. Lu, H. Yuan, Y. Yuan, P. Yan, L. Li, and X. Li, "Local learning-based image super-resolution," in Proc. IEEE MMSP, Oct. 2011, pp. 1–5.
[16] Y. Tang, Y. Yuan, P. Yan, and X. Li, "Greedy regression in sparse coding space for single-image super-resolution," J. Visual Commun. Image Represent., vol. 24, no. 2, pp. 148–159, 2013.
[17] Y. Tang, Y. Yuan, P. Yan, and X. Li, "Single-image super-resolution via sparse coding regression," in Proc. ICIG, 2011, pp. 267–272.
[18] Y. Tang, X. Pan, Y. Yuan, P. Yan, L. Li, and X. Li, "Local semi-supervised regression for single-image super-resolution," in Proc. IEEE MMSP, 2011, pp. 1–5.
[19] Y. Tang, X. Pan, Y. Yuan, P. Yan, L. Li, and X. Li, "Single-image super-resolution based on semi-supervised learning," in Proc. Asian Conf. Pattern Recognit., 2011, pp. 52–56.
[20] H. Chang, D. Yeung, and Y. Xiong, "Super-resolution through neighbor embedding," in Proc. IEEE CVPR, vol. 1, Jun. 2004, pp. 275–282.
[21] K. Zhang, X. Gao, X. Li, and D. Tao, "Partially supervised neighbor embedding for example-based image super-resolution," IEEE J. Sel. Topics Signal Process., vol. 5, no. 2, pp. 230–239, Apr. 2011.
[22] X. Gao, K. Zhang, X. Li, and D. Tao, "Joint learning for single-image super-resolution via a coupled constraint," IEEE Trans. Image Process., vol. 21, no. 2, pp. 469–480, Feb. 2012.
[23] X. Gao, K. Zhang, X. Li, and D. Tao, "Image super-resolution with sparse neighbor embedding," IEEE Trans. Image Process., vol. 21, no. 7, pp. 3194–3205, Jul. 2012.
[24] J. Yang, J. Wright, T. Huang, and Y. Ma, "Image super-resolution as sparse representation of raw image patches," in Proc. IEEE CVPR, 2008, pp. 1–8.
[25] X. Lu, H. Yuan, P. Yan, Y. Yuan, and X. Li, "Geometry constrained sparse coding for single image super-resolution," in Proc. IEEE CVPR, Jul. 2012, pp. 1648–1655.
[26] W. Dong, L. Zhang, G. Shi, and X. Wu, "Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization," IEEE Trans. Image Process., vol. 20, no. 7, pp. 1838–1857, Jul. 2011.
[27] G. Mu, X. Gao, K. Zhang, X. Li, and D. Tao, "Single image super resolution with high resolution dictionary," in Proc. ICIP, 2011, pp. 1141–1144.
[28] X. Lu, H. Yuan, P. Yan, Y. Yuan, and X. Li, "Utilizing homotopy for single image superresolution," in Proc. Asian Conf. Pattern Recognit., 2011, pp. 316–320.
[29] O. Bousquet and A. Elisseeff, "Stability and generalization," J. Mach. Learn. Res., vol. 2, pp. 499–526, Mar. 2002.
[30] S. Agarwal and P. Niyogi, "Generalization bounds for ranking algorithms via algorithmic stability," J. Mach. Learn. Res., vol. 10, pp. 441–474, Feb. 2009.
[31] J. Wright, A. Ganesh, S. Rao, and Y. Ma, "Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization," in Adv. Neural Inf. Process. Syst., 2009, pp. 2080–2088.
[32] M. Zontak and M. Irani, "Internal statistics of a single natural image," in Proc. IEEE CVPR, 2011, pp. 977–984.
[33] D. Glasner, S. Bagon, and M. Irani, "Super-resolution from a single image," in Proc. ICCV, Sep./Oct. 2009, pp. 349–356.
[34] H. He and W. Siu, "Single image super-resolution using Gaussian process regression," in Proc. IEEE CVPR, 2011, pp. 449–456.
[35] K. Zhang, X. Gao, D. Tao, and X. Li, "Multi-scale dictionary for single image super-resolution," in Proc. IEEE CVPR, 2012, pp. 1114–1121.
[36] S. Roweis and L. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[37] H. Lee, A. Battle, and A. Y. Ng, "Efficient sparse coding algorithms," in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 801–808.
[38] X. He and P. Niyogi, "Locality preserving projections," in Advances in Neural Information Processing Systems 16, S. Thrun, L. Saul, and B. Schölkopf, Eds. Cambridge, MA, USA: MIT Press, 2004.
[39] Y. Freund and R. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. Comput. Syst. Sci., vol. 55, no. 1, pp. 119–139, 1997.
[40] J. Friedman, "Greedy function approximation: A gradient boosting machine," Ann. Stat., vol. 29, no. 5, pp. 1189–1232, 2001.
[41] H. Wang, D. Mirota, and G. Hager, "A generalized kernel consensus-based robust estimator," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 1, pp. 178–184, Jan. 2010.
[42] H. Wang, T. Chin, and D. Suter, "Simultaneously fitting and segmenting multiple-structure data with outliers," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 6, pp. 1177–1192, Jun. 2012.

Authors’ photographs and biographies not available at the time of publication.
