Feature Article

Colorization Using the Rotation-Invariant Feature Space

Bin Sheng and Hanqiu Sun ■ Chinese University of Hong Kong
Shunbin Chen, Xuehui Liu, and Enhua Wu ■ Chinese Academy of Sciences

Current colorization based on image segmentation makes it difficult to add or update color reliably and requires considerable user intervention. A new approach gives similar colors to pixels with similar texture features. To do this, it uses rotation-invariant Gabor filter banks and applies optimization in the feature space.

Colorization—adding colors to monochrome images—is often investigated as a subproblem of image segmentation, owing to the intuition that the more accurately you segment images, the better you can colorize them.1,2 Such intuition motivates most people to improve colorization by pursuing a "smarter" segmentation. To humans, an image is a semantically meaningful arrangement of regions and objects. Different user interpretations of an image might lead to different segmentations. Image colorization is highly subjective, depending on not only the source grayscale image but also the users. By assuming that you should assign one color to each segmented region, users might encounter intuitive yet challenging questions such as "How many regions do I need to segment?" or "How many colors are sufficient to colorize the image?" Having users create scribbles to segment complex images into semantically meaningful regions and objects is generally difficult, owing to the color blending that naturally occurs in images. Colorization often requires complex inputs and refining to produce good results.

Most colorization involves a one-pass process, which essentially sets up the color correspondence between "similar" pixels. You can measure the similarity by either pixel intensity or texture patterns. Approaches based on spatial-intensity continuity require much user input for disjoint regions with a similar texture.3 These approaches


are based on image gray values, thereby limiting their use for texture-intensive images. Approaches based on classifying texture patterns try to reduce user interaction. But without user intervention, these approaches will fail for complex images owing to severe feature space overlaps. However, users also have difficulty specifying proper colors near the complex boundaries between fuzzy or subtle structures. Our study of this problem led to insight into the complementary nature of sharp texture contrast and smooth texture transition. The former is characterized by the distinct texture features that can be classified well in the feature space, the latter by the coherent texture change that can be gradually colorized in the feature space. Our goal is to optimize the color distribution between a pixel and its texture-similar neighborhood with an energy minimization in the feature space. Here, we present progressive colorization that uses rotation-invariant Gabor filtering optimization in the feature space. This optimization propagates the scribbled colors to regions with similar texture patterns, even over disjoint regions. Our approach incorporates user scribbles to construct a more discriminative feature space and measure rotation-invariant feature similarity among the pixels. It has two core passes:

■ Users paint the image's main colors with a few broad strokes.
■ They then refine the coloring using additional strokes.



Figure 1. Progressive colorization using optimization based on texture continuity, for an image of the Potala Palace: (a) the initial input scribbles, (b) incremental color detailing, and (c) the final colorization result. This coarse-to-fine approach is intuitive for incrementally detailing the image through simple user scribbles.

Figure 2. Our colorization approach. We first analyze the source image and construct an elaborate feature space, to discriminate between different regions. We then use an energy optimization to colorize the image pixels in the feature space. Iterative color propagation enriches the color details.

This coarse-to-fine approach, unlike one-pass segmentation-based colorization, is intuitive for incrementally detailing the image through simple input scribbles. Figure 1 shows our proposed approach for an image of the Potala Palace.

The Colorization Algorithm

Our approach formulates the set of rotation-invariant Gabor-based features that combines the texture-difference and texture-continuity constraints with the energy optimization in the texture feature space. Figure 2 outlines our approach's main processing steps.

Rotation-Invariant Gabor Filters

We aim to construct a feature space that can measure the texture similarity among the pixels. To generate the feature vectors, we use the responses of a bank of 24 Gabor filters (four scales and six orientations) on the k × k neighborhood around a pixel, where we experimentally set k at 5. These filters express, locally, a texture's scale and orientation. (For the Gabor functions' mathematical properties, see "Texture Features for Browsing and Retrieval of Image Data."4) Given an image I(x, y), its Gabor-filtered images are

J_{m,n}(x, y) = \sum_{x_1} \sum_{y_1} I(x_1, y_1)\, g_{m,n}(x - x_1, y - y_1).

To generate a set of Gabor functions g_{m,n}(x, y), we rotate and scale a 2D Gabor function g(x, y) to form an almost complete and nonorthogonal basis set:

g_{m,n}(x, y) = a^{-2m}\, g(x', y'),   (1)

x' = a^{-m}(x \cos\theta_n + y \sin\theta_n),
y' = a^{-m}(-x \sin\theta_n + y \cos\theta_n),

where g(·) is a normal 2D Gabor filter defined in the "Generating Gabor Filters" sidebar; a > 1; θ_n = nπ/R; m = 0, 1, …, S − 1; and n = 0, 1, …, R − 1. S is the number of scales; R is the number of orientations. To measure the similarity of two textures i and j, we use the 1-norm distance between feature vectors comprising the filtered images' means and standard deviations, in several orientations and scales:4,5

d_{m,n}(i, j) = \frac{|\mu^i_{m,n} - \mu^j_{m,n}|}{\alpha(\mu_{m,n})} + \frac{|\sigma^i_{m,n} - \sigma^j_{m,n}|}{\alpha(\sigma_{m,n})},

where α(μ_{m,n}) and α(σ_{m,n}) are the standard deviations of the respective feature vectors for feature normalization.
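To make this distance concrete, here is a minimal NumPy sketch that sums d_{m,n} over all scales and orientations. The array layout and the helper name gabor_feature_distance are our own illustration, not code from the article.

```python
import numpy as np

def gabor_feature_distance(mu_i, sigma_i, mu_j, sigma_j, mu_norm, sigma_norm):
    """Normalized 1-norm distance between two Gabor feature vectors.

    mu_i, sigma_i       : (S, R) means / standard deviations for texture i
    mu_j, sigma_j       : (S, R) means / standard deviations for texture j
    mu_norm, sigma_norm : (S, R) per-feature standard deviations over the whole
                          image, used for normalization (the alpha terms).
    """
    d = np.abs(mu_i - mu_j) / mu_norm + np.abs(sigma_i - sigma_j) / sigma_norm
    return d.sum()  # sum d_{m,n} over all scales m and orientations n
```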


Generating Gabor Filters

Given a 2D Gabor function g(x, y) and its Fourier transform G(u, v),

g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\!\left( -\frac{1}{2}\left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} \right) + 2\pi jWx \right)   (A)

and

G(u, v) = \exp\!\left( -\frac{1}{2}\left( \frac{(u - W)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2} \right) \right),   (B)

where σ_u = 1/(2πσ_x) and σ_v = 1/(2πσ_y).

In Equations A and B, σ_x and σ_y characterize the Gabor filter's spatial extent and frequency bandwidth, and (W, 0) represents the center frequency of the filter in the frequency-domain rectilinear coordinates (u, v). Let g(x, y) be the mother generating function for the Gabor filter family. We can generate a set of Gabor functions g_{m,n}(x, y) by rotating and scaling g(x, y) to form an almost complete, nonorthogonal basis set; that is,

g_{m,n}(x, y) = a^{-2m}\, g(x', y'),

where x' = a^{-m}(x cos θ_n + y sin θ_n); y' = a^{-m}(−x sin θ_n + y cos θ_n); a > 1; θ_n = nπ/R; m = 0, 1, …, S − 1; and n = 0, 1, …, R − 1. S is the total number of scales; R is the total number of orientations.

Given an image I(x, y), its Gabor wavelet transform is

J_{m,n}(x, y) = \sum_{x_1} \sum_{y_1} I(x_1, y_1)\, g_{m,n}(x - x_1, y - y_1).

The mean and standard deviation of the filtered images' magnitude, which we use to construct the feature vector, are

\mu_{m,n} = \frac{1}{N} \sum_x \sum_y |J_{m,n}(x, y)|

and

\sigma_{m,n} = \sqrt{ \frac{1}{N} \sum_x \sum_y \left( |J_{m,n}(x, y)| - \mu_{m,n} \right)^2 },

respectively. Many texture classification techniques follow B.S. Manjunath and Wei-Ying Ma's approach,1 using four scales (S = 4) and six orientations (R = 6). So, we form the feature vector as

T = [\mu_{0,0}\; \sigma_{0,0}\; \mu_{0,1}\; \sigma_{0,1}\; \ldots\; \mu_{3,5}\; \sigma_{3,5}].

Reference
1. B.S. Manjunath and W.-Y. Ma, "Texture Features for Browsing and Retrieval of Image Data," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 8, 1996, pp. 837–842.
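The sidebar's formulas translate directly into a small feature-extraction sketch. The Python code below is our illustration: it builds the g_{m,n} bank by rotating and scaling a mother Gabor kernel, filters the image, and collects the mean and standard deviation of the response magnitudes over a k × k window. The parameter values sigma_x, sigma_y, W, and the kernel size are assumptions, not values reported in the article.

```python
import numpy as np
from scipy.signal import fftconvolve  # any 2D convolution routine would do

def gabor_kernel(sigma_x, sigma_y, W, theta, scale, size=31):
    """Mother Gabor g(x, y) rotated by theta and scaled by a^-m (Equations A and 1)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # rotate and scale the coordinates: x' = a^-m (x cos t + y sin t), etc.
    xr = scale * (x * np.cos(theta) + y * np.sin(theta))
    yr = scale * (-x * np.sin(theta) + y * np.cos(theta))
    g = (1.0 / (2 * np.pi * sigma_x * sigma_y)) * np.exp(
        -0.5 * (xr**2 / sigma_x**2 + yr**2 / sigma_y**2) + 2j * np.pi * W * xr)
    return scale**2 * g  # the a^{-2m} prefactor

def gabor_feature_maps(image, S=4, R=6, a=2.0, sigma_x=2.0, sigma_y=2.0, W=0.4):
    """Return |J_{m,n}| magnitude maps for S scales and R orientations."""
    maps = []
    for m in range(S):
        for n in range(R):
            k = gabor_kernel(sigma_x, sigma_y, W, n * np.pi / R, a**(-m))
            maps.append(np.abs(fftconvolve(image, k, mode="same")))
    return np.stack(maps)  # shape (S*R, H, W)

def mean_std_features(mag_maps, y, x, k=5):
    """48-element feature vector (mu, sigma per filter) for the k x k window at (y, x).

    Interior pixels assumed; border handling is omitted in this sketch.
    """
    h = k // 2
    win = mag_maps[:, y - h:y + h + 1, x - h:x + h + 1]
    return np.concatenate([win.mean(axis=(1, 2)), win.std(axis=(1, 2))])
```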

Such measurement of feature distance works well for some textures or manga images but performs unacceptably for identifying textures' rotated versions. This is because the representation directly calculates the difference of the filter banks sharing the same orientation. The measurement might regard two texture patches as very different, even if their appearances are almost identical but their orientations differ. Natural images often contain objects of interest in variable orientations. To construct a feature vector that will lead to rotation-invariant behavior, we determine a function A(x, y) denoting the texture direction of every pixel i in position (x, y) as A(x, y) = θ_{tex_i}. The calculation of feature distance is then orientation-matched. We generate the family of rotation-invariant Gabor filters for each pixel i by altering the filter's orientation θ_n according to its texture direction θ_{tex_i}:

θ_n = (nπ + θ_{tex_i}) / R.   (2)

Substituting Equation 2 into Equation 1, we generate the rotation-invariant Gabor filter family for feature vector formation. The key here is how to find the pixels' directionality. Our approach is based on Fourier transformation. (Because we use the Fourier form of Gabor filters to generate feature vectors, obtaining the Fourier modules here doesn't require additional computation.) We know that an oriented feature of slope θ in an image introduces a primary energy distribution along the direction of slope θ ± π/2 in the complex frequency plane. So, we consider an orientation-sensitive Fourier measurement for pixel p to detect the local direction by

t_p(\theta) = \int |F_p(\rho, \theta)|^2 \, d\rho.

F(ρ, θ) is the Fourier transformation F(u, v) in polar coordinates on the k × k sampling window centered on p, so we determine the direction of p as the θ that maximizes t_p(θ). By exploiting pixels' local texture direction, we construct the rotation-invariant feature vector, which lies in a 48-dimensional feature space (see Figure 3).
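A minimal sketch of this direction estimate is given below, assuming a brute-force angular binning of the windowed power spectrum. The window size, the Hanning taper, and the number of angle bins are our choices, not the article's.

```python
import numpy as np

def local_texture_direction(image, y, x, k=15, n_angles=36):
    """Estimate the dominant texture orientation of the k x k window centered at (y, x).

    Accumulates |F(rho, theta)|^2 over radius for each angle bin and returns the
    frequency-domain angle theta that maximizes t_p(theta). Per the text, the
    spatial slope of the texture differs from this angle by +- pi/2.
    Interior pixels assumed; border handling is omitted.
    """
    h = k // 2
    win = image[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    taper = np.hanning(k)[:, None] * np.hanning(k)[None, :]   # reduces spectral leakage
    spec = np.abs(np.fft.fftshift(np.fft.fft2(win * taper)))**2

    vv, uu = np.mgrid[-h:h + 1, -h:h + 1]
    angles = np.mod(np.arctan2(vv, uu), np.pi)   # fold directions to [0, pi)
    radius = np.hypot(uu, vv)

    bins = np.linspace(0, np.pi, n_angles + 1)
    energy = np.zeros(n_angles)
    for i in range(n_angles):
        mask = (angles >= bins[i]) & (angles < bins[i + 1]) & (radius > 0)
        energy[i] = spec[mask].sum()             # t_p(theta) for this angle bin

    best = np.argmax(energy)
    return 0.5 * (bins[best] + bins[best + 1])   # maximizing theta
```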

Figure 3. Comparing our rotation-invariant Gabor filters to traditional 2D Gabor filters. We specified (a) three sample points sharing the identical texture but having different texture orientations. For these samples, we graphed the responses of the (b) rotation-invariant and (c) traditional Gabor filter banks for 48 features. The results show that the rotation-invariant filters provide texture determination that's more accurate than or identical to that provided by traditional filters.

Because natural textures are complex and noisy, we regularize the Gabor filters' outputs. We saturate the too-high values by a nonlinear transformation, and with an averaging operator we smooth the variations. Let O_h(i) be the hth element of pixel i's feature vector, Ω the image's domain, and W_i a k × k window around i. We consider the revised feature element O'_h(i) as

O'_h(i) = \frac{1}{N k^2} \int_{W_i} \tanh\!\left( \alpha \frac{O_h(x)}{\sigma(O_h)} \right) dx,

where α and N are parameters experimentally set to 0.25 and 5, respectively, and σ(O_h) is the standard deviation of the hth component of all the vectors in the feature space.
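In code, this regularization is a per-element tanh saturation followed by a box average. The sketch below follows the equation with the parameter values given in the text (α = 0.25, N = 5, k = 5), but the function name and array layout are our own.

```python
import numpy as np
from scipy.ndimage import uniform_filter  # k x k box averaging over the window W_i

def regularize_features(O, alpha=0.25, N=5, k=5):
    """Saturate and smooth raw Gabor responses (the O'_h(i) step).

    O : (H, W, 48) array of per-pixel feature elements O_h(i).
    Returns tanh-saturated features averaged over a k x k window, scaled by
    1 / (N k^2) as in the text (discrete version of the integral).
    """
    sigma = O.reshape(-1, O.shape[-1]).std(axis=0) + 1e-12   # sigma(O_h) over the image
    saturated = np.tanh(alpha * O / sigma)                    # nonlinear saturation
    smoothed = uniform_filter(saturated, size=(k, k, 1))      # mean over the k x k window W_i
    return smoothed / N                                       # (1 / (N k^2)) * sum over W_i
```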

In many pictures, objects aren't uniform (they comprise many subobjects), and texture patterns usually exist across several scales. These features result in ambiguous user interpretations during interactive colorization. The inherent difficulty in image colorization is that different user scribbles on the image might lead to different segmentation results. We derive a more discriminative feature space, which can adaptively fit the user's interpretation of different texture patterns. We let the user refine the uniform subregions, using scribbles with the same color. Inspired by Revital Irony and her colleagues, we employ intradifferences and interdifferences.1 (For more on previous research in image colorization, see the related sidebar.) In our case, intradifferences are the difference vectors between pixels in the same stroke. They represent the diversity of texture patterns in strokes of the same color (which is ignored in user interpretation). Interdifferences are the difference vectors between pixels in different-colored strokes. They indicate the texture differences among the different-colored strokes (which are recognized in user interpretation). Our texture features should make texture-similarity decisions mainly on the basis of interdifferences, ignoring the intradifferences. We transform (rotate) the space to align the new axis with the intradifference vectors' principal direction. Then, we project the points onto the minor directions, ignoring irrelevant dimensions. Finally, we transform the subspace again to enhance the interdifferences. To do this, we first randomly sample m intradifference vectors, where m is normally 0.05nColor², nColor being the total number of pixels on the strokes in each color. Then, we perform principal component analysis (PCA) and remove the eigenvectors corresponding to the k_intra highest eigenvalues, where k_intra is normally 16. Next, we randomly sample the m′ interdifferences in the resulting subspace and perform PCA, with m′ = 0.05(nScribble² − nColor²), nScribble being the total pixels of all the strokes that the user scribbled. We keep the eigenvectors corresponding to the 16 largest eigenvalues. We've found that, compared with uniform sampling, such random sampling on scribbles is simple to use and has no obvious side effects in the resultant colorization. This process produces


Previous Work in Image Colorization

The core of colorization is to estimate the 3D color from the single dimension of intensity and user input scribbles. Relying on the known grayscale intensity, some colorization approaches1–3 employ techniques that learn the relation between the grayscale image and its colored version from examples.2,4 Revital Irony and her colleagues presented an approach to colorize grayscale images from a segmented example image.5 Instead of relying on a series of independent pixel-level decisions, they developed a supervised-learning strategy that attempts to better classify the feature space and a voting technique to increase the colorized image's spatial consistency. However, this technique requires a similar presegmented example image. Otherwise, segmenting the example image can be nearly as difficult as colorizing the input image. Recently, Xiaopei Liu and his colleagues used multiple reference images to avoid problems from illumination differences.6

Other colorization approaches are user guided.7–9 The user draws color scribbles over an image; the colors diffuse from the scribbles outward across the image. Anat Levin and her colleagues propagated the colors from the scribbles by solving a simple optimization problem, on the premise that neighboring pixels with similar (monochromatic) intensity should have similar colors.7 Liron Yatziv and Guillermo Sapiro used a weighted average of scribble colors. The weights are proportional to the geodesic distance between a pixel and the corresponding scribble.9 These approaches assume that the intensities are locally smooth, which might not hold for textured images.

Recently, researchers have proposed approaches to colorize images for nonphotorealistic rendering. Most of these approaches use a mechanism based on segmentation, similar to typical colorization. Daniel Sýkora and his colleagues introduced a video colorization framework for black-and-white cartoons.10 They combined unsupervised image segmentation, background reconstruction, and structural prediction to reduce manual intervention. Yingge Qu and her colleagues presented a level-set segmentation approach for colorizing manga images.8 They analyzed the feature space and defined a distance between a pixel and a scribble in that space. Their approach can successfully segment nonphotorealistic images into regions of homogeneous textures. In a natural image, the feature space is difficult to clearly classify, owing to the variety of textures. Defining an effective affinity, and hence a good criterion to control the level set, is hardly possible.

Colorization is closely related to texture classification and segmentation. Texture clustering often uses features such as filter banks,11 random fields, and wavelets. Most colorization approaches assume, explicitly or implicitly, that image segments can be well defined as coherent segments in texture space.5,8 Qu and her colleagues' manga colorization technique, for example, uniformly groups pattern regions into a small number of distinctive clusters before colorization.

Our approach (see the main article) leverages the advantages of approaches based on texture clustering and intensity similarity, while largely avoiding their shortcomings. Although it shares several features with both Qing Luan and her colleagues'1 and Qu and her colleagues'8 approaches, our colorization algorithm differs significantly. Unlike Luan and her colleagues' approach, it has no color-labeling or color-mapping step, thus significantly decreasing the user interaction needed to create final colorizations. And, unlike Qu and her colleagues' approach, it has no level-set propagation, which makes it more suitable for natural images.

References
1. Q. Luan et al., "Natural Image Colorization," Proc. 2007 Eurographics Symp. Rendering, Eurographics, 2007, pp. 309–320.
2. Y.-W. Tai, J. Jia, and C.-K. Tang, "Local Color Transfer via Probabilistic Segmentation by Expectation-Maximization," Proc. 2005 Int'l Conf. Computer Vision and Pattern Recognition (CVPR 05), vol. 1, IEEE CS Press, 2005, pp. 747–754.
3. T. Welsh, M. Ashikhmin, and K. Mueller, "Transferring Color to Greyscale Images," Proc. Siggraph, ACM Press, 2002, pp. 277–280.
4. E. Reinhard et al., "Color Transfer between Images," IEEE Computer Graphics and Applications, vol. 21, no. 5, 2001, pp. 34–41.
5. R. Irony, D. Cohen-Or, and D. Lischinski, "Colorization by Example," Proc. 2005 Eurographics Symp. Rendering, Eurographics, 2005, pp. 201–210.
6. X. Liu et al., "Intrinsic Colorization," ACM Trans. Graphics, vol. 27, no. 5, 2008, article 152.
7. A. Levin, D. Lischinski, and Y. Weiss, "Colorization Using Optimization," ACM Trans. Graphics, vol. 23, no. 3, 2004, pp. 689–694.
8. Y. Qu, T.-T. Wong, and P.-A. Heng, "Manga Colorization," ACM Trans. Graphics, vol. 25, no. 3, 2006, pp. 1214–1220.
9. L. Yatziv and G. Sapiro, "Fast Image and Video Colorization Using Chrominance Blending," IEEE Trans. Image Processing, vol. 15, no. 5, 2006, pp. 1120–1129.
10. D. Sýkora, J. Buriánek, and J. Žára, "Unsupervised Colorization of Black-and-White Cartoons," Proc. 3rd Int'l Symp. Nonphotorealistic Animation and Rendering, ACM Press, 2004, pp. 121–127.
11. B.S. Manjunath and W.-Y. Ma, "Texture Features for Browsing and Retrieval of Image Data," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 8, 1996, pp. 837–842.

a transformation T_PCA of the vector of 48-element responses of the rotation-invariant Gabor filter banks to a point in the low-dimensional subspace. The feature distance between pixels i and j is

dist(i, j) = \| T_{PCA}(F_{RIG}(i)) - T_{PCA}(F_{RIG}(j)) \|,

where F_RIG(x) is the feature vector generated by the rotation-invariant Gabor filter banks corresponding to the k × k neighborhood window centered at x. The generated feature space robustly handles the overlapped texture (see Figure 4).
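The following sketch illustrates this two-stage PCA. The per-color sampling scheme is a simplified stand-in for the article's m and m′ counts, and all function names are hypothetical.

```python
import numpy as np

def build_discriminative_subspace(features_by_color, k_intra=16, k_inter=16,
                                  frac=0.05, rng=None):
    """Two-stage PCA on intra- and inter-difference vectors (a sketch of T_PCA).

    features_by_color : list of (n_c, 48) arrays, one per scribble color, holding
                        the rotation-invariant Gabor features of that color's strokes.
    Returns a function mapping a (48,) feature vector into the final subspace.
    """
    rng = rng or np.random.default_rng(0)

    def sample_diffs(A, B, count):
        # random difference vectors between rows of A and rows of B
        i = rng.integers(0, len(A), count)
        j = rng.integers(0, len(B), count)
        return A[i] - B[j]

    def pca_basis(X, keep=None, drop=None):
        # principal directions of X, sorted by decreasing variance
        _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
        return Vt[drop:] if drop is not None else Vt[:keep]

    # 1) remove the directions of largest intra-color variation
    n_color = min(len(f) for f in features_by_color)
    intra = np.vstack([sample_diffs(f, f, int(frac * n_color**2) + 1)
                       for f in features_by_color])
    P1 = pca_basis(intra, drop=k_intra)          # keep only the minor directions

    # 2) enhance the directions separating different colors
    inter = np.vstack([sample_diffs(a, b, int(frac * len(a) * len(b)) + 1)
                       for ai, a in enumerate(features_by_color)
                       for b in features_by_color[ai + 1:]])
    P2 = pca_basis(inter @ P1.T, keep=k_inter)   # PCA in the reduced subspace

    return lambda v: P2 @ (P1 @ v)               # T_PCA(v)
```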

Figure 4. Identifying the texture regions that users have marked. (a) Our feature vector can identify the texture regions. (b) Level-set propagation on the original Gabor filter fails to identify the overlapped pattern.5 (Source: Yingge Qu; used with permission.)

Coloring by Scribbles

Once we obtain the feature vectors, we add color according to the texture similarity, which we measure as dist(i, j) in the feature space. We develop an energy minimization that defines neighborhoods using our feature space, rather than image space neighborhoods, as in Anat Levin and her colleagues' research.3 For each pixel i, we find its textural neighbors N_F(i) that look like it. N_F(i) is essentially the K nearest neighbors of pixel i in the feature space, so there's no explicit connectivity in the texture space. In colorization, determining N_F(i) is crucial. We assume that at least one of each pixel's 8-connected image spatial neighbors N_spatial(i) is similar to it. Let D_tn(i) = min_{s_k ∈ N_spatial(i)} dist(s_k, i). We define the texture neighborhood's distance threshold as D_threshold(i) = t · D_tn(i), where t is a scale parameter experimentally set to 1.5. Then, we define N_F(i) as

N_F(i) = \{ j \mid dist(i, j) < D_{threshold}(i) \}.

Our neighborhood can form several connected subgraphs in the feature space (see Figure 5). This might pose an implicit constraint on user scribbles: the colors specified for each subgraph in the feature space are propagated only to the pixels in the subgraph itself.
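A brute-force sketch of this neighborhood construction follows; in practice a nearest-neighbor index would replace the exhaustive scan.

```python
import numpy as np

def feature_neighbors(i, feats, t=1.5):
    """Texture neighborhood N_F(i) with the adaptive threshold D_threshold(i) = t * D_tn(i).

    feats : (H, W, d) array of per-pixel feature vectors in the PCA subspace.
    i     : (row, col) pixel. Returns a boolean (H, W) mask of i's feature-space neighbors.
    """
    H, W, _ = feats.shape
    r, c = i
    dist_map = np.linalg.norm(feats - feats[r, c], axis=-1)   # dist(i, j) for every pixel j

    # D_tn(i): distance to the closest of i's 8-connected spatial neighbors
    window = dist_map[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2].copy()
    window[min(r, 1), min(c, 1)] = np.inf                     # exclude i itself
    d_threshold = t * window.min()                            # D_threshold(i)

    mask = dist_map < d_threshold
    mask[r, c] = False
    return mask
```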

Figure 5. The connected subgraphs formed in the feature space, for four images appearing in this article: the Potala Palace (green), a peacock (red), vases (magenta), and a flower field in front of a house (blue). The number of connected subgraphs is a monotonically decreasing function of t, the scale parameter that sets the neighborhood threshold in the feature space.



Figure 6. Coloring over disjoint regions: (a) the input strokes, (b) our colorization using rotation-invariant optimization in the feature space, and (c) colorization using optimization in the image spatial space.3 Our approach propagates colors more effectively.

To make the colors freely propagate over the whole image, the feature space should be totally connected for colorization. In most of our tests, the number of subgraphs usually decreases to 1 when t is 1.5, which makes the user input flexible. For the neighborhood constructed in the feature space, we impose the constraint that the pixels in N_F(i) have a color similar to that of pixel i, in proportion to their texture feature similarity. The problem here is how to estimate the U and V component values of the input grayscale image from the known intensity value Y and some color hints given by the user via scribbles or seed pixels. Because we process the U and V values similarly, for simplicity we focus here on only U. We minimize the difference between the color component U(i) at pixel i and the weighted average of the colors at the neighboring pixels in the feature space:

E(U) = \sum_i \left( U(i) - \sum_{k \in N_F(i)} W_{ki}\, U(k) \right)^2,

where W_ki is a weighting function that sums to one. W_ki is large when the feature vector of i is similar to that of k, and small when the feature distance is long. k ∈ N_F(i) denotes that k belongs to i's neighboring pixels in the feature space. We rewrite the weighting functions3 by using the distances between the pixels in the feature space:

W_{ki} \propto e^{-\mathrm{dist}(k, i)^2 / (2\sigma_F(i))}, \quad k \in N_F(i),

where σ_F(i) is the variance of the feature distances between i and its neighbors. Given a set of locations i_user where the colors are user-specified, u(i_user) = u_scribble and v(i_user) = v_scribble, we minimize E(U) and E(V) subject to these constraints. Because the cost functions are quadratic and the constraints are linear, this optimization yields a large, sparse system of linear equations, which we can solve using methods such as GMRES (generalized minimal residual).6
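One way to realize this in code is to assemble the sparse system and hand it to SciPy's GMRES, as sketched below. Note a simplification: the sketch solves (I − W)u = b with the scribbled rows pinned to their values, which drops the residual terms at the scribbled pixels themselves; the data layout and the function name are our own, not the paper's implementation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres

def solve_channel(n_pixels, neighbors, weights, scribbled, scribble_values):
    """Solve min E(U) subject to user scribbles, one chrominance channel at a time.

    neighbors[i]    : indices k in N_F(i) (feature-space neighbors of pixel i)
    weights[i]      : the normalized weights W_ki (sum to 1 over neighbors[i])
    scribbled       : boolean array, True where the user fixed the color
    scribble_values : the fixed U values at scribbled pixels
    """
    rows, cols, vals = [], [], []
    b = np.zeros(n_pixels)
    for i in range(n_pixels):
        rows.append(i); cols.append(i); vals.append(1.0)
        if scribbled[i]:
            b[i] = scribble_values[i]            # hard constraint U(i) = u_scribble
        else:
            for k, w in zip(neighbors[i], weights[i]):
                rows.append(i); cols.append(k); vals.append(-w)
    A = sp.csr_matrix((vals, (rows, cols)), shape=(n_pixels, n_pixels))
    u, info = gmres(A, b, atol=1e-8)             # sparse solve, as the text suggests
    return u
```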


As we mentioned before, the energy optimization propagates the color into similar regions regardless of their spatial connectivity. If we used the pixel neighbors in the image spatial space,3 we couldn't propagate the colors over the disjoint texture regions reliably from the small range of input strokes. Because we use texture similarity in the rotation-invariant feature space instead of intensity continuity in the spatial neighborhood, we can propagate colors more effectively (see Figure 6). Furthermore, we can simply apply our colorization approach to edit the final coloring effects, as we discuss in the next sections.

Color detailing and refining. To add or update color details in colorized images, users scribble strokes indicating which regions or objects need further colorization and what colors to use. This makes colorizing complex images simple. An iterative energy optimization propagates the color U_user of the user-specified pixels to their textural neighbors:

U^0_{detail}(i) = U_{user} if i ∈ stroke_{detail}, and U(i) otherwise;

U^{p+1}_{detail}(i) = \sum_{k \in N_F(i)} W_{ki}\, U^p_{detail}(k),

where U^0_detail(i) is the color of i calculated at the initial detailing stage, stroke_detail is the set of pixels on the user scribbles, and p is the iteration index of the user-controlled color detailing.
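A Jacobi-style sketch of this iterative propagation is shown below. Keeping the scribbled pixels fixed at every sweep is our assumption for stability; the article states only the update rule. The plain Python loop is for illustration; each sweep is independent per pixel, which is what the GPU version described in the Results section exploits.

```python
import numpy as np

def propagate_detail(U, stroke_mask, stroke_color, neighbors, weights, iterations=1500):
    """Iterative color detailing: propagate U_user to textural neighbors.

    U            : flattened (H*W,) chrominance channel from the first coloring pass
    stroke_mask  : True where the new detailing stroke lies
    stroke_color : the stroke's U value
    neighbors[i], weights[i] : feature-space neighborhood N_F(i) and weights W_ki
    """
    u = np.where(stroke_mask, stroke_color, U)              # U^0_detail
    for _ in range(iterations):
        u_next = np.array([np.dot(weights[i], u[neighbors[i]]) for i in range(len(u))])
        u_next[stroke_mask] = stroke_color                   # keep scribbled pixels fixed
        u = u_next                                           # U^{p+1}_detail
    return u
```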


Figure 7. Color detailing. The (a) input strokes and (b) colorization results for patch-based color labeling. 2 To obtain the results, the user draws multiple strokes on similar patterns with different orientations and scales. The (c) input strokes and (d) colorization results for our progressive colorization (two input strokes in the first pass, five strokes in the second pass). Unlike patch-based color labeling, our approach isn’t sensitive to details varying in the scale and rotations. (Source: Qing Luan; used with permission.)

Because our approach constructs the texture neighborhood in the rotation-invariant feature space, the propagation of our color detailing is intuitive, beyond the 8-connected spatial limitation. The detailing doesn't require many strokes for detail regions that are widely scattered throughout the image's spatial space. Unlike patch-based color labeling,2 our detailing isn't sensitive to details varying in scale and rotation (see Figure 7). Although our colorization can faithfully estimate color in most texture regions, it might cause texture misclassification and assign colors incorrectly at texture boundaries because of the rapid change of texture features. To deal with the inevitable mistakes, we've designed an interactive refinement mechanism to recover the spatial continuity in some cases. The refining is similar to our color detailing, except that it defines the pixel neighborhood in the image's spatial space, as in

Levin and her colleagues’ research.3 In our tests, such color refining provided better colorization, easing the balancing of texture continuity with spatial continuity. Color transferring and blending. By carefully selecting the palette of colors, our approach also allows exemplar colorization to achieve a convincing result.1,7 Given a similar reference color image, colorization of a grayscale image is fairly easy. We first apply the Gabor filter to both the target image’s and the reference image’s luminance channel. We measure the feature distance between a pixel in the target image and one in the reference image. That is, for every pixel in the target image, we define its texture-neighboring pixels in the reference image. Once we’ve constructed the interimage neighborhood Nf(i), we use an energy optimization to obtain the color in the target image. Figure 8 shows an example of color transferring using our approach. IEEE Computer Graphics and Applications



Figure 8. Color transferring: (a) the source grayscale image, (b) the reference color image, and (c) the resulting image with transferred colors. Unlike Revital Irony and her colleagues’ approach,1 ours doesn’t require presegmentation for the reference image.

We can similarly apply our approach to color blending. Because our approach propagates color in the feature space, we interactively assign different colors for pixels having similar textures. In such a way, we can obtain soft, natural blending by optimizing the U and V channels. Figure 9 compares our blending result with that of Liron Yatziv and Guillermo Sapiro.8

Results and Discussion

We performed experiments on an Intel Core 2 2.3-GHz PC with an Nvidia GeForce 8800 GPU and 1 Gbyte of RAM. Our sample images ranged from 300 × 250 to 800 × 600 pixels. Color detailing and refining was GPU-accelerated by propagating the color from the neighboring pixels in parallel; the colorization detailing could be achieved within 3,000 iterations. To implement progressive color detailing and refining in real time, we used Nvidia's CUDA (Compute Unified Device Architecture). Table 1 shows the timing results for coloring passes for different images.

Figure 10 compares our results with those of Yingge Qu and her colleagues' method5 and Qing Luan and her colleagues' method.2 Our approach can handle the overlapped


texture patterns without lossy resizing. Qu and her colleagues use essentially texture segmentation with a Gabor filter, which can't handle the overlapped texture and texture transition. Luan and her colleagues' patch-based approach works well on manga images. However, it has difficulty capturing texture features because it uses sum-of-squared-difference distances between patches. Also, the colorization will likely reduce the image size, which limits the colorization's quality. In our approach, owing to the rotation-invariant Gabor filtering, the feature similarity is more reliable.

Figure 11 compares our approach with previous ones for a natural image. Using three strokes (see Figure 11a), our approach produces the colored image in Figure 11b. Alternatively, using 10 strokes (see Figure 11c), Tomihisa Welsh and his colleagues' patch-based approach7 produces the image in Figure 11d, Levin and her colleagues' intensity-based approach3 produces the image in Figure 11e, and Yatziv and Sapiro's intensity-based approach8 produces the image in Figure 11f. The results show that patch-based colorization isn't sufficient to classify the different regions, especially the sky and the grass, while optimization based solely on intensity can't identify the texture boundary.

Figure 12 shows different coloring effects on an image of a flower field in front of a house.2 Our progressive colorization (see Figures 12a and 12b) produces better results than other approaches that use the same number of input strokes (see Figures 12c through 12g). For example, you could also use lazy snapping9 (see Figure 12d) and graph-cut segmentation10 (see Figure 12e) to color the grayscale image by labeling and color filling. However, these methods can't output natural colorization results because the texture regions in natural images usually can't be accurately segmented or labeled. Our approach produces results similar to those of approaches that use the same number of strokes for the initial colorization but require more strokes or manual color tuning to produce the final result


Figure 9. Color blending: (a) the input scribbles, (b) blending using Liron Yatziv and Guillermo Sapiro's approach,8 and (c) our blending, which produces soft, natural scenery.


Table 1. The timing for our color detailing and refining on a GPU.

| Image | Image size (pixels) | Coloring pass | No. of iterations | Time (ms) |
|---|---|---|---|---|
| Potala Palace (Fig. 1) | 600 × 450 | 2nd (detailing) | 2,500 | 31 |
| Manga character (Fig. 10b) | 800 × 590 | 2nd (detailing) | 1,000 | 15 |
| Manga character (Fig. 10c) | 800 × 590 | 3rd (detailing) | 1,000 | 15 |
| House with flowers (Fig. 12b) | 392 × 262 | 2nd (detailing) | 1,500 | 10 |


Figure 10. Colorization of a manga image: our progressive colorization for (a) eight input strokes in the first pass, (b) one stroke in the second pass, and (c) one stroke in the third pass; (d) colorization using Yingge Qu and her colleagues’ approach;5 and (e) colorization using Qing Luan and her colleagues’ approach.2 In our approach, the feature similarity is more reliable. (Source for 10e: Qing Luan; used with permission.)


Figure 11. Colorization of a natural image: With (a) three input strokes, the results for (b) our approach. With (c) 10 input strokes, the results for (d) Tomihisa Welsh and his colleagues' patch-based approach,7 (e) Anat Levin and her colleagues' intensity-based approach,3 and (f) Yatziv and Sapiro's intensity-based approach.8 Our approach produces plausible colorized results owing to the feature space constructed with rotation-invariant Gabor filtering. (Source: Qing Luan; used with permission.)

(see Figures 12h through 12j). Colorizations based on feature space segmentation either can’t handle severe texture misclassification (see Figure 12e) or require additional interactions to recover (see Figure 12j).

Like other texture-based colorization approaches, ours might not properly colorize image regions having similar features but different semantic meanings. This is because the system regards these regions as uniform and therefore blends the



Figure 12. Comparing our progressive colorization to other approaches. Results for our approach after (a) three strokes in the first pass and (b) three more strokes in the second pass. For (c) six input strokes, the results for (d) lazy snapping and (e) graph-cut segmentation on texture features. For Levin and her colleagues' approach,3 the (f) user input and (g) result. For Luan and her colleagues' approach,2 (h) the six initial input strokes, (i) manual color-tuning for specified sample pixels, and (j) the final result. Owing to our scribble-based optimization in the feature space, our approach can produce satisfactory colorization results with fewer user interactions compared with previous approaches. (Source: Qing Luan; used with permission.)

colors together. Because the problem is caused by Gabor filters’ limited discernibility, we’ll design a new texture descriptor by taking morphological features into account to better handle such a problem.

Our approach is effective and simple to apply to a wide range of images. To minimize the interactions needed to produce good results, we plan to explore more sophisticated monochrome texture descriptors, such as steerable pyramids and other wavelet-related transforms. Using descriptors that constitute a better model of the human visual system, we hope to further improve our approach's texture matching.

Acknowledgments
This research was supported by Hong Kong Research Grants Council grants 416007 and 415806, the National Grand Fundamental Research 973 Program (2009CB320802), and a University of Macau research grant. We thank Qing Luan and Yingge Qu for providing the test images and helpful discussions.


References
1. R. Irony, D. Cohen-Or, and D. Lischinski, "Colorization by Example," Proc. 2005 Eurographics Symp. Rendering, Eurographics, 2005, pp. 201–210.
2. Q. Luan et al., "Natural Image Colorization," Proc. 2007 Eurographics Symp. Rendering, Eurographics, 2007, pp. 309–320.
3. A. Levin, D. Lischinski, and Y. Weiss, "Colorization Using Optimization," ACM Trans. Graphics, vol. 23, no. 3, 2004, pp. 689–694.
4. B.S. Manjunath and W.-Y. Ma, "Texture Features for Browsing and Retrieval of Image Data," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 8, 1996, pp. 837–842.
5. Y. Qu, T.-T. Wong, and P.-A. Heng, "Manga Colorization," ACM Trans. Graphics, vol. 25, no. 3, 2006, pp. 1214–1220.
6. Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed., SIAM Press, 2003.
7. T. Welsh, M. Ashikhmin, and K. Mueller, "Transferring Color to Greyscale Images," Proc. Siggraph, ACM Press, 2002, pp. 277–280.
8. L. Yatziv and G. Sapiro, "Fast Image and Video Colorization Using Chrominance Blending," IEEE Trans. Image Processing, vol. 15, no. 5, 2006, pp. 1120–1129.
9. Y. Li et al., "Lazy Snapping," ACM Trans. Graphics, vol. 23, no. 3, 2004, pp. 303–308.
10. V. Kwatra et al., "Graphcut Textures: Image and Video Synthesis Using Graph Cuts," ACM Trans. Graphics, vol. 22, no. 3, 2003, pp. 277–286.

Bin Sheng is a PhD candidate in interactive graphics and VR at the Chinese University of Hong Kong. His research interests include image-based modeling and rendering, mesh processing, sketch-based modeling, hardware-accelerated rendering, and natural-appearance modeling. He has an MSc in graphics and VR from the University of Macau. Contact him at [email protected].

Hanqiu Sun is an associate professor of computer science and engineering at the Chinese University of Hong Kong. Her research interests include virtual and augmented reality, interactive graphics and animation, hypermedia, mobile image and video processing and navigation, and touch-enabled simulations. Sun has a PhD in computer science from the University of Alberta. Contact her at [email protected].

Shunbin Chen is a master’s student in computer graphics at the Institute of Software, Chinese Academy of Sciences. His research interests include image colorization, texture synthesis, and rendering techniques. Chen has a BS from the University of Electronic Science and Technology of China. Contact him at [email protected]. Xuehui Liu is an associate professor of computer science at the Institute of Software, Chinese Academy of Sciences. Her research interests include realistic-image synthesis and physically based modeling and rendering. Liu has a PhD in computer graphics and VR from the Institute of Software, Chinese Academy of Sciences. Contact her at [email protected]. Enhua Wu is a professor of computer science at the Institute of Software, Chinese Academy of Sciences, and the University of Macau. His research interests include virtual reality, realistic-image synthesis, physically based modeling and rendering, and data visualization. Wu has a PhD in computer graphics from the University of Manchester. Contact him at ehwu@ umac.mo or [email protected]. Selected CS articles and columns are also available for free at http://ComputingNow.computer.org.

handles the details so you don’t have to! Professional management and production of your publication Inclusion into the IEEE Xplore and CSDL Digital Libraries Access to CPS Online: Our Online Collaborative Publishing System Choose the product media type that works for your conference: Books, CDs/DVDs, USB Flash Drives, SD Cards, and Web-only delivery!

Contact CPS for a Quote Today! www.computer.org/cps or [email protected]



IEEE Computer Graphics and Applications

35
