
Transparent Composite Model for DCT Coefficients: Design and Analysis

En-hui Yang, Fellow, IEEE, Xiang Yu, Jin Meng, Chang Sun

Abstract—The distributions of DCT coefficients of images are revisited on a per image basis. To better handle the heavy tail phenomenon commonly seen in DCT coefficients, a new model dubbed a transparent composite model (TCM) is proposed and justified in terms of both its modeling accuracy and an additional data reduction capability. Given a sequence of DCT coefficients, a TCM first separates the tail of the sequence from its main body. Then, a uniform distribution is used to model DCT coefficients in the heavy tail, while a different parametric distribution is used to model data in the main body. The separation boundary and the other parameters of the TCM can be estimated via maximum likelihood (ML) estimation. Efficient online algorithms are proposed for parameter estimation, and their convergence is also proved. Experimental results based on the Kullback-Leibler (KL) divergence and the χ2 test show that for real-valued continuous AC coefficients, the TCM based on the truncated Laplacian distribution (LPTCM) offers the best trade-off between modeling accuracy and complexity. For discrete or integer DCT coefficients, the discrete TCM based on truncated geometric distributions (GMTCM) models AC coefficients more accurately than pure Laplacian and GG models in the majority of cases while having simplicity and practicality similar to those of pure Laplacian models. In addition, it is demonstrated that the GMTCM also exhibits a good capability of data reduction/feature extraction: DCT coefficients in the heavy tail identified by the GMTCM are truly outliers, and these outliers form an outlier image revealing some unique global features of the original image. Overall, the modeling accuracy and the data reduction capability of the GMTCM make it a desirable choice for modeling discrete or integer DCT coefficients in real-world image/video applications, as summarized in a few of our further studies on quantization design, entropy coding design, and image understanding and management.

Index Terms—DCT, χ2 test, KL divergence, TCM, geometric distribution.

I. INTRODUCTION

From its early adoption in JPEG to its recent application in HEVC, the newest video coding standard [3], the discrete cosine transform (DCT) has been widely applied in digital signal processing, particularly in lossy image/video coding. It has thus attracted, during the past few decades, a lot of interest in understanding the statistical distribution of DCT coefficients (see, for example, [1], [5], [8], [10], and references therein). A deep and accurate understanding of the distribution

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada under Grant RGPIN203035-11 and Strategic Grant STPGP397345, and by the Canada Research Chairs Program. The material in this paper was presented in part at the IEEE International Conference on Big Data, Silicon Valley, CA, USA, Oct. 2013. En-hui Yang, Jin Meng, and Chang Sun are with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada. Email: [email protected]; [email protected]; [email protected]. Xiang Yu is with Multicom Technologies Inc., Email: [email protected].

of DCT coefficients would be useful to compression [13], rate control [8], and image understanding [1], [14]. In the literature, Laplacian distributions, Cauchy distributions, Gaussian distributions, mixtures thereof, and generalized Gaussian (GG) distributions have all been suggested to model the distribution of DCT coefficients ([2], [5], [10]). Depending on the actual image data sources used and the need to balance modeling accuracy against a model's simplicity and practicality, each of these models may be justified to some degree for some specific application. In general, it is believed that GG distributions, with a shape parameter and a scale parameter, achieve the best modeling accuracy [2], [10]. However, parameter estimation for GG distributions is difficult, and hence the applicability of the GG model, particularly to online applications, may be limited. On the other hand, the Laplacian model has been found to balance complexity and modeling accuracy well; it has been widely adopted in image and video coding [13], although its modeling accuracy is significantly inferior to that of the GG model [2]. Both Laplacian and GG distributions decay exponentially fast. However, in many cases it is observed that DCT coefficients have a relatively heavy tail, which cannot be effectively modeled by an exponentially decaying function (see Section II-C for illustration and detailed discussion). Indeed, improved modeling of the tail portion can lead to better coding performance, as shown in [8] for video coding. However, the Cauchy model does not model the main portion of DCT coefficients effectively. Therefore, in addition to balancing modeling accuracy against simplicity and practicality, a good model of DCT coefficients also needs to balance the main portion and the tail portion of DCT coefficients.

In this paper, we propose a new model dubbed the transparent composite model (TCM), which provides enhanced modeling of the heavy tail and also exhibits a non-linear data reduction capability. Specifically, the tail portion of DCT coefficients is modeled by a uniform distribution, separately from the main portion, which is modeled by a different parametric distribution. This composite model introduces a boundary parameter that controls which component model applies to any given DCT coefficient; it is called transparent because, once the TCM is determined, there is no ambiguity regarding which component (the uniform or the parametric distribution) a given DCT coefficient falls into. The separation boundary and the other parameters of the TCM can be estimated via maximum likelihood (ML) estimation. We further propose efficient online algorithms and prove their global convergence. Experimental results show that for real-valued continuous



AC coefficients, the TCM based on Laplacian distributions matches up to pure GG models in terms of modeling accuracy, but with simplicity and practicality similar to those of pure Laplacian models. On the other hand, for discrete/integer DCT coefficients, which are what one mostly encounters in real-world applications of the DCT, the discrete TCM based on truncated geometric distributions (GMTCM) models AC coefficients more accurately than pure Laplacian and GG models in the majority of cases while having simplicity and practicality similar to those of pure Laplacian models. In addition, it is demonstrated that the GMTCM also exhibits a good capability of feature extraction/data reduction, which is of paramount importance to the operability and scalability of large-scale image/video data processing systems. Common data reduction techniques include transformations such as principal component analysis, as in [18], which are conducted separately from data modeling. The proposed TCM, however, naturally results in data reduction. That is, on one hand, data in the heavy tail identified by the GMTCM are truly outliers, and these outliers form an outlier image revealing some unique global features of the original image; on the other hand, the outlier image only contains a statistically insignificant part (around 1%) of the original data, thus achieving a dramatic data reduction. As a result, the proposed TCM provides a unified solution to both theoretical modeling and data reduction for DCT coefficients. Preliminary results of this study were partially presented in its conference version [19].

The rest of the paper is organized as follows. Section II provides background on various models and on the heavy tail phenomenon in DCT coefficients. The continuous TCM and the discrete TCM are presented in Section III and Section IV, respectively. Section V shows experimental results on the modeling accuracy of TCMs, followed by discussions on applying TCMs in Section VI. Finally, Section VII concludes the paper.

II. DCT MODELS AND THE HEAVY TAIL PHENOMENON

This section first reviews three metrics for testing modeling accuracy, followed by a survey of relevant studies in the literature on modeling DCT coefficients. We then discuss the heavy tail phenomenon in DCT coefficients.

A. Measurement for modeling accuracy

Three methods are commonly used in the literature for testing modeling accuracy: the Kolmogorov-Smirnov (KS) test, the Kullback-Leibler (KL) divergence [12], and the χ2 test [2]. In general, the KS test is more sensitive to the main portion than to the tail part. The χ2 test, on the other hand, shifts its focus to the tail portion more than the KS test does. The KL divergence, as shown by its use of the logarithm, stands between the KS and χ2 tests in terms of balancing the fit of the main portion against the fit of the tail. As in [2], this paper prefers the χ2 test over the KS test for measuring modeling accuracy. Besides the justification provided in [2] for using the χ2 test rather than the KS test, namely that the χ2 test gives more meaningful guidance for source coding, our preference is also rooted in the heavy-tail phenomenon of DCT coefficients. Specifically, the χ2 test better characterizes a

statistically insignificant tail portion in a distribution, while the KS test tends to overlook the tail part. More detailed discussions of the heavy tail phenomenon are presented below. Besides the χ2 test, we also use the KL divergence for comparing modeling accuracy, due to its balance between the emphasis on the main portion by the KS test and the emphasis on the tail by the χ2 test.

Given a sequence of sample probabilities {p_i} and a sequence of model probabilities {q_i}, the KL divergence of the model from the observations is

\mathrm{KL} = \sum_i p_i \ln\frac{p_i}{q_i},   (2.1)

where 0 ln 0 is defined as 0. The χ2 test is defined as

\chi^2 = \sum_i \frac{n\,(p_i - q_i)^2}{q_i},   (2.2)

where n is the total number of samples.
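As a concrete illustration of the two metrics above, the following minimal Python sketch evaluates (2.1) and (2.2) from binned sample and model probabilities; the function names and the array-based interface are illustrative, not part of the paper.

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence (2.1) between sample probabilities p and model probabilities q.
    Terms with p_i = 0 contribute 0 by the convention 0 ln 0 = 0; q_i is assumed
    positive wherever p_i > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def chi_square(p, q, n):
    """Chi-square statistic (2.2); n is the total number of samples."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(n * (p - q) ** 2 / q))
```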

B. Models in the literature for DCT coefficients

1) Gaussian distributions: Gaussian distributions are widely used for modeling DCT coefficients [1], and their justification is rooted in the central limit theorem (CLT) [12]. A comprehensive collection of distributions based on the Gaussian probability density function was studied in [9]. However, it has been observed that DCT coefficients of natural images/video usually possess a tail heavier than that of Gaussian distributions [2]. Consequently, generalized Gaussian distributions have been suggested for modeling DCT coefficients.

2) Generalized Gaussian distributions: The probability density function (pdf) of the generalized Gaussian distribution (GGD) with zero mean for modeling DCT data is

f(y) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\, e^{-(|y|/\alpha)^{\beta}},   (2.3)

where α is a positive scale parameter, β is a positive shape parameter, and Γ(·) denotes the gamma function. It is easy to see that when β = 1, the GGD reduces to a Laplacian distribution, and when β = 2, it becomes a Gaussian distribution with variance α²/2. With the free choice of the scale parameter α and the shape parameter β, the GGD provides an effective way to parameterize a family of symmetric distributions spanning from Gaussian to uniform, and a family of symmetric distributions spanning from Laplacian to Gaussian. As mentioned above, DCT coefficient distributions are observed to possess heavy tails. In this regard, the GGD allows for heavier-than-Gaussian tails with β < 2, heavier-than-Laplacian tails with β < 1, or lighter-than-Gaussian tails with β > 2. As such, the GG model in general outperforms both the Gaussian and Laplacian models in terms of modeling accuracy for DCT coefficients. Nevertheless, the benefit of accurate modeling by the GG model comes with some inevitable drawbacks: the lack of a closed-form cumulative distribution function (cdf) and a high complexity for parameter estimation. As shown in



[2], the maximum likelihood estimate of β is obtained by solving the following equation:

\frac{\psi(1/\beta + 1) + \log\beta}{\beta^{2}} + \frac{1}{\beta^{2}}\log\Big(\frac{1}{n}\sum_{i=1}^{n}|Y_i|^{\beta}\Big) - \frac{\sum_{i=1}^{n}|Y_i|^{\beta}\log|Y_i|}{\beta\sum_{i=1}^{n}|Y_i|^{\beta}} = 0,   (2.4)

where

\psi(\tau) = -\gamma + \int_{0}^{1}(1 - t^{\tau-1})(1-t)^{-1}\,dt

and γ = 0.577··· denotes the Euler constant. Clearly, the terms \sum_{i=1}^{n}|Y_i|^{\beta}\log|Y_i| and \beta\sum_{i=1}^{n}|Y_i|^{\beta} require a significant amount of computation when a numerical iterative solution for β is used.

3) Laplacian distributions: Due to its simplicity and fair modeling performance, the Laplacian model has become the most popular choice in practice [10], [11], with its pdf given by

f(y) = \frac{1}{2\lambda}\, e^{-|y|/\lambda},   (2.5)

where λ denotes a positive scale parameter. Given a sequence of samples Y_i, i = 1, ..., n, the ML estimate of λ can be easily computed as

\lambda = \frac{1}{n}\sum_{i=1}^{n}|Y_i|.   (2.6)

4) Other distributions: Other distributions have been investigated in the literature for modeling DCT coefficients [6], [7], [8], [9], inspired by the heavy-tail behavior of DCT coefficients. An interesting one is the Cauchy distribution [8],

f(y) = \frac{1}{\pi}\,\frac{r}{(y - y_0)^{2} + r^{2}},   (2.7)

where y_0 is a location parameter and r is a scale parameter. Our studies comparing the Cauchy model with the GGD show that the GGD generally provides a better goodness of fit than the Cauchy model (see Section V). In addition, the applicability of the Cauchy distribution is also limited by the fact that it does not have finite moments of any order, causing difficulties for its parameter estimation.

C. Heavy tail observations

Laplacian, Gaussian, and GG distributions all decay exponentially fast. As illustrated in Figure 1, however, DCT coefficients usually possess a much heavier tail. Figure 1 was obtained by applying the floating-point type-II 8 × 8 DCT to the well-known 512 × 512 Lenna image, where the yellow bars show the histogram of the DCT coefficients. It is evident from Figure 1 that the histogram of the DCT coefficients first decays quite rapidly over the main portion of DCT coefficients and then becomes relatively flat over the tail portion. The second panel of Figure 1 zooms in on the tail portion and further compares the histogram of DCT coefficients against the GG and Laplacian models, where the yellow bars again represent the histogram of DCT coefficients, and the red and black curves show the results from the GG and Laplacian models, respectively.

Fig. 1. Histogram and the tail of an AC component in the 8 × 8 DCT block of 'Lenna' (frequency position (1, 0); GGD fit β = 0.47045, Laplacian fit λ = 27.74683; χ2(GGD) = 7.00e+002, χ2(Laplace) = 1.83e+005).
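As a small illustration of the per-image, per-frequency modeling used throughout the paper, the sketch below collects one AC coefficient from every 8 × 8 block of a grayscale image and fits the Laplacian scale by (2.6); it assumes SciPy's dctn for the block DCT, and the function name and interface are illustrative only.

```python
import numpy as np
from scipy.fft import dctn

def ac_coefficients(image, row, col, block=8):
    """Collect the (row, col) AC coefficient from every block x block DCT block
    of a grayscale image; frequency position (1, 0) corresponds to Fig. 1."""
    h, w = image.shape
    coeffs = []
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            d = dctn(image[r:r+block, c:c+block].astype(float), type=2, norm='ortho')
            coeffs.append(d[row, col])
    return np.array(coeffs)

# Laplacian ML estimate (2.6) for one frequency position of an image `img`:
# Y = ac_coefficients(img, 1, 0)
# lam = np.abs(Y).mean()
```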

In Figure 1, the ML estimates of the parameters of the GG model were computed via the Matlab code from [14], while the λ value of the Laplacian model was computed using (2.6). For both models, the χ2 test was performed to evaluate the respective modeling accuracy. According to the χ2 test, the GG model significantly outperforms the Laplacian model. Furthermore, in Figure 1, the obtained shape parameter β is much smaller than 1, meaning that the resulting GG distribution possesses a tail heavier than that of the Laplacian distribution. In comparison with the real data histogram shown in Figure 1, however, the GG model still suffers from an exponentially bounded tail, which is much lighter than that of the DCT coefficients.

III. CONTINUOUS TRANSPARENT COMPOSITE MODEL

To better handle the heavy tail in DCT data, we now separate the tail of DCT coefficients from the main part and use a different model for each of them. Since DCT coefficients in the tail portion are statistically insignificant, each of them often appears only once or a few times. Hence it makes sense to model them separately by a uniform distribution while modeling the main portion by a parametric distribution, yielding a model we call a transparent composite model. In this section, we assume that DCT data are continuous and consider continuous TCMs.

A. Description of general continuous TCMs

Consider a pdf f(y|θ) with parameters θ ∈ Θ, where θ could be a vector and Θ is the parameter space, and denote by F(y|θ) the corresponding cdf. Assume that f(y|θ) is symmetric in y with respect to the origin and that F(y|θ) is



concave as a function of y in the region y ≥ 0. It is easy to verify that Laplacian, Gaussian, GG, and Cauchy distributions all satisfy this assumption. The TCM based on F(y|θ) is defined as

p(y|y_c, b, \theta) \triangleq
\begin{cases}
\dfrac{b}{2F(y_c|\theta)-1}\, f(y|\theta), & |y| < y_c \\
\dfrac{1-b}{2(a-y_c)}, & y_c < |y| \le a \\
\max\Big\{\dfrac{b}{2F(y_c|\theta)-1}\, f(y_c|\theta),\ \dfrac{1-b}{2(a-y_c)}\Big\}, & |y| = y_c \\
0, & \text{otherwise,}
\end{cases}   (3.1)

where a is a known upper bound on the magnitude of the DCT coefficients, d ∈ [0, a) is a prescribed lower bound for the separation boundary y_c, and b ∈ [0, 1] is the probability that a coefficient falls in the main portion; y_c, b, and θ are the model parameters to be estimated.

B. ML estimate of TCM parameters

Let Y_1^n = Y_1, Y_2, ..., Y_n be a sequence of DCT coefficients. Assume that Y_1^n behaves according to the TCM defined in (3.1) with Y_max ≜ max{|Y_i| : 1 ≤ i ≤ n} < a and Y_max ≥ d. (When Y_max < d, there are no outliers, and the ML estimates of y_c and b are equal to d and 1, respectively.) We next show how to compute the ML estimate of y_c, b, and θ. Given Y_1^n with d ≤ Y_max < a, let

N_1(y_c) ≜ {i : |Y_i| < y_c},  N_2(y_c) ≜ {i : y_c < |Y_i|},  N_3(y_c) ≜ {i : |Y_i| = y_c}.

Then the log-likelihood function g(y_c, b, θ|Y_1^n) according to (3.1) is equal to

g(y_c, b, \theta|Y_1^n) \stackrel{1)}{=} |N_2(y_c)| \ln(1-b) + |N_1(y_c)| \ln b + \sum_{i \in N_1(y_c)} \ln f(Y_i|\theta)
 + |N_3(y_c)| \max\Big\{\ln\frac{1-b}{2(a-y_c)},\ \ln\frac{b\, f(y_c|\theta)}{2F(y_c|\theta)-1}\Big\}
 - |N_2(y_c)| \ln 2(a-y_c) - |N_1(y_c)| \ln[2F(y_c|\theta)-1],   (3.2)

where |S| denotes the cardinality of a finite set S, and the equality 1) is due to (3.1) and the fact that ln z is strictly increasing in the region z > 0. Since F(y|θ) is nondecreasing in y, we have g(y_c, b, θ|Y_1^n) ≤ g(Y_max, b, θ|Y_1^n) for any Y_max < y_c < a, which leads to

max{g(y_c, b, θ|Y_1^n) : d ≤ y_c < a, 0 ≤ b ≤ 1, θ} = max{g(y_c, b, θ|Y_1^n) : d ≤ y_c ≤ Y_max, 0 ≤ b ≤ 1, θ}.   (3.3)

To continue, we now sort |Y_1|, |Y_2|, ..., |Y_n| in ascending order into W_1 ≤ W_2 ≤ ... ≤ W_n. Note that W_n = Y_max. Let m be the smallest integer i such that W_i ≥ d. Define I_m = (d, W_m) and, for any m < i ≤ n, I_i = (W_{i-1}, W_i). Then it is easy to see that the interval [d, Y_max] can be decomposed as

[d, Y_max] = \{d, W_m, W_{m+1}, ..., W_n\} \cup \big(\cup_{i=m}^{n} I_i\big),

which, together with (3.3), implies that

max{g(y_c, b, θ|Y_1^n) : d ≤ y_c < a, 0 ≤ b ≤ 1, θ}
 = \max_{0 \le b \le 1,\, \theta}\ \max_{y_c \in [d, Y_{max}]} g(y_c, b, \theta|Y_1^n)
 = \max_{b, \theta}\ \max\Big\{g(d, b, \theta|Y_1^n),\ g(W_i, b, \theta|Y_1^n),\ \sup_{y_c \in I_i}[g(y_c, b, \theta|Y_1^n)] : m \le i \le n\Big\}.   (3.4)

For any i > m, N_1(y_c) and N_2(y_c) remain the same and N_3(y_c) is empty for all y_c ∈ I_i. Since by assumption F(y|θ) as a function of y is concave, it is not hard to verify that, as a function of y_c, (−|N_2(y_c)| ln 2(a − y_c) − |N_1(y_c)| ln[2F(y_c|θ) − 1]) is convex over y_c ∈ I_i, and hence its value over y_c ∈ I_i is upper bounded by the maximum of its values at y_c = W_i and y_c = W_{i−1}, i.e., the endpoints of I_i. Therefore, in view of (3.3), we have

\sup_{y_c \in I_i}[g(y_c, b, \theta|Y_1^n)] \le \max\{g(W_{i-1}, b, \theta|Y_1^n),\ g(W_i, b, \theta|Y_1^n)\}.   (3.5)

When I_m is nonempty, a similar argument leads to

\sup_{y_c \in I_m}[g(y_c, b, \theta|Y_1^n)] \le \max\{g(d, b, \theta|Y_1^n),\ g(W_m, b, \theta|Y_1^n)\}.   (3.6)

Putting (3.4) to (3.6) together yields

max{g(y_c, b, θ|Y_1^n) : d ≤ y_c < a, 0 ≤ b ≤ 1, θ} = \max_{b, \theta}\ \max\{g(d, b, \theta|Y_1^n),\ g(W_i, b, \theta|Y_1^n) : m \le i \le n\}.   (3.7)

Therefore, the ML estimate of y_c is equal to one of d, W_m, W_{m+1}, ..., W_n. We are now led to investigating max_{b,θ} g(y_c, b, θ|Y_1^n) for each y_c ∈ {d, W_m, W_{m+1}, ..., W_n}. Define N_1^+(y_c) ≜ {i : |Y_i| ≤ y_c} and N_2^+(y_c) ≜ {i : y_c ≤ |Y_i|}. Further define

g^{+}(y_c, b, \theta|Y_1^n) \triangleq |N_2(y_c)|\,[\ln(1-b) - \ln 2(a-y_c)] + |N_1^{+}(y_c)| \ln\frac{b}{2F(y_c|\theta)-1} + \sum_{i \in N_1^{+}(y_c)} \ln f(Y_i|\theta),   (3.8)

g^{-}(y_c, b, \theta|Y_1^n) \triangleq |N_2^{+}(y_c)|\,[\ln(1-b) - \ln 2(a-y_c)] + |N_1(y_c)| \ln\frac{b}{2F(y_c|\theta)-1} + \sum_{i \in N_1(y_c)} \ln f(Y_i|\theta).   (3.9)

Note that the difference between g^+(y_c, b, θ|Y_1^n) and g^−(y_c, b, θ|Y_1^n) lies in whether or not we regard y_c itself as an outlier when y_c is equal to some W_i. Comparing (3.3) with (3.8) and (3.9), we have

g(y_c, b, \theta|Y_1^n) = \max\{g^{+}(y_c, b, \theta|Y_1^n),\ g^{-}(y_c, b, \theta|Y_1^n)\},   (3.10)

and hence

\max_{b, \theta} g(y_c, b, \theta|Y_1^n) = \max\Big\{\max_{b, \theta} g^{+}(y_c, b, \theta|Y_1^n),\ \max_{b, \theta} g^{-}(y_c, b, \theta|Y_1^n)\Big\}.   (3.11)



Let

(b(y_c), θ(y_c)) ≜ \arg\max_{b, \theta} g(y_c, b, \theta|Y_1^n),
(b^{+}(y_c), θ^{+}(y_c)) ≜ \arg\max_{b, \theta} g^{+}(y_c, b, \theta|Y_1^n),
(b^{-}(y_c), θ^{-}(y_c)) ≜ \arg\max_{b, \theta} g^{-}(y_c, b, \theta|Y_1^n).

Then from (3.8) and (3.9), it is not hard to see that

b^{+}(y_c) = \frac{|N_1^{+}(y_c)|}{n} \quad\text{and}\quad b^{-}(y_c) = \frac{|N_1(y_c)|}{n},   (3.12)

and θ^+(y_c) and θ^−(y_c) are the ML estimates of θ for the truncated distribution \frac{1}{2F(y_c|\theta)-1} f(y|\theta) over the sample sets {Y_i : i ∈ N_1^+(y_c)} and {Y_i : i ∈ N_1(y_c)}, respectively. In view of (3.11), one can then determine (b(y_c), θ(y_c)) by setting

(b(y_c), θ(y_c)) = \begin{cases} (b^{+}(y_c), θ^{+}(y_c)) & \text{if (c)} \\ (b^{-}(y_c), θ^{-}(y_c)) & \text{otherwise,} \end{cases}   (3.13)

where (c) stands for g^+(y_c, b^+(y_c), θ^+(y_c)|Y_1^n) ≥ g^−(y_c, b^−(y_c), θ^−(y_c)|Y_1^n). Finally, the ML estimate of (y_c, b, θ) can be determined as

y_c^{*} = \arg\max_{y_c \in \{d, W_m, ..., W_n\}} g(y_c, b(y_c), θ(y_c)|Y_1^n), \quad b^{*} = b(y_c^{*}), \quad θ^{*} = θ(y_c^{*}).   (3.14)

Summarizing the above derivations into Algorithm 1 for computing (y_c^*, b^*, θ^*), we have proved the following result.

Theorem 1: The vector (y_c^*, b^*, θ^*) computed by Algorithm 1 is indeed the ML estimate of (y_c, b, θ) in the TCM specified in (3.1).

Algorithm 1 A general algorithm for estimating (y_c, b, θ)
1: Sort {|Y_i|}_{i=1}^{n} in ascending order into W_1 ≤ ... ≤ W_n.
2: Determine m = min{i : W_i ≥ d}.
3: for each y_c ∈ {d, W_m, W_{m+1}, ..., W_n} do
4:   Set N_1^+(y_c) = {i : |Y_i| ≤ y_c} and N_1(y_c) = {i : |Y_i| < y_c}.
5:   Compute b^+(y_c) = |N_1^+(y_c)|/n and b^−(y_c) = |N_1(y_c)|/n.
6:   Determine θ^+(y_c) and θ^−(y_c) to be the ML estimates of θ for the truncated distribution f(y|θ)/(2F(y_c|θ)−1) over {Y_i : i ∈ N_1^+(y_c)} and {Y_i : i ∈ N_1(y_c)}, respectively.
7:   if g^+(y_c, b^+(y_c), θ^+(y_c)|Y_1^n) ≥ g^−(y_c, b^−(y_c), θ^−(y_c)|Y_1^n) then
8:     set (b(y_c), θ(y_c)) = (b^+(y_c), θ^+(y_c))
9:   else
10:    set (b(y_c), θ(y_c)) = (b^−(y_c), θ^−(y_c)).
11:  end if
12: end for
13: Determine y_c^* = arg max_{y_c ∈ {d, W_m, ..., W_n}} g(y_c, b(y_c), θ(y_c)|Y_1^n).
14: Set b^* = b(y_c^*) and θ^* = θ(y_c^*).

Depending on whether or not Step 6 in Algorithm 1 can be implemented efficiently, the computational complexity of Algorithm 1 varies from one parametric family f(y|θ) to another. For some parametric families f(y|θ), such as Laplacian distributions, Step 6 can be easily solved and hence Algorithm 1 can be implemented efficiently. On the other hand, when f(y|θ) is the GG family, Step 6 is quite involved. In the next two subsections, we examine Step 6 in two cases: (1) f(y|θ) is the Laplacian family, and the corresponding TCM is referred to as the LPTCM; and (2) f(y|θ) is the GG family, and the corresponding TCM is referred to as the GGTCM.

C. LPTCM

Plugging the Laplacian density function in (2.5) into (3.1), we get the LPTCM given by

p(y|y_c, b, \lambda) \triangleq
\begin{cases}
\dfrac{b}{1-e^{-y_c/\lambda}}\,\dfrac{1}{2\lambda}\, e^{-|y|/\lambda}, & |y| < y_c \\
\dfrac{1-b}{2(a-y_c)}, & y_c < |y| \le a \\
\max\Big\{\dfrac{b}{1-e^{-y_c/\lambda}}\,\dfrac{1}{2\lambda}\, e^{-y_c/\lambda},\ \dfrac{1-b}{2(a-y_c)}\Big\}, & |y| = y_c \\
0, & \text{otherwise.}
\end{cases}   (3.15)

With reference to Step 6 in Algorithm 1, let S be either N_1^+(y_c) or N_1(y_c). Then Step 6 in Algorithm 1 is equivalent to determining the ML estimate (denoted by λ_{y_c}) of λ in the truncated Laplacian distribution

p(y|\lambda) \triangleq \begin{cases} \dfrac{1}{1-e^{-y_c/\lambda}}\,\dfrac{1}{2\lambda}\, e^{-|y|/\lambda}, & |y| \le y_c \\ 0, & \text{otherwise} \end{cases}   (3.16)

from the sample set {Y_i : i ∈ S}. Since |Y_i| ≤ y_c for any i ∈ S, the log-likelihood function of the sample set {Y_i : i ∈ S} with respect to p(y|λ) is equal to

L(\lambda) \triangleq -|S|\big[\ln 2\lambda + \ln(1 - e^{-y_c/\lambda})\big] - \frac{1}{\lambda}\sum_{i \in S}|Y_i|.

Then we have λ_{y_c} = arg max_{0≤λ≤∞} L(λ). It is not hard to verify that L(1/t), as a function of t > 0, is strictly concave. Computing the derivative of L(λ) with respect to λ and setting it to 0 yields

\lambda - \frac{y_c\, e^{-y_c/\lambda}}{1 - e^{-y_c/\lambda}} - \frac{1}{|S|}\sum_{i \in S}|Y_i| = 0.   (3.17)

It can be shown (see the proof of Theorem 2 below) that s(λ) ≜ λ − \frac{y_c\, e^{-y_c/\lambda}}{1 - e^{-y_c/\lambda}} is a strictly increasing function of λ > 0, with lim_{λ→0^+} s(λ) = 0 and lim_{λ→∞} s(λ) = y_c/2. Let

C = \frac{1}{|S|}\sum_{i \in S}|Y_i|.   (3.18)

Then it follows that: (1) when C = 0, λ_{y_c} = 0, in which case the corresponding truncated Laplacian distribution degenerates to a delta function; (2) when C ≥ y_c/2, λ_{y_c} = ∞, in which case the corresponding truncated Laplacian distribution degenerates to the uniform distribution over [−y_c, y_c]; and (3) when 0 < C < y_c/2, λ_{y_c} is equal to the unique root of (3.17). We are now led to solving (3.17) when 0 < C < y_c/2, for which Algorithm 2 is proposed.

Algorithm 2 Estimating λ for a truncated Laplacian model
1: Compute C = \frac{1}{|S|}\sum_{i \in S}|Y_i|. Set λ_{y_c} = 0 if C = 0, or set λ_{y_c} = ∞ if C ≥ y_c/2.
2: Initialization: set λ_0 = C.
3: For i ≥ 1, compute
     \lambda_i = C + \frac{y_c\, e^{-y_c/\lambda_{i-1}}}{1 - e^{-y_c/\lambda_{i-1}}}.   (3.19)
4: Repeat Step 3 until λ_i − λ_{i−1} < ε, where ε > 0 is a small prescribed threshold.

It can be shown that Algorithm 2 converges exponentially fast, as stated in Theorem 2.

Theorem 2: Assume that 0 < C < y_c/2. Then λ_i computed in Step 3 of Algorithm 2 strictly increases and converges exponentially fast to λ_{y_c} as i → ∞.
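For concreteness, the fixed-point iteration (3.19) of Algorithm 2 can be sketched in Python as follows; this is a minimal sketch, with the function name, tolerance, and iteration cap being illustrative choices rather than part of the paper.

```python
import math

def estimate_lambda_truncated_laplacian(abs_samples, yc, eps=1e-8, max_iter=1000):
    """Algorithm 2 sketch: ML scale of a Laplacian truncated to [-yc, yc].
    abs_samples holds |Y_i| for i in S."""
    C = sum(abs_samples) / len(abs_samples)          # (3.18)
    if C == 0:
        return 0.0                                   # degenerates to a delta function
    if C >= yc / 2:
        return math.inf                              # degenerates to the uniform distribution
    lam = C                                          # Step 2: initialization
    for _ in range(max_iter):
        lam_new = C + yc * math.exp(-yc / lam) / (1.0 - math.exp(-yc / lam))  # (3.19)
        if lam_new - lam < eps:
            return lam_new
        lam = lam_new
    return lam
```

Since the iterates increase monotonically toward λ_{y_c} (Theorem 2), the stopping rule λ_i − λ_{i−1} < ε is safe.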


Proof: Define r(λ) = λ − \frac{y_c\, e^{-y_c/\lambda}}{1 - e^{-y_c/\lambda}} − C. It is not hard to verify that the derivative of r(λ) with respect to λ is

r'(\lambda) = 1 - \frac{e^{-y_c/\lambda}}{[1 - e^{-y_c/\lambda}]^{2}}\,\frac{y_c^{2}}{\lambda^{2}} > 0   (3.20)

for any λ > 0. Thus, r(λ) is strictly increasing over λ > 0. Since λ_0 = C > 0, it follows from (3.19) that λ_1 > λ_0. In general, for any i ≥ 1, we have

\lambda_{i+1} - \lambda_i = \frac{y_c\, e^{-y_c/\lambda_i}}{1 - e^{-y_c/\lambda_i}} - \frac{y_c\, e^{-y_c/\lambda_{i-1}}}{1 - e^{-y_c/\lambda_{i-1}}} = y_c\Big[\frac{1}{e^{y_c/\lambda_i} - 1} - \frac{1}{e^{y_c/\lambda_{i-1}} - 1}\Big],   (3.21)

which implies that λ_{i+1} − λ_i > 0 whenever λ_i − λ_{i−1} > 0. By mathematical induction, it then follows that λ_i strictly increases as i increases. We next show that all λ_i, i ≥ 1, are bounded. Indeed, it follows from (3.19) that

r(\lambda_i) = \lambda_i - \frac{y_c\, e^{-y_c/\lambda_i}}{1 - e^{-y_c/\lambda_i}} - C = \lambda_i - \lambda_{i+1} < 0,

which, together with (3.20) and the fact that r(λ_{y_c}) = 0, implies that λ_i < λ_{y_c}. Therefore λ_i converges as i → ∞. Letting i → ∞ in (3.19) yields

\lim_{i \to \infty} \lambda_i = \lambda_{y_c}.   (3.22)

All that remains is to show that the convergence in (3.22) is exponentially fast. To this end, let δ ≜ \max_{\lambda_0 \le \lambda \le \lambda_{y_c}} \frac{e^{-y_c/\lambda}}{[1 - e^{-y_c/\lambda}]^{2}}\,\frac{y_c^{2}}{\lambda^{2}}. Then it follows from (3.20) that δ < 1. This, together with (3.21), implies that λ_{i+1} − λ_i ≤ δ(λ_i − λ_{i−1}) for any i ≥ 1, and hence λ_i converges to λ_{y_c} exponentially fast. This completes the proof of Theorem 2.

Plugging Algorithm 2 into Step 6 in Algorithm 1, one then gets an efficient algorithm for computing the ML estimate of (y_c, b, λ) in the LPTCM. To illustrate the effectiveness of the LPTCM, the resulting algorithm was applied to the same DCT coefficients shown in Figure 1. Figure 2 shows the resulting LPTCM against the histogram of DCT coefficients. From Figure 2, it is clear that the LPTCM fits the histogram of DCT coefficients quite well and greatly improves upon the Laplacian model. In comparison with the Laplacian model, it fits both the main and tail portions better. In terms of χ2 values, it matches up to the GG model. More detailed comparisons will be presented in Section V.

Fig. 2. Illustration of the overall curves and tails of the LPTCM and GGTCM for an AC component in the 8 × 8 DCT block of 'Lenna' (GGTCM fit β = 0.47041, LPTCM fit λ = 17.49230; χ2(GGTCM) = 6.74e+002, χ2(LPTCM) = 1.33e+003, against χ2(GGD) = 7.00e+002 and χ2(Laplace) = 1.83e+005).

D. GGTCM

Plugging the GG density function in (2.3) into (3.1), we get the GGTCM given by

p(y|y_c, b, \alpha, \beta) \triangleq
\begin{cases}
\dfrac{b\beta}{2\alpha\,\gamma(1/\beta, (y_c/\alpha)^{\beta})}\, e^{-(|y|/\alpha)^{\beta}}, & |y| < y_c \\
\dfrac{1-b}{2(a-y_c)}, & y_c < |y| \le a \\
\max\Big\{\dfrac{b\beta}{2\alpha\,\gamma(1/\beta, (y_c/\alpha)^{\beta})}\, e^{-(|y|/\alpha)^{\beta}},\ \dfrac{1-b}{2(a-y_c)}\Big\}, & |y| = y_c \\
0, & \text{otherwise,}
\end{cases}   (3.23)

where γ(s, x) is the lower incomplete gamma function, γ(s, x) ≜ \int_0^x t^{s-1} e^{-t}\,dt. With reference to Algorithm 1, in this case Step 6 is equivalent to determining the ML estimate (denoted by (α_{y_c}, β_{y_c})) of (α, β) in the truncated GG distribution

p(y|\alpha, \beta) \triangleq \begin{cases} \dfrac{\beta}{2\alpha\,\gamma(1/\beta, (y_c/\alpha)^{\beta})}\, e^{-(|y|/\alpha)^{\beta}}, & |y| \le y_c \\ 0, & \text{otherwise} \end{cases}   (3.24)

from the sample set {Y_i : i ∈ S}. Since |Y_i| ≤ y_c for any i ∈ S, the log-likelihood function of the sample set {Y_i : i ∈ S} with respect to p(y|α, β) is equal to

L(\alpha, \beta) \triangleq |S|\Big[\ln\beta - \ln 2\alpha - \ln\gamma\big(\tfrac{1}{\beta}, (\tfrac{y_c}{\alpha})^{\beta}\big)\Big] - \sum_{i \in S}\Big(\frac{|Y_i|}{\alpha}\Big)^{\beta}.

Therefore (α_{y_c}, β_{y_c}) = arg max_{α, β} L(α, β). Computing the partial derivatives of L(α, β) with respect to α and β and setting them to zero yields

\begin{cases}
\dfrac{1}{|S|}\displaystyle\sum_{i \in S}\Big(\frac{|Y_i|}{y_c}\Big)^{\beta} = \dfrac{1}{\beta t} - \dfrac{t^{1/\beta-1}e^{-t}}{\gamma(1/\beta, t)}, \\[2mm]
\dfrac{t\beta^{2}}{|S|}\displaystyle\sum_{i \in S}\Big(\frac{|Y_i|}{y_c}\Big)^{\beta}\ln\frac{|Y_i|}{y_c} = \beta + \dfrac{\int_0^{t} y^{1/\beta-1}e^{-y}\ln y\,dy}{\gamma(1/\beta, t)} - \ln t,
\end{cases}   (3.25)



where t = (y_c/α)^β. Unlike the case of the LPTCM, however, solving (3.25) does not seem to be easy. In particular, at this point we do not know whether (3.25) admits a unique solution, nor is there an algorithm with guaranteed global convergence for computing such a solution even if it is unique. As such, Step 6 in Algorithm 1 in the case of the GGTCM is much more complicated than in the case of the LPTCM. A suboptimal alternative is to derive approximate solutions to (3.25); one approach is to solve the two equations in (3.25) iteratively. Together with this suboptimal solution to (3.25), Algorithm 1 was applied to the same DCT coefficients shown in Figure 1. Figure 2 shows the resulting GGTCM against the histogram of DCT coefficients. We note that the resulting GGTCM improves on the GG model only marginally, which may be due to the suboptimal solution to (3.25).

IV. DISCRETE TRANSPARENT COMPOSITE MODEL

In practice (particularly in lossy image and video coding), the DCT is often designed and implemented as a mapping from an integer-valued space (e.g., 8-bit pixels) to another integer-valued space, giving rise to integer DCT coefficients. In addition, since most images and video are stored in a compressed format such as JPEG, H.264, etc., for applications based on compressed images and video, DCT coefficients are available only in their quantized values. Therefore, it is desirable to establish a good model for discrete (integer or quantized) DCT coefficients as well. This section proposes a discrete TCM. The particular discrete parametric distribution we consider is a truncated geometric distribution, and the resulting discrete TCM is referred to as the GMTCM. To provide a uniform treatment of both integer and quantized DCT coefficients, we introduce a quantization step size q. Then both integer and quantized DCT coefficients can be regarded as integers multiplied by a properly chosen step size.

A. GMTCM

Uniform quantization with a dead zone is widely used in image and video coding (see, for example, H.264 and HEVC). Mathematically, a uniform quantizer with dead zone and step size q is given by

Q(X) = q \times \mathrm{sign}(X) \times \mathrm{round}\Big(\frac{\big||X| - (\Delta - q/2)\big|}{q}\Big),

where q/2 ≤ Δ < q. Its input-output relationship is shown in Figure 3. Assume that the input X is distributed according to the Laplacian distribution in (2.5). Then the quantized index \mathrm{sign}(X) \times \mathrm{round}\big(\frac{||X| - (\Delta - q/2)|}{q}\big) is distributed as follows:

p_0 = 1 - e^{-\Delta/\lambda},
p_i = \frac{1}{2}\, e^{-\Delta/\lambda}\big[1 - e^{-q/\lambda}\big]\, e^{-\frac{q}{\lambda}(|i|-1)}, \quad i = \pm 1, \pm 2, \cdots   (4.1)

With the help of q, discrete (integer or quantized) DCT coefficients then take values of integers multiplied by q. (Hereafter, these integers will be referred to as DCT indices.) Note that p_i in (4.1) is essentially a geometric distribution.

Fig. 3. Uniform quantization with dead zone (decision thresholds at ±Δ, ±(Δ+q), ±(Δ+2q), ...; reconstruction levels at 0, ±q, ±2q, ±3q, ...).
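A minimal Python sketch of the dead-zone quantizer and the induced index distribution (4.1) is given below; the function names are illustrative, and rounding ties are resolved upward for simplicity.

```python
import math

def dead_zone_quantize(x, q, delta):
    """Dead-zone uniform quantizer index: sign(x) * round(||x| - (delta - q/2)| / q),
    assuming q/2 <= delta < q as in the text."""
    sign = 1 if x > 0 else (-1 if x < 0 else 0)
    return sign * int(abs(abs(x) - (delta - q / 2)) / q + 0.5)

def index_pmf(i, q, delta, lam):
    """Probability (4.1) of quantized index i when the input is Laplacian with scale lam."""
    if i == 0:
        return 1.0 - math.exp(-delta / lam)
    return 0.5 * math.exp(-delta / lam) * (1.0 - math.exp(-q / lam)) \
           * math.exp(-(q / lam) * (abs(i) - 1))
```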

Using a geometric distribution to model the main portion of discrete DCT coefficients, we then get the GMTCM given by

\begin{cases}
p_0 = b\,p, \\
p_i = \dfrac{b(1-p)}{2}\,\dfrac{1 - e^{-q/\lambda}}{1 - e^{-qK/\lambda}}\, e^{-\frac{q}{\lambda}(|i|-1)}, & 1 \le |i| \le K, \\
p_i = \dfrac{1-b}{2(a-K)}, & K < |i| \le a,
\end{cases}   (4.2)

where 0 ≤ p ≤ 1 is the probability of the zero coefficient, 0 ≤ b ≤ 1, 1 ≤ K ≤ a, and a is the largest index in a given sequence of DCT indices. Here a is assumed known, and b, p, λ, and K are model parameters.

B. ML Estimate of GMTCM parameters

1) Algorithms: Let u^n = u_1, u_2, ..., u_n be a sequence of DCT indices. Assume that u^n behaves according to the GMTCM defined by (4.2) with u_max ≜ max{|u_i| : 1 ≤ i ≤ n} ≤ a. We now investigate how to compute the ML estimate (b^*, p^*, λ^*, K^*) of (b, p, λ, K) from u^n. Let N_0 = {j : u_j = 0}, N_1(K) = {j : 0 < |u_j| ≤ K}, and N_2(K) = {j : |u_j| > K}. The log-likelihood function of u^n according to (4.2) is equal to

G(K, \lambda, b, p) \triangleq |N_2(K)| \ln(1-b) + (|N_0| + |N_1(K)|) \ln b + |N_0| \ln p + |N_1(K)| \ln(1-p) - |N_2(K)| \ln 2(a-K) + |N_1(K)| \ln\frac{1 - e^{-q/\lambda}}{2\big(1 - e^{-qK/\lambda}\big)} - \frac{q}{\lambda}\sum_{j \in N_1(K)}(|u_j| - 1).   (4.3)

Then we have

(b^{*}, p^{*}, \lambda^{*}, K^{*}) = \arg\max_{b, p, \lambda, K} G(K, \lambda, b, p).   (4.4)

For any K, let L(K, \lambda) \triangleq |N_1(K)| \ln\frac{1 - e^{-q/\lambda}}{2(1 - e^{-qK/\lambda})} - \frac{q}{\lambda}\sum_{j \in N_1(K)}(|u_j| - 1) and (b(K), p(K), λ_K) ≜ arg max_{b, p, λ} G(K, λ, b, p). In view of (4.3), one can verify that b(K) = \frac{|N_0| + |N_1(K)|}{n} and p(K) = \frac{|N_0|}{|N_0| + |N_1(K)|}, and, whenever K > 1,

\lambda_K = \arg\max_{0 \le \lambda \le \infty} L(K, \lambda).   (4.5)

When K = 1, G(K, λ, b, p) does not depend on λ, and hence λ_1 can be selected arbitrarily. We are now led to determining λ_K for each 1 < K ≤ a. At this point, we invoke the following lemma, which is proved in Appendix A.

Lemma 1: Let

g(t) \triangleq \frac{e^{-t}}{1 - e^{-t}} - \frac{K e^{-Kt}}{1 - e^{-Kt}}.

Then for any 1 < K ≤ a, L(K, q/t) as a function of t > 0 is strictly concave; and for any K > 1, g(t) is strictly decreasing over t ∈ (0, ∞), with lim_{t→0^+} g(t) = \frac{K-1}{2} and lim_{t→∞} g(t) = 0.



Computing the derivative of L(K, λ) with respect to λ and setting it to 0 yields

\frac{e^{-q/\lambda}}{1 - e^{-q/\lambda}} - K\,\frac{e^{-Kq/\lambda}}{1 - e^{-Kq/\lambda}} - C = 0,   (4.6)

where C = \frac{1}{|N_1(K)|}\sum_{j \in N_1(K)}(|u_j| - 1). In view of Lemma 1, it then follows that (1) when C = 0, λ_K = 0; (2) when C ≥ \frac{K-1}{2}, λ_K = ∞; and (3) when 0 < C < \frac{K-1}{2}, λ_K is the unique solution to (4.6). In Case (3), the iterative procedure described in Algorithm 3 can be used to find the unique root of (4.6).

Algorithm 3 Estimating λ_K for a truncated geometric model
1: Select λ_1 > 0 arbitrarily if K = 1.
2: Compute C = \frac{1}{|N_1(K)|}\sum_{j \in N_1(K)}(|u_j| - 1).
3: Set λ_K = 0 if C = 0, or set λ_K = ∞ if C ≥ \frac{K-1}{2}.
4: Otherwise, set C_0 = C and λ^{(0)} = q / \ln\frac{1 + C_0}{C_0}.
5: For i ≥ 1, compute
     C_i = C + \frac{K}{e^{Kq/\lambda^{(i-1)}} - 1}, \qquad \lambda^{(i)} = \frac{q}{\ln\frac{1 + C_i}{C_i}}.   (4.7)
6: Repeat Step 5 until λ^{(i)} − λ^{(i−1)} < ε, where ε > 0 is a small prescribed threshold.
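A minimal Python sketch of the fixed-point iteration (4.7) in Algorithm 3 follows; the function name, overflow guard, tolerance, and iteration cap are illustrative choices, not part of the paper.

```python
import math

def estimate_lambda_truncated_geometric(abs_indices, K, q, eps=1e-8, max_iter=1000):
    """Algorithm 3 sketch: lambda_K for the truncated geometric part of the GMTCM.
    abs_indices holds |u_j| for j in N1(K), i.e. 0 < |u_j| <= K."""
    C = sum(u - 1 for u in abs_indices) / len(abs_indices)
    if C == 0:
        return 0.0
    if C >= (K - 1) / 2:
        return math.inf
    Ci = C
    lam = q / math.log((1 + Ci) / Ci)                    # Step 4: initialization
    for _ in range(max_iter):
        x = K * q / lam
        Ci = C + (K / math.expm1(x) if x < 700 else 0.0)  # (4.7), guarded against overflow
        lam_new = q / math.log((1 + Ci) / Ci)
        if lam_new - lam < eps:
            return lam_new
        lam = lam_new
    return lam
```

In a full GMTCM fit, this inner estimate would be evaluated for each candidate K, and the K maximizing (4.3) retained, as in (4.8).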

With λ_K computed by Algorithm 3, the optimal K^* is obtained by solving

K^{*} = \arg\max_{1 \le K \le a} G(K, b(K), p(K), \lambda_K).   (4.8)

Accordingly, we have b^* = b(K^*), p^* = p(K^*), and λ^* = λ_{K^*}.

2) Convergence and Complexity Analysis: In parallel with Algorithm 2, Algorithm 3 also converges exponentially, as summarized in Theorem 3 and proved in Appendix B.

Theorem 3: Assume that 0 < C < (K − 1)/2. Then λ^{(i)} computed in Step 5 of Algorithm 3 strictly increases and converges exponentially fast to λ_K as i → ∞.

The complexity of computing the ML estimate of the GMTCM parameters comes from two parts. The first part is to evaluate the cost (4.3) over a set of K. The second part is to compute λ_K for every K using Algorithm 3. Note that C in Algorithm 3 can easily be pre-computed for the values of K of interest. Thus, the main complexity of Algorithm 3 is to evaluate the two simple equations in (4.7) a small number of times, in light of the exponential convergence, which is generally negligible. Essentially, the major complexity of the parameter estimation by Algorithm 3 is to collect the data histogram {h_j, j = 1, ..., a} once. Compared with the complexity of GG parameter estimation in [2], shown in (2.4), where the data samples and the parameters to be estimated are closely tied together, as in the terms \sum_{i=1}^{n}|Y_i|^{\beta}\log|Y_i| and \beta\sum_{i=1}^{n}|Y_i|^{\beta}, the complexity of parameter estimation in the case of the GMTCM is significantly lower.

V. EXPERIMENTAL RESULTS ON TESTS OF MODELING ACCURACY

This section presents experimental results obtained from applying TCMs to both continuous and discrete DCT coef-

ficients and compares them with those from the Laplacian and GG models. In general, the Laplacian model is very simple and easy to apply, yet has inferior modeling accuracy; the GGD is quite complicated, but provides superior modeling accuracy.

A. Test conditions and test materials

Two criteria are applied in this paper to test modeling accuracy: the χ2 test, as defined in (2.2), and the KL divergence (2.1). When a comparison is conducted, a factor w_d is calculated as the percentage of DCT frequencies, among all tested AC positions, that are in favor of one model over another in terms of having a smaller KL divergence from the data distribution. Another factor w_{χ2} is defined in a similar way, except that the comparison is carried out based on the χ2 test results for the individual frequencies.

Three sets of test images are deliberately selected to cover a variety of image content. The first set includes nine 512 × 512 images used in JPEG standardization, containing faces, animals, buildings, landscapes, etc., referred to as 'bird', 'boat', 'fish', 'couple/Cp', 'hill', 'lenna', 'baboon/Bb', 'mountain/Mt', and 'pepper/Pp', respectively. The second set has five high-definition (1920 × 1080) frames taken from the first frame of each class-B sequence used in the HEVC standardization tests [3], named 'BQTerrace', 'BasketballDrive', 'Cactus', 'Kimono', and 'Parkview', and referred to as 'B1', 'B2', 'B3', 'B4', and 'B5', respectively, hereafter. The third set is taken from the first frame of four class-F sequences used for the HEVC screen content tests, named 'SlideEditing', 'SlideShow', 'ChinaSpeed', and 'BasketballText', and referred to as 'SE', 'SS', 'CS', and 'BbT', respectively, hereafter.

Tests for continuous DCT coefficients were conducted by computing the 8 × 8 DCT using floating-point operations. In our tests for discrete DCT coefficients, a raw image was first compressed using a Matlab JPEG codec with various quality factors (QF) ranging over 100, 90, 80, and 70; the resulting quantized DCT coefficients and the corresponding quantization step sizes were then read from the obtained JPEG files. Tests were carried out for six different models: the Cauchy model, the Laplacian model, the GG model, the GGTCM, the LPTCM, and the GMTCM. The GGTCM was applied only to continuous DCT coefficients. On the other hand, the GMTCM is applicable only to discrete coefficients. The Laplacian and GG models were applied to both continuous and discrete DCT coefficients.

B. Overall comparisons for each image

Table I shows comparisons between the Cauchy model and the GGD based on both the KL divergence and the χ2 test. When the KL divergence is used, only 5% on average of all 63 AC frequencies are in favor of the Cauchy model. Although the χ2 results show some merit of the Cauchy model for fitting the tail part, the inferior KL results show that the Cauchy model does not model the main portion well. To some extent, the discrepancy between the KL results and the χ2 results indicates that the Cauchy model retains a flat tail at the cost of losing accuracy for the main portion. Furthermore, it is observed that the Cauchy model in general offers a modeling accuracy comparable to the GGD when the data fits a GGD



model with a shape parameter β within a range of [0.45, 0.55]. Nevertheless, for the nine images in test set 1, β varies over a range of [0.3, 1.3]. (Note that the Laplacian is a special case of the GGD with β = 1.)

TABLE I: Comparing the Cauchy model with the GGD (continuous DCT).

           bird  boat  fish   Cp  hill  Lenna   Bb   Mt   Pp
wd (%)        0     0     5    0     0     16    0    0   21
wχ2 (%)      78    57     6   63    32     67   14   29   83

TABLE II: Comparing the LPTCM with the GGD (continuous DCT).

           bird  boat  fish   Cp  hill  Lenna   Bb   Mt   Pp
wd (%)       90    25    21   25    37     41   22   38   54
wχ2 (%)      95    49     8   52    51     57   40   79   81

In the continuous case, the GGTCM outperforms the GG model, the LPTCM outperforms the Laplacian model, and the GG model outperforms the Laplacian model in general, as one would expect. An interesting comparison in this case is between the GG model and the LPTCM. Table II shows the percentage w_{χ2} of frequencies among the 63 AC positions that are in favor of the LPTCM over the GG model for each of the nine images in Set 1 in terms of the χ2 metric. For example, for the image 'bird', in terms of the χ2 metric, the LPTCM is better than the GG model for 60 out of 63 frequencies; for the image 'lenna', the LPTCM is better than the GG model for 36 out of 63 frequencies. Overall, it is fair to state that the LPTCM and the GG model behave similarly in terms of modeling accuracy, and yet the LPTCM has much lower computational complexity than the GG model.

In the discrete case, comparisons were conducted among the GMTCM, the GG model, and the Laplacian model in terms of both the KL divergence and the χ2 value. As expected, the GMTCM is always better than the Laplacian model according to both the KL divergence and the χ2 value, and hence the corresponding results are not included here. For the comparison between the GMTCM and the GG model, results are shown in Tables III, IV, V, and VI for quantized DCT coefficients from JPEG-coded images with various QFs. In Table III, all 63 AC positions were tested; in Tables IV, V, and VI, all AC positions with 6 or more different non-zero AC coefficient magnitudes were tested. These tables show that when all quantization step sizes are 1, corresponding to QF = 100, the comparison between the GMTCM and the GG model is similar to that between the LPTCM and the GG model, i.e., their performances are close to each other. However, as the quantization step sizes increase, the GMTCM starts to outperform the GG model significantly, as shown in Tables IV, V, and VI, for all tested images.

TABLE III: Overall comparisons between the GMTCM and GG model for all images coded using JPEG with QF = 100.

          bird boat fish  Cp hill Lenna  Bb  Mt  Pp  SE  SS  CS BbT  B1  B2  B3  B4  B5
wd (%)      95   38  100  44   59    60  49  48  71  95  78   8  48  52  83  62  40  60
wχ2 (%)     98   57  100  59   67    67  52  83  84  83  89   8  62  52  89  71  65  65

TABLE IV: Overall comparisons between the GMTCM and GG model for all images coded using JPEG with QF = 90.

          bird boat fish  Cp hill Lenna  Bb  Mt  Pp  SE  SS  CS BbT  B1  B2  B3  B4  B5
wd (%)      89   73   97  69   82    75  79  83  85 100  92  79  86  78  93  76  85  75
wχ2 (%)     95   73   98  67   80    71  79  84  87  98  90  78  79  73  91  73  89  75

TABLE V: Overall comparisons between the GMTCM and GG model for all images coded using JPEG with QF = 80.

          bird boat fish  Cp hill Lenna  Bb  Mt  Pp  SE  SS  CS BbT  B1  B2  B3  B4  B5
wd (%)      98   76   98  79   88    79  86  81  86 100  97  87  86  70  94  80  91  85
wχ2 (%)     98   78   90  79   82    74  84  83  86 100  95  81  86  68  91  78  95  79

TABLE VI: Overall comparisons between the GMTCM and GG model for all images coded using JPEG with QF = 70.

          bird boat fish  Cp hill Lenna  Bb  Mt  Pp  SE  SS  CS BbT  B1  B2  B3  B4  B5
wd (%)      94   83   95  82   79    77  90  98  84  98  98  87  89  73  97  85  95  87
wχ2 (%)     97   83   89  79   91    80  84  94  87  98  97  87  89  75  94  85  95  83
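The summary factors w_d and w_{χ2} defined in Section V-A can be computed from per-frequency scores with a few lines of Python; the sketch below is illustrative (the function name and array interface are not from the paper).

```python
import numpy as np

def preference_percentages(kl_a, kl_b, chi2_a, chi2_b):
    """Percentage of AC positions favoring model A over model B:
    w_d uses the KL divergence (2.1), w_chi2 uses the chi-square score (2.2).
    Inputs are arrays of per-frequency scores, e.g. over the 63 AC positions."""
    kl_a, kl_b = np.asarray(kl_a), np.asarray(kl_b)
    chi2_a, chi2_b = np.asarray(chi2_a), np.asarray(chi2_b)
    w_d = 100.0 * np.mean(kl_a < kl_b)
    w_chi2 = 100.0 * np.mean(chi2_a < chi2_b)
    return w_d, w_chi2
```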

C. Comparisons of χ2 among the three models for individual frequencies

In the above overall comparisons, Table III shows that the GMTCM and the GG model are close, while the GMTCM wins the majority over the GG model in all other cases, as shown in Tables IV-VI. We now zoom in on the χ2 values for all tested frequency positions of several representative images: (1) 'bird', which is strongly in favor of the GMTCM in Table III; (2) 'CS', which is strongly in favor of the GG model in Table III; and (3) 'boat', for which the GMTCM and the GG model tie more or less in Table III. The respective χ2 scores are presented in Figures 4, 6, and 7. From Figures 4, 6, and 7, it is fair to say that (1) the GMTCM dramatically improves the modeling accuracy over the Laplacian model; (2) when the GMTCM is better than the GG model, χ2(GMTCM) is often much smaller, up to 15658 times smaller, than χ2(GGD); and (3) when the GG model is better than the GMTCM, the difference between χ2(GMTCM) and χ2(GGD) is not as significant as in Case (2); for example, in Figure 7, χ2(GGD) is only less than 10 times smaller than χ2(GMTCM). Note that Figure 5 shows the KL divergence for the 'bird' image. It demonstrates that the results of the chi-square test are consistent with those of the KL divergence, as supported by the consistency between w_{χ2} and w_d in the overall comparisons in Tables III to VI. Another interesting result is observed in Figure 8, which shows the χ2 values for the JPEG-coded 'CS' image with QF = 90. Compared with the case where the source is JPEG coded with higher fidelity (QF = 100), as shown in Figure 7, most ACs now show better modeling accuracy by the GMTCM than by the GG model.

VI. APPLICATIONS

This section briefly discusses applications of the TCM in various areas of image/video processing.

A. Data compression

As the DCT is widely applied in image/video compression, e.g., in JPEG, H.264 (which uses a scaled approximation of the DCT [4]), and HEVC (whose transform matrices were derived by approximating DCT basis functions [3]), an accurate model for DCT coefficients would be helpful for further improvement in compression efficiency, complexity, or both in image/video coding.



Fig. 4. The χ2 scores by GGD, GMTCM, and Laplacian model for ACs from the JPEG-coded image 'bird' with QF = 100 (log chi-square score vs. index of ACs in the zigzag scanning order).

Fig. 5. The KL divergence by GGD, GMTCM, and Laplacian model for ACs from the JPEG-coded image 'bird' with QF = 100.

Fig. 6. The χ2 scores by GGD, GMTCM, and Laplacian model for ACs from the JPEG-coded image 'boat' with QF = 100.

Fig. 7. The χ2 scores by GGD, GMTCM, and Laplacian model for ACs from the JPEG-coded image 'CS' with QF = 100.

Fig. 8. The χ2 scores by GGD, GMTCM, and Laplacian model for ACs from the JPEG-coded image 'CS' with QF = 90.

1) Lossless coding algorithm design: Entropy coding design in image and video coding standards such as JPEG, H.264, and HEVC is closely related to understanding the statistics of DCT coefficients, due to the wide application of the DCT in image and video compression. Because of its superior modeling accuracy and its simplicity, the TCM could be utilized to design an entropy coding scheme for discrete DCT coefficients (such as in JPEG images). For example, one entropy coding scheme based on the TCM could be designed as follows. First, the GMTCM parameters are calculated and coded for each frequency. Then, a bit-mask


is coded to identify the outliers, so that outliers and DCT coefficients within the main portion can be further coded separately with their respective context modeling. For DCT coefficients within the main portion, the parameters of the truncated geometric distributions are encoded and then used to further improve the coding efficiency. We have implemented this TCM-based entropy coding scheme. Our experiments demonstrate that, in spite of the overhead for coding the outlier flags, this TCM-based entropy coding scheme shows on average a 25% rate saving when compared with a standard JPEG entropy codec for high-fidelity JPEG images (with the quantization step size being 1 for most low-frequency AC positions). Detailed descriptions of this TCM-based entropy coding scheme, along with its performance, will be reported in our future work.

2) Lossy coding algorithm design: Quantization design, as the core of lossy coding, is rooted in rate-distortion theory, which generally requires a statistical model to provide guidance for practical design. Quantization design in DCT-based image and video coding usually assumes a Laplacian distribution, due to its simplicity and fair modeling accuracy [13]. Since the LPTCM improves dramatically upon the Laplacian model in terms of modeling accuracy while having similar simplicity, it would be interesting to investigate the application of the LPTCM to quantization design for DCT coefficients in DCT-based image and video coding in the future.



B. Image understanding

Image understanding is another application of DCT coefficient modeling. It is interesting to observe that in natural images, the statistically insignificant outliers detected by the GMTCM carry perceptually important information, which sheds light on DCT-based image analysis.

1) Featured outlier images based on GMTCM: One important parameter in the GMTCM is the cutting point y_c = Kq between the parametric distribution for the main portion and the uniform distribution for the heavy tail portion. Statistically, the outlier coefficients that fall beyond y_c into the tail portion are not significant: although the actual number of outliers varies from one frequency to another and from one image to another, it typically ranges in our experiments from less than 0.1% of the total number of AC coefficients to 19%, with an average of around 1.2%. However, from the perspective of image quality perception, the outliers carry very important information, as demonstrated by Figures 9, 10, and 11. Figures 9-11 each include an original image, a so-called inlier image, and a so-called outlier image. An inlier image is generated by first forcing all outlier coefficients to zero and then performing the inverse DCT. An outlier image, on the other hand, is generated by keeping only the outliers, forcing all other DCT coefficients to zero, and then performing the inverse DCT (a sketch of this construction is given at the end of this subsection). The three original images are taken from the three test sets, one from each set, to show the perceptual importance of their respective outliers. As the inlier image contains all DC components and the inlier AC components, a down-sizing operation would affect our perception of the difference between the original image and the inlier image; hence, Figures 9-11 are presented in a relatively large size. In Figure 9, the outlier image captures most of the structural information, such as the railing. In Figure 10, the outlier image shows a fine sketch of the face, while the inlier image, with all statistically significant coefficients, shows an undesired quality, particularly the blurring of the eyes. In Figure 11, the basketball net is well sketched in the outlier image, but is much blurred in the inlier image. From these figures, it is evident that the tail portion is perceptually important. This, together with the statistical insignificance of outliers, makes the outlier image appealing for image understanding. On one hand, compared with the original image, the outlier image achieves a dramatic dimension reduction. On the other hand, due to the preservation of perceptually important global information of the original image in the outlier image, some aspects of image understanding can be carried out from the outlier image instead, with perhaps better accuracy and less computational complexity. It is interesting to note the information rate for outliers, i.e., how many bits are needed to represent outlier images. In our study of applying the TCM to enhance entropy coding design for DCT coefficients, where outliers are encoded separately from inliers, it is observed that outliers consume only about 5% of the total bits. Finally, it is worthwhile to point out that outlier images are related to, but different from, conventional edge detection. An outlier image captures some global uniqueness in an image, while edges are usually detected based on local irregularity in the pixel domain. For example, the large area of vertical patterns in the top-left corner of Figure 9 is not captured as outliers, because those vertical patterns repeat themselves many times in the image, whereas it does show up as edges.
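The inlier/outlier image construction described above can be sketched as follows; this is a simplified illustration that assumes the per-frequency separation boundaries y_c (an 8 × 8 array, e.g. from a GMTCM or LPTCM fit) are already available, and it uses SciPy's block DCT rather than any codec-specific transform.

```python
import numpy as np
from scipy.fft import dctn, idctn

def split_inlier_outlier(image, yc):
    """Build inlier/outlier images from 8x8 block DCT coefficients.
    `image` is a grayscale array whose sides are multiples of 8; `yc` is an 8x8
    array of per-frequency separation boundaries; the DC coefficient is always
    kept in the inlier image."""
    h, w = image.shape
    inlier = np.zeros((h, w))
    outlier = np.zeros((h, w))
    for r in range(0, h, 8):
        for c in range(0, w, 8):
            coeff = dctn(image[r:r+8, c:c+8].astype(float), type=2, norm='ortho')
            mask = np.abs(coeff) > yc           # outlier coefficients
            mask[0, 0] = False                  # DC stays with the inliers
            inlier[r:r+8, c:c+8] = idctn(np.where(mask, 0.0, coeff), type=2, norm='ortho')
            outlier[r:r+8, c:c+8] = idctn(np.where(mask, coeff, 0.0), type=2, norm='ortho')
    return inlier, outlier
```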

Finally, it is worth pointing out that outlier images are related to, but different from, conventional edge detection. An outlier image captures some global uniqueness in an image, while edges are usually detected based on local irregularities in the pixel domain. For example, the large area of vertical patterns in the top-left corner of Figure 9 is not captured as outliers, because those vertical patterns repeat many times in the image, although it does show up as edges.

2) Image similarity: Similarity measurement among images plays a key role in image management, which has attracted growing attention in industry due to the rapid growth of digital photography over the past decade. One application of DCT models is to measure the similarity between images by estimating the model parameters of different images and computing a distribution distance. Because DCT coefficients capture certain spatial patterns in the pixel domain, e.g., AC1 reflecting a vertical pattern and AC8 a horizontal pattern, the distribution distance between DCT coefficient models represents the similarity between two images well. This type of similarity measurement is essentially rooted in data histograms. In practice, however, histograms are not a good choice because they require a heavy overhead, which is particularly problematic for a large-scale image management system. Model-based distribution distances, on the other hand, use only a few parameters with negligible overhead, thus providing a good similarity measurement between digital images, particularly when the modeling accuracy is high. The outlier images shown and discussed in Subsection VI-B1 can be used to further enhance image similarity testing based on model-based distribution distances. Since outliers are statistically insignificant, their impact on model-based distribution distances may not be significant; yet, if two images look similar, their respective outlier images must look similar too. As such, one can build additional metrics based on outlier images to further enhance image similarity testing. These and other applications of the GMTCM will be further studied.
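As an illustration of such a model-based distribution distance, the sketch below fits a plain geometric model to the coefficient magnitudes of each AC frequency of two images and sums a symmetrized KL divergence over the frequencies. It is a simplified stand-in, not the GMTCM-based distance itself, which would additionally account for cutting points and tail weights; all function names are illustrative.

import numpy as np

def geometric_mle(k):
    # ML estimate of theta for P(X = k) = (1 - theta)^k * theta, k = 0, 1, 2, ...
    theta = 1.0 / (1.0 + np.mean(k))
    return float(np.clip(theta, 1e-6, 1.0 - 1e-6))   # guard against degenerate fits

def kl_geometric(t1, t2):
    # KL divergence between two geometric distributions with parameters t1 and t2.
    return np.log(t1 / t2) + ((1.0 - t1) / t1) * np.log((1.0 - t1) / (1.0 - t2))

def dct_model_distance(coefs_a, coefs_b):
    # coefs_a, coefs_b: dicts mapping an AC frequency index (u, v) to the 1-D
    # array of integer DCT coefficients of that frequency (one entry per block).
    d = 0.0
    for freq in coefs_a:
        ta = geometric_mle(np.abs(coefs_a[freq]))
        tb = geometric_mle(np.abs(coefs_b[freq]))
        d += kl_geometric(ta, tb) + kl_geometric(tb, ta)   # symmetrized KL
    return d

A smaller distance indicates more similar per-frequency coefficient statistics between the two images.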

Fig. 9. Demonstration of the perceptual importance of outlier coefficients by ‘BQterrace’, with the original followed by an inlier and an outlier image.

VII. CONCLUSIONS

Motivated by the heavy tail phenomenon in DCT coefficients and its perceptual importance, this paper has proposed a new model dubbed the transparent composite model (TCM) for modeling DCT coefficients, which separates the tail portion of DCT coefficients from the main portion and uses a different distribution to model each portion: a uniform distribution for the tail portion, and a parametric distribution, such as a truncated Laplacian, generalized Gaussian (GG), or geometric distribution, for the main portion. Efficient online algorithms with global convergence have been proposed to compute the ML estimates of the parameters in the TCM. It has been shown that among the Laplacian model, the GG model, the GGTCM, and the LPTCM, the GGTCM offers the best modeling accuracy for real-valued DCT coefficients, at the cost of considerable extra complexity. On the other hand, for discrete DCT coefficients, tests over a wide variety of images based on both divergence distance and the χ2 test have shown that the GMTCM outperforms both the Laplacian and GG models in terms of modeling accuracy in the majority of cases, while having simplicity and practicality similar to those of the Laplacian model, thus making the GMTCM a desirable choice for modeling discrete DCT coefficients in real-world applications. In addition, it has been demonstrated that the tail portion identified by the GMTCM gives rise to an image called an outlier image, which, on the one hand, achieves dramatic dimension reduction in comparison with the original image and, on the other hand, preserves perceptually important, unique global features of the original image. It has been further suggested that applications of the TCM, in particular the LPTCM and GMTCM, include image and video coding, quantization design, entropy coding design, and image understanding and management.

APPENDIX A

In this appendix, we prove Lemma 1. First note that g(t) can be rewritten as
\[
g(t) = K - 1 + \frac{1}{1 - e^{-t}} - \frac{K}{1 - e^{-Kt}}.
\]
Its derivative is equal to
\begin{align}
g'(t) &= -\frac{e^{-t}}{(1 - e^{-t})^2} + \frac{K^2 e^{-Kt}}{(1 - e^{-Kt})^2} \nonumber \\
&= -\frac{e^{-t}}{(1 - e^{-Kt})^2} \left[ \left( \sum_{i=0}^{K-1} e^{-it} \right) - K e^{-(K-1)t/2} \right] \cdot \left[ \left( \sum_{i=0}^{K-1} e^{-it} \right) + K e^{-(K-1)t/2} \right]. \tag{A.1}
\end{align}
Fig. 10. Demonstration of the perceptual importance of outlier coefficients by ‘Lenna’, with the original followed by an inlier image and an outlier image.

It is not hard to verify that
\[
\left( \sum_{i=0}^{K-1} e^{-it} \right) - K e^{-(K-1)t/2} = \sum_{i=0}^{K_L} \left( e^{-it/2} - e^{-(K-1-i)t/2} \right)^2 > 0
\]
whenever K > 1, where K_L = \lfloor K/2 \rfloor - 1. This, together with (A.1), implies that g'(t) < 0 for any t > 0 whenever K > 1. Hence g(t) is strictly decreasing over t ∈ (0, ∞). Next we have
\[
\lim_{t \to 0^+} g(t) = \lim_{t \to 0^+} \frac{e^{-t}}{1 - e^{-Kt}} \left[ \sum_{i=0}^{K-1} e^{-it} - K e^{-(K-1)t} \right] = \frac{1}{2}(K - 1).
\]
Finally, the strict concavity of L(K, q_t) as a function of t follows from (A.1) and the fact that
\[
\frac{\partial^2 L(K, q_t)}{\partial t^2} = |N_1(K)| \, g'(t).
\]
This completes the proof of Lemma 1.
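As a quick numerical sanity check of Lemma 1 (not part of the proof), the short script below verifies, for a few values of K, that g(t) is strictly decreasing on (0, ∞) and tends to (K − 1)/2 as t → 0+.

import numpy as np

def g(t, K):
    # g(t) = K - 1 + 1/(1 - e^{-t}) - K/(1 - e^{-Kt}), written with expm1 for accuracy.
    return K - 1.0 + 1.0 / (-np.expm1(-t)) - K / (-np.expm1(-K * t))

for K in (2, 5, 16, 64):
    t = np.linspace(1e-3, 10.0, 10000)
    assert np.all(np.diff(g(t, K)) < 0), "g(t) should be strictly decreasing"
    assert abs(g(1e-6, K) - 0.5 * (K - 1)) < 1e-3, "g(0+) should equal (K - 1)/2"
print("Lemma 1 claims hold numerically for the tested values of K.")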

Fig. 11. Demonstration of the perceptual importance of outlier coefficients by ‘BbT’, with the original followed by an inlier image and an outlier image.

APPENDIX B

In this appendix, we prove Theorem 3. First, arguments similar to those in the proof of Theorem 2 can be used to show that λ^{(i)} is upper bounded by λ_K, strictly increases, and converges to λ_K as i → ∞. Therefore, what remains is to show that the convergence is exponentially fast. To this end, let
\[
h(\lambda) \triangleq \frac{e^{-q/\lambda}}{1 - e^{-q/\lambda}}.
\]
In view of (4.7), it follows that
\[
h(\lambda^{(i+1)}) = C + \frac{K}{e^{Kq/\lambda^{(i)}} - 1} = C + K h(\lambda^{(i)}/K),
\]
and hence
\begin{align*}
h(\lambda^{(i+1)}) - h(\lambda^{(i)}) &= K h(\lambda^{(i)}/K) - K h(\lambda^{(i-1)}/K) \\
&= \frac{K h(\lambda^{(i)}/K) - K h(\lambda^{(i-1)}/K)}{h(\lambda^{(i)}) - h(\lambda^{(i-1)})} \left[ h(\lambda^{(i)}) - h(\lambda^{(i-1)}) \right] \\
&\le \delta \left[ h(\lambda^{(i)}) - h(\lambda^{(i-1)}) \right],
\end{align*}
where
\[
\delta = \sup \left\{ \frac{K h(\lambda/K) - K h(\nu/K)}{h(\lambda) - h(\nu)} : \lambda, \nu \in [\lambda^{(0)}, \lambda_K], \ \lambda \ne \nu \right\}.
\]
In view of Lemma 1 and its proof (particularly (A.1)), it is not hard to verify that 0 < δ < 1. Therefore, as i → ∞, h(λ^{(i)}) converges to h(λ_K) exponentially fast. Since the derivative of h(λ) is positive over λ ∈ [λ^{(0)}, λ_K] and bounded away from 0, it follows that λ^{(i)} also converges to λ_K exponentially fast. This completes the proof of Theorem 3.
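To illustrate the exponential convergence established above, the following toy script iterates the relation h(λ^{(i+1)}) = C + K h(λ^{(i)}/K) and prints the gap to the limit at each step. The values of q, K, C, and the starting point λ^{(0)} are made up for illustration; in practice these constants come from (4.7) and the data.

import math

q, K, C = 1.0, 4.0, 0.2                  # illustrative constants only

def h(x):
    # h(lambda) = e^{-q/lambda} / (1 - e^{-q/lambda}) = 1 / (e^{q/lambda} - 1)
    return 1.0 / (math.exp(q / x) - 1.0)

def h_inv(y):
    # inverse of h, used to recover lambda^{(i+1)} from h(lambda^{(i+1)})
    return q / math.log(1.0 / y + 1.0)

lam = 0.3                                # starting point lambda^{(0)}
iterates = [lam]
for _ in range(30):
    lam = h_inv(C + K * h(lam / K))      # h(lambda^{(i+1)}) = C + K h(lambda^{(i)}/K)
    iterates.append(lam)

lam_limit = iterates[-1]                 # numerically converged value, standing in for lambda_K
for i, v in enumerate(iterates[:10]):
    print(i, abs(v - lam_limit))         # the gap shrinks roughly geometrically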

REFERENCES

[1] T. Acharya and A. K. Ray, Image Processing: Principles and Applications, Wiley-Interscience, 2006.
[2] F. Muller, "Distribution shape of two-dimensional DCT coefficients of natural images," Electronics Letters, vol. 29, no. 22, October 1993.
[3] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, December 2012.
[4] T. Wiegand, G. J. Sullivan, and A. Luthra, "Draft ITU-T Rec. H.264/ISO/IEC 14496-10 AVC," Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, 2003.
[5] T. Eude, R. Grisel, H. Cherifi, and R. Debrie, "On the distribution of the DCT coefficients," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 5, pp. 365-368, April 1994.
[6] A. Briassouli, P. Tsakalides, and A. Stouraitis, "Hidden messages in heavy-tails: DCT-domain watermark detection using alpha-stable models," IEEE Transactions on Multimedia, vol. 7, no. 4, August 2005.
[7] M. I. H. Bhuiyan, M. O. Ahmad, and M. N. S. Swamy, "Modeling of the DCT coefficients of images," Proc. IEEE International Symposium on Circuits and Systems, pp. 272-275, 2008.
[8] N. Kamaci, Y. Altunbasak, and R. Mersereau, "Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 8, August 2005.
[9] S. Nadarajah, "Gaussian DCT coefficient models," Acta Appl. Math., vol. 106, pp. 455-472, 2009.
[10] E. Lam and J. Goodman, "A mathematical analysis of the DCT coefficient distributions for images," IEEE Transactions on Image Processing, vol. 9, no. 10, October 2000.
[11] I.-M. Pao and M.-T. Sun, "Modeling DCT coefficients for fast video encoding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 4, pp. 608-616, June 1999.
[12] J. Rice, Mathematical Statistics and Data Analysis, 2nd ed., Duxbury Press, 1995.
[13] G. Sullivan, "Efficient scalar quantization of exponential and Laplacian random variables," IEEE Transactions on Information Theory, vol. 42, no. 5, pp. 1365-1374, 1996.
[14] M. Do and M. Vetterli, "Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance," IEEE Transactions on Image Processing, vol. 11, no. 2, February 2002.
[15] C. Tu, J. Liang, and T. Tran, "Adaptive runlength coding," IEEE Signal Processing Letters, vol. 10, no. 3, pp. 61-64, March 2003.
[16] C. Tu and T. Tran, "Context-based entropy coding of block transform coefficients for image compression," IEEE Transactions on Image Processing, vol. 11, no. 11, pp. 1271-1283, November 2002.
[17] X. Wu and N. Memon, "Context-based, adaptive, lossless image coding," IEEE Transactions on Communications, vol. 45, pp. 437-444, April 1997.
[18] C. Ponce and A. Singer, "Computing steerable principal components of a large set of images and their rotations," IEEE Transactions on Image Processing, vol. 20, no. 11, November 2011.
[19] E.-H. Yang and X. Yu, "Transparent composite model for large scale image/video processing," Proceedings of the 2013 IEEE International Conference on Big Data, pp. 38-44, Silicon Valley, CA, USA, October 2013.

En-hui Yang (M'97-SM'00-F'08) received the B.S. degree in applied mathematics from Huaqiao University, Quanzhou, China, and the Ph.D. degree in mathematics from Nankai University, Tianjin, China, in 1986 and 1991, respectively. Since June 1997, he has been with the Department of Electrical and Computer Engineering, University of Waterloo, ON, Canada, where he is currently a Professor and Canada Research Chair in information theory and multimedia compression. He held a Visiting Professor position at the Chinese University of Hong Kong, Hong Kong, from September 2003 to June 2004; positions of Research Associate and Visiting Scientist at the University of Minnesota, Minneapolis-St. Paul, the University of Bielefeld, Bielefeld, Germany, and the University of Southern California, Los Angeles, from January 1993 to May 1997; and a faculty position (first as an Assistant Professor and then an Associate Professor) at Nankai University, Tianjin, China, from 1991 to 1992. He is the founding Director of the Leitch-University of Waterloo multimedia communications lab and a Co-Founder of SlipStream Data Inc. (now a subsidiary of BlackBerry). He currently also serves as an Executive Council Member of the China Overseas Exchange Association and an Overseas Advisor for the Overseas Chinese Affairs Office of the City of Shanghai, and sits on the Overseas Expert Advisory Committee for the Overseas Chinese Affairs Office of the State Council of China and a Review Panel for the International Council for Science. His current research interests are multimedia compression, multimedia transmission, digital communications, information theory, source and channel coding, image and video coding, image and video understanding and management, and Big Data analytics. Dr. Yang is a recipient of several research awards, including the 1992 Tianjin Science and Technology Promotion Award for Young Investigators; the 1992 third Science and Technology Promotion Award of the Chinese Ministry of Education; the 2000 Ontario Premier's Research Excellence Award, Canada; the 2000 Marsland Award for Research Excellence, University of Waterloo; the 2002 Ontario Distinguished Researcher Award; the prestigious Inaugural (2007) Premier's Catalyst Award for the Innovator of the Year; and the 2007 Ernest C. Manning Award of Distinction, one of Canada's most prestigious innovation prizes. Products based on his inventions and commercialized by SlipStream received the 2006 Ontario Global Traders Provincial Award. With over 200 papers and more than 200 patents/patent applications worldwide, his research work has had an impact on the daily lives of hundreds of millions of people in over 170 countries, through commercialized products, video coding open sources, or video coding standards. He is a Fellow of the Canadian Academy of Engineering and a Fellow of the Royal Society of Canada: the Academies of Arts, Humanities and Sciences of Canada.
He served, among many other roles, as a General Co-Chair of the 2008 IEEE International Symposium on Information Theory, an Associate Editor for the IEEE Transactions on Information Theory, a Technical Program Vice-Chair of the 2006 IEEE International Conference on Multimedia & Expo (ICME), the Chair of the award committee for the 2004 Canadian Award in Telecommunications, a Co-Editor of the 2004 Special Issue of the IEEE Transactions on Information Theory, a Co-Chair of the 2003 US National Science Foundation (NSF) workshop on the interface of Information Theory and Computer Science, and a Co-Chair of the 2003 Canadian Workshop on Information Theory.

Xiang Yu (S'05-M'08) received the B.E. degree in physics from Tsinghua University, China, in 1994, the M.E. degree in electrical engineering from Peking University, China, in 1997, and the Ph.D. degree in electrical and computer engineering from the University of Waterloo, Canada, in 2008, where he received the honor of Outstanding Achievement in Graduate Studies at the doctoral level. He is a chief scientist with Multicom Technologies Inc., working on mobile multimedia communications. He held a senior researcher position with Research In Motion from July 2008 to September 2012, where he contributed to the standardization of the new High Efficiency Video Coding (HEVC) standard. His research interests include image/video coding and transcoding, multimedia communications, joint source-channel coding, image processing, and machine learning.

Jin Meng received the B.Eng. degree in information and electronics engineering from Tsinghua University, Beijing, China, in 2006, and the Ph.D. degree in electrical and computer engineering from the University of Waterloo, Waterloo, ON, Canada, in 2013. He is now a postdoctoral fellow in the Department of Electrical and Computer Engineering, University of Waterloo. His research interests include image processing and understanding, computer vision, and machine learning.

Chang Sun received the B.S. degree in electrical engineering from Shandong University of Science and Technology, Qingdao, China, in 2005, and the M.S. degree in electrical engineering from Shandong University, Jinan, China, in 2008. He is currently pursuing the Ph.D. degree in electrical and computer engineering at the University of Waterloo, Waterloo, ON, Canada. His research interests include multimedia compression, information theory, and image processing. He received a national fellowship from the Ministry of Education of China in 2005.
