
Quantization Table Design Revisited for Image/Video Coding

En-Hui Yang, Fellow, IEEE, Chang Sun, and Jin Meng

Abstract—Quantization table design is revisited for image/video coding where soft decision quantization (SDQ) is considered. Unlike conventional approaches, where quantization table design is bundled with a specific encoding method, we assume optimal SDQ encoding and design a quantization table for the purpose of reconstruction. Under this assumption, we model transform coefficients across different frequencies as independently distributed random sources and apply the Shannon lower bound to approximate the rate distortion function of each source. We then show that a quantization table can be optimized in a way that the resulting distortion complies with certain behavior. Guided by this new design principle, we propose an efficient statistical-model-based algorithm using the Laplacian model to design quantization tables for DCT-based image coding. When applied to standard JPEG encoding, it provides more than 1.5-dB performance gain in PSNR, with almost no extra burden on complexity. Compared with the state-of-the-art JPEG quantization table optimizer, the proposed algorithm offers an average 0.5-dB gain in PSNR with computational complexity reduced by a factor of more than 2000 when SDQ is off, and a 0.2-dB performance gain or more with 85% of the complexity reduced when SDQ is on. Significant compression performance improvement is also seen when the algorithm is applied to other image coding systems proposed in the literature.

Index Terms—Quantization table, soft decision quantization, rate distortion optimization, Shannon lower bound, image/video coding.

I. INTRODUCTION

TRANSFORM coding is widely used in practical lossy image/video coding systems to de-correlate non-overlapped blocks. After each block is transformed, each of the resulting transform coefficients is quantized uniformly according to a quantization step size; all quantization step sizes used at different transform frequencies together form a quantization table. Quantized coefficients are then scanned into a 1-D sequence and finally encoded losslessly. Traditionally, as long as a quantization table is fixed, the corresponding quantization process is determined, where the quantized value of a transform coefficient depends only on its quantization step size and the transform coefficient itself. Such a quantization process is now referred to as hard decision quantization (HDQ). In this case, once the subsequent lossless coding method is given, quantization table design is equivalent to quantizer design.

Manuscript received January 9, 2014; revised May 24, 2014 and July 17, 2014; accepted September 7, 2014. Date of publication September 15, 2014; date of current version September 30, 2014. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada under Grant RGPIN203035-11 and Grant STPGP397345 and in part by the Canada Research Chairs Program. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ali Bilgin. The authors are with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada (e-mail: [email protected]; [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2014.2358204

In the past two decades, quantization table design has been well studied in HDQ settings [1]–[3]. It is generally formulated as the following optimization problem:^1

$$\inf_{Q} R(Q), \quad \text{s.t.}\; D(Q) \le D_T, \qquad (1)$$

where Q represents a quantization table, R and D are the resulting rate and distortion, respectively, and D_T denotes the target distortion. With the lossless coding method fixed, both R and D are functions of Q only. However, since R(Q) is generally a very complicated function of Q, finding an optimal solution or even a good approximate solution to (1) is computationally expensive. On the other hand, if R(Q) is inaccurately approximated, the solution to the resulting modified optimization problem can be far away from the solution to (1). For example, for JPEG encoding [4], Huang and Meng [1] proposed a quantization table optimization method where discrete cosine transform (DCT) coefficients are modelled by a Laplacian distribution. The rate (as well as the distortion) is estimated by closed-form formulas based on the statistical model, where the default Huffman code lengths specified by JPEG are assumed. The performance gain offered by their optimizer is limited mainly because of the inaccuracy in approximating the rate. To solve this problem, Wu and Gersho [2] proposed to evaluate the rate using the real coding rate with a greedy, steepest-descent algorithm. It achieves better rate-distortion (R-D) performance but at the expense of extremely high computational complexity, since the actual encoding is performed in each iteration of the algorithm. To avoid going through the actual encoding repeatedly, Ratnakar and Livny [3] later developed a comparatively efficient^2 JPEG quantization table optimizer using a trellis-based method where the rate is estimated by the empirical entropy of quantized DCT coefficients, rather than the coding rate of run-size pairs as in [2]. Their scheme achieves R-D performance similar to what was reported in [2], which represents the best JPEG coding performance so far when HDQ is assumed.

^1 The optimization problem can also be formulated as minimizing the distortion at a given rate.

^2 It is more efficient than the algorithm in [2], but is nevertheless computationally expensive, as shown in Section V of this paper.
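As a concrete, hedged illustration of the model-based approach behind (1), the sketch below selects each step size by a Lagrangian trade-off D_i(q) + θ·R_i(q), with rate and distortion estimated numerically for a Laplacian source. The candidate step sizes, the θ value, and the numerical integration are assumptions made for the example; this is not the specific method of [1], which relies on closed-form estimates and the default JPEG Huffman code lengths.

```python
import numpy as np

def laplacian_pdf(x, lam):
    return np.exp(-np.abs(x) / lam) / (2.0 * lam)

def rate_distortion_hdq(lam, q, dx=0.02):
    """Numerically estimate the entropy (bits/coefficient) and the MSE of
    uniform hard-decision quantization with step size q applied to a
    zero-mean Laplacian source with scale lam."""
    x_max = 20.0 * lam                          # truncate the negligible tail
    x = np.arange(-x_max, x_max, dx) + dx / 2.0
    w = laplacian_pdf(x, lam) * dx              # probability mass per grid cell
    j = np.round(x / q)                         # hard-decision index of each cell
    mse = float(np.sum(w * (x - j * q) ** 2))
    _, inv = np.unique(j, return_inverse=True)  # group cells falling in the same bin
    p = np.bincount(inv, weights=w)
    p = p[p > 0] / p.sum()
    rate = float(-np.sum(p * np.log2(p)))
    return rate, mse

def pick_table(lams, theta, candidates=range(1, 47)):
    """Per frequency, pick the step size minimizing D(q) + theta * R(q)."""
    table = []
    for lam in lams:
        best = None
        for q in candidates:
            r, d = rate_distortion_hdq(lam, float(q))
            cost = d + theta * r
            if best is None or cost < best[0]:
                best = (cost, q)
        table.append(best[1])
    return table

# Toy example: three AC frequencies with decreasing energy (lam values assumed).
print(pick_table([30.0, 10.0, 3.0], theta=30.0))
```

As expected, frequencies with less energy (smaller scale parameter) receive larger step sizes for the same trade-off point.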



Recently, a more advanced quantization technique called soft decision quantization (SDQ) has been developed [5]–[12]. Unlike the case of HDQ, the quantization process in SDQ is tightly coupled with, and to some extent controlled by, the subsequent lossless coding method so that a better R-D trade-off is achieved. Even with the same quantization table, the quantization process would be dramatically different if a different lossless coding method were used. Because of its superiority over HDQ, SDQ or its suboptimal version, called rate distortion optimized quantization (RDOQ) [12], has been well adopted in both the video coding standards H.264/AVC [13] and HEVC [14]. For the reader who is not familiar with SDQ but is familiar with entropy constrained vector quantization [15], SDQ can be regarded as a generalization of entropy constrained vector quantization. SDQ dates back to the notion of distortion Kolmogorov complexity introduced in [16], and it has also been proposed from the computational perspective [11]. The major differences between SDQ and entropy constrained vector quantization are two-fold. First, SDQ can be applied to each individual, deterministic sequence, and its R-D optimization is carried out independently for each individual sequence, whereas entropy constrained vector quantization is for a probabilistic source and its R-D optimization is solved on average over all vectors. Second, SDQ can be applied in a one-shot manner, i.e., to a whole sequence such as a whole image or video frame, whereas in entropy constrained vector quantization, the whole sequence is normally partitioned into small non-overlapping blocks and each block is then independently quantized. In some sense, the similarities and differences between SDQ and entropy constrained vector quantization are similar to those between Kolmogorov complexity and Shannon entropy.

Given a lossless coding method, quantizer design under SDQ is no longer equivalent to quantization table design. Indeed, the quantizer design problem in SDQ can be separated into two sub-problems, i.e., quantization table design and SDQ design (see Section II for details). In the past, several researchers [3], [8], [17] studied these two design problems for JPEG encoding to some extent. In [17], Crouse and Ramchandran applied the method proposed in [2] to design the quantization table, followed by the algorithm proposed in [18], called optimal thresholding, to optimize quantized DCT coefficients, which is actually a sub-optimal SDQ design for JPEG coding. In [3], Ratnakar and Livny employed their quantization table optimization method to initialize the thresholding algorithm in [18]. The optimal SDQ design problem for JPEG coding was later solved by a graph-based algorithm in [8] by Yang and Wang. To address the optimal design of both the quantization table and SDQ, they further proposed an iterative algorithm, which achieves the best JPEG coding performance in the literature when SDQ is considered. However, unlike the graph-based algorithm, which is optimal, the iterative algorithm does not seem to converge globally, often leading to local minima; as such, its performance highly depends on the initial quantization table.^3 Indeed, Yang and Wang [8] used the quantization table optimizer in [3] to give an initial quantization table for their iterative algorithm in their best quantization scheme.

^3 This is true as well for the iterative algorithm called joint thresholding proposed in [17].
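The difference between HDQ and SDQ can already be seen at the level of a single coefficient: under SDQ the index is chosen to balance distortion against the bits the entropy coder will spend, so it may differ from plain rounding. The toy rate model and the Lagrange multiplier below are assumptions for illustration only; actual SDQ designs such as the graph-based JPEG algorithm of [8] operate on whole blocks and on the real run-length/Huffman syntax.

```python
import numpy as np

def code_length_bits(j):
    """Assumed toy rate model: roughly a magnitude-category cost for index j.
    (A real encoder would use its actual run-length / Huffman code lengths.)"""
    return 1.0 if j == 0 else 2.0 + np.floor(np.log2(abs(j))) + 1.0

def hdq(x, q):
    return int(round(x / q))

def sdq(x, q, theta):
    """Soft decision: pick the index minimizing distortion + theta * rate,
    searching a small neighbourhood of the hard-decision index."""
    j0 = hdq(x, q)
    candidates = range(min(0, j0) - 1, max(0, j0) + 2)
    costs = {j: (x - j * q) ** 2 + theta * code_length_bits(j) for j in candidates}
    return min(costs, key=costs.get)

q, theta = 16.0, 120.0
for x in [7.0, 9.5, 24.0, 40.0]:
    print(f"x={x:5.1f}  HDQ index={hdq(x, q):2d}  SDQ index={sdq(x, q, theta):2d}")
```

For x = 9.5 or x = 24.0 the soft decision picks a smaller index than hard rounding, trading a little distortion for fewer bits.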

Therefore, to a large extent, the quantization table design in SDQ has never been fully addressed. In light of the increasing importance of SDQ in image and video coding, we are motivated to revisit quantization table design in the context of SDQ. To make the problem tractable, we assume that, given a quantization table, the respective optimal SDQ could approach the theoretic compression performance given by the corresponding Shannon rate distortion function, and model transform coefficients across different frequencies as independently distributed random sources. We then apply the Shannon lower bound to approximate the rate distortion function, and show that a quantization table can be optimized in a way that the resulting distortion follows a certain behavior or profile (see Section III for details). Since this new design principle does not involve any specific lossless coding method, it can be applied to any image/video coding in theory, e.g., JPEG, MPEG-2, etc. Guided by this new design principle, we then propose an efficient statistical-model-based algorithm using the Laplacian model to design optimized quantization tables for DCT-based image coding. Experimental results showed that our algorithm, when applied to baseline JPEG encoding either with SDQ or without SDQ, gives the best compression performance with almost no extra complexity introduced in each of the respective cases. Lastly, to further verify this generic algorithm for DCT-based image coding, we apply it to the adaptive run-length coding (ARL) proposed in [19] and the embedded context-based entropy coding of block transform coefficients (ECEB) proposed in [20], yielding significantly improved coding efficiency in both cases. Preliminary results of this paper were presented in part in [21].

The rest of the paper is organized as follows. Section II formulates the new quantization table design problem, which is solved in Section III as our theoretical contribution. The proposed model-based algorithm is discussed in Section IV, and experimental results are given in Section V. Finally, we summarize the paper in Section VI.

II. PROBLEM FORMULATION

In this section, we first formulate the new quantizer design problem for image/video coding, where the quantization table optimization is separated as a sub-problem. We then show some advantages of the new formulation for the quantization table design problem over (1).

Suppose that an N × M transform is used, and there are in total n non-overlapping blocks of size N × M in each image or video frame. (Hereafter, each block of size N × M will be referred to as an N × M-block.) Define a quantization table Q = {q_1, q_2, . . . , q_L}, where q_i is the quantization step size for transform coefficients at the i-th frequency position in a predefined scanning order, i ∈ [1, L], where L = N × M. Given Q, each transform coefficient at the i-th frequency position would be reconstructed as j·q_i for some j ∈ {0, ±1, ±2, . . . , ±⌊1 + A/q_i⌋}, where A represents the maximum


possible magnitude that transform coefficients could have. For example, A = 1024 in 8-bit JPEG encoding [4]. In this sense, we say that Q solely determines the reconstruction space, and we will simply identify it with a reconstruction space in this paper when there is no ambiguity.

When SDQ is adopted, we can divide a quantization process into two parts: a quantization table or reconstruction space Q, and a mapping function or quantizer Q_Q that maps each sequence of transform coefficients at the i-th frequency position, i ∈ [1, L], into an index sequence of length n from {0, ±1, ±2, . . . , ±⌊1 + A/q_i⌋}^n. Given a lossless coding method φ for index sequences, quantizer design under SDQ is equivalent to the following optimization problem:

$$\inf_{Q}\inf_{Q_Q} R_\phi(Q, Q_Q), \quad \text{s.t.}\; D(Q, Q_Q) \le D_T \qquad (2)$$

where R_φ(Q, Q_Q) denotes the number of bits per N × M-block resulting from using φ to encode the index sequences given by Q_Q, and D(Q, Q_Q) is the distortion per N × M-block resulting from the SDQ process Q_Q in conjunction with Q. Here and throughout the rest of the paper, we assume the mean squared error distortion.

Comparing (2) with (1), there is a striking difference. In (1), once Q is given, the quantized value of each transform coefficient is determined, and so is D(Q). In addition, assuming the lossless coding method φ is fixed, R(Q) is a function of Q only. On the contrary, in (2), both the rate and distortion are functions of both Q and Q_Q. Even when Q is given, the quantized value of each transform coefficient is still undecided until the solution or an approximate solution to the inner minimization problem of (2) is found. Therefore, to a large extent, the quantization process is controlled by the subsequent lossless coding method φ.

Convert (2) into the following unconstrained optimization problem

$$\inf_{Q}\inf_{Q_Q}\Big[D(Q, Q_Q) + \theta\, R_\phi(Q, Q_Q)\Big] \qquad (3)$$

where θ is the Lagrange multiplier denoting the relative trade-off between rate and distortion. Given Q, the solution to the inner minimization problem in (3) is generally referred to as SDQ. In principle, with an initial quantization table Q, (3) can be solved by an iterative algorithm (as in [8] in the case of JPEG encoding): (Step 1) fix Q and seek an SDQ solution Q_Q to the inner minimization problem in (3); (Step 2) fix the resulting Q_Q and seek a solution Q to the outer minimization problem in (3); and (Step 3) repeat Steps 1 and 2 until convergence occurs. Although each of Steps 1 and 2 could be optimal itself—please refer to [6]–[9] and [12]–[14] for the applications of these steps to image and video coding standards proposed so far—the iterative algorithm does not converge, in general, to a global optimum. As such, its performance highly depends on the initial Q.

In this paper, we aim to determine the optimal Q in (3) or its approximation without determining explicitly the optimal Q_Q in (3). In view of (3), the optimal Q and Q_Q are clearly related to each other and both depend on the lossless coding method φ. To overcome this difficulty, we shall consider φ that is universal and optimal, in the sense that the inner minimization in (2) can be approximated by the Shannon rate distortion functions of transform coefficients with respect to the alphabet {0, ±1·q_i, ±2·q_i, . . . , ±⌊1 + A/q_i⌋·q_i} for all Q when n is large enough. From universal lossy source coding theory [5], [22], [23], such lossless coding methods exist. The advantage of this approach is two-fold. First, it makes the problem (2), or equivalently (3), tractable. Second, it makes our solution Q independent of any specific lossless coding method, which often varies from one application to another; this in turn makes our solution Q widely applicable, as a good initial Q, to many practical image and video coding problems where quantization tables are used.

III. PROBLEM SOLUTION

Following the approach alluded to at the end of Section II, we now derive the optimal Q in (2) or its approximation without determining explicitly the optimal Q_Q. As aforementioned, we will assume that φ is universal and optimal so that the inner minimization in (2) can be approximated by the Shannon rate distortion functions of transform coefficients with respect to the alphabet {0, ±1·q_i, ±2·q_i, . . . , ±⌊1 + A/q_i⌋·q_i} for all Q when n is large enough. Rewrite (2) as

$$\inf_{Q}\;\inf_{Q_Q:\, D(Q, Q_Q)\le D_T} R_\phi(Q, Q_Q). \qquad (4)$$

We model transform coefficients across different frequencies as independent random sources {X_i}_{i=1}^L with certain distributions, where each X_i, 1 ≤ i ≤ L, is a sequence X_i = {X_i(k)}_{k=1}^n of length n representing all transform coefficients at the i-th frequency position, and further regard D(Q, Q_Q) and R_φ(Q, Q_Q) in (4) as the average distortion and rate (in bits) of the L sources {X_i}_{i=1}^L, respectively. Under this assumption, transform coefficients can be optimally quantized and encoded by separately quantizing and encoding each source X_i, 1 ≤ i ≤ L. Without loss of generality, we further assume that each source X_i has zero mean; otherwise, the mean could be subtracted first. As such, the inner minimization in (4) can be rewritten as

$$
\begin{aligned}
&\inf_{Q_Q:\, D(Q,Q_Q)\le D_T} R_\phi(Q, Q_Q)\\
&\quad= \inf_{\substack{\{D_i\}_{i=1}^L:\\ \sum_{i=1}^L D_i = D_T}}\;\; \inf_{\substack{Q_{q_i}:\, D(X_i,q_i,Q_{q_i})\le D_i\\ 1\le i\le L}} \;\sum_{i=1}^L R_\phi(X_i, q_i, Q_{q_i})\\
&\quad= \inf_{\substack{\{D_i\}_{i=1}^L:\\ \sum_{i=1}^L D_i = D_T}} \sum_{i=1}^L \Big[ \inf_{Q_{q_i}:\, D(X_i,q_i,Q_{q_i})\le D_i} R_\phi(X_i, q_i, Q_{q_i}) \Big]\\
&\quad= \inf_{\substack{\{D_i\}_{i=1}^L:\\ \sum_{i=1}^L D_i = D_T,\; D_i \ge D(X_i,q_i)}} \sum_{i=1}^L \Big[ \inf_{Q_{q_i}:\, D(X_i,q_i,Q_{q_i})\le D_i} R_\phi(X_i, q_i, Q_{q_i}) \Big] \qquad (5)
\end{aligned}
$$

where Q_{q_i}, 1 ≤ i ≤ L, is a mapping from the set of sequences of n transform coefficients to the set {0, ±1, ±2, . . . , ±⌊1 + A/q_i⌋}^n and represents SDQ for the source X_i; D(X_i, q_i, Q_{q_i}) denotes the average distortion per transform coefficient between X_i and the reconstruction sequence given by Q_{q_i} and q_i (i.e., Q_{q_i} × q_i); R_φ(X_i, q_i, Q_{q_i}) denotes the corresponding rate in bits per transform coefficient for X_i; and D(X_i, q_i) is the minimal average distortion per transform coefficient for the source X_i that is achievable with the reconstruction space {0, ±1·q_i, ±2·q_i, . . . , ±⌊1 + A/q_i⌋·q_i}^n. Note that D(X_i, q_i) is actually equal to the average distortion resulting from the HDQ of X_i with the quantization step size q_i. The last equality in (5) follows from the fact that when D_i < D(X_i, q_i), the set of Q_{q_i} with D(X_i, q_i, Q_{q_i}) ≤ D_i is empty and hence the corresponding inner minimization in (5) is ∞.

At this point, we invoke universal redundancy results from lossy source coding theory [23], which say that when φ is universal^4 and optimal,

$$\inf_{Q_{q_i}:\, D(X_i, q_i, Q_{q_i})\le D_i} R_\phi(X_i, q_i, Q_{q_i}) = R^{q_i}_{X_i}(D_i) + c\left(1 + 2\left\lfloor 1 + \frac{A}{q_i}\right\rfloor\right)\frac{\ln n}{n} + o\!\left(\frac{\ln n}{n}\right) \qquad (6)$$

where R^{q_i}_{X_i}(D_i) is the Shannon rate distortion function of X_i with respect to the reconstruction alphabet {0, ±1·q_i, ±2·q_i, . . . , ±⌊1 + A/q_i⌋·q_i}, and c = O(1) is a positive bounded term. Note that 1 + 2⌊1 + A/q_i⌋ is simply the size of the alphabet {0, ±1·q_i, ±2·q_i, . . . , ±⌊1 + A/q_i⌋·q_i}, which is proportional to 1/q_i. By absorbing some positive bounded term into c = O(1), we can rewrite (6) as

$$\inf_{Q_{q_i}:\, D(X_i, q_i, Q_{q_i})\le D_i} R_\phi(X_i, q_i, Q_{q_i}) = R^{q_i}_{X_i}(D_i) + \frac{c\ln n}{q_i\, n} + o\!\left(\frac{\ln n}{n}\right). \qquad (7)$$

Combining (7) and (5) with (4) yields

$$\inf_{Q}\;\inf_{Q_Q:\, D(Q,Q_Q)\le D_T} R_\phi(Q, Q_Q) = \inf_{Q}\;\inf_{\substack{\{D_i\}_{i=1}^L:\; \sum_{i=1}^L D_i = D_T\\ D_i \ge D(X_i,q_i)}} \sum_{i=1}^L \left[ R^{q_i}_{X_i}(D_i) + \frac{c\ln n}{q_i\, n} + o\!\left(\frac{\ln n}{n}\right)\right]. \qquad (8)$$

Since there is no analytic formula for R^{q_i}_{X_i}(D_i) in general, to continue with (8) we further lower bound R^{q_i}_{X_i}(D_i) by the Shannon lower bound to the rate distortion function of X_i [26]:

$$
\begin{aligned}
R^{q_i}_{X_i}(D_i) \ge R^{(SL)}_{X_i}(D_i) &= \max\Big\{H(X_i) - \tfrac{1}{2}\log 2\pi e D_i,\; 0\Big\} \qquad (9)\\
&= \begin{cases} H(X_i) - \tfrac{1}{2}\log 2\pi e D_i & \text{if } \hat{\sigma}_i^2 > D_i\\ 0 & \text{otherwise} \end{cases} \qquad (10)
\end{aligned}
$$

where H(X_i) is the differential entropy of X_i, and σ̂_i^2 is chosen such that H(X_i) = (1/2) log 2πe σ̂_i^2. (Note that according to the maximum differential entropy lemma [26], σ̂_i^2 ≤ σ_i^2, where σ_i^2 is the variance of X_i.) Plugging (9) into (8) yields

$$\inf_{Q}\;\inf_{Q_Q:\, D(Q,Q_Q)\le D_T} R_\phi(Q, Q_Q) \;\ge\; \inf_{Q}\;\inf_{\substack{\{D_i\}_{i=1}^L:\; \sum_{i=1}^L D_i = D_T\\ D_i \ge D(X_i,q_i)}} \sum_{i=1}^L \left[ R^{(SL)}_{X_i}(D_i) + \frac{c\ln n}{q_i\, n} + o\!\left(\frac{\ln n}{n}\right)\right]. \qquad (11)$$

In (11), one can limit Q to those satisfying

$$\sum_{i=1}^L D(X_i, q_i) \le D_T \qquad (12)$$

since the distortion profile {D_i} satisfying Σ_{i=1}^L D_i = D_T with D_i ≥ D(X_i, q_i), 1 ≤ i ≤ L, does not exist if (12) is not valid. We are now led to solve the following optimization problem instead:

$$\inf_{Q}\;\inf_{\substack{\{D_i\}_{i=1}^L:\; \sum_{i=1}^L D_i = D_T\\ D_i \ge D(X_i,q_i)}} \sum_{i=1}^L \left[ R^{(SL)}_{X_i}(D_i) + \frac{c\ln n}{q_i\, n} + o\!\left(\frac{\ln n}{n}\right)\right]. \qquad (13)$$

Given Q satisfying (12), solving the inner minimization in (13) is now equivalent to solving

$$\inf_{\substack{\{D_i\}_{i=1}^L:\; \sum_{i=1}^L D_i = D_T\\ D_i \ge D(X_i,q_i)}} \sum_{i=1}^L R^{(SL)}_{X_i}(D_i) \qquad (14)$$

since the last two terms in (13) do not depend on D_i, 1 ≤ i ≤ L. Note that R^{(SL)}_{X_i}(D_i) given in (9) is a convex function of D_i. According to the Kuhn-Tucker conditions, the minimization in (14), or equivalently the inner minimization in (13), is achieved when^5

$$D_i = D_i(Q) = \begin{cases} d & \text{if } D(X_i, q_i) \le d \le \hat{\sigma}_i^2\\ D(X_i, q_i) & \text{if } d < D(X_i, q_i)\\ \hat{\sigma}_i^2 & \text{if } d > \hat{\sigma}_i^2 \end{cases} \qquad (15)$$

where d is chosen so that

$$\sum_{i=1}^L D_i(Q) = D_T. \qquad (16)$$

As D_T increases from Σ_{i=1}^L D(X_i, q_i), the value of d satisfying (16) also increases until it hits the ceiling max{σ̂_i^2 : 1 ≤ i ≤ L}. The solution D_i(Q) in (15) can be interpreted as gas pumping with caps from both top and bottom, as illustrated in Fig. 1.

^4 It was shown in [16], [22], and [24] that when φ is universal—for example, φ is the Lempel-Ziv algorithm [25]—inf_{Q_{q_i}: D(X_i,q_i,Q_{q_i})≤D_i} R_φ(X_i, q_i, Q_{q_i}) converges to R^{q_i}_{X_i}(D_i) as n → ∞ even for general ergodic sources.

^5 Equation (15) can be derived in a manner similar to the proof of [26, Th. 10.3.3] on reverse water-filling for independent Gaussian random variables.
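Numerically, (15) and (16) amount to clipping a common water level d to the per-frequency interval [D(X_i, q_i), σ̂_i^2] and adjusting d until the clipped values sum to D_T; since the total is non-decreasing in d, bisection suffices. The sketch below assumes the floors D(X_i, q_i) and ceilings σ̂_i^2 are given as arrays with made-up example values; it illustrates the allocation rule only.

```python
import numpy as np

def distortion_profile(floors, ceilings, d_total, iters=100):
    """Solve (15)-(16): D_i = clip(d, floors[i], ceilings[i]) with sum(D_i) = d_total.
    Requires sum(floors) <= d_total <= sum(ceilings); d is found by bisection."""
    floors, ceilings = np.asarray(floors, float), np.asarray(ceilings, float)
    assert floors.sum() <= d_total <= ceilings.sum(), "target distortion not reachable"
    lo, hi = floors.min(), ceilings.max()
    for _ in range(iters):
        d = 0.5 * (lo + hi)
        if np.clip(d, floors, ceilings).sum() < d_total:
            lo = d
        else:
            hi = d
    d = 0.5 * (lo + hi)
    return d, np.clip(d, floors, ceilings)

# Example with assumed numbers: floors are D(X_i, q_i), ceilings are sigma_hat_i^2.
floors = [4.0, 2.0, 1.0, 0.5]
ceilings = [400.0, 90.0, 25.0, 4.0]
d, D = distortion_profile(floors, ceilings, d_total=40.0)
print("water level d =", round(d, 3), " profile =", np.round(D, 3))
```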

Fig. 1. Gas pumping illustration for D_i(Q).

Fig. 2. Reverse water-filling illustration for D_i^*.

Plugging (15) into (13), we have

$$
\begin{aligned}
&\inf_{Q}\;\inf_{\substack{\{D_i\}_{i=1}^L:\; \sum_{i=1}^L D_i = D_T\\ D_i \ge D(X_i,q_i)}} \sum_{i=1}^L \left[ R^{(SL)}_{X_i}(D_i) + \frac{c\ln n}{q_i\, n} + o\!\left(\frac{\ln n}{n}\right)\right]\\
&\quad= \inf_{Q}\left\{\sum_{i=1}^L R^{(SL)}_{X_i}(D_i(Q)) + \sum_{i=1}^L\left[\frac{c\ln n}{q_i\, n} + o\!\left(\frac{\ln n}{n}\right)\right]\right\}\\
&\quad= \inf_{Q}\;\sum_{i=1}^L\left[ R^{(SL)}_{X_i}(D_i(Q)) + \frac{c\ln n}{q_i\, n} + o\!\left(\frac{\ln n}{n}\right)\right]. \qquad (17)
\end{aligned}
$$

To continue, let us first investigate

$$\inf_{Q}\;\sum_{i=1}^L R^{(SL)}_{X_i}(D_i(Q)).$$

In view of (14) to (16), it is not hard to see that

$$
\begin{aligned}
\inf_{Q}\;\sum_{i=1}^L R^{(SL)}_{X_i}(D_i(Q)) &= \inf_{Q}\;\inf_{\substack{\{D_i\}_{i=1}^L:\; \sum_{i=1}^L D_i = D_T\\ D_i \ge D(X_i,q_i)}} \sum_{i=1}^L R^{(SL)}_{X_i}(D_i)\\
&= \inf_{\{D_i\}_{i=1}^L:\; \sum_{i=1}^L D_i = D_T}\; \sum_{i=1}^L R^{(SL)}_{X_i}(D_i)\\
&= \sum_{i=1}^L R^{(SL)}_{X_i}(D_i^*) \qquad (18)
\end{aligned}
$$

where

$$D_i^* = \begin{cases} d^* & \text{if } d^* \le \hat{\sigma}_i^2\\ \hat{\sigma}_i^2 & \text{otherwise} \end{cases} \qquad (19)$$

and d^* is chosen such that

$$\sum_{i=1}^L D_i^* = D_T. \qquad (20)$$

The solution D_i^*, 1 ≤ i ≤ L, can be interpreted as a kind of reverse water-filling,^6 as illustrated in Fig. 2.

^6 This reverse water-filling result is similar to the optimal distortion allocation for parallel Gaussian sources, for which the Shannon lower bound on rate distortion functions is tight [26].

Comparing D_i^* in (19) and (20) with D_i(Q) in (15) and (16), we define Q^* = (q_1^*, q_2^*, . . . , q_L^*) such that

$$q_i^* = \sup\{q_i : D(X_i, q_i) \le D_i^*\} \qquad (21)$$

for any 1 ≤ i ≤ L. In general, D(X_i, q_i) is a strictly increasing and differentiable function of q_i, which will be assumed in our subsequent derivations. Then we have

$$D(X_i, q_i^*) = D_i^*, \qquad (22)$$

which, together with (15), (16), (19) and (20), implies D_i(Q^*) = D_i^*, 1 ≤ i ≤ L. Indeed, among all Q satisfying (12) and D_i(Q) = D_i^*, 1 ≤ i ≤ L, Q^* is the largest in the sense that

$$\delta(Q) = \max\{q_i - q_i^* : 1 \le i \le L\} \le 0 \qquad (23)$$

for any Q satisfying (12) and D_i(Q) = D_i^*, 1 ≤ i ≤ L.

Go back to (17). Note that the first summation in (17) is a non-decreasing function of q_i, 1 ≤ i ≤ L, whereas the second summation in (17) is a strictly decreasing function of q_i. Nonetheless, when n is large enough, the first summation is dominating. Therefore, an optimal Q^o = (q_1^o, q_2^o, . . . , q_L^o) to (17) (or equivalently to (13)) would be the Q which tries to first minimize the first summation in (17) and then the second summation in (17) if there is room. In other words, we would expect that the optimal Q^o is very close to Q^*. This is formalized in Theorem 1 below and proved in Appendix A.

Theorem 1: Assume that D(X_i, q_i), 1 ≤ i ≤ L, is a strictly increasing and differentiable function of q_i. Then the optimal Q^o = (q_1^o, q_2^o, . . . , q_L^o) to (13) satisfies

$$|Q^o - Q^*| = O\!\left(\sqrt{\frac{\ln n}{n}}\right) \qquad (24)$$

where Q^* = (q_1^*, q_2^*, . . . , q_L^*) is defined through (19), (20), and (22).

Remark 1: So far we have made the assumption that the lossless coding method φ be universal and optimal, under which (6) is valid. One may wonder what would happen if φ is not. Careful examination of the derivations above and the proof in Appendix A reveals that a result similar to Theorem 1 remains valid if, instead of (6), the SDQ performance with

respect to φ can still be decomposed into the sum of first and second order terms:

$$\inf_{Q_{q_i}:\, D(X_i, q_i, Q_{q_i})\le D_i} R_\phi(X_i, q_i, Q_{q_i}) = R^{q_i}_{X_i}(D_i) + f(q_i, n) \qquad (25)$$

where f(q_i, n) is a strictly decreasing function of q_i for a given n and goes to 0 as n → ∞, i.e., f(q_i, n) = o(1). In this case, we have

$$|Q^o - Q^*| = o(1) \qquad (26)$$

instead of (24), where Q^o = (q_1^o, q_2^o, . . . , q_L^o) is the optimal solution to (13) with the high order terms in (13) replaced by f(q_i, n).

Theorem 1 and (26) suggest a new design principle for designing a quantization table: we could simply compute Q^* with the expectation that Q^o would be in a small neighborhood of Q^*, since n is generally large. To further simplify the computation of Q^* without involving the computation of differential entropy, we could also replace σ̂_i^2 by σ_i^2 in (19) and (20). In view of (8) and (11), it is reasonably believed that the new design principle would provide a good approximation to the optimal Q in the original problem (4), especially in the case of high rate (or equivalently small D_T) coding, where the Shannon lower bound is quite close to the actual Shannon rate distortion function (see [5], [27], and references therein for the tightness of the Shannon lower bound). In addition, since the determination of Q^* via (19), (20), and (22) is independent of any specific lossless coding method, the new design principle based on Theorem 1 and (26) could be applied to any practical image/video coding system where quantization table design is involved. In the next section, we will apply this design principle to some DCT-based image coding systems.

Algorithm 1. Optimal quantization table design for JPEG-type DCT-based encoding based on the Laplacian model.

IV. APPLICATION TO DCT-BASED IMAGE CODING

In JPEG and some DCT-based image coding such as [19] and [20], an image is first partitioned into non-overlapping 8 × 8-blocks, and each of the resulting 8 × 8-blocks is then transformed by an 8 × 8 DCT. Thus, in this case we have L = 64. To apply the design principle based on Theorem 1 and (26) to DCT-based image coding, there are two issues we need to look at: 1) how to model DCT coefficients at each frequency i, 1 ≤ i ≤ 64, and 2) how to calculate D(X_i, q_i) and q_i^* from (22) or equivalently (21).

For DC coefficients, which correspond to i = 1, it is hard to find a proper probability model because of their irregularity. As such, we simply model them as a uniformly distributed random source X_1 = {X_1(k)}_{k=1}^n. Accordingly, we have D(X_1, q_1) = q_1^2/12, where q_1 is the quantization step size for DC coefficients. For AC coefficients, several good models have been proposed in the literature, including the Laplacian model, the generalized Gaussian model, and the Cauchy model (see [28]–[30] and references therein). Due to its good trade-off between modeling accuracy and complexity, we shall focus only on the Laplacian model in what follows in this paper.
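Computationally, the design principle described above reduces to two small steps: reverse water-filling (19) and (20) for the target distortions D_i^* (with σ_i^2 standing in for σ̂_i^2, as suggested), followed by inverting the per-frequency distortion function via (22). The following sketch takes the distortion function as an argument; the toy variances, the target D_T, the uniform-quantizer distortion model q^2/12, and the search interval are assumptions made for illustration only.

```python
import numpy as np

def reverse_water_fill(variances, d_total, iters=100):
    """Eqs. (19)-(20): D_i* = min(d*, var_i) with sum(D_i*) = d_total (bisection on d*)."""
    v = np.asarray(variances, float)
    assert 0 < d_total <= v.sum(), "d_total must lie in (0, sum of variances]"
    lo, hi = 0.0, v.max()
    for _ in range(iters):
        d = 0.5 * (lo + hi)
        if np.minimum(d, v).sum() < d_total:
            lo = d
        else:
            hi = d
    d = 0.5 * (lo + hi)
    return d, np.minimum(d, v)

def invert_distortion(dist_fn, target, q_lo=1e-3, q_hi=256.0, iters=60):
    """Eq. (22): largest step size q with dist_fn(q) <= target, assuming dist_fn
    is strictly increasing in q (bisection over a continuous interval)."""
    if dist_fn(q_hi) <= target:
        return q_hi
    lo, hi = q_lo, q_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dist_fn(mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

# Toy example: pretend each frequency behaves like a uniform quantizer, D(q) = q^2 / 12.
variances = [900.0, 120.0, 40.0, 5.0]          # assumed source variances
d_star, D_star = reverse_water_fill(variances, d_total=60.0)
q_star = [invert_distortion(lambda q: q * q / 12.0, Di) for Di in D_star]
print("d* =", round(d_star, 3), " D* =", np.round(D_star, 3), " Q* =", np.round(q_star, 2))
```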

TABLE I. PSNR comparison of different Q-table design methods for baseline JPEG for 512 × 512 Airplane (F16).

TABLE II. PSNR comparison of different Q-table design methods for baseline JPEG for 512 × 512 Goldhill.

TABLE III. PSNR comparison of different Q-table design methods for baseline JPEG for 512 × 512 Lena.

TABLE IV. PSNR comparison of different Q-table design methods for baseline JPEG for 512 × 512 Dome.

According to the Laplacian distribution, AC coefficients X_i = {X_i(k)}_{k=1}^n at each frequency 2 ≤ i ≤ 64 are modelled with the probability density function (pdf)

$$f(x_i) = \frac{1}{2\lambda_i}\, e^{-\frac{|x_i|}{\lambda_i}}, \quad 2 \le i \le 64, \qquad (27)$$

where λ_i > 0 is called a scale parameter at the i-th frequency position in zig-zag order, and can be estimated, in practice, from the sample values x_i(1), x_i(2), . . . , x_i(n) of X_i, where n = W × H/64, and W and H denote the width and height of the image to be encoded, respectively. In particular, the maximum likelihood (ML) estimate of λ_i is calculated as follows:

$$\lambda_i = \frac{1}{n}\sum_{k=1}^{n} |x_i(k)|, \quad 2 \le i \le 64. \qquad (28)$$

With (27) and (28) in place, D(X_i, q_i) can be calculated easily, and so is q_i^* with D_i^* = D(X_i, q_i^*) in (22). Note, however, that D(X_i, q_i) is the smallest distortion achievable with quantization step size q_i for the source X_i, which is given by the uniform quantizer with step size q_i. If q_i^* with D_i^* = D(X_i, q_i^*) were chosen, then the uniform quantizer with step size q_i^* would have to be used to achieve D_i^*, which leaves no room for the subsequent R-D trade-off with the reconstruction space {0, ±1·q_i^*, ±2·q_i^*, . . . , ±⌊1 + A/q_i^*⌋·q_i^*}^n. To overcome this problem, we will reduce the value of q_i^* slightly by using a distortion function different from D(X_i, q_i) in (21) and (22). (Note that this is consistent with the design principle based on Theorem 1 and (26), since Q^o is expected to be in a small neighborhood of Q^* computed in accordance with D(X_i, q_i).) Specifically, instead of using D(X_i, q_i) in (21) and (22), we will use the distortion of a dead-zone quantizer with uniform reconstruction [31], [32]. Given the uniform reconstruction rule in JPEG and a quantization step size q_i, 2 ≤ i ≤ 64, the corresponding dead-zone size (for the positive part), s_i, is computed by [32]

$$s_i = q_i - \lambda_i + \frac{q_i}{e^{q_i/\lambda_i} - 1}. \qquad (29)$$

Let D_Lap(λ_i, q_i) denote the distortion of the resulting dead-zone quantizer with the quantization step size q_i for the Laplacian source X_i with the scale parameter λ_i, 2 ≤ i ≤ 64. It can be shown (after simplifying [32, eq. (9)]) that D_Lap(λ_i, q_i) can be computed as follows:

$$D_{Lap}(\lambda_i, q_i) = 2\lambda_i^2 - \frac{2 q_i\,(\lambda_i + s_i - 0.5\, q_i)}{e^{s_i/\lambda_i}\,(1 - e^{-q_i/\lambda_i})}. \qquad (30)$$

To compute our desired q_i^*, we then use D_Lap(λ_i, q_i) in place of D(X_i, q_i) in (21) and (22).

Based on the Laplacian model (27) and (28), our desired Q^* = (q_1^*, q_2^*, . . . , q_64^*) can be determined as follows. Pre-determine a maximum integer quantization step size q_max. If the water level d in (19) and (20) is greater than the source variance σ_i^2, i = 1, 2, . . . , 64, we directly quantize all coefficients at this frequency to zeros. This strategy is called fast quantization. We then set q_i^* = q_max.^7 Otherwise, q_i^* is selected such that

$$q_i^* = \begin{cases} \min\{\sqrt{12 d},\; q_{\max}\} & \text{if } i = 1\\ \max\{q_i \in Q : D_{Lap}(\lambda_i, q_i) \le d\} & \text{if } 2 \le i \le 64 \end{cases} \qquad (31)$$

where Q = {1, 2, . . . , q_max}. The maximization problem in (31) can be solved by a bisection search over Q for 2 ≤ i ≤ 64. The procedure is summarized in Algorithm 1.

^7 In this case, q_i^* is a dummy when HDQ is considered, since no matter what q_i^* is, the corresponding reconstruction level is always 0. However, this is not true in general for SDQ, as the quantized zero is the initialization for the iterative algorithm.
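Since the pseudo-code box of Algorithm 1 is not reproduced here, the sketch below is one possible reading of the procedure described in the text: the ML estimate (28), the dead-zone size (29), the dead-zone distortion (30), and the selection rule (31), with the bisection search carried out over the integer candidate set. The water level d is treated as an input quality knob, q_max = 46 follows the experimental setting in Section V, and the assumed input layout (one row of 64 zig-zag-ordered coefficients per block) is a convention chosen for the example.

```python
import numpy as np

Q_MAX = 46  # maximum step size, as used in the experiments of Section V

def ml_lambda(ac_samples):
    """Eq. (28): ML estimate of the Laplacian scale from the coefficients at one frequency."""
    return float(np.mean(np.abs(ac_samples)))

def dead_zone_size(lam, q):
    """Eq. (29): dead-zone size (positive part) under JPEG's uniform reconstruction rule."""
    return q - lam + q / (np.exp(q / lam) - 1.0)

def d_lap(lam, q):
    """Eq. (30): MSE of the dead-zone quantizer with step q on a Laplacian of scale lam."""
    s = dead_zone_size(lam, q)
    return 2.0 * lam ** 2 - 2.0 * q * (lam + s - 0.5 * q) / (np.exp(s / lam) * (1.0 - np.exp(-q / lam)))

def ac_step_size(lam, d):
    """Eq. (31) for an AC frequency: the largest integer q in [1, Q_MAX] with
    D_Lap(lam, q) <= d, found by bisection (D_Lap increases with q).
    Returns Q_MAX directly when d exceeds the source variance 2*lam^2 ('fast quantization')."""
    if d > 2.0 * lam ** 2 or d_lap(lam, float(Q_MAX)) <= d:
        return Q_MAX
    if d_lap(lam, 1.0) > d:
        return 1                      # even the smallest step overshoots; clamp (assumption)
    lo, hi = 1, Q_MAX                 # invariant: D_Lap(lo) <= d < D_Lap(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if d_lap(lam, float(mid)) <= d:
            lo = mid
        else:
            hi = mid
    return lo

def design_table(dct_coeffs, d):
    """dct_coeffs: array of shape (n_blocks, 64), coefficients in zig-zag order (assumed).
    Returns the 64 step sizes for water level d (treated here as a quality parameter)."""
    q_dc = max(1, min(int(round(np.sqrt(12.0 * d))), Q_MAX))   # DC: uniform model, q1 = sqrt(12 d)
    table = [q_dc]
    for i in range(1, 64):
        table.append(ac_step_size(ml_lambda(dct_coeffs[:, i]), d))
    return table
```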

TABLE V. PSNR comparison of different Q-table design methods for baseline JPEG for 720p Stockholm (1st frame).

TABLE VI. PSNR comparison of different Q-table design methods for baseline JPEG for 1080p Kimono (1st frame).

TABLE VII. PSNR comparison of different Q-table design methods for baseline JPEG for 2048 × 2560 Bike.

TABLE VIII. PSNR comparison of different Q-table design methods for baseline JPEG for 2048 × 2560 Woman.

TABLE IX. PSNR comparison of different Q-table design methods for ARL and ECEB along with JPEG 2000 for 512 × 512 Airplane (F16).

V. EXPERIMENTAL RESULTS

Having described our quantization table design algorithm based on the Laplacian model, i.e., Algorithm 1, we now evaluate its performance first in baseline JPEG encoding and then in ARL [19] and ECEB [20]. Experiments have been

conducted on a set of standard 8-bit grayscale test images with different resolutions. In all experiments, q_max is set to 46. In JPEG cases, customized Huffman tables are used. Both HDQ coding (i.e., SDQ off) and SDQ coding (i.e., SDQ on) have been tested. When SDQ is on, the iterative algorithm


TABLE X. PSNR comparison of different Q-table design methods for ARL and ECEB along with JPEG 2000 for 512 × 512 Goldhill.

TABLE XI. PSNR comparison of different Q-table design methods for ARL and ECEB along with JPEG 2000 for 512 × 512 Lena.

TABLE XII. PSNR comparison of different Q-table design methods for ARL and ECEB along with JPEG 2000 for 512 × 512 Dome.

TABLE XIII. PSNR comparison of different Q-table design methods for ARL and ECEB along with JPEG 2000 for 720p Stockholm (1st frame).

TABLE XIV. PSNR comparison of different Q-table design methods for ARL and ECEB along with JPEG 2000 for 1080p Kimono (1st frame).

in [8] has been further applied to provide a complete solution to the optimization problem (3). To facilitate our subsequent discussion, we shall refer to HDQ coding with quantization

table designed by Algorithm 1 as J-OptD-HDQ, and SDQ coding with its initial quantization table designed by Algorithm 1 as J-OptD-SDQ. Tables I-VIII show the


TABLE XV. PSNR comparison of different Q-table design methods for ARL and ECEB along with JPEG 2000 for 2048 × 2560 Bike.

TABLE XVI. PSNR comparison of different Q-table design methods for ARL and ECEB along with JPEG 2000 for 2048 × 2560 Woman.

TABLE XVII. Computer running time (in milliseconds) of different Q-table design methods and other encoding components for baseline JPEG encoding for 512 × 512 images.

peak signal-to-noise ratio (PSNR) performance of J-OptD-HDQ and J-OptD-SDQ for 512 × 512 Airplane (F16), 512 × 512 Goldhill, 512 × 512 Lena, 512 × 512 Dome, 720p Stockholm (the first frame of the corresponding testing video sequence in [14]), 1080p Kimono (the first frame of the corresponding testing video sequence in [14]), 2048 × 2560 Bike, and 2048 × 2560 Woman, respectively. Also shown in Tables I-VIII are the PSNR performance of the HDQ coding with quantization table designed by the methods in [1] and [3] (hereafter referred to as J-OptQ-HDQ and J-RDOPT-HDQ, respectively) and the PSNR performance of the SDQ coding with its initial quantization table designed by the methods in [1] and [3] (hereafter referred to as J-OptQ-SDQ and J-RDOPT-SDQ, respectively). As mentioned earlier, before our present work, J-RDOPT-HDQ represents the state-of-the-art JPEG HDQ optimizer, and J-RDOPT-SDQ represents the state-of-the-art JPEG SDQ optimizer [8]. In addition, the PSNR performance of baseline JPEG coding using a (scaled) default quantization table is listed in Tables I-VIII as an anchor. On the other hand, Table XVII shows the computer running times^8 of different quantization table design schemes along with the running times of other JPEG encoding components for compressing some 512 × 512 images. In Table XVII, J-OptQ, J-RDOPT, and OptD represent the quantization table design methods in the references [1] and [3], and our proposed

Algorithm 1, respectively. The results of our proposed schemes are bolded in all the tables in this paper. As can be seen from these tables, the experimental results for all tested schemes are highly consistent with our discussion in Section I and Section IV regarding both coding performance and computational complexity. When SDQ is off, our proposed J-OptD-HDQ significantly outperforms J-OptQ-HDQ and J-RDOPT-HDQ by roughly 0.7 dB^9 and 0.5 dB in PSNR on average, with complexity reduced by a factor of more than 150 and 2000, respectively. When SDQ is on, the proposed J-OptD-SDQ provides a 0.3 dB gain and a 0.2 dB gain or more on average over J-OptQ-SDQ and J-RDOPT-SDQ, respectively, with about 30% and 85% of the complexity reduced accordingly. Compared with baseline JPEG, our proposed J-OptD-HDQ offers an average 1.5 dB gain with negligible complexity increase, and our proposed J-OptD-SDQ provides an average 2.0 dB gain or more with a slight increase in computational complexity (due to the SDQ algorithm). Another interesting observation is that in both HDQ coding and SDQ coding, the performance gain from our proposed quantization table design scheme tends to be larger when the rate is higher, which is consistent with our statement in the last paragraph of Section III. This, together with the significant overall coding gain, validates our theoretical findings in Section III once again.

8 All experiments in this paper were run on an Apple Mac Pro 8-core 2.4GHz 12G RAM computer.

9 Generally, a 0.1 dB gain in PSNR is equivalent to 2-3% bit rate reduction in JPEG encoding.


TABLE XVIII. Computer running time (in milliseconds) of all encoding components for ARL and/or ECEB for 512 × 512 images.

The complexity overhead (on top of baseline JPEG) of J-OptD-HDQ mainly comes from the calculation of the variance σ_i^2 and the ML estimate of λ_i. In some very low/high rate coding cases, this complexity can be further reduced by skipping the variance calculation for some high/low frequencies, as σ_i^2 is then always smaller/larger than or equal to the water level d. For low rate encoding, the complexity introduced by J-OptD-HDQ can sometimes almost be compensated by the fast quantization strategy.

In what follows, we will discuss the related experiments on ARL and ECEB. In ARL cases,^10 both HDQ and SDQ coding have been tested. When SDQ is on, the SDQ algorithm proposed in [10] and its corresponding iterative algorithm have been applied. The HDQ coding with quantization table designed by Algorithm 1 is dubbed A-OptD-HDQ, and SDQ coding with its initial quantization table designed by Algorithm 1 is referred to as A-OptD-SDQ. Tables IX-XVI show the PSNR performance of A-OptD-HDQ and A-OptD-SDQ for all the test images in the same order as the JPEG cases, which are compared with those of ARL HDQ and SDQ coding with a (scaled) default JPEG quantization table (hereafter referred to as A-DefQ-HDQ and A-DefQ-SDQ, respectively). Tables IX-XVI also show the PSNR performance of ECEB^11 HDQ coding (no ECEB SDQ coding performance is shown, since there is no SDQ algorithm designed for ECEB) with a (scaled) default JPEG quantization table, a uniform quantization table (as in the original ECEB codec, i.e., all quantization step sizes in the quantization table are the same), and a quantization table designed by Algorithm 1, referred to as E-DefQ-HDQ, E-UnifQ-HDQ and E-OptD-HDQ, respectively. Also shown in Tables IX-XVI is the PSNR performance of JPEG 2000 [33] (with the 9/7 wavelet transform).^12 Computer running times of all encoding components for the related ARL or/and ECEB tests are given in Table XVIII. Again, OptD represents Algorithm 1 in Table XVIII.

On average, A-OptD-HDQ and A-OptD-SDQ significantly outperform their respective counterparts A-DefQ-HDQ and A-DefQ-SDQ by 1.5 dB and 0.7 dB in PSNR, and E-OptD-HDQ provides notable performance gains of 1.6 dB and 0.3 dB over E-DefQ-HDQ and E-UnifQ-HDQ, respectively. Interestingly, in comparison with JPEG 2000, E-OptD-HDQ actually provides on average a 0.3 dB gain in PSNR, as shown in Tables IX-XVI. In addition, according to Table XVIII, the complexity introduced by Algorithm 1 is negligible for both ARL and ECEB encoding.

^10 All ARL related tests are run on our implementation of [19].

^11 All ECEB related tests are run on the code kindly provided by the authors

of [19] and [20].

^12 JPEG 2000 tests are based on the JasPer codec [34]. Note that some of the rate points in our tests could not be achieved by the JasPer codec, which is indicated in the tables by "–".

VI. CONCLUSIONS

In this paper, the quantization table design problem has been revisited from a new perspective where SDQ is considered. Unlike traditional quantization table design, where an actual encoding method is assumed, we design a quantization table for the purpose of reconstruction. An optimal distortion profile for designing quantization tables has been derived under some assumptions, which provides a generic solution to the quantization table design problem for image/video coding. Based on our theoretical result, we have then proposed an efficient algorithm using the Laplacian model to optimize quantization tables for DCT-based image coding. When tested over standard images for baseline JPEG encoding, our algorithm achieves the best compression performance both when SDQ is on and when it is off, with almost no extra burden on complexity. As such, the proposed quantization table optimization algorithm, together with the SDQ algorithm in [8], shall be treated as a benchmark for evaluating future JPEG encoding algorithms. In addition, to further verify this generic algorithm for DCT-based image coding, we have also applied it to ARL [19] and ECEB [20], yielding significantly improved coding efficiency in both cases. Our algorithm requires a two-pass scan of the image to be encoded: the first pass performs the transform block by block and determines a desired quantization table, and the second pass performs the actual quantization and encoding; this may be a drawback for some online applications.

APPENDIX A

In this Appendix, we prove Theorem 1. For any Q satisfying (12), define

$$F(Q) = \sum_{i=1}^L R^{(SL)}_{X_i}(D_i(Q)) + \sum_{i=1}^L\left[\frac{c\ln n}{q_i\, n} + o\!\left(\frac{\ln n}{n}\right)\right]$$

and

$$D(Q) = (D_1(Q), D_2(Q), \ldots, D_L(Q)).$$

We want to compare F(Q^*) with F(Q). We distinguish among different cases: (i) δ(Q) < 0; (ii) δ(Q) ≥ C_1 √(ln n/n); (iii) C_2 √(ln n/n) ≤ δ(Q) < C_1 √(ln n/n); and (iv) 0 ≤ δ(Q) < C_2 √(ln n/n), where C_1 > 0 and C_2 > 0 are constants to be specified later.

In Case (i), we have D_i(Q) = D_i^* = D_i(Q^*), 1 ≤ i ≤ L, and hence F(Q) > F(Q^*).

In Case (ii), in view of the strictly increasing and differentiable property assumption about D(X_i, q_i), it follows from (15), (16), (19) and (20) that |D(Q) − D(Q^*)| ≥ c_1 C_1 √(ln n/n) for some constant c_1 > 0, where |D(Q) − D(Q^*)| denotes the Euclidean distance between D(Q) and D(Q^*). This, together with the convexity of R^{(SL)}_{X_i}(D_i) and the optimality of D_i^*, implies that

$$\sum_{i=1}^L R^{(SL)}_{X_i}(D_i(Q)) > \sum_{i=1}^L R^{(SL)}_{X_i}(D_i(Q^*)) + \hat{c}_1 c_1^2 C_1^2\, \frac{\ln n}{n}$$

for some constant ĉ_1 > 0, and hence

$$
\begin{aligned}
F(Q) - F(Q^*) &> \hat{c}_1 c_1^2 C_1^2\, \frac{\ln n}{n} + \sum_{i:\, q_i > q_i^*}\left[\frac{c\ln n}{q_i\, n} - \frac{c\ln n}{q_i^*\, n}\right] + o\!\left(\frac{\ln n}{n}\right)\\
&> \hat{c}_1 c_1^2 C_1^2\, \frac{\ln n}{n} - \sum_{i=1}^L \frac{c\ln n}{q_i^*\, n} + o\!\left(\frac{\ln n}{n}\right) > 0
\end{aligned}
$$

for large n, when C_1 > 0 is chosen properly.

In Case (iii), a similar argument can be used to show that

$$
\begin{aligned}
F(Q) - F(Q^*) &> \hat{c}_1 |D(Q) - D(Q^*)|^2 - \sum_{i:\, q_i > q_i^*}\left[\frac{c\ln n}{q_i^*\, n} - \frac{c\ln n}{q_i\, n}\right] + o\!\left(\frac{\ln n}{n}\right)\\
&> c_3\, \delta(Q)\left(c_2 |D(Q) - D(Q^*)| - \frac{\ln n}{n}\right) + o\!\left(\frac{\ln n}{n}\right)\\
&\ge c_3\, \delta(Q)\left(c_2 c_1 C_2 \sqrt{\frac{\ln n}{n}} - \frac{\ln n}{n}\right) + o\!\left(\frac{\ln n}{n}\right) > 0
\end{aligned}
$$

for some constants c_2 > 0, c_3 > 0, and large n, when C_2 > 0 is properly chosen.

Therefore, the optimal Q^o = (q_1^o, q_2^o, . . . , q_L^o) to (17) falls into Case (iv), i.e.,

$$0 \le \delta(Q^o) < C_2 \sqrt{\frac{\ln n}{n}},$$

which, together with (15), (16), (19) and (20), further implies that q_i^o ≥ q_i^* − O(√(ln n/n)) for any q_i^o < q_i^*, and hence

$$|Q^o - Q^*| = O\!\left(\sqrt{\frac{\ln n}{n}}\right).$$

This completes the proof of Theorem 1.

REFERENCES

[1] A. C. Hung and T. H.-Y. Meng, "Optimal quantizer step sizes for transform coders," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Apr. 1991, pp. 2621–2624.
[2] S.-W. Wu and A. Gersho, "Rate-constrained picture-adaptive quantization for JPEG baseline coders," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Apr. 1993, pp. 389–392.
[3] V. Ratnakar and M. Livny, "An efficient algorithm for optimizing DCT quantization," IEEE Trans. Image Process., vol. 9, no. 2, pp. 267–270, Feb. 2000.
[4] Information Technology—Digital Compression and Coding of Continuous-Tone Still Images: Requirements and Guidelines, document ISO/IEC 10918-1:1994, 1994.
[5] E.-H. Yang and Z. Zhang, "Variable-rate trellis source encoding," IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 586–608, Mar. 1999.
[6] E.-H. Yang and X. Yu, "Rate distortion optimization for H.264 interframe coding: A general framework and algorithms," IEEE Trans. Image Process., vol. 16, no. 7, pp. 1774–1784, Jul. 2007.
[7] E.-H. Yang and X. Yu, "Soft decision quantization for H.264 with main profile compatibility," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 1, pp. 122–127, Jan. 2009.
[8] E.-H. Yang and L. Wang, "Joint optimization of run-length coding, Huffman coding, and quantization table with complete baseline JPEG decoder compatibility," IEEE Trans. Image Process., vol. 18, no. 1, pp. 63–74, Jan. 2009.
[9] E.-H. Yang and L. Wang, "Full rate distortion optimization of MPEG-2 video coding," in Proc. IEEE Int. Conf. Image Process. (ICIP), Nov. 2009, pp. 605–608.
[10] E.-H. Yang and L. Wang, "Joint optimization of run-length coding, context-based arithmetic coding and quantization step sizes," in Proc. Can. Conf. Elect. Comput. Eng. (CCECE), vol. 4, May 2009, pp. 678–681.
[11] E.-H. Yang. (Nov. 2010). Computational Approach to Lossy Compression and Its Applications to Image/Video Coding. Chinese Univ. Hong Kong. [Online]. Available: http://www.inc.cuhk.edu.hk/seminars/computational-approach-lossy-compression-and-it%s-applications-imagevideocoding
[12] M. Karczewicz, P. Chen, Y. Ye, and R. Joshi, "R-D based quantization in H.264," Proc. SPIE, vol. 7443, p. 744314, Sep. 2009.
[13] Advanced Video Coding for Generic Audiovisual Services, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T, document ITU-T Recommendation H.264, 2005.
[14] High Efficiency Video Coding (HEVC) Text Specification Draft 8, Joint Collaborative Team on Video Coding (JCT-VC), document ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 2012.
[15] P. A. Chou, T. Lookabaugh, and R. M. Gray, "Entropy-constrained vector quantization," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 1, pp. 31–42, Jan. 1989.
[16] E.-H. Yang and S.-Y. Shen, "Distortion program-size complexity with respect to a fidelity criterion and rate-distortion function," IEEE Trans. Inf. Theory, vol. 39, no. 1, pp. 288–292, Jan. 1993.
[17] M. Crouse and K. Ramchandran, "Joint thresholding and quantizer selection for transform image coding: Entropy-constrained analysis and applications to baseline JPEG," IEEE Trans. Image Process., vol. 6, no. 2, pp. 285–297, Feb. 1997.
[18] K. Ramchandran and M. Vetterli, "Rate-distortion optimal fast thresholding with complete JPEG/MPEG decoder compatibility," IEEE Trans. Image Process., vol. 3, no. 5, pp. 700–704, Sep. 1994.
[19] C. Tu, J. Liang, and T. D. Tran, "Adaptive runlength coding," IEEE Signal Process. Lett., vol. 10, no. 3, pp. 61–64, Mar. 2003.
[20] C. Tu and T. D. Tran, "Context-based entropy coding of block transform coefficients for image compression," IEEE Trans. Image Process., vol. 11, no. 11, pp. 1271–1283, Nov. 2002.
[21] E.-H. Yang, C. Sun, and J. Meng, "Quantization table design revisited for image/video coding," in Proc. 20th IEEE Int. Conf. Image Process. (ICIP), Sep. 2013, pp. 1855–1859.
[22] E.-H. Yang, Z. Zhang, and T. Berger, "Fixed-slope universal lossy data compression," IEEE Trans. Inf. Theory, vol. 43, no. 9, pp. 1465–1476, Sep. 1997.
[23] E.-H. Yang and Z. Zhang, "The redundancy of source coding with a fidelity criterion. II. Coding at a fixed rate level with unknown statistics," IEEE Trans. Inf. Theory, vol. 47, no. 1, pp. 126–145, Jan. 2001.
[24] E.-H. Yang and J. C. Kieffer, "Simple universal lossy data compression schemes derived from the Lempel–Ziv algorithm," IEEE Trans. Inf. Theory, vol. 42, no. 1, pp. 239–245, Jan. 1996.
[25] J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Trans. Inf. Theory, vol. 24, no. 5, pp. 530–536, Sep. 1978.
[26] T. Cover and J. Thomas, Elements of Information Theory. New York, NY, USA: Wiley, 2006.
[27] T. Linder and R. Zamir, "On the asymptotic tightness of the Shannon lower bound," IEEE Trans. Inf. Theory, vol. 40, no. 6, pp. 2026–2031, Nov. 1994.
[28] R. Reininger and J. D. Gibson, "Distributions of the two-dimensional DCT coefficients for images," IEEE Trans. Commun., vol. 31, no. 6, pp. 835–839, Jun. 1983.
[29] E. Y. Lam and J. W. Goodman, "A mathematical analysis of the DCT coefficient distributions for images," IEEE Trans. Image Process., vol. 9, no. 10, pp. 1661–1666, Oct. 2000.
[30] F. Muller, "Distribution shape of two-dimensional DCT coefficients of natural images," Electron. Lett., vol. 29, no. 22, pp. 1935–1936, Oct. 1993.
[31] G. J. Sullivan, "Efficient scalar quantization of exponential and Laplacian random variables," IEEE Trans. Inf. Theory, vol. 42, no. 5, pp. 1365–1374, Sep. 1996.
[32] G. J. Sullivan and S. Sun, "On dead-zone plus uniform threshold scalar quantization," Proc. SPIE, vol. 5960, pp. 1041–1052, Jul. 2005.
[33] JPEG2000 Requirements and Profiles, document ISO/IEC JTC 1/SC 29/WG 1, 2000.
[34] M. Adams. JasPer JPEG 2000 Codec Version 1.900.1. [Online]. Available: http://www.ece.uvic.ca/~frodo/jasper, accessed Jan. 19, 2007.


En-hui Yang (M'97–SM'00–F'08) received the B.S. degree in applied mathematics from Huaqiao University, Quanzhou, China, in 1986, and the Ph.D. degree in mathematics from Nankai University, Tianjin, China, in 1991. He has been with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada, since 1997, where he is currently a Professor and the Canada Research Chair in Information Theory and Multimedia Compression. He held a Visiting Professor position with the Chinese University of Hong Kong, Hong Kong, from 2003 to 2004, Research Associate and Visiting Scientist positions with the University of Minnesota, Minneapolis, MN, USA, the University of Bielefeld, Bielefeld, Germany, and the University of Southern California, Los Angeles, CA, USA, from 1993 to 1997, and a faculty position first as an Assistant Professor and then an Associate Professor with Nankai University from 1991 to 1992. He is the Founding Director of the Multimedia Communications Laboratory with the Leitch-University of Waterloo, and the Co-Founder of SlipStream Data Inc., Waterloo. He currently serves as an Executive Council Member of the China Overseas Exchange Association and an Overseas Advisor of the Overseas Chinese Affairs Office of the City of Shanghai, and serves on the Overseas Expert Advisory Committee of the Overseas Chinese Affairs Office of the State Council of China, and on a Review Panel for the International Council for Science. His current research interests are multimedia compression, multimedia transmission, digital communications, information theory, source and channel coding, image and video coding, image and video understanding and management, big data analytics, and security.

Dr. Yang was a recipient of several research awards and honors, including the prestigious Inaugural Premier's Catalyst Award for the Innovator of the Year in 2007, the 2007 Ernest C. Manning Award of Distinction, one of Canada's most prestigious innovation prizes, the 2013 CPAC Professional Achievement Award, and the 2014 Padovani Lecture Award. Products based on his inventions and commercialized by SlipStream received the 2006 Ontario Global Traders Provincial Award. With over 200 papers and over 200 patents/patent applications worldwide, his research work has had an impact on the daily life of hundreds of millions of people in over 170 countries. He is a fellow of the Canadian Academy of Engineering and the Royal Society of Canada, the Academies of Arts, Humanities, and Sciences of Canada. He served, among many other roles, as the General Co-Chair of the 2008 IEEE International Symposium on Information Theory, an Associate Editor of the IEEE TRANSACTIONS ON INFORMATION THEORY, the Technical Program Vice Chair of the 2006 IEEE International Conference on Multimedia and Expo, the Chair of the Award Committee for the 2004 Canadian Award in Telecommunications, a Co-Editor of the 2004 Special Issue of the IEEE TRANSACTIONS ON INFORMATION THEORY, and the Co-Chair of the 2003 U.S. National Science Foundation Workshop on the Interface of Information Theory and Computer Science, and the 2003 Canadian Workshop on Information Theory.


Chang Sun received the B.S. and M.S. degrees in electrical engineering from the Shandong University of Science and Technology, Qingdao, China, and Shandong University, Jinan, China, in 2005 and 2008, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Waterloo, Waterloo, ON, Canada, in 2014. His research interests include multimedia processing, compression and transmission, and information theory. Dr. Sun was a recipient of the National Fellowship of National Ministry of Education of China in 2005.

Jin Meng received the B.Eng. degree in information and electronics engineering from Tsinghua University, Beijing, China, and the M.A.Sc. and Ph.D. degrees in electrical and computer engineering from the University of Waterloo, Waterloo, ON, Canada. He currently holds a post-doctoral position with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada.

