IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 2, FEBRUARY 2015


Modeling the Subjective Quality of Highly Contrasted Videos Displayed on LCD With Local Backlight Dimming

Claire Mantel, Member, IEEE, Søren Bech, Jari Korhonen, Member, IEEE, Søren Forchhammer, Member, IEEE, and Jesper Melgaard Pedersen

Abstract—Local backlight dimming is a technology aiming at both saving energy and improving visual quality on television sets. As the rendition of the image is specified locally, the numerical signal corresponding to the displayed image needs to be computed through a model of the display. This simulated signal can then be used as input to objective quality metrics. The focus of this paper is on determining which characteristics of locally backlit displays influence quality assessment. A subjective experiment assessing the quality of highly contrasted videos displayed with various local backlight-dimming algorithms is set up. Subjective results are then compared with both objective measures and objective quality metrics using different display models. The first analysis indicates that the most significant objective features are temporal variations, power consumption (probably representing leakage), and a contrast measure. The second analysis shows that modeling of leakage is necessary for objective quality assessment of sequences displayed with local backlight dimming.

Index Terms—Displayed image quality, backlight dimming, video quality, contrast, LED backlight, light leakage, liquid crystal display (LCD).

I. INTRODUCTION

THERE are many steps in the processing of a video sequence from acquisition to display, and each of them influences the quality. Up until now, research in image and video quality assessment has mainly focused on evaluating the effect of transmission and compression artifacts on quality. Displays have usually been considered as 'free from artifacts', meaning that their impact on quality was neglected. This is for example reflected in the ITU or VQEG recommendations [1], [2], where minimum characteristics are given for

Manuscript received March 13, 2014; revised July 23, 2014 and November 4, 2014; accepted December 2, 2014. Date of publication December 18, 2014; date of current version January 8, 2015. This work was supported by the Danish Council for Strategic Research under Grant 09-067034. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Damon M. Chandler. C. Mantel, J. Korhonen, and S. Forchhammer are with the Department of Photonics Engineering, Technical University of Denmark, Kongens Lyngby 2800, Denmark (e-mail: [email protected]; [email protected]; [email protected]). S. Bech is with Bang and Olufsen A/S, Struer 7600, Denmark, and also with the Department of Electronic systems, Section of Signal and Information Processing, Aalborg University, Aalborg 9100, Denmark (e-mail: [email protected]). J. M. Pedersen is with Bang and Olufsen A/S, Struer 7600, Denmark (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2014.2383319

Fig. 1. Image produced with two backlight dimming algorithms: average luminance of the segment (left) and with all LEDs at full intensity (right).

displays so that they can be considered 'good enough' to have no significant influence on the visual quality. The arrival of Liquid Crystal Displays (LCDs) with Local Backlight Dimming (LBD) makes the impact of the screen on quality more complex, as the screen can play a more active role in the rendition.

An LCD is composed of a light source, called the backlight, providing light to a grid of pixels, made of Liquid Crystal cells (LCs), that form the image [3]. The backlight is today commonly provided by LED segments, whose light is spread through a diffuser to smooth it. The LBD technology consists of controlling each of the LED segments independently from one another and adapting their intensity to the image content. Compared to a conventional backlight, where the whole screen is covered by a single segment at full intensity, this allows both energy savings and quality improvement. Indeed, the backlight is the main energy-consuming unit of a display, and lowering the backlight luminance in the dark areas of an image can reduce power consumption significantly. It can also improve the image quality, as one of the main defects of LCDs is that black areas sometimes look grayish. The two major spatial artifacts due to the backlight intensity are leakage, when LCs receive more light than they can block and reach a higher luminance than desired [3] (thus creating those grayish areas), and clipping, when LCs do not receive enough light to achieve the targeted luminance. An example of leakage and clipping defects is shown in Fig. 1.
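The tradeoff between these two artifacts can be illustrated numerically. The following is a minimal sketch under our own conventions (the leakage value is purely illustrative, anticipating the display model of Sec. III): the rendered luminance is the product of the backlight and the LC transmittance, and the transmittance is bounded below by the leakage factor and above by 1.

```python
import numpy as np

def render_pixel(target, backlight, leakage=0.005):
    """Relative rendered luminance of one pixel (all values in [0, 1]).

    The LC cell tries to compensate the backlight (target / backlight),
    but its transmittance is bounded below by the leakage factor and
    above by 1. The leakage value 0.005 is illustrative only.
    Assumes backlight > 0.
    """
    transmittance = np.clip(target / backlight, leakage, 1.0)
    return backlight * transmittance

# Bright pixel over a dimmed segment: clipping (output below target).
print(render_pixel(0.9, 0.5))   # 0.5
# Black pixel over a bright segment: leakage (output above target).
print(render_pixel(0.0, 1.0))   # 0.005
# In-range pixel: exact compensation.
print(render_pixel(0.3, 0.5))   # 0.3
```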

1057-7149 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


There are two types of LED backlight structure: the LED segments can be located in a 2D arrangement behind the screen, for a direct backlight, or they can be located on the left and right sides of the screen, for an edge-lit backlight. The experiments in this paper are based on an edge-lit backlight. Stricter power regulations for TVs and better quality have favored the increase in market share of LCDs with LBD, and the LBD market is dominated by edge-lit displays as opposed to displays with direct backlight.

The resolution of the LED segments is much lower than that of the LC pixel grid. Each LED segment sheds light on several LC pixels and, inversely, each LC cell receives light from different segments. Thus, for videos containing neighboring bright and dark areas, there is no ideal set of LED intensities that satisfies the requirements of all pixels: there will be clipping, leakage or both. For this reason, LBD is bound to impact the quality of the rendered image and has to be taken into account for quality assessment. Brunnström et al. have for example shown that LBD impacts the perception of motion blur in videos [4].

Another issue regarding quality assessment on LED-LCDs is the absence of a standardized way to compute the LED intensities defining the backlight for a given image. Several algorithms have been developed to exploit the possibilities offered by local backlight. Using simple image statistics [3], multiple histograms of the image [5] or a model of the display [6], [7], an algorithm decides on the intensity of each LED according to its own strategy. For exactly the same input image, the rendition on the display will thus differ and vary in quality. Therefore, rendition with backlight dimming truly is an additional step in the quality assessment chain. To render an image with LBD, the display uses both the LED intensities and the LC values that, combined together, will create the rendered image.
This combination takes place physically on the screen, so there is no available numerical signal representing the image as it is displayed. In order to investigate the distortions created by the display and the LBD, the display needs to be modeled to create a representation of the actual image. Previous studies have shown that it is possible to model the displayed image [8], [9]; the model output can then, e.g., serve as input to objective metrics.

The performance of LBD algorithms has mostly been evaluated objectively through contrast measures, which do not accurately represent the perceived quality. Furthermore, no dedicated objective metric exists. The authors have investigated the efficiency of 'classic' image quality metrics on images displayed with various LBD algorithms [10]. Several metrics performed satisfactorily for bright images but had more difficulty with contrasted dark images. Such images are indeed the most challenging for LBD, as there is no perfect display solution: a tradeoff must be chosen between providing enough light to bright pixels (i.e. avoiding clipping) and not providing too much light to dark areas so they look truly black (i.e. avoiding leakage).

Many characteristics of a display could play a role in the resulting quality of an image or video. The aim of this paper is to investigate which features of a sequence displayed on an LCD with LBD are the most relevant for quality assessment and how to best utilize that information in relation to quality


assessment. We first set up a subjective experiment to obtain subjective evaluations of videos displayed on an LCD with LBD. We then used two different methods to predict the subjective results: statistical modeling using objective features, and objective quality metrics applied with different display models. The first approach allows us to assess which features extracted from the simulated signal are relevant for subjective quality assessment. The second investigates the importance of leakage modeling for a quality metric, or more precisely what accuracy is needed for the leakage factor in terms of value and angle dependence.

This paper is organized as follows: prior art on the role of the display for image quality assessment is presented in Sec. II. Thereafter, the LCD model used is described in Sec. III. Then, the subjective experiment and the analysis of its results are detailed in Sec. IV. In Sec. V, the subjective results are compared to objective measures and quality metrics.

II. PREVIOUS WORK

In the literature, LBD algorithms are validated either through measures of the physical contrast or by applying an objective metric to a model of the image [3], [5], [6], [11]. As contrast is not fully representative of quality, a global approach using a model should be more general. Most existing models of the displayed image only implement the gamma correction of the screen. The model proposed in [9] was the first to mathematically model the leakage defect. It was further developed to simulate the luminance of rendered images as it is perceived by the Human Visual System [7], [8]. The objectives of this paper are to evaluate, firstly, which features of the rendered videos are the most relevant for quality assessment and, secondly, which leakage modeling, if any, works best towards predicting quality.
In order to apply an objective quality metric to images simulated with a display model, the metric needs to be validated for LBD defects with subjective data representing the ground truth. To the best of our knowledge, the only published studies comparing subjective and objective quality of images displayed with LBD are the aforementioned [10] and a preliminary analysis of the subjective experiment described in this paper [12]. The preliminary analysis examined the use of leakage models for objective quality metrics; it is extended here and completed with an investigation of the role of objective features for quality assessment. Therefore we also consider work not specifically related to LBD but concerned with modeling displays for objective quality assessment, with investigating the influence of some aspects of displayed videos on quality and with integrating those aspects into quality evaluations.

In [13], Huang et al. modeled two properties of LCDs, the reflection of the ambient light and the gamma correction, to simulate the rendered images and use the simulations as input to quality metrics. The authors showed that modeling those aspects can enhance the performance of quality metrics. However, they modeled these properties generically for all displays and did not account for leakage. As shown in [10], 'regular' image quality metrics can be used (with some restrictions) to assess images with LBD. In this paper the


approach is not to use a display model to improve metrics but to use metrics to examine the influence of leakage on quality (e.g. in terms of contrast) and to find out which leakage model works best. In [14] and [15], the authors investigated the impact of some features on the perceived quality of LCDs with globally dimmed backlights. In [14], the authors showed that the mean displayed luminance plays an important role for quality, whereas the ambient light does not. In [15], Seetzen et al. studied the influence of the contrast ratio, the peak luminance and the range of luminance on quality. The authors then provided models of the quality as a function of each feature, in order to have guidelines to set those parameters. Both papers focus on global dimming and do not account for one of the most important aspects of local backlight dimming: the spatial adaptation to the local image content. In addition to focusing on LBD, we shall in this paper use a model of the display to evaluate the relevance of several features for quality assessment with regard to one set of subjective quality grades.

We shall use Partial Least Squares (PLS) regression to predict subjective quality with objective features, in order to identify which features are relevant for quality assessment. PLS is a statistical analysis method that builds a linear model predicting a response variable using a set of potentially numerous and correlated predictors [17]. PLS analysis was applied in the context of video quality assessment by Keimel et al. to predict the quality of H.264-encoded videos using objective features extracted from the bitstream [16]; the goal in [16] was to use PLS with various features to design an objective quality metric.


Fig. 2. Point spread function of one LED segment.

III. MODELING OF LCD WITH LOCAL LED BACKLIGHT

In [9] a model of LCD with LBD was presented. It expresses the relative rendered luminance x as a function of the backlight luminance b, the LC transmittance f_C and the target luminance y for each pixel i:

x(i) = b(i) × f_C(y(i), b(i)).   (1)

All variables are expressed in the physical (linear) domain and the variables x(i), y(i) and b(i) are normalized between 0 and 1. The value 1 represents the highest luminance the screen can reach, so the luminance expressed here is actually a relative luminance (i.e. luminance normalized by the peak white of the display). As the Human Visual System does not perceive physical luminance linearly [18], the signal can also be expressed in a perceptually uniform domain. The model presented in [9] was entirely applied in the physical domain. In [7], the physical model (1) was combined with such a non-linear mapping when evaluating the image distortion. For simplicity, the luminance compression of the photoreceptors of the Human Visual System is often approximated with a Gamma function [18] and this relation is used to convert physical values to Gamma corrected domain values (we shall use γ = 2.2 [7]). Let the superscript ψ denote values in the Gamma corrected domain. The resulting luma x^ψ(i) of a pixel i in the Gamma corrected domain, displayed on an LCD-LED screen, is expressed as

x^ψ(i) = (b(i) × f_C(y(i), b(i)))^(1/γ),   (2)

where f_C and b(i) are both in the physical (linear) domain.

A. Backlight Computation

In backlight architectures, the light from each LED segment is diffused and mixed with the light coming from the other segments on the diffuser plate, located behind the LC cells. The diffusion of light from an individual light source may be described by a point spread function (PSF) that depends on the optical properties of the diffuser plate. The PSF of one LED segment measured on the display used here (edge-lit, with a structure of 2 columns and 8 rows) is depicted in Fig. 2. The backlight relative luminance, b, can be modeled by multiplying the normalized PSF and the LED level r, and then summing the contributions from all the backlight elements [7], [9],

b(i) = Σ_{k=1}^{N} r_k · h_k(i),   (3)

where N is the number of segments, r_k is the intensity of segment k and h_k(i) is the value of the PSF of segment k at pixel i.

B. Compensation

The LC compensation values f_C depend both on the input image y and the backlight value b. Ideally, at each pixel i, the compensation should be:

f_C(y(i), b(i)) = (y^ψ(i))^γ / b(i) = y(i) / b(i),   (4)

where y^ψ(i) is the target luma of the input image at pixel i in the Gamma corrected domain (also normalized between 0 and 1). The model is expressed here for luminance. For color videos, compensation is applied on each sRGB color component separately (the backlight luminance being the same for the three color channels). As LCs cannot entirely block the light, they suffer from light leakage, characterized by a leakage factor ε, computed as the ratio between the luminance passing through the LCs when they are fully shut and the backlight luminance. This implies that ε is the lowest value f_C can take. LC cells are polarized filters controlled to let a certain amount of light through. As they do not create light, the highest value



f_C can reach is 1. If the target luminance y(i) is higher than the backlight luminance b(i), then pixel i is clipped, i.e. the displayed luminance x(i) is lower than y(i). To summarize, when using hard clipping as here, the output luma at pixel i, x^ψ(i), is equal to

x^ψ(i) = (b(i))^(1/γ) × [ y^ψ(i) / (b(i))^(1/γ) ]_{⊥ε^ψ}^{⊤1},   (5)

where ⊥ and ⊤ denote lower and upper thresholding, respectively ([7], [9]), and ε^ψ is the leakage factor in the Gamma corrected domain (ε^ψ = ε^(1/γ)). Finally, in our setting both the backlight intensities r_k (3) and the clipped LC values (5) are quantized to 8 bits. The actual output luma at pixel i, x^ψ(i), is therefore

x^ψ(i) = ((b(i))_Q8)^(1/γ) × ([ y^ψ(i) / ((b(i))_Q8)^(1/γ) ]_{⊥ε^ψ}^{⊤1})_Q8,   (6)

where (b(i))_Q8 is the backlight relative luminance resulting from inserting the quantized r_k values in (3).

C. Leakage in LCD Screens

The leakage of an LC cell highly depends on the angle it is viewed from: the higher the angle, the more light leaks through. Leakage measured at different backlight values and from several angles provided the ground truth for our leakage model. We shall consider two versions of this leakage model (ε in (5)-(6)): one with uniform leakage (i.e. identical for all pixels) and a second with values varying horizontally as a function of the angle. This second version is relevant because, at the recommended viewing distance of 3 times the height of the display, the horizontal viewing angle ranges from −16.5° to +16.5°. Both versions will be tested in the experiments to evaluate which precision is needed for leakage modeling.

IV. THE SUBJECTIVE EXPERIMENT

A viewing experiment was conducted to collect subjective ratings of videos displayed on an LCD with LED backlight. The primary purpose of the experiment was to generate a data set to analyze the relations between the perceived quality and various objective measures. Subjective grades were collected for seven backlight dimming algorithms applied on four test sequences. The experiment and a statistical analysis of the results are detailed in this section.

A. Description of the Experiment

In this subjective test, the independent factors (i.e. controlled variables) are Algorithm, Sequence, Viewing angle and Subject. The dependent factor (i.e. measured variable) is the perceived visual quality. Seven algorithms were chosen to compute the backlight dimming for each frame of each video; the first six are: 'Gradient-Descent' with two leakage factors [7], 'Albrecht' [6], 'Cho13' [11], 'Zhang' [5] and 'Full', which sets all LEDs at full intensity. A number of backlight methods in the literature compute the LED value of each segment k based on the maximum, max_k, or average, avg_k, value

Fig. 3. A frame extracted from three of the four sequences used in the experiment.

of the segment [3], or apply a combination of the two. The seventh algorithm chosen for the experiment [19] (referred to as MinMax) therefore represents this type of algorithm:

LED_k = min(0.6 × max_k + 0.8 × avg_k, max_k),   (7)

where LED_k is the intensity for segment k. Each of these algorithms was implemented for the edge-lit display used here. The LC and LED values were calculated off-line, prior to the experiment. They were then displayed with a special playback mode which allows controlling both the LEDs and LCs of the experimental display in real time. All the algorithms focus more on the quality of the displayed image than on energy saving. This implies that the differences between the various renderings are rather small, which makes both objective and subjective quality assessment challenging.

A too fast variation of the LED intensity from one frame to the next may produce a flash known as the flicker artifact. As the algorithms tested are designed for images rather than videos, a filter was applied to remove flicker when necessary [20].

A previous experiment [10] showed that the impact of LBD is mainly visible on dark images with high contrast, so we focused on these characteristics for the sequence selection. Four sequences were selected: 'Stars', 'Titles' (both from [21]), 'Volcano' [22] and 'Uboat'; a frame from each of the first three is depicted in Fig. 3 (the fourth cannot be shown here for copyright reasons). All sequences (duration 5-8 s) are available in perfect or extremely good quality (i.e. Blu-ray). They have a resolution of 1920 × 1080 and were shown at a framerate of 25 fps.

To also evaluate the role of the viewing angle, the experiment was split into two parts: in both cases the participants are positioned at the same location, but in the first part the screen is rotated 15° anti-clockwise such that the viewing angle is 15°, while in the second the screen is straight in front of the participants (at 0°). Subjects viewed the sequences on a 46" LC display with an edge-lit backlight composed of 16 segments: 8 rows and 2 columns. They sat at a distance of 3 times the height of the display [23].
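The MinMax rule of Eq. (7) can be sketched per segment as follows. The function name and the use of relative luminances in [0, 1] are our own conventions for illustration, not taken from [19]:

```python
import numpy as np

def minmax_backlight(segment_pixels):
    """LED intensity for one segment following Eq. (7):
    min(0.6 * max_k + 0.8 * avg_k, max_k).

    segment_pixels: relative luminances (in [0, 1]) of the pixels
    lit by segment k. The combination is capped at the segment
    maximum, so the backlight never exceeds what the brightest
    pixel requires.
    """
    seg_max = float(np.max(segment_pixels))
    seg_avg = float(np.mean(segment_pixels))
    return min(0.6 * seg_max + 0.8 * seg_avg, seg_max)

# A mostly dark segment with one bright pixel keeps a high LED level:
print(minmax_backlight(np.array([0.1, 0.2, 1.0])))
# A uniformly dark segment is capped at its own maximum:
print(minmax_backlight(np.array([0.1, 0.1, 0.1])))  # 0.1
```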
The test sessions took place in a room with low indirect lighting and dark walls (ambient light level: 2 lux, measured on the screen) and the peak white of the display was 485 cd/m². Each session lasted 20 min on average, with a maximum of 25 min. The test setup was chosen to challenge the backlight dimming algorithms, by selecting challenging sequences shown on a display with only 16 controllable segments, and especially so at 15°. Sixteen subjects participated in the first part of the experiment (15°); three weeks later, another 16 subjects (most of them the same as in the first part) participated in the 0° part. All participants were naive regarding the


Fig. 4. Median and 95% confidence intervals of subjective ratings as a function of Algorithm for individual sequences and viewing angles. The algorithms are numbered as follows: 1 = Full, 2 = Albrecht, 3 = Cho13, 4 = MinMax, 5 = Gradient descent 1, 6 = Gradient descent 2, and 7 = Zhang.

purpose of the experiment; however, they are employees of Bang & Olufsen working on video processing, and as such can be considered expert viewers. Subjects reported their impressions using a small PC monitor (lying flat on a small table in front of them, with a very low light level) with an interface where seven buttons, labeled 1-7, could be activated with a mouse. Clicking a button activated one of the seven algorithms. Subjects were asked to report the ranking of each algorithm by placing a cursor appropriately on a horizontal line scale next to each button, with end-point labels 'worst' and 'best', respectively. A session started by handing the subject written instructions for the experiment; after approximately 5 minutes, a verbal instruction and a demonstration of the usage of the GUI were provided. The first sequence was then started and the subject could switch at will between the seven algorithms and position the cursors according to their ranking of perceived quality. When a new algorithm was activated, the running sequence restarted from the beginning. All algorithms could be activated as often as needed to make an assessment. Once all algorithms had been activated, the next sequence could be started and the above procedure repeated. The allocation of algorithms to button numbers was random and changed for each sequence, and the presentation order of sequences followed a balanced Latin Square design.

B. Statistical Analysis of Results

Firstly, a statistical analysis was applied to evaluate to what extent the subjective data represented a variation of quality due to the independent variables Sequence, Viewing angle and Algorithm. The data collected consisted of rank orderings of the stimuli on a continuous scale and were measured in millimeters from the left end-point (the one labeled 'worst'). The results can be summarized as shown in Fig. 4. This format can be


treated both as quantitative and categorical data. We therefore proceeded with both a traditional analysis of variance (ANOVA [24]) of the interval data and a non-parametric analysis based on a Logit transformation of the categorical rank-ordered responses [24]. For the non-parametric analysis, the data are converted from quantitative to categorical by binning the scale into ten equal-length intervals.¹ A statistical model including the fixed factors Algorithm, Sequence and Viewing angle and the random factor Subject, plus second-order interactions, was used for both approaches. The ANOVA shows that the factors Algorithm (A), Sequence (S) and Viewing angle (Va), plus the interactions A&S, A&Subject, S&Va and Va&Subject, are statistically significant (with error probability p < 0.05). For the non-parametric approach, the factors Algorithm, Sequence, Viewing angle and Subject, plus all second-order interactions, are statistically significant (p < 0.05). If the interactions that include the Subject factor are disregarded for the present purpose, an interesting result follows: both models have significant interactions between Algorithm & Sequence and between Sequence & Viewing angle. Both types of analysis thus suggest that the data can serve as a basis for modeling the effect of the fixed main factors on the visual quality, but also the effect of the backlight dimming algorithm depending on the sequence (A&S interaction) and of the viewing angle depending on the sequence (S&Va interaction). In addition, the non-parametric analysis includes a significant interaction between Algorithm & Viewing angle. The aim in the remainder of the paper will be to model these trends using different objective measures.

V. USING RESULTS FOR OBJECTIVE QUALITY ASSESSMENT

In this section, the subjective data are exploited to investigate which aspects of displays with LBD are relevant for quality assessment.
To begin with, the processing applied for both feature and metric calculation is presented in Sec. V-A and the principle of Partial Least Squares (PLS) regression is explained in Sec. V-B. The analysis is then twofold: firstly, the link between subjective quality grades and several objective measures (or features) is studied in Sec. V-C; secondly, the importance of the leakage model for objective quality metrics is examined in Sec. V-D.

A. Features and Metrics Computation

The subjective assessments evaluate the quality of the videos as they are rendered on the display, and the backlight dimming is the only variation between the different versions of a sequence. Therefore, modeling the display process, including the backlight dimming, is necessary to provide quality metrics and objective measures with the information essential to account for the characteristics of the stimuli. The displayed frames were simulated for each backlight algorithm and each sequence and therefore accounted for the two types

¹A rule of thumb is that a line scale can be considered to hold interval scale properties if it includes 11 or more categories [25]. Thus 10 categories represent the start of the transition from a category to an interval scale.



TABLE I
LIST AND PROPERTIES OF FEATURES COMPUTED

of defects potentially present in the displayed videos: clipping and leakage. The model described in Sec. III simulated the rendered sequences, which were then used as inputs to both feature measurements and full-reference quality metrics.

The objective image quality metrics tested here are MSE (computed on all pixels and on pixels presenting leakage only), PSNR, SSIM, IWSSIM [26], HDR-VDP-2 [27] and PSNR-HVS-M [28]. MSE, PSNR, SSIM, IWSSIM and PSNR-HVS-M were calculated using the modeled images in the Gamma corrected domain (x^ψ(i) from Eq. 6) along with the target input y^ψ(i), which serves as reference. MSE was computed on the three sRGB components, whereas the other metrics were computed on the luma. HDR-VDP-2 takes as input absolute luminance values, which were computed using the relative luminances x(i) and y(i) scaled to their absolute values using the peak white of the display (485 cd/m²). As the aim was comparison with subjective scores, the QMOS predictor of HDR-VDP-2 was used. The video quality metrics tested are VQM [29], MOVIE [30] and STMAD [31]. As image metrics are computed separately for each frame, 11 temporal pooling methods were applied to turn each set of instantaneous measures into single values representing the whole sequence:
• 5 types of average (over all frames, the worst 10%, the best 10%, the first 2 seconds and the last 3 seconds),
• Minkowski summation with powers 2, 3, 4 and 5 [32],
• the low-pass FIR filter described by Hamberg and de Ridder [33], and
• the asymmetrical pooling introduced by Ninassi et al. in [34], which consists in adding the average of the feature and a term representing its variation over time that favors distortion decrease (compared to distortion increase).
Twenty features were also computed to characterize the viewing experience as broadly as possible.
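Two of the pooling methods above can be sketched as follows. This is a sketch under our own naming conventions; for the "worst 10%" average we assume that larger per-frame values indicate stronger distortion (e.g. per-frame MSE):

```python
import numpy as np

def minkowski_pool(per_frame, p):
    """Minkowski summation of per-frame measures with power p [32]."""
    x = np.asarray(per_frame, dtype=float)
    return float(np.mean(np.abs(x) ** p) ** (1.0 / p))

def mean_worst_fraction(per_frame, frac=0.1):
    """Average over the worst fraction of frames, assuming larger
    values mean more distortion (the polarity is our assumption)."""
    x = np.sort(np.asarray(per_frame, dtype=float))
    k = max(1, int(round(frac * x.size)))
    return float(np.mean(x[-k:]))

mse_per_frame = [0.2, 0.1, 0.9, 0.3, 0.1, 0.2, 0.8, 0.2, 0.1, 0.3]
print(minkowski_pool(mse_per_frame, 4))    # emphasizes the worst frames
print(mean_worst_fraction(mse_per_frame))  # 0.9 (worst 10% of 10 frames)
```

Higher Minkowski powers weight the worst frames more strongly, which is why several powers (2 to 5) were tried as alternative poolings.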
The features were first computed for each frame (or between each consecutive pair of frames), as listed in Table I; then the same 11 temporal pooling methods as for the image metrics were applied. The objective features can be separated into three categories according to their focus: the displayed image, the backlight or the temporal variation. The features representing the displayed image are the average luma of the modeled image, the average

over the brightest 5% luma, the number of leaking and clipping pixels, the leakage error (MSE computed on pixels presenting leakage) and the average over the local standard deviation of the pixel values (LSD), computed as follows:

LSD = mean_{i∈S_i}(StdDev_{i∈A_i}(x^ψ(i))),   (8)

where A_i represents a square centered at pixel i with sides corresponding to 2°, corresponding to the size of foveal vision [35], and S_i can be either each segment or the whole frame (both predictors were tried). To characterize the backlight of each frame, the maximum LED intensity and the power consumption (Pwr) were computed. Pwr is calculated as:

Pwr = (1/N) Σ_{k=1}^{N} r_k,   (9)

where r_k is the intensity of LED segment k and N the number of LED segments; this corresponds to measurements made on a real LCD display [36]. Thus, Pwr expresses the average of the backlight intensity. The temporal aspects of videos were taken into account via the Sum of Squared Differences (SSD [37]), the Temporal Information (TI [38]), the maximal luma variation and a measure of flicker inspired by Michelson's contrast [39], all computed between the modeled versions of two consecutive displayed frames. Two additional measures completed the temporal-change category: the flicker measure computed on the modeled backlight and the maximal backlight variation (BLV), defined as

BLV = max_i(|b(i, t) − b(i, t − 1)|),   (10)

where b(i, t) is the backlight relative luminance for pixel i at frame t. The measure of flicker (F L) was computed as F L(i, t) =

|b(i, t) − b(i, t − 1)| . b(i, t) + b(i, t − 1)

(11)

To compute the flicker measure of the perceived image, we replaced b(i, t) by the pixel value x ψ (i, t) for pixel i at frame t. Some measures were averaged over the whole screen while for others the maximum among all segments was used.
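The backlight features above can be sketched as follows (assuming LED intensities and per-pixel backlight values are given as NumPy arrays of relative luminances; the `eps` guard is our addition to avoid division by zero in fully black regions):

```python
import numpy as np

def power_consumption(led_intensities):
    """Eq. (9): average LED segment intensity as a proxy for power."""
    return float(np.mean(led_intensities))

def max_backlight_variation(b_prev, b_curr):
    """Eq. (10): maximal per-pixel backlight change between frames."""
    return float(np.max(np.abs(b_curr - b_prev)))

def flicker(b_prev, b_curr, eps=1e-12):
    """Eq. (11): Michelson-like flicker measure, per pixel."""
    return np.abs(b_curr - b_prev) / (b_curr + b_prev + eps)
```

Replacing the backlight arrays by the modeled pixel values x^ψ gives the flicker measure of the perceived image.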


B. Principle of Partial Least Squares Regression

[TABLE II: List of features significantly correlated with the subjective rankings. The correlations for the best pooling are given as Pearson's CC / Spearman's ROCC.]

[Fig. 5. PLS regression obtained with 4 features.]

PLS is a statistical data analysis method that aims at finding the best linear regression model for a given set of predictors to predict a response variable [17]. Let Y ∈ R^N denote the response variable vector, with N the number of observations, and X ∈ R^{N×K} the corresponding set of K observed predictors. In the multiple linear regression context, the least-squares regression would be Y = XB + E with

B = (X^T X)^{−1} X^T Y,   (12)

where ^T denotes the transpose. However, in the case of numerous (K > N) or collinear predictors, X^T X cannot be inverted. Instead, PLS only inverts the first a components of X^T X (a ≤ K). X is therefore projected into a latent space and decomposed into the product of two orthogonal matrices: the loadings, which define the projection space, and the scores, which represent the coordinates of X in this space. In that sense, this projection is similar to that of Principal Component Analysis (PCA). The difference with PCA is that PLS projects both the predictors and the response into a latent space that maximizes not only the variance of X (as PCA does) but also the covariance of the predictors X and the response variable Y. The projection can be written as

X = T P^T + F,  or  T = X W*,
Y = U Q^T + G,   (13)

where F and G are residuals, P and Q are the loadings, W* represents the weights, and T and U are the scores, whose covariance is maximal. Therefore T can also be used to predict Y, and (13) can be rewritten as

Y = T Q^T + E = X W* Q^T + E,   (14)

where E represents the residuals.

C. Using Objective Features for Quality Assessment

The importance of the features for quality assessment is assessed by studying their relationship to the subjective rankings collected as described in Sec. IV. This relation is investigated in two steps: first, the correlations between the features and the subjective results are presented; second, a model of the subjective grades is computed using the features as predictors for PLS regression.

1) Correlations: The objective features for which Spearman's rank-order correlation coefficient (SROCC) indicates a statistically significant (p < 0.05) correlation to the subjective ratings are presented in Table II. At first sight, the sign of the correlations for the first three features may be surprising: clipping is a defect, and yet it is positively correlated with quality. An explanation is that, in this test, clipping indirectly reflects the attenuation of leakage; together with the sign of the peak-white and power correlations, this emphasizes the importance of 'true black' for the selected dark sequences when displayed on LCDs. The negative correlation for the power consumption indicates that some of the algorithms are indeed capable of reducing power and increasing quality at the same time.
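Such a significance screening can be sketched with SciPy (a generic illustration, not the authors' code; the feature names are hypothetical):

```python
from scipy.stats import spearmanr

def significant_features(features, subjective, alpha=0.05):
    """Keep the features whose Spearman rank correlation with the
    subjective rankings is statistically significant (p < alpha).
    `features` maps feature names to lists of per-sequence values."""
    kept = {}
    for name, values in features.items():
        rho, p = spearmanr(values, subjective)
        if p < alpha:
            kept[name] = rho
    return kept
```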

2) PLS Model: All 220 items (20 features × 11 poolings) are quite correlated with each other, both by nature and because of the temporal poolings; furthermore, they are numerous. Therefore, the other approach chosen was to determine which features combine into satisfactory models for predicting the subjective grades using PLS regression [17]. The models were obtained by applying PLS regression iteratively over the full set of 220 predictors. As the different predictors have very different ranges, they were centered and normalized as a preprocessing step to provide a balanced input to the PLS algorithm. The Unscrambler X software was used for the computation, and at each iteration the predictors with the lowest weighted regression coefficients were removed. The following four features (and respective temporal poolings) predict the subjective grades with an accuracy of R² = 0.66 (Pearson's CC = 0.8) under 'leave one out' validation²: the maximum backlight variation (BLV, average over the last 3 seconds), the power consumption (Pwr, Minkowski power 5), the flicker measure averaged over the whole modeled frame (FL, Minkowski power 5) and the local standard deviation averaged over the whole frame (LSD, average). The features are combined as

Y = −0.57 BLV − 1.34 Pwr − 0.94 FL + 1.05 LSD,   (15)

with the features defined in (8)–(11). The prediction and regression coefficients obtained are shown in Fig. 5. The influence

² Named full cross-validation in the Unscrambler software: only one sample at a time is kept out of calibration (training) and used for validation (prediction). This is repeated until all samples have been kept out once.


Fig. 6. Pearson's correlation coefficients between the subjective grades and the objective grades using two leakage models: constant over the whole screen (Cst) and varying horizontally (HV). A red circle means that the difference between the constant and horizontally varying correlations is statistically significant. (a) 0 degrees. (b) 15 degrees.

of changing the temporal pooling on the model performance was evaluated by trying out all the possible temporal poolings successively for each of those four features. The choice of temporal pooling makes an important difference for each feature, with the exception that all the Minkowski summations give similar results. Two of those four features characterize the temporal changes: FL is computed on the modeled displayed video, whereas the second temporal feature is the backlight variation. As both have a negative weight in the model, it seems that participants preferred slower changes of the backlight during the videos. The highest weighted coefficients are for the two remaining features, the power and the local standard deviation, indicating their importance in the model. Here too, the negative weight of the power points to the importance of leakage in dark zones. The last feature, LSD, can be seen as an approximation of the local contrast.

D. Leakage Model

In this section, the influence of two aspects of the leakage model is investigated. The first aspect is the value of the leakage factor; three values are tested: 0, which amounts to ignoring leakage completely, and 0.00047 and 0.00068, the values measured on the display used for the experiment at 0° and 15° viewing angles, respectively. The second aspect studied here is the local variation of the leakage over the display: the leakage factor was either considered uniform (constant over the whole screen) or varying horizontally (using linear interpolation between leakage values measured from 0° to 35°).

1) Using Features: The effect of using a varying vs. uniform leakage parameter in the model is studied here in relation to the prediction presented in Sec. V-C. The method presented in Sec. V-C2 is applied, using the same set of features but measured this time with the horizontally varying leakage model. Starting from the same entry point of possible

predictors (20 features × 11 poolings) and reducing the number of features iteratively leads to the same set of four features being optimal. The model presented in Sec. V-C2 still obtains an accuracy of R² = 0.66. Therefore, modeling angle-dependent leakage changes neither the performance of the prediction nor the most useful predictors. A model built using the same four features as in Sec. V-C2, but measured without modeling the leakage at all (leakage factor set to 0), obtains a performance of R² = 0.25, and the contributions of the local standard deviation and the flicker measure become non-significant. The fact that those two features lose their significance without modeling leakage indicates that their contribution to the model depends highly on leakage.

2) Using Objective Quality Metrics:³ The temporal pooling method has an effect on the correlations between the objective quality metrics and the subjective grades. However, it does not interact with the type of model or the leakage factor, so to clarify the results only the best pooling (i.e., the one with the highest correlation) is presented in this analysis. For MSE and MSE on leaking pixels, the best pooling method is the average over the worst 10%; for the other image metrics it is the one introduced by Ninassi et al. [34]. The Pearson correlations (CC) of the subjective data with the objective metrics using the different leakage models are shown in Fig. 6.⁴ The Williams test [40] on the CCs shows that the correlations between constant and horizontally varying leakage are statistically different (p < 0.05) only for MSE, PSNR and PSNR-HVS-M (circled in red in Fig. 6). This indicates that the horizontally varying model might prove useful for viewing at 0°, but the test did not show a significant influence at 15°.

³ The issue here is solely to investigate which parameters of the leakage model seem better adapted to quality assessment. Conclusions regarding the metrics' performance would require more data (the number of sequences used is too low and their content too specific); therefore, the absolute values will not be commented on.

⁴ Spearman's correlations on the ranks show similar trends and values, so we chose to show only the Pearson linear correlations.
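Williams' test for the difference between two dependent correlations that share one variable (here, the subjective grades) can be sketched as follows, using the Steiger formulation with n − 3 degrees of freedom. This is a generic implementation for illustration, not the authors' code:

```python
import math
from scipy.stats import t as t_dist

def williams_test(r12, r13, r23, n):
    """Williams' t-test for H0: r12 == r13, where r12 and r13 are
    correlations sharing variable 1 (e.g., subjective grades vs. two
    metrics) and r23 is the correlation between the two metrics.
    Returns (t statistic, two-sided p-value), df = n - 3."""
    detR = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    rbar = (r12 + r13) / 2.0
    denom = 2 * detR * (n - 1) / (n - 3) + rbar**2 * (1 - r23) ** 3
    t_stat = (r12 - r13) * math.sqrt((n - 1) * (1 + r23) / denom)
    p = 2 * t_dist.sf(abs(t_stat), n - 3)
    return t_stat, p
```

Equal correlations give t = 0 (p = 1), and the sign of t indicates which of the two correlations is larger.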


Fig. 7. Pearson's correlation coefficients between the subjective grades and the objective grades using three leakage values: no leakage (leakage factor 0), the leakage measured at 0° and the leakage measured at 15°. A red circle means that the difference with the correlations at 0° is statistically significant. (a) 0 degrees. (b) 15 degrees.

Figure 7 illustrates the correlations obtained with the different leakage values. The results show that disregarding the leakage in the display model decreases the metrics' performance (except for VQM). The correlations are negative, and these differences are statistically significant, as verified using Williams' test on the CC values (p < 0.05). Comparison between the 0° and 15° leakage values shows that the 15° leakage factor allows the metrics to perform better, even for predicting the subjective grades at 0°. However, this difference is significant only for some metrics: MSE, PSNR, PSNR-HVS-M and VQM. The PLS model with four features presented in Sec. V-C2 (Eq. 15) obtains a better performance (CC = 0.8) than the other metrics tested here (Fig. 6). Thus, the PLS modeling should provide a good starting point for further development toward an actual quality metric.

VI. CONCLUSION

In this paper, we investigated which aspects of highly contrasted videos rendered on an LCD with local backlight dimming (LBD) are relevant for perceptual quality and for predicting it objectively. A subjective test was conducted to collect quality evaluations of fairly dark videos with high contrast displayed on an LCD with seven different LBD algorithms. The tests focused on challenging sequences and a high-quality setting for the algorithms. To account for the display rendition, the LED-LCD display is modeled. This model is used to analyze which objective characteristics are useful for predicting subjective quality assessments. A model predicting the subjective grades is built using Partial Least Squares regression. Using four out of the 20 objective features tried, this model obtains a correlation of CC = 0.8. The four features used are the maximum backlight variation, the power consumption (probably expressing a level of leakage), a flicker measure, and the local standard deviation (an approximation of spatial contrast) averaged over the whole frame. The analysis confirmed the importance of LC leakage for the quality perception of dark videos.

The second part of the analysis studied whether improving the accuracy of the leakage values in the display model improves the correlations of the objective results with the subjective grades. The results showed that, for dark sequences, accounting for the leakage artifact in the display model improves the performance of both the objective metrics and the PLS modeling in a statistically significant way. Approximating the leakage as constant over the screen seems sufficient when viewing from a 15° angle, whereas using a horizontally varying model increased the correlation with subjective scores in some cases for 0° viewing.

REFERENCES

[1] Methodology for the Subjective Assessment of the Quality of Television Pictures, document ITU-R Rec. BT.500-13, 2012.
[2] VQEG, Report on the Validation of Video Quality Models for High Definition Video Content, 2010.
[3] H. Seetzen et al., "High dynamic range display systems," ACM Trans. Graph., vol. 23, no. 3, pp. 760–768, Aug. 2004.
[4] K. Brunnström, B. Andrén, and S. Tourancheau, "Dynamic backlight influence on motion blur measurement," in SID Symp. Dig. Tech. Papers, May 2010, vol. 41, no. 1, pp. 1520–1523.
[5] X.-B. Zhang, R. Wang, D. Dong, J.-H. Han, and H.-X. Wu, "Dynamic backlight adaptation based on the details of image for liquid crystal displays," J. Display Technol., vol. 8, no. 2, pp. 108–111, 2012.
[6] M. Albrecht, A. Karrenbauer, and C. Xu, "A video-capable algorithm for local dimming RGB backlight," in SID Symp. Dig. Tech. Papers, 2009, vol. 40, no. 1, pp. 753–756.
[7] N. Burini, E. Nadernejad, J. Korhonen, S. Forchhammer, and X. Wu, "Modeling power-constrained optimal backlight dimming for color displays," J. Display Technol., vol. 9, no. 8, pp. 656–665, Aug. 2013.
[8] J. Korhonen, N. Burini, S. Forchhammer, and J. M. Pedersen, "Modeling LCD displays with local backlight dimming for image quality assessment," Proc. SPIE, vol. 7866, p. 843607, Jan. 2011.
[9] X. Shu, X. Wu, and S. Forchhammer, "Optimal local dimming for LC image formation with controllable backlighting," IEEE Trans. Image Process., vol. 22, no. 1, pp. 166–173, Jan. 2013.
[10] C. Mantel, N. Burini, J. Korhonen, E. Nadernejad, and S. Forchhammer, "Quality assessment of images displayed on LCD screen with local backlight dimming," in Proc. 5th Int. Workshop Quality Multimedia Exper. (QoMEX), Jul. 2013, pp. 48–49.
[11] H. Cho, B. C. Cho, H. J. Hong, E.-Y. Oh, and O.-K. Kwon, "A color local dimming algorithm for liquid crystals displays using color light emitting diode backlight systems," Opt. Laser Technol., vol. 47, pp. 80–87, Apr. 2013.


[12] C. Mantel et al., "Modeling the leakage of LCD displays with local backlight for quality assessment," Proc. SPIE, vol. 9014, p. 90140M, Feb. 2014.
[13] T.-H. Huang, C.-T. Kao, and H. H. Chen, "Quality assessment of images illuminated by dim LCD backlight," Proc. SPIE, vol. 8291, p. 82911Q, Feb. 2012.
[14] P. S. Guterman, K. Fukuda, L. M. Wilcox, and R. S. Allison, "Is brighter always better? The effects of display and ambient luminance on preferences for digital signage," in SID Symp. Dig. Tech. Papers, 2010, vol. 41, no. 1, pp. 1116–1119.
[15] H. Seetzen, H. Li, L. Ye, W. Heidrich, L. Whitehead, and G. Ward, "Observations of luminance, contrast and amplitude resolution of displays," in SID Symp. Dig. Tech. Papers, 2006, vol. 37, no. 1, pp. 1229–1233.
[16] C. Keimel, M. Klimpke, J. Habigt, and K. Diepold, "No-reference video quality metric for HDTV based on H.264/AVC bitstream features," in Proc. 18th IEEE Int. Conf. Image Process. (ICIP), Sep. 2011, pp. 3325–3328.
[17] S. Wold, M. Sjöström, and L. Eriksson, "PLS-regression: A basic tool of chemometrics," Chemometrics Intell. Lab. Syst., vol. 58, no. 2, pp. 109–130, 2001.
[18] C. A. Poynton, A Technical Introduction to Digital Video. New York, NY, USA: Wiley, 1996, ch. 6, p. 95.
[19] E. Nadernejad, "Processing of coded video for LCD technology with LED-based backlight," Ph.D. dissertation, Dept. Photon. Eng., Tech. Univ. Denmark, Lyngby, Denmark, 2013.
[20] E. Nadernejad, C. Mantel, N. Burini, and S. Forchhammer, "Flicker reduction in LED-LCDs with local backlight," in Proc. IEEE 15th Int. Workshop Multimedia Signal Process., Sep./Oct. 2013, pp. 312–316.
[21] Sita Sings the Blues. [Online]. Available: http://archive.org/details/Sita_Sings_the_Blues, accessed Feb. 2012.
[22] (2010). Consumer Digital Video Library. [Online]. Available: http://www.cdvl.org
[23] VQEG, Tutorial—Objective Perceptual Assessment of Video Quality: Full Reference Television, 2005.
[24] A. Field, Discovering Statistics Using IBM SPSS Statistics. Thousand Oaks, CA, USA: SAGE Publications, 2013.
[25] S. Bech and N. Zacharov, Perceptual Audio Evaluation—Theory, Method and Application. New York, NY, USA: Wiley, 2006.
[26] Z. Wang and Q. Li, "Information content weighting for perceptual image quality assessment," IEEE Trans. Image Process., vol. 20, no. 5, pp. 1185–1198, May 2011.
[27] R. Mantiuk, K. J. Kim, A. G. Rempel, and W. Heidrich, "HDR-VDP-2: A calibrated visual metric for visibility and quality predictions in all luminance conditions," ACM Trans. Graph., vol. 30, no. 4, Jul. 2011, Art. ID 40.
[28] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, and F. Battisti, "TID2008—A database for evaluation of full-reference visual quality assessment metrics," Adv. Modern Radioelectron., vol. 10, no. 4, pp. 30–45, 2009.
[29] M. H. Pinson and S. Wolf, "A new standardized method for objectively measuring video quality," IEEE Trans. Broadcast., vol. 50, no. 3, pp. 312–322, Sep. 2004.
[30] K. Seshadrinathan and A. C. Bovik, "Motion tuned spatio-temporal quality assessment of natural videos," IEEE Trans. Image Process., vol. 19, no. 2, pp. 335–350, Feb. 2010.
[31] P. V. Vu, C. T. Vu, and D. M. Chandler, "A spatiotemporal most-apparent-distortion model for video quality assessment," in Proc. 18th IEEE Int. Conf. Image Process. (ICIP), Sep. 2011, pp. 2505–2508.
[32] A. B. Watson, "Temporal sensitivity," in Handbook of Perception and Human Performance. New York, NY, USA: Wiley, 1986.
[33] R. Hamberg and H. de Ridder, "Continuous assessment of perceptual image quality," J. Opt. Soc. Amer. A, Opt. Image Sci. Vis., vol. 12, no. 12, pp. 2573–2577, 1995.
[34] A. Ninassi, O. Le Meur, P. Le Callet, and D. Barba, "Considering temporal variations of spatial visual distortions in video quality assessment," IEEE J. Sel. Topics Signal Process., vol. 3, no. 2, pp. 253–265, Apr. 2009.
[35] B. A. Wandell, Foundations of Vision. Sunderland, MA, USA: Sinauer Associates, 1995.
[36] C. Mantel, N. Burini, E. Nadernejad, J. Korhonen, S. Forchhammer, and J. M. Pedersen, "Controlling power consumption for displays with backlight dimming," J. Display Technol., vol. 9, no. 12, pp. 933–941, Dec. 2013.
[37] X. Fan, W. Gao, Y. Lu, and D. Zhao, Flicking Reduction in All Intra Frame Coding, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, document JVT-E070, 2002.


[38] Subjective Video Quality Assessment Methods for Multimedia Applications, document ITU-T Rec. P.910, 2008.
[39] A. A. Michelson, Studies in Optics. Chicago, IL, USA: Univ. of Chicago Press, 1927.
[40] D. C. Howell, Statistical Methods for Psychology, 7th ed. Belmont, CA, USA: Wadsworth, 2010.

Claire Mantel (M'14) received the M.S. and Ph.D. degrees in signal processing from the Grenoble Polytechnic Institute, Grenoble, France, in 2007 and 2011, respectively. She is currently a Post-Doctoral Researcher with the Department of Photonics Engineering, Technical University of Denmark, Kongens Lyngby, Denmark. Her research interests include image and video coding and visual quality assessment.

Søren Bech received the M.Sc. and Ph.D. degrees from the Department of Acoustic Technology, Technical University of Denmark, Kongens Lyngby, Denmark. From 1982 to 1992, he was a Research Fellow with the Department of Acoustic Technology, where he studied the perception and evaluation of reproduced sound in small rooms. In 1992, he joined Bang & Olufsen, Struer, Denmark, where he is currently the Head of Research. In 2011, he was appointed Professor of Audio Perception with Aalborg University, Aalborg, Denmark. His research interests include experimental procedures and the statistical analysis of data from sensory analysis of audio and video quality. The general perception of sound in small rooms is also a major research interest.

Jari Korhonen (M'05) received the M.S. (Eng.) degree in information engineering from the University of Oulu, Oulu, Finland, in 2001, and the Ph.D. degree in telecommunications from the Tampere University of Technology, Tampere, Finland, in 2006. He has been an Assistant Professor with DTU Fotonik, Technical University of Denmark, Kongens Lyngby, Denmark, since 2010. His research interests cover both the telecommunications and signal processing aspects of multimedia communications, including visual quality assessment, error control mechanisms, and multimedia transmission over wireless networks.

Søren Forchhammer (M'04) received the M.S. degree in engineering and the Ph.D. degree from the Technical University of Denmark, Kongens Lyngby, Denmark, in 1984 and 1988, respectively. He has been with DTU Fotonik, Technical University of Denmark, since 1988, where he is currently a Professor and the Head of the Coding and Visual Communication Group. His interests include source coding, image and video coding, distributed source coding, processing for image displays, 2D information theory, and visual communications.

Jesper Melgaard Pedersen received the B.Sc. degree in electrical engineering from the Engineering College of Aarhus, Aarhus, Denmark, in 1994. In 1994, he joined Bang & Olufsen, Struer, Denmark, to work on video processing in TV sets, where he is currently a Technology Specialist in digital video processing with the Picture Group. His main interests include the development and implementation of algorithms for video processing in consumer TV applications. His current work centers on algorithms for noise reduction, sharpness enhancement, and local dimming of LC displays.
