Compression of head-related transfer function using autoregressive-moving-average models and Legendre polynomials Sayedali Shekarchia) Institute of Biology, University of Southern Denmark, Campusvej 55, Odense M, Denmark

John Hallam Mærsk McKinney Møller Institute, University of Southern Denmark, Campusvej 55, Odense M, Denmark

Jakob Christensen-Dalsgaard Institute of Biology, University of Southern Denmark, Campusvej 55, Odense M, Denmark

(Received 3 December 2012; revised 22 August 2013; accepted 4 September 2013) Head-related transfer functions (HRTFs) are generally large datasets, which can be an important constraint for embedded real-time applications. A method is proposed here to reduce redundancy and compress the datasets. In this method, HRTFs are first compressed by conversion into autoregressive-moving-average (ARMA) filters whose coefficients are calculated using Prony’s method. Such filters are specified by a few coefficients which can generate the full head-related impulse responses (HRIRs). Next, Legendre polynomials (LPs) are used to compress the ARMA filter coefficients. LPs are derived on the sphere and form an orthonormal basis set for spherical functions. Higher-order LPs capture increasingly fine spatial details. The number of LPs needed to represent an HRTF, therefore, is indicative of its spatial complexity. The results indicate that compression ratios can exceed 98% while maintaining a spectral error of less than 4 dB in the C 2013 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4822477] recovered HRTFs. V PACS number(s): 43.60.Ek, 43.60.Gk, 43.66.Gf [SAF]

I. INTRODUCTION

The head related transfer function expresses the relation of free-field sound to sound in the ear canal. One such function exists for each sound direction, and it expresses the filtering of sound from that direction by the body, head, and external ear of a person (Blauert, 1996a). The head-related transfer function (HRTF) is the frequency domain representation; the time domain representation of the filter is the head-related impulse response (HRIR). Since the perception of sound direction can be manipulated by filtering of, e.g., a mono recording by the HRTFs from both ears (Blauert, 1996b; Middlebrooks, 1992), HRTFs have attracted a lot of attention in creation of virtual soundscapes. One way of utilizing the spatial information implicit in a HRTF dataset is deploying it in three-dimensional (3D) sound systems. In such systems a virtual auditory scene can be formed that can be used in various applications (e.g., computer gaming, hearing aids, cell phones). However, the main problem with HRTFs is that they are generally large, redundant, and individualized. Different subjects have different HRTFs; thus using a generalized HRTF data set may introduce significant localization errors in an individual’s virtual auditory scene that cause confusions (Wenzel et al., 1993; Scarpaci and Colburn, 2005). Therefore, there can be a need for storing large numbers of HRTFs, and in order to reduce memory

a)

Author to whom correspondence should be addressed. Electronic mail: [email protected]

3686

J. Acoust. Soc. Am. 134 (5), November 2013

Pages: 3686–3696

load and computational complexity it may be relevant to compress the data sets. Four methods have previously been proposed and investigated to model these datasets. The first method obtains an acoustical model mathematically by using sound wave propagation properties such as attenuation, reflection, and diffraction (Brown and Duda, 1998; Algazi et al., 2002). The second method represents a measured HRTF with a loworder pole-zero model that can reduce data size and computational complexity (Kulkarni and Colburn, 2004; Asano et al., 1990; Blommer and Wakefield, 1997). The third method represents empirically measured HRTFs with a low-order model constructed using principal-components analysis (Kistler and Wightman, 1992; Scarpaci and Colburn, 2005) and has been validated psychophysically by Kistler and Wightman (1992). The fourth method represents HRTF datasets by a state-space model so as to compress them and to build up a low-cost, low-spectral-error model (Georgiou and Kyriakakis, 1999; Mackenzie et al., 1997; Adams and Wakefield, 2009). Most of these methods used for modeling of HRTF datasets represent them with lower complexity systems, compress them and reduce their redundancy by employing a single technique (e.g., zero-pole modeling, mathematical decomposition modeling, exact acoustic modeling). In the work presented here, two techniques, an orthogonal transformation based on Legendre polynomials and Infinite Impulse Response (IIR) models (i.e., all-pole IIR and zero-pole IIR) are applied in combination and are investigated as an alternative mathematical decomposition method for compressing the HRTFs while keeping the localization

0001-4966/2013/134(5)/3686/11/$30.00

C 2013 Acoustical Society of America V

cues intact. The advantage of Legendre polynomials is that they are defined on a spherical surface and therefore are a straightforward way to represent the spherical distribution of HRTFs. Although different IIR models have been previously applied to HRTF datasets and also validated psychoacoustically (Kulkarni and Colburn, 2004), discrete Legendre polynomials have been deployed in some other data compression methods, specifically in image processing and pattern recognition (Li and Wen, 2010), and have not been studied on HRTF datasets either directly as a single compression technique or in combination with other techniques to achieve higher compression ratio. Legendre polynomials have two major properties in common with other orthogonal transforms: First, reducing the redundancy in a matrix and, second, representing the given matrix with a small number of elements. These two properties in combination with an infinite impulse model give a good compression ratio combined with relatively small spatial errors in reconstruction. The main focus of the present paper is on the compression of HRTF datasets, rather than the implementation of systems using the resulting compressed data. However, there are a few implementation-related observations we wish to make. First, implementation of the AR(MA) models produced by the first stage of compression is straightforward on most modern digital signal processing platforms; basic methods are standard textbook material. Second, we do not propose that the Legendre moments calculated during the second stage of compression be used directly for signal processing. Rather, they offer a compact way of storing the HRTF dataset in a space that is largely independent of the number of sampled directions, and potentially of interpolating between measured directions in a manner which takes the spatial variation of the HRTF into account. When a given filter is needed for the signal processing, it is first reconstructed from the LP-moment data and then applied using whatever method is most efficient for the hardware in use. One might also cache recently used filter coefficient sets to save repeated reconstruction from the LP moments. II. METHOD

The compression method explored in this manuscript consists of two consecutive transformations: Autoregressivemoving-average (ARMA) compression and Legendre polynomial compression. First, we calculate the ARMA coefficients for HRIRs at each spherical co-ordinate (direction) in the HRTF dataset. Second, Legendre polynomials applied to each IIR coefficient across all locations. Legendre polynomials have also been directly applied to HRTF and HRIR data sets in the MIT HRTF databases (Gardner and Martin, 1995) to give us a comparison with a single-stage mathematical decomposition method. Before starting with the compression model the empirical raw HRIR datasets are changed to minimum-phase-plus-delay, using the fact that each transfer function can be expressed by a convolution of a minimum-phase part and an all-pass part (Oppenheim et al., 1999). HðxÞ ¼ HðxÞmin HðxÞap : J. Acoust. Soc. Am., Vol. 134, No. 5, November 2013

(1)

In this model, the all-pass part has a magnitude response equal to unity and the phase response of the minimum-phase part is decided by its log-magnitude response, so the magnitude spectrum of this model is the same as the raw HRTF dataset and the all-pass function as a pure delay is chosen such that the overall time delay of the model matches the interaural time delay of the HRTF dataset. This approach was evaluated psycho-acoustically in previous studies (Kulkarni et al., 1999; Kistler and Wightman, 1992) and where it was shown that the minimum-phase-plus-delay model adequately carries localization cues, specifically when stimuli and directions are variable. The minimum-phase version of each recorded HRIR can be calculated by windowing the real cepstrum of the corresponding HRIR (Huopaniemi and Smith, 1999). By calculating the minimum-phase filtering of impulse responses for all recorded directions, a matrix can be formed where each column represents a minimum-phase filter of an HRIR at a specific spatial location, as shown in Eq. (2). 2 3 h1 ð0Þ  hn ð0Þ 6 h1 ð1Þ  hn ð1Þ 7 6 7 HRIRs ¼ 6 (2) 7 : .. 4 ⯗ 5 . ⯗ h1 ðl  1Þ   

hn ðl  1Þ

ln

A. Stage one: IIR model

The autoregressive-moving-average models [Eq. (3)], all-pole IIR filter and pole-zero IIR filter, are used for compressing HRTF datasets and have been validated psychoacoustically and quantitatively in previous studies (Kulkarni and Colburn, 2004; Asano et al., 1990; Blommer and Wakefield, 1997). Although all these studies investigated the representation of HRTFs with ARMA models, their approaches for calculating the IIR coefficients (ap and bq) differed. In the study by Asano et al. (1990), the Kalman algorithm was used to compute the IIR coefficients. In the study by Blommer and Wakefield (1997), the log-leastsquares-error criterion was used, and in the study by Kulkarni and Colburn (2004) a weighted-least-squares method was considered. All of these methods have advantages and disadvantages in terms of computational complexity and HRTF approximation error but, regardless of the approach, one can always say a smaller IIR order leads to smaller data size—and, if it is fast, reduces computational complexity which is desirable for real-time applications where the auditory scene changes rapidly—with the disadvantage of increasing spatial localization error k¼q X

Gz ¼

bq ðkÞzk

k¼0 k¼p X

: ap

(3)

ðkÞzk

k¼0

In this paper, we use the Prony method to compute IIR model coefficients by minimizing the error defined in Eq. (4) individually for the impulse response at each spatial location. The Prony method needs only a set of linear equations Shekarchi et al.: Head-related transfer function compression

3687

to calculate IIR coefficients. In the following set of equations, Eqs. (4)–(6), this method for finding the all-pole IIR coefficients is explained very briefly. The more detailed one appropriate for the general zero-pole model, which is more complex and needs higher computational cost, is elucidated by Hayes (1996). By choosing each column of Eq. (2) in turn as our input data, the error becomes np ¼

1 X

jeðnÞj2 eðnÞ ¼ hðnÞ þ

n¼0

1 X

ap ðkÞhðn  kÞ:

Pm ðxÞ ¼

Setting the partial derivative of np with respect to equal to zero will give us a set of linear equations for calculating ap(k) 2 3 rh ð1Þ rh ð2Þ    rh ðp  1Þ rh ð0Þ 6 7 6 rh ð1Þ rh ð0Þ rh ð1Þ    rh ðp  2Þ 7 6 7 6 7 6 7 ⯗ ⯗ ⯗ ⯗ 4 5 rh ð0Þ rh ðp  1Þ rh ðp  2Þ rh ðp  3Þ … 2 3 2 3 ap ð1Þ rh ð1Þ 6 7 6 7 6 ap ð2Þ 7 6 r ð2Þ 7 6 7 6 h 7 6 7 6 7 6 7 7: a r ð3Þ ð3Þ 6 ¼ (5) 6 p 7 6 h 7 6 7 6 7 6 ⯗ 7 6 ⯗ 7 4 5 4 5 rh ðpÞ

In Eq. (5), rh(k) is the autocorrelation function and is equal to rh ðkÞ. Calculating the numerator coefficient, b(0), can be done by equalizing the energy in g(n) ¼ h(n). Then b(0) will be determined as follows: bð0Þ ¼

p qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi X fnp gmin fnp gmin ¼ rh ð0Þ þ ap ðkÞrh ðkÞ:

(6)

k¼1

By applying this method to the head-related impulse response for each spatial direction, we collect AR and ARMA filter coefficients into A and B matrices 2 3 a1 ð1Þ    an ð1Þ 6 7 .. ⯗ 7 ; A¼6 . 4 ⯗ 5 a1 ðpÞ    an ðpÞ ðpÞ n 2 3 b1 ð0Þ    bn ð0Þ 6 7 .. B¼6 : (7) ⯗ 7 . 4 ⯗ 5 b1 ðqÞ    bn ðqÞ ðqþ1Þ n Given that the a(p) coefficients are normalized for all directions [a(0) ¼ 1], it is not necessary for a(0) to be saved. The compression ratio for the general ARMA model at this stage can be calculated as ln : cra ¼ ðp þ q þ 1Þn 3688

Legendre polynomials (LPs) are a set of orthogonal polynomials which are solutions to the Legendre differential equation, and are defined on a spherical surface. Since the head-related impulse response dataset is defined in spherical coordinates, LPs are applied in the compression method to exploit the observation that HRTFs and HRIRs are also highly dependent on direction. If Pm(x) is the mth Legendre polynomial, we have

(4)

k¼1

ap ðkÞ

ap ðpÞ

B. Stage two: Legendre polynomials

J. Acoust. Soc. Am., Vol. 134, No. 5, November 2013

1 dm ½ðx2  1Þm : 2 m! dxm m

(9)

In general, the solutions of the Legendre equation converge if jxj < 1. By expanding the Taylor series in the Legendre equation, we can obtain solutions for the zeroth and first order Legendre polynomials: P0(x) ¼ 1, P1(x) ¼ x. Providing the first two terms, and using Bonnet’s recursion formula (Kreyszig, 2005), enables us to calculate Legendre polynomials of any order. ðj þ 1ÞPjþ1 ðxÞ ¼ ð2j þ 1ÞxPj ðxÞ  jPj1 ðxÞ; x2  1 d Pn ðxÞ ¼ xPn ðxÞ  Pn1 ðxÞ: n dx

(10)

Calculation of the Legendre polynomials of any degree is independent of the data itself but dependent on the length of the data. The coefficients multiplying the Legendre polynomials are the components of the HRIR or HRTF in the Legendre basis, and so are the only information needed to reconstruct and represent the HRIRs (or HRTFs) afterward. These coefficients are called Legendre moments. ð 2m þ 1 1 Pm ðxÞf ðxÞ dx; (11) Lm ¼ 2 1 where f(x) is a row of HRIR coefficients [Eq. (2)], or a row of the A or B matrices in Eq. (7). Data points are mapped to the domain of convergence, xi 2 [1, 1]. For example, when representing the A matrix with the fixed intervals 2 Dx ¼ xiþ1  xi ¼ ; n where n is our total number of directions (or total number of columns in A), the points are defined as   1 xi ¼ 1 þ i  Dx: (12) 2 In the discrete domain, a method proposed by Liao and Pawlak (1996) has been used to calculate Eq. (11). This method was demonstrated to be more accurate than the standard zeroth-order approximation. Here the formula for one dimensional data is Lm ¼

n 2m þ 1 X km ðxi Þf ðxi Þ; 2 i¼1

(13)

(8) where Shekarchi et al.: Head-related transfer function compression

km ðxi Þ ¼

ð xi þDxi =2

Pm ðxi Þdx:

(14)

By substituting Eq. (19) into Eq. (18), we have

xi Dxi =2

^ ij ¼ M The integral in Eq. (14) has been calculated using the exact method proposed by Hosny (2007) by using one of the properties of Legendre polynomials resulting from integrating over Bonnet’s formula. The order of Legendre polynomials should be greater than zero, m  1, ð Pmþ1 ðxÞ  Pm1 ðxÞ : (15) Pm ðxÞdx ¼ 2m þ 1 Using Eq. (15) and Eq. (14) in Eq. (13) and for simplicity defining si ¼ xi þ Dxi/2, we have 2m þ 1 L^m ¼ 2m þ 2

n X

f ðxÞ½xPm ðxÞ  Pm1 ðxÞssiþ1 : i

ðp þ q þ 1Þn ; pðLa þ 1Þ þ ðq þ 1ÞðLb þ 1Þ ln : crt ¼ pðLa þ 1Þ þ ðq þ 1ÞðLb þ 1Þ

crl ¼

(17)

If we set La ¼ Lb ¼ L, then compression ratios in Eq. (17) can be simplified as shown below: crl ¼

n ; Lþ1

crt ¼

ln : ðp þ q þ 1ÞðL þ 1Þ

C. HRIR reconstruction

In order to reconstruct the HRIRs, first the A and B coefficients are reconstructed using the following formula: f^ðxÞ ¼

m¼Max X

Lm Pm ðxÞ:

(18)

m¼0

The x matrix is defined such that all its rows are computed using Eq. (12), 2 3 x1;1    x1;n 6 7  ⯗ x¼4 ⯗ 5; xpþq;1    xðp þ q; nÞ 2 3 L1;1    L1;Maxþ1 6 7  ⯗ (19) L¼4 ⯗ 5: Lpþq;1    Lpþq;Maxþ1 J. Acoust. Soc. Am., Vol. 134, No. 5, November 2013

i 2 f1; 2; :::; ng;

j 2 f1; 2; :::; p þ q þ 1g:

(20)

^ is a matrix which contains reconstructed a and b M coefficients of the IIR model, so if the first row of L in Eq. (19) consists of the Legendre basis of the B vector then the ^ would be the reconstructed b coefficients of first row of M the AR model and the rest of its elements contains the a coefficients of AR model. The next step in reconstructing the complete HRIRs is to use the IIR coefficients to filter the Dirac delta function d(i), ^ ¼ hðiÞ

In Eq. (16), L^m is a very good approximation of Lm and the error is limited to the small error inherited from the Bonnet recursion formula. In our method, f(x) in Eq. (16) will be a row of A or B. In A, elements of the first row are equal to 1 because it is normalized to a(0) ¼ 1, thus the compression could be started from the second row. If La and Lb are the orders of Legendre polynomials for the A and B matrices, respectively, we define crl and crt to be the compression ratios for Legendre transformation and total compression, respectively, as below:

Lmj Pm1 ðxij Þ;

m¼1

(16)

i¼1

Maxþ1 X

q X

b^j dðijÞ

j¼0

p X

^ i 2 f1;2;:::;lg: a^j hðijÞ;

(21)

j¼1

D. Error criteria

In general, it is very difficult to propose an objective criterion, to evaluate an HRTF model quantitatively, which also can match completely in terms of psycho-acoustical perspective, and most different forms of error criterion lead to similar error distributions. Furthermore, according to a study by Kulkarni and Colburn (1998), significant smoothing of an HRTF does not affect localization cues, and virtual and real sound stimuli are indistinguishable to a listener until the HRTF model loses most of its detailed variation in frequency. In this study, two error criteria have been employed to evaluate the compression process, by comparing the reconstructed model to the original empirical HRTFs. The first error criterion, which was also used in the cited study (Kulkarni and Colburn, 2004), is log-magnitude spectral distortion calculated using the following expression: ð 1 2p 2 ^ ð20log10 jHðxÞj20log10 jHðxÞjÞ dx E21 ¼ 2p 0 2 ^ ¼ k20log10 jHðxÞj20log10 jHðxÞjk : (22) In the expression above, H(x) denotes the original ^ HRTF while H(x) denotes the reconstructed HRTF. The second error criterion is percent mean square error (PMCE) which is a normalized error criterion derived from the spectral distortion error and shows the percentage of error in the variation of the reconstructed model compared to the original one (Wang et al., 2009), E2 ¼

2 ^ klog10 jHðxÞj  log10 jHðxÞjk

klog10 jHðxÞjk2

 100%:

(23)

k  k in the above expression is the mathematical notation for Euclidean norm. In this work, to have error values relevant to human subjects, only frequencies between 300 Hz and 15 kHz in the Shekarchi et al.: Head-related transfer function compression

3689

log-magnitude frequency spectrum are considered, by applying a weighting function with value unity in this frequency range and zero outside. Since the proposed method consists of two cascaded compression stages, the error for each stage and total error of the method have been calculated and are presented below. III. EXPERIMENTS AND RESULTS

In this section, we apply the proposed method to a set of HRIRs. We start by transforming the data to a set of autoregressive-moving-average models. Here all-pole IIR model and the special case of zero-pole model, with equal poles and zeros were considered. Then we compress IIR coefficients using Legendre transformations. The results from each section will also be discussed. A. MIT HRTF database

A set of HRTFs were measured at the MIT media lab using a KEMAR dummy head (Gardner and Martin, 1995). Data were sampled at 44.1 kHz in 710 different directions from 40 to 90 elevations in steps of 10 and with azimuth increments chosen to be approximately 5 in the great circle (zero elevation) from 0 to 355 azimuth degrees. We used the data-reduced set of 128-point HRIRs which were obtained by convolving the appropriate 512-point recorded HRIRs with the inverse filter of the Optimus Pro 7 speaker used in the experiment to cancel recording effects on the dataset, then truncating to 128-point HRIRs. In the following we use the Legendre transformation for compressing this dataset directly in the time domain (HRIRs) and frequency domain (HRTFs), comparing the quality of these models with our proposed method, explained in Sec. II, of cascading IIR and LP models. B. Direct HRIR compression

In this experiment, minimum-phase HRIR datasets including 710 distinct directions were compressed directly

by applying the Legendre polynomial approach to the finite impulse response filter coefficients of all spatial directions. The algorithm for this type of compression, as was explained in Sec. II B, consists of computing and storing all Legendre moments where f(x) in Eq. (16) is a vector comprising the same finite impulse response tap coefficient over all directions. By doing so, the size of HRIR dataset is changed from (l  n) to (p  l), where n denotes the number of spatial directions, l is the length of the HRIR for each direction, and p is the number of Legendre polynomials used. To implement the HRIRs, the matrix of Legendre moments is needed to recover full HRIR datasets using Eq. (18). Reconstructed HRIRs were then compared to the original datasets using Eq. (22). As shown in Fig. 1, increasing the number of Legendre polynomials from 0 to 25 did not improve the error significantly; however, increasing the number to 35 reduced the error drastically. Further increments in the number decrease the error gradually, conveying that most impulse response variation was associated with Legendre moments between 20 and 35. C. Direct HRTF compression

This compression method is similar to the previous method. However, in this case HRIRs are first transformed to HRTF magnitudes in decibels using Fourier transforms. Then Legendre polynomials are applied to these magnitudes. The results, including error curves and compression ratios, have been plotted in Fig. 2 and Fig. 3. In the direct HRTF compression method shown in Fig. 2, the 99% confidence borders of both spectral error and percentage error approach the average error curves, in contrast to the direct HRIR compression method. This indicates that the number of outliers, i.e., directions with an unusually large reconstruction error, was reduced. This reduction in outliers also affects the average error curves and makes this method slightly more accurate than the direct HRIR compression method. Note that here outliers are those few spatial

FIG. 1. (Left) Average and median spectral error with 95% confidence levels and its outliers using 99% confidence border. (Right) Average and median percent mean square with the same confidence borders and its outliers.

3690

J. Acoust. Soc. Am., Vol. 134, No. 5, November 2013

Shekarchi et al.: Head-related transfer function compression

FIG. 2. (Left) Average and median spectral error with 95% confidence levels and its outliers using 99% confidence border. (Right) Average and median percent mean square with the same confidence borders and its outliers.

directions, of the 710 spatial directions in total, where the approximation error exceeds the corresponding confidence border. The most significant reason for their occurrence is their high spectral complexity which cannot be approximated as well as other regions of the dataset. Although looking for these outliers, the regions where they are located and their relation to a specific part of head, outer ear, or torso, is an interesting problem, they are not of great importance in dynamic virtual acoustic scenarios, and their location is not considered further in this study. It is often desirable for a system designer to know the maximum compression ratio one can get for a given value of the error criterion. Figure 3 illustrates the maximum compression ratio as a function of average spectral error. As it is shown in this figure although the compression curve for HRTF behaves differently from the one for direct HRIR compression, it has improved—i.e., it has higher compression ratio—in the region where the spectral error is low (1 < Er < 3 dB).

FIG. 4. (Left) The average and median spectral distortion error in AR models with 95% confidence level, also specifying the outliers using 99% confidence border, while (right) average and median percent mean square error in AR models with 95% confidence level in addition of outliers has been depicted.

coefficients were calculated for two types of IIR filters and the reconstruction error was measured to be quantitatively similar to that from the other two methods. The first type of IIR filter is the all-pole model; the second is a special case of the general ARMA model in which the number of poles equals the number of zeros. These two models were chosen because the all-pole model is faster and more computationally efficient while the zero-pole model is better able to approximate the dips in the HRTF frequency spectrum. Moreover, psycho-acoustic experiments show little difference between these two methods (Kulkarni and Colburn, 2004). Note that, in this study, the optimal ARMA model is not investigated. Comparing these two models, illustrated in Fig. 4 and Fig. 5, shows that both models can achieve very small reconstruction errors (i.e., average spectral distortion error and percent mean square error) by choosing an appropriate order.

D. Proposed method: IIR representation

In the first part of the proposed method, Eq. (5) is applied to the raw dataset as explained in Sec. II. A and B

FIG. 3. Compression ratio versus average and median error. J. Acoust. Soc. Am., Vol. 134, No. 5, November 2013

FIG. 5. (Left) The average and median spectral distortion error in ARMA models with 95% confidence level, also specifying the outliers using 99% confidence border, while (right) average and median percent mean square error in ARMA models with 95% confidence level in addition of outliers has been depicted. Shekarchi et al.: Head-related transfer function compression

3691

FIG. 6. Compression ratio vs spectral error in the (left) all-pole IIR and (right) zero-pole model.

The confidence border lines are closer to the average error line in the zero-pole model, indicating that the maximum reconstruction error over all directions is lower than for the all-pole model. Comparing Fig. 4 with Fig. 1 and Fig. 2 reveals that error curves lack a sharp fall like the ones for direct HRIR and HRTF compression methods. The sharp fall in error as Legendre order increases reflects the fact that higher-order Legendre polynomials capture finer spatial detail, so a certain order is needed to capture the detail

FIG. 7. Average spectral error for Legendre transformation as a function of La and Lb for all-pole and zero-pole model.

intrinsic to a given HRTF. The IIR representation in stage 1 of the proposed method does not compress any spatial information, as it processes each direction independently, and therefore the strong dependence on model order is absent for this phase. As shown in Fig. 6, increasing the order of either model reduces the compression ratio and reconstruction error and the zero-pole model has a higher compression ratio when the spectral error is below 2.5 dB. Allowing a more relaxed error condition, when the error may exceed 2.5 dB, the all-pole

FIG. 8. Several average spectral distortion curves for Legendre transformations as a function of either Lb or La, for (top) all-pole model and (bottom) zero-pole model. 3692

J. Acoust. Soc. Am., Vol. 134, No. 5, November 2013

Shekarchi et al.: Head-related transfer function compression

model shows higher compression ratio compared to the zeropole model. E. Proposed method: Legendre transformation

In this experiment, the zero-pole order is chosen to be 17 (p ¼ q ¼ 17), which reduces the average and median spectral errors to less than 1 dB. Its equivalent all-pole order in terms of the number of coefficients would be 35, with spectral errors of around 1.2 dB. The question of optimal order for any of these IIR filters in the first stage of this method is not considered because the main point of this study is investigating the role of Legendre polynomials as a compression technique over all spatial directions. In this part of the proposed compression method, Legendre bases with different orders are calculated by applying Eq. (16) to A and B. The reconstructed HRIRs can be calculated by using Eq. (20) and Eq. (21) and by comparing these with original empirical data, and reconstructed data from the IIR filters, one can calculate the total spectral error for the method and the error for the orthogonal transformation. These errors are illustrated in Fig. 7. Observing the error behavior of these two models in this figure indicates that the error curves of both models are similar, but La has a higher influence on the spectral error in the all-pole model than zero-pole one. This

phenomenon can be explained because La in all-pole model has almost double the number of coefficients compared to a similarly performing zero-pole model. As shown in Fig. 8, both model errors are more affected by the order of the Legendre polynomials for B coefficients, left panels, when the order is below 60. However, increasing to the higher orders in all-pole model, La has more influence and the error gradually decreases as the order of the Legendre polynomials increases. In the zero-pole model, by increasing the order, La and Lb have almost equal effect on the error gradual decrement. This error behavior helps us to choose the best order for La and Lb, to get the optimal compression ratio with acceptable error performance. The total error of the proposed method and error for the Legendre polynomials compression phase are shown for the special case that La ¼ Lb ¼ L in Fig. 9. It is apparent that by increasing the Legendre order, total model error and the compression error caused by Legendre transformation for both models are reduced and the difference between them becomes almost constant around 0.5 dB and 0.2 dB for allpole and zero-pole model, respectively. This also could be explained by the fact that the chosen zero-pole model has a lower spectral error to approximate the empirical HRIRs than the all-pole model. Based on the application, one can set an error threshold and find the optimal La and Lb resulting in the maximum

FIG. 9. Spectral and percent mean square error for the Legendre transformation part and for the complete method where La ¼ Lb in (top) all-pole model and (bottom) zero-pole model. J. Acoust. Soc. Am., Vol. 134, No. 5, November 2013

Shekarchi et al.: Head-related transfer function compression

3693

FIG. 10. Compression ratio vs spectral error for (left) the Legendre transformation part and (right) the complete proposed method using all-pole and zero-pole model.

compression ratio. In Fig. 10, we compare the compression ratio of optimal orders with the one where La ¼ Lb. As shown in top panels of Fig. 10, in the all-pole model, for those applications where the acceptable error is very small ( < 2.5 dB), compression ratios of optimal orders and equal orders are almost the same, but when accepting a larger error, there is a significant advantage in choosing optimal orders over equal ones. In the zero-pole model shown in the bottom panels of Fig. 10, for a wider acceptable error range up to 3.5 dB it is quite safe to choose equal orders instead of the optimal orders for any applications. Using the compression ratio for choosing one of these models over the other, we conclude that for up to 2.5 dB spectral error there is no major difference between these two models. If a higher spectral error is acceptable, an all-pole model seems much the better choice. Comparing these figures with the corresponding ones for direct HRIR compression and direct HRTF compression (Fig. 3), we can see that within the 4 dB spectral error, the compression ratio of the proposed method using all-pole filters and zero-pole filters can reach up to 95 and 60, respectively, while for the other two methods this index is 21 and 23, respectively. One concern is that the LP compression, being spatially sensitive, may depend strongly on the number of directions used to represent the HRTF. This is investigated by applying the method to the full dataset and to the odd and even 3694

J. Acoust. Soc. Am., Vol. 134, No. 5, November 2013

halves of it (i.e., every second direction), each of which has half the spatial resolution and slightly different mean direction. The comparison, for both all-pole and zero-pole models, is depicted in Fig. 11. The results show negligible difference between error curves, and the spectral error is nearly independent of the number of spatial directions in this case.

FIG. 11. Absolute spectral error difference for the all-pole and zero-pole models between LP compression of the full dataset and LP compression of odd or even sub-samples of the dataset. Shekarchi et al.: Head-related transfer function compression

TABLE I. Number of parameters in different HRTF models to approximate roughly 90% of empirical dataset. Methods

Spatial directions

Parameters

Compression ratio

355 710 1420 355 710 1420 355 710 1420

2135 2135 2135 2840 5680 11360 1775 3550 7100

21.3 42.6 85.2 16 16 16 25.6 25.6 25.6

Proposed model

Zero-pole model

PCA model

IV. COMPARISON WITH PREVIOUS STUDIES

The proposed model presented in this manuscript has been compared to two other models (i.e., zero-pole model and PCA model) that were investigated in previous studies (Kulkarni and Colburn, 2004; Kistler and Wightman, 1992) in terms of the number of parameters each model needed to approximate HRTF datasets within the same spectral error. Their percent mean square error is set to 10% (i.e., the model approximates 90% of variation in log-magnitude frequency spectrum), in which case the number of parameters needed in PCA model (i.e., principal component weights) is 5 per direction and, in a zero-pole model, 8 parameters are needed (i.e., 4-pole 4-zero ARMA model) per direction. In our method the number of parameters (i.e., Legendre moments) is (L þ 1)(p þ q þ 1), in our case is 61  (35) ¼ 2135 for 710 directions. The number of Legendre moments, unlike the PCA weights and pole-zero coefficients, does not depend on the number of spatial directions but number of parameters in the zero-pole model. Here L denotes for the order of Legendre polynomials, also p and q are order of poles and zeros, respectively. As shown in Table I, the compression ratio of the zero-pole model and PCA model is fixed such that the finer across-position variability needs more parameters, while in the proposed method increasing the number of directions leaves the number of parameters constant and the compression ratio is increased proportionally. V. CONCLUSION

In this study a new model of HRTF representation, which combines IIR filters with an orthogonal transformation based on Legendre polynomials, was explored. Experimental results demonstrate the reduction of data storage by a factor of 95 and 60 in all-pole and zero-pole model, respectively. Furthermore, on average over spatial directions, reconstructed HRTFs approximate the empirical HRTFs within 4 dB spectral error. The new model has been compared to direct HRIR compression and direct HRTF compression using Legendre polynomials. The results show that, within the same average spectral distortion error in the full 710 spatial directions, the compression ratio of the proposed method is nearly 2.7 and 1.7 times better than that of previously published zero-pole and PCA models, respectively. This compression ratio can be J. Acoust. Soc. Am., Vol. 134, No. 5, November 2013

even greater when more spatial directions are sampled. This makes this model more efficient than other models for applications where multiple individual HRTF data sets with fine cross-position variability need to be stored. The spatial compression phase, using the Legendre polynomials, can be used for analyzing HRTF data sets in terms of their structural complexity: A spatially more complex dataset will require a higher Legendre order for accurate compression and reconstruction than a spatially simpler HRTF. ACKNOWLEDGMENTS

This project was partially funded by the EU 7th Framework Programme (FP7), through the Chiroping project (ICT-2007-1 STREP project 215370). We thank Professor H. Steven Colburn for his useful comments on an early version of this manuscript.

Adams, N. H., and Wakefield, G. H. (2009). “State-space models of headrelated transfer functions for virtual auditory scene synthesis,” J. Acoust. Soc. Am. 125, 3894. Algazi, V. R., Duda, R. O., Duraiswami, R., Gumerov, N. A., and Tang, Z. (2002). “Approximating the head-related transfer function using simple geometric models of the head and torso,” J. Acoust. Soc. Am. 112, 2053. Asano, F., Suzuki, Y., and Sone, T. (1990). “Role of spectral cues in median plane localization,” J. Acoust. Soc. Am. 88, 159–168. Blauert, J. (1996a). Spatial Hearing: The Psychophysics of Human Sound Localization (MIT Press, Cambridge, MA), Chap. 2, p. 78. Blauert, J. (1996b). Spatial Hearing: The Psychophysics of Human Sound Localization (MIT Press, Cambridge, MA), Chap. 4, pp. 288–367. Blommer, M. A., and Wakefield, G. H. (1997). “Pole-zero approximations for head-related transfer functions using a logarithmic error criterion,” IEEE Trans. Speech Audio Processing 5, 278–287. Brown, C. P., and Duda, R. O. (1998). “A structural model for binaural sound synthesis,” IEEE Trans. Speech Audio Processing 6, 476–488. Gardner, W. G., and Martin, K. D. (1995). “HRTF measurements of a KEMAR,” J. Acoust. Soc. Am. 97, 3907–3908. Georgiou, P., and Kyriakakis, C. (1999). “Modeling of head related transfer functions for immersive audio using a state-space approach,” in Proceedings of the 33rd IEEE Asilomar Conference on Signals, Systems, and Computers (Pacific Grove, CA), Vol. 1, pp.720–724. Hayes, M. H. (1996). Statistical Digital Signal Processing and Modeling (John Wiley and Sons, Hoboken, NJ), Chap. 4.4, pp. 144–165. Hosny, K. M. (2007). “Exact Legendre moment computation for gray level images,” Pattern Recognit. 40, 3597–3605. Huopaniemi, J., and Smith, J. O. (1999). “Spectral and time-domain preprocessing and the choice of modeling error criteria for binaural digital filters,” in Audio Engineering Society Conference: 16th International Conference: Spatial Sound Reproduction (Rovaniemi, Finland). Kistler, D. J., and Wightman, F. L. (1992). “A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction,” J. Acoust. Soc. Am. 91, 1637. Kreyszig, E. (2005). Advanced Engineering Mathematics, 9th ed. (John Wiley and Sons, Hoboken, NJ), Chap. 5, p. 180. Kulkarni, A., and Colburn, H. S. (1998). “Role of spectral detail in soundsource localization,” Nature 396, 747–749. Kulkarni, A., and Colburn, H. S. (2004). “Infinite-impulse-response models of the head-related transfer function,” J. Acoust. Soc. Am. 115, 1714–1728. Kulkarni, A., Isabelle, S. K., and Colburn, H. S. (1999). “Sensitivity of human subjects to head-related transfer-function phase spectra,” J. Acoust. Soc. Am. 105, 2821–2840. Li, G., and Wen, C. (2010). “Legendre polynomials in signal reconstruction and compression,” in 5th IEEE Conference on Industrial Electronics and Applications (ICIEA) (Taichung, Taiwan), pp. 1636–1640. Liao, S. X., and Pawlak, M. (1996). “On image analysis by moments,” IEEE Trans. Pattern Anal. Mach. Intell. 18, 254–266. Shekarchi et al.: Head-related transfer function compression

3695

Mackenzie, J., Huopaniemi, J., Valimaki, V., and Kale, I. (1997). “Loworder modeling of head-related transfer functions using balanced model truncation,” IEEE Signal Processing Lett. 4, 39–41. Middlebrooks, J. C. (1992). “Narrow-band sound localization related to external ear acoustics,” J. Acoust. Soc. Am. 92, 2607–2624. Oppenheim, A. V., Schafer, R. W., and Buck, J. R. (1999). Discrete-Time Signal Processing, 2nd ed. (Prentice Hall, Upper Saddle River, NJ), Chap. 5, pp. 240–339.

3696

J. Acoust. Soc. Am., Vol. 134, No. 5, November 2013

Scarpaci, J. W., and Colburn, H. S. (2005). “Principal components analysis interpolation of head related transfer functions using locally-chosen basis functions,” J. Acoust. Soc. Am. 117, 2561. Wang, L., Yin, F., and Chen, Z. (2009). “A hybrid compression method for head-related transfer functions,” Appl. Acoust. 70, 1212–1218. Wenzel, E. M., Arruda, M., Kistler, D. J., and Wightman, F. L. (1993). “Localization using nonindividualized head-related transfer functions,” J. Acoust. Soc. Am. 94, 111–123.

Shekarchi et al.: Head-related transfer function compression

Copyright of Journal of the Acoustical Society of America is the property of American Institute of Physics and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.

Compression of head-related transfer function using autoregressive-moving-average models and Legendre polynomials.

Head-related transfer functions (HRTFs) are generally large datasets, which can be an important constraint for embedded real-time applications. A meth...
1MB Sizes 0 Downloads 0 Views