Elicitation of attributes for the evaluation of audio-on-audio interference Jon Francombe,a) Russell Mason, and Martin Dewhirst Institute of Sound Recording, University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom

Søren Bech,b) Bang & Olufsen a/s, Peter Bangs Vej 15, 7600 Struer, Denmark

(Received 30 July 2013; revised 14 March 2014; accepted 2 October 2014)

An experiment to determine the perceptual attributes of the experience of listening to a target audio program in the presence of an audio interferer was performed. The first stage was a free elicitation task in which a total of 572 phrases were produced. In the second stage, a consensus vocabulary procedure was used to reduce these phrases into a comprehensive set of attributes. Groups of experienced and inexperienced listeners determined nine and eight attributes, respectively. These attribute sets were combined by the listeners to produce a final set of 12 attributes: masking, calming, distraction, separation, confusion, annoyance, environment, chaotic, balance and blend, imagery, response to stimuli over time, and short-term response to stimuli. In the third stage, a simplified ranking procedure was used to select only the most useful and relevant attributes. Four attributes were selected: distraction, annoyance, balance and blend, and confusion. Ratings using these attributes were collected in the fourth stage, and a principal component analysis performed. This suggested two dimensions underlying the perception of an audio-on-audio interference situation: The first dimension was labeled “distraction” and accounted for 89% of the variance; the second dimension, accounting for 10% of the variance, was labeled “balance and blend.” © 2014 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4898053]

PACS number(s): 43.55.Hy [LMW]

Pages: 2630–2641

I. INTRODUCTION

Products and systems that produce audio are becoming increasingly ubiquitous across many areas of day-to-day life. Such products are often portable, leading to many potential situations in which the experience of listening to an intended audio program is compromised by the presence of an interfering audio program. Many examples can be envisaged; for example, a telephone conversation in the presence of background music in a shopping center or while a car radio is on, or a laptop computer playing audio in a room with a television on. With this increase in the potential for audio interference, there has been a move toward the development of systems that are capable of reproducing separate audio programs to multiple listeners in the same environment while minimizing the interference between programs (Druyvesteyn and Garas, 1997; Jones and Elliott, 2008). Research to date has focused on the technical assessment of such systems (e.g., acoustic contrast and signal-to-noise ratios). In order to facilitate a more thorough evaluation of such systems it would be beneficial to account for subjective factors; therefore, it is important to understand the perceptual attributes that are affected in audio-on-audio interference situations.

a) Author to whom correspondence should be addressed. Electronic mail: [email protected]
b) Also at: Section of Signal and Information Processing, Department of Electronic Systems, Aalborg University, 9100 Aalborg, Denmark.

J. Acoust. Soc. Am. 136 (5), November 2014

As an initial consideration of the perceptual effects of audio-on-audio interference situations, the degree of separation required between two streams of audio has been investigated (Druyvesteyn et al., 1994; Francombe et al., 2012). This research, however, has only considered the level difference between target and interferer programs, and the perceptual dimensions of such an experience have not been determined. This is in contrast to the domain of audio quality, in which the perceptual dimensions have been investigated in detail, starting with broad divisions such as timbral or spatial quality and moving toward large sets of attributes for detailed parametric assessments (Letowski, 1989). It is a natural extension of this work to determine attributes of the perceptual experience of the “interference” domain for audio interferers.

An attribute is defined by Rumsey (1998) as “a characteristic quality of an object that one may use in describing it”; such attributes can be used to differentiate between stimuli or products (Lawless and Heymann, 1999) and therefore generally originate from the description of differences between stimuli, or between a stimulus and a provided reference.

Some relevant information may be derived from other areas of sound interference research, such as qualitative descriptions of the effects of environmental noise. Hede et al. (1979) collected free and forced subjective response data for a number of noise sources; “annoyance” was found to be the most commonly used term from a wide range of descriptors. Furihata et al. (2007) collected free-response data about the effects of community noise exposure; again, annoyance was the most commonly used descriptor.
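The level difference between target and interferer programs discussed above is commonly summarized as a broadband target-to-interferer ratio in decibels. The sketch below is purely illustrative (the function names and toy signals are not from the cited studies): it computes the ratio from the RMS levels of two signals.

```python
import numpy as np

def rms(x):
    """Root-mean-square value of a signal."""
    return np.sqrt(np.mean(np.square(x)))

def target_to_interferer_ratio_db(target, interferer):
    """Broadband level difference between target and interferer in dB.

    Positive values mean the target program is the louder of the two.
    """
    return 20.0 * np.log10(rms(target) / rms(interferer))

# Toy example: an interferer reproduced 6 dB below the target level.
fs = 48000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)
interferer = 10 ** (-6 / 20) * np.sin(2 * np.pi * 554 * t)
print(round(target_to_interferer_ratio_db(target, interferer), 1))  # 6.0
```

In practice a loudness-weighted measure would be closer to the perceptual range described later in the paper, but the broadband ratio suffices to illustrate the quantity under discussion.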

0001-4966/2014/136(5)/2630/12/$30.00

© 2014 Acoustical Society of America

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.230.73.202 On: Fri, 19 Dec 2014 15:31:12

However, this research has tended to focus on relatively steady-state broadband interferers, so may not be relevant to audio-on-audio interference. The experiments described below were performed in order to investigate the perceptual experience of a listener attempting to listen to a target audio program in the presence of some interfering audio, and to produce a set of attributes to describe this experience. This research will assist in understanding the perceptually salient factors of audio-on-audio interference situations, and the resulting attributes could be used as rating scales for the development of predictive perceptual models. Such models may be beneficial for improving the listener experience in situations with an audio interferer; this is especially pertinent when developing systems and methods for producing separation between audio programs, including sound field control systems or source separation algorithms.

The aims of this study were: to produce a full set of attributes for evaluation of the perceptual effect of an audio-on-audio interference situation; to determine the underlying perceptual dimensions of this experience and therefore select the most appropriate attributes for further investigation; and to observe relationships between parameters of audio-on-audio interference situations (target program, interferer program, interferer level, and road noise level) and the relevant attributes.

In Sec. II, the literature relating to attribute elicitation methods applied to audio quality evaluation is reviewed, and the experiment design for this study outlined. The experimental procedure and results for the four experiment stages performed in this study are presented in Secs. III–VII, and the results discussed in Secs. VIII and IX.

II. ATTRIBUTE ELICITATION METHODS

A variety of methods have been used in the audio literature to elicit attributes. Many experiments have drawn on research from other sensory sciences, particularly food science. Descriptive analysis (DA) is a common and important procedure in the food science industry, providing information related to product development, consumer responses, and sensory mapping [see Lawless and Heymann (1999) and Bech and Zacharov (2006) for reviews]. It is common for researchers to adapt or combine methods to suit the needs of a particular study; this is known as generic DA (Murray et al., 2001). Various authors discuss the desirable characteristics of attributes determined using elicitation procedures (Lawless and Heymann, 1999; Berg, 2006). A review of the literature relating to the elicitation of attributes for audio quality evaluation suggested three main processes: elicitation of descriptive terms, grouping of terms and creation of attribute scales, and analysis of the underlying perceptual structure. In this study, the most suitable method was selected for each of these tasks.

A. Elicitation of descriptive terms

The development of a descriptive vocabulary is the starting point in many attribute elicitation experiments. It is sometimes possible to make use of pre-existing descriptors, but where there are no directly relevant studies that have provided suitable terms it is necessary to elicit descriptors (Berg, 2006).

Experienced participants (or “experts” in the field) can be recruited to suggest suitable terms from their general experience (i.e., with no reference to a specific set of stimuli). For instance, Guski et al. (1999) sent questionnaires to 68 experts on environmental noise annoyance, collecting data on the “most essential” effect of noise as well as similarity ratings between the term annoyance and various other descriptors. Annoyance and “disturbance” were found to be commonly selected, and a close relationship was found between the terms; it was also determined that annoyance was a multidimensional construct including behavioral and evaluative effects.

Experts may have a broader, more general, and well-considered grasp of the field than laity; however, the definition of “expertise” is not immediately obvious and may be contentious. For example, Guski et al. (1999) defined an expert as a published researcher, although in the case of the effects of environmental noise those affected by noise could also be considered expert. Using a panel of experts could also limit the generalizability of any elicited attributes to some extent. Additionally, the elicitation of terms based on the general experience of participants is less controlled than in methods using a specific stimulus set; therefore, the range of elicited terms may have reduced relevance to a particular situation.

Rather than relying on the prior knowledge of a group of experts, it may be desirable to elicit attributes for stimuli presented to participants in an experiment. A wide range of methods for this have been used; such methods generally take the form of either “consensus” or “individual” vocabulary experiments.
In consensus elicitation experiments, a group of subjects work together to elicit terms in an attempt to create a “common language.” This technique forms the basis of many of the formal DA methods and has been applied to audio research. For example, Lorho (2005a) used DA to develop a set of 16 attributes to describe spatial enhancement algorithms applied to various stimuli presented over headphones.

Individual vocabulary techniques involve subjects developing their own unique vocabularies, which can then be combined to produce a consensus using statistical methods or a further group elicitation stage. The simplest form of individual elicitation is a free elicitation procedure; subjects are presented with stimuli and asked to respond with terms that they feel are appropriate descriptors. Free elicitation methods have been used to elicit terms that describe the effects of interfering noise (Hede et al., 1979; Furihata et al., 2007), impressions of sound scenes (Guastavino and Katz, 2004), spatial enhancement algorithms (Lorho, 2005b), and concert hall acoustics (Lokki et al., 2011). Such methods can be analyzed simply by looking at the frequency of use of each descriptor, possibly with some pre-processing such as reducing variants of words to their root forms (Guastavino and Katz, 2004); alternatively, further experimentation can be performed using the resultant attributes (see Sec. II C).
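As a sketch of the frequency-of-use analysis with root-form pre-processing mentioned above: the suffix-stripping rule below is a deliberately crude stand-in for proper lemmatization (it is not the procedure of Guastavino and Katz), and the example responses are invented.

```python
from collections import Counter

def crude_stem(word):
    """Very rough root-form reduction: lower-case the word and strip a few
    common English suffixes. Illustrative only; a real study would use a
    proper lemmatizer."""
    word = word.lower()
    for suffix in ("ingly", "ing", "edly", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: len(word) - len(suffix)]
    return word

# Invented free-elicitation responses.
responses = ["annoying", "annoyed", "Distracting", "distracted", "confusing"]

# Frequency of use after reduction to (approximate) root forms:
# "annoying"/"annoyed" and "Distracting"/"distracted" collapse together.
counts = Counter(crude_stem(w) for w in responses)
print(counts.most_common())
```

The same counting step applies unchanged once a better lemmatizer is substituted for `crude_stem`.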



More structured methods for individual elicitation include the Repertory Grid Technique (RGT), developed by George Kelly in 1955 and adapted for audio attribute elicitation by Berg and Rumsey (1999); variants on this technique have been widely used for the elicitation of audio attributes (Choisel and Wickelmaier, 2006; Kim and Martens, 2007). A potential disadvantage of such methods is the length of time required to perform paired comparisons, as well as the requirement for subjects to make ratings on all of their elicited attributes. Lokki et al. (2012) combined aspects of both elicitation methods, using a list of perceived differences elicited from paired comparisons as the basis of a free elicitation task.

Individual vocabulary methods remove the advantage of directly creating a group language but reduce the risk that subjects who have stronger verbal skills, larger vocabularies, or are more confident in a group discussion may dominate a group elicitation session. Individual methods can also be less time consuming as they avoid the lengthy training procedures and discussions involved in many group elicitation methods (Bech and Zacharov, 2006). Group discussion can still be included during the reduction of descriptors (Sec. II B), and Bech (1999) points to evidence suggesting little difference between the outcomes of consensus and individual elicitation methods.

The most important outcome from the initial elicitation stage is to obtain as wide a variety of descriptors as possible that are relevant to the situation. Hence, an individual free elicitation task using a representative set of stimuli was selected for use in this study (Sec. IV). Individual tasks allow subjects who may be less confident in a group setting to produce descriptors uninhibited by other group members, and a simple free elicitation task facilitates inclusion of a wide range of stimuli, reducing the risk of missing a pertinent descriptor.

B. Grouping of terms and creation of attribute scales

There is likely to be considerable redundancy in the large descriptor sets elicited using the free elicitation task selected above; terms may be duplicated, synonymous, or irrelevant to the majority of participants (Shaw and Gaines, 1989). While statistical dimension reduction (Sec. II C) can be used to remove redundancy, collecting adequate data for such procedures with a large number of attributes is unnecessary if there is considerable redundancy that can be removed in a more efficient manner. It is therefore desirable to reduce the descriptors and determine a usable set of attribute scales using the methods discussed below.

Perhaps the simplest and most commonly used method for reducing descriptors is group discussion; this is often an extension of a group elicitation stage (Bech et al., 1996), or can combine participants from an earlier individual elicitation stage (Zacharov and Koivuniemi, 2001). In group discussions that follow an individual elicitation stage, subjects are given the chance to suggest reasons for the descriptors they produced. This enables a consensus to be reached as to terms with similar meaning, which can be grouped to form attribute scales. It is also possible to develop bipolar endpoints and a simple explanation of the meaning of each scale in order that it can be understood by new subjects (Zacharov and Koivuniemi, 2001). Choisel et al. (2007) found that creating a “neighbor description”—that is, a short phrase to describe the meaning of the scale—enabled subjects who had not participated in an elicitation experiment to use scales reliably.

Statistical grouping or clustering that does not require full attribute ratings is potentially quicker than a group discussion. Le Bagousse et al. (2010) used free categorization to determine families of sound attributes for the assessment of spatial audio. Subjects were presented with a list of attributes and asked to group them into between two and five sets, providing a title for each set. This method is beneficial as it does not rely on all subjects being available at the same time and is a relatively quick task; however, the number of possible sets is limited (determined by the experimenter) and it is non-trivial to select the correct label for a group of descriptors if there are pronounced differences within a group.

Rather than using statistical methods, verbal data can be grouped by considering the semantic content of the responses and categorizing them based on simple rules (Neher et al., 2006). This procedure is known as verbal protocol analysis (VPA) (Ericsson and Simon, 1993). Berg and Rumsey (2006) performed VPA based on categories proposed by Samoylenko et al. (1996), breaking down elicited terms into “descriptive” or “attitudinal” categories, then descriptive features into “unimodal” and “polymodal” categories, and attitudinal features into “emotional-evaluative” attributes and “naturalness-related” attributes. Classification was performed by one of the authors. Neher et al. (2006) note that categorized data from VPA experiments can be analyzed by frequency of occurrence using standard statistical techniques. It may also be possible to perform reduction without the use of human participants.
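One such automatic reduction is prefix matching of the kind Mattila (2001) describes, grouping descriptors that share a prefix of at least four letters covering 65% of the word length. The greedy single-pass grouping below is an illustrative sketch of that rule, not Mattila's implementation:

```python
def share_prefix(a, b, ratio=0.65, min_letters=4):
    """True when two descriptors share a common prefix at least `min_letters`
    long and covering `ratio` of the shorter word's length."""
    a, b = a.lower(), b.lower()
    n = max(min_letters, round(ratio * min(len(a), len(b))))
    return len(a) >= n and len(b) >= n and a[:n] == b[:n]

def group_descriptors(words):
    """Greedy pass: each word joins the first group whose representative
    (first member) it matches, otherwise it starts a new group."""
    groups = []
    for word in words:
        for group in groups:
            if share_prefix(word, group[0]):
                group.append(word)
                break
        else:
            groups.append([word])
    return groups

groups = group_descriptors(["annoying", "annoyance", "distracting", "distracted", "calm"])
print(groups)  # [['annoying', 'annoyance'], ['distracting', 'distracted'], ['calm']]
```

As the surrounding discussion notes, such rules are blind to semantics (synonyms with different spellings are never merged), which is why they are suited only to preliminary filtering.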
Mattila (2001) used a simple rule-based similarity measure to group descriptors (based on the prefix of a word up to 65% of the word length, with a minimum of four letters). This is similar to the procedure described by Guastavino and Katz (2004) that reduced words to their root forms (“lemmata”) and used a thesaurus to group terms by semantic theme. Such methods, while relatively quick and cost-effective to undertake, are likely to be too simplistic to be used for more than a preliminary filtering or analysis of the data.

When combining terms to create attribute scales, the process should avoid potential biases by the experimenter, and the results should be as widely understandable as possible. A number of the methods outlined above, such as some statistical methods and VPA, often require the experimenter to define the groups. In contrast, group discussion allows the subjects to describe and label the groups, thus reducing the chance of bias and increasing the chance that the results will be understandable by others. Group discussion is also likely to be more successful than the relatively simplistic rule-based grouping (i.e., without subjects) in terms of reducing the number of resulting groups while maintaining subtle semantic differences. In order to facilitate efficient collection of ratings, some form of statistical reduction may still be desirable to further reduce the attribute set; therefore, a combination of group discussion (Sec. V) and statistical reduction (Sec. VI) was selected for this stage of experimentation.

C. Understanding of perceptual space

The ultimate aim of attribute elicitation procedures is to understand the underlying perceptual space of the experience in question. Where relevant rating scales have already been determined, this can be undertaken by collecting ratings and performing statistical analysis. Factor analysis methods can be used to determine the underlying perceptual dimensions from ratings of multiple attributes. Principal component analysis (PCA) is a commonly used method as the results can be interpreted using simple plots (Næs et al., 2010). Gabrielsson and Sjögren (1979) collected audio quality ratings on 60 attributes; PCA was performed on the results, suggesting 8 dimensions by which the quality of reproduced audio could be described. Performing a rating experiment with 60 attribute scales is a time-consuming process; smaller numbers of scales have been successfully used (Guastavino and Katz, 2004; Kim and Martens, 2007).

Where multiple attribute sets exist (i.e., as the output of an individual elicitation task such as the RGT), Generalized Procrustes Analysis (GPA) (Gower, 1975) can be used to transform the data in order that it conforms to a common perceptual space. This is often coupled with dimension reduction in order to eliminate redundancy and observe differences between products (Næs et al., 2010). Where only a single attribute set exists, differences between subjects can still be observed using methods such as Tucker-1 PCA (see Sec. VII B 2) and accounted for by standardizing scores.

The experimental procedure selected above results in a single set of attribute scales determined using individual and consensus elicitation stages and reduced to a manageable size using statistical methods. The single attribute set enables collection of ratings on the same scales without the need for GPA. Therefore, in order to gain an understanding of the perceptual space, a rating experiment with PCA was selected for this experiment stage.
A collection of ratings on various scales also enables analysis of subject performance, which can be used to help label the resulting dimensions in cases where two attributes show significant correlation. D. Experiment design
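The PCA step can be computed directly from a singular value decomposition of the mean-centred ratings matrix (stimuli × attributes). In the sketch below the ratings are random placeholders, not data from this study:

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder ratings matrix: 54 stimuli rated on 4 attribute scales
# (synthetic values standing in for averaged listener ratings).
ratings = rng.normal(size=(54, 4))

# PCA via SVD of the mean-centred data.
centred = ratings - ratings.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)

explained = s**2 / np.sum(s**2)  # proportion of variance per component
scores = centred @ vt.T          # stimulus coordinates in PC space
loadings = vt.T                  # attribute loadings on each component

print(np.round(explained, 3))    # variance explained, largest first
```

Standardizing each listener's scores before averaging, as mentioned above, simply changes how `ratings` is prepared; the decomposition itself is unchanged.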

In the above literature review, three stages necessary for determining the underlying attributes of an experience were detailed: elicitation of descriptive terms, grouping of terms and creation of attribute scales, and understanding of perceptual space. The following selections were made: in order to elicit a wide variety of descriptors, an individual free elicitation task was selected; to reduce the number of descriptors and create a set of attribute scales, a combination of group discussion and statistical dimension reduction was selected; and in order to analyze the underlying dimensionality of the experience, a rating experiment followed by PCA was specified. The experiments are detailed in Secs. IV–VII.

III. PARTICIPANTS AND STIMULI

A. Participants

Two groups of listeners with different degrees of experience participated in the experiments described below. It was considered beneficial to include both experienced and inexperienced listeners as useful attributes should “relate to consumer acceptance” (Lawless and Heymann, 1999) and it may therefore be necessary to use the attributes with non-trained listeners. It was also felt that audio-on-audio interference situations would be familiar to listeners in both groups. When collecting a wide range of descriptors (as in the first experiment stage), input from different participants may be useful in covering the complete perceptual space, and working with two groups of subjects facilitates analysis of the differences and similarities between the groups.

“Experienced listeners” were undergraduate students on the Music and Sound Recording course at the University of Surrey, all of whom had undertaken a technical ear training course. “Inexperienced listeners” were undergraduate and postgraduate students or recent graduates from a range of disciplines, including music students. The inexperienced subjects had no formal technical listening training, but had potentially performed other listening tests or had some musical/production training. No subjects had explicit prior knowledge of the goals of the wider project. Subjects received a small amount of remuneration for the time taken to perform the experiments.

B. Stimuli

The stimuli selected for an elicitation experiment naturally limit the generalizability of the attributes produced; for example, it is not possible to say with certainty that attributes elicited for the evaluation of environmental noise are necessarily applicable to audio-on-audio interference situations with realistic audio interferers. Therefore, the stimuli used in the elicitation experiment were selected to give a wide coverage of ecologically valid program items. In order to cover a range of realistic interference situations while facilitating categorical analysis of the relationship between parameters of the stimuli and attribute scores, stimuli were created as full factorial combinations of four independent variables: target program, interferer program, interferer level, and road noise level (road noise was included to simulate audio-on-audio interference situations in an automotive environment).

The target and interferer programs were selected to cover a range of common audio items. The target programs included speech with background noise (football commentary with crowd noise), pop music (The Killers “On Top”), and classical music (Brahms Hungarian Dance No. 18). The interferer programs included speech (BBC Radio 4 “Points of View”), pop music (The Bravery “Give In”), and classical music (Mahler Symphony No. 5, 4th Movement). Minute-long excerpts of the program items were used. The reproduction levels of the interferer were selected to cover a wide perceptual range: 0 dB with respect to the target program level, the threshold of audibility + 6 dB, and the midpoint



between these two levels. The threshold of audibility of the interferer in each combination was determined in a pilot experiment following a similar methodology to that described in Francombe et al. (2012); listeners performed a method of adjustment task to produce threshold values. The level of the target audio was set at approximately 70 dB LAeq(20 s). The road noise was either off, or a simulation of road noise within a vehicle at 30 mph (approximately 60 dBA). The full factorial combination of these factors produced a set of 54 stimuli.

The target and interferer programs were replayed monophonically at distances from the listener of 1.85 and 2.2 m, respectively, with the target loudspeaker on-axis and the interferer loudspeaker at 90°. The road noise sample was recorded by undergraduate students on the Music and Sound Recording course at the University of Surrey; a monophonic recording was decorrelated using Pulkki’s (2007) method of convolution with white noise bursts in 3 frequency bands, and reproduced over 6 loudspeakers arranged in a regular hexagon of radius 2 m (with vertices at 30°, 90°, …, 330°).

IV. STAGE ONE: FREE ELICITATION

The purpose of the first stage was to elicit a wide range of terms that the participants felt suitable for describing the perceptual effect of audio-on-audio interference situations. A free elicitation task was used.

A. Experiment design

The methodology was similar to the differential elicitation stage of the Audio Descriptive Analysis & Mapping (ADAM) method (Koivuniemi and Zacharov, 2001). A reference stimulus (just the target audio, with road noise where appropriate) was provided alongside a set of test stimuli, and subjects were asked to write any words that they felt appropriate to describe the difference in situation between the reference and the test stimuli. A custom interface was created in Max/MSP. Each test page featured nine stimuli in addition to the reference; the target program and road noise level were constant for all items on each page and the other factor levels were randomly arranged across nine buttons. This gave a total of 6 pages for the 54 stimuli.

Although subjects were asked to compare each stimulus against the reference, the multiple stimulus presentation also allowed for comparison between stimuli; this was considered to be beneficial as the potential pool of descriptors would be enlarged without requiring the large number of combinations required using a paired comparison method. Each test page contained an area for subjects to type their response, and the words in the box were not removed on moving to the next page. Subjects were instructed that they did not have to use any word more than once even if it was appropriate for multiple stimuli. The interface did not allow a subject to move on to a new page until every stimulus had been auditioned at least once, although subjects were told that they were not required to listen to the full duration of the stimuli unless they felt it necessary.

Eighteen subjects participated in the test: nine experienced listeners and nine


inexperienced listeners (the latter including four music specialists).

B. Results

The collected data were manually separated into individual phrases (i.e., complete words and phrases were included verbatim). Any exact duplicates were removed as these were unnecessary for the following consensus vocabulary stage. This left a total of 259 unique words/phrases for the experienced listeners and 313 for the inexperienced listeners. Listeners tended to respond with short descriptive phrases rather than individual words; although the majority of responses for both groups featured single words (37.3% and 30.9% of the responses for the experienced and inexperienced listeners, respectively), the mean number of words per phrase was 4.0 for the experienced subjects and 5.1 for the inexperienced subjects. These statistics suggest that the inexperienced listeners were marginally more verbose in their responses.

1. Edited word list

The primary function of the first stage was to provide a wide range of possible descriptors; however, it was also possible to perform a more detailed analysis of the results. As discussed in Sec. II A, free elicitation data can often be analyzed simply using frequency of response. In this experiment, subjects were told that it was not necessary to use any term more than once; therefore, analysis by frequency of use was limited. However, it was possible to analyze agreement between participants by observing the degree of overlap of phrases. To facilitate this analysis, key words were extracted from the phrases by the first author. For example, the phrases “the interferer was annoying,” “I found it really annoying,” and “annoyed” were all coded as “annoyance.” In this manner, the 572 phrases were reduced to a total of 276 unique descriptors.

The majority of descriptors were only used by one subject, with a sharp drop-off in the number of words that were used by multiple subjects; one term (“distraction”) was used by 14 subjects, while 218 words were only used by particular individuals. Table I contains the descriptors that were used by six or more subjects; “distraction,” “confusion,” and “annoyance” stood out as descriptors that were used by the majority of participants.

V. STAGE TWO: GROUP DISCUSSION

The aim of the second experiment was to reduce the large collection of phrases elicited in the first stage into a set of attributes that could be used to describe the experience of a listener in an audio-on-audio interference situation. As discussed in Sec. II B, a consensus vocabulary task was selected for this.

TABLE I. Descriptors used by six or more subjects (based on the edited word list described in Sec. IV B 1).

Descriptor                   Number of subjects
Distraction                  14
Confusion                    13
Annoyance                    12
Focus, clashing               8
Relaxation                    7
Irritation, concentration     6

A. Experiment design

The grouping task again drew on ideas from the ADAM elicitation procedure (Zacharov and Koivuniemi, 2001). The experiment featured five tasks: grouping, attribute definition, end-point definition, attribute description, and attribute list combination. These tasks were divided into three sessions: The grouping was performed in the first session (2 h); the attribute definition, end-point definition, and attribute description were performed in the second session (2 h); and the list combination was performed in the third session (1 h).

The subjects in the consensus elicitation experiment had all participated in the first experiment. Experienced and inexperienced listeners were separated and the task performed separately for each group of listeners to enable comparison of the attributes produced. Six experienced listeners participated in the first session, with four in the second and third sessions; eight inexperienced listeners participated in the first session, five in the second session, and three in the third session.

Before commencing the experiment, subjects were briefly introduced to the concept of an attribute for making judgments about an object or situation, and the stages of the experiment were outlined. The group discussions were led by the first author; the role of the panel leader was to facilitate the discussion, i.e., to present the phrases, keep the discussion moving, ensure all subjects were equally involved, and record the results. The panel leader had no input into the direction of the discussion and made a conscious effort to remain unbiased.

In the grouping stage, subjects were asked to group the phrases elicited in the first experiment stage into sets of words that described the same “difference in experience” between listening to the reference and any of the test stimuli. Phrases that the group felt to be inappropriate could be discarded at this stage.
Each subject group was only presented with the phrases that were previously elicited by members of that group. The phrases elicited in stage one were presented verbatim to the subjects (i.e., not in the edited form described in Sec. IV B 1).

In the attribute definition stage, subjects were presented with all of the phrases from the sets produced in the grouping stage, one set at a time, and asked to select a word (or short phrase if necessary) that was most appropriate for describing the content of that set. Subjects were also able to merge, divide, or discard groups at this stage. In the end-point definition stage, subjects were asked to produce scale end-point descriptors that could be used to describe the maximum and minimum perceptual intensity of the characteristic described by each attribute. In the attribute description stage, subjects were asked to agree on a short description of the attribute that could be used to explain the meaning to someone who had not taken part in the elicitation experiment.

Finally, in the attribute combination stage, the two groups of subjects were combined to compare and reduce any redundancy in the attribute sets through group discussion. Following familiarization with the attributes produced by each group, subjects were given the opportunity to remove duplicate or unnecessary attributes and refine definitions, end-points, and descriptions.

B. Results

1. Grouping

In the first session of the consensus vocabulary experiment, the experienced listeners grouped 263 phrases into 15 sets and the inexperienced listeners grouped 317 phrases into 13 sets. Each group of listeners also discarded a number of the original phrases as they felt that the terms were meaningless or not relevant to the audio-on-audio interference scenarios. In some cases the participants could not decide which set to use for a term but felt that it had already been covered in other sets and so excluded it from further consideration. The experienced listeners discarded 44 phrases (16.7%) and the inexperienced listeners discarded 63 phrases (19.9%).

2. Attribute definition, end-points, and description

In the second session, the experienced listeners produced nine attributes and the inexperienced listeners produced eight attributes. For both groups of listeners, this was a notable reduction from the number of sets produced in the grouping stage; both groups merged and removed sets. Conversely, the experienced listeners also produced two attributes (annoyance and "unsettling") from the same set of terms.

3. Attribute list combination

In the third session, the two groups of listeners were combined in order to merge the attribute sets. Through group discussion, the 17 attributes produced by the two groups were reduced to 12 attributes and some minor changes were made to attribute definitions, descriptions, and end-points. The final set of attributes is presented in Table II.

It was notable during the discussion that the inexperienced subjects often felt that they had produced terms with a similar meaning to the experienced listeners, but that the experienced listeners created more concise and meaningful descriptions of the related experience. Therefore, for similar attributes, the experienced listeners' descriptions were generally retained. The following major changes were made: "calming" and "unsettling" from the experienced subject attributes were merged and titled calming; the two groups' "distraction" attributes were merged, as were their "separation" attributes; "environmental pressure" and "completion" were removed from the inexperienced subject attributes; and "lasting effect" and "response to stimuli" (from the inexperienced subject attributes) were renamed "response to stimuli over time" and "short-term response to stimuli," respectively.

VI. STAGE THREE: ATTRIBUTE REDUCTION

TABLE II. Combined attribute list.

Attribute | Description | End-points
Masking (Mask) | How much it is possible to hear the target in the presence of the alternate audio | Completely audible → drowned out
Calming (Calm) | How calming or unsettling is the listening experience with the addition of the alternate audio? | Very calming → very unsettling
Distraction (Dist) | How much the alternate audio pulls your attention or distracts you from the target audio | Not at all distracting → overpowered
Separation (Sep) | How distinct the two sound sources are from each other | Completely separate → indistinguishable
Confusion (Conf) | How confusing the merge of the two audio programs is (rhythmically, melodically, or harmonically) and how they blend together; confusion because the sources interact with each other | Extremely confusing → not at all confusing
Annoyance (Ann) | To what extent the alternate audio causes irritation when trying to listen to the target audio | Very annoying → not annoying at all
Environment (Env) | To what extent does the combination of stimuli represent a plausible real-world situation? | Realistic → unrealistic
Chaotic (Ch) | How busy the combination of audio is | Chaotic → simple
Balance and blend (BB) | How you judge the blend of sources to be | Complementary → conflicting
Imagery (Ima) | To what extent does the audio experience cause you to reminisce or focus on a particular memory or event | Strong relation → no relation
Response to stimuli over time (RTime) | Once the stimuli have finished, is there a long-standing feeling, either positive or negative; feelings that go beyond the duration of the stimuli | Very positive feeling → very negative feeling
Short-term response to stimuli (RShort) | How does the blend of the given stimuli cause you to feel? | Very positive feeling → very negative feeling

An important part of the DA methods introduced in Sec. II involves using the resulting scales for ratings to differentiate products or for further statistical analysis of the underlying perceptual dimensions of an experience. Statistical techniques such as PCA can be used to reveal redundancy in the attribute set (i.e., more than one attribute describing essentially the same facet of experience). However, in order to reduce redundancy and encourage efficiency it was considered beneficial to remove the less relevant attributes prior to performing a rating experiment. The third experiment was designed to determine the most relevant and useful attributes from the set of 12 produced in the consensus vocabulary experiment.

A. Experiment design

A simplified ranking procedure was used for the attribute reduction stage. For each stimulus combination (from the same set of stimuli used in the previous stages), subjects were asked to select the single attribute that they felt was most useful or relevant for rating the difference between the reference and test stimulus. A custom test interface was created featuring one stimulus per page (alongside a reference stimulus as described above) with the attribute labels and their associated scale end-points shown in a grid of buttons. The position of the attributes was randomized for each page to minimize a possible bias of subjects repeatedly selecting the same attribute through boredom or fatigue.

Fifteen subjects participated in the experiment: seven subjects had participated in each previous experiment (four experienced, three inexperienced), and three had participated in one or more of the earlier experiments before dropping out (one experienced, two inexperienced). The remaining five subjects (two experienced, three inexperienced) had not participated in any of the previous experiments. It was considered beneficial to include new subjects in order to assess their agreement with those who had been involved in the

elicitation procedure. Each experiment session lasted approximately 25 min.
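The per-page randomization of the attribute button grid described above can be sketched as follows. This is a hypothetical illustration (the original interface code is not published), using the attribute shorthand from Table II:

```python
import random

# Attribute shorthand from Table II (the 12 candidate attributes).
ATTRIBUTES = ["Mask", "Calm", "Dist", "Sep", "Conf", "Ann",
              "Env", "Ch", "BB", "Ima", "RTime", "RShort"]

def attribute_grid(page_seed):
    """Return the attribute buttons in a fresh random order for one test
    page, so no attribute sits in the same grid position on every page."""
    rng = random.Random(page_seed)
    grid = list(ATTRIBUTES)
    rng.shuffle(grid)
    return grid

# Every page offers the same 12 attributes, just in a different order.
page = attribute_grid(page_seed=1)
```

Reshuffling per page counters position bias and the tendency of fatigued subjects to click a fixed button.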

B. Results

1. Overall results

The results from stage three were analyzed using a chi-square goodness-of-fit test to determine significant differences from the null hypothesis that all attributes were used with equal probability (Bech and Zacharov, 2006). There were found to be significant differences in the overall use of each attribute (χ² = 312.9, p < 0.01), indicating that the attributes were not selected equally. This suggests that subjects considered some attributes to be more useful or relevant than others. Figure 1 shows the total frequency of use (across stimuli and subjects) for each attribute; significant differences (α = 0.05), determined by standardized residuals lying outside of ±1.96, are indicated with arrows showing the

FIG. 1. Overall frequency of attribute use in the simple ranking experiment. Significant differences are indicated with arrows showing the direction of the difference. Attribute shorthand is outlined in Table II.


direction of the difference (that is, greater or less than uniform frequency of use). The results show that distraction, annoyance, and “balance and blend” were all used more than expected for a uniform distribution of responses.
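The analysis above (an equal-probability null hypothesis tested by chi-square goodness-of-fit, with standardized residuals thresholded at ±1.96) can be sketched as below. The selection counts are invented for illustration, not the published data:

```python
import numpy as np
from scipy.stats import chisquare

# Illustrative selection counts for the 12 attributes (invented numbers,
# not the published data); order follows Table II.
counts = np.array([10, 4, 160, 12, 60, 120, 6, 18, 100, 8, 2, 10])
expected = np.full(counts.size, counts.sum() / counts.size)

# Goodness-of-fit test against the null hypothesis of equal use.
chi2, p = chisquare(counts, expected)

# Standardized residuals: attributes with residuals outside +/-1.96 were
# used significantly more or less often than chance (alpha = 0.05).
residuals = (counts - expected) / np.sqrt(expected)
selected_more = residuals > 1.96
selected_less = residuals < -1.96
```

With counts this uneven the test rejects equal use, and the residuals identify which individual attributes drive the rejection, mirroring the arrows in Fig. 1.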

2. Results by subject group

The chi-square analysis was repeated for five subject groups: all experienced listeners (1); all inexperienced listeners (2); subjects who participated in all elicitation sessions (3); subjects with no prior participation (4); and subjects who participated in some elicitation sessions (5).¹ All groups used distraction at significantly greater than chance frequency; groups 1–4 also used annoyance; groups 2 and 3 used balance and blend; and groups 1 and 5 used confusion (not one of the most frequently used attributes overall, although the inclusion of this attribute can be attributed to the responses of the one subject falling in both groups). It was therefore felt worthwhile to include confusion in the subset of attributes for further analysis. It is interesting to note that the experienced listeners did not select balance and blend (an attribute that originated from the inexperienced listener set of attributes) at greater than chance frequency.

The attributes used at lower than chance frequency were similar between all subject groups: response to stimuli over time was apparently rejected by all groups; calming, environment, imagery, and short-term response to stimuli were rarely selected; and masking and separation were used at less than chance frequency by group 5.

3. Results by stimulus

Pearson's chi-square test of independence was used to test for relationships between the independent variables and the selected attribute. Standardized residuals were used to observe those cases contributing to a significant deviation from chance selection frequency: There were found to be significant relationships between frequency of attribute selection and factor level for target program, interferer program, and interferer level. Analyzing the results in this manner showed some instances of attributes other than the four mentioned above being used at greater than chance frequency for specific factor levels. However, the most commonly used attributes in each case were those detailed above, so the additional attributes were not considered further.

4. Relationship with stage one results

The aim of the third stage was to determine the most useful and relevant attributes for evaluation of audio-on-audio interference situations in order to select a small set of attributes for use in a subsequent rating experiment. Of the 12 attributes elicited in earlier experiments, 4 stood out as being used at greater than chance frequency for all stimuli: distraction, annoyance, balance and blend, and confusion. It is interesting to note that the four attributes selected relate closely to the four most widely used descriptors from the first stage of the elicitation experiment (Table I); distraction, confusion, and annoyance were the three descriptors used by most subjects, and the set of phrases from which balance and blend was derived included "clashing."

VII. STAGE FOUR: ATTRIBUTE RATINGS

The final stage was designed to collect ratings made on the four attribute scales selected in stage three, in order to: perform statistical dimension reduction to identify the underlying perceptual dimensions of audio-on-audio interference situations; statistically assess the reliability and consistency of subjects' use of the different attribute scales to assist in labeling the dimensions; and obtain ratings of the stimuli to provide information about the relationship between the attributes and the factor levels.

A. Experiment design

All 54 stimuli from the initial elicitation stage were used in the rating experiment, and each rating was repeated to allow for assessment of the reliability of subjects' scale use.

The test was preceded by a familiarization stage in which subjects were required to listen to a range of the stimuli (all combinations of target program, interferer program, and road noise level, at the low and high interferer levels) and acquaint themselves with the attribute description and rating scale. Following the familiarization, a multiple stimulus method was used to collect ratings. Alongside a reference stimulus (just the target audio, with road noise where appropriate) there were ten stimuli on each test page; the target program and road noise level were held constant and the other factors varied to give nine combinations in addition to a hidden reference. Including the repeats there were a total of 12 pages for each attribute.

The scales were 15 cm long sliders from 0 to 100 (with a resolution of 1) with end-point labels positioned 1 cm from the top and bottom of the scale. Subjects were instructed that the hidden reference should be scored at 0 for distraction, annoyance, and confusion, and 50 for balance and blend.² Subjects completed a randomly selected practice test page before commencing the ratings.

As suggested above, it was felt to be important that new subjects could make meaningful use of the resulting scales. Therefore, 14 subjects participated in the rating experiment; 7 subjects had participated in all prior stages while the remaining 7 had not participated in any of the previous stages.
Within these groups, participants fell into three categories of listening experience: "technical listeners" were Music and Sound Recording students at the University of Surrey, all of whom had undertaken technical ear training; "musical listeners" were students with a musical background, including production and other critical listening experience, but without the same technical listening background; and inexperienced listeners were those without a particular musical background or training at degree level.

The test procedure outlined above was repeated for the four attributes, with each subject rating the attributes in a different order. There was a short break (5–10 min) between



each attribute, and a more substantial break after the second test if subjects were performing all of the listening tests in one day. The range of time taken for subjects to complete the test was wide, varying between approximately 12 and 42 min per attribute.

B. Results

1. Subject reliability

The hidden reference stimulus was correctly identified (i.e., scored at 0 or 50 as appropriate) in all but one case (out of 672 judgments); the incorrectly scored reference was for the attribute confusion by subject 10, and was rated at 3. The hidden reference stimuli were intended to anchor the distraction scale and were therefore removed from further analysis.

Mean absolute error between repeated judgments was used to assess the ability of subjects to use the scales reliably. To account for differences in scale use, scores for each subject and attribute were standardized to have a mean of 0 and a standard deviation of 1. While there was some variance in the reliability of individual listeners, removing the less reliable subjects did not have a substantial effect on the analysis described below; therefore, results from all subjects were included in the analysis. There was found to be no significant effect of listening experience on reliability, although the inexperienced and musical listeners seemed to perform marginally more reliably than the technical listeners.

Figure 2 shows mean absolute error broken down by prior participation and attribute; subjects who participated in the elicitation were significantly more reliable for all attributes with the exception of distraction, suggesting that this attribute was the easiest to understand and use for subjects who had not participated in the prior elicitation stages.

2. Subject agreement

Alongside being able to use a scale reliably, it is also important that subjects are able to agree on the meaning and usage of attributes. One way to assess inter-subject agreement is visual inspection of Tucker-1 correlation loading plots (Dahl et al., 2008), which are created by performing

FIG. 2. Mean absolute error (standardized) by previous elicitation participation, broken down by attribute. Error bars show 95% confidence intervals for each subject group. Horizontal lines show means across all attributes for each group.


PCA on an unfolded data matrix with stimuli in rows and attributes in columns for each subject side by side (Næs et al., 2010). This allows the loadings of the attributes onto the principal components to differ for each subject, giving an impression of how much the subjects agree with each other on the use of the attributes. The desirable situation is for points to be close together (strong agreement) and toward the edge of the plot in one dimension (high explained variance by the principal component represented by that dimension).

Figure 3 shows Tucker-1 plots for each of the four attributes, with the PCA performed on standardized scores. Distraction shows the tightest grouping, with subject 12 displaying the most difference. Subjects also showed reasonable agreement for annoyance. Ratings on balance and blend exhibited the least agreement between subjects, with a wide spread of points and three subjects with less than 50% explained variance. Confusion ratings also showed some disagreement between subjects, with subjects 2, 3, 5, and 8 loading heavily onto PC1 and subjects 1, 4, 6, and 14 loading approximately equally onto the two components. These results support the finding in Sec. VII B 1 that of the four attributes selected, distraction is most easily understood and used in a similar manner by the subjects.

3. PCA

PCA was performed on the standardized scores, and an oblique ("promax") rotation of the calculated components was performed. An oblique rotation is appropriate where the original attributes could theoretically be related, as it does not require the resulting components to be orthogonal (Field, 2005). It was found that 88.6% of the variance was explained by the first principal component and 98.5% by the first two components. Figure 4 shows attribute loadings for the first two principal components (after rotation). The first principal component appears to represent annoyance and distraction, while the second principal component is related to balance and blend. Confusion is approximately equally loaded onto both components.

Stimulus scores in the principal component space are also shown in Fig. 4. The scores aid interpretation of the components and give an impression of the effect of the factors on the perception of the stimuli. The stimuli are differentiated by interferer level and combination of programs. There is a clear relationship between interferer level and PC1: As the interferer level increases, the stimuli score higher on PC1 (related to annoyance and distraction). Likewise, there is a relationship between the program combination and PC2: When the target and interferer were different types of program (i.e., music and speech), they showed low scores on PC2 (toward the complementary end of the balance and blend scale); when they were the same type of program, particularly music on music, they scored highly on PC2 (toward conflicting on the balance and blend scale). The other factor levels were plotted in a similar manner but did not show such strong trends; it was particularly notable that road noise showed no effect on stimulus scores.

As confusion loads equally onto both components, it exhibits a


FIG. 3. Tucker-1 correlation loading plots for each attribute. Labels show subject number; inner and outer circles show 50% and 100% explained variance respectively (Næs et al., 2010).

relationship with both the level of the interferer and the combination of program items; this could indicate that confusion encompasses both dimensions, but the subject agreement analysis presented above suggests that this is mainly due to a different interpretation of the attribute between subjects.
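The Tucker-1 computation described above can be sketched as follows, assuming random illustrative ratings for a single attribute (54 stimuli by 14 subjects, the per-attribute layout of the Fig. 3 plots); the correlation-loading step follows the general description in the text, not the authors' exact analysis code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_stimuli, n_subjects = 54, 14

# Hypothetical ratings for one attribute: rows are stimuli, columns subjects.
ratings = rng.random((n_stimuli, n_subjects)) * 100

# Standardize each subject's scores (mean 0, s.d. 1) to remove
# differences in scale use before the PCA.
z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)

# PCA via singular value decomposition of the column-centered matrix.
u, s, vt = np.linalg.svd(z, full_matrices=False)
explained = s**2 / np.sum(s**2)        # variance proportion per component
scores = u * s                         # stimulus scores on each component

# Correlation loadings: correlation of each subject's column with the
# first two component scores; points near the unit circle in a Tucker-1
# plot are well explained by those components.
loadings = np.array([[np.corrcoef(z[:, j], scores[:, k])[0, 1]
                      for k in (0, 1)] for j in range(n_subjects)])
```

Plotting each subject's pair of correlation loadings against the 50% and 100% explained-variance circles reproduces the style of Fig. 3; with real data, tight clusters near the PC1 edge would indicate strong inter-subject agreement.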

4. Stability of the PCA solution

It is important that attributes can be used equally well by subjects who were not involved in the elicitation procedure. To assess the performance of the subjects who were not involved in developing the attribute scales, the PCA was repeated separately for the subjects who performed all experiments and the new subjects. Bi-plots of the two PCA solutions are shown in Fig. 5. The relationship between the attributes is very similar for both groups of subjects, albeit with slightly more explained variance in PC1 for the new subjects. This suggests that subjects who did not participate in the elicitation of the scales are able to use the attributes in a similar way to those who developed them, and that the elicitation procedure used is capable of producing useful and meaningful attribute scales.

VIII. DISCUSSION OF THE EXPERIMENTAL METHOD

FIG. 4. PCA bi-plot showing attribute loadings and stimulus scores. Scores are differentiated by interferer level (shade) and program combination (symbol).

The four-stage methodology used in this experiment was found to be suitable for eliciting attributes in a reasonable time-frame; the four stages were performed in a total of approximately 53 h. The time commitment required of each subject was approximately 6 h (for the subjects who participated in all sessions). One suggested weakness of consensus vocabulary methods is the difference in verbal ability and/or confidence of participants, leading to some participants overpowering the group discussion (Bech and Zacharov, 2006). This effect was observed in the group sessions to some



FIG. 5. Bi-plots showing the PCA solution for subjects who participated in every experiment and those who did not take part in developing the attribute scales.

extent, requiring careful management from the panel leader to ensure that all subjects' opinions were heard and considered equally.

When the experienced and inexperienced subject groups were combined, it was generally felt that the experienced listeners had expressed similar terms more concisely and clearly, suggesting that trained listeners are better at communicating their perception of audio stimuli. However, some attributes developed by inexperienced listeners were selected for the final set of 12 attributes and the reduced subset of 4 attributes, and one was attached as a label to one of the two principal components, suggesting that it may be imprudent to disregard inexperienced subjects for an elicitation task, at least for situations that are likely to be familiar to them. This is contrary to advice in the literature (Lawless and Heymann, 1999), which suggests that only experienced participants should take part in DA experiments. Aside from their verbal skills, the inexperienced participants were found to perform at least as well as experienced listeners in the rating task.

Subjects who did not participate in the early attribute elicitation experiments were found to be slightly less reliable for all four attributes on which ratings were collected; however, this effect was least pronounced for distraction, suggesting that this term is widely relevant and well understood. The PCA solution also showed that subjects not involved in the elicitation used the attributes in the same way as those who had developed the scales, with a similar relationship between the attributes shown for both groups. It was interesting to note that similar attributes were found to be the most commonly used throughout the four stages; for example, the four most widely used terms from the free elicitation stage were closely related to those selected in the simple ranking stage.
The stimulus set for the experiment was chosen to cover a range of ecologically valid program types and therefore potential audio-on-audio interference situations. However, it would be beneficial to repeat the elicitation process using additional stimuli, perhaps focusing on particular application areas (e.g., speech targets and/or interferers, degraded quality signals, or steady-state noise interferers). It would also be of interest to vary the interferer location, as the physical separation between the target and interferer programs could


potentially affect the perceptual experience. However, while it is known that an interferer may be more easily masked when reproduced from a similar direction to the target (Moore, 2008), it is not expected that this would result in a significantly different perceptual experience.

IX. CONCLUSIONS

An experiment was performed in order to elicit a set of attributes that could be used to describe the perceptual experience of a listener in an audio-on-audio interference situation. The first two stages used a free elicitation and consensus vocabulary task similar to those in the ADAM method (Zacharov and Koivuniemi, 2001). The outcome of this was a set of 12 attributes that covered all facets of audio-on-audio interference situations. A simple ranking experiment was performed in order to select the most useful and relevant attribute for each stimulus combination; four attributes stood out as being most relevant: distraction, annoyance, balance and blend, and confusion.

A rating experiment was performed using these attributes; dimension reduction on the ratings suggested two perceptual dimensions of audio-on-audio interference. The dimension accounting for the majority of the variance was related to the level of the interferer and was labeled distraction, as this was the attribute that was used most reliably by both groups of subjects and produced the strongest agreement between subjects. The second dimension was labeled balance and blend and related to whether the two programs were complementary or conflicting.

The attributes determined provide useful directions for further research into situations featuring an audio interferer (such as those considered in Sec. I). The resulting attributes will be useful for rating experiments; to model the response of a listener in an audio-on-audio interference situation it is necessary to collect perceptual ratings, and the attribute scales described above have been shown to be relevant to such situations as well as being widely understood by participants.
Data resulting from such rating experiments can be used to determine physical parameters of the sound field that correlate with the attributes, and to produce models of the experience of a listener in an audio-on-audio interference situation in order to optimize systems that attempt to control or mitigate audio interference.


ACKNOWLEDGMENTS

The experiment described above was carried out as part of the "Perceptually Optimized Sound Zones" (POSZ) project funded by Bang & Olufsen. The authors would like to acknowledge the members of the POSZ project team for valuable discussions; Christopher Hummersone for making available the user interface for stage four; and the listening test participants.

¹It should be noted that there is a relatively small number of subjects in each group and some overlap between the groups. However, the assumption of the chi-square test that the expected count for each category should be greater than five (Field, 2005) was met in all cases.
²It was inappropriate for the reference to score 0 on the balance and blend attribute as the end-points (complementary and conflicting) both indicated situations where the alternate audio was audible; the midpoint of the scale was considered most appropriate.

Bech, S. (1999). "Methods for subjective evaluation of spatial characteristics of sound," in Proceedings of the 16th Audio Engineering Society International Conference on Spatial Sound Reproduction, Rovaniemi, Finland, April 10–12, pp. 487–504.
Bech, S., Hamberg, R., Nijenhuis, M., Teunissen, C., Looren de Jong, H., Houben, P., and Pramanik, S. (1996). "Rapid perceptual image description (RaPID) method," Proc. SPIE 2657, 317–328.
Bech, S., and Zacharov, N. (2006). Perceptual Audio Evaluation: Theory, Method and Application (John Wiley and Sons, Sussex), pp. 39–96.
Berg, J. (2006). "How do we determine the attribute scales and questions that we should ask of subjects when evaluating spatial audio quality?," in Proceedings of the International Workshop on Spatial Audio and Sensory Evaluation Techniques, Guildford, UK, April 6–7.
Berg, J., and Rumsey, F. (1999). "Spatial attribute identification and scaling by repertory grid technique and other methods," in Proceedings of the 16th Audio Engineering Society International Conference on Spatial Sound Reproduction, Rovaniemi, Finland, April 10–12, pp. 51–66.
Berg, J., and Rumsey, F. (2006). "Identification of quality attributes of spatial audio by repertory grid technique," J. Audio Eng. Soc. 54, 365–379.
Choisel, S., Hegarty, P., Christensen, F., Pedersen, B., Ellermeier, W., Ghani, J., and Song, W. (2007). "A listening test system for automotive audio—Part 4: Comparison of attribute ratings made by expert and non-expert listeners," in Proceedings of the 123rd Audio Engineering Society Convention, New York, October 5–8 (Paper No. 7225).
Choisel, S., and Wickelmaier, F. (2006). "Extraction of auditory features and elicitation of attributes for the assessment of multichannel reproduced sound," J. Audio Eng. Soc. 54, 815–826.
Dahl, T., Tomic, O., Wold, J., and Næs, T. (2008). "Some new tools for visualizing multiway sensory data," Food Qual. Prefer. 19, 103–113.
Druyvesteyn, W., Aarts, R., Asbury, A., Gelat, P., and Ruxton, A. (1994). "Personal sound," Proc. Inst. Acoust. 16, 571–585.
Druyvesteyn, W., and Garas, J. (1997). "Personal sound," J. Audio Eng. Soc. 45, 685–701.
Ericsson, K. A., and Simon, H. A. (1993). Protocol Analysis: Verbal Reports as Data (MIT Press, London), pp. 1–62.
Field, A. (2005). Discovering Statistics Using SPSS (Sage, London), pp. 642–645.
Francombe, J., Mason, R., Dewhirst, M., and Bech, S. (2012). "Determining the threshold of acceptability for an interfering audio programme," in Proceedings of the 132nd Audio Engineering Society Convention, Budapest, Hungary, April 26–29 (Paper No. 8639).
Furihata, K., Yanagisawa, T., Asano, D., and Yamamoto, K. (2007). "Development of an experimental noise annoyance meter," Acta Acust. Acust. 93, 73–83.
Gabrielsson, A., and Sjögren, H. (1979). "Perceived sound quality of sound-reproducing systems," J. Acoust. Soc. Am. 65, 1019–1033.
Gower, J. C. (1975). "Generalized procrustes analysis," Psychometrika 40, 33–51.

J. Acoust. Soc. Am., Vol. 136, No. 5, November 2014

Guastavino, C., and Katz, B. F. G. (2004). “Perceptual evaluation of multidimensional spatial audio reproduction,” J. Acoust. Soc. Am. 116, 1105–1115.
Guski, R., Felscher-Suhr, U., and Schuemer, R. (1999). “The concept of noise annoyance: How international experts see it,” J. Sound Vib. 223, 513–527.
Hede, A., Bullen, R., and Rose, J. (1979). “A social study of the nature of subjective reaction to aircraft noise,” Technical Report No. 79, N.A.L., Australian Government Publishing Service, Canberra, pp. 1–29.
Jones, M., and Elliott, S. J. (2008). “Personal audio with multiple dark zones,” J. Acoust. Soc. Am. 124, 3497–3506.
Kim, S., and Martens, W. (2007). “Verbal elicitation and scale construction for evaluating perceptual differences between four multichannel microphone techniques,” in Proceedings of the 122nd Audio Engineering Society Convention, Vienna, Austria, May 5–8 (Paper No. 7043).
Koivuniemi, K., and Zacharov, N. (2001). “Unravelling the perception of spatial sound reproduction: Language development, verbal protocol analysis and listener training,” in Proceedings of the 111th Audio Engineering Society Convention, New York, November 30–December 3 (Paper No. 5424).
Lawless, H. T., and Heymann, H. (1999). Sensory Evaluation of Food: Principles and Practices (Springer, New York), pp. 341–378.
Le Bagousse, S., Paquier, M., and Colomes, C. (2010). “Families of sound attributes for assessment of spatial audio,” in Proceedings of the 129th Audio Engineering Society Convention, San Francisco, CA, November 4–7 (Paper No. 8306).
Letowski, T. (1989). “Sound quality assessment: Concepts and criteria,” in Proceedings of the 87th Audio Engineering Society Convention, New York, October 18–21 (Paper No. 2825).
Lokki, T., Pätynen, J., Kuusinen, A., and Tervo, S. (2012). “Disentangling preference ratings of concert hall acoustics using subjective sensory profiles,” J. Acoust. Soc. Am. 132, 3148–3161.
Lokki, T., Pätynen, J., Kuusinen, A., Vertanen, H., and Tervo, S. (2011). “Concert hall acoustics assessment with individually elicited attributes,” J. Acoust. Soc. Am. 130, 835–849.
Lorho, G. (2005a). “Evaluation of spatial enhancement systems for stereo headphone reproduction by preference and attribute rating,” in Proceedings of the 118th Audio Engineering Society Convention, Barcelona, Spain, May 28–31 (Paper No. 6514).
Lorho, G. (2005b). “Individual vocabulary profiling of spatial enhancement systems for stereo headphone reproduction,” in Proceedings of the 119th Audio Engineering Society Convention, New York, October 7–10 (Paper No. 6629).
Mattila, V.-V. (2001). “Descriptive analysis of speech quality in mobile communications: Descriptive language development and external preference mapping,” in Proceedings of the 111th Audio Engineering Society Convention, New York, November 30–December 3 (Paper No. 5455).
Moore, B. (2008). An Introduction to the Psychology of Hearing (Emerald, Bingley), pp. 257–261.
Murray, J., Delahunty, C., and Baxter, I. (2001). “Descriptive sensory analysis: Past, present and future,” Food Res. Int. 34, 461–471.
Næs, T., Brockhoff, P., and Tomić, O. (2010). Statistics for Sensory and Consumer Science (John Wiley & Sons, Sussex), pp. 11–38 and 209–226.
Neher, T., Brookes, T., and Rumsey, F. (2006). “A hybrid technique for validating unidimensionality of perceived variation in a spatial auditory stimulus set,” J. Audio Eng. Soc. 54, 259–275.
Pulkki, V. (2007). “Spatial sound reproduction with directional audio coding,” J. Audio Eng. Soc. 55, 503–516.
Rumsey, F. (1998). “Subjective assessment of the spatial attributes of reproduced sound,” in Proceedings of the 15th Audio Engineering Society International Conference on Audio, Acoustics and Small Spaces, Copenhagen, Denmark, October 31–November 2 (Paper 15-012), pp. 122–135.
Samoylenko, E., McAdams, S., and Nosulenko, V. (1996). “Systematic analysis of verbalizations produced in comparing musical timbres,” Int. J. Psych. 31, 255–278.
Shaw, M. L., and Gaines, B. R. (1989). “Comparing conceptual structures: Consensus, conflict, correspondence and contrast,” Knowl. Acquis. 1, 341–363.
Zacharov, N., and Koivuniemi, K. (2001). “Audio descriptive analysis and mapping of spatial sound displays,” in Proceedings of the 7th International Conference on Auditory Display, Helsinki, Finland, July 29–August 1, pp. 95–104.



