Journal of Experimental Psychology: Applied 2014, Vol. 20, No. 1, 55– 68

© 2013 American Psychological Association 1076-898X/14/$12.00 DOI: 10.1037/xap000007

A Sequential Sampling Account of Response Bias and Speed–Accuracy Tradeoffs in a Conflict Detection Task

Anita Vuckovic, Peter J. Kwantes, Michael Humphreys, and Andrew Neal

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

The University of Queensland

Signal Detection Theory (SDT; Green & Swets, 1966) is a popular tool for understanding decision making. However, it does not account for the time taken to make a decision, nor why response bias might change over time. Sequential sampling models provide a way of accounting for speed–accuracy trade-offs and response bias shifts. In this study, we test the validity of a sequential sampling model of conflict detection in a simulated air traffic control task by assessing whether two of its key parameters respond to experimental manipulations in a theoretically consistent way. Through experimental instructions, we manipulated participants’ response bias and the relative speed or accuracy of their responses. The sequential sampling model was able to replicate the trends in the conflict responses as well as response times across all conditions. Consistent with our predictions, manipulating response bias was associated primarily with changes in the model’s Criterion parameter, whereas manipulating speed–accuracy instructions was associated with changes in the Threshold parameter. The success of the model in replicating the human data suggests we can use the parameters of the model to gain insight into the underlying response bias and speed–accuracy preferences common to dynamic decision-making tasks.

Keywords: sequential sampling model, response bias, speed–accuracy, criterion, threshold

This article was published Online First October 14, 2013.

Anita Vuckovic, Peter J. Kwantes, Michael Humphreys, and Andrew Neal, School of Psychology, The University of Queensland, St Lucia, Brisbane, Queensland, Australia. Correspondence concerning this article should be addressed to Andrew Neal, School of Psychology, The University of Queensland, St Lucia, QLD 4072, Australia. E-mail: [email protected]

Since the 1960s, Signal Detection Theory (SDT; Green & Swets, 1966) has been the predominant tool for understanding decision making in applied domains. SDT’s appeal stems from its ability to extract meaning from raw data in a simple and practical way, providing insights into the difficulty of making a decision and one’s response bias. Although the theory is useful for analyzing performance in simple settings, it does not account for the time taken to make a decision, nor for why a person’s response bias might change under different conditions.

Air traffic control is a good example of a work domain where response time and decision criteria have important implications for performance (Durso & Manning, 2008). The goals of an air traffic controller are to keep aircraft safely separated and to ensure the efficient flow of traffic. If information about a pair of aircraft is uncertain, the controller can choose to wait until more evidence is available to make a more accurate decision. This may increase accuracy, but at the expense of time, a well-known decision phenomenon referred to as the speed–accuracy trade-off (Fitts, 1954; Pachella & Fisher, 1972; Reed, 1973). In addition, a controller may treat the same event differently depending on context. For example, as workload increases, a controller may adjust his or her criterion, applying larger safety margins to reduce the need for continual monitoring (Loft, Bolland, Humphreys, & Neal, 2009). In SDT, this would appear to be a change in bias, but it results from a motive to reduce workload and involves a tradeoff between the goals of efficiency and safety. SDT does not explain such decision dynamics.

A recent attempt has been made to address the temporal dynamics of decision making through Fuzzy SDT (Parasuraman, Masalonis, & Hancock, 2000), in which the probability of a signal response varies according to the quality and quantity of information gathered over time. The Fuzzy SDT method provides a way to quantify the impact of time on decision accuracy, but at present it still does not explicitly model the accumulation of evidence over time or response time distributions.

Sequential sampling models provide a way of accounting for the speed–accuracy trade-offs and response bias shifts common to dynamic decision-making tasks. A natural extension to SDT (Diederich & Busemeyer, 2006; Ratcliff, 2008), sequential sampling models treat information in the environment as continuous rather than static and provide a way to integrate response decisions alongside response times. In turn, this provides a simpler and more holistic approach to data analysis. Sequential sampling models have been used to explain data from a variety of experiments, including those on memory recognition (Ratcliff, 1978), sensory detection (Smith, 1995), perceptual discrimination (Ratcliff & Rouder, 1998; Usher & McClelland, 2001), and lexical access (Kwantes & Mewhort, 1999; Ratcliff, Gomez, & McKoon, 2004). However, most evaluations of these models are based on laboratory studies that use a discrete and unchanging signal. In addition, models such as the Diffusion model were designed to explain data from simple, rapid decision tasks with response times typically under one second (Ratcliff & McKoon, 2008). Outside of the laboratory, decision tasks often have a substantially longer response window and hence do not fall within the scope of traditional Diffusion model analysis. Furthermore, although progress has been made in making sequential sampling models easier to apply (Vandekerckhove & Tuerlinckx, 2007, 2008), it is acknowledged that application can be complex and may be a barrier to their use. There has been a more recent effort to develop simplified and more tractable models (e.g., Brown & Heathcote, 2008; Wagenmakers, van der Maas, & Grasman, 2007), though there are still few applications to the types of tasks performed by workers in military, industrial, and commercial settings.

Given that sequential sampling models are now sufficiently well developed and can represent data better than traditional analytic methods, we should be able to use them in applied settings. Neal and Kwantes (2009) previously developed a sequential sampling model for air traffic control. The model was tested within a simulated conflict detection task where participants were asked to detect conflicts between pairs of aircraft. Results from their original study (2009), and those of Vuckovic, Kwantes, and Neal (2013), showed the model could accurately replicate human decisions and response times across a range of aircraft geometry and timing scenarios, providing initial evidence that the model could account for perceptual difficulty in discrimination primarily through its Noise parameter. However, models of this type require testing and evaluation across a number of studies and contexts before we can be confident they are robust and can start using them to assess performance. Specifically, early stages of model validation generally involve an extensive examination of the model’s parameters to ensure that the parameters reflect the underlying psychological processes that they are designed to represent. The systematic testing of parameters has been the strategy used by Ratcliff and colleagues in validating the Diffusion model (Voss, Rothermund, & Voss, 2004; Wagenmakers, Ratcliff, Gomez, & McKoon, 2008), and was earlier used by those working with SDT (see examples in Swets, 1964).
The aim of the current study, therefore, is to validate the Neal and Kwantes (2009) model by assessing whether its previously unexamined parameters, Criterion and Threshold, respond to experimental manipulations that are designed to be a relatively pure test of the constructs they represent.

A Sequential Sampling Model of Conflict Detection in Air Traffic Control

Formal models are used in many areas of psychology as a means of understanding performance. With respect to understanding speeded binary choice decisions, sequential sampling models are by far the most widely accepted and successful class of models (Laming, 1968; Link & Heath, 1975; Ratcliff, 1978; Smith & Vickers, 1989). Sequential sampling models assume that decisions are the product of a noisy evidence accumulation process. A decision maker is said to sample information from the environment over time until a threshold is crossed and a response can be made. In this way, sequential sampling models recognize that decisions are subject to noise arising from uncertainty and time pressure in the environment, and in so doing provide a principled way of integrating decision accuracy alongside response time.

In the conflict detection task, the goal of the decision maker is to detect a ‘conflict,’ or an impending violation of vertical or lateral separation standards between a pair of aircraft (1,000 feet and 5 nautical miles, respectively). The Neal and Kwantes (2009) model assumes that the decision maker calculates evidence about a pair’s likelihood of conflict based on information about the ratio of each aircraft’s distance to a common destination, or ‘crossing point,’ and its speed. The decision maker uses this information to calculate what is called the referent, denoted Δ. The referent is calculated as follows:

Δ = (d′_a / v_a) − (d′_b / v_b)    (1)

where d′_a is the perceived distance of aircraft a to the crossing point, v_a is the velocity of aircraft a, d′_b is the perceived distance of aircraft b to the crossing point, and v_b is the velocity of aircraft b. Neal and Kwantes (2009) calculated the perceived distance of an aircraft to the crossing point as follows:

d′ = d − (0.0000557 · d · θ²) − (0.000944659 · d · θ)    (2)

where θ is the heading of the aircraft, measured in degrees from the horizontal.

Once the referent is calculated, it is compared to a criterion (C) to calculate the signal (S). C represents the controller’s safety margin. If the referent is larger than C, the evidence suggests that the separation between the aircraft is greater than that required to assure separation, and the decision maker will be biased toward a “nonconflict” decision. If the referent is smaller than C, the decision maker will be biased toward a “conflict” decision. The model also includes a parameter for noise (N), which represents the uncertainty, or error, associated with the estimation of the referent and varies with different information cues such as angle of convergence, time to minimum separation, relative speed of the aircraft, and so forth. The error is sampled from a Gaussian distribution with a mean of 0 and a standard deviation of N. The signal (S) is therefore calculated as follows:

S = C − (Δ + Gauss(0, N))    (3)

The decision maker assesses the referent at each sample of the stimulus, set to one sample per second, and with each, sums the signal to generate evidence for one response or the other. Evidence (E) for a response on the current sample (t) is calculated as the evidence accumulated from all t − 1 previous samples plus the S derived from the current sample:

E_t = E_{t−1} + S_t    (4)

Evidence (E) is accumulated over successive samples until its magnitude reaches either a negative or positive threshold (T), leading to a “nonconflict” or “conflict” response, respectively. T reflects the decision maker’s preference for speed over accuracy: a larger T indicates a preference for more information, and hence accuracy, whereas a smaller T indicates a preference for a speedy decision. The number of samples required to cross the threshold is taken as the response time. Figure 1 provides an illustration of the model’s evidence accumulation process.

The aim of this study is to validate the Neal and Kwantes (2009) model through experimental manipulations that target its parameters. Neal and Kwantes (2009) and Vuckovic, Kwantes, and Neal (2013) provided preliminary evidence that the model’s N parameter could account for the difficulty involved in making a judgment about a pair of aircraft based on various informational cues from the environment. However, the two other key parameters of the model, C and T, have not yet been tested. To this end, we investigate how these parameters respond to manipulations of experimental instructions that target participants’ response bias and speed–accuracy preferences.
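To make the accumulation mechanics concrete, Equations 1 through 4 can be sketched in a few lines of code. This is a minimal illustration rather than the authors' implementation: the parameter and stimulus values are arbitrary, the stimulus is treated as static within a trial (in the actual task the display updates as the aircraft move), and the seven-sample cap mirrors the seven-second response window used in the experiment.

```python
import random

def perceived_distance(d, theta):
    """Eq. 2: perceptual correction of distance d for heading theta (degrees)."""
    return d - (0.0000557 * d * theta ** 2) - (0.000944659 * d * theta)

def simulate_trial(da, va, db, vb, theta_a, theta_b, C, T, N,
                   max_samples=7, rng=random):
    """Accumulate evidence at one sample per second until a threshold is crossed."""
    # Eq. 1: referent = difference between the aircrafts' perceived
    # times to the crossing point (perceived distance / speed)
    delta = (perceived_distance(da, theta_a) / va
             - perceived_distance(db, theta_b) / vb)
    evidence = 0.0
    for t in range(1, max_samples + 1):
        signal = C - (delta + rng.gauss(0, N))   # Eq. 3
        evidence += signal                        # Eq. 4
        if evidence >= T:
            return "conflict", t
        if evidence <= -T:
            return "nonconflict", t
    return "timeout", max_samples                 # no decision within the window

# With noise switched off the walk is deterministic: aircraft b is farther
# from the crossing, so the referent is negative and evidence drifts upward
# toward the "conflict" boundary.
print(simulate_trial(da=20, va=420, db=30, vb=420,
                     theta_a=0, theta_b=0, C=0.0, T=0.05, N=0.0))
```

With N greater than zero, repeated calls give a distribution of responses and response times, which is how the model produces the choice proportions and latencies compared with human data below.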



Figure 1. An illustration of the evidence accumulation process in the Neal and Kwantes (2009) sequential sampling model. Information about the world is sampled from the environment and compared with a subjective criterion or standard, the discrepancy between which is taken as evidence toward one decision or the other (referred to as the signal). The signal is iteratively summed over successive samples until it exceeds one of the threshold boundaries and a decision can be made.

Inducing Changes in Participants’ Response Bias

The term “response bias” refers to the decision maker’s subjective preference to respond in favor of a particular decision alternative. For instance, a controller may prefer to classify a pair as a conflict rather than a nonconflict, because failing to detect a conflict can have serious consequences for all the people concerned, including the controller. Air traffic controllers, for example, can be stood down, may lose their license, or may even go to jail if they miss conflicts. Bisseret (1981) showed that experienced controllers show a greater tendency to respond “conflict” as a way to satisfy safety requirements and minimize the risk of collision.

Response bias can be induced experimentally in one of several ways. One method is to manipulate the relative proportion of one stimulus over another in a way that makes one stimulus more salient or more frequently presented. The respondent should consequently adjust his or her response bias in favor of the more salient or frequent stimulus so as to maximize accuracy. Another method is to encourage respondents to adopt a particular response bias through experimental instructions. This may or may not involve an additional manipulation of event payoffs or consequences, where a particular response is associated with a greater reward or outcome relative to another. In this study, we manipulate task instructions without manipulating payoffs. Instructional manipulations of response bias have been used primarily in studies of word recognition (Azimian-Faridani & Wilding, 2006; Hirshman & Henzler, 1998; Postma, 1999; Strack & Förster, 1995) or eyewitness identification (Köhnken & Maass, 1988; Meissner, Tredoux, Parker, & MacLin, 2005; Steblay, 1997).
Overall, studies originating from the memory and eyewitness literatures have demonstrated that experimental instructions are an effective way to manipulate response bias and get the participant to respond in the desired way. Across these studies, there have been two main types of instructional manipulation. One is the leniency manipulation, which instructs participants to respond in a certain way even if they only have a vague notion of the answer, or to respond only if they are absolutely certain (Azimian-Faridani & Wilding, 2006; Meissner et al., 2005; Postma, 1999). The second is a false base-rate manipulation that provides participants with false information about the likelihood of a certain event (Gardiner, Richardson-Klavehn, & Ramponi, 1997; Hirshman & Henzler, 1998; Strack & Förster, 1995). For instance, in some trials, participants are told that event A is four times as likely as event B, even though both events are equally likely.

We do not know of any instructional manipulations of response bias in the sequential sampling literature. Instead, tests of sequential sampling models typically involve a manipulation of the true proportions of experimental stimuli, where participants are informed about the true base rates before the start of a trial (e.g., Leite & Ratcliff, 2011; Ratcliff & McKoon, 2008). As such, participants are expected to formulate an appropriate response bias setting prior to the start of the trial that will allow them to maximize their performance. In this study, we used both false base-rate information and a leniency instruction to encourage participants to classify a pair as a conflict or a nonconflict. Hence, we expect that the likelihood of classifying a pair of aircraft as a conflict will be higher in the conflict bias conditions compared with neutral and nonconflict bias conditions. The conflict bias conditions are expected to be associated with a higher C parameter value compared with neutral and nonconflict bias conditions.
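One way to see why false base-rate information should shift response bias: for an ideal observer, Bayes' rule in odds form says the posterior odds of a conflict equal the prior odds times the likelihood ratio of the evidence. The sketch below is generic signal detection arithmetic, not part of the Neal and Kwantes model; the function name is ours.

```python
def posterior_p_conflict(prior_p, likelihood_ratio):
    """Probability of 'conflict' after evidence with the given likelihood ratio."""
    prior_odds = prior_p / (1 - prior_p)           # odds form of the base rate
    post_odds = prior_odds * likelihood_ratio      # Bayes' rule in odds form
    return post_odds / (1 + post_odds)

# The same ambiguous evidence (likelihood ratio = 1) warrants different
# conclusions under the three stated base rates used in the bias manipulation:
print(posterior_p_conflict(0.75, 1.0))  # conflict-bias blocks
print(posterior_p_conflict(0.50, 1.0))  # neutral blocks
print(posterior_p_conflict(0.25, 1.0))  # nonconflict-bias blocks
```

A participant who believes the 75% figure should therefore require less evidence to respond "conflict," which is the shift the C parameter is designed to capture.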

Inducing Changes in Participants’ Speed–Accuracy Preferences

Another important aspect of sequential sampling models is their ability to deal with the time course of decision making and speed–accuracy trade-offs. The sequential sampling framework naturally accounts for this through its threshold parameter, which reflects the amount of evidence that the decision maker requires before making a decision and is under strategic control. Studies of sequential sampling models have typically relied on task instructions to manipulate the participant’s preference for speed over accuracy (Wagenmakers et al., 2008; White, Ratcliff, & Starns, 2011). For instance, when speed is emphasized, participants are asked to respond as quickly as possible, and when accuracy is emphasized, participants are asked to respond as accurately as possible. Results are generally in line with this manipulation. The Neal and Kwantes (2009) model captures speed–accuracy preferences through its T parameter, with a low value indicating a preference for less evidence and speedy but inaccurate decisions, and a high value indicating a preference for more evidence and slower but more accurate decisions. In this study, speed–accuracy preference was experimentally induced through instructions that emphasized the importance of speeded responses or accurate responses. Based on prior literature, we expect that a speed emphasis will be associated with quicker response times and a comparatively lower T parameter value. Conversely, an accuracy emphasis will be associated with longer response times and a comparatively higher T parameter value.
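Both directional predictions fall out of the accumulator itself, as a toy Monte Carlo shows. The drift, noise, and threshold values below are arbitrary assumptions chosen for illustration, not estimates from the experiment; the seven-sample cap mirrors the task's response window.

```python
import random

def run_block(delta, C, T, N, n_trials=2000, max_samples=7, seed=1):
    """Return (proportion of 'conflict' responses, mean samples to respond)."""
    rng = random.Random(seed)
    conflicts, total_time = 0, 0
    for _ in range(n_trials):
        evidence, t = 0.0, 0
        while abs(evidence) < T and t < max_samples:
            t += 1
            evidence += C - (delta + rng.gauss(0, N))
        conflicts += evidence >= T
        total_time += t
    return conflicts / n_trials, total_time / n_trials

# Raising C biases responses toward "conflict" (the Criterion prediction) ...
p_low, _ = run_block(delta=0.0, C=-0.02, T=0.15, N=0.1)
p_high, _ = run_block(delta=0.0, C=+0.02, T=0.15, N=0.1)
print(p_low, p_high)        # p_high > p_low

# ... while raising T slows responses (the Threshold prediction).
_, rt_fast = run_block(delta=0.0, C=0.0, T=0.08, N=0.1)
_, rt_slow = run_block(delta=0.0, C=0.0, T=0.30, N=0.1)
print(rt_fast, rt_slow)     # rt_slow > rt_fast
```

These are exactly the parameter-to-behavior mappings the experimental manipulations are designed to test.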

Method

Participants

Fifty-one undergraduate psychology students from the University of Queensland, Brisbane, took part in this study in return for course credit. There were 22 females and 29 males, with a mean age of 19.37 years (SD = 3.30).


Design

The study employed a 4 (miss distance [dmin]: 1.25, 3.75, 6.25, 8.75 nm) × 2 (angle of convergence: 45°, 135°) × 3 (response bias: conflict bias, neutral bias, nonconflict bias) × 3 (speed–accuracy: speed emphasis, neutral speed, accuracy emphasis) within-subjects design. Each of these 72 conditions was repeated three times for a total of 216 trials. The two dependent variables were the decision as to whether a given pair was a conflict or nonconflict, and the time taken to make a response.

Figure 2. Experimental manipulations of miss distance (dmin) in nautical miles, and angle of convergence.

Experimental Task

We examined performance on a simulated conflict detection task using an open-source ATC simulator called ATC-lab Advanced (Fothergill, Loft, & Neal, 2009; Loft, Hill, Neal, Humphreys, & Yeo, 2004). Pairs were displayed one at a time, flying straight and level toward a crossing point at the center of the screen. Each pair was either in conflict or not in conflict. A conflict was defined as a pair of aircraft whose miss distance violated the prescribed separation standard of 5 nautical miles laterally. This distance was approximately equal to 1 cm on the computer screen. To aid judgment, there was a scale marker on the left-hand side of the screen. A probe was visible for all aircraft, indicating the projected position of the aircraft one minute into the future based on its current speed and heading. Each aircraft’s track was made visible through the use of route lines. Participants were asked to click on the No/Yes response box at the top right corner of the screen to indicate whether the given pair was a conflict or not. To make the task sufficiently difficult, participants were allowed a maximum of seven seconds to respond to each pair. There was a randomized delay (between 0 and 600 ms) in the onset of each pair on screen to prevent participants from making responses without paying attention. There was also a voluntary rest period between trials, in which participants pressed the space bar to start the next trial.

Experimental Stimuli

The stimuli consisted of eight unique pairs of aircraft, the cross-product of four miss distances (dmin: 1.25, 3.75, 6.25, 8.75 nm) and two angles of convergence (45°, 135°). To obtain stable performance estimates, each of the eight stimuli was repeated three times (termed hereafter the “repetition” factor) for a total of 24 trials. All pairs were set to reach their dmin after 150 seconds by starting at a certain distance away from the crossing at the beginning of each trial. Altitude and speed were fixed, with all aircraft flying at an altitude of 37,000 feet and a speed of 420 knots. Pairs with dmin smaller than 5 nautical miles were conflicts, whereas those with dmin larger than 5 nautical miles were nonconflicts. Thus, there were equal numbers of conflict and nonconflict pairs within each block of response bias or speed–accuracy conditions.

To create the dmin, we varied the relative distance of the aircraft from the crossing point at the start of the trial (see Figure 2 for an illustration of the dmin manipulation). To increase the dmin, one aircraft started closer to the crossing and the other aircraft farther away from the crossing. To create the angle manipulation, we varied the relative distance settings of the aircraft to the crossing (see Figure 2). For instance, to produce the 135° angle, one aircraft started closer to the crossing and the other further away, to make up the required dmin in exactly 150 seconds. Finally, to avoid the possibility that line orientation might influence perceived distances to the crossing, we positioned the angles of convergence symmetrically about the horizontal and vertical axes. As shown in Figure 2, each aircraft within a pair had an equal degree of offset from the horizontal or vertical axis as the other aircraft in the pair. The actual origin of the pair (coming from the top, bottom, left, or right of the screen) was randomized per trial. The order in which the aircraft would reach the crossing was randomized for each pair.

The 24 stimulus trials were then repeated in blocks, with each block reflecting a particular response bias and/or speed–accuracy instructional manipulation, explained below. The cross-product of the 3 response bias × 3 speed–accuracy manipulations produced 9 blocks, each with 24 trials, resulting in a total of 216 trials.
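As a quick check on the geometry, the starting distances follow directly from the fixed speed and the 150-second time to minimum separation; for instance, an aircraft timed to arrive at the crossing at exactly 150 s must start 17.5 nm out. This is illustrative arithmetic only, not a reconstruction of the authors' stimulus generation code.

```python
# All aircraft fly at 420 knots and reach minimum separation after 150 s,
# so an aircraft arriving at the crossing at t = 150 s starts this far out:
speed_kt = 420.0          # knots = nautical miles per hour
time_to_dmin_s = 150.0
start_distance_nm = speed_kt * time_to_dmin_s / 3600.0
print(start_distance_nm)  # 17.5 nm
```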

Response Bias and Speed–Accuracy Manipulations

The experimental trials were presented in nine blocks, each with different response bias and/or speed–accuracy instructions. There were three types of response bias instructions: conflict bias, neutral bias, and nonconflict bias. In the conflict bias conditions, participants were asked to be more liberal (‘Do not miss any conflicts. Only respond “nonconflict” if you are 100% certain that a pair is not a conflict’). Participants were also given false information about the likelihood of conflict for the given set of trials: that the probability of a conflict was 75%, despite the true probability being 50%. In the neutral bias conditions, participants were told the true base rate information about the block of trials—that 50% of the following pairs would be conflicts. In the nonconflict bias conditions, participants were asked to be more conservative (‘Do not make any false alarms because this adds unnecessary costs and delays to the airline and air traffic management services industry. Only respond “conflict” if you are 100% certain that a pair is a conflict’), and were told that the probability of a conflict was only 25%, despite the true likelihood being 50%.

These three types of response bias instructions were crossed with three types of speed–accuracy instructions: speed emphasis, neutral speed, and accuracy emphasis. In the speed-emphasis conditions, participants were asked to respond as quickly as possible, as response time was critical. In the neutral-speed conditions, participants were instructed to respond as quickly and as accurately as possible. In the accuracy-emphasis conditions, participants were asked to take as much time as they needed, within the maximum 7-s timeframe allowed per trial, and to carefully consider their decision.


Procedure

Participants first received audiovisual instructions about the task. They then saw eight demonstration trials to experience how conflicts and nonconflicts appear on screen. To calibrate their judgment, participants then performed 24 training trials in which they were informed after each trial whether the given pair was in conflict. Participants then completed the 216 experimental trials. Feedback was not provided on the experimental trials so as not to influence how participants set their response bias or speed–accuracy preference.

We reasoned that it would not be appropriate to fully randomize the presentation of the nine experimental blocks, as participants may have difficulty switching quickly between different bias instruction blocks. For this reason, we started with the three neutral bias conditions, as this allowed participants to maintain whatever their original response bias setting was upon first starting the task. Participants therefore first received the neutral bias + neutral-speed block, followed by the neutral bias + speed-emphasis and neutral bias + accuracy-emphasis blocks, where the order of the latter two was randomized across participants. Next, participants received all conflict bias blocks in succession (conflict bias + neutral-speed, conflict bias + speed-emphasis, and conflict bias + accuracy-emphasis), followed by all nonconflict bias blocks in another succession (nonconflict bias + neutral-speed, nonconflict bias + speed-emphasis, and nonconflict bias + accuracy-emphasis), the order of which was randomized across participants. Within each of the conflict and nonconflict bias blocks, the pure bias manipulation block (i.e., with neutral speed instructions) always came first, followed by the two blocks containing a speed–accuracy manipulation, with the order of the latter two randomized across participants.

Results

Conflict Responses

Given that the dependent variable was binary (i.e., conflict or nonconflict response) and repeatedly measured over time, we analyzed the data using a hierarchical generalized linear model (HGLM) for binary outcomes in HLM (Raudenbush & Bryk, 2002). Multilevel modeling is a useful technique for assessing repeated measures data, as it accounts for correlations between error terms. In this case, we calculated separate regression equations at Level 1 (within person) and Level 2 (between person), using a Bernoulli sampling distribution and logit link function. In the context of this model, the outcome variable is represented by the log-odds of a conflict response. Effect sizes were then interpreted by transforming the log-odds into odds ratios (ORs).
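For readers unfamiliar with the logit scale, the transformation from a fitted log-odds coefficient b to an odds ratio is simply OR = exp(b). The snippet below is generic arithmetic, not the study's analysis code.

```python
import math

def odds_ratio(log_odds_coef):
    """Convert a logit-scale (log-odds) coefficient to an odds ratio."""
    return math.exp(log_odds_coef)

# A coefficient of 0 means no effect (OR = 1); negative coefficients give
# OR < 1, i.e. lower odds of a "conflict" response per unit of the predictor.
print(odds_ratio(0.0))             # 1.0
print(odds_ratio(math.log(0.37)))  # recovers OR ≈ 0.37 from its log-odds
```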


At Level 1, we entered the experimental variables (repetition, dmin, angle, bias, and speed–accuracy) as fixed effects. Orthogonal contrasts were used to test the linear and, where relevant, quadratic effects. At Level 2, the model was empty, meaning that no between-person variables were used to predict the within-person effects. However, the intercept was specified to have a random effect at Level 2, which allowed us to control for individual differences in the overall proportion of conflict responses. As per Raudenbush and Bryk (2002), the experimental variables were tested across a series of models. In the first step, we tested the unconditional model. In the second step, all main effect terms (linear and, if relevant, quadratic) were entered and tested simultaneously. In the third step, we entered and tested all two-way interactions, followed by all three-way and four-way interactions in subsequent models. Where an effect was not significant, it was dropped from subsequent models.

Participants’ conflict response data, averaged across repetition, are shown in Figure 3, and descriptive statistics are reported in Table 1. Results of the final statistical model for the conflict response data, based on the unit-specific model with robust standard errors, are shown in Table 2. There was a main effect of repetition (OR = 1.09, 95% confidence interval [CI] = 1.03–1.16), such that participants were slightly more likely to classify a pair as a conflict over repeat trials of the same pair. For dmin, both the linear (OR = 0.37, CI = 0.33–0.42) and quadratic (OR = 1.21, CI = 1.05–1.40) terms were significant, indicating that participants were less likely to classify a pair as a conflict as dmin increased, but that the rate of this decrease declined with greater dmin. There was a main effect of angle (OR = 0.23, CI = 0.18–0.28), indicating participants were more likely to classify acute-angle pairs as conflicts and obtuse-angle pairs as nonconflicts.
There was a main effect of the bias manipulation (OR = 0.47, CI = 0.38–0.57), indicating participants were more likely to classify a pair as a conflict under conflict-bias conditions compared with neutral and nonconflict bias conditions. The speed–accuracy manipulation was excluded from the procedure at step 2, as it did not have a significant effect on conflict response proportions in this model. Also nonsignificant was the interaction between the bias and speed–accuracy manipulations at step 3. However, there were two significant two-way interactions in the final model. The interaction between dmin and angle showed that the effect of dmin was convex for the 45° angle, exhibiting a trend toward conflict responses, and slightly concave for the 135° angle, exhibiting a trend toward nonconflict responses (see Figure 3). The significant interaction between bias and repetition showed participants were less likely to classify a pair as a conflict from the conflict-bias to the neutral- and nonconflict-bias conditions; however, there were slightly fewer conflict responses in the first repetition of the conflict bias condition. The relative values of the odds ratios for these two interactions indicate they are relatively weak effects (ORs close to 1).

Figure 3. Conflict response proportions for human and model data across experiment manipulations. Human data are represented by symbols with error bars, whereas model data are represented with solid or dashed lines. P(conflict) = probability of a conflict response; dmin = distance of minimum separation (in nautical miles); 45 = data for the 45° angle conditions; 135 = data for the 135° angle conditions.

Response Times

A separate HGLM was conducted on the response time data. However, because response time is a continuous measure, we used a normal sampling distribution and an identity link function, which treats the outcome variable in its raw state. Experimental effects were examined over successive models, following the procedure used for the conflict response analysis. Participants' response time data are shown in Figure 4, and descriptive statistics are shown in Table 1. The results of the final statistical model for the response time data, based on the unit-specific model with robust standard errors, are shown in Table 3. Generally speaking, responses were slightly faster over successive repetitions of the same pair of aircraft and as the angle of convergence increased. For dmin, the quadratic trend was significant, indicating that response times were slower for pairs of aircraft whose dmin was close to the separation standard (3.75 and 6.25 nm; M = 1789.84 and 1676.42 msec, respectively) than for pairs with extreme dmin values (1.25 and 8.75 nm; M = 1494.81 and 1573.40 msec, respectively). The response bias effect was best described by a quadratic term, indicating that response times were significantly faster in the conflict-bias (M = 1461.24 msec) and nonconflict-bias (M = 1562.09 msec) conditions than in the neutral-bias condition (M = 1879.13 msec). For the speed–accuracy manipulation, the linear and quadratic terms were both significant, indicating that response times were faster under speed instructions (M = 1359.95 msec) than under neutral (M = 1751.02 msec) and accuracy (M = 1792.20 msec) instructions. As Table 3 shows, there were also a number of significant interactions. Of focal interest is the significant bias × speed–accuracy interaction, which suggests that the effect of the bias manipulation was stronger in the accuracy-emphasis and neutral conditions than in the speed-emphasis condition.

Discussion

The effects of dmin and angle were consistent with prior literature (Loft, Bolland, Humphreys, & Neal, 2009; Neal & Kwantes, 2009). As dmin increased, participants were less likely to report a pair as a conflict. Acute-angle pairs were more likely to be judged as conflicts, whereas obtuse-angle pairs were more likely to be judged as nonconflicts. Of focal interest were the effects of the bias and speed–accuracy manipulations, where results were generally as expected. Inducing a conflict bias produced more conflict decisions compared to neutral, while inducing a nonconflict bias produced fewer conflict responses compared to neutral. Unexpectedly, responses were also faster in the extreme bias conditions compared with neutral. Finally, as expected, the manipulation of speed–accuracy preferences did not have any effect on conflict responses but did produce faster response times under speed instructions compared with accuracy instructions.

Table 1
Descriptive Statistics: Means and Standard Errors for Conflict Responses and Response Times

                              Conflict responses      Response times (msec)
Experimental variable         Mean      SE            Mean        SE
Repetition
  1                           1.50      .01           1747.16     19.22
  2                           1.52      .01           1577.40     16.87
  3                           1.52      .01           1577.12     17.51
dmin
  1.25 nm                     1.93      .01           1494.81     19.01
  3.75 nm                     1.62      .01           1789.84     22.78
  6.25 nm                     1.34      .01           1676.42     20.97
  8.75 nm                     1.16      .01           1573.40     19.45
Angle
  45°                         1.70      .01           1668.09     15.39
  135°                        1.33      .01           1598.85     13.85
Bias instruction
  Conflict                    1.61      .01           1461.24     15.72
  Neutral                     1.50      .01           1879.13     19.60
  Nonconflict                 1.43      .01           1562.09     17.57
Threshold instruction
  Speed-emphasis              1.51      .01           1359.95     12.92
  Neutral                     1.51      .01           1751.02     17.78
  Accuracy-emphasis           1.51      .01           1792.20     21.32

Note. dmin = distance of minimum separation; nm = nautical miles; msec = milliseconds; SE = standard error of the mean.

Table 2
Prediction of Conflict Response Proportions by Repetition, dmin, Angle, Bias, and Threshold Manipulations

Fixed effect                   Coefficient   SE    t Ratio   df      p       OR
Intercept                      -0.01         .10   -0.07     50      .942    0.99
Repetition (L)                  0.09         .03    2.99     10899   .003    1.09
dmin (L)                       -1.00         .06   -15.67    10899   <.001   0.37
dmin (Q)                        0.19         .07    2.61     10899   .010    1.21
Angle (L)                      -1.49         .11   -13.70    10899   <.001   0.23
Bias (L)                       -0.76         .11   -7.17     10899   <.001   0.47
Bias (L) × Repetition (L)      -0.12         .04   -3.17     10899   .002    0.89
dmin (L) × Angle (L)           -0.14         .04   -3.31     10899   .001    0.87

Random effect   Variance component   SD     χ²       df   p
Intercept       0.27                 0.52   370.62   50   <.001

Note. L = linear term; Q = quadratic term; dmin = distance of minimum separation; SE = standard error of the mean; SD = standard deviation; OR = odds ratio.

Figure 4. Response times for human and model data across experiment manipulations. Human data are represented by symbols with error bars, whereas model data are represented with solid or dashed lines. dmin = distance of minimum separation (in nautical miles); 45 = data for the 45° angle conditions; 135 = data for the 135° angle conditions.

Model Fitting

In this section, we fit the Neal and Kwantes (2009) model to the data. First, for each pair of aircraft, we calculated the proportion of times the pair was classified as a conflict across all repetitions of the same pair and across all participants. This is referred to as P(conflict). We also calculated the mean response time for each pair, across all repetitions and participants. Then, we fit the relationship between dmin and P(conflict) for each angle, under each of the nine bias × speed–accuracy conditions, by adjusting the model's free parameters such that the output produced by the model replicated the human data as closely as possible. We then examined how the model's parameters changed in response to these experimental manipulations.

Fitting the model involved minimizing the Root Mean Square error (RMSe) between the model and participants' responses, while simultaneously maximizing the correlation between the model and participants' response times. In our model, reaction time (RT) is calculated from the number of samples the model requires to make a final decision (one sample is assumed to take one second, but this is arbitrary). Because we did not "predict" response times, we used correlation as the measure of fit between human and model response times, as this provides a simple indication of whether the trend in the model's response time data reflected that in the human data. To obtain stable parameter estimates, we averaged simulation data across 10,000 independent runs of the model for each trial. The parameter search was conducted using the Simplex fitting algorithm (Press, Teukolsky, Vetterling, & Flannery, 2002), which sought to optimize the following:

    min { α · RMSe[p(con_Model), p(con_data)] / RMSe[p(con_worst), p(con_data)]
          + (1 - α) · [1 - r(RT_Model, RT_data)] / 2 }                            (5)

where RMSe[p(con_Model), p(con_data)] is the Root Mean Square error of the fit between the model and the human data. This term is scaled to a value between 0 and 1 by dividing it by RMSe[p(con_worst), p(con_data)], the Root Mean Square error between the worst possible model performance and the human data. Worst possible model performance occurs when the model predicts a response that is incorrect with regard to the true conflict status of a pair of aircraft, that is, a predicted nonconflict response for a conflict pair (dmin 1.25 or 3.75 nm) or a predicted conflict response for a nonconflict pair (dmin 6.25 or 8.75 nm). The term r(RT_Model, RT_data) is the correlation between the model and human response times. The parameter α determines the relative weight placed on fitting the P(conflict) responses, while its complement, 1 - α, is the weight placed on fitting response times. The alpha weighting is set by the modeler. Although the alpha value does not affect the model's predicted values of P(conflict) and RT for a given combination of C, N, and T, it does affect how long the search spends trying to improve the P(conflict) fit relative to the RT fit. It is possible that with a different alpha, the search process could return a different set of "best fitting" parameters, but "best" must somehow be defined. We decided that it was more important to replicate responses than response times, because this is, at minimum, the purpose of our model. Therefore, we set α to 0.8 to place a higher priority on fitting decisions than response times.
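For concreteness, the combined fit statistic in Equation 5 can be sketched in a few lines of Python. This is an illustrative reimplementation under our own naming conventions, not the original study code, and the Simplex search over C, N, and T is omitted:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def rmse(xs, ys):
    """Root Mean Square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def objective(p_model, rt_model, p_data, rt_data, p_worst, alpha=0.8):
    """Equation 5: RMSe on choice proportions, scaled by the worst-case
    RMSe, blended with the rescaled RT correlation. A perfect fit scores
    0; the worst possible fit scores 1."""
    choice_term = rmse(p_model, p_data) / rmse(p_worst, p_data)
    rt_term = (1.0 - pearson_r(rt_model, rt_data)) / 2.0
    return alpha * choice_term + (1.0 - alpha) * rt_term
```

With alpha = 0.8, as in the study, a unit of improvement in the choice fit is weighted four times as heavily as a unit of improvement in the response time fit.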

Table 3
Prediction of Response Times by Repetition, dmin, Angle, Bias, and Threshold Manipulations

Fixed effect                                          Coefficient   SE      t Ratio   df      p
Intercept                                             1639.93       49.01    33.46    50      <.001
Repetition (L)                                        -90.44        13.45    -6.72    10867   <.001
dmin (Q)                                              -99.34        14.35    -6.92    10867   <.001
Angle (L)                                             -34.87        14.35    -2.43    10867   .015
Bias (L)                                              51.89         23.29     2.23    10867   .026
Bias (Q)                                              -124.65       14.89    -8.37    10867   <.001
Threshold (L)                                         221.32        47.80     4.63    10867   <.001
Threshold (Q)                                         -58.00        14.92    -3.89    10867   <.001
Bias (Q) × Threshold (L)                              -67.57        12.87    -5.25    10867   <.001
Bias (L) × dmin (L)                                   -25.52        5.99     -4.26    10867   <.001
Bias (L) × Angle (L)                                  -45.42        13.22    -3.44    10867   .001
Bias (Q) × Repetition (L)                             31.82         7.23      4.40    10867   <.001
Threshold (L) × dmin (Q)                              -24.61        10.30    -2.39    10867   .017
Threshold (L) × Repetition (L)                        -99.25        12.63    -7.86    10867   <.001
Threshold (Q) × Repetition (L)                        26.06         8.72      2.99    10867   .003
dmin (L) × Angle (L)                                  -100.43       7.54     -13.32   10867   <.001
Bias (Q) × Threshold (L) × Repetition (L)             22.96         8.80      2.61    10867   .009
Bias (L) × dmin (Q) × Angle (L)                       67.26         12.36     5.44    10867   <.001
Bias (Q) × dmin (L) × Angle (L)                       13.82         3.11      4.44    10867   <.001
Threshold (L) × dmin (L) × Angle (L)                  -21.68        4.99     -4.34    10867   <.001
Threshold (Q) × dmin (L) × Angle (L)                  5.94          2.85      2.09    10867   .037
Bias (L) × Threshold (L) × dmin (L) × Angle (L)       -17.40        5.40     -3.22    10867   .002
Bias (Q) × Threshold (L) × dmin (L) × Angle (L)       6.54          3.02      2.17    10867   .030
Bias (L) × dmin (L) × Angle (L) × Repetition (L)      10.86         4.07      2.67    10867   .008

Random effect    Variance component   SD       χ²        df   p
Intercept        119679.33            345.95   1481.57   50   <.001
Level 1 error    891433.53            944.16

Note. L = linear term; Q = quadratic term; dmin = distance of minimum separation; SE = standard error of the mean; SD = standard deviation.

Modeling Results

Table 4 (Model 1) presents the results of the model in which the free parameters were allowed to vary across all conditions, whereas Figures 3 and 4 show the fits of the model against the human conflict response and response time data, respectively. Note that the model fitting process provides a mean probability of a conflict response and a mean response time for each condition, which may then be compared with the averaged participant data in Figures 3 and 4. As the figures show, the model captured the trends in the data very well. The best fitting parameter values are shown in Figure 5.

As Figure 5 illustrates, the response bias manipulation had the expected effect on the C parameter. C was relatively higher (reflecting a conflict bias) in the conflict-bias conditions compared to the neutral- and nonconflict-bias conditions. As expected, the bias manipulation had no significant impact on N, because N is mostly related to the difficulty of the judgment process, which was not manipulated here. However, the bias manipulation also had an unexpected effect on the T parameter, such that T was relatively lower in the extreme bias conditions compared to the neutral-bias condition.

The speed–accuracy manipulation worked as expected on C, which remained stable across speed–accuracy conditions. N was slightly lower in the speed-emphasis conditions compared to the neutral and accuracy-emphasis conditions, though this effect was negligible. Lastly, the speed–accuracy manipulation had the expected effect on T, such that T was smaller when speed was emphasized compared to neutral, and slightly greater when accuracy was emphasized compared to neutral.
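The full Neal and Kwantes (2009) model is specified earlier in the paper; as a rough, generic illustration of how the three parameters trade off in a sequential sampling scheme, the random-walk sketch below (our simplification, not the authors' implementation, with arbitrary sign conventions) treats C as the criterion each noisy sample is compared against, N as the sample noise, and T as the evidence required to respond:

```python
import random

def simulate_trial(evidence, C, N, T, max_samples=600, rng=random):
    """Toy random-walk sketch of a sequential sampling decision.
    Each step, a noisy sample of the evidence is compared to the
    criterion C and nudges a tally toward "conflict" or "nonconflict";
    a response is emitted once the tally reaches +T or -T."""
    tally = 0
    for t in range(1, max_samples + 1):
        sample = evidence + rng.gauss(0, N)  # N: noise blurring each sample
        tally += 1 if sample > C else -1     # C: criterion biasing each step
        if abs(tally) >= T:                  # T: evidence needed to respond
            return ("conflict" if tally > 0 else "nonconflict", t)
    return ("nonconflict", max_samples)      # give up at the deadline
```

Shifting C biases the walk toward one response, while raising T forces more samples before a response is made, trading speed for accuracy, which is the qualitative pattern the fitted parameters show across the bias and speed–accuracy conditions.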

Table 4
Results of Models Fitted in the Study: Parameter Settings, Total Parameters Estimated, and Final Fit Statistics

         Parameter settings across experimental manipulations
         Angle       Bias        Threshold
Model    C   N   T   C   N   T   C   N   T   Total number of free parameters (by type)   RMSe     RT r
1        0   0   0   0   0   0   0   0   0   54 (18C + 18N + 18T)                        0.0284   .93
2        1   1   1   0   0   0   0   0   0   27 (9C + 9N + 9T)                           0.0416   .92
3        1   1   1   0   1   1   0   0   0   15 (9C + 3N + 3T)                           0.0421   .83
4        1   1   1   0   1   0   0   0   0   21 (9C + 3N + 9T)                           0.0417   .92
5        1   1   1   0   1   0   1   1   0   13 (3C + 1N + 9T)                           0.0434   .93
6        1   1   1   0   1   1   1   1   0   7 (3C + 1N + 3T)                            0.0437   .83

Note. Model 1, as reported first in the Results section, is the 'full model' where all parameters were allowed to vary across all experimental manipulations. Models 2 through 6 progressively added constraints on the number of times parameters could be estimated across experimental conditions. C = Criterion parameter; N = Noise parameter; T = Threshold parameter. '0' denotes that the parameter was allowed to vary across levels of the given experimental variable; '1' denotes that the parameter was fixed across levels of that variable. RMSe = Root Mean Square error between human and model conflict responses (smaller values indicate better fit); RT r = correlation between human and model response times (higher values indicate better fit).


Figure 5. Best fitting parameter estimates for (a) Criterion, (b) Noise, and (c) Threshold in Model 1. Model 1 allowed parameters to vary across all experimental manipulations including Angle, Bias and Threshold, but here we show results averaged over angle because angle was not of focal interest to this study and parameters showed no substantial differences across this manipulation.

Examining Alternative Models

In the modeling results above, we showed that a model with no constraints on its parameters could fit the conflict response and response time data very well, and that its parameters generally responded in accord with our predictions. Thus, we can have some confidence that the parameters of the model are sensitive to the processes they are designed to reflect, and that we can use them to infer the response bias and speed–accuracy preferences of the decision maker. However, a stronger test of the model's validity is to show that the experimental manipulations affect only the parameters they are meant to, and in the way predicted by theory. This approach has previously been adopted in validating other sequential sampling models, such as the Diffusion model (Leite & Ratcliff, 2011; Voss et al., 2004). To examine whether the behavioral effects of our experimental manipulations were yoked to the model's parameters, we fit a range of models to the data while gradually reducing the number of parameters allowed to vary across conditions. The model results reported above were from a version of the model where all parameters were allowed to vary across experimental manipulations. This is the most flexible model possible, and should offer the best fit. Using the same dataset, we then fit a series of "reduced" models in which parameters were fixed across certain experimental conditions according to our predictions. We then compared the fit statistics for the full model with those of the reduced models to ascertain which model fit the data best with the fewest free parameters (Vandekerckhove & Tuerlinckx, 2007). The relative fit of a model can normally be evaluated using methods such as a chi-square difference test, which weighs the fit index against the number of parameters estimated. However, because we used our own custom model fitting function that combined an RMSe statistic with a correlation, there is no known standard against which we could compare model fits. As such, we report and interpret our findings based on descriptive trends in our fit statistics.

In Table 4, we provide the fits of the full model (Model 1) and the results of five additional, more constrained models (Models 2 to 6). We do not report the models that could not be fitted. As expected, Model 1 produced an RMSe of 0.0284 and a correlation for RT of .93, using 54 parameters (unique C, N, and T values across each of the 2 angle × 3 bias × 3 speed–accuracy levels). In Model 2, all three parameters were fixed across angle but allowed to vary across levels of the response bias and speed–accuracy conditions. The RMSe for this model increased slightly compared with Model 1 (from 0.0284 to 0.0416), with a similar correlation for RT of .92. In summary, Model 2 offered a very good fit to the human conflict response data and, importantly, still fit the response time data very well. Keeping all parameters fixed across angle, in Model 3 we also fixed N and T across the response bias manipulation. This produced a similar fit to the conflict response data compared to Model 2, but a relatively poor fit to the response time data. Given that the statistical results suggested response times were faster in the extreme bias conditions compared to neutral, we reasoned that T would need to be allowed to vary in response to the bias manipulation in order to accommodate the data. Therefore, in Model 4, we allowed T to vary with bias, which improved the RT correlation to a level similar to Model 2. This indicates that allowing both C and T to vary with the bias manipulation is crucial in reproducing trends in the human data.
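The parameter counts in Table 4 follow directly from which factors each parameter is allowed to vary across: each of C, N, and T contributes one value per combination of levels of its free factors (angle has 2 levels; bias and threshold have 3 each). The sketch below is a hypothetical helper, not code from the study, that reproduces those totals:

```python
# Factor levels in the design: 2 angles, 3 bias and 3 threshold conditions.
LEVELS = {"angle": 2, "bias": 3, "threshold": 3}

def count_free_params(varies):
    """Count free parameters given, for each parameter, the list of
    factors it is allowed to vary across (empty list = one fixed value)."""
    total = 0
    for factors in varies.values():
        n = 1
        for f in factors:
            n *= LEVELS[f]   # one parameter per cell of the free factors
        total += n
    return total

model1 = {p: ["angle", "bias", "threshold"] for p in "CNT"}    # full model
model5 = {"C": ["bias"], "N": [], "T": ["bias", "threshold"]}  # best model
model6 = {"C": ["bias"], "N": [], "T": ["threshold"]}          # hypothesized
```

Under this scheme, `count_free_params` returns 54 for Model 1, 13 for Model 5, and 7 for Model 6, matching the "(by type)" column of Table 4.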


In Model 5, we examined whether C and N could also be fixed across the speed–accuracy manipulation. The parameters for Model 5 are shown in Figure 6. This model produced a good fit to both the conflict response and response time data using only 13 parameters, suggesting that only T needed to vary to account for the effect of the speed–accuracy manipulation. Model 6 is the hypothesized model, in which the C parameter is only allowed to vary with the response bias manipulation, and T is only allowed to vary with the speed–accuracy manipulation. This model offers a competitive RMSe but a poor RT correlation. Based on these results, the model with the best fit for the fewest parameters is Model 5, which generally supports our predictions about the effects of response bias on C and the effects of the speed–accuracy manipulation on T.

Figure 6. Best fitting parameter estimates for (a) Criterion, (b) Noise, and (c) Threshold in Model 5. In Model 5, the Criterion parameter was only allowed to vary across levels of the Bias manipulation, the Noise parameter was fixed across all manipulations, and the Threshold parameter was only allowed to vary across levels of the Bias and Threshold manipulations.

Conclusions

The aim of this study was to validate Neal and Kwantes' (2009) sequential sampling model of conflict detection. Specifically, we wanted to examine whether the model's previously untested Criterion and Threshold parameters would respond to experimental manipulations in a way that is consistent with the psychological constructs they were designed to represent. In the experimental task, participants observed pairs of aircraft flying toward a crossing point in the center of the screen and had to decide whether each pair was a "conflict" (would breach prescribed separation standards at some point during their flight) or a "nonconflict" (would remain safely separated throughout the course of their flight). Through experimental instructions, we manipulated participants' response bias by asking them to favor the classification of aircraft pairs as either conflicts or nonconflicts. We also manipulated participants' speed–accuracy preferences by encouraging them to make speedy or accurate responses.

The main effects of our manipulations were generally in line with our predictions. Inducing a conflict bias produced relatively more conflict decisions compared to neutral, while inducing a nonconflict bias produced relatively more nonconflict decisions compared to neutral. Consistent with our predictions, the model accommodated the response bias effect primarily through changes in the criterion parameter, C. The behavior of the criterion parameter across the response bias manipulation was therefore theoretically consistent with the psychological process it was designed to represent, which validates the use of this parameter as an index of the decision maker's decision rule. With respect to our speed–accuracy manipulation, inducing a speed emphasis produced relatively faster responses compared to neutral, while inducing an accuracy emphasis produced relatively slower responses compared to neutral. Again, consistent with our predictions, the model accommodated this through the threshold parameter, T, which gives us confidence that changes in this parameter reflect the speed–accuracy preference of the decision maker.

While most of the experimental effects were consistent with our predictions, the response bias manipulation had a modest but unexpected effect on response time, such that response times were faster in the extreme bias conditions compared with the neutral-bias conditions. It may be that under biased instructions, participants responded by reducing nondecision time. Nondecision time is a component of overall decision time (Ratcliff & Smith, 2004) and represents the time associated with processes not directly related to making a decision. For example, in the task used in this study, time must be spent shifting gaze back and forth between the aircraft to estimate differences in arrival times, time is spent using the mouse to commit the response, and time is possibly also spent checking response calculations before committing to the final decision. It is possible that biased instructions offer a motive for participants to reduce nondecision time, resulting in a faster response. In our model, this influence of nondecision time could only be accommodated through the Threshold parameter. Therefore, it is not that the parameters of our model misbehaved; rather, the opposite: the parameters acted appropriately, in a way that was required by and consistent with the observed statistical trends in the data. In future experiments, we could try to control a possible reduction in nondecision time when switching between neutral and extreme bias conditions by asking participants to keep their speed constant. Alternatively, we could provide additional training before the trials to familiarize participants with the likely positions of the aircraft so that they can calibrate their focus, or we could train participants on an appropriate visual strategy for processing perceptual information in a systematic way. We could also add a parameter for nondecision time and examine it through experimental manipulations designed to affect it. Aside from exploring how to control the impact of nondecision time on final latency, we could also try to improve our manipulation of response bias so that it targets the Criterion parameter only, without affecting Threshold. For example, we could eliminate the false information about probability of occurrence, or manipulate bias through the actual frequency of occurrence, as has been done with other sequential sampling models. In sum, our empirical findings were generally consistent with what is normally found in tests of sequential sampling models involving manipulations of response bias or speed–accuracy tradeoffs (Ratcliff & McKoon, 2008).
These effects were accommodated by the model's Criterion and Threshold parameters, which behaved in theoretically consistent ways to reproduce the participant data. Thus, we can have confidence that the parameters of the model are sensitive to the psychological processes they are designed to reflect, and that changes in the parameters can therefore be used to infer the underlying psychological processes involved in this decision-making task. An important point, however, is that our manipulation of response bias was based on task instructions only, and not on actual stimulus frequencies or event payoffs. It would be interesting to explore, through further research, whether such alternative manipulations would produce similar response effects and whether they would be accommodated equally well by our model. We expect so, but it is possible that some manipulations could produce relatively stronger effects than others, such as when stimulus frequencies are used in conjunction with payoffs and instructions. Validating the model in this way would provide further confidence in its ability to capture trends across multiple task settings.

We now turn to the implications of this work and the practical value of our model. The sequential sampling model examined in this study is unique in several regards and provides a valuable contribution to understanding decision-making performance in applied settings. First, by using only three free parameters, our model offers a conceptually simpler reconstruction of human response trends than the benchmark sequential sampling model, the Diffusion model, which is defined by seven parameters (Ratcliff et al., 2004). To account for response bias effects, the Diffusion model requires a change in its starting point parameter, its drift rate parameter, or both (depending on the exact performance observed). Ours requires a change in the Criterion parameter only. To address speed–accuracy effects, the Diffusion model requires a change in the upper response boundary, and possibly the nondecision time parameter. Ours requires a change in the Threshold parameter only. Finally, to account for the effects of dmin and angle, the Diffusion model would require a separate drift rate for each condition. Our model did not require any changes to handle the various stimuli used in the task.

Though having fewer parameters facilitates our model's application, the downside is that the model cannot account for complex phenomena such as the relative speed of correct versus incorrect decisions, as the Diffusion model can. However, the Diffusion model is limited to data from simple and rapid response time tasks, so the argument may be moot. The objective of the present modeling work is to summarize data from a complex and dynamic decision-making task where responses occur over the course of several seconds. As such, the aim of the exercise is not to predict the finer intricacies of response time distributions, but to provide a means of analyzing decision-making performance in a simple and easy-to-understand fashion. This study, complementing Neal and Kwantes (2009) and Vuckovic, Kwantes, and Neal (2013), showed that the current model is sufficient for reproducing trends in the data at a group level.

Second, the model offers a simpler approach to analyzing data compared with traditional statistical methods, and one that is more comprehensive than SDT. Using statistical procedures, the data can only be analyzed through separate analyses of decisions and response times. As shown in the Results section, this can produce a number of complex three- and four-way interactions that are difficult to understand and integrate across two independent sets of analyses.
SDT offers an appealing alternative for extracting meaning from raw data using simple formulas, but it does not provide a way to deal with response time data. Sequential sampling models like ours provide a way to integrate decisions alongside response times, and in turn reduce some of the complexity in data analysis by providing a holistic approach to interpretation (Donkin, Averell, Brown, & Heathcote, 2009; White, Ratcliff, Vasey, & McKoon, 2010). That the model was able to replicate the trends in the human data across all conditions suggests it can be used as a viable alternative to traditional analytic methods.

Third, this model extends the traditional scope of application for sequential sampling models. Historically, sequential sampling models have been used to account for performance in response time paradigms where decisions are made in the order of milliseconds. This study shows that the current model can adequately account for performance in a more natural and dynamic environment where decisions occur over seconds and minutes rather than milliseconds. To this end, a model like ours can be used as a measurement or diagnostic tool, like Signal Detection Theory (Green & Swets, 1966), to identify conditions where decision support or training may be required. Alternatively, we could use a model like ours to evaluate the effects of new technologies, processes, or procedures on human performance. Furthermore, we can incorporate our model into larger computational architectures to make them more flexible and realistic. Computational models are increasingly being used to evaluate new design concepts and systems in both military and civilian settings (Corker, Gore, Fleming, & Lane, 2000; Remington, Lee, Ravinder, & Matessa, 2010). Lastly, given enough data and testing, we could establish


stable parameter values that hold across multiple task scenarios and settings, which could themselves be used to predict human performance under novel conditions.

Fourth, although other sequential sampling models stipulate that decisions are the product of an evidence accumulation process, they are not specific about the source of the evidence being used. Most estimate a parameter such as drift rate to account for the direction and strength of the evidence accumulation process, but although some modelers have attempted to map drift rates onto stimulus properties (Ratcliff, Van Zandt, & McKoon, 1999), they do not explain the psychological process that produced the drift rate value. Instead, drift rate is interpreted in a relative way by comparing it across conditions. Conversely, Neal and Kwantes' model uses the referent to explain the source and calculation of the information about the world that constitutes evidence in the accumulation process. By estimating the referent directly and then relating it to the free parameters in the model, we can understand how the perceptual, information-processing aspect of the task drives the strategic decision-making process.

Limitations

There are a number of potential limitations of this study. First, our task was simplified relative to real air traffic control. Complexity was reduced by controlling the geometry and timing of aircraft pairs (e.g., angles, altitudes, and speeds were constant throughout the scenarios). Furthermore, only one pair was shown on screen at a time. In real air traffic control, when more than two aircraft are present, each aircraft can be a member of more than one pair, which complicates the conflict detection process. A future direction for this work would be to extend our modeling to more applied contexts by examining how the expert controller integrates various sources of information (angle, speed, altitude) from the display into a representation of a referent, as well as how the expert prioritizes aircraft pairs in his or her sector. Because we used a constrained laboratory task and naïve participants, we are limited in the extent to which the empirical findings generalize to the field. That said, like most behavioral science researchers, we are more interested in theoretical generalization than statistical generalization (Highhouse & Gillespie, 2009). The purpose of this study was to examine a model of decision making in a complex and dynamic task. It is our view that models like ours have the potential to be used more broadly to analyze performance in dynamic tasks and could be of great benefit in assessing the performance of expert operators. Validating the model in a controlled environment, such as that used in this study, is an important first step toward realizing this goal.

Another limitation is that the modeling was performed at the aggregate level. Although best practice recommends that models be fit to the individual, this requires a large number of observations, generally one hundred or more per condition. Although this may be feasible for simple laboratory tasks, such as lexical decision or recognition memory, it is not practical for more complex tasks because of the time required to present and respond to each item. We therefore wanted to validate the model at the aggregate level before undertaking the more arduous task of validating it at the individual level.
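The aggregate-level validation strategy can be illustrated with a toy fitting routine. The sketch below is hypothetical, not the procedure used in the study: it substitutes a bare-bones random-walk simulator for the full model and recovers illustrative Criterion and Threshold values by grid search against aggregate response proportions and mean response times.

```python
import random

def simulate(criterion, threshold, drift=0.5, dt=0.05, n=100, seed=0):
    """Simulate n random-walk trials; return the aggregate proportion of
    'conflict' responses and the mean response time in seconds."""
    rng = random.Random(seed)
    conflicts, total_rt = 0, 0.0
    for _ in range(n):
        evidence, t = 0.0, 0.0
        # Accumulate noisy evidence until either boundary is crossed.
        while abs(evidence) < threshold and t < 30.0:
            evidence += (drift - criterion) * dt + rng.gauss(0.0, 1.0) * dt ** 0.5
            t += dt
        conflicts += evidence >= threshold
        total_rt += t
    return conflicts / n, total_rt / n

def fit(observed_p, observed_rt):
    """Grid-search the Criterion and Threshold values that minimize the
    squared discrepancy between simulated and observed aggregate stats."""
    best, best_err = None, float("inf")
    for criterion in [c / 10 for c in range(-4, 5, 2)]:      # -0.4 .. 0.4
        for threshold in [t / 10 for t in range(5, 30, 5)]:  #  0.5 .. 2.5
            p, rt = simulate(criterion, threshold)
            # RT error is rescaled so both statistics contribute comparably.
            err = (p - observed_p) ** 2 + ((rt - observed_rt) / 10.0) ** 2
            if err < best_err:
                best, best_err = (criterion, threshold), err
    return best
```

With real data, observed_p and observed_rt would be the aggregate conflict-response proportion and mean response time for a condition; a formal fit would use a proper objective (e.g., chi-square over response time quantiles) and an optimizer rather than a coarse grid.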

67

References

Azimian-Faridani, N., & Wilding, E. L. (2006). The influence of criterion shifts on electrophysiological correlates of recognition memory. Journal of Cognitive Neuroscience, 18, 1075–1086. doi:10.1162/jocn.2006.18.7.1075
Bisseret, A. (1981). Application of signal detection theory to decision making in supervisory control: The effect of the operator's experience. Ergonomics, 24, 81–94. doi:10.1080/00140138108924833
Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57, 153–178. doi:10.1016/j.cogpsych.2007.12.002
Corker, K., Gore, B. F., Fleming, K., & Lane, J. (2000). Free flight and the context of control: Experiments and modeling to determine the impact of distributed air-ground air traffic management on safety and procedures. Paper presented at the 3rd USA-Europe Air Traffic Management R&D Seminar, Naples, Italy.
Diederich, A., & Busemeyer, J. R. (2006). Modeling the effects of payoff on response bias in a perceptual discrimination task: Bound-change, drift-rate-change, or two-stage-processing hypothesis. Perception & Psychophysics, 68, 194–207. doi:10.3758/BF03193669
Donkin, C., Averell, L., Brown, S., & Heathcote, A. (2009). Getting more from accuracy and response time data: Methods for fitting the linear ballistic accumulator. Behavior Research Methods, 41, 1095–1110. doi:10.3758/BRM.41.4.1095
Durso, F. T., & Manning, C. A. (2008). Air traffic control. Reviews of Human Factors and Ergonomics, 4, 195–244. doi:10.1518/155723408X342853
Fitts, P. M. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 47, 381–391. doi:10.1037/h0055392
Fothergill, S., Loft, S., & Neal, A. (2009). ATC-labAdvanced: An air traffic control simulator with realism and control. Behavior Research Methods, 41, 118–127. doi:10.3758/BRM.41.1.118
Gardiner, J. M., Richardson-Klavehn, A., & Ramponi, C. (1997). On reporting recollective experiences and "direct access to memory systems". Psychological Science, 8, 391–394. doi:10.1111/j.1467-9280.1997.tb00431.x
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York, NY: Wiley.
Highhouse, S., & Gillespie, J. (2009). Do samples really matter that much? In C. Lance & R. Vandenberg (Eds.), Statistical and methodological myths and urban legends: Doctrine, verity and fable in the organizational and social sciences (pp. 247–265). New York, NY: Routledge.
Hirshman, E., & Henzler, A. (1998). The role of decision processes in conscious recollection. Psychological Science, 9, 61–65. doi:10.1111/1467-9280.00011
Köhnken, G., & Maass, A. (1988). Eyewitness testimony: False alarms on biased instructions? Journal of Applied Psychology, 73, 363–370. doi:10.1037/0021-9010.73.3.363
Kwantes, P. J., & Mewhort, D. J. K. (1999). Modeling lexical decision and word naming as a retrieval process. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 53, 306–315. doi:10.1037/h0087318
Laming, D. (1968). Information theory of choice reaction time. New York, NY: Wiley.
Leite, F. P., & Ratcliff, R. (2011). What cognitive processes drive response biases? A diffusion model analysis. Judgment and Decision Making, 6, 651–687.
Link, S., & Heath, R. (1975). A sequential theory of psychological discrimination. Psychometrika, 40, 77–105. doi:10.1007/BF02291481
Loft, S., Bolland, S., Humphreys, M., & Neal, A. (2009). A theory and model of conflict detection in air traffic control: Incorporating environmental constraints. Journal of Experimental Psychology: Applied, 15, 106–124. doi:10.1037/a0016118

Loft, S., Hill, A., Neal, A., Humphreys, M., & Yeo, G. (2004). ATC-lab: An air traffic control simulator for the laboratory. Behavior Research Methods, Instruments & Computers, 36, 331–338. doi:10.3758/BF03195579
Meissner, C. A., Tredoux, C. G., Parker, J. F., & MacLin, O. H. (2005). Eyewitness decisions in simultaneous and sequential lineups: A dual-process signal detection theory analysis. Memory & Cognition, 33, 783–792. doi:10.3758/BF03193074
Neal, A., & Kwantes, P. J. (2009). An evidence accumulation model for conflict detection performance in a simulated air traffic control task. Human Factors, 51, 164–180. doi:10.1177/0018720809335071
Pachella, R. G., & Fisher, D. F. (1972). Hick's law and the speed–accuracy tradeoff in absolute judgments. Journal of Experimental Psychology, 92, 378–384. doi:10.1037/h0032369
Parasuraman, R., Masalonis, A. J., & Hancock, P. A. (2000). Fuzzy signal detection theory: Basic postulates and formulas for analyzing human and machine performance. Human Factors, 42, 636–659. doi:10.1518/001872000779697980
Postma, A. (1999). The influence of decision criteria upon remembering and knowing in recognition memory. Acta Psychologica, 103, 65–76. doi:10.1016/S0001-6918(99)00032-3
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2002). Numerical recipes in C: The art of scientific computing (2nd ed.). New York, NY: Cambridge University Press.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108. doi:10.1037/0033-295X.85.2.59
Ratcliff, R. (2008). The EZ diffusion method: Too EZ? Psychonomic Bulletin & Review, 15, 1218–1228. doi:10.3758/PBR.15.6.1218
Ratcliff, R., Gomez, P., & McKoon, G. (2004). A diffusion model account of the lexical decision task. Psychological Review, 111, 159–182. doi:10.1037/0033-295X.111.1.159
Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20, 873–922. doi:10.1162/neco.2008.12-06-420
Ratcliff, R., & Rouder, J. N. (1998). Modeling response times for two-choice decisions. Psychological Science, 9, 347–356. doi:10.1111/1467-9280.00067
Ratcliff, R., & Smith, P. L. (2004). A comparison of sequential sampling models for two-choice reaction time. Psychological Review, 111, 333–367. doi:10.1037/0033-295X.111.2.333
Ratcliff, R., Van Zandt, T., & McKoon, G. (1999). Connectionist and diffusion models of reaction time. Psychological Review, 106, 261–300. doi:10.1037/0033-295X.106.2.261
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). San Francisco, CA: Sage.
Reed, A. V. (1973). Speed–accuracy trade-off in recognition memory. Science, 181, 574–576. doi:10.1126/science.181.4099.574
Remington, R., Lee, S. M., Ravinder, U., & Matessa, M. (2010). Observations on human performance in air traffic control operations: Preliminaries to a cognitive model. Paper presented at Behavior Representation in Modeling and Simulation (BRIMS), Arlington, VA.
Smith, P. L. (1995). Psychophysically principled models of visual simple reaction time. Psychological Review, 102, 567–593. doi:10.1037/0033-295X.102.3.567
Smith, P. L., & Vickers, D. (1989). Modeling evidence accumulation with partial loss in expanded judgment. Journal of Experimental Psychology: Human Perception and Performance, 15, 797–815. doi:10.1037/0096-1523.15.4.797
Steblay, N. M. (1997). Social influence in eyewitness recall: A meta-analytic review of lineup instruction effects. Law and Human Behavior, 21, 283–297. doi:10.1023/A:1024890732059
Strack, F., & Förster, J. (1995). Reporting recollective experiences: Direct access to memory systems? Psychological Science, 6, 352–358. doi:10.1111/j.1467-9280.1995.tb00525.x
Swets, J. A. (1964). Signal detection and recognition by human observers. Los Altos, CA: Peninsula Publishing.
Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108, 550–592. doi:10.1037/0033-295X.108.3.550
Vandekerckhove, J., & Tuerlinckx, F. (2007). Fitting the Ratcliff diffusion model to experimental data. Psychonomic Bulletin & Review, 14, 1011–1026. doi:10.3758/BF03193087
Vandekerckhove, J., & Tuerlinckx, F. (2008). Diffusion model analysis with MATLAB: A DMAT primer. Behavior Research Methods, 40, 61–72. doi:10.3758/BRM.40.1.61
Voss, A., Rothermund, K., & Voss, J. (2004). Interpreting the parameters of the diffusion model: An empirical validation. Memory & Cognition, 32, 1206–1220. doi:10.3758/BF03196893
Vuckovic, A., Kwantes, P., & Neal, A. (2013). Adaptive decision making in a dynamic environment: A test of a sequential sampling model of relative judgment. Manuscript submitted for publication.
Wagenmakers, E. J., Ratcliff, R., Gomez, P., & McKoon, G. (2008). A diffusion model account of criterion shifts in the lexical decision task. Journal of Memory and Language, 58, 140–159. doi:10.1016/j.jml.2007.04.006
Wagenmakers, E. J., van der Maas, H. L. J., & Grasman, R. P. P. P. (2007). An EZ-diffusion model for response time and accuracy. Psychonomic Bulletin & Review, 14, 3–22. doi:10.3758/BF03194023
White, C. N., Ratcliff, R., & Starns, J. J. (2011). Diffusion models of the flanker task: Discrete versus gradual attentional selection. Cognitive Psychology, 63, 210–238. doi:10.1016/j.cogpsych.2011.08.001
White, C. N., Ratcliff, R., Vasey, M. W., & McKoon, G. (2010). Using diffusion models to understand clinical disorders. Journal of Mathematical Psychology, 54, 39–52. doi:10.1016/j.jmp.2010.01.004

Received April 16, 2013
Revision received June 23, 2013
Accepted July 2, 2013
