Journal of Experimental Psychology: Applied 2016, Vol. 22, No. 1, 107–123

© 2016 American Psychological Association 1076-898X/16/$12.00 http://dx.doi.org/10.1037/xap0000075

Vigilance in a Dynamic Environment

Eric J. Stearman and Francis T. Durso

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Georgia Institute of Technology

Advances in technology have led to increasing levels of automation in modern work environments, moving people to the position of a passive monitor. When persons are in passive monitoring states, they are often subject to overall deficits in performance that become worse as time on task increases (i.e., vigilance decrements). Although many factors have been shown to influence whether or not a vigilance decrement will occur in a monitoring task (e.g., event rate), it is not clear how laboratory experiments translate to operational environments (Hancock, 2013). Four experiments were conducted that examined the effects of signal rate, event rate, cognitive load, training, and the presence of a dual task on performance during an air traffic control (ATC) automation failure detection task. Both failure detection and detection time were analyzed. Results from a meta-analysis revealed that cognitive load placed on participants through the use of task-relevant complex instructions produced a reliable vigilance decrement. However, other types of cognitive load did not produce any reliable vigilance decrements. The relationship of the cognitive load to the vigilance task may be an important factor in determining if the cognitive load will produce a vigilance decrement in a dynamic operational environment like air traffic control.

Keywords: vigilance, event rate, signal rate, dual task, cognitive load, ATC

This article was published Online First February 4, 2016. Eric J. Stearman and Francis T. Durso, School of Psychology, Georgia Institute of Technology. The research in this report was supported by the FAA AJP 61 through Grant 42066N7 awarded to the second author. We thank Vlad Pop, Sadaf Kazi, Jerry Crutchfield, and Barbara Wilper for their contributions to this research. Correspondence concerning this article should be addressed to Eric J. Stearman or Frank Durso at School of Psychology, Georgia Institute of Technology, 654 Cherry Street, Atlanta, GA 30332. E-mail: [email protected]; [email protected]

As we move into the future, research on vigilance, an operator's ability to sustain attention over time, is becoming more important. Advances in technology have allowed many environments to use increasing levels of automation. The increased levels of automation have the potential to change the role of the human in these environments from one of active participant to that of a passive monitor. When a person is required to monitor an environment with little to no interaction, the person may be susceptible to what is commonly referred to as a vigilance decrement (Parasuraman, 1986). A vigilance decrement occurs when a person's performance during a vigil decreases as the time spent in the vigil increases. For example, during World War II, military radar operators were better at detecting enemy targets at the beginning of their shift than at the end of their shift (Mackworth, 1948). Thus, the performance of the radar operators in monitoring the radar display for enemy targets decreased as the time spent monitoring the display increased.

Research on the effects of prolonged monitoring tasks on performance began over 60 years ago when Mackworth (1948) attempted to understand how long a person could monitor an environment before performance began to decrease. Mackworth developed a task that required participants to watch a blank-faced clock's second hand for increased distance in hand movement. Normally, the clock's second hand would move one second at a time, making equidistant movements. However, occasionally the clock's second hand would move two seconds at one time. When this happened, participants were told to indicate to the experimenter that they had detected the larger movement. The task lasted for 2 hours. Mackworth found a large decrease (10–15%) in performance after the first 30 min of the task, suggesting that vigilance decrements can occur early in a vigil.

As research on the effects of monitoring on performance continued, it was noticed that although some monitoring tasks showed a decrease in performance with time on task, in other monitoring tasks people were able to maintain performance without ever showing a vigilance decrement. Thus, one goal of vigilance research became to identify the differences between the tasks that were likely to show a vigilance decrement and the tasks that were not. In this spirit, Parasuraman (1986) reviewed 42 different monitoring tasks and classified these tasks along two dimensions: event rate and signal type. Event rate refers to the rate of presentation of background signals. Parasuraman (1986) classified event rates as high (24 or more background signals per minute) or low (fewer than 24 background signals per minute). Signal types were classified as either successive (holding a standard in memory) or simultaneous (the standard is present in the display). Parasuraman (1979) viewed the difference between successive and simultaneous signal types as whether a memory load is present (successive signal type) or not (simultaneous signal type). Parasuraman (1986) found that, for the 42 tasks examined, a vigilance decrement was present only when both a high event rate and successive signal types were present. However, in more than 25% of tasks that involved both a high event rate and a successive signal type, no vigilance decrement was observed. Thus, event rate and signal type were only part of the story.

See, Howe, Warm, and Dember (1995) further examined the

event rate by signal type classification by examining its interaction with the type of stimuli. Stimuli were considered either sensory or cognitive in nature. For sensory stimuli, critical signals were predesignated changes in physical characteristics of the stimuli, whereas for cognitive stimuli, critical signals used symbolic or alphanumeric changes. See et al. found that for successive signal type tasks with a high event rate, the size of the vigilance decrement was greater for sensory tasks than for cognitive tasks. However, for simultaneous signal type tasks with a high event rate, the size of the vigilance decrement was greater for cognitive tasks than for sensory tasks.

Another factor to consider is signal rate, the proportion of signals to events. Whereas some reports have suggested that when signals are rare (signal rates are low), vigilance decrements are more likely to occur (e.g., Warm & Jerison, 1984), other reports (See et al., 1995) have suggested that signal rate does not play a key role in the presence of a vigilance decrement. Warm and Jerison (1984) point out that when signal rate and event rate are examined together, event rate is the dominant effect. Other factors that make it more likely a vigilance decrement will be present include low discriminability (Parasuraman & Mouloua, 1987), temporal and spatial uncertainty (Adams & Boulter, 1964), spatial working memory load (Caggiano & Parasuraman, 2004; Helton & Russell, 2011), verbal working memory load (Helton & Russell, 2011), and the presence of a secondary task unrelated to the vigilance task (McBride, Merullo, Johnson, Banderet, & Robinson, 2007). (The presence of a secondary task related to the vigilance task may actually aid performance; McBride et al., 2007.) The factors mentioned above that affect performance during a vigilance task are far from exhaustive.
Mackie (1987) identifies over 30 factors claimed to influence performance during vigilance tasks and acknowledges there are even more contributing factors not mentioned in the report.

In an effort to explain why the above factors play the role they do in obtaining a vigilance decrement, many theories have been proposed. Some of these theories include inhibition theory (Mackworth, 1950), expectancy theory (Deese, 1955), arousal theory (Hebb, 1955), and motivation theory (Smith, 1966); see Davies and Parasuraman (1981) for a review. Inhibition theory posits that the vigilance decrement results from a buildup of inhibition to respond when neutral events are presented; as time on task increases, inhibition to respond increases, causing a failure to respond to signals. Expectancy theory posits that as time on task increases, operators shift their criterion for deciding whether a signal is present to match the signal probability; therefore, operators will be less likely to say a signal is present when signals are rare. Arousal theory posits that vigilance tasks are monotonous, and as time on task increases, arousal decreases; because operators are less likely to detect a signal when arousal levels are low, performance decreases with time on task. Motivation theory posits that operators with high intrinsic motivation can maintain performance over time; therefore, a vigilance decrement would occur only in situations where extrinsic motivation is higher than intrinsic motivation. However, the dominant theory today is resource theory (Kahneman, 1973; Warm, Parasuraman, & Matthews, 2008), which posits that vigilance tasks are mentally demanding and explains the vigilance decrement as occurring because cognitive resources are being used faster than they can be replenished, resulting in decreases in performance as time on task increases.

The question of when vigilance decrements occur and when they do not remains vital today (Hancock, 2013), in part because we need greater clarity if we hope to apply findings from the laboratory to modern tasks requiring sustained attention. Elliott (1960) pointed out over 50 years ago that in operational environments, there is often no performance decrement over a 2-hour vigil. At times, a vigilance decrement has been observed in operational environments: Wiggins (2011) found pilots showed a significant vigilance decrement during a simulated flight. However, when Thackray and colleagues (Thackray & Touchstone, 1989; Thackray, Bailey, & Touchstone, 1977) asked college students to monitor for failures in system reports of altitude, performance was as good at the end of 2 hours as it was at the beginning.

Hancock (2013) suggests that vigilance decrements are iatrogenically created. That is, vigilance decrements are created by design, intentionally or unintentionally. Hancock argues that when vigilance decrements are found, it is often the case that the display does not contain a standard for comparison, forcing the operator to maintain the standard in memory and creating a cognitive load; operators are often not provided with feedback or knowledge of results; environments are often monotonous, boring, and repetitive in nature; and operators are often isolated during the task with no social contact. In operational environments, it is often not the case that all of these factors are present. Others have also noted differences between laboratory tasks and the operational tasks to which we hope to apply the results. For example, Elliott (1960) suspected that signal rates considered rare in the laboratory are actually quite high compared to rates in operational environments.
The current report focuses on how these factors affect overall performance and performance with time on task in the dynamic simulated environment of air traffic control (ATC). Vigilance has been studied occasionally using ATC-like systems (e.g., Funke et al., 2010) or in other dynamic environments such as improvised explosive device detection (Teo, Szalma, Schmidt, Hancock, & Hancock, 2012), although whether or not a vigilance decrement is observed is not straightforward. For example, as we mentioned, Thackray and Touchstone (1989) showed no vigilance decrement when participants had to note when the altitude was replaced by a string of XXX. Other research on vigilance in air traffic control (Hitchcock et al., 2003) examined automation cueing and signal salience in a simulated ATC task that comprised four 10-min blocks. Hitchcock et al. found that when no cue was present, correct detection of the target (two aircraft on a collision path) decreased by over 20% across the four blocks, thus providing evidence of a vigilance decrement.

Research on vigilance in ATC environments is becoming more important today. The Federal Aviation Administration expects a 2–3-fold increase in passenger and cargo air traffic over the next 20–30 years (FAA, 2009). In response to the expected increase, the Joint Planning and Development Office (JPDO) has proposed a next generation air transportation system (NextGen; JPDO, 2010). The proposed NextGen environment will move air traffic controllers (ATCos) to the position of monitoring the airspace for potential automation failures rather than controlling aircraft in the manner performed currently. Prophetically, Thackray's motivation over 35 years ago was the concern that increased automation would increase monitoring and therefore the vigilance decrement.

Although others have looked at vigilance in simulated operational environments, there are few studies in which simulators with continuous movement have been used (e.g., Teo et al., 2012). For example, in today's ATC environment, aircraft are continuously updated (i.e., move continuously, rather than discretely). The studies in the current report use NextSim, an ATC simulator that captures aspects of ATC including accurate physics, continuous updating, and many anticipated NextGen features. Thus, the future may vary from situations in which the controller is only passively involved, to classic situations in which the controller directs the traffic, to a situation in which the controller may be asked to do both.

Because of the anticipated increase in air traffic, we designed the environment to contain a large enough number of aircraft to qualify the environment as having a high event rate (24 aircraft in the airspace at a time). Although this is not a high event rate in the classical sense (24 events per minute), having 24 aircraft in one controller's airspace at one time would be considered high in an ATC setting. Having 24 aircraft present at one time not only increases the event rate but decreases discriminability as well. See Figure 1 for an example of the airspace participants had to monitor. The task of detecting a failure in an aircraft's automation can also be considered on the successive versus simultaneous dimension. Although ATCos could compare an indicator of a failure (hollow aircraft indicator) with an indicator of a working system (filled aircraft indicator) on a different aircraft, they would not have an exact target to match, as a simultaneous task requires; thus, we consider the task to be successive. See Figure 2 for an example of a filled and a hollow aircraft indicator. Additionally, like the Mackworth study, the current study was sensory in nature (participants had to monitor a sector of airspace for a visual change in the appearance of the target, the filled or unfilled triangular aircraft indicator), keeping in line with the See et al. (1995) addition to the Parasuraman taxonomy. It can also be assumed that the environment will be temporally and spatially uncertain (ATCos will not know when or where an automation failure will occur), making it more susceptible to a vigilance decrement.

The current report examined how traffic density (number of aircraft in the airspace at one time; event rate), the complexity of the instructions for dealing with an automation failure (cognitive load), number of automation failures (signal rate), and the presence of a dual task affected performance. Given that the environment is already temporally and spatially uncertain and that the discrimination requires successive signal types, a high event rate, and a sensory discrimination, we would expect to find vigilance decrements when the event rate is high and that the size of the vigilance decrement would be greater when an additional cognitive load is placed on participants.

Experiment 1

In the current experiment, we sought to understand how traffic density (event rate) and the number of automation failures (signal rate) affected performance over a 50-min ATC scenario.

Method

Participants. Participants were 183 students (79 female) recruited through the Experimetrix system at the Georgia Institute of Technology. The Experimetrix system allows students to participate in psychology experiments for extra credit. Participants were between the ages of 18 and 26 (M = 20.21, SD = 1.669). All participants claimed normal or corrected-to-normal vision and hearing and to be native English speakers.

Figure 1. NextSim airspace (color is inverted and in grayscale). Airspace is black with green aircraft indicators, vector lines, and trails. Other objects and text are white. Airports are represented by the circular figures lettered W–Z. Gates are located at the edge of the display, along the jetways, and are numbered 1–4. Waypoints are located at the corners of the diamond-shaped area in the center of the screen and are lettered a–e. Flow Corridor A runs horizontally across the top of the airspace. Flow Corridor B runs diagonally across the airspace from the lower left to the upper right. Aircraft in the flow corridors appear as filled-in geometric shapes, while aircraft in classic airspace appear as outlined geometric shapes and are accompanied by a data block, vector line, and history trails.

Figure 2. The aircraft in the lower portion of the figure contains a filled aircraft indicator, while the aircraft in the upper portion of the figure contains a hollow aircraft indicator. Participants were required to detect when an aircraft indicator changed from filled to hollow, representing an automation failure.

Apparatus. The same apparatus was used for all experiments. The experiment was individually administered to participants in an enclosed room on an Alienware Area 51 ALX computer with a 30-in. Dell high-resolution (2560 × 1600) color monitor, an ATI Radeon HD 4870 X2 video card, noise-canceling headphones, and a next-generation air transportation system en route ATC simulator (NextSim) developed by Durso, Stearman, and Robertson (2015).

NextSim. NextSim is a next-generation en route ATC research simulator that enables data collection on concepts that will be used in the NextGen environment. NextSim contains features such as different types of aircraft, different levels of airspace, waypoints, gates, and airports. All aircraft are required to maintain a minimum separation of 5 miles laterally and 1,000 feet vertically from all other aircraft. The main task of the participant in the simulator was to ensure that aircraft maintained this minimum separation distance from one another. However, aircraft equipped with special automation ("Evade sensors" related to the NextGen anticipated automatic dependent surveillance broadcast [ADS-B] separation sensors) could detect other aircraft equipped with ADS-B and could automatically self-separate from such aircraft. ADS-B-equipped aircraft therefore did not require control actions from the ATCo. Aircraft equipped with the ADS-B evade sensors were denoted as types E5, E7, or E9 (E for equipped). Such aircraft appeared as filled geometric shapes and flew in a level of airspace known as flow corridors. Flow corridors were tubular volumes of airspace above 25,000 feet in which only aircraft equipped with ADS-B could travel.
Aircraft not equipped with ADS-B could not detect other aircraft in their vicinity and required control actions from the ATCo to maintain minimum separation from all other aircraft. Non-ADS-B-equipped aircraft were denoted as types U3, U5, or U7 (U for unequipped). Such aircraft appeared as hollow geometric shapes and flew in a level of airspace referred to as classic airspace. Classic airspace was the airspace between 10,000 and 25,000 feet. Refer to Figure 1 for a layout of the airspace. Each experiment in the current report uses aircraft equipped with ADS-B. However, no unequipped aircraft were present in Experiment 1; they will appear in Experiment 2 as mixed equipage, when we introduce equipped and unequipped aircraft in the same airspace.

Embedded within the airspace were features such as waypoints, gates, and airports. Waypoints in classic airspace appeared in a diamond-shaped pattern in the display, while waypoints for flow corridors were off the display. Waypoints were used to create routes that aircraft would follow until they reached their destination. Gates were entry and exit points for aircraft in classic airspace. An aircraft could also exit the airspace if it was over an airport, at an altitude of 10,000 feet and a speed of 100 nautical mph, at which point it was handed off to terminal radar approach control (TRACON). For a more detailed description of NextSim, see Durso et al. (2015).

All aircraft followed waypoints to traverse the airspace. Each aircraft had a data block next to it indicating the call sign, type of aircraft, current and assigned speed, current and assigned altitude, and route of the aircraft. All aircraft had the same minimum speed (100 nautical mph) and altitude (10,000 feet). Aircraft varied in terms of maximum speed (300–900 nautical mph) and altitude (25,000–50,000 feet), and whether they were equipped with automation that allowed them to self-separate. Every time minimum separation was violated, the aircraft in conflict flashed red.
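The separation standard above (5 miles laterally and 1,000 feet vertically) can be expressed as a simple predicate: a conflict exists only when both margins are lost at once. The sketch below is purely illustrative; the `Aircraft` class, its field names, and the distance helper are our own assumptions and are not part of NextSim.

```python
import math
from dataclasses import dataclass

# Hypothetical aircraft state; field names are illustrative, not NextSim's.
@dataclass
class Aircraft:
    x_nm: float    # lateral position, nautical miles
    y_nm: float
    alt_ft: float  # altitude, feet

def lateral_nm(a: Aircraft, b: Aircraft) -> float:
    """Straight-line lateral distance between two aircraft."""
    return math.hypot(a.x_nm - b.x_nm, a.y_nm - b.y_nm)

def in_conflict(a: Aircraft, b: Aircraft) -> bool:
    """Minimum separation is violated only when BOTH the lateral
    (5 nm) and vertical (1,000 ft) standards are broken."""
    return lateral_nm(a, b) < 5.0 and abs(a.alt_ft - b.alt_ft) < 1000.0
```

Note the conjunction: two aircraft 3 nm apart but 2,000 feet apart vertically are still safely separated, which is why altitude changes alone could resolve a conflict.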
It was possible for ADS-B automation to fail, and consequently for aircraft equipped with ADS-B to violate minimum separation from other aircraft. Aircraft with nonfunctioning ADS-B equipment appeared as hollow geometric shapes (see Figure 2). In the current report, the hollow geometric shape represented the signal that participants needed to detect. Conflicting aircraft flashed red until the conflict was resolved. Conflicts between aircraft could be resolved by making changes in speed, altitude, or heading. Only aircraft with nonfunctioning ADS-B could come into conflict with other aircraft. The conflict itself was not the signal to be detected for assessment.

Following any scenario in NextSim, operators were given feedback about their performance via a performance summary at the end of each scenario. The performance summary provided participants with information about the percentage of successful landings, number of flights crashed, percentage of completed routes, mean time to accept aircraft, mean route leg time (mean time from one waypoint to another waypoint), and total time in conflict (time during which minimum separation standards were not maintained). The performance summary was intended only to provide feedback on participants' performance in controlling aircraft in classic airspace and not as training for the monitoring task.

Design. This experiment used a 3 × 2 (failure rate × traffic density) between-subjects design. Failure rate was represented by the number of automation failures that occurred during a scenario. Failure rate was high (10 automation failures), moderate (4 automation failures), or low (1 automation failure). Traffic density was the number of aircraft present during a scenario. Traffic density was high (24 aircraft in the airspace at a time; 200 aircraft total) or moderate (12 aircraft in the airspace at a time; 80 aircraft total). In the current experiment, no aircraft were present in classic airspace. In other words, only aircraft that needed to be monitored for automation failures were present in the scenario. The combination of failure rate and traffic density resulted in varying signal rates.
For example, the high failure rate/high traffic density condition resulted in a signal rate of .05, while the high failure rate/moderate traffic density condition resulted in a signal rate of .125. See Table 1 for the signal rates for each of the 3 × 2 (failure rate × traffic density) conditions. The dependent measure was whether the participant detected the automation failure within 60 seconds. Participants were instructed to send the aircraft with an automation failure out of the airspace in the quickest and safest manner possible. Detection times of automation failures were recorded for each participant.

Table 1
Signal Rates for Each Failure Rate by Traffic Density Condition

                        Failure rate
Traffic density    High     Moderate   Low
High               .05      .02        .005
Moderate           .125     .05        .0125

Scenario design. Parent scenarios were designed, one for each traffic density. In the parent scenarios, 10 aircraft were randomly selected to have automation failures. One failure occurred during each 5-min period of the scenario, such that the first automation failure occurred within the first 5 min of the scenario, the second automation failure occurred between 5 and 10 min into the scenario, and so on. The same 10 aircraft had failures and were present in both high and moderate traffic density scenarios. Additional scenarios were derived from the parents by selecting which of the 10 aircraft would experience the automation failure. For the four-failure condition, only the first, fourth, seventh, and 10th automation failures from the parent occurred. In the one-failure condition, only the 10th automation failure from the parent occurred. Thus, in the one-failure condition, the failure always occurred during the last 5 min of the scenario.

Procedure. This experiment lasted approximately 1 hour. Participants were randomly assigned to one of the 3 × 2 (failure rate × traffic density) conditions. When each participant first arrived, the participant read and signed an informed consent form. After signing the informed consent form, each participant was given an instruction sheet explaining the task. The instruction sheet provided pictures of the airspace and aircraft and explained that an aircraft changing from a filled-in geometric shape to a hollow geometric shape represented an automation failure. The instruction sheet then detailed the steps to be taken after an automation failure occurred. After reading the instruction sheet, the participant was given a quiz to ensure the participant understood the instructions. If the participant missed any questions on the quiz, the experimenter explained the correct answer to the participant. Each participant then completed a 5-min practice scenario in which one aircraft was present. The aircraft in the practice scenario had an automation failure 3 min into the scenario. The participant was required to right-click on the aircraft in order to gain control of it. The participant then removed the aircraft from the airspace in the quickest and safest manner possible. After completing the practice scenario, the participant completed a 50-min scenario for the condition to which he or she was assigned.
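The signal rates in Table 1 are simply the number of automation failures divided by the total number of aircraft in the scenario. A minimal sketch of that arithmetic follows; the dictionary keys and function name are our own labels, not the authors':

```python
# Signal rate = number of failures / total aircraft in the scenario.
# Totals come from the Design section: high density = 200 aircraft
# total, moderate density = 80 aircraft total, over the 50-min scenario.
TOTAL_AIRCRAFT = {"high": 200, "moderate": 80}
FAILURES = {"high": 10, "moderate": 4, "low": 1}

def signal_rate(failure_rate: str, traffic_density: str) -> float:
    return FAILURES[failure_rate] / TOTAL_AIRCRAFT[traffic_density]

# Reproduces Table 1, e.g.:
# signal_rate("high", "high")     -> 0.05
# signal_rate("high", "moderate") -> 0.125
```

This makes explicit why the same failure count yields a higher signal rate under moderate traffic: the denominator (events) is smaller.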

Results and Discussion

Two dependent measures were used for the analyses: detection time and failure detection. For detection time, the time it took the participant to respond to the failure was used. Nonresponses were ignored for detection time. For failure detection, if the participant detected the failure within 60 seconds, the participant was given credit for the failure detection. If the participant took longer than 60 seconds to detect the failure, or did not detect the failure at all, the participant was not given credit for the failure detection.

In order to examine the presence of vigilance decrements, results were analyzed using a 2 × 2 × 2 (block × failure rate × traffic density) ANOVA. Block was a within-subjects factor (early or late). The block variable was used to compare participants' mean failure detection and detection time during the first half of the session (early) with the second half of the session (late). Of course, because the one-failure condition did not have any failures in the first half of the session, it was excluded from this analysis. This analysis did not provide any evidence for a vigilance decrement: there was never a significant difference between the early and late blocks. Thus, we decided to take another approach to give vigilance decrements a chance to be detected; we analyzed the slopes. This slope analysis was more sensitive, and thus we report the details for it alone.

Performance slopes were calculated by generating a slope for each participant for each scenario based on the detection times and failure detections across the scenario. A mean slope was then calculated for each condition. These slopes were tested, using a t test, to see if they differed significantly from zero. A vigilance decrement would be indicated by either a negative slope for the proportion of failures detected or a positive slope for detection time. Because our primary concern was finding slopes suggesting a vigilance decrement, we also discuss marginally significant results, as these would be significant for a one-tailed t test.

No vigilance decrements were revealed for any of the traffic density by failure rate conditions (see Tables 2 and 3). Liberalizing the alpha level to include marginal effects produced only one marginally significant slope for detection time in the high failure rate/high traffic density condition, t(30) = −1.72, p < .095, and one marginally significant slope for failure detection in the moderate failure rate/moderate traffic density condition, t(29) = 1.98, p < .058. The slope for detection time in the high failure rate/high traffic density condition was positive, indicating a decrease in performance with time on task, while the slope for failure detection in the moderate failure rate/moderate traffic density condition was positive, indicating an increase in performance with time on task. Of course, no slope could be calculated for the one-failure condition; however, visual inspection shows that the failure was detected 97% of the time (M = .97, SD = .04) and in about 10 seconds (M = 10.11, SD = 1.79) under high traffic, and 90% of the time (M = .90, SD = .04) and in about 13 seconds (M = 12.89, SD = 2.61) under moderate traffic, even after participants had been watching the screen for 45 min without incident. Compared to similar performance at the beginning of the scenario in the other conditions, it is difficult to argue that failure detection or detection time declined in the single-failure condition.
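The slope analysis described above has two steps: fit a least-squares slope over each participant's detection times (or detection scores) across the scenario, then test the collection of slopes against zero with a one-sample t test. The sketch below is a generic illustration of those two steps under our own assumptions about data layout, not the authors' analysis code.

```python
import math

def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs
    (e.g., detection time regressed on failure number)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def one_sample_t(slopes):
    """One-sample t statistic testing whether the mean slope across
    participants differs from zero (df = n - 1)."""
    n = len(slopes)
    mean = sum(slopes) / n
    var = sum((s - mean) ** 2 for s in slopes) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

Under this scheme, a reliably positive mean slope for detection time (times growing across the vigil), or a reliably negative one for failure detection, is the signature of a vigilance decrement.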
In brief, only when a high traffic density (high event rate) was present in the current experiment was a vigilance decrement found, and even then only with a liberalized alpha level. Moreover, in the high event rate condition with the rarest failure rate (1 failure), over 90% of participants detected the failure in under 13 seconds, despite the failure occurring after a 45-min vigil. This finding differs from previous research by Molloy and Parasuraman (1996), who found that performance was significantly lower when a single automation failure occurred during the last 10 min, compared with the first 10 min, of a 30-min vigilance session. Although we expected variation in performance across conditions, we were surprised that no vigilance decrement proved reliable and that overall performance was notably high. However, we were also aware of recent work suggesting that a vigilance decrement could materialize if the controller was given a complex

mental task to perform (Pop, Stearman, Kazi, & Durso, 2012). In the Pop et al. study, the number of aircraft built up over the course of the experiment, so their finding of a decrement could have been due to a change in display density. In Experiment 2 (as throughout this report), we examined the role of complexity while holding display density constant throughout each scenario.

Experiment 2

In the current experiment, we sought to understand how the complexity of instructions (cognitive load) affected performance over a 50-min ATC scenario. We also sought to introduce the idea of mixed equipage in the current experiment. Mixed equipage refers to the anticipated near-term transitional period in the National Airspace System (NAS) under NextGen during which some aircraft, because of their newer technologies, will only be monitored by controllers, while other aircraft with current technologies will still require controllers to actively participate in their flight. In this experiment, we began to investigate this issue by simply adding these aircraft into the airspace to see if they would serve as distractors for the monitoring task and possibly contribute to producing a vigilance decrement. Participants were not required to interact with these aircraft in any way.

Method

Participants. Participants were 34 students recruited using the Experimetrix system at the Georgia Institute of Technology. Participants were given course credit for participating in the experiment. Due to a procedural error, no demographic information was obtained from participants.

Design. This was a between-subjects experiment examining the difference between simple and complex instructions for the detection of automation failures in an ATC environment. In the simple instructions condition, participants were required (as in Experiment 1) to right-click on an aircraft that had experienced an automation failure and then send the aircraft out of the airspace in the quickest and safest manner possible. In the complex instructions condition, participants were required to differentiate the two flow corridors. If the automation failure occurred in Flow Corridor A, participants were instructed first to lower the altitude of the aircraft to 10,000 feet and then to reroute the aircraft to Airport X or Airport Y. If the automation failure occurred in Flow

Table 2
Test Statistics for Each of the Failure Rate by Traffic Density Conditions for Detection Time

Traffic density   Failure rate   Slope M (SD)   Cohen's d   Detection time M (SD)   t       df   p
Moderate          Low            --             --          12.89 (13.58)           --      --   --
Moderate          Moderate       -.01 (1.53)    -.01        12.98 (8.41)            -.05    28   .964
Moderate          High           .08 (.84)      .09         11.12 (6.81)            .52     30   .606
High              Low            --             --          10.11 (9.48)            --      --   --
High              Moderate       .06 (.39)      .16         24.22 (28.12)           .85     29   .402
High              High           -.13 (.41)     -.31        22.56 (23.30)           -1.72   30   .095+

Note. Positive slopes for detection time indicate a vigilance decrement. + indicates slope was significant at p < .10. No slopes were significant at p < .05. No slope could be calculated for the single-failure (low failure rate) conditions.


Table 3
Test Statistics for Each of the Failure Rate by Traffic Density Conditions for Failure Detection

Traffic density   Failure rate   Slope M (SD)   Cohen's d   Failure detection M (SD)   t       df   p
Moderate          Low            --             --          .90 (.31)                  --      --   --
Moderate          Moderate       .002 (.006)    .36         .95 (.14)                  1.98    29   .058+
Moderate          High           .000 (.006)    -.07        .93 (.13)                  -.40    30   .695
High              Low            --             --          .97 (.19)                  --      --   --
High              Moderate       -.001 (.013)   -.04        .84 (.27)                  -.22    30   .827
High              High           -.002 (.007)   -.26        .88 (.18)                  -1.47   30   .152

Note. Negative slopes for failure detection indicate a vigilance decrement. + indicates slope was significant at p < .10. No slopes were significant at p < .05. No slope could be calculated for the single-failure (low failure rate) conditions.

Corridor B, participants were instructed to reroute the aircraft to Gate 1 or Gate 3 (no change in altitude needed to be made). Detection time of automation failures was recorded for each participant.

Scenario design. Only one scenario was created for this experiment. The scenario used a high traffic density. Aircraft were always present in classic airspace (approximately 36 aircraft in the airspace; 24 aircraft in the flow corridors and 12 aircraft in classic airspace). However, participants were told to ignore the aircraft in classic airspace and were told that these aircraft would never have an automation failure. The scenario also included 10 automation failures that occurred at random points in time in the same manner discussed in Experiment 1. These 10 automation failures were equivalent to a signal rate of .05, as 200 aircraft were present in the flow corridors during the 50-min scenario.

Procedure. This experiment lasted approximately 1.5 hours. Participants first read a PowerPoint presentation explaining the task. Participants then completed a 10-min practice scenario. Before beginning the experimental scenario, participants completed a quiz to ensure they understood the instructions they were given. If a participant missed an answer on the quiz, they were told the correct answer. Participants were also provided a reminder sheet that explained the instructions in case they needed it for reference during the scenario. Participants then completed a 50-min scenario during which data were collected.
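As a concrete check on the signal-rate arithmetic above: the signal rate is simply the number of automation failures divided by the number of monitored aircraft. The sketch below is ours, not part of the experimental software.

```python
# Signal rate = automation failures / monitored (flow-corridor) aircraft.
# The counts below come from the scenario description in the text.
def signal_rate(n_failures, n_monitored):
    return n_failures / n_monitored

# Experiment 2: 10 failures among the 200 aircraft that passed through
# the flow corridors during the 50-min scenario.
print(signal_rate(10, 200))  # 0.05
```

The same ratio is held constant in the later experiments by scaling the number of failures with the number of monitored aircraft.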

Results and Discussion

The same dependent measures were analyzed in the same way as in Experiment 1. The ANOVA using the block variable did not reveal any evidence of a vigilance decrement, and the analysis of performance slopes revealed no significantly positive slopes for detection time and no significantly negative slopes for failure detection (see Tables 4 and 5). However, liberalizing the alpha level

revealed a marginally significant negative slope for the complex instruction condition using failure detections (t(16) = -1.97; p < .066), suggesting a marginal vigilance decrement. Although the complex instruction condition suggested a decrement, the effect did not reach conventional levels of significance. This contrasts with Pop et al. (2012), who found rather dramatic decrements and much poorer overall performance. Here, however, as in Experiment 1, overall performance was quite good, and there was no reliable indication that performance became poorer as time into the vigil increased. Although we did not find any significant decreases in performance for either condition, the slight decrease in performance in the complex instruction condition, together with the Pop et al. results, suggests that the cognitive load placed on participants in the complex instruction condition may have caused a decrease in performance. In Experiment 3, we investigate load manipulations more thoroughly, looking at an instruction manipulation of cognitive load as well as a load manipulation rooted in the secondary-task literature.

Experiment 3

In the current experiment, we sought to understand how cognitive load and the presence of a dual task affected performance over eight 50-min ATC scenarios. It is unclear whether a dual task (i.e., controlling some traffic) would affect the vigilance decrement on a primary task (i.e., monitoring other traffic for automation failures). Previous research (Baker, 1961; McBride et al., 2007) suggests that a meaningful secondary task can aid performance of a vigilance task. However, if the secondary task is viewed as an additional cognitive load, other research (Parasuraman, 1986) suggests that it could increase the likelihood of a vigilance decrement occurring.

Table 4
Test Statistics for Each of the Instruction Conditions for Detection Time

Instruction   Slope M (SD)   Cohen's d   Detection time M (SD)   t      df   p
Complex       .17 (.45)      .37         26.12 (24.87)           1.51   16   .152
Simple        .16 (.80)      .20         19.69 (13.94)           .80    15   .438

Note. Positive slopes for detection time indicate a vigilance decrement. + indicates slope was significant at p < .10. No slopes were significant at p < .05.


Table 5
Test Statistics for Each of the Instruction Conditions for Failure Detection

Instruction   Slope M (SD)    Cohen's d   Failure detection M (SD)   t       df   p
Complex       -.003 (.006)    -.48        .86 (.24)                  -1.97   16   .066+
Simple        .001 (.004)     .13         .85 (.31)                  .58     16   .573

Note. Negative slopes for failure detection indicate a vigilance decrement. + indicates slope was significant at p < .10. No slopes were significant at p < .05.




Method

Participants. Participants were 60 students recruited from the Georgia Institute of Technology. Participants were between the ages of 18 and 28 (M = 20.71; SD = 2.11). All participants reported normal or corrected-to-normal vision and hearing and reported being native English speakers. All participants were paid $8 per hour for their participation, and a 10% bonus was given to participants who completed the entire experiment. The total payment to any participant did not exceed $110.

Design. This experiment used a 2 × 4 (dual task × additional cognitive load) within-subjects design. Dual task was present or absent. When the dual task was present, participants were responsible for controlling aircraft in classic airspace while monitoring the airspace for automation failures. Controlling aircraft in classic airspace entailed maintaining minimum separation between non-ADS-B-equipped aircraft as well as preparing aircraft for handoffs to either airports or other sectors of airspace. When the dual task was absent, participants were responsible only for monitoring aircraft in the two flow corridors for automation failures; aircraft in classic airspace did not need to be prepared for handoffs, nor did conflicts occur among aircraft in classic airspace. In both dual task present and dual task absent conditions, participants monitored for automation failures in two flow corridors (Flow Corridor A and Flow Corridor B). Additional cognitive load was either absent, minimal, moderate, or high. When additional cognitive load was absent, participants were not given any additional instructions. When additional cognitive load was minimal, participants were responsible for accepting all aircraft into the airspace by left-clicking on aircraft as they entered the airspace. When additional cognitive load was moderate, participants were required to count all square-shaped aircraft.
When additional cognitive load was high, participants were required to count all square-shaped aircraft heading east or northeast and all diamond-shaped aircraft heading west or southwest. In both the moderate and high additional cognitive load conditions, the number of aircraft the participants were required to count was equal. Participants completed four scenarios each week; all scenarios within a week were either with the dual task present or absent. The order of dual task present or absent conditions was counterbalanced across participants. One scenario each week was presented for each of the cognitive load conditions. The order of cognitive load conditions was counterbalanced within each week across participants. Detection times of automation failures and failure detections were recorded for each participant.

Scenario design. Only high traffic densities (average of 36 aircraft in the airspace; 24 aircraft in the flow corridors and 12 aircraft in classic airspace) with 10 automation failures (a signal rate of .05) were used (aircraft in classic airspace were not counted toward the signal rate). Participants were instructed to send the aircraft with an automation failure out of the airspace in the quickest and safest manner possible. One parent scenario was used to create the seven variants needed for the eight scenarios used in this experiment. The seven variants were created by randomly selecting 10 aircraft to have automation failures for each scenario.

Procedure. This experiment lasted 3 weeks, with each week comprising four 1-hour sessions. During the first week, participants completed a battery of cognitive tests in the first and second sessions (the cognitive tests will not be discussed in the current report), were trained to control aircraft during the third session, and were trained to detect and handle automation failures during the fourth session. During the following 2 weeks, participants completed four 1-hour scenarios each week.

Results and Discussion

The same dependent measures were analyzed in the same way as in the previous two experiments. Again, no vigilance decrements were revealed for any of the dual task by additional cognitive load conditions (see Tables 6 and 7). However, there were significant improvements in performance as the vigil continued. Analysis of performance slopes revealed marginally significant improvement for detection time in the dual task present/additional cognitive load moderate condition (t(47) = -1.95; p < .057), significant improvement for failure detection in the dual task absent/additional cognitive load minimal condition (t(47) = 2.71; p < .009), and marginally significant improvement for failure detection in the dual task present/additional cognitive load absent condition (t(47) = 1.70; p < .096). Again, performance never decreased with time on task in Experiment 3. The lack of any vigilance decrement suggests that the cognitive load manipulations used in the current experiment, both the additional cognitive load and the presence of a dual task, had less of an effect than the complex instructions used in Experiment 2. Thus, in the next experiment, we use the complex instructions from Experiment 2 and further examine the effects of traffic density and the presence of a dual task with a small group of participants who agreed to participate for 6 weeks.


Table 6
Test Statistics for Each of the Dual Task by Additional Cognitive Load Conditions for Detection Time

Dual task   Additional cognitive load   Slope M (SD)   Cohen's d   Detection time M (SD)   t       df   p
Present     Absent                      -.34 (1.80)    -.19        27.76 (29.93)           -1.29   47   .203
Present     Minimal                     .21 (1.07)     .20         29.07 (27.60)           1.29    42   .204
Present     Moderate                    -.63 (2.22)    -.28        39.61 (40.44)           -1.95   47   .057+
Present     High                        -.15 (.82)     -.18        30.40 (24.22)           -1.23   47   .224
Absent      Absent                      .16 (.85)      .19         24.50 (26.37)           1.34    47   .187
Absent      Minimal                     -.32 (1.34)    -.24        27.69 (26.89)           -1.63   45   .11
Absent      Moderate                    -.18 (.92)     -.20        29.53 (29.76)           -1.38   47   .174
Absent      High                        .07 (.76)      .09         29.33 (23.02)           .60     47   .549

Note. Positive slopes for detection time indicate a vigilance decrement. + indicates slope was significant at p < .10. ** indicates slope was significant at p < .05.



Experiment 4

Another concern about generalizing from the laboratory to the field in vigilance work has been the difference in experience between typical laboratory participants and professional air traffic controllers (ATCos). In Experiment 4 we take a step toward exploring this issue by training a small group of students for 6 weeks. In addition, because there are no professional controllers who are experienced in controlling traffic in a NextGen environment (given that the environment does not yet exist), a reasonable way to understand the human factors issues that may arise with skilled operators in that future environment is to train individuals in a NextGen environment.

Method

Participants. Participants were 18 students recruited from the Georgia Institute of Technology through an advertisement in the school newspaper and flyers placed around the campus. Participants were between the ages of 19 and 27 (M = 21.17; SD = 1.89). All participants reported normal or corrected-to-normal vision and hearing and reported being native English speakers. All participants were paid $8 per hour for their participation, and a 10% bonus was given to participants who completed the entire experiment. The total payment to any participant did not exceed $352.

Design. The experiment used a 3 × 2 × 6 (traffic density × dual task × session) mixed-model design. Traffic density was a between-subjects factor. Traffic density was high (average of 36

aircraft in the airspace at a time), moderate (average of 24 aircraft in the airspace at a time), or low (average of 18 aircraft in the airspace at a time). As in Experiment 3, for each traffic density condition, 12 of the aircraft present in the airspace at a time were in classic airspace; the remaining aircraft were in the flow corridors and thus needed to be monitored for automation failures. Dual task (present or not) was a within-subjects factor and was identical to the dual task used in Experiment 3: When the dual task was present, participants were responsible for controlling aircraft in classic airspace while monitoring the airspace for automation failures; otherwise, they were responsible only for monitoring the airspace for automation failures. Participants monitored for automation failures in two flow corridors (Flow Corridor A and Flow Corridor B). Detection times of automation failures were recorded for each participant.

Scenario design. One parent scenario was used for this experiment. The parent scenario was taken from Experiment 2. Variants of the parent scenario were created to allow for six high traffic scenarios, six moderate traffic scenarios, and six low traffic scenarios. The six scenarios for each traffic density condition were counterbalanced across participants. In order to keep a consistent signal rate, the number of automation failures was adjusted for each traffic density condition. A signal rate of .05 was always used: In the high traffic density condition, 10 failures occurred during each scenario; in the moderate traffic density condition, four failures occurred during each scenario; and in the low traffic density condition, two failures occurred during each scenario. The

Table 7
Test Statistics for Each of the Dual Task by Additional Cognitive Load Conditions for Failure Detection

Dual task   Additional cognitive load   Slope M (SD)   Cohen's d   Failure detection M (SD)   t      df   p
Present     Absent                      .002 (.006)    .24         .88 (.18)                  1.70   47   .096+
Present     Minimal                     .001 (.008)    .10         .78 (.31)                  .70    47   .489
Present     Moderate                    .002 (.009)    .20         .81 (.21)                  1.41   47   .166
Present     High                        .001 (.007)    .10         .88 (.16)                  .71    47   .481
Absent      Absent                      -.001 (.005)   -.13        .90 (.15)                  -.90   47   .374
Absent      Minimal                     .003 (.009)    .39         .81 (.24)                  2.71   47   .009**
Absent      Moderate                    .000 (.006)    .04         .88 (.17)                  .25    47   .805
Absent      High                        .000 (.007)    .00         .84 (.20)                  .00    47   1

Note. Negative slopes for failure detection indicate a vigilance decrement. + indicates slope was significant at p < .10. ** indicates slope was significant at p < .05.


Table 8
Experiment 4 Timeline

            Session 1            Session 2            Session 3
Week 1      Cognitive tests      Practice scenarios   Vigilance scenarios
Weeks 2-6   Training scenarios   Training scenarios   Vigilance scenarios

Note. Other than Session 1 of Week 1, participants completed two 50-minute scenarios during each session.

method for adjusting the number of failures was similar to that of Experiment 1, such that the first, fourth, seventh, and 10th automation failures from the high traffic density parent scenario were used for the moderate traffic density scenarios, and the first and 10th automation failures were used for the low traffic density condition.

Procedure. Because we could train only a limited number of people, and we wanted to maximize the power of the comparison, participants completed two scenarios during each session. Thus, participants performed tasks for 2 hours per session over three sessions each week for 6 weeks. At the beginning of the experiment, participants were randomly assigned to one of the three traffic densities. On the first day of the experiment, consent was obtained, participants completed a demographic questionnaire, and participants completed a battery of cognitive tests. The cognitive tests will not be discussed in the current report. Each participant was also trained on the dual task. On the second day of the experiment, instructional PowerPoint slides were used to acquaint the participants with procedures to accomplish tasks in NextSim, such as preparing aircraft for handoffs to airports and other sectors of

Table 9
Test Statistics for Each Week by Dual Task by Traffic Density Conditions for Detection Time

Week   Dual task   Traffic density   Slope M (SD)   Cohen's d   Detection time M (SD)   t       df   p
1      Present     Low               1.32 (1.95)    .68         95.00 (93.83)           1.35    3    .269
1      Present     Moderate          -.48 (2.67)    -.18        96.51 (106.67)          -.44    5    .68
1      Present     High              -2.36 (5.84)   -.40        167.28 (140.01)         -.70    2    .557
1      Absent      Low               -.88 (2.67)    -.33        30.67 (30.02)           -.81    5    .456
1      Absent      Moderate          -.04 (.78)     -.05        15.76 (14.73)           -.12    5    .911
1      Absent      High              1.08 (.78)     1.39        136.99 (153.85)         2.77    3    .069+
2      Present     Low               .63 (2.15)     .29         49.50 (22.51)           .66     4    .547
2      Present     Moderate          -1.64 (2.64)   -.62        49.81 (35.68)           -1.53   5    .187
2      Present     High              -.44 (7.09)    -.06        251.00 (308.74)         -1.37   4    .897
2      Absent      Low               .46 (2.78)     .16         49.90 (36.22)           .37     4    .732
2      Absent      Moderate          -.06 (.79)     -.08        23.65 (25.06)           -.19    5    .857
2      Absent      High              1.70 (1.21)    1.41        104.40 (113.88)         3.45    5    .018*
3      Present     Low               .79 (.82)      .96         46.42 (34.25)           1.66    2    .239
3      Present     Moderate          .39 (1.09)     .35         43.04 (32.31)           .87     5    .425
3      Present     High              .95 (6.46)     .15         145.20 (100.53)         .36     5    .732
3      Absent      Low               -.79 (1.33)    -.59        30.67 (16.51)           -1.45   5    .206
3      Absent      Moderate          -.63 (1.55)    -.41        23.54 (18.44)           -1.00   5    .364
3      Absent      High              .35 (2.84)     .12         115.34 (117.83)         .31     5    .772
4      Present     Low               -2.37 (3.70)   -.64        75.08 (40.77)           -1.28   3    .29
4      Present     Moderate          -.74 (1.90)    -.39        41.36 (33.95)           -.95    5    .387
4      Present     High              1.56 (1.25)    1.25        123.97 (75.37)          3.06    5    .028*
4      Absent      Low               -.14 (2.24)    -.06        33.75 (28.06)           -.15    5    .884
4      Absent      Moderate          .20 (.89)      .23         18.43 (18.19)           .55     5    .604
4      Absent      High              -.23 (2.25)    -.10        78.00 (72.41)           -.25    5    .813
5      Present     Low               .46 (7.14)     .06         77.67 (65.03)           .13     3    .906
5      Present     Moderate          .30 (.72)      .42         29.26 (18.87)           1.04    5    .348
5      Present     High              .81 (.63)      1.20        96.07 (68.68)           1.61    4    .183
5      Absent      Low               -.81 (4.31)    -.19        52.00 (44.76)           -.46    5    .663
5      Absent      Moderate          .11 (.85)      .13         30.22 (26.54)           .32     5    .76
5      Absent      High              -.87 (5.93)    -.15        105.50 (135.27)         -.36    5    .735
6      Present     Low               -2.24 (1.04)   -2.16       67.67 (68.26)           -4.33   3    .023*
6      Present     Moderate          .50 (1.52)     .33         32.57 (28.52)           .81     5    .455
6      Present     High              .56 (.97)      .58         138.24 (160.80)         1.31    4    .262
6      Absent      Low               -.98 (1.85)    -.53        31.42 (31.45)           -1.30   5    .251
6      Absent      Moderate          -.91 (1.60)    -.57        29.06 (30.07)           -1.39   5    .223
6      Absent      High              .90 (2.27)     .39         97.91 (97.51)           .88     4    .427

Note. Positive slopes for detection time indicate a vigilance decrement. + indicates slope was significant at p < .10. * indicates slope was significant at p < .05. ** indicates slope was significant at p < .01.


Table 10
Test Statistics for Each Week by Dual Task by Traffic Density Conditions for Failure Detection

Week   Dual task   Traffic density   Slope M (SD)   Cohen's d   Failure detection M (SD)   t       df   p
1      Present     Low               -.007 (.016)   -.41        .58 (.49)                  -1.00   5    .363
1      Present     Moderate          -.007 (.017)   -.39        .54 (.29)                  -.96    5    .383
1      Present     High              -.002 (.009)   -.23        .18 (.33)                  -.58    5    .59
1      Absent      Low               .007 (.016)    .41         .92 (.20)                  1.00    5    .363
1      Absent      Moderate          .004 (.010)    .41         .96 (.10)                  1.00    5    .363
1      Absent      High              -.004 (.004)   -.89        .45 (.49)                  -2.19   5    .08+
2      Present     Low               -.007 (.030)   -.22        .58 (.38)                  -.54    5    .611
2      Present     Moderate          .011 (.013)    .82         .75 (.27)                  2.00    5    .102
2      Present     High              .005 (.019)    .26         .33 (.29)                  .64     5    .553
2      Absent      Low               .000 (.025)    .00         .50 (.45)                  .00     5    1
2      Absent      Moderate          .004 (.011)    .36         .88 (.14)                  .89     5    .415
2      Absent      High              -.007 (.007)   -1.00       .58 (.35)                  -2.46   5    .057+
3      Present     Low               -.013 (.021)   -.65        .50 (.45)                  -1.58   5    .175
3      Present     Moderate          -.005 (.011)   -.49        .75 (.16)                  -1.20   5    .286
3      Present     High              -.001 (.009)   -.16        .32 (.28)                  -.39    5    .714
3      Absent      Low               .013 (.021)    .65         .83 (.26)                  1.58    5    .175
3      Absent      Moderate          .004 (.010)    .41         .96 (.10)                  1.00    5    .363
3      Absent      High              -.005 (.007)   -.75        .52 (.39)                  -1.83   5    .127
4      Present     Low               .013 (.021)    .65         .33 (.41)                  1.58    5    .175
4      Present     Moderate          .005 (.015)    .36         .75 (.16)                  .88     5    .421
4      Present     High              -.006 (.009)   -.70        .33 (.27)                  -1.71   5    .148
4      Absent      Low               .007 (.030)    .22         .75 (.27)                  .54     5    .611
4      Absent      Moderate          -.015 (.014)   -1.06       .79 (.19)                  -2.61   5    .048*
4      Absent      High              .001 (.012)    .05         .58 (.35)                  .13     5    .903
5      Present     Low               -.033 (.016)   -2.04       .42 (.20)                  -5.00   5    .004**
5      Present     Moderate          -.003 (.007)   -.41        .83 (.26)                  -1.00   5    .363
5      Present     High              -.001 (.009)   -.14        .33 (.30)                  -.35    5    .741
5      Absent      Low               .007 (.030)    .22         .75 (.27)                  .54     5    .611
5      Absent      Moderate          -.009 (.013)   -.73        .79 (.25)                  -1.78   5    .135
5      Absent      High              -.009 (.009)   -.99        .60 (.38)                  -2.43   5    .06+
6      Present     Low               .013 (.033)    .41         .50 (.32)                  1.00    5    .363
6      Present     Moderate          -.004 (.008)   -.48        .79 (.19)                  -1.17   5    .296
6      Present     High              -.001 (.007)   -.11        .40 (.32)                  -.28    5    .793
6      Absent      Low               .007 (.016)    .41         .92 (.20)                  1.00    5    .363
6      Absent      Moderate          .009 (.013)    .73         .79 (.25)                  1.78    5    .135
6      Absent      High              -.006 (.008)   -.79        .60 (.37)                  -1.94   5    .111

Note. Negative slopes for failure detection indicate a vigilance decrement. + indicates slope was significant at p < .10. * indicates slope was significant at p < .05. ** indicates slope was significant at p < .01.



airspace. Participants were also acquainted with procedures to maintain minimum separation standards between aircraft, including changing the speed and altitude of aircraft, and were instructed on how to monitor the flow corridors for automation failures. Following the instructions, participants completed a practice session of the learned procedures. On the third day of the experiment, participants completed two 1-hour vigilance scenarios. In the subsequent 5 weeks of the experiment, participants completed two training sessions (four scenarios) each week, during which they controlled aircraft in classic airspace.


Unlike the vigilance sessions, training sessions comprised only aircraft that the participants had to control and did not include any aircraft that needed to be monitored for automation failures. Participants then completed two 1-hour vigilance scenarios on the third day of each week; thus, each participant completed one vigilance session per week for 6 weeks. During the vigilance session, each participant completed one scenario with the dual task present and one scenario with the dual task absent. Each scenario was 50 min long, and participants were given a 10-min break between the two scenarios in the same session. The order of the scenarios was

Table 11
Experiment Conditions Contributing to Meta-Analysis

Experiment   Total # of conditions   Dual task        Traffic density       Instruction complexity
1            4                       Absent           High/Moderate         Simple
2            2                       Absent           High                  Simple/Complex
3            8                       Absent/Present   High                  Simple
4            36                      Absent/Present   High/Moderate/Low     Complex


Figure 3. (a) Meta forest plot of the performance slopes using detection time for both dual task conditions (Dual Task Absent, Dual Task Present). The gray dots represent conditions from the four experiments and the black dots represent the meta-analysis data points. A positive slope with a CI that does not capture zero indicates a performance decrease with time on task. (b) Meta forest plot of the performance slopes using failure detection for both dual task conditions. The gray dots represent conditions from the four experiments and the black dots represent the meta-analysis data points. The CIs from each point capture zero. Although the CIs from the meta-analysis data points are too small to be seen on this scale, both of these capture zero. [Figure panels not reproduced; both panels plot slope on the vertical axis.]

counterbalanced across sessions. Participants were given instructions identical to the complex instructions used in Experiment 2 for dealing with an aircraft with an automation failure. See Table 8 for a timeline of experimental sessions.

Results and Discussion

The same dependent measures were analyzed in the same way as in the previous three experiments. An ANOVA using block as a variable did not reveal any evidence for a vigilance decrement. However, analysis of performance slopes using detection time revealed three significant slopes and one marginally significant slope (see Table 9). Two of the significant slopes indicated a vigilance decrement and one indicated a vigilance improvement. Both decrements were in high traffic, once with a dual task (Week 4: t(5) = 2.93; p < .033) and once without (Week 2: t(5) = 3.45; p < .018). The vigilance improvement was in low traffic with a dual task (Week 6: t(3) = -4.33; p < .023). For failure detection, there were two significant slopes indicating a decrement (see Table

10). Neither was in high traffic: one occurred with a dual task (Week 5: t(5) = -5.00; p < .004) and one without (Week 4: t(5) = -2.61; p < .048). Again, expanding our alpha level to p = .10 produced additional indicators of performance changes over time (see Tables 9 and 10). All of the marginally significant slopes were in the high traffic density, dual task absent condition using failure detection (Week 1: t(5) = -2.19, p < .080; Week 2: t(5) = -2.46, p < .057; Week 5: t(5) = -2.43, p < .060). All of these marginally significant slopes in the current experiment were in the direction of a vigilance decrement. Considering all of these effects (along with the results of Experiment 2) invites the suspicion that instructional complexity in specific circumstances (conditions of high traffic, conditions with no dual task, or both high traffic and no dual task) leads to a vigilance decrement. However, we caution that this conclusion is better supported if we assume a one-tailed hypothesis-testing posture in search of vigilance decrements than if we allow the possibility that increasing vigils can sometimes produce decrements and sometimes benefits. With two-tailed tests at an alpha of .05, only three tests


Figure 4. (a) Meta forest plot of the performance slopes using detection time for the three traffic density conditions (Low, Moderate, High). The gray dots represent conditions from the four experiments and the black dots represent meta-analysis data points. A positive slope with a CI that does not capture zero indicates a performance decrease with time on task. (b) Meta forest plot of the performance slopes using failure detection for the three traffic density conditions. The gray dots represent conditions from the four experiments and the black dots represent meta-analysis data points. Although the CIs from the meta-analysis data points are too small to be seen on this scale, all of these capture zero. [Figure panels not reproduced; both panels plot slope on the vertical axis.]

across the two dependent variables of the 48 opportunities fall in the union of high traffic and no dual task. To understand better across all of the experiments reported here, we review the vigilance decrement in the next section.

Meta-Analysis

Because we did not find a clear pattern of vigilance decrements in any of the four experiments, we conducted a meta-analysis across the experiments in the hope that it might add clarity.1 We used a meta forest plot, as outlined by Cumming (2012). This technique is meant to make small effects easier to see. We only looked at factors that were suggested to have an effect in at least one experiment. Thus, we did not look at results using the block variable, but only at performance slopes. We considered dual task, traffic density, and instruction complexity. We ignored failure rate and training, which were never even marginally suggestive of a decrement in our ATC task. We considered all of the slopes for each condition. See Table 11 for each experiment's contributions to the meta-analysis.

The forest plot represents the mean slope for each condition with error bars representing a 95% confidence interval (CI), based on 1.96 standard errors around the mean, so that we can see whether the CI captures a slope of zero. If the CI captures zero, then we cannot say that the slope differs significantly from zero. We discuss the meta-analysis only in terms of detection time, as failure detection and detection time reached nearly identical results for this analysis. Only one condition across all four experiments has a CI that does not capture zero. This occurred during Experiment 4 in one of the dual task present, low traffic density conditions (M = -0.033; CI = -0.065 to -0.001). First, we consider the dual task. We conducted a meta forest plot for both dual task present and dual task absent conditions. Both the meta-analysis data points for dual task absent (M = -.016; CI = -.051 to .084) and dual task present (M = .029; CI = -.051 to .084) capture zero, indicating that neither the presence nor absence of the dual task reliably contributed to a significant vigilance decrement across experiments. See Figure 3a for the forest plot and Figure 6a for a zoomed-in scale of the meta-analysis data points and CIs. The same meta-analysis results were found for failure detection (see Figures 3b and 6b).

1 Thanks to Jason McCarley for this suggestion.

Figure 5. (a) Meta forest plot of the performance slopes using detection time for both instruction type conditions. The gray dots represent conditions from the four experiments and the black dots represent meta-analysis data points. A positive slope with a confidence interval (CI) that does not capture zero indicates a performance decrease with time on task. (b) Meta forest plot of the performance slopes using failure detection for both instruction type conditions. The gray dots represent conditions from the four experiments and the black dots represent meta-analysis data points. Although the CIs from the meta-analysis data points are too small to be seen on this scale, the meta-analysis data point CI for complex instructions does not capture zero, while the CI for simple instructions does.

Next, we consider traffic density. We conducted a meta forest plot for low, moderate, and high traffic density. Both of the meta-analysis data points for moderate traffic density (M = -.022; CI = -.114 to .071) and high traffic density (M = .067; CI = -.016 to .149) capture zero, indicating that neither moderate nor high traffic density reliably contributed to a significant vigilance decrement across experiments. The meta-analysis data point for low traffic density does not capture zero (M = -.462; CI = -.913 to -.011); however, the negative slope indicates that performance improved with time on task. See Figure 4a for the forest plot and Figure 6a for a zoomed-in scale of the meta-analysis data points and CIs. The same meta-analysis results were found for failure detection, with the exception that the low traffic density condition captured zero (see Figures 4b and 6b).
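The captures-zero rule behind these meta-analysis data points is simple arithmetic: average the condition slopes, form a 95% CI as the mean plus or minus 1.96 standard errors, and check whether zero falls inside. The following is a minimal sketch of that computation, not the authors' code; the slope values are invented purely for illustration:

```python
import math

def meta_ci(slopes):
    """Mean slope with a 95% CI (mean +/- 1.96 standard errors),
    and whether that CI captures a slope of zero."""
    n = len(slopes)
    mean = sum(slopes) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in slopes) / (n - 1))
    se = sd / math.sqrt(n)
    lo, hi = mean - 1.96 * se, mean + 1.96 * se
    return mean, (lo, hi), lo <= 0.0 <= hi

# Hypothetical detection-time slopes for one condition (made-up values).
mean, ci, captures_zero = meta_ci([0.21, 0.15, 0.30, 0.12, 0.25, 0.18])
# Here the CI lies entirely above zero, so captures_zero is False:
# a reliable positive slope, i.e., a vigilance decrement.
```

A condition whose slopes straddle zero (e.g., a mix of small positive and negative values) would instead yield captures_zero of True, and no claim of a decrement could be made.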

Lastly, we consider instruction complexity. We conducted a meta forest plot for both simple and complex instructions. The meta-analysis data point CI for simple instructions did capture zero (M = -.024; CI = -.092 to .044). The meta-analysis data point CI for complex instructions did not capture zero (M = .197; CI = .057 to .337), suggesting that complex instructions had a negative effect on performance with time on task (i.e., a vigilance decrement). Therefore, complex instructions produced a reliable vigilance decrement. See Figure 5a for the forest plot and Figure 6a for a zoomed-in scale of the meta-analysis data points and CIs. The same meta-analysis results were found for failure detection (see Figures 5b and 6b).

Conclusions

The ANOVAs using block as a variable did not provide any evidence for a vigilance decrement at any point in this report. This methodology failed to show a vigilance decrement in a dynamic environment not only in this report, but also in one by Teo et al. (2012). However, due to the nature of our experiments, specifically the number of failures, we could only divide the sessions into early and late blocks. The lack of evidence from this analysis may therefore be somewhat related to that coarse division, because the vigilance decrement should occur within the first 15-20 min of the experiment (Mackworth, 1948). It is also worth noting that the analysis we did settle on, examining performance slopes, assumes a linear change over time.

It is also worth mentioning that there is large individual variation in performance on vigilance tasks. Operators with a high working memory capacity are less likely to show a vigilance decrement (Caggiano & Parasuraman, 2004; Helton & Russell, 2013; McVay & Kane, 2009). In addition, when intrinsic motivation is higher than extrinsic motivation, operators are less likely to show a vigilance decrement (Sawin & Scerbo, 1995; Smith, 1966). However, because we measured neither working memory capacity nor motivation, we can only assume a random distribution of these variables.

Figure 6. (a) Meta-analysis data points and confidence intervals (CIs) using detection time. (b) Meta-analysis data points and CIs using failure detection.

When considering the four experiments outlined in the current report individually, there seemed to be no factor that reliably predicted a vigilance decrement. There was some evidence from marginally significant performance slopes that high traffic density, and the presence of aircraft in classic airspace when the dual task was absent, may have contributed to the vigilance decrement. However, neither of these factors contributed reliably to a vigilance decrement in the meta-analysis. It is worth noting that our manipulation of event rate (i.e., traffic density) was spatial in nature, whereas event rate has traditionally been temporal in nature. More research may be needed on the difference between spatial and temporal manipulations of event rate. The only factor that did contribute reliably to a vigilance decrement, discovered in the meta forest plot, was the presence of complex instructions for handling an automation failure. All significant and marginally significant slopes suggesting a vigilance decrement were in the complex instruction conditions. Perhaps this type of cognitive load had more of an effect because of its relationship to the vigilance task (i.e., handling an automation failure). Whereas neither an additional cognitive load unrelated to the vigilance task (Experiment 3) nor an additional cognitive load imposed through a dual task (Experiments 3 and 4) produced any reliable effect, instruction complexity (Experiments 2 and 4) produced a reliable vigilance decrement.
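The performance slopes discussed throughout this report summarize change over the vigil as a single linear trend. As a sketch of what such a slope amounts to (the values below are invented for illustration and are not the authors' data or code), an ordinary least-squares slope of performance over time on task can be computed as:

```python
def ols_slope(times, scores):
    """Ordinary least-squares slope of performance over time on task.
    A positive slope for detection time (responses getting slower)
    would indicate a vigilance decrement under the linearity assumption."""
    n = len(times)
    mt = sum(times) / n
    ms = sum(scores) / n
    num = sum((t - mt) * (s - ms) for t, s in zip(times, scores))
    den = sum((t - mt) ** 2 for t in times)
    return num / den

# Hypothetical detection times (in seconds) sampled across a vigil.
times = [10, 20, 30, 40, 50]        # minutes into the session
scores = [3.0, 3.4, 3.9, 4.1, 4.6]  # longer detection time = worse
slope = ols_slope(times, scores)    # positive: performance worsening
```

A per-participant slope like this is what the t tests against zero (e.g., t(5) values above) and the meta forest plots operate on; the limitation, as noted, is that any nonlinear change within the vigil is invisible to it.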


One other interesting condition in the current report was the low traffic density condition (Experiment 4). In that condition, half of the participants (dual task absent) were merely monitoring aircraft with little active involvement. All participants received complex instructions, were socially isolated, and were rarely provided feedback about automation failures; yet performance was high and there was no vigilance decrement. In fact, the meta-analysis for failure detection suggested that performance actually increased with time on task in low traffic conditions. It is possible that by using an operational task, the task was not seen as monotonous, boring, and repetitive in nature as some laboratory experiments may be (cf. Hancock, 2013). The low traffic density, dual task absent condition (monitoring a display for 50 min with only two opportunities for any interaction) would be considered the most monotonous of the conditions in the current report. The lack of a vigilance decrement in these conditions, and the evidence for a performance increment, offer little support for the arousal theory of vigilance. It is possible that even the most monotonous conditions in the current report were less monotonous than esoteric laboratory vigilance tasks like the Mackworth Clock Task (Mackworth, 1948) and the psychomotor vigilance task (Lopez, Previc, Fischer, Heitz, & Engle, 2012). However, the evidence for a vigilance decrement due to a cognitive load related to the task (instruction complexity) would seem to offer support for the resource theory of vigilance (Kahneman, 1973; Warm et al., 2008). The suggestion is that in a dynamic environment like ATC, although vigilance decrements may be rare, a vigilance decrement could be possible if too much of a cognitive load is placed on operators. Wiggins (2011) found this same type of effect when pilots showed a significant vigilance decrement during a simulated flight for tasks that required a memory (cognitive) component.

References

Adams, J. A., & Boulter, L. R. (1964). Spatial and temporal uncertainty as determinants of vigilance behavior. Journal of Experimental Psychology, 67, 127–131. http://dx.doi.org/10.1037/h0046473
Baker, C. H. (1961). Maintaining the level of vigilance by means of knowledge of results about a secondary vigilance task. Ergonomics, 4, 311–316. http://dx.doi.org/10.1080/00140136108930532
Caggiano, D. M., & Parasuraman, R. (2004). The role of memory representation in the vigilance decrement. Psychonomic Bulletin & Review, 11, 932–937. http://dx.doi.org/10.3758/BF03196724
Cumming, G. (2012). The new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge.
Davies, D. R., & Parasuraman, R. (1981). The psychology of vigilance. London, UK: Academic Press.
Deese, J. (1955). Some problems in the theory of vigilance. Psychological Review, 62, 359–368. http://dx.doi.org/10.1037/h0042393
Durso, F. T., Stearman, E. J., & Robertson, S. (2015). NextSim: A platform independent simulator for NextGen human factors research. Ergonomics in Design, 23, 23–27. http://dx.doi.org/10.1177/1064804615572624
Elliott, E. (1960). Perception and alertness. Ergonomics, 3, 357–364. http://dx.doi.org/10.1080/00140136008930497
Federal Aviation Administration. (2009). Air traffic NextGen briefing. Retrieved from http://www.faa.gov/air_traffic/briefing/
Funke, M. E., Warm, J. S., Matthews, G., Riley, M., Finomore, V., Funke, G. J., . . . Vidulich, M. A. (2010). A comparison of cerebral hemovelocity and blood oxygen saturation levels during vigilance performance. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 54, 1345–1349. http://dx.doi.org/10.1177/154193121005401809
Hancock, P. A. (2013). In search of vigilance: The problem of iatrogenically created psychological phenomena. American Psychologist, 68, 97–109. http://dx.doi.org/10.1037/a0030214
Hebb, D. O. (1955). Drives and the C.N.S. (conceptual nervous system). Psychological Review, 62, 243–254. http://dx.doi.org/10.1037/h0041823
Helton, W. S., & Russell, P. N. (2011). Working memory load and the vigilance decrement. Experimental Brain Research, 212, 429–437. http://dx.doi.org/10.1007/s00221-011-2749-1
Helton, W. S., & Russell, P. N. (2013). Visuospatial and verbal working memory load: Effects on visuospatial vigilance. Experimental Brain Research, 224, 429–436. http://dx.doi.org/10.1007/s00221-012-3322-2
Hitchcock, E. M., Warm, J. S., Mathews, G., Dember, W. N., Shear, P. K., Tripp, L. D., . . . Parasuraman, R. (2003). Automation cueing modulates cerebral blood flow and vigilance in a simulated air traffic control task. Theoretical Issues in Ergonomics Science, 4, 89–112. http://dx.doi.org/10.1080/14639220210159726
Joint Planning and Development Office (JPDO). (2010). Concept of operations for the Next Generation Air Transport System Version 3.0. Washington, DC: Author. Retrieved from https://www.hsdl.org/?view&did=747519
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice Hall.
Lopez, N., Previc, F. H., Fischer, J., Heitz, R. P., & Engle, R. W. (2012). Effects of sleep deprivation on cognitive performance by United States Air Force pilots. Journal of Applied Research in Memory & Cognition, 1, 27–33. http://dx.doi.org/10.1016/j.jarmac.2011.10.002
Mackie, R. R. (1987). Vigilance research—Are we ready for countermeasures? Human Factors, 29, 707–723.
Mackworth, N. H. (1948). The breakdown of vigilance during prolonged visual search. The Quarterly Journal of Experimental Psychology, 1, 6–21. http://dx.doi.org/10.1080/17470214808416738
Mackworth, N. H. (1950). Researches on the measurement of human performance (Special Report Series No. 268). London, UK: Medical Research Council, HM Stationery Office.
McBride, S. A., Merullo, D. J., Johnson, R. F., Banderet, L. E., & Robinson, R. T. (2007). Performance during a 3-hour simulated sentry duty task under varied work rates and secondary task demands. Military Psychology, 19, 103–117. http://dx.doi.org/10.1080/08995600701323392
McVay, J. C., & Kane, M. J. (2009). Conducting the train of thought: Working memory capacity, goal neglect, and mind wandering in an executive-control task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 196–204. http://dx.doi.org/10.1037/a0014104
Molloy, R., & Parasuraman, R. (1996). Monitoring an automated system for a single failure: Vigilance and task complexity effects. Human Factors, 38, 311–322. http://dx.doi.org/10.1518/001872096779048093
Parasuraman, R. (1979). Memory load and event rate control sensitivity decrements in sustained attention. Science, 205, 924–927. http://dx.doi.org/10.1126/science.472714
Parasuraman, R. (1986). Vigilance, monitoring, and search. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance, Vol. 2: Cognitive processes and performance (pp. 1–39). New York, NY: Wiley.
Parasuraman, R., & Mouloua, M. (1987). Interaction of signal discriminability and task type in vigilance decrement. Perception & Psychophysics, 41, 17–22. http://dx.doi.org/10.3758/BF03208208
Pop, V. L., Stearman, E. J., Kazi, S., & Durso, F. T. (2012). Using engagement to negate vigilance decrements in the NextGen environment. International Journal of Human–Computer Interaction, 28, 99–106. http://dx.doi.org/10.1080/10447318.2012.634759
Sawin, D. A., & Scerbo, M. W. (1995). Effects of instruction type and boredom proneness in vigilance: Implications for boredom and workload. Human Factors, 37, 752–765. http://dx.doi.org/10.1518/001872095778995616
See, J. E., Howe, S. R., Warm, J. S., & Dember, W. N. (1995). Meta-analysis of the sensitivity decrement in vigilance. Psychological Bulletin, 117, 230–249. http://dx.doi.org/10.1037/0033-2909.117.2.230
Smith, R. L. (1966). Monotony and motivation: A theory of vigilance. Santa Monica, CA: Dunlap and Associates.
Teo, G. W., Szalma, J. L., Schmidt, T. N., Hancock, G. M., & Hancock, P. A. (2012). Evaluating vigilance in a dynamic environment: Methodological issues and proposals. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 56, 1586–1590. http://dx.doi.org/10.1177/1071181312561316
Thackray, R. I., Bailey, J. P., & Touchstone, R. M. (1977). The effect of increased monitoring load on vigilance performance using a simulated radar display (FAA/AM-77/18). Oklahoma City, OK: FAA Civil Aeromedical Institute.
Thackray, R. I., & Touchstone, R. M. (1989). A comparison of detection efficiency on an air traffic control monitoring task with and without computer aiding (DOT/FAA/AM-89/1). Oklahoma City, OK: FAA Civil Aeromedical Institute.
Warm, J. S., & Jerison, H. J. (1984). The psychophysics of vigilance. In J. S. Warm (Ed.), Sustained attention in human performance (pp. 15–60).
Warm, J. S., Parasuraman, R., & Matthews, G. (2008). Vigilance requires hard mental work and is stressful. Human Factors, 50, 433–441. http://dx.doi.org/10.1518/001872008X312152
Wiggins, M. W. (2011). Vigilance decrement during a simulated general aviation flight. Applied Cognitive Psychology, 25, 229–235. http://dx.doi.org/10.1002/acp.1668

Received April 13, 2013
Revision received November 11, 2015
Accepted November 19, 2015
