Local and Global Contextual Constraints on the Identification of Objects in Scenes
PETER DE GRAEF, ANDREAS DE TROY, & GERY D'YDEWALLE
University of Leuven, Belgium

Abstract

Objects likely to appear in a given real-world scene are frequently found to be easier to recognize. Two different sources of contextual information have been proposed as the basis for this effect: global scene background and individual companion objects. The present paper examines the relative importance of these two elements in explaining the context-sensitivity of object identification in full scenes. Specific sequences of object fixations were elicited during free scene exploration, while fixation times on designated target objects were recorded as a measure of ease of target identification. Episodic consistency between the target, the global scene background, and the object fixated just prior to the target (the prime) was manipulated orthogonally. Target fixation times were examined for effects of prime and background. Analyses show effects of both factors, which are modulated by the chronology and spatial extent of scene exploration. The results are discussed in terms of their implications for a model of visual object recognition in the context of real-world scenes.

Research on object identification in real-world scenes has generated strong indications of context-sensitive components in the process of object recognition. The best-documented finding to this effect is the superior identifiability of objects likely or plausible to appear in the scene they are viewed in (Antes & Penland, 1981; Biederman, 1981; Boyce, Pollatsek, & Rayner, 1989; De Graef, Christiaens, & d'Ydewalle, 1990). This Probability Effect has generally been interpreted as the result of top-down processes of feature detection and hypothesis testing that originate from a global scene schema activated during the very first glance at a novel scene (Biederman, Mezzanotte, & Rabinowitz, 1982; Friedman, 1979). Global scene background (Boyce et al., 1989; Metzger & Antes, 1983) or specific spatial configurations of gross object volumes (Biederman, 1988, 1990) have been proposed as the essential input for the process of immediate scene schema activation underlying the Probability Effect. However, a similar effect has been observed in arrays of isolated objects lacking both a common background and a distinctive spatial organization.

Canadian Journal of Psychology, 1992, 46, 489-508


Specifically, it was found that ease of identification increases for objects whose fixation is preceded by a fixation on an object likely to appear in the same kinds of real-world settings (De Graef, in press; Henderson, Pollatsek, & Rayner, 1987). In contrast to the top-down scene-to-object explanation of the Probability Effect, this Relatedness Effect has been attributed to the operation of an intra-level object-to-object priming mechanism, i.e. the automatic spreading of activation between episodically and semantically related individual object representations in an object lexicon. The similarity between these two effects has raised the question whether intra-level priming could be responsible for both (Henderson, this issue; Henderson et al., 1987). The advantages of this unified account would be its parsimony and its greater compatibility with prominent data-driven models of object perception (e.g. Biederman, 1987; Marr, 1982; Pentland, 1986). Rather than having to postulate a complex model with two qualitatively different routes to object recognition (De Graef, in press), one merely needs to assume that data-driven access to an object representation primes related object representations, thus reducing the thresholds for establishing a match between them and the object features recovered from the image. Arguing against the object-to-object account of the Probability Effect, Rayner and Pollatsek (this issue) point out its incompatibility with the greater identifiability observed for target objects in scenes which do contain target-related backgrounds but no target-related companion objects (Boyce et al., 1989). Since priming also cannot account for context effects rooted in the spatial structure of scenes (Biederman, 1981; De Graef et al., 1990), Rayner and Pollatsek conclude that it should perhaps be abandoned as an, at best, incomplete approach to understanding the context-sensitivity of object identification in scenes.
The main purpose of the present study was to attempt to clarify this ongoing theoretical debate, by testing the relative importance global scene and local object information may have in bringing about the Probability Effect. In our opinion, this is still necessary in spite of the Boyce et al. (1989) demonstration that scene backgrounds and not companion objects affect the accuracy with which a target object's presence can be verified in a tachistoscopically flashed scene. While we will not question the validity of this conclusion (see Henderson, this issue), we do think that the Boyce et al. paradigm is clearly biased towards finding global scene rather than local object influences on target identification. First, this method requires subjects to distribute their attention across the entire scene the moment it is flashed at them, favouring the acquisition of low-resolution, background information over high-resolution, object information (Antes, Singsaas, & Metzger, 1978). Second, if local object influences are due to a priming mechanism, they were unlikely to be observed given the use of displays with peripherally located targets and no objects in the fovea during display exposure. De Graef (in press) found that object-to-object priming in arrays of isolated objects required foveation of both the prime and the target. In view of these considerations, we decided to carry out an existence test for priming in full scenes with an oppositely biased paradigm that favoured the appearance of local object influences on target identification. For this purpose we used a method with which we already observed context effects (De Graef et al., 1990). Specifically, we measured object identifiability via fixation times on target objects that were incidentally fixated during a search task. This task involved the free exploration of line drawings of scenes in order to find and count an uncertain number of non-objects, i.e. closed, meaningless figures with a part-structure and size range comparable to that of real objects. Contrary to the speeded target verification and target naming tasks (Biederman et al., 1982; Boyce & Pollatsek, 1992; Boyce et al., 1989), this search task requires that viewers sequentially displace their focused attention and gaze from one possible non-object to the next. The rationale of a local-versus-global test in this task is straightforward: designate a target object in a scene, orthogonally manipulate its relation to the surrounding scene and its relation to the object fixated just prior to it (i.e. the prime), and measure fixation times once the target is foveated. Unfortunately, operationalizing this test is not so straightforward, since it requires that object fixation sequences are brought under experimental control. One possibility we considered was to present subjects with scenes preceded by two fixation crosses marking the position of the to-be-fixated prime and target objects. Subjects could then fixate the prime cross, at which time the scene would come on and they would have to saccade to the position previously marked by the target cross.
Some pilot work indicated, however, that subjects rarely succeeded in directly saccading towards the correct object. We therefore gratefully borrowed the Boyce and Pollatsek (1992) idea to use rapid, up and down motion of an object (i.e. a "wiggle") to attract the subject's attention and gaze to that object. By asking subjects to start scene exploration at the location of a prime object and initiating scene exposure and a wiggle of the designated target object contingent upon fixation of that location, we hoped to produce the necessary object fixation sequences. Note that, contrary to the Boyce and Pollatsek study, subjects concentrated on finding non-objects in the scenes they were presented with and were not told to look for and fixate the wiggling target object. This was done to keep subjects from widely distributing their attention across the scene, which could bias our research method against finding object-to-object priming effects. Post-experimental questioning revealed that most subjects weren't even aware that objects had wiggled, although they did on every trial. The first issue we wanted to examine with this task was whether and how identification of a target object was affected by its relatedness to the previously fixated prime object, which was either likely or unlikely to appear in the same scene as the target. Based on priming research in arrays of isolated objects (see above), we expected to find a main effect of Prime, producing shorter fixation times on prime-related than on prime-unrelated targets. Predictions were not so clear with respect to a second issue. Orthogonally to the Prime manipulation, we varied whether the target object was likely or unlikely to appear in the overall scene. Since scene exploration started at the prime and target fixations would ideally follow immediately afterwards, we could predict that Background (i.e. target-scene consistency) should not have any effect on target identification. This is because we previously found (De Graef et al., 1990) that, in the context of our search task, effects of target-scene consistency (including the Probability Effect) only surfaced after extended scene exploration. We interpreted this as indicating that context effects are mediated by a gradually constructed situational model of the scene (De Graef, in press).1 However, this rationale for the prediction of a null-effect of Background needs to be questioned. The basis for this doubt is the concern that in our previous work with the search task we might have missed immediate effects of scene context due to the virtual absence of target fixations during the very first fixations at a scene. According to a "zoom lens" model of attention in scene perception (e.g. Loftus, 1983), this is the moment at which visual attention is distributed across the entire scene, prior to zooming in on potentially interesting fixation locations. While Boyce and Pollatsek (1992) recently argued that this zoom is not perfect by the second fixation on the scene, it seems reasonable to expect an object-oriented narrowing of attention over the course of scene exploration.
Under the general assumption that the most recently attended information figures most prominently in the contextual model that influences the processing of new information, this would imply a shift in importance from global scene context to local object surroundings and object companions (see Henderson, this issue, for a defense of this local-processing hypothesis). Since the acquisition of local context requires more time (Metzger & Antes, 1983), the underrepresentation of object identification measures in the very beginning of scene exploration could have biased our previous research towards finding late effects of (local) context. An examination of Background effects on target and prime fixation times in the present experiment should allow us to determine whether this was indeed the case.

1 It may seem contradictory to find slowly emerging effects of scene context, and to still consider object-to-object priming as a possible basis for the Probability Effect. By definition, inter-object priming is immediate once the prime has been processed and its operation should not depend on the amount of scene exploration. Its observation in reduced target fixation times, however, may be influenced by how early on in scene exploration the target object was fixated. First, early target fixation necessarily implies a reduction in the likelihood that another object was fixated before and could serve as a prime. Second, it seems reasonable to assume that as scene exploration progresses the scene background will become increasingly differentiated from the objects in the scene. To the extent that objects are the perceptually most relevant fixation locations in natural scenes (Antes, Singsaas, & Metzger, 1978; Metzger & Antes, 1983), this implies that object-to-object scanning patterns are less likely to be interrupted by background fixations, which again increases the likelihood of observing inter-object priming effects.

Method

STIMULI

Twenty black-on-white line drawings of different scenes were selected from the set employed in our previous research (De Graef et al., 1990). For each of these scenes (the related backgrounds), two objects were selected (the target and the related prime) which had a high probability to appear in the scene. For each of these object pairs, a second scene was selected which was very unlikely to contain either of the two objects (the unrelated background). Finally, for each unrelated background, one object was selected with a high rated probability to appear in it (the unrelated prime). A complete list of targets, related and unrelated backgrounds, and related and unrelated primes can be found in Appendix A. On the basis of this list, six different displays were created for each target. First, the target was placed in its related and unrelated backgrounds, in such a manner that no spatial violations (De Graef et al., 1990) of object-scene relations resulted. Subsequently, three versions of each target-background combination were constructed by choosing a starting position for scene exploration and placing the related prime, or the unrelated prime, or a non-object, at that position. The two displays with the non-objects (most of them adapted from the set provided by Kroll & Potter, 1984) had to ensure that subjects, who explored each presented display in order to count the number of non-objects in it, would not simply ignore the prime which was always presented at the starting position for scene exploration.
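The display set described above is a full crossing of target, background, and starting object. A minimal Python sketch of that crossing is given here for illustration only; the condition labels and the helper name `build_display_list` are hypothetical, and targets are represented by index numbers rather than by the actual objects listed in Appendix A.

```python
from itertools import product

N_TARGETS = 20  # number of target objects stated in the text
BACKGROUNDS = ("related", "unrelated")                            # hypothetical labels
STARTING_OBJECTS = ("related_prime", "unrelated_prime", "non_object")  # hypothetical labels


def build_display_list(n_targets=N_TARGETS):
    """Return one record per display in the target x background x start crossing."""
    return [
        {"target": target, "background": background, "start": start}
        for target, background, start in product(
            range(1, n_targets + 1), BACKGROUNDS, STARTING_OBJECTS
        )
    ]


displays = build_display_list()

# Six display types exist for every target.
per_target = [d for d in displays if d["target"] == 1]

# Only displays starting on a real prime carry the orthogonal
# Prime x Background manipulation; non-object starts are fillers.
analysed = [d for d in displays if d["start"] != "non_object"]

print(len(displays), len(per_target), len(analysed))
```

Run as written, the sketch yields 120 displays in total, six per target, of which 80 start on a related or unrelated prime.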
The four remaining displays provided the necessary stimuli for the orthogonal manipulation of target-background and target-prime relatedness. An illustration of the different display types is presented in Figure 1. The construction of the 120 displays (20 targets x 2 backgrounds x 3 starting objects) was completed by inserting varying numbers of non-objects. This was done in such a manner that neither the Related and Unrelated Background conditions (RB and UB), nor the Related and Unrelated Prime (RP and UP) conditions differed in terms of the average number of non-objects they contained.

SUBJECTS, PROCEDURE AND APPARATUS

Twelve subjects from the University of Leuven subject pool were paid for their participation in the experiment. All of the subjects had normal vision and none required corrective lenses.


Figure 1 a-f. Three displays with a related background for the target-object 'pig' are presented in 1a-1c; displays with an unrelated background are in 1d-1f. Displays had either a target-related object at the starting position for scene exploration (chickens in 1a and 1d), or a target-unrelated object (shopping cart in 1b and 1e), or a non-object (1c and 1f).

Upon arriving for the experiment, subjects were seated 130 cm away from the stimulus screen. They were told that they would participate in one of a series of experiments on how good people are at detecting certain kinds of information in images of varying complexity. In this particular experiment they would have to determine whether line drawings depicting real-world scenes contained drawings of nonexistent objects. In order to illustrate the concept, subjects were given a page (see Appendix B) providing them with examples of existing and nonexistent objects. They were then told that their accuracy in detecting non-objects would be evaluated in two ways. First, after each stimulus they would have to press a response key once for each non-object they had noticed in the scene. Second, their eye movements would be registered during the entire display exposure to determine whether they had in fact localized all non-objects in the display or had merely guessed how many were present. Following these instructions, individual bite-bars were prepared for each subject and were mounted in a head rest in order to eliminate head movements and to keep viewing distance constant. Subsequently, the eye tracker was calibrated and subjects received three series of 42 trials, i.e. two practice and 40 experimental trials, with a 5 min rest period between each series. In this manner, each subject saw all 120 experimental stimuli. Presentation order of the six display types for each target was counterbalanced across subjects. Within this counterbalancing scheme, presentation order was randomized but barred consecutive stimuli from showing the same scene or target. Each trial involved the following events. First, the stimulus screen was blanked and subjects were asked to fixate the middle in anticipation of a fixation cross to appear at an unknown position on the screen. Subjects were instructed that, upon appearance of this cross, they had to saccade to it and fixate it carefully, in order to initiate stimulus exposure. When the eye tracker detected the onset of a fixation on the cross, an 8-s exposure of the stimulus was automatically initiated. After a 160 ms interval, the target object in the display was wiggled by moving it up and down twice with an amplitude of 4 min of arc.
Following stimulus exposure, subjects responded by pressing a key in front of them. No feedback was given until the end of the experiment, which lasted about 50-60 min. Eye movements were recorded with a Generation-V dual-Purkinje-image eye tracker (Crane & Steele, 1985). This system has an accuracy of 1 min of arc and a 1000-Hz sampling rate and was interfaced to an Intel-386 computer keeping a complete record of the X and Y coordinates of the subject's point of regard. For each incoming measurement (i.e. every ms), the computer made an on-line decision about whether the eye was fixating or saccading by simultaneously comparing the current eye position at ms n to that at ms n-1 and at ms n-4. When differences were smaller than 3 and 6 min of arc respectively, the eye was taken to fixate. This decision was continuously checked (i.e. every ms) by a second Intel-386 computer, which controlled stimulus presentation by means of an AT&T Truevision Vista board. Stimuli appeared in 50-Hz non-interlaced mode on a Barco 6351 CRT with a 740 x 578 resolution, and subtended approximately 16 x 12 degrees of visual angle.
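The on-line fixation criterion described above can be sketched roughly as follows. This is a minimal Python illustration, not the original tracker software: the function names are hypothetical, the "difference" between two eye positions is assumed to be Euclidean distance, and the first four samples of a record are assumed to count as non-fixating because they lack the required history.

```python
import math


def is_fixating(samples, n, thr_1ms=3.0, thr_4ms=6.0):
    """Classify sample n (one sample per ms, positions in min of arc).

    The eye is taken to fixate when the displacement relative to the
    sample 1 ms earlier is below thr_1ms AND the displacement relative
    to the sample 4 ms earlier is below thr_4ms.
    """
    if n < 4:
        return False  # assumption: not enough history to decide
    x, y = samples[n]
    x1, y1 = samples[n - 1]
    x4, y4 = samples[n - 4]
    d1 = math.hypot(x - x1, y - y1)  # assumed Euclidean displacement
    d4 = math.hypot(x - x4, y - y4)
    return d1 < thr_1ms and d4 < thr_4ms


def classify(samples):
    """Return one boolean per sample: True = fixating, False = saccading."""
    return [is_fixating(samples, n) for n in range(len(samples))]


# Synthetic trace: 10 ms stable, a 5 ms saccade of 60 min of arc per ms,
# then 10 ms stable at the landing position.
trace = (
    [(0.0, 0.0)] * 10
    + [(60.0 * k, 0.0) for k in range(1, 6)]
    + [(300.0, 0.0)] * 10
)
labels = classify(trace)
print(labels)
```

In this synthetic trace the samples during the saccade, and the first three samples after landing, are classified as saccading, because the comparison with the position 4 ms earlier still spans part of the movement.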


TABLE 1
Characteristics of Prime-Target Fixation Sequences

                                 Related Background               Unrelated Background
                                 Related Prime  Unrelated Prime   Related Prime  Unrelated Prime
Proportion Targets Unfixated     17.35          17.74             15.16          11.10
Prime-Target Fixation Lag¹       3.71/1640      3.80/1699         3.45/1182      3.43/125[?]
Proportion Perfect Sequences     35.30          38.44             31.59          36.32
Mean Prime-Target Distance²      7.73           7.4[?]            7.25           7.22

Note. ¹ Mean number of intermediate fixations/Mean lag time in ms. ² Prime center to target center in visual degrees.

Results and Discussion

All subsequent analyses pertain to the eye-movement data obtained for displays in which the target was accompanied by its related or unrelated prime. Due to track loss at scene onset or to false detection of fixation onset, the first recorded scene fixation (i.e. a period of eye stability of more than 40 ms) did not always fall on the prime. These false-start trials (11.5%) were excluded from all further analyses. Since subjects could freely explore the scenes and were not told to look for the target or the wiggle, we could not expect to record consecutive prime and target fixations on all remaining trials. Subjects failed to fixate targets on 15.3% of the trials, and made intermediate fixations between prime and target on 50.7% of the trials. Table 1 presents likelihood of target misses, likelihood of a perfect prime-target fixation sequence, and the length of the time lag between end of prime and beginning of target fixation when they were not fixated consecutively. Subjects x Prime x Background analyses of variance (ANOVA) on these variables revealed one reliable effect: the prime-target fixation interval was 454 ms shorter when the target appeared in an unrelated background [F(1, 11) = 7.16, MSe = 2506546, p = .023].

To test the relative impact of priming and background effects, first fixation and gaze durations on targets were subjected to a Subjects x Prime x Background ANOVA.2 Because fixations between prime and target fixation dilute experimental control over priming effects, cases with intermediate fixations (Lag-trials) were analyzed separately from cases without (No Lag-trials). Mean fixation times are presented in Table 2.

2 When an object is looked at for the first time in scene exploration, it may receive several spatially disparate fixations before the eye turns away from it. The length of the first of these fixations, the first fixation duration, has been challenged as an overly conservative measure of object identification time. Hence the parallel analysis of gaze durations, i.e. the sum of consecutive fixation durations before the eye turns away. This measure, however, may be too liberal and is likely to reflect post-identification processes (see Henderson, in press c, for a more extensive discussion). All analyses of eye movement data were carried out with individual object fixations as the unit of analysis in a General Linear Model approach (Kirk, 1982).

TABLE 2
Mean Fixation Times on Target Objects in No Lag-trials and Lag-trials¹

                              Target's Relation to Background
Target's Relation to Prime    Related (RB)    Unrelated (UB)

No Lag-trials
  Related (RP)                240/356         271/395
  Unrelated (UP)              232/375         238/452

Lag-trials
  Related (RP)                241/398         240/367
  Unrelated (UP)              250/423         236/426

Note. ¹ First fixation duration (in ms)/Gaze duration (in ms)

For the No Lag-trials, first fixation durations were unaffected by Prime [F(1, 11) = 1.64, MSe = 14528, p = .226], Background [F(1, 11) = 2.10, MSe = 9751, p = .175], or their interaction [F(1, 11) = 1.04, MSe = 8658, p = .330]. Gaze durations showed no reliable effects of Prime [F(1, 11) = 2.92, MSe = 28638, p = .115] or of the Prime x Background interaction [F(1, 11) < 1, MSe = 43652], but did reveal reliably longer gazes on targets in unrelated backgrounds [F(1, 11) = 8.66, MSe = 21993, p = .013]. For Lag-trials, none of the effects involving Prime or Background approached reliability. Contrary to expectations, no evidence was found for object-to-object priming. Previous failures to observe priming have been attributed to high-quality target previews associated with small prime-to-target distances (Henderson et al., 1987). Average prime-target distance in the No Lag-trials was 5.9 degrees, comparable to the 5 degrees at which Henderson et al. (1987) failed to find priming effects. To examine the possible impact of prime-target separation, it was included as a covariate in a Subjects x Prime x Background analysis of covariance (ANCOVA) on fixation times in the No Lag-trials.3 Effects of primary interest in this ANCOVA were the interactions of Distance with Prime and Background, which reflect the heterogeneity of slopes for the linear functions relating distance to fixation times in the various conditions. For first fixation durations, a reliable Distance x Prime interaction [F(1, 236) = 5.06, MSe = 15454, p = .025] was qualified by a Distance x Prime x Background interaction [F(1, 236) = 4.04, MSe = 15454, p = .045]. This interaction resulted from zero-slopes in all conditions except for the RP/UB condition, which had a negatively sloped function (b = -25, t = -3.41, p = .0008). For gaze durations, the three-way interaction did not prove to be reliable [F(1, 236) = 1.9, MSe = 61931, p = .169], but Distance did interact with Prime [F(1, 236) = 4.3, MSe = 61931, p = .039], with a zero-slope for related pairs and a positive slope (b = 20, t = 2.34, p = .02) for unrelated pairs.

3 Subjects x Prime x Background analyses on prime-target distances in the Lag and No Lag-trials revealed no differences between the conditions, allowing the inclusion of distance as a true covariate.

[Figure 2: x-axis, Prime-Target Distance (visual degrees); plotted conditions RP/UB, UP/UB, RP/RB, UP/RB]
Fig. 2 a-b. First fixation and gaze durations as a function of prime-target distance and Prime and Background conditions.

Figure 2 plots the relevant equations and shows that consideration of prime-target separation is essential to an evaluation of priming effects. Spatial contiguity of a related prime and target increased first target fixation durations (Figure 2a) when prime and target were placed in an inappropriate background (RP/UB condition). In addition, an increase in spatial separation was paralleled by an increase in target gaze durations (Figure 2b), but only when the target was preceded by an unrelated prime (UP condition). Importantly, these Prime x Distance interactions were only observed when prime and target were fixated in succession: an ANCOVA on the Lag-trials revealed no impact of Distance on either Prime or Background effects. While Distance did modulate Prime effects, no such modulation was found for Background, leaving us with the observation that a target-related background decreased target gaze durations in the No Lag-trials. This Background effect appears to support the notion of global-to-local context effects in the earliest stages of scene exploration.
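The heterogeneity-of-slopes idea behind the Distance interactions can be illustrated by fitting a separate linear function of prime-target distance within each condition and comparing the slopes. The Python sketch below is only an illustration of that descriptive step, not the General Linear Model analysis reported here: the helper names are hypothetical, and the data are noise-free toy values chosen to give known slopes, not observations from the experiment.

```python
def ols_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def slopes_by_condition(rows):
    """rows: iterable of (condition, distance_deg, fixation_ms) tuples."""
    grouped = {}
    for condition, distance, duration in rows:
        xs, ys = grouped.setdefault(condition, ([], []))
        xs.append(distance)
        ys.append(duration)
    return {cond: ols_line(xs, ys)[1] for cond, (xs, ys) in grouped.items()}


# Toy values only: one condition falls by 25 ms per degree,
# the other is flat across distance.
toy_rows = (
    [("RP/UB", d, 400.0 - 25.0 * d) for d in (2.0, 4.0, 6.0, 8.0)]
    + [("UP/RB", d, 240.0) for d in (2.0, 4.0, 6.0, 8.0)]
)
slopes = slopes_by_condition(toy_rows)
print(slopes)
```

A reliable Distance x Prime or Distance x Prime x Background interaction corresponds to these per-condition slopes differing by more than chance would allow; the inferential test itself is not reproduced in this sketch.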
Further evidence for this conclusion was sought in an analysis of fixation times on primes which were foveated on the very first scene fixation. The analysis was limited to gaze durations because subjects were already fixating the prime's position when the scene came on, making it impossible to consider first fixation durations on a prime as a reliable reflection of the ease with which the prime was encoded. In addition to prime-background relatedness, we also included target-background relatedness in this Subjects x Prime Consistency x Target Consistency analysis. This was because fixation time on the foveal prime could have been influenced by the extrafoveal target's power to attract attention, which in its turn could be influenced by the target's relation to its background (Antes & Penland, 1981; Loftus & Mackworth, 1978). As can be seen in Table 3, the prime's relatedness to its background did not have a reliable effect on gaze duration [F(1, 11) = 2.64, MSe = 80875, p = .132]. However, gaze durations did show a clear Prime Consistency x Target Consistency interaction [F(1, 11) = 29.62, MSe = 20555, p = .0002]. Note that this interaction could in fact be alternatively formulated as a main effect of prime-target relatedness: gaze durations were longer when prime and target had the same relation to the background, i.e. when target and prime were related to each other. However, a posteriori contrasts revealed that the IP/CT condition did not reliably differ from the CP/CT condition [F(1, 11) = 1.49, MSe = 35921, p = .248], or from the IP/IT condition [F(1, 11) = 2.06, MSe = 55898, p = .179]. This shows that simply having a prime-unrelated target in the scene (IP/CT condition) was insufficient to reliably decrease gaze duration on the prime, relative to scenes where the target was prime-related (CP/CT and IP/IT conditions). Rather, this decrease seems to be contingent upon prime-background consistency (a reliable 88 ms difference between CP/IT and IP/IT, F(1, 11) = 11.42, MSe = 64156, p = .006), and the simultaneous presence of an extrafoveal background-inconsistent target (a reliable 77 ms difference between CP/IT and CP/CT, F(1, 11) = 9.72, MSe = 60814, p = .009).

TABLE 3
Mean Gaze Durations on Primes (in ms)

                                  Target's Relation to Background
Prime's Relation to Background    Consistent (CT)    Inconsistent (IT)
Consistent (CP)                   479                402
Inconsistent (IP)                 45[?]              490

General Discussion

Our first purpose in this study was to clarify whether episodic relatedness between individual objects could have an impact on ease of object identification in the context of full, real-world scenes. The results indicate that such an impact is possible.
However, they also show that this influence cannot simply be identified with the facilitory priming effect which in studies with isolated objects has been taken to indicate passive spreading of activation in an object lexicon (e.g. Henderson, this issue; Huttenlocher & Kubicek, 1983; Kroll & Potter, 1984). Looking at target gaze durations only, a straightforward priming interpretation remains viable. It does seem reasonable to interpret the observed Distance x Prime interaction as a Preview Quality x Prime interaction, in which a related prime compensates for a loss in extrafoveal target visibility which normally produces longer gaze durations (Henderson, 1992). However, when first fixation data are also taken into consideration, this appears to be too simple a picture. The main complication lies in the finding that a related prime produces a marked negative effect relative to an unrelated prime, when prime and target are spatially contiguous in an inappropriate scene (RP/UB condition). Previous studies with isolated objects have established the disappearance of related prime superiority with smaller prime-target distances (Henderson, 1992; Henderson et al., 1987) but never reported a reversal of the effect. The reason for this discrepancy may very well lie in the presence of a full scene context for the prime-target pair. One account consistent with this is that scene perception inevitably involves the construction of a coherent situational model of the scene and that global background and local object clusters both are an integral part of the information going into that model. On this view, consecutive fixation of spatially close, related objects in an inappropriate background will establish the presence of an internally coherent local object cluster in an inconsistent global environment. Longer target fixation times could therefore reflect the processing difficulty associated with contradictory local object cluster and global background contributions to the overall situational model. While facilitory priming and cluster-background inconsistency could provide explanations for gaze and fixation durations separately, this also highlights a puzzling discrepancy between these two measures: why did only gaze durations exhibit the classic priming effect?
Even when local object and global background information were consistent, a related prime did not appear to reduce initial target fixation time, as suggested by the statistically indistinguishable Distance x First Fixation Duration functions for the RP/RB condition versus the UP/RB and UP/UB conditions. Given the effect in the RP/UB condition, this cannot be explained by assuming that, unlike gaze duration, first fixation duration is primarily determined by non-cognitive visuomotor processes (O'Regan & Levy-Schoen, 1987), and therefore should not reflect a priming effect. Alternatively, there appears to be little ground for claiming that priming only affects post-encoding components of the object identification process and consequently is more likely to surface in gaze than in first fixation measures. Earlier demonstrations of priming effects in first fixation durations (De Graef, 1990; Henderson et al., 1987), as well as evidence for priming effects on the rate of visual information acquisition (Reinitz, Wright, & Loftus, 1989), argue against this claim. At present, we can only offer a tentative alternative for these unsatisfactory explanations for the absence of a priming effect in the first fixation data. In a recent model relating visual attention to eye movement control, Henderson
(in press a) argues that it is necessary to assume that fixations can be cut off before identification of the fixated stimulus is nearing completion. Henderson emphasizes a global oculomotor strategy to keep the eyes moving as the prime determinant of such a fixation cutoff point. This emphasis, however, may be somewhat biased by the fact that most evidence cited in this matter (e.g., Pollatsek & Rayner, 1990) has been gathered in the context of reading, a task with a highly stereotypic global oculomotor pattern. In tasks which impose fewer a priori constraints on eye movements, the criterion for fixation cutoff may be more closely tied to the actual process of visual information intake. Specifically, it may be the case that a fixation will no longer be sustained when the rate of information intake from the external stimulus or the rate of activation increase for an internal representation drops below a criterion value. Under the view (Reinitz et al., 1989) that priming increases the rate of visual information intake, this would imply that first target fixations in the Related Prime conditions are less likely to be cut off prematurely than those in the Unrelated Prime conditions. Evidence for this premature cutoff was found in analyses of the number of consecutive target fixations and of subsidiary target fixation time, i.e., the time between the end of the first fixation and the end of the gaze. Both measures showed an effect of Prime, with more consecutive fixations [F(1, 11) = 5.23, MSe = .332, p = .043] and more subsidiary fixation time [F(1, 11) = 5.03, MSe = 39064, p = .046] for unprimed targets. The logical possibility that an early fixation cutoff for unprimed targets could have prevented a processing advantage for primed targets from emerging in the first fixation durations underscores the need for simultaneous consideration of additional eye-movement measures in drawing inferences about visual information processing.
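The cutoff hypothesis above can be made concrete with a small simulation. The following Python sketch is purely illustrative and is not the analysis reported here: it assumes that the rate of information intake decays exponentially over a fixation, that a related prime raises the initial rate, and that the fixation ends either when enough evidence has been gathered or when the rate falls below a criterion. The function name `first_fixation` and all parameter values are hypothetical.

```python
import math


def first_fixation(initial_rate, evidence_needed=60.0, tau=150.0, criterion=0.24):
    """Return (duration_ms, identified) for one fixation.

    Assumption (not from the paper): intake rate at time t is
    initial_rate * exp(-t / tau), so accumulated evidence at time t is
    initial_rate * tau * (1 - exp(-t / tau)).
    """
    # Total evidence obtainable if the fixation were sustained indefinitely.
    asymptote = initial_rate * tau

    # Time at which the intake rate drops to the cutoff criterion.
    if initial_rate > criterion:
        t_cutoff = tau * math.log(initial_rate / criterion)
    else:
        t_cutoff = 0.0

    # Time at which enough evidence for identification would be reached.
    if evidence_needed < asymptote:
        t_ident = -tau * math.log(1.0 - evidence_needed / asymptote)
        if t_ident <= t_cutoff:
            return t_ident, True

    # The rate fell below criterion first: the fixation is cut off prematurely.
    return t_cutoff, False


if __name__ == "__main__":
    # Hypothetical initial intake rates for primed and unprimed targets.
    for label, rate in (("related prime", 0.8), ("unrelated prime", 0.5)):
        duration, identified = first_fixation(rate)
        print(f"{label}: first fixation {duration:.1f} ms, identified={identified}")
```

With these arbitrary values the two first fixations have similar durations, but only the primed one ends in identification. The unprimed target would require a refixation, which lengthens gaze duration without lengthening first fixation duration.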
The second issue addressed in this study was whether an object's episodic consistency with the scene it appears in could affect the object's ease of identification early on in scene perception. Apparently, our concern was justified that in our earlier work (De Graef et al., 1990) we might have missed this effect by not probing object recognition early enough in scene exploration. An indirect indication of an immediate effect of scene context could be inferred from the Prime Consistency x Target Consistency interaction in the prime fixation data (Table 3). Scene-inconsistent objects have repeatedly been argued to draw the viewer's gaze more effectively than scene-consistent objects (Antes, Penland, & Metzger, 1981; Loftus & Mackworth, 1978; Loftus, 1983). Note that the reliably shorter prime-target fixation lags in the present study (Table 1) appear to support this argument.⁴ One could therefore expect foveal prime processing to be interrupted more often by the wiggling of a scene-inconsistent target. However, shorter prime fixation times were only observed when the prime itself was consistent with the scene. In line with findings of an increase in perceptual span with a decrease in foveal processing difficulty (Henderson, in press a; Henderson & Ferreira, 1990), this could indicate that scene-consistent primes were easier to process. A direct and more convincing indication of early scene-context effects is found in our observation of longer gaze durations for background-unrelated targets in the No Lag-trials. This effect and its disappearance in the Lag-trials seem to bridge the gap between our previous findings of delayed context effects (De Graef et al., 1990) and the recent Boyce and Pollatsek (1992) findings of immediate context effects. Specifically, the three sets of data together suggest that global background has a contextual impact which disappears after the first two or three fixations on the scene. The global scene characterization will then begin to make way for more highly specified contextual models which need some time to develop since they are centered on local object environments and individual companion objects. As already mentioned in the introduction, this global-to-local shift in contextual prominence is assumed to be guided by the principle that the most recently attended scene information carries most weight in the situational model that influences current information processing.⁵ Clearly, this presumes an object-oriented focusing of attention over the course of scene exploration. Further research on the evolution of perceptual span in scene exploration will need to establish the validity of this assumption (for some critical comments on this see Boyce & Pollatsek, 1992, and also Henderson, Pollatsek, & Rayner, 1989).

⁴ We do not want to conclude from this that episodically inconsistent objects in the extrafoveal portions of a scene are recognized as such during the first few glances at the scene, and will then inevitably draw the viewer's gaze. First, it should be noted that the earlier fixation of inconsistent objects during scene exploration is an erratic phenomenon. Observed in the present data and in Loftus and Mackworth (1978), there was no indication of it in several other studies of scene exploration (Antes & Penland, 1981; De Graef et al., 1990). Second, even if one can assume that inconsistent objects will generally be more effective in attracting the viewer's eye, it still needs to be demonstrated that this greater attractiveness is rooted in an evaluation of episodic consistency. As argued by various authors (Antes & Penland, 1981; De Graef, 1990; Huttenlocher & Kubicek, 1983; Rayner & Pollatsek, this issue), this high-level congruency may correlate strongly with a low-level visual object-context discriminability. Until a satisfactory measure of visual similarity between objects and their surroundings has been developed, one cannot rule out the possibility that inconsistent objects are more salient in a strictly low-level analysis of potential fixation locations (Mahoney & Ullman, 1988).

⁵ For instance, when the first glance at a scene sets up the model "farm", the subsequent recognition of a rooster on top of something may be helped. But will this still be the case when the viewer first discovers that the farm scene contains a desk with a computer on it, and then proceeds to identify the rooster that is sitting on top of the computer?

The observation of an object-scene consistency effect raises the question of the locus of impact of the effect: which components of visual object
processing are affected? Our present data indicate one possible constraint on a viable answer to this question: the absence of a Distance x Background interaction in the No Lag-trials. Taking into consideration that Prime did interact with Distance, we propose the following account. At a very coarse level of theorizing, visual object recognition can be characterized as involving two major processes: 1) computation of an intermediate structural object description from the image, and 2) connecting that intermediate representation to a stored object representation (e.g., Biederman & Cooper, 1991; Marr, 1982; Rock, 1985). Within this framework, it seems reasonable to locate the effect of an extrafoveal object preview on the first process. This is suggested by the finding that visual similarity between a previewed and subsequently foveated object is the main facilitator of naming time for that foveated object (Pollatsek, Rayner, & Collins, 1984; Pollatsek, Rayner, & Henderson, 1990). Under the assumption that effects of prime-target distance on target fixation time can be regarded as effects of target preview quality, our present data and earlier findings (Henderson et al., 1987) indicate an underadditive relation between inter-object priming and preview quality: priming decreases as preview quality increases (Henderson, 1992). This relation can be captured in a model that 1) views the effect of priming as an increase in the rate at which visual evidence for an intermediate object description is accumulated, and 2) views the effect of a preview as reducing the amount of evidence that still needs to be accumulated during foveation before the intermediate object description is gated into the process of matching it to a stored object representation.
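The underadditive relation between priming and preview quality follows directly from this two-part account, as the following Python sketch illustrates. It is a toy formalization under stated assumptions, not a fitted model: evidence is assumed to accumulate linearly, a related prime multiplies the accumulation rate, a preview supplies a head start, and matching to a stored representation is treated as a separate stage whose threshold depends on background consistency. All names and parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    # Illustrative values only; none are estimated from the reported data.
    evidence_needed: float = 100.0   # evidence required for the intermediate description
    base_rate: float = 0.5           # evidence units per ms without a related prime
    prime_gain: float = 1.25         # multiplicative rate increase with a related prime
    match_rate: float = 0.5          # rate of the matching stage (units per ms)
    match_threshold: float = 50.0    # neutral threshold for the matching stage
    background_shift: float = 10.0   # threshold change due to background consistency


def identification_time(preview, related_prime, consistent_background, p=Params()):
    """Time (ms) to build the intermediate description and match it to memory."""
    # Stage 1: a preview reduces the evidence still to be accumulated;
    # a related prime increases the accumulation rate.
    remaining = max(p.evidence_needed - preview, 0.0)
    rate = p.base_rate * (p.prime_gain if related_prime else 1.0)
    description_time = remaining / rate

    # Stage 2: background consistency lowers (or raises) the match threshold.
    shift = -p.background_shift if consistent_background else p.background_shift
    matching_time = (p.match_threshold + shift) / p.match_rate
    return description_time + matching_time


def priming_benefit(preview, consistent_background=True):
    """Time saved by a related prime at a given preview level."""
    return (identification_time(preview, False, consistent_background)
            - identification_time(preview, True, consistent_background))


def background_cost(preview, related_prime=False):
    """Extra time caused by an inconsistent background at a given preview level."""
    return (identification_time(preview, related_prime, False)
            - identification_time(preview, related_prime, True))


if __name__ == "__main__":
    for preview in (0.0, 25.0, 50.0, 75.0):
        print(preview, priming_benefit(preview), background_cost(preview))
```

In this sketch the priming benefit shrinks as preview improves, whereas the background effect is the same at every preview level. That pattern mirrors the presence of a Distance x Prime interaction together with the absence of a Distance x Background interaction.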
The preview-insensitive effect of object-scene consistency is then integrated in this framework by assuming that a consistent (inconsistent) background lowers (elevates) the threshold for establishing the match between intermediate and stored object representation.

Conclusion

With this paper, we have attempted to increase our understanding of how visual recognition of an object can be affected by its relation to the real-world scene it appears in. Two possible sources of contextual impact on object identification were pitted against each other: global scene background and individual companion objects. In agreement with previous research, both types of context were found to affect object recognition as measured by recordings of object fixation times during scene exploration. Complementary to previous research, the observed pattern of results suggests a global-to-local shift in contextual impact as scene exploration progresses. While this challenges an exclusively top-down, scene-to-object account of context effects, a similar challenge is presented for a strictly intra-level object-to-object priming account. Automatic priming could be compensating for the lower extrafoveal preview quality of distant objects, but cannot explain the adverse effect of a related prime on the processing of a near, scene-inconsistent target object.
This effect clearly suggests that objects in scenes cannot be treated as independent entries in a cross-referenced lexicon, since they can cluster into higher-order 'mini-scenes'. Finally, the present data have led us to hypothesize a distinction between global and local contextual information in terms of the locus of their impact on the object recognition process. Further empirical validation of this distinction seems to be a promising next step in the development of a coherent model of object identification in the context of real-world scenes.

The reported work was supported by the Belgian government through agreement RFO/Ai/04 of the Incentive Program for Fundamental Research in Artificial Intelligence. Portions of the data were presented at the Sixth European Conference on Eye Movements, Leuven, Belgium. Our thanks to Sandy Pollatsek for helpful comments on an earlier version. Reprint requests to Peter De Graef, Laboratory of Experimental Psychology, University of Leuven, Tiensestraat 102, B-3000 Leuven, Belgium.

References

Antes, J. R., & Penland, J. G. (1981). Picture context effects on eye movement patterns. In D. F. Fischer, R. A. Monty, & J. W. Senders (Eds.), Eye movements: Cognition and visual perception (pp. 157-170). Hillsdale, NJ: Erlbaum.
Antes, J. R., Penland, J. G., & Metzger, R. L. (1981). Processing global information in briefly presented pictures. Psychological Research, 43, 277-292.
Antes, J. R., Singsaas, P. A., & Metzger, R. L. (1978). Components of pictorial informativeness. Perceptual and Motor Skills, 47, 459-464.
Biederman, I. (1981). On the semantics of a glance at a scene. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization. Hillsdale, NJ: Erlbaum.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-147.
Biederman, I. (1988). Aspects and extensions of a theory of human image understanding. In Z. W. Pylyshyn (Ed.), Computational processes in human vision: An interdisciplinary perspective (pp. 370-428). Norwood, NJ: Ablex.
Biederman, I. (1990). Higher level vision. In D. N. Osherson, S. M. Kosslyn, & J. Hollerbach (Eds.), An invitation to cognitive science. Vol. II: Visual cognition and action (pp. 41-72). Cambridge: MIT Press.
Biederman, I., & Cooper, E. (1991). Priming contour-deleted images: Evidence for intermediate representations in visual object recognition. Cognitive Psychology, 23, 393-419.
Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14, 143-177.


Boyce, S. J., & Pollatsek, A. (1992). Identification of objects in scenes: The role of scene background in object naming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 531-543.
Boyce, S. J., Pollatsek, A., & Rayner, K. (1989). Effects of background information on object identification. Journal of Experimental Psychology: Human Perception and Performance, 15, 556-566.
Crane, H. D., & Steele, C. (1985). Generation-V dual-Purkinje-image eyetracker. Applied Optics, 24, 527-537.
De Graef, P. (1990). Episodic priming and object probability effects. Unpublished master's thesis, Department of Psychology, University of Massachusetts, Amherst.
De Graef, P. (in press). Scene-context effects and models of real-world perception. In K. Rayner (Ed.), Eye movements and visual cognition. New York: Springer Verlag.
De Graef, P., Christiaens, D., & d'Ydewalle, G. (1990). Perceptual effects of scene context on object identification. Psychological Research, 52, 317-329.
Friedman, A. (1979). Framing pictures: The role of knowledge in automatized encoding and memory for gist. Journal of Experimental Psychology: General, 108, 316-355.
Henderson, J. M. (1992). Identifying objects across saccades: Effects of extrafoveal preview and flanker object context. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 521-530.
Henderson, J. M. (in press a). Visual attention and eye movement control during reading and picture viewing. In K. Rayner (Ed.), Eye movements and visual cognition. New York: Springer Verlag.
Henderson, J. M. (in press b). Eye movement control during visual object processing: Effects of initial fixation position and semantic constraint. Canadian Journal of Psychology.
Henderson, J. M., & Ferreira, F. (1990). Effects of foveal processing difficulty on the perceptual span in reading: Implications for attention and eye movement control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 417-429.
Henderson, J. M., Pollatsek, A., & Rayner, K. (1987). Effects of foveal priming and extrafoveal preview on object identification. Journal of Experimental Psychology: Human Perception and Performance, 13, 449-463.
Henderson, J. M., Pollatsek, A., & Rayner, K. (1989). Covert visual attention and extrafoveal information use during object identification. Perception & Psychophysics, 45, 196-208.
Huttenlocher, J., & Kubicek, L. F. (1983). The source of relatedness effects on naming latency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 486-496.
Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences. Monterey, CA: Brooks/Cole.


Kroll, J. F., & Potter, M. C. (1984). Recognizing words, pictures and concepts: A comparison of lexical, object and reality decisions. Journal of Verbal Learning and Verbal Behavior, 23, 39-66.
Loftus, G. R. (1983). Eye fixations on text and scenes. In K. Rayner (Ed.), Eye movements in reading (pp. 359-376). New York: Academic Press.
Loftus, G. R., & Mackworth, N. H. (1978). Cognitive determinants of fixation location during picture viewing. Journal of Experimental Psychology: Human Perception and Performance, 4, 565-572.
Mahoney, J. V., & Ullman, S. (1988). Image chunking: Defining spatial building blocks for scene analysis. In Z. Pylyshyn (Ed.), Computational processes in human vision: An interdisciplinary perspective (pp. 169-209). Norwood, NJ: Ablex.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: Freeman.
Metzger, R. L., & Antes, J. R. (1983). The nature of processing early in picture perception. Psychological Research, 45, 267-274.
O'Regan, K., & Levy-Schoen, A. (1987). Eye-movement strategy and tactics in word recognition and reading. In M. Coltheart (Ed.), Attention and Performance XII (pp. 363-384). Hillsdale, NJ: Erlbaum.
Pentland, A. P. (1986). Perceptual organization and the representation of natural form. Artificial Intelligence, 28, 293-331.
Pollatsek, A., & Rayner, K. (1990). Eye movements and lexical access in reading. In D. A. Balota, G. B. Flores d'Arcais, & K. Rayner (Eds.), Comprehension processes in reading (pp. 143-163). Hillsdale, NJ: Erlbaum.
Pollatsek, A., Rayner, K., & Collins, W. E. (1984). Integrating pictorial information across eye movements. Journal of Experimental Psychology: General, 113, 426-442.
Pollatsek, A., Rayner, K., & Henderson, J. M. (1990). The role of spatial location in the integration of pictorial information across saccades. Journal of Experimental Psychology: Human Perception and Performance, 16, 199-210.
Reinitz, M. T., Wright, E., & Loftus, G. (1989). Effects of semantic priming on visual encoding of pictures. Journal of Experimental Psychology: General, 118, 280-297.
Rock, I. (1985). Perception and knowledge. Acta Psychologica, 59, 3-22.

Appendix A

Related Background  Related Prime   Target           Unrelated Prime  Unrelated Background
Living room         Hifi rack       Speaker          Litter bin       Bus terminal
Backyard            Watering can    Lawnmower        Harp             Concert hall
Farm                Pitch fork      Shovel           Briefcase        Library
Playground          Skateboard      Roller skates    Alarm clock      Bedroom
Classroom           Globe           Map              Cocktail         Bar
Gas station         Motorcycle      Gas pump         Weights          Gym
Gym                 Punching ball   Boxing gloves    Iron             Laundrette
Laundrette          Ironing board   Laundry basket   Barbecue         Backyard
Workshop            Axe             Saw              Egg              Dining room
Locker room         Shoes           Socks            Funnel           Kitchen
Workshop            Hammer          Wrench           Saltshaker       Restaurant
Kitchen             Blender         Rolling pin      Computer         Office
Chemistry Lab       Microscope      Test tubes       Table lamp       Living room
Farm                Chickens        Pig              Shopping cart    Supermarket
Concert Hall        Piano           Cello            Wheelbarrow      Construction site
Street              Traffic sign    Parking meter    Parasol          Beach
Office              Stapler         Paper punch      Loaf of bread    Kitchen
Bathroom            Toilet          Toilet paper     T.V. set         Living room
Supermarket         Shopping cart   Shopping basket  Forklift         Waterfront
Bathroom            Blowdryer       Brush            Pitcher          Bar


Appendix B

[Figure panels: Objects; Non-objects]
