Variant Points of View on Viewpoint Invariance

KARL VERFAILLIE
University of Leuven, Leuven, Belgium

Abstract
In order to recognize an object, the visual system must make abstraction of proximal stimulus variations concomitant with the incidental vantage point. Theoretical models can be distinguished according to the degree to which they require the achievement of viewpoint independence prior to matching a stored object model. Recognition-by-components is one theory which incorporates the realization of general viewpoint invariance as one of its hallmarks. Some aspects of this theory, especially the orientation independence of the represented relations between object parts, are scrutinized. Next, an alternative approach is sketched in which object recognition is accomplished on the basis of a stimulus description which is dependent on the object's orientation, but which makes abstraction of other stimulus variations. Relevant neurophysiological findings are discussed, as well as behavioural evidence from experiments investigating orientation-dependent priming effects in the perception of biological motion.

Canadian Journal of Psychology, 1992, 46:2, 215-235


The problem: Object constancy under variable viewing conditions

The achievement of visual object recognition is no sinecure. One of the reasons is that the appearance of an object can vary drastically with changing viewing conditions. First, even with a constant relation between viewer and viewed object, a change in illumination affects the characteristics of the image considerably. Second, since an object can appear in a myriad of different positions and orientations vis-à-vis the observer, its projected shape will vary radically.

Most current models of visual perception assume that in order to achieve recognition the visual system must compute the correspondence between a description of the projected shape and a stored object representation. The fact that the input is highly dependent on the temporary viewpoint imposes important constraints on this so-called indexing process. A successful match requires (1) the computation of a stimulus description which makes abstraction of "uninteresting" image characteristics, (2) a stored object representation which is defined in terms of shape properties that remain stable over "irrelevant" changes in viewing conditions, and (3) the execution of appropriate matching procedures.

There is some debate about the degree to which abstraction must be made of the object's temporary appearance before it can be recognized. One can imagine a continuum along which different theoretical models are situated. At one extreme, stored object models are completely viewpoint independent and general viewpoint invariance must be achieved prior to recognition. At the other extreme, viewpoint-dependent information is inherent to the object representation: The object's shape is represented as it appears in particular viewer-dependent positions and orientations. The former position stresses generalization across stimulus variables prior to matching; the latter emphasizes the specificity of the stored object model(s). Between those two extremes, several variant theoretical models are conceivable, depending on the types of generalization and specificity they postulate.

Of course, any theoretical enterprise which tries to derive a shape description appropriate for recognition directly from the retinal array is doomed to fail (Pinker, 1984). Before matching procedures can come into play, considerable processing already must have taken place, dealing, for instance, with occasional illumination conditions. Therefore, the extreme position of strictly viewpoint-dependent stored models is untenable: At least some generalization must be achieved prior to matching. However, some authors have extended this reasoning to the other end of the continuum, claiming that visual object recognition is accomplished by computing a description of the stimulus object which makes abstraction of the object's position and orientation, so that it can be matched with a stored viewpoint-invariant object representation.
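To make the two ends of the continuum concrete, the following toy contrast may help. It is my own construction, not a published model; all feature names and data structures are hypothetical.

```python
# A toy contrast (my own construction, not a published model) between
# the two extremes of the continuum described above. All feature
# names and data structures are hypothetical.
from dataclasses import dataclass

@dataclass
class Description:
    """Shape description derived from the image after early processing
    (illumination and the like already abstracted away)."""
    invariant: frozenset    # properties stable across viewpoint
    view_bound: frozenset   # properties that change with orientation

# Extreme 1: one viewpoint-invariant model per object. The indexing
# process must first strip all viewpoint-dependent information.
INVARIANT_MODELS = {
    "cup": frozenset({"open cylinder", "curved handle"}),
}

def match_invariant(d):
    for name, model in INVARIANT_MODELS.items():
        if model <= d.invariant:       # all model features present?
            return name
    return None

# Extreme 2: several view-specific models per object. The input is
# matched much as it appears, orientation included.
VIEW_MODELS = {
    ("cup", "profile"): frozenset({"handle silhouette", "parallel sides"}),
    ("cup", "overhead"): frozenset({"elliptical rim"}),
}

def match_view_specific(d):
    for (name, view), model in VIEW_MODELS.items():
        if model <= d.view_bound:
            return name, view          # identity and view recovered together
    return None
```

The burden simply shifts between the two schemes: the first demands heavy normalization before matching, the second multiplies the stored models but can match the input more directly.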


The next section concentrates on an example of such a theory, which has gained some visibility in the scientific community.

General viewpoint invariance in recognition-by-components

Recognition-by-components (RBC), a theory of object recognition proposed by Biederman (1987), incorporates the achievement of general viewpoint invariance prior to recognition as one of its hallmarks. Recently, Hummel and Biederman (in press) described a neural network implementation based on the theory. The input to the model is a line drawing of an isolated object showing the edges corresponding to surface orientation discontinuities and occlusions in the image. Notwithstanding the fact that the input object can be presented from a number of different vantage points, "the model achieves viewpoint invariance in that the same output unit will respond to an object regardless of where its image appears in the visual field, the size of the image, and the orientation in depth from which the object is depicted" (Hummel & Biederman, in press, p. 9).

The theory and the implementation are consistent with behavioural data. They capture the finding (Biederman & Cooper, 1991a, 1991b, 1992; Cooper, Biederman, & Hummel, this issue) that naming a tachistoscopically presented drawing of an upright object reduces naming latency on a second occasion several minutes later, regardless of whether the same drawing is shown, or the same object is presented in a different position in the frontoparallel plane, a different size, or a different left-right reflection.

However, the scheme by which RBC proposes to derive a viewpoint-independent structural description is not without problems. In RBC, an object is modeled as a specific arrangement of simple volumetric primitives, called geons. RBC asserts that the image features by which both the geons and their relational arrangement are identified are independent of the incidental viewpoint. Moreover, the elements of the derived structural description, i.e., the representation of the geons and their relations, are said to be viewpoint invariant. I will examine this claim in some detail, concentrating on orientation independence rather than on other invariances, and on the relational arrangement of geons rather than on the geons themselves.
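As a rough illustration of what such a structural description might look like, consider the following sketch. It is my own schematic rendering, not Hummel and Biederman's implementation; the geon labels and the set-based matching rule are simplifying assumptions.

```python
# A schematic rendering (mine, not Hummel & Biederman's code) of an
# RBC-style structural description: geons plus categorical relations
# between them. All labels are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Geon:
    kind: str                # e.g. "cylinder", "brick", "truncated cone"

@dataclass(frozen=True)
class Relation:
    part_a: Geon
    part_b: Geon
    label: str               # "above", "below", or "beside"

# A crude desk lamp: a shade geon above a base geon.
shade = Geon("truncated cone")
base = Geon("cylinder")
LAMP = frozenset({Relation(shade, base, "above")})

# Because the description is a set of categorical assertions, matching
# reduces to set comparison, and is unaffected by where in the image
# the object appears or how large it is.
def same_description(a, b):
    return a == b
```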


RBC can account for two observations on the effect of rotating mono-oriented objects, i.e., objects which normally appear in an upright orientation. The first observation is that rotation in the frontoparallel plane has a profound effect on measures of recognition. Using line drawings of common mono-oriented objects, Jolicoeur and his colleagues have found repeatedly that the time to name rotated pictures is strongly affected by departure from the upright (Jolicoeur, 1985, 1988; Jolicoeur & Milliken, 1989). On the other hand, a human observer experiences virtually no difficulties in recognizing an upright mono-oriented object after it has been rotated in depth around its top-bottom axis. The "behaviour" of the neural net implementation of RBC (Hummel & Biederman, in press) conforms very well to these two observations. While adding to the model's external validity, this accomplishment also necessitates a number of critical reflections.

First, a global characterization of the system's performance should perhaps not include the term "viewpoint invariant". To the contrary, rotations in the frontoparallel plane have robust effects on the success with which RBC recognizes objects. Although this agrees with the findings of Jolicoeur (1985, 1988), it is a strange behaviour for a model which feels very strongly about the necessity to reach general viewpoint invariance prior to recognition. It also implies that the in-depth rotations under which recognition is still achievable are very limited: Only rotations in which the object remains (approximately) upright are allowed.

Second, there is no principled ground for predicting that rotations in the frontoparallel plane will have strong effects, and in-depth rotations will have no effects. In fact, there are reasons to expect the opposite. Rotation in the picture plane does not alter the projected shape in a fundamental way, in the sense that angles are preserved. Therefore, rotation around the line of sight is isomorphic. In-depth rotation, in contrast, is anisomorphic, because the projected shape can change substantially. How, then, should one explain an effect of isomorphic rotation and the absence of an effect of anisomorphic rotation?
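The isomorphic/anisomorphic distinction is easy to verify numerically. The following check (mine, not from the paper) shows that under orthographic projection an image-plane rotation preserves projected angles, whereas an in-depth rotation does not.

```python
# A small numeric check (mine, not from the paper): image-plane
# rotation preserves projected angles under orthographic projection;
# in-depth rotation does not.
import math

def project(p):
    x, y, z = p                       # orthographic projection:
    return (x, y)                     # simply discard depth

def angle_at(b, a, c):
    """Projected angle (deg) at vertex b between rays b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(dot / norm))

def rot_image_plane(p, deg):          # rotation about the line of sight
    t = math.radians(deg); x, y, z = p
    return (x*math.cos(t) - y*math.sin(t), x*math.sin(t) + y*math.cos(t), z)

def rot_in_depth(p, deg):             # rotation about the vertical axis
    t = math.radians(deg); x, y, z = p
    return (x*math.cos(t) + z*math.sin(t), y, -x*math.sin(t) + z*math.cos(t))

b, a, c = (0, 0, 0), (1.0, 0, 0), (1.0, 1.0, 0)   # a 45-degree corner
print(angle_at(*map(project, (b, a, c))))                                    # 45.0
print(angle_at(*map(project, [rot_image_plane(p, 40) for p in (b, a, c)])))  # 45.0
print(angle_at(*map(project, [rot_in_depth(p, 40) for p in (b, a, c)])))     # ~52.6
```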


It seems that the upright orientation forms the crux of the argument. Indeed, the effect of rotating mono-oriented objects in the frontoparallel plane takes the upright as a baseline: The perceptual cost associated with a rotation increases with deviation from the upright (except for 180° rotations). In addition, Hummel and Biederman (in press) confine in-depth rotations to rotations in which the object remains upright.¹ One could claim, then, that the difference between the two kinds of rotation is a consequence of the fact that mono-oriented objects normally appear upright and that this is violated under rotation in the frontoparallel plane, but not after a rotation around the upright top-bottom axis. In this line of reasoning, familiarity with a particular class of views (the views of the upright object) becomes a crucial factor: "A... condition under which viewpoint affects identifiability of a specific object arises when the orientation is simply unfamiliar, as... when the top-bottom relations among the components are perturbed as when a normally upright object is inverted" (Biederman, 1987, p. 140, my italics).

¹ Hummel and Biederman (in press, p. 27) report that they successfully tested their model with two types of in-depth rotations: rotations around the vertical axis and rotations about a horizontal axis perpendicular to the line of sight. It would seem that the object is no longer upright in the latter case. But the model was tested with only one in-depth rotated version of each of the 10 test objects, and it is not clear which type of in-depth rotation was applied to each object or how far the orientation of each rotated version deviated from the original orientation (the possible degree of rotation ranged from 45° to 70°). It seems fairly safe to conclude that under these conditions most of the 10 in-depth rotated objects were at least approximately upright. This is no coincidence, since larger rotations (>90°) around a horizontal axis perpendicular to the line of sight would turn the object upside-down, which cannot be handled in RBC.

The impact of familiarity on orientation effects has indeed been documented, as I will briefly discuss in the next section. However, the explanation of the effect of familiarity might be problematic for theories which capitalize on the realization of general viewpoint invariance. It is not immediately obvious how the visual system could settle on a particular viewpoint-independent stimulus description more rapidly when confronted with familiar views than with unfamiliar views, prior to recognizing the object or the view. But, whatever the way in which this could be accomplished, the real origin of RBC's differential coping with different types of rotation seems to be related to the way inter-geon relations are coded, which I discuss next.

Third, the Above, Below, and Beside relations between geons, which form the backbone of the so-called viewpoint-independent structural description of an object, are coded in a way which is heavily dependent on the viewpoint. What does it mean to say that an object part is above, below, or beside another part? These relational predicates are ambiguous in the sense that they can be defined in a viewpoint-dependent as well as in a viewpoint-independent manner (deictically vs. intrinsically in Levelt, 1984). In the latter case, the object itself provides the reference frame, and it allows one, for instance, to relate a part to the top or bottom of an object, independently of the object's orientation with respect to the observer. In contrast, the Above, Below, and Beside relations in the structural descriptions of RBC are viewpoint dependent: RBC does not capture inter-geon relations in a viewpoint-invariant manner.

In fact, the reason why the theory is able to account for the differential effect of in-depth rotations and rotations in the frontoparallel plane can be traced back to this viewpoint-dependent coding. Consider the effect of rotation in the frontoparallel plane. The viewpoint-dependent Above, Below, and Beside relations will change under such a rotation, leading, therefore, to a perceptual cost. Hummel and Biederman (in press) are aware of this: "For example, rotate a lamp 45° in the visual plane and the lampshade, which is above the base in the upright view, will be above and beside the base. Continue to rotate the lamp until it is upside down and this spurious Beside relation will disappear" (Hummel & Biederman, in press, p. 28), and the lampshade will be below the base. In spite of this awareness, Hummel and Biederman do not seem to draw the logical conclusion that, as far as the relations between geons are concerned, the object representations in RBC are viewpoint dependent.

This viewpoint-dependent coding has no implications for the (upright) in-depth orientation of an object, because of the way RBC represents the inter-geon relations which are affected by in-depth rotations. If one does not consider occlusion (which is a boundary condition in RBC), the only viewpoint-dependent relation which can change after rotating an object around its upright top-bottom axis is the degree to which a part is located to the left or to the right of another part. But this is collapsed into one relation in RBC: Beside. Either a part is beside another part or it is not, and this will remain so after a rotation around the object's upright top-bottom axis.
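The following sketch makes the argument concrete. It is a hypothetical rendering of deictic relation coding, not RBC's actual code: categorical relations are read off viewer-centred part positions, so an image-plane rotation of the lamp changes them, while a rotation about the upright axis does not.

```python
# A hypothetical rendering (not RBC's actual code) of deictic relation
# coding: categorical relations read off viewer-centred part positions.
import math

def relations(a, b, tol=0.25):
    """Relations of part a to part b from viewer-centred (x, y, z)
    centroids: x is left/right in the image, y is up/down."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    rels = set()
    if dy > tol:
        rels.add("above")
    if dy < -tol:
        rels.add("below")
    if abs(dx) > tol:
        rels.add("beside")            # left and right collapsed into one
    return rels

def rot_image_plane(p, deg):          # rotation about the line of sight
    t = math.radians(deg); x, y, z = p
    return (x*math.cos(t) - y*math.sin(t), x*math.sin(t) + y*math.cos(t), z)

def rot_top_bottom_axis(p, deg):      # in-depth rotation, object upright
    t = math.radians(deg); x, y, z = p
    return (x*math.cos(t) + z*math.sin(t), y, -x*math.sin(t) + z*math.cos(t))

shade, base = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)           # upright lamp
print(relations(shade, base))                            # {'above'}
print(relations(rot_image_plane(shade, 45), base))       # {'above', 'beside'}
print(relations(rot_image_plane(shade, 180), base))      # {'below'}
print(relations(rot_top_bottom_axis(shade, 60), base))   # {'above'}
```

The image-plane rotation perturbs the coded relations (hence a perceptual cost), while the rotation about the upright axis leaves them untouched, precisely because the left/right distinction it would affect has been collapsed into Beside.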


Fourth, the system runs into problems when it encounters poly-oriented objects, which can appear under a multitude of orientations. Just as with mono-oriented objects, RBC derives different descriptions of a poly-oriented object, depending on its orientation in the frontoparallel plane. For mono-oriented objects, Biederman's (1987) account of the effect of image-plane rotations hinged upon familiarity with upright orientations. It is not clear, however, what the upright orientation of a poly-oriented object could be. Sometimes a top and a bottom can be distinguished. One could claim that, analogously to the case of mono-oriented objects, the object has an upright orientation when the object-centered top is in a viewer-centered Above relation (assuming an upright viewer in relation to the environment). However, many objects have no top-bottom relation, about 20% of all objects according to Biederman's (1987) estimate. Moreover, even when such a relation is discernible, there is nothing special about the so-called upright orientation of a poly-oriented object. That is precisely what distinguishes them from mono-oriented objects. Either all orientations of a poly-oriented object are familiar or there are at least several, not necessarily upright, familiar views. Therefore, in the case of poly-oriented objects, familiarity with the upright orientation cannot be invoked as an explanation for the different descriptions which the model derives for different views in the frontoparallel plane. Note that it is also not clear what the stored model of a poly-oriented object looks like in RBC, since the inter-geon relations are coded in a viewpoint-dependent manner. For mono-oriented objects, a canonical upright orientation might serve as a reference orientation for interpreting the Above, Below, and Beside relations, but this seems meaningless for poly-oriented objects.

In sum, notwithstanding the achievements of RBC, the assertion that the model realizes general viewpoint invariance prior to matching is problematic. First, rather than showing complete generalization across viewpoints, the model displays a specific degree of selectivity, since it only recognizes objects under a restricted class of views, even when the same geons are available. Second, the derived (and the stored) object representations are not viewpoint independent: Some inter-geon relations determining the componential makeup of an object are coded in a viewpoint-dependent way. Finally, this viewpoint dependence cannot be linked with the fact that mono-oriented objects normally appear upright, since familiarity would predict a different pattern of generalization for poly-oriented objects, which the model fails to achieve. In contrast, the displayed pattern of selectivity is hard to interpret in the case of poly-oriented objects.


The effect of orientation changes on visual object processing

It is not surprising that RBC encounters its greatest challenge in dealing with orientation changes. Since rotation can alter the projected shape of an object drastically, a system which has to interpret a 3-D world on the basis of its 2-D projection runs the risk of being considerably sensitive to changes in orientation.

As far as orientation in the frontoparallel plane is concerned, I already mentioned that the naming of line drawings of isolated mono-oriented objects is slowed down when the object is depicted in non-upright orientations (Jolicoeur, 1985, 1988). Moreover, the impact of the object's orientation seems to grow as the object becomes more complex. For relatively complicated objects, even when they are very familiar in upright orientations, it is sometimes impossible to recognize inverted versions. It is a well-known observation that faces are very hard to identify when they are presented upside down. Another example of a complex object is the human body, and orientation specificity has also been found in the perception of a human body under biological motion conditions (Johansson, 1973, 1975). I will briefly elaborate on these findings, because I will hark back to the phenomenon of biological motion perception later on.

When the available stimulus information is confined to small lights attached to the major joints of a human actor, and the actor remains stationary, only a random swarm of stationary dots is perceptible. But, from the moment the actor starts moving, the visual system organizes the collection of dots into a phenomenologically very vivid and detailed percept of a 3-D human figure engaged in complex activities. The upright orientation of this so-called "point-light walker" seems to be an important factor. When presented with an upside-down version of a point-light walker, observers do not report seeing an upside-down human figure (Sumi, 1984; see also Pavlova, 1989). Moreover, adults are able to detect a point-light walker in a simultaneous mask of scrambled moving point-lights (Cutting, Moore, & Morrison, 1988), but only when the walker is upright (Bertenthal & Pinto, 1990). Bertenthal and his colleagues have found that infants discriminate between an upright and an inverted point-light walker (Bertenthal, Proffitt, & Cutting, 1984), and between an upright and a scrambled walker (Bertenthal, Proffitt, Kramer, & Spetner, 1987), but not between an inverted and a scrambled version (Bertenthal & Pinto, 1990). Moreover, infants can detect the appropriateness of occlusion of the point-lights (Bertenthal, Proffitt, Spetner, & Thomas, 1985) or violations of local rigidity in the walker (Bertenthal, Proffitt, & Kramer, 1987), but only when the walker is presented upright.
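For readers unfamiliar with the paradigm, the following toy generator suggests how such displays can be constructed. It is a gross simplification under my own assumptions (sinusoidal limb swing, arbitrary amplitudes), not Johansson's actual stimulus-generation procedure.

```python
# A toy point-light-walker generator (a gross simplification, not
# Johansson's actual procedure): a handful of "joints" whose static
# positions form an uninformative dot cloud, with identity carried
# only by their relative motion over time.
import math

REST_Y = {"head": 1.7, "shoulder": 1.5, "elbow": 1.2, "wrist": 0.9,
          "hip": 1.0, "knee": 0.5, "ankle": 0.1}
SWING = {"head": 0.00, "shoulder": 0.05, "elbow": 0.15, "wrist": 0.25,
         "hip": 0.05, "knee": 0.20, "ankle": 0.30}   # arbitrary amplitudes

def walker_frame(t, period=1.2, facing=1):
    """Joint (x, y) positions at time t for a treadmill walker facing
    right (facing=1) or left (facing=-1); limbs swing as sinusoids."""
    phase = 2 * math.pi * t / period
    return {joint: (facing * SWING[joint] * math.sin(phase), REST_Y[joint])
            for joint in REST_Y}

# Any single frame is just dots; the percept only emerges across
# frames, e.g. sampled every 50 ms:
frames = [walker_frame(k * 0.05) for k in range(24)]
```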


As far as the in-depth orientation is concerned, it appears, phenomenologically, that human observers rarely experience difficulties in recognizing an object after rotation around its (upright) top-bottom axis. However, more fine-grained behavioural measurements have provided evidence for effects of in-depth rotation. Response latency to match two sequentially presented views of an object is larger when different views are shown than when the same views are presented (e.g., Bartram, 1976, Experiment 1; Ellis & Allport, 1986; Ellis, Allport, Humphreys, & Collis, 1989; but see Bartram, 1976, Experiment 2). Short-term priming effects are also modulated by the orientation of the priming and primed stimulus: When stimulus objects are presented one at a time, and subjects have to classify each individual object into one of two predefined categories, RTs are shorter when the object in the immediately preceding trial has the same identity and orientation as the one in the current trial than when the same object is viewed from a different angle (e.g., Marshall & Walker, 1987; Roberts & Bruce, 1989).

In sum, behavioural evidence indicates that visual recognition is affected by changes in the orientation of the viewed object, in depth as well as in the frontoparallel plane. However, a number of complications somewhat obscure the overall picture. First, not all paradigms are equally relevant for issues of recognition. For instance, the value of evidence from matching tasks for testing models of recognition has been questioned (Biederman & Cooper, 1992c; Cooper, Biederman, & Hummel, this issue). Second, failures to find effects of rotation, even in the frontoparallel plane, in identification or matching tasks have been repeatedly reported (for reviews, see Corballis, 1988; Jolicoeur, 1990). Third, although orientation-dependent short-term priming effects have been documented, some authors have failed to observe orientation dependency in long-term priming effects (Bartram, 1974, Experiment 3; Biederman & Cooper, 1991a, 1991b, Experiment 1; but see Bruce & Valentine, 1985, Experiment 2; Ellis, Young, Flude, & Hay, 1987, Experiment 3). These complications indicate that a serious investment of additional research energy will probably be necessary to unravel the exact contribution of such factors as the nature of the perceptual judgement (e.g., categorization versus naming, see Price & Humphreys, 1989), the complexity of the stimulus objects (Tarr & Pinker, 1990), and the familiarity of the depicted orientations (Edelman & Bülthoff, 1990; Tarr & Pinker, 1989).

A number of studies have already identified familiarity with particular views as a crucial determinant of orientation effects. Practice with familiar mono-oriented objects at unfamiliar orientations attenuates the orientation effects (Jolicoeur, 1985). Some experiments have also employed objects previously unknown to the subjects. Tarr and Pinker (1989), for instance, studied how familiarity with particular orientations in the frontoparallel plane affected naming of novel stick-figure planar objects. Edelman and his colleagues (Edelman & Bülthoff, 1990; Edelman, Bülthoff, & Weinshall, 1989) investigated the role of experience with different in-depth orientations of 3-D stick-figure objects. Rock and DiVita (1987) even demonstrated that, after studying a previously unknown wire object in a particular orientation, it may be almost impossible to recognize the object at a new in-depth orientation.


Note that the observation that rotating a familiar upright mono-oriented object around its top-bottom axis rarely results in a phenomenological perceptual cost might be related to the high familiarity with the different upright views.

Recognition on the basis of a high-level orientation-dependent stimulus description

How, then, could an object recognition device deal with the variable orientation of objects, in depth as well as in the frontoparallel plane? The problems which RBC faces in achieving orientation independence do not rule out in principle the validity of theoretical attempts at achieving general viewpoint invariance prior to indexing: Orientation effects might still be related to difficulties in arriving at a viewpoint-invariant object description, in one way or another. But there is an alternative approach. One could relax the extreme requirement of general viewpoint invariance, move back on the continuum between complete viewpoint invariance and complete viewpoint dependence, and allow some viewpoint-dependent information to be incorporated in the derived stimulus description and the stored object representation. The object's orientation seems the most promising candidate for such an integration. In this account, the solution to the problem of object recognition employed by the animate perceptual system would involve the computation of an object description which is orientation dependent, yet independent of other variations (Humphreys & Quinlan, 1987).

On the one hand, the proposed object description would be high level, i.e., it would display invariance for lower level stimulus variations, such as variations due to illumination, and for effects of object displacements. As far as position invariance is concerned, it is significant that Rock and DiVita's (1987) subjects were perfectly able to recognize a novel wire object after it had been displaced so that the shape of the retinal projection remained the same but appeared at a different retinal location (which is equivalent to the effect of translation in the frontoparallel plane under orthographic projection; but see O'Regan, in press). Also, neurophysiological evidence (Mishkin, Ungerleider, & Macko, 1983; but see Ettlinger, 1990) is accumulating for the existence of two separate systems in the primate brain, one dealing with the shape of objects (the "what" system) and another with their location (the "where" system). Note that the success with which abstraction can be made of position variations is limited by basic features of the perceptual system itself. For instance, the achievement of invariance for position in the visual field is determined by the angle of resolution required to resolve particular stimulus information, and the acuity available at a certain eccentricity.

On the other hand, it is suggested that orientation dependency of the object description lies at the heart of the solution. This implies that recognition does not amount to the identification of an object which accidentally happens to be in a particular orientation, but rather that one always recognizes a view of an object. In fact, in contrast to the case of position, one can barely envisage how the identity of a complex object could be processed independently of its 3-D orientation, on the basis of a 2-D input. Of course, under some circumstances it may be possible and functionally useful to exploit diagnostic information which is relatively independent of the viewpoint, such as texture or colour, or even orientation-invariant local shape attributes, but this is probably not sufficient for full-fledged recognition, at least not of the complex type of objects we continually encounter in daily life. The system should also be able to determine that two views of an object are views of the same 3-D object, and in this manner needs to transcend the viewpoint. This, however, might take place after recognition (e.g., by representing the association between different orientation-dependent object descriptions, in one way or another). I will come back to these issues in the concluding section. Let us first concentrate on the available evidence.
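A minimal sketch of this alternative scheme, under my own assumptions about the data structures, might look as follows: the stored model is a family of orientation-indexed descriptions, matching yields an (object, view) pair, and the knowledge that two views belong to the same 3-D object is represented by associations between stored views, after recognition.

```python
# A minimal sketch (hypothetical data structures, not a published
# model) of the alternative scheme: stored models are families of
# orientation-indexed descriptions.
STORE = {
    ("walker", "facing left"): frozenset({"leftward profile"}),
    ("walker", "facing right"): frozenset({"rightward profile"}),
}

def recognize(description):
    """Match a high-level (illumination- and position-invariant, but
    orientation-dependent) description against stored views."""
    for (obj, view), model in STORE.items():
        if model <= description:
            return obj, view        # identity and orientation together
    return None

# That two views belong to one 3-D object can be represented by
# associations *between* stored views, established after recognition:
SAME_OBJECT = {
    ("walker", "facing left"): {("walker", "facing right")},
}
```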


Neurophysiological evidence for a high-level orientation-dependent level of representation

Some remarkable neurophysiological findings do indeed point in the direction of higher level orientation-dependent object representations in vision. Two decades ago, Charles G. Gross and his colleagues discovered cells in the inferotemporal cortex of anaesthetized macaque monkeys which respond selectively to the visual presentation of fairly complex objects such as faces and hands (Gross, Rocha-Miranda, & Bender, 1972). Subsequent single-cell recording studies, investigating the response characteristics of these cells in more detail, have produced quite impressive data. The most famous discovery is that of the so-called "face" cells (e.g., Bruce, Desimone, & Gross, 1981; Perrett, Rolls, & Caan, 1982; Perrett, Smith, Mistlin, et al., 1985; Perrett et al., 1991; Rolls et al., 1985, 1987; Yamane et al., 1988), which fire preferentially to a view of a (human or monkey) head. The cells vary in their selectivity to the type of stimulus object, some responding to all faces, others in varying degrees to different individuals (Baylis, Rolls, & Leonard, 1985; Hasselmo, Rolls, & Baylis, 1989; Perrett et al., 1984).

But most interesting for the problem of object constancy is the pattern of generalization versus selectivity for particular viewing conditions. First, the cells show considerable generalization across changes in contrast and colour (Perrett, Rolls, & Caan, 1982; Rolls & Baylis, 1986). Second, the cells have very large receptive fields, up to 20° (Bruce et al., 1981; Desimone et al., 1984; Perrett & Mistlin, 1990). This indicates that the cells are relatively invariant for changes in position in the picture plane, which contrasts with the retinotopic organization of earlier cortical structures. The response pattern also shows substantial tolerance for changes in position in depth, since size and viewing distance have only small effects (Perrett et al., 1982; Perrett, Smith, Potter, et al., 1985; Rolls & Baylis, 1986).


Finally, the cells even generalize over isomorphic rotation of the stimulus object in the frontoparallel plane (Perrett et al., 1982; Perrett, Smith, Potter, et al., 1985). Note that the latter finding seems at odds with the orientation dependence in human perception of complex objects such as faces and bodies. However, due to their arboreal life, monkeys more often view objects upside down than humans do. Indeed, behavioural studies with monkeys have also failed to establish inversion effects in face perception (Bruce, 1982; Dittrich, 1990). So, the paradox between the effect of image-plane rotations in human observers and the absence of such an effect in nonhuman primates can be related to the degree of experience with this type of rotation. A nice piece of evidence for this speculation is the fact that in sheep, which like humans lack this experience, only cells that respond to upright faces have been localized (Kendrick & Baldwin, 1987). It would seem, then, that because monkeys themselves are "poly-oriented observers" they deal with frontoparallel rotations in a qualitatively different manner than humans, who are more or less "mono-oriented observers". This conclusion must be weakened, however. Although all recorded cells responsive to a view of the head continued to respond after an isomorphic rotation in the study of Perrett, Smith, Potter, et al. (1985), 15 cells (in a sample of 26 view-selective neurons) had a slightly longer response latency (varying from 10 to 60 ms) to inverted views than to upright views. Moreover, by increasing the need for configural processing, Perrett et al. (1988) were able to establish a behavioural orientation effect in two monkeys.

The overall high tolerance for several stimulus variations contrasts sharply with the striking specificity for particular views. Instead of showing generalization across anisomorphic rotation, the majority of the face-selective cells are maximally responsive to a head in a particular in-depth orientation (Perrett, Smith, Potter, et al., 1985; Perrett et al., 1991). For instance, Perrett, Smith, Potter, et al. (1985) report that the responses of most cells which are sensitive to a full face are reduced substantially by turning the full face 45° towards profile or by rotating it up or down by the same amount. The most impressive finding in this respect is the discovery of cells which respond to a right-facing profile, but not to a left-facing profile, or vice versa. This is remarkable in view of the fact that the projections of a right-facing and a left-facing profile differ only by a mirror reflection.

A minority of face-selective neurons have also been found which respond to all static views of a head, apparently exhibiting viewpoint-independent coding of the stimulus object (Perrett, Smith, Potter, et al., 1985; Perrett et al., 1991). Perrett and his colleagues have suggested that this response pattern could be the result of combining the output of several orientation-sensitive neurons, each tuned to a particular view of the head (Perrett et al., 1989). The finding that view-selective cells have slightly shorter response latencies than viewpoint-independent cells (Perrett et al., 1991) has been interpreted as support for this hypothesis.
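That suggestion can be expressed in a few lines. The sketch below is my own illustration, with hypothetical tuning widths and a simple summation rule: a bank of view-tuned units, each broadly tuned to one head orientation, feeds a pooling unit whose response is roughly constant across views but is necessarily computed one stage later.

```python
# A sketch (mine, after Perrett et al.'s suggestion) of pooling
# view-tuned units into a view-invariant unit. Tuning widths and the
# summation rule are illustrative assumptions.
import math

PREFERRED_VIEWS = [0, 45, 90, 135, 180, 225, 270, 315]  # head azimuth, deg

def view_unit(angle, preferred, sigma=30.0):
    """Gaussian tuning (on the circle) of one view-selective cell."""
    d = abs(angle - preferred) % 360
    d = min(d, 360 - d)
    return math.exp(-(d ** 2) / (2 * sigma ** 2))

def invariant_unit(angle):
    """Pooled response: roughly constant across in-depth orientation,
    but necessarily *after* (hence slower than) the view-tuned stage."""
    return sum(view_unit(angle, p) for p in PREFERRED_VIEWS)

for a in (0, 30, 60, 90):
    print(a, round(view_unit(a, 0), 2), round(invariant_unit(a), 2))
```

The view-tuned response falls off steeply away from the preferred view, while the pooled response stays nearly flat, mirroring the reported contrast between view-selective and viewpoint-independent cells.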


Also, among the orientation-dependent neurons, there is considerable variation in the sharpness of tuning to a particular orientation. The broadly tuned cells might reflect an intermediate stage between orientation-dependent and viewpoint-independent representations (Hasselmo, Rolls, Baylis, & Nalwa, 1989; Perrett et al., 1991).

In sum, the pattern of generalization and specificity of the majority of the face cells suggests that, at some level of processing, the monkey visual system computes a description of the input (1) which makes abstraction of low-level variables having to do with illumination, (2) of the object's position in the image plane or in depth, and (3) possibly also of the object's orientation in the frontoparallel plane, but (4) which is highly dependent on the orientation of the object in depth (for a review, see Perrett et al., 1989).

Interestingly, neurons have been localized in the same region of the cortex which fire to a view of a body movement instead of a static head (Perrett, Harries, et al., 1990; Perrett, Smith, Mistlin, et al., 1985; see Kendrick & Baldwin, 1989, for similar findings in sheep). For movements where the articulation of body parts in relation to one another or to the main torso is a defining factor, neurons displaying object-centered coding were prevalent (Hasselmo, Rolls, Baylis, & Nalwa, 1989; Perrett, Harries, et al., 1990). Other cells, however, were responsive to a body navigating with respect to the viewer, and unresponsive to translating control objects. A large proportion of the latter cells exhibited a higher level orientation-dependent response pattern, analogous to the selectivity of the face cells. For instance, some cells fired upon the presentation of a person walking to the right in a direction orthogonal to the line of sight, but not to a person walking to the left, and vice versa. Perrett, Harries, et al. (1990) have reported that about one third of these view-selective neurons continued to respond when the bodily movements were shown under point-light conditions, again indicating that the orientation-dependent representations are of a higher level nature. The perception of biological motion is also the topic of the next section.

Behavioural evidence for a high-level orientation-dependent level of representation: An example

Recently I performed a series of experiments, to be reported in more detail elsewhere (Verfaillie, 1992), in which behavioural evidence was gathered for the use of a high-level orientation-dependent object representation in the visual processing of complex objects. On each trial a biological motion configuration was presented. Part of the point-light stimuli depicted a regular human walker as used in previous research. All figures were upright and the translational component of walking was zero, so that they appeared to move on a treadmill. The figures were walking in a direction orthogonal to the line of sight.


Half of them were facing to the right with respect to the viewer and half were facing to the left. This resulted in two possible in-depth orientations of a walker, differing from each other by a rotation about the figure's top-bottom axis.²

On other trials, subjects were confronted with a nonhuman point-light walker. These stimuli depicted a point-light figure whose upper body faced in a direction opposite to that of the lower body: The point-lights corresponding to the head and the upper limbs specified a "normal" human upper body facing to the right, and the point-lights of the lower limbs designated a "normal" human lower body facing to the left, or vice versa. In sum, there were also two possible upright in-depth orientations of a nonhuman walker. Note that human and nonhuman configurations could not be distinguished on the basis of local features. Apart from those figures, there were two additional stimulus configurations, portraying one of two "abstract" point-light objects of the same proximal size as the humans and nonhumans, and also moving in a nonrigid manner.

In a serial two-choice RT task, stimuli were presented one at a time in a random order. On each trial the subject decided as rapidly as possible whether a human or a nonhuman figure was shown. In addition, the subject pressed the "human" button when one of the abstract objects was detected and the "nonhuman" button in the case of the other abstract object. The configuration remained on the screen until the subject responded, and the response-stimulus interval was 400 ms in most experiments.

After the subject had completed the task, a data-sort program identified different types of transitions between trials. I will confine myself to the RT to human configurations in one-step transitions (i.e., transitions between two temporally adjacent trials) in which both trials required the same "human" response. The RT to the second stimulus configuration in those transitions (the primed stimulus) was analyzed as a function of its correspondence with the preceding priming configuration. Reliable priming effects were established and replicated over a number of studies, but priming was confined to a particular type of transition. Relative to transitions where the priming and the primed human figures had a different orientation (Mean RT to the primed figure = 550 ms), there was a significant benefit when a human primed configuration was preceded by a human walker in the same orientation (Mean RT = 507 ms). In fact, the RT to the target in the first type of transition did not differ from the

² A left-facing and a right-facing point-light walker differ only by a mirror reflection. In general, a 3-D object and its mirror reflection, forming a so-called enantiomorphic pair, cannot be superimposed by a rotation in three dimensions (although they could be superimposed by a rotation through the fourth dimension).
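A minimal sketch of the one-step transition analysis described above, with hypothetical trial data and function names (this is not the author's data-sort program), might look as follows.

```python
# A sketch of the one-step transition analysis, with hypothetical
# trial data (this is not the author's data-sort program): collect RTs
# to "human" targets as a function of the immediately preceding trial.
from statistics import mean

# Each trial: (category, in-depth orientation, reaction time in ms)
trials = [
    ("human", "left", 512),
    ("human", "left", 503),      # preceded by same orientation
    ("human", "right", 551),     # preceded by different orientation
    ("nonhuman", "left", 601),
    ("human", "right", 530),     # preceded by a nonhuman: not analyzed
    ("human", "right", 506),     # preceded by same orientation
]

same_rt, diff_rt = [], []
for prev, cur in zip(trials, trials[1:]):         # one-step transitions
    if prev[0] == "human" and cur[0] == "human":  # same "human" response
        (same_rt if prev[1] == cur[1] else diff_rt).append(cur[2])

print("same orientation:", mean(same_rt), "ms")       # 504.5 ms
print("different orientation:", mean(diff_rt), "ms")  # 551 ms
```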
