Perception, 1992, volume 21, pages 481-496

Seeing lumps, sticks, and slabs in silhouettes

John Willats
The University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
Received 3 May 1990, in revised form 9 October 1991

Abstract. Marr has suggested that we see three-dimensional (3-D) shapes in silhouettes because we make the implicit assumption that the viewed shapes are generalized cones. One difficulty with this suggestion is that it cannot deal with silhouettes of irregular 3-D shapes like clouds and trees; another is that it only applies to generalized cones with a relatively high length:width ratio. An alternative explanation, suggested by evidence from cross-cultural studies of language, from children's early speech, and from children's early drawings, is that the scene primitives actually used by humans are not generalized cones but 'lumps', 'sticks', and 'slabs', that is, primitives whose only shape properties are their relative extensions in 3-D space. In this paper it is proposed that the implicit assumption we make in interpreting silhouettes is that the extendedness of the silhouette reflects the extendedness of the viewed shape, so that a round region is seen as a lump and a long region is seen as a stick; and that such views seem "natural" because they are the views most likely to be encountered in normal environments. This account is more general than that of Marr because it explains how we interpret silhouettes of all kinds of 3-D shapes, even very irregular ones. Unlike Marr's account, it also deals with flat shapes like slabs and discs, and shows why it is difficult to see these shapes in silhouettes.

1 Introduction

Marr (1977, 1982) pointed out that we can see more than we ought to in pictures which are made up of silhouettes, such as Picasso's Rites of Spring (figure 1). Although in theory any silhouette could be generated by an infinite variety of three-dimensional (3-D) shapes, in practice we interpret silhouettes in terms of very particular shapes. Moreover, when silhouettes are combined in appropriate ways we may also see them as depicting particular objects, so that round regions are seen as features like heads, and long regions are seen as arms and legs. In most cases this interpretation is not just a matter of working out intellectually what the silhouettes mean; we do actually see the shapes of these objects in the picture. Wollheim (1977) calls this effect "seeing in"; for at least "central cases of representation" we both recognize the object depicted, and we see the three-dimensional shape of the object in the picture. By using the phrase "central cases of representation" Wollheim means, I take it, to exclude representations based on noniconic symbol systems, such as diagrams.

With line drawings this effect can be extremely powerful. Figure 2 shows two views of a cube with a smaller cube removed from one corner. With figure 2a the effect of "seeing in" is immediate: it is impossible to see the 3-D shape of the object depicted in any other way. With figure 2b the effect is powerful but ambiguous. With an effort of will one can change the interpretation: the picture can also be seen as a cube with a smaller cube added on to it at an unusual angle, or as a small cube stuck up into the corner of a ceiling.
Requests for reprints should be sent to: Woolley Hill Coach House, 18B Woolley Street, Bradford-on-Avon, Wilts, BA15 1AE, UK.

Line drawings of this kind can be analysed automatically, without appealing to human intuitions (Clowes 1971; Huffman 1971; Waltz 1975); this kind of analysis reveals all the possible interpretations of ambiguous drawings such as that shown in figure 2b and also shows why the drawing in figure 2a
is unambiguous. Because our own intuitive responses to such drawings are in accord with the results of the analysis, the implication is that the human visual system might use similar algorithms in the interpretation of line drawings. The question Marr asked was, how do we see shapes in silhouettes? As Marr acknowledged, part of the effect may result from familiarity with the depicted objects; "but not all of it, because one can use the medium of a silhouette to convey a new shape, and because even with considerable effort it is difficult to imagine the more bizarre three-dimensional surfaces that could have given rise to the same silhouettes" (Marr 1977, page 441). Marr's answer was that the implicit assumption we make in interpreting silhouettes is that the convexities and concavities of the outline reflect real properties of the viewed surface, rather than being artefacts of perspective. He then goes on to show that provided the object is seen from what he called a "favourable" view, so that strongly foreshortened views are avoided, this is equivalent to assuming that the viewed surface is a generalized cone.

Figure 1. Picasso's Rites of Spring, 1959. (Copyright SPADEM, Paris/VAGA, New York 1981.) Taken from Marr (1977, figure 1). The silhouettes at the ends of the arms of the dancing figure are probably intended to represent tambourines, ie flat, drum-like discs with a width:thickness ratio of about 5:1.

Figure 2. "Seeing in": two views of a cube with a smaller cube removed from one corner. The figure in (a) is unambiguous, but that in (b) is ambiguous; with an effort of will it can also be seen as a cube with a small cube sticking out of one corner at an unusual angle, or as a small cube stuck up into the corner of a ceiling.


Marr defines a generalized cone as "the surface swept out by moving a cross-section of fixed shape but smoothly varying size, along an axis" (Marr 1977, pages 442 and 443). Thus objects such as cylinders, cones, pots, eggs, and dumbbells are all examples of generalized cones. In a well-known diagram Marr and Nishihara (1978, figure 8) and Marr (1982, figure 5-10) showed how representations of complex objects, such as people and animals, could be built up out of cylinders (one example of a generalized cone). Different parts of the figures are represented by cylinders having different length:width ratios, so that the legs of a giraffe are represented by long thin cylinders, and the head of a person by a short fat cylinder. An important feature of Marr's analysis of occluding contours was that it led easily to this kind of 3-D model.

By "silhouettes" Marr means drawings consisting only of outlines. These outlines enclose uniform regions (often black on a white ground) without any internal structure. In Marr's analysis the convexities and concavities of the outline of the silhouette are assumed to reflect real properties of the viewed surface. His analysis thus amounts to mapping one-dimensional picture primitives (lines) back into 2-D scene primitives (surfaces). In contrast, the analysis described below is based on regions as picture primitives, and shows how these regions can be mapped back into volumes as scene primitives.

Regions and volumes, as distinct from lines and surfaces, have very few shape properties. Of these the most basic is what Denny (1978) called "extendedness": saliency or emphasis of extension in zero, one, or two dimensions in the case of regions, and in zero, one, two, or three dimensions in the case of volumes. In the analysis of silhouettes proposed here the underlying assumption is that the extendedness of a silhouette reflects the extendedness of the 3-D shape from which it has been derived.
In the case of silhouettes of what I shall call 'lumps' and 'sticks', that is, 3-D shapes which are saliently extended in three dimensions and one dimension respectively, the analysis described below leads to results similar to those obtained by Marr: round regions are interpreted as lumps (short generalized cones, such as the heads in Marr's illustrations of the human figure), and long regions are interpreted as sticks (generalized cones whose length is substantially greater than their width, such as arms and legs). Similarly, this analysis, like Marr's, shows why it is difficult to see foreshortened sticks (or foreshortened generalized cones) in silhouettes. This account is more general than Marr's, however, because it is exhaustive, and can deal with all kinds of shapes, even very irregular ones. In addition, this analysis leads to a further and perhaps rather surprising result, which Marr's account does not explain: that it ought to be difficult to see 'slabs' (3-D shapes which are saliently extended in two directions) in silhouettes. This result seems to agree with our intuitions.

2 Representing slabs in pictures

At the ends of the arms of the dancing man in Picasso's picture there are two regions, one elongated and the other almost round. From the nature of the subject matter of the picture(1) it seems probable that these regions are intended to represent tambourines, ie flat, drum-like discs with an aspect ratio (thickness:width) of about 1:5.
(1) Cf Nicholas Poussin's The Triumph of Pan (London: National Gallery) in which a similar figure is shown playing two tambourines.

The outlines of these regions, like those in the rest of the picture, are irregular; but these irregularities seem to be the result of accidents of brushwork rather than reflecting details of the contours of the viewed shape, so that sharp (L-shaped) discontinuities in the outline cannot be used to convey information about details of the shape of a disc, such as the shape of the rim where the top planar surface of a disc meets the curved periphery
(Koenderink 1984). Thus, only the overall shapes of these regions (their extendedness, as I shall call it) can carry useful information about the shapes of the objects they depict.

Both these regions have two rough axes of symmetry. Marr's analysis depends on finding the axis of symmetry of a silhouette from its contours and assuming that this corresponds to the axis of rotation of a generalized cone. Where only one axis of symmetry exists, the axis of the corresponding generalized cone is uniquely determined (Marr 1977, page 451). But where there is more than one axis of symmetry one might expect that Marr's analysis would lead to the extraction of more than one generalized cone; and in this case one would expect the silhouette to be reversible, like the reversible line-drawing shown in figure 2b.

Figure 3 shows four possible interpretations of the two regions in figure 1. It ought to be possible to interpret the long region (shown in figure 3a) either as an elongated cylinder or stick (figure 3b) or as a short cylinder or disc (figure 3c), depending on which axis of symmetry is chosen to be the axis of a generalized cone. Similarly, it ought to be possible to interpret the round region (figure 3d) as either a fat cylinder or lump (figure 3e), or as another disc (figure 3f). From Marr's analysis one would expect all these interpretations to be available in the silhouettes; and given prior knowledge that these silhouettes are intended to represent discs, in the form of tambourines, one might expect this interpretation to predominate.

In practice, the interpretations shown in figures 3c and 3f do not seem to be readily available in these silhouettes. Even with an effort of will, I find it quite hard to see the regions at the ends of the arms as discs. Why is this the case, and why in general do we see lumps and sticks in silhouettes, but not slabs?
Figure 3. Four possible interpretations which ought to be available in the two regions representing tambourines in Picasso's Rites of Spring. It ought to be possible to interpret the long region (a) as either an elongated cylinder or stick (b) or as a short cylinder or disc (c). Similarly, it ought to be possible to interpret the round region (d) as either a fat cylinder or lump (e) or as another disc (f). In practice, it is difficult to see a disc in either of these regions.

The reason seems to be that a round region is a natural symbol for a lump because a lump is round in 3-D space, and the projection of a lump is a round region from all directions of view. Similarly, a long region is a natural symbol for a stick because a stick is long in 3-D space, and the projection of a stick is a long region for all directions of view except the unlikely end-on view. In contrast, there is no single natural symbol for a slab, because slabs are both round and flat in 3-D space, and project about equal numbers of long and round regions over the viewing sphere. It follows that in a world containing equal numbers of lumps, sticks, and slabs a round region is always more likely to be the projection of a lump than the projection of a slab; and a long region is always more likely to be the projection of a stick than a
projection of a slab. Thus regions in silhouettes, whatever the shapes from which they have been derived, will usually be interpreted either as lumps or as sticks rather than as slabs.

3 Extendedness

Extendedness is perhaps the most basic of all shape properties. In linguistics, where the term originated, 3-D objects are classed as "nonextended" if they are not saliently extended in any particular direction, and "extended" if they are saliently extended in one or two dimensions. However, the term nonextended, meaning 'about equally extended in all three directions in space' for a 3-D shape, or 'extended in both directions on the picture surface' for a region would be rather misleading. I shall therefore avoid the use of the term nonextended, and use either the informal terms round, flat, and long or, where necessary, descriptions based on the notational system described below.

In the analysis of pictures it is crucial to use a different vocabulary for scene primitives and for picture primitives. A notational system for scene and picture primitives which makes this distinction clear is described in Willats (1985). In this scheme the numbers 0, 1, 2, and 3 are used to stand for the dimensional index of a primitive: that is, the number of dimensions within which it can potentially be extended. In addition, the subscripts 1 and 0 are used to denote extension or lack of extension within a particular dimension. Regions may be described as saliently extended in both directions ($2_{11}$ regions or round regions), saliently extended in one direction but not the other ($2_{10}$ regions or long regions), or not extended in any direction ($2_{00}$ regions or dots). Similarly, three-dimensional shapes may be classed as saliently extended in three directions ($3_{111}$ volumes or lumps), saliently extended in two directions but not the third ($3_{110}$ volumes or slabs), saliently extended in one direction ($3_{100}$ volumes or sticks) or not extended in any direction ($3_{000}$ volumes or beads).
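As a rough illustration of this notation, the following Python sketch stores a primitive as a dimensional index plus one extension flag per dimension. The class name `Primitive`, the plain-text label form `3_110`, and the helper `informal_name` are assumptions introduced here for illustration; they are not part of the scheme described in Willats (1985).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Primitive:
    """A scene or picture primitive: dimensional index plus extension flags.

    Hypothetical encoding: `index` is the dimensional index (0 to 3) and
    `extension` holds one flag per dimension (1 = saliently extended,
    0 = not saliently extended).
    """
    index: int
    extension: Tuple[int, ...]

    def __post_init__(self):
        # Validate only; nothing is assigned, so the frozen dataclass is safe.
        if len(self.extension) != self.index:
            raise ValueError("need exactly one extension flag per dimension")
        if any(flag not in (0, 1) for flag in self.extension):
            raise ValueError("extension flags must be 0 or 1")

    def label(self) -> str:
        """Plain-text form of the notation, e.g. '3_110' for a slab."""
        return f"{self.index}_{''.join(str(f) for f in self.extension)}"


# Informal names used in the text, keyed by the plain-text label.
INFORMAL_NAMES = {
    "2_11": "round region", "2_10": "long region", "2_00": "dot",
    "3_111": "lump", "3_110": "slab", "3_100": "stick", "3_000": "bead",
}


def informal_name(p: Primitive) -> str:
    """Look up the informal name, ignoring the order of the flags."""
    ordered = tuple(sorted(p.extension, reverse=True))
    return INFORMAL_NAMES.get(Primitive(p.index, ordered).label(), "unnamed")
```

For example, `informal_name(Primitive(3, (0, 1, 1)))` gives `"slab"`, because the order in which the extended directions are listed is treated as irrelevant in this sketch.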
The extendedness principle [the term was first used by Denny (1978)] forms the basis for what Rosch (1973) calls a "real", rather than an artificial, system for describing shape. Real categories, Rosch argues, are highly structured internally, and do not have well-defined boundaries. Such categories are composed of a "core meaning" consisting of the "clearest cases", "surrounded" by other cases decreasing in similarity to the core meaning.(2) Rosch gives examples from experiments with colour perception and shows that there are "natural prototypes" such as "pure blue", which shades off into green on one side and purple on the other. In the same way, slabs shade off into lumps on the one hand and sticks on the other, without any very definite boundaries between them. Rosch (1973) argues that the core meaning of categories of this kind "is not arbitrary but is 'given' by the human perceptual system; thus, the basic content as well as structure of such categories, is universal across languages" (page 112).

(2) An example relevant to this discussion would be the method used in the British army for classifying trees when reporting on a landscape:
"And at least you know
That maps are of time, not place, so far as the army
Happens to be concerned – the reason being,
Is one which need not delay us. Again you know
There are three kinds of tree, three only, the fir and the poplar,
And those that have bushy tops to ...". (Reed 1956, page 239)
"Poplar" trees are a good example of one of Rosch's "clearest cases", used by the army to define the category 'extended trees'. "Fir" trees are presumably bushy trees (or perhaps poplar trees) with the addition of the shape modifier 'being pointed'.


This certainly seems to be true of extendedness. Both Clark (1976) and Denny (1978) cite a survey of thirty-seven Asian languages made by Adams and Conklin (1973) in which they speak of "round", "flat", and "long" as being basic semantic primes. Denny describes the extendedness variable in relation to Tarascan, Algonquian languages, Athapaskan languages, Eskimo, Toba, Bantu languages, Ponapean, Tzeltal, Trukese, Malay, Iban, Burmese, and Gilbertese, and concludes that "it seems reasonable to say that extendedness is a widespread and perhaps universal principle of noun classification" (Denny 1978, page 101). English has a relatively impoverished system of classifiers, although words like "head" and "sheet" in "two hundred head of cattle" and "fifty sheets of paper" act rather like classifiers (Clark 1976). In the same way, we do not speak just of "a wax", but "a lump of wax", "a stick of wax", or "a slab of wax". However, Clark (1976) has pointed out that children (including English-speaking children) often overextend the meaning of their first words so that they act like classifiers. For example, a child might use a word like baw (ball) to mean any more or less round shape such as an apple or a bell-clapper, or tee (stick) to mean any more or less long shape such as an umbrella or a ruler. In addition, the representation of extendedness seems to play an important role in children's early drawings. In their first drawings of people, the so-called tadpole figures, most children use lines or long regions to represent long volumes such as arms and legs, and round regions to represent round volumes such as the head or head/body. Stern (1930) called these marks "natural" symbols: "We call these symbols 'natural' because their meaning does not first require to be learnt (as in the case of letters or mathematical signs) but directly occur to the child, and are used by him as a matter of course. 
Thus a long stroke is a natural symbol for an arm or leg, a small circle for an eye or head." (pp 369-370)

This suggests that, at the very simplest level, children regard the human figure as a lump with sticks projecting from it, and they use the extendedness of the marks in their drawings as a way of representing the extendedness of their internal shape descriptions of the various features of the human figure. At a somewhat later stage children begin to depict flat shapes like the palms of the hand or the brim of a hat; but children do not seem to have natural symbols for flat shapes, as they do for lumps and sticks (Willats 1987, 1992).

There thus seems to be good evidence from cross-cultural studies of language, from children's early speech, and from children's early drawings, that round, long, and, to a lesser extent, flat are widespread natural categories.

It is important to realize that extendedness is not just a way of classifying smooth, regular shapes. A round object such as a ball can be classed as a $3_{111}$ shape, but so can cubic objects such as houses, or irregular objects such as stones and fruit:

"For example ko (3D) [in Japanese] cannot mean 'round' since it is used for square-cornered objects such as a child's block and an empty box, as well as rounded objects. Also mai (2D) would be the correct classifier for clothing whether it is flat or wrinkled." (Denny 1979a, page 321)

"As is so often the case with Amerindian morphemes, one gradually realizes how abstract the meaning of these rounded-shape roots are. They apply to all the many irregularly rounded objects of the natural world such as stones and fruit as well as to the regular, geometric, rounded shapes typically produced by man. If the latter are to be differentiated [in Ojibway] they can be indicated by the preverb weweni (truly) so that the shape of a coin could be described as weweni-wdwiyeya (it is truly round = it is circular)." (Denny 1979b, page 26)


Similarly, children's early drawings of cubes and squares may consist of a single more or less circular closed form; in pictures of this kind the only shape property of the object represented in the drawing is its extendedness (Caron-Pargue 1985; Piaget and Inhelder 1956). Thus at the most basic level all objects in the physical universe can be described in terms of their extendedness alone. This is important because Marr himself pointed out that surfaces exist "that cannot conveniently be approximated by generalized cones, for example a cake that has been cut at its intersection with some arbitrary plane, or the surface formed by a crumpled newspaper" (Marr 1978, page 73). Hoffman and Richards (1984) also criticized Marr's scheme on the grounds that it cannot handle many classes of shapes such as faces, shoes, clouds, and trees. None of these cases would present problems for a scheme based on the extendedness principle, provided we are satisfied with a sufficiently coarse level of description.

4 Assumptions in interpreting silhouettes

Marr's analysis depends on a number of assumptions which are expressed as restrictions on the form of the depicted surface or its image, and the directions from which the surface is viewed. Marr's first restriction (R1) is that the surface is smooth. As we have seen, the extendedness of a volume may be given independently of this kind of surface property, so this restriction is not needed in the following analysis. In this analysis what counts is the extendedness of the region forming the silhouette, rather than the details of any convexities and concavities in the boundary of the silhouette. Marr's restrictions R2 and R3 amount to saying that convexities and concavities in the boundaries of the silhouette reflect real properties of the viewed surface.
In fact, the silhouettes in the Rites of Spring, as well as containing minor irregularities resulting from the accidents of brushwork, contain a number of sharp concavities or cusps which do not reflect the surface properties of individual generalized cones. Some of these occur where one cone joins another, such as the cusps where the arms of the dancing figure join the body; the second part of Marr's paper deals with instances of this kind and shows how silhouettes can be decomposed into representations of two or more cones. Other cusps, such as the one between the legs of the seated figure, arise where one form overlaps another. At a sufficiently coarse level of description, convexities and concavities in silhouettes and 3-D shapes are irrelevant to their extendedness: thus the silhouette of the seated figure as a whole can be described as extended, leading to its interpretation as an extended volume.(3) To obtain a finer level of shape description the silhouette would have to be decomposed into a number of 'long' and 'round' regions, corresponding in the Rites of Spring to the heads, arms, legs, etc; and the following analysis, like Marr's, assumes that such decompositions can be achieved in the interpretation of complex pictures.

In addition, Marr's analysis is only able to deal with views of generalized cones seen from what he calls "favourable" views: it is not intended to apply to objects seen end-on, like the top view of a bucket which Marr and Nishihara (1978) give as an example, and which Warrington and Taylor (1973) call "unconventional". There seem to be two separate reasons for this restriction. The first is that it is difficult to extract the axis from a foreshortened view of an object like a bucket, because the projection of the axis is so short that it is hard to find the axis of symmetry of the silhouette.
(3) Corresponding to the representation of a figure by a single cylinder in Marr and Nishihara (1978, figure 8).

The other reason is that a strongly foreshortened view disguises the variations in cross section along the length of the cone: "The skeleton ceases to be a reasonable approximation to the contours that occur in the image whenever the viewing angle is such as to make the projection of the length of the cone less than the orthogonal
projection of its width. For such views, the methods of this article will fail" (Marr 1977, page 453). Notice, however, that this restriction would, apparently, prevent the analysis of the contours of a disc, because the projection of the axis of a disc will always be less than the orthogonal projection of its width, even if the viewing direction is not foreshortened. Perhaps this is one reason why Marr did not include discs in his list of primitives at the 3-D model level. Richards et al (1985) proposed an alternative method for inferring 3-D shape from the 2-D curves, which they called codons, that define the outlines of silhouettes; and like Marr they express an objection to views that disguise undulations in the contour: "We begin by examining the simple outlines of figure 2, the 'ellipse', 'peanut' and 'dumbbell' [...]. The simplest of these three outlines is the ellipse, which we naturally interpret as the silhouette of an ellipsoid or 'egg'. But why? If the outline is a special view of an object, such as the 'end-on' view of the dumbbell or peanut, we could be fooled. Our interpretation thus assumes that our view is such that none of the bumps or dents of the object are occluded or invisible." (Richards et al 1985, pages 5-6)

These considerations led Richards and coworkers to distinguish between what they called "generic" and "canonical" views. A generic view is one for which a slight shift in the viewpoint does not "change the topology of the viewed surface". A canonical view is a special case of a generic view, whose silhouette is smooth (without cusps) and which reveals all the undulations of the surface. An end-on view is not just misleading as to surface undulations, however. A top view of a peanut (though not a dumbbell) would disguise the surface undulations, but would still show that the peanut was long rather than round. Conversely, an end-on view of an ellipsoid would be both generic and canonical, but would still be misleading, because the silhouette would be round rather than long.

5 Representative views

Existing accounts of how we infer 3-D shapes from silhouettes thus rely on the assumption that undulations in the boundaries of the silhouette reflect corresponding undulations in the viewed surface. This leads to the assumption that the silhouette does not show an "unfavourable" or "unconventional" or "special" view of the object which disguises the convexities and concavities of the viewed surface. The following analysis relies on the alternative assumption that the extendedness of the silhouette reflects the extendedness of the viewed shape. This in turn relies on the assumption that the silhouette does not show what I shall call an 'unrepresentative' view which disguises the extendedness of the object. I shall call 'representative' those views which do reflect the extendedness of an object:

Definition D1: A representative view (defined as a $2_{11}$ or a $2_{10}$ region) is one which reflects the extendedness of the viewed 3-D shape.

According to this definition, the representative view of a round shape or 'lump' is a round region:

Corollary C1: The representative view of a $3_{111}$ volume is a $2_{11}$ region.
Similarly, the representative view of a long shape or 'stick' is a long region:

Corollary C2: The representative view of a $3_{100}$ volume is a $2_{10}$ region.

However, there do not seem to be representative views for flat shapes or 'slabs': a side-on view of a slab disguises the fact that it is round, and a top view disguises the fact that it is flat.

Corollary C3: There is no representative view of a $3_{110}$ volume.


6 'Likely' and 'unlikely' views

Although a round region is possible as a view of a long object it would seem unnatural as a representation. As Stern (1930) said, a long stroke is a "natural" symbol for an arm or a leg; and as Richards et al (1985) say, we "naturally" interpret an extended region, such as an ellipse, as the silhouette of an extended object like an ellipsoid or an egg. How do such "natural" associations come about? Presumably as the result of an association between the shape of an object and the view of that object which is most frequently encountered. I shall call the view which an organism encounters most frequently in its environment a 'likely' view:

Definition D2: A likely view is that view (defined as a $2_{11}$ or a $2_{10}$ region) of a 3-D shape which an organism encounters most frequently in its environment.

Because the projection of a lump is always a round region, this will be the likely view under all circumstances:

Corollary C4: The likely view of a $3_{111}$ volume is a $2_{11}$ region.

Whether the likely view of a stick will be a side-on or an end-on view (ie a $2_{10}$ region or a small $2_{11}$ region) will depend on environmental factors, since it is possible to imagine bizarre environments in which an organism will only encounter extended shapes end-on. Given the restriction that all directions of view are equally likely, however, the projection of a stick in an end-on position may plausibly be described as an unlikely view, because this particular view, whose silhouette is qualitatively different from the silhouettes corresponding to all the other views, is unlikely to be encountered very frequently. Figure 4 shows the silhouettes of a stick resulting from forty-one viewing directions spaced at roughly equal angles in azimuth and elevation over the viewing hemisphere. Subjectively, only one of these silhouettes is round; the rest are long.

Figure 4. Forty-one views of a stick spaced at roughly equal intervals over the viewing hemisphere. The dotted lines separate long regions from round regions. There are forty long regions but only one round region.

Restriction R1: All directions of view are equally probable.

Corollary C5: Given R1, the likely view of a $3_{100}$ volume is a $2_{10}$ region.


However, there does not seem to be any one likely view for a slab. Figure 5 shows silhouettes of a 'slab' corresponding to forty-one viewing directions, and it is apparent that there are about equal numbers of long and round regions. Consequently, we cannot say that either a long region or a round region is 'more likely' as a view of a slab.

Figure 5. Forty-one views of a 'slab' spaced at roughly equal intervals over the viewing hemisphere. The dotted lines separate long regions from round regions. There are twenty-three long regions and eighteen round regions.
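The counts in figures 4 and 5 rest on subjective judgements of drawn silhouettes. A toy version of the same exercise can be sketched in Python under several assumptions that are not in the text: sticks, slabs, and lumps are modelled as cylinders under orthographic projection; the viewing grid is one end-on view plus five rings of eight azimuths (forty-one views in all); and a silhouette counts as 'long' when the aspect ratio of its bounding box is at least 1.5. The function names and the proportions chosen for each shape are hypothetical.

```python
import math


def silhouette_aspect(length: float, diameter: float, polar_deg: float) -> float:
    """Long-side / short-side ratio of the bounding box of a cylinder's
    orthographic silhouette, viewed at polar_deg degrees from its axis."""
    theta = math.radians(polar_deg)
    # Extent along the projected axis: foreshortened length plus the
    # foreshortened end discs. Extent across the axis: the diameter.
    along_axis = length * math.sin(theta) + diameter * abs(math.cos(theta))
    across_axis = diameter
    return max(along_axis, across_axis) / min(along_axis, across_axis)


def view_angles(rings: int = 5, per_ring: int = 8):
    """Polar angles for one end-on view plus `rings` rings of `per_ring`
    azimuths each, ending side-on at 90 degrees. A cylinder is symmetric
    about its axis, so azimuth only contributes repeated views."""
    angles = [0.0]
    for i in range(1, rings + 1):
        angles.extend([90.0 * i / rings] * per_ring)
    return angles


def count_views(length: float, diameter: float, threshold: float = 1.5):
    """Count long and round silhouettes over the assumed viewing grid."""
    counts = {"long": 0, "round": 0}
    for polar in view_angles():
        aspect = silhouette_aspect(length, diameter, polar)
        counts["long" if aspect >= threshold else "round"] += 1
    return counts


# Assumed proportions: a 10:1 stick, a 1:5 slab (thickness:width), a 1:1 lump.
stick = count_views(length=10.0, diameter=1.0)
slab = count_views(length=1.0, diameter=5.0)
lump = count_views(length=1.0, diameter=1.0)
print(stick, slab, lump)
```

With these assumed values the stick yields forty long regions and one round region, and the lump yields only round regions, while the slab's views are split between the two types. The exact split for the slab depends on the assumed proportions and threshold, and need not reproduce the counts given for figure 5.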

Corollary C6: Given R1, there is no likely view for a $3_{110}$ volume.

Although it seems natural to us as humans that they should do so, the representative and likely views of an object do not necessarily coincide, and it is possible to imagine unusual environments in which they might not. However, inspection of C1 to C6 shows that representative and likely views coincide if all the directions from which an object might be seen are equally probable:

Corollary C7: Given R1, the representative and likely views of all objects coincide.

This establishes a connection between the formal relations between an object and its silhouette, and the environmental factors which might make such a connection seem perceptually natural. This in turn suggests how such a connection might become established in the course of evolutionary development.

The restriction that all directions of view are equally probable is motivated by ecological considerations rather than by mathematical ones. Any organism whose well-being depends on distinguishing between lumps and sticks, and whose visual system consists only of the ability to distinguish between round regions and long regions in silhouettes projected onto a single retina, will only be able to learn to associate round regions with lumps and long regions with sticks with any reliability if end-on views of sticks are unlikely: that is, if they are only encountered infrequently. If sticks are usually encountered end-on such an organism will have difficulty in learning to distinguish lumps from sticks on the basis of single, static, retinal images.

Would such an organism be able to distinguish slabs from lumps and sticks? The answer seems to be that it would not if there were about equal numbers of lumps, sticks, and slabs in its environment, so that the probabilities of encountering each of these objects will be equal.
Because the projection of a lump is always a round region, but the projection of a slab is only a round region in roughly half the views encountered, the chance of a round region being a view of a lump will be about double the chance of it being the view of a slab. Similarly, the chance of a long region being a view of a stick will be about double the chance of it being a view of a slab. Even if distinguishing between slabs on the one hand and lumps and sticks on the other has important consequences for the organism, it will have difficulty in making this distinction on the basis of static silhouettes alone. Evolutionary pressure will drive such an organism to conclude that all round regions are views of lumps and all long regions are views of sticks.

Restriction R2: The universe contains equal numbers of lumps, sticks, and slabs.

Corollary C8: Given R1 and R2, slabs cannot be inferred from silhouettes.

As an illustration of this argument the values for the relative probabilities of round regions and long regions being views of lumps, sticks, and slabs in silhouettes, given R1 and R2, are shown in the appendix. These values show that the probability of a round region being a view of a lump is better than 1 in 2, being about 2 in 3 in fact, or p = 0.68; and the probability of a long region being a view of a stick is about the same, ie about 3 in 5, or p = 0.63. In contrast, the probability of a round region being a view of a stick is very low, being about 1 in 60, or p = 0.017. All these figures seem to agree with our intuitions; a round region in a silhouette will usually be interpreted as a lump, and a long region will be interpreted as a stick.
In contrast, a round region is highly improbable as a view of a stick; and in consequence, it is very difficult to portray a stick in a fully foreshortened position (that is, pointing directly towards the viewer) in a silhouette, in such a way that it will be correctly interpreted as a view of a foreshortened stick rather than as a small lump or bead.(4) On the other hand, the probability of a round region being a view of a slab is about 1 in 3, or p = 0.30; and this is about the same as the probability of a long region being a view of a slab, also about 1 in 3, or p = 0.37. Again, these values seem to agree with our intuitions; round regions and long regions are both improbable, but in this case only mildly improbable, as views of slabs. Thus in pictures which consist only of silhouettes, round regions and long regions are both unsatisfactory as representations of slabs or discs, in Wollheim's sense, although not as unsatisfactory as are round regions as representations of sticks.

The values of these probabilities depend on assuming R1 and R2, and in the case of picture perception R2 amounts to assuming that we have no preconceptions about what the regions in a silhouette are supposed to represent. If R2 is altered, the values for the probabilities will also alter. For example, R2 might be rewritten to read "the universe contains twice as many slabs as it does lumps and sticks". In picture perception this amounts to a preconception in favour of seeing slabs, rather than lumps and sticks, in regions in silhouettes. The probabilities of round regions and long regions being views of slabs are then improved to about 1 in 2 in each case (p = 0.46 and p = 0.53, respectively; see the appendix). Again, this seems to agree with our intuitions; with prior knowledge of what the regions at the ends of the arms of the dancing man in the Rites of Spring are intended to represent, seeing discs in these regions becomes more plausible, though still, I think, rather difficult.
Such effects are rather subtle, too subtle for us to be able to trust our intuitions on their own as a test of the theory. They can, however, be tested experimentally.

(4) This argument should not be taken to mean that we cannot see unlikely objects in pictures. We can readily recognize angels and unicorns in pictures, however unlikely we are to encounter them in real life. Conversely, Biederman et al (1983) have shown that expectancy or familiarity confers no benefit in the interpretation of pictures of incongruous scenes. It is necessary to make a distinction between the perception of improbable objects, and the perception of shapes seen from "unfavourable", "unconventional", "special", or "unlikely" directions of view.
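The effect of restriction R1 on the balance of round and long views can be explored numerically. The following is only a minimal sketch under assumptions that are not part of the analysis above: the stick is idealized as a cylinder and the slab as a disc, the projection is orthographic, and the function name `view_fractions`, the aspect ratio of 10, and the round/long threshold of 2 are all hypothetical choices made for illustration. The fractions it yields therefore need not match the counts read off figures 4 and 5.

```python
import math

def view_fractions(kind, aspect=10.0, threshold=2.0, steps=20_000):
    """Fraction of equally probable views (R1) giving a 'round' or 'long' silhouette.

    kind: "stick" (cylinder whose length is `aspect` times its diameter) or
          "slab"  (disc whose diameter is `aspect` times its thickness).
    Shapes, aspect ratio, and threshold are illustrative assumptions.
    """
    long_views = 0
    for i in range(steps):
        # Taking cos(theta) uniformly on [0, 1] samples view directions uniformly
        # over a hemisphere; theta is the angle between the line of sight and
        # the object's axis of symmetry.
        c = (i + 0.5) / steps
        s = math.sqrt(1.0 - c * c)
        if kind == "stick":
            # Extent along the projected axis divided by the cylinder's diameter.
            ratio = aspect * s + c
        elif kind == "slab":
            # The full diameter is always visible; the other extent shrinks
            # towards the thickness as the disc is seen edge-on.
            ratio = aspect / (aspect * c + s)
        else:
            raise ValueError(f"unknown kind: {kind}")
        if ratio >= threshold:
            long_views += 1
    frac_long = long_views / steps
    return {"round": 1.0 - frac_long, "long": frac_long}

if __name__ == "__main__":
    for kind in ("stick", "slab"):
        fractions = view_fractions(kind)
        print(f"{kind:5s} round = {fractions['round']:.3f}  long = {fractions['long']:.3f}")
```

Under these assumptions round views of the stick are rare, whereas the views of the slab divide much more evenly between round and long, which is the qualitative pattern on which C6 and C8 depend.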

7 Experimental evidence
It is often said, following Luquet (1927), that young children draw what they know and older children draw what they see. Thus young children will draw what they know of a stick (that it is long) by using a line or a long region to represent it, even when it is presented to them in a foreshortened position. Older children on the other hand (children above the age of about 7 or 8 years, according to Piaget) can be expected to change their drawings in order to take account of the particular view presented to them, so that a stick presented end-on will be drawn as a dot or a small circle (Morss 1987; Piaget and Inhelder 1956). Piaget also asked children to draw foreshortened discs. Although he does not provide values for the frequencies of children able to represent the foreshortening of sticks and discs at different ages, he gives the impression that the frequencies were about the same in each case. This would follow logically from his explanation that the difficulty which children have in representing foreshortening arises from the young child's lack of any conscious awareness of his or her own viewpoint, a concept which is, presumably, independent of the shape of the viewed object.

The analysis given above suggests, however, that there might be another reason why young children find it difficult to represent foreshortened sticks; this difficulty arises from the constraints of drawing as a representational system rather than from any difficulties which the child might have in perceiving views as such. A true view of a foreshortened stick, in silhouette, takes the form of a small round region; but such a view is not representative, because it disguises the fact that sticks are long. A child wishing to draw a foreshortened stick by using regions as picture primitives is thus faced with a dilemma: a small round region may show a true view of a foreshortened stick, but it does not provide a satisfactory representation.
However, the analysis given above suggests that this argument should not apply to foreshortened discs. Because long regions and round regions are about equally probable as views of discs, they ought to be about equally good as representations (although both are mildly unsatisfactory); so children who are old enough to draw views ought to be equally willing to use long regions or round regions in order to represent discs, depending on the view presented to them. Thus if long regions and round regions are about equally likely as views of discs, but round regions are highly unlikely as views of sticks, then older children ought to be more willing to use long regions to represent foreshortened discs than they would be to use round regions to represent foreshortened sticks. This prediction was tested in an experiment (Willats 1992) in which children aged 4, 7, and 12 years old were asked to draw sticks (the arms of a wooden test figure) and a disc (a plate held by the figure) presented to them in foreshortened and nonforeshortened positions. As predicted, there were significantly fewer children in the older two age groups who represented a foreshortened stick by a drawing whose outline took the form of a round region, compared with the frequencies of children who represented the foreshortened disc by a long region (p < 0.005 for each age group). In addition, the 12-year-olds used a different and more effective mark system compared with the 7-year-olds. Most of the 7-year-olds used single regions (in effect, silhouettes) to represent both the arms and the plate; and as the analysis given above suggests, foreshortened sticks, and discs in any position, cannot be satisfactorily represented within this system. 
Most of the 12-year-olds overcame this difficulty by using a mark system, similar to that shown in figures 3b, 3c, 3e, and 3f, in which lines were used to stand for occluding contours and interior edges (p < 0.001, comparing the mark systems used by the 7-year-olds and 12-year-olds both for the arms and for the plate).

The hypothesis tested in this experiment depended on two premises, both suggested by the analysis proposed in this paper: that a small round region provides an unsatisfactory representation of a foreshortened stick, and that round regions and long regions are about equally good as views of discs, although both are mildly unsatisfactory. The results of the experiment support this analysis.

8 Discussion
Distinguishing between 'long' and 'round' images on a retina would seem to require only a very simple perceptual mechanism: one which can separate figure from ground, find the major and minor axes of the figure, and compute the ratio between them. Only a very coarse retina (one having few receptors) would be needed, and the image could be quite fuzzy at the edges because it would not be necessary to compute the shape of the boundary. An organism equipped with a mechanism of this kind ought to be able to distinguish between lumps and sticks in the environment, but would not be able to detect the presence of slabs.

Of course, human beings can make much finer discriminations than this. In real scenes we can easily recognize slabs by means of the usual depth and shape cues such as stereopsis, motion parallax, etc, and normal subjects can recognize stick-like objects presented to them end-on, however unlikely such views might be [but cf Warrington and Taylor (1973)]. No doubt the detection of contours plays a large part in this kind of scene perception, as Marr and other writers have suggested. The convexities and concavities of the outlines of silhouettes in pictures can also carry information about the shapes of objects, provided the outlines are carefully drawn: examples are eighteenth and nineteenth century portrait silhouettes, and Rubin's figure-ground reversal pictures.
Nevertheless, the important role played by extendedness in the representation of shape in language, in children's early speech, and in children's early drawings suggests that the extendedness principle may play a role in internal shape descriptions, and that our visual system may contain a perceptual mechanism which can extract coarse shape descriptions from the extendedness of silhouettes. If this is so, it might explain why we can see lumps and sticks, but not slabs, in silhouettes with irregular or indeterminate outlines. Such a mechanism might be useful in the perception of the shapes of objects seen in extreme conditions: when an object is seen from far away, for example, or is seen through mist or rain, or under poor lighting conditions, ie when details of the convexities and concavities of the outline cannot be perceived. A mechanism of this kind might also be useful in extracting coarse shape descriptions from gross silhouettes before the final details are perceived. Even dividing the world into lumps and sticks, or combinations of lumps and sticks, might afford a useful degree of preprocessing, providing that the mechanism servicing this initial categorization were fast and economical. More complex primitives might then be derived from these basic prototypes by using shape modifiers such as 'being bent' or 'being pointed', an approach similar to that described by Hollerbach (1975). For example, the arms of the dancing man in the Rites of Spring might be described as 'bent sticks'; and the horns of the goat as sticks which are both bent and pointed. 'Being bent' is a shape property which can readily be recognized in silhouettes, and this is as one might expect because there is a high probability of the curvature of the axis of a stick being reflected in its silhouette. 
Moreover, it seems likely that it would be possible to extract this property from a coarsely specified silhouette without reference to the shape of its outline by measuring the distance between the spine of the region and its major axis. At some point, however, this approach will cease to give sufficient information (as in the case of slabs and discs), and the detection of the shape properties of the outline and of internal discontinuities in the image, representing features like edges, textures, and tonal modelling, will become crucial.
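The simple mechanism described at the start of this section (separate figure from ground, find the major and minor axes of the figure, and compute the ratio between them) can be sketched in a few lines. This is an illustration only, not a model proposed in this paper: the use of second moments to find the axes, the names `axis_ratio` and `classify_region`, the '#' convention for figure cells, and the threshold of 2 are all assumptions introduced here.

```python
import math

def axis_ratio(mask):
    """Major-to-minor axis ratio of the figure in a coarse binary image.

    mask: sequence of equal-length strings; '#' marks figure, anything else ground.
    The axes are estimated from second moments, so the boundary shape is never traced.
    """
    points = [(x, y) for y, row in enumerate(mask)
              for x, cell in enumerate(row) if cell == "#"]
    n = len(points)
    if n == 0:
        raise ValueError("mask contains no figure cells")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Central second moments. Adding 1/12 (the variance of a unit-wide cell)
    # keeps the minor axis non-zero for figures that are one cell wide.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n + 1 / 12
    syy = sum((y - mean_y) ** 2 for _, y in points) / n + 1 / 12
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Eigenvalues of the 2 x 2 moment matrix are proportional to the squared
    # lengths of the major and minor axes.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return math.sqrt((half_trace + root) / (half_trace - root))

def classify_region(mask, threshold=2.0):
    """Label a region 'long' or 'round'; the threshold is an arbitrary assumption."""
    return "long" if axis_ratio(mask) >= threshold else "round"

if __name__ == "__main__":
    blob = ["#" * 10] * 10   # square patch: extended about equally in both directions
    bar = ["#" * 20] * 3     # elongated patch
    print(classify_region(blob), classify_region(bar))
```

Because only low-order moments of the figure are used, blurring or coarsening the image changes the ratio very little, which is in keeping with the suggestion that a coarse retina and a fuzzy image would suffice.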

Biederman has proposed the existence of a set of generalized-cone components, called geons (N ≤ 36), as volumetric shape primitives, and has shown that many of these primitives could be derived from simpler primitives by modifiers such as 'having straight edges' or 'having a curved axis' (Biederman 1987, figures 6 and 8). Biederman showed that objects made from such components can readily be recognized in line drawings in which the lines stand for contours and internal edges. By using a similar approach to that described in section 5 it ought to be possible to predict theoretically what 3-D shape features, other than extendedness, should be recognizable in coarsely specified silhouettes, and at what point it is necessary to resort to the use of line drawings. It might then be possible to test these predictions experimentally by using methods similar to those described by Biederman.

9 Conclusion
For representations of lumps and sticks in silhouettes the analysis described above leads to results similar to those found by Marr (1977). It has fewer restrictions than Marr's analysis, however, and can deal with silhouettes of all kinds of shapes, even very irregular ones. On the other hand, as it stands, it leads only to a very coarse level of description, and does not, for example, detect Gaussian features such as bumps and dents, as do the analyses described by Marr (1977) and Richards et al (1985). It is, however, possible that this approach might be extended by detecting the presence of shape modifiers in regions which reflect corresponding shape properties in objects or components of objects in the scene.
The main features claimed for the analysis described in this paper are that: it deals with shape descriptions at a more basic level than that of surface undulations; it is not restricted to the analysis of shapes having smooth surfaces; it is exhaustive; and it deals, as Marr's does not, with flat shapes like slabs and discs, and shows why it is difficult to see these shapes in silhouettes.

Acknowledgements. I am deeply indebted to Dr Peter Denny of the Department of Psychology, University of Western Ontario for a number of discussions with him on the extendedness principle and the representation of shape in language. I am also indebted to Dr Norman Freeman of the Department of Psychology, University of Bristol for his comments on the role of the extendedness principle in children's drawings, and to Mr W G T Willats of the Department of Pure and Applied Biology, University of Leeds and to two anonymous reviewers, for their helpful comments on earlier versions of this paper.

References
Adams K L, Conklin N F, 1973 "Towards a theory of natural classification" Papers from the Ninth Regional Meeting of the Chicago Linguistics Society, pp 1-10
Biederman I, 1987 "Recognition-by-components: A theory of human image understanding" Psychological Review 94 115-147
Biederman I, Teitelbaum R C, Mezzanotte R J, 1983 "A failure to find a benefit from prior expectancy or familiarity" Journal of Experimental Psychology: Learning, Memory, and Cognition 9 411-429
Caron-Pargue J, 1985 Le Dessin du Cube chez l'Enfant (Berne: Peter Lang)
Clark E V, 1976 "Universal categories: On the semantics of classifiers and children's early word meanings" in Linguistic Studies Offered to Joseph Greenberg on the Occasion of His Sixtieth Birthday volume 1, Ed. A Juilland (Saratoga, CA: Anma Libri) pp 449-462
Clowes M B, 1971 "On seeing things" Artificial Intelligence 2 79-116
Denny P J, 1978 "The 'extendedness' variable in classifier semantics: universal features and cultural variation" in Ethnolinguistics: Boas, Sapir and Whorf Revisited Ed. M Mathiot (The Hague: Mouton) pp 97-119
Denny P J, 1979a "Semantic analysis of selected Japanese numeral classifiers for units" Linguistics 17 317-335
Denny P J, 1979b "Two notes on Ojibway shape roots" Algonquian Linguistics 4 26-27
Hoffman D D, Richards W, 1984 "Parts of recognition" Cognition 18 65-96
Hollerbach J M, 1975 Hierarchical Shape Description of Objects by Selection and Modification of Prototypes MIT Artificial Intelligence Laboratory Technical Report No. 346 (Cambridge, MA: MIT Press) pp 1-237
Huffman D A, 1971 "Impossible objects as nonsense sentences" in Machine Intelligence volume 6, Eds B Meltzer, D Michie (Edinburgh: Edinburgh University Press) pp 295-323
Koenderink J, 1984 "What does the occluding contour tell us about solid shape?" Perception 13 321-330
Luquet G H, 1927 Le Dessin Enfantin (Paris: Alcan)
Marr D, 1977 "Analysis of occluding contour" Proceedings of the Royal Society of London, Series B 197 441-475
Marr D, 1978 "Representing visual information: A computational approach" in Computer Vision Eds A R Hanson, E M Riseman (New York: Academic Press) pp 61-80
Marr D, 1982 Vision (San Francisco: W H Freeman)
Marr D, Nishihara H K, 1978 "Representation and recognition of the spatial organization of three-dimensional shapes" Proceedings of the Royal Society of London, Series B 200 269-294
Morss J R, 1987 "The construction of perspectives: Piaget's alternative to spatial egocentrism" International Journal of Behavioral Development 10 263-279
Piaget J, Inhelder B, 1956 The Child's Conception of Space (London: Routledge and Kegan Paul)
Reed H, 1956 "Judging distances" in The Penguin Book of Contemporary Verse Ed. K Allott (Harmondsworth, Middx: Penguin Books) pp 238-239
Richards W, Koenderink J J, Hoffman D D, 1985 Inferring 3D Shapes from 2D Codons MIT Artificial Intelligence Laboratory Memo No. 840 (Cambridge, MA: MIT Press) pp 1-15
Rosch E, 1973 "On the internal structure of perceptual and semantic categories" in Cognitive Development and the Acquisition of Language Ed. T E Moore (New York: Academic Press) pp 111-144
Stern W, 1930 Psychology of Early Childhood (London: George Allen and Unwin)
Waltz D, 1975 "Understanding line drawings of scenes with shadows" in The Psychology of Computer Vision Ed. P H Winston (New York: McGraw-Hill) pp 19-91
Warrington E K, Taylor A M, 1973 "The contribution of the right parietal lobe to object recognition" Cortex 9 152-164
Willats J, 1985 "Drawing systems revisited: The role of denotation systems in children's figure drawings" in Visual Order: The Nature and Development of Pictorial Representation Eds N H Freeman, M V Cox (Cambridge: Cambridge University Press) pp 78-100
Willats J, 1987 "Marr and pictures: An information processing account of children's drawings" Archives de Psychologie 55 105-125
Willats J, 1992 "The representation of extendedness in children's drawings of sticks and discs" Child Development 63(3) 692-710
Wollheim R, 1977 "Representation: The philosophical contribution to psychology" in The Child's Representation of the World Ed. G Butterworth (New York: Plenum Press) pp 173-188

APPENDIX
If the regions in the two outer rings in figure 5 are judged to be 'long', and the regions in the three inner rings are judged as 'round', then over the viewing hemisphere in this example there are:
  8 + 15 = 23 long regions as views of slabs, and
  11 + 6 + 1 = 18 round regions as views of slabs.
In figure 4 there are:
  40 long regions as views of sticks, and
  1 round region as a view of a stick.
To these we add 41 regions as views of lumps.

Given R1 and R2, the probability of encountering:
  a round region as a view of a lump is 41/(41 + 1 + 18), or p = 0.68;
  a long region as a view of a lump is 0, or p = 0.00;
  a round region as a view of a stick is 1/(41 + 1 + 18), or p = 0.017;
  a long region as a view of a stick is 40/(40 + 23), or p = 0.63;
  a round region as a view of a slab is 18/(41 + 1 + 18), or p = 0.30;
  a long region as a view of a slab is 23/(40 + 23), or p = 0.37.

If R2 is rewritten as "the universe contains twice as many slabs as it contains either lumps or sticks", then the values for the probabilities are altered, so that the probability of encountering:
  a round region as a view of a lump is 41/(41 + 1 + 36), or p = 0.53;
  a long region as a view of a lump is still 0, or p = 0.00;
  a round region as a view of a stick is 1/(41 + 1 + 36), or p = 0.013;
  a long region as a view of a stick is 40/(40 + 46), or p = 0.47;
  a round region as a view of a slab is 36/(41 + 1 + 36), or p = 0.46;
  a long region as a view of a slab is 46/(40 + 46), or p = 0.53.
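The arithmetic of this appendix can be reproduced with a short script. It is offered only as a sketch: the view counts are those given above, while the function name `region_probabilities` and the representation of R2 as a table of relative abundances are conveniences introduced here and do not come from the analysis itself.

```python
from fractions import Fraction

# Numbers of round and long views per object, as counted in this appendix
# (41 sampled views of each object over the viewing hemisphere).
VIEW_COUNTS = {
    "lump":  {"round": 41, "long": 0},
    "stick": {"round": 1,  "long": 40},
    "slab":  {"round": 18, "long": 23},
}

def region_probabilities(abundance):
    """Probability that a region of a given type is a view of each object.

    abundance: relative numbers of lumps, sticks, and slabs in the universe
    (restriction R2 corresponds to equal abundances).
    """
    result = {}
    for region in ("round", "long"):
        weighted = {obj: abundance[obj] * counts[region]
                    for obj, counts in VIEW_COUNTS.items()}
        total = sum(weighted.values())
        for obj, weight in weighted.items():
            result[(region, obj)] = Fraction(weight, total)
    return result

if __name__ == "__main__":
    cases = {
        "R2 as stated": {"lump": 1, "stick": 1, "slab": 1},
        "twice as many slabs": {"lump": 1, "stick": 1, "slab": 2},
    }
    for label, abundance in cases.items():
        print(label)
        for (region, obj), p in region_probabilities(abundance).items():
            print(f"  {region:5s} region as a view of a {obj:5s}: p = {float(p):.3f}")
```

Changing the abundances passed to `region_probabilities` shows how any other rewriting of R2 would alter the values.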


© 1992 a Pion publication printed in Great Britain
