Feature Article

Unique Character Instances for Crowds

Jonathan Maïm, Barbara Yersin, and Daniel Thalmann ■ VRlab, Swiss Federal Institute of Technology

Real-time crowd simulations are realistic only if each human instance looks unique. A proposed solution varies the shape of human instances by attaching accessories. It also modifies the instances' appearance with a generic technique based on segmentation maps that can generate detailed color variety and patterns.

Crowds are part of our everyday life and are essential when working with realistic interactive environments. Application domains for such simulations range from entertainment and populating artificial cities to VR exposure therapy for crowd phobia. Simulating large crowds at high frame rates involves using several levels of detail (LODs). Characters close to the camera are accurately rendered and animated with costly methods, while those farther away have less-detailed, faster representations. The common process is to use many instances of a small set of human templates—that is, virtual-human types identified by their mesh, skeleton, textures, and LOD.

Instantiating many characters from a limited set of human templates leads to multiple similar characters everywhere in the scene. However, creating an individual mesh for each character isn't feasible because of the high design and memory requirements. Thus, solutions are required to modify each instance so that it differs visually from all the others. Such methods must also be scalable for all LODs to avoid inconsistencies in the individual appearances.

We introduce a fast, scalable technique to obtain unique characters from a small set of basic human templates. We mainly focus on real-time applications where the visual uniqueness of the characters in a crowd is paramount and the crowd includes several thousand virtual humans. To vary a character's shape, we introduce accessories—simple meshes attached to the individuals to make them unique. To vary a character's appearance, we introduce segmentation maps—special textures to identify body parts.


This novel technique enables smooth transitions between body parts and enhances character and accessory visual appearances with distinctive details, such as makeup or fabric patterns. Figure 1 shows the results in a crowd scene where each character is different from all others and the visual quality is highly detailed. Both methods are scalable and thus keep the display of crowds consistent for any LOD.

The cost of rendering crowds certainly increases with the number of polygons to display. So, when an application requires real-time performance, we must balance the number of worn accessories with the number of characters constituting the crowd. Even so, our framework can fully simulate thousands of accessorized characters at high frame rates.

Crowd Simulations

Crowd applications mainly use the three LODs depicted in Figure 2:

■ classical deformable meshes, a skeleton enveloped and skinned to perform skeletal animations;
■ rigid meshes, precomputed geometric postures of a deformable mesh; and
■ impostors, character representations with only two textured triangles forming a quad.

Deformable meshes are altered by the online computation of their skeleton movements. This method is more expensive than using rigid meshes,1 but it allows for special animations chosen or produced at runtime, such as looking at the camera (see Figure 2a) or mimicking facial expressions. Because impostors require only two triangles per character, they're the most exploited LOD in the crowd simulation domain (see the sidebar, "Related Work in Crowd Simulations"). Their main advantage is rendering efficiency.



Accessories

Developing unique characters in a crowd requires differences at the shape level. Accessorizing crowds offers a simple, efficient alternative to costly human template modeling. Accessories are small meshes representing elements that can easily be added to a human template's original mesh. Their range is considerable—from subtle details such as watches, jewelry, or glasses to larger items such as hats, wigs, or backpacks (see Figure 1). Distributing accessories to a large crowd instantiated from a few human templates varies each instance's shape, thus making it unique.

Similarly to deformable meshes, accessories are attached to a skeleton and follow its animation when moving. To keep the method simple and generic to all human templates, we assume that accessory vertices are all attached to the same joint—that is, the accessory mesh isn't deformed. For example, a hat would be attached to the skull joint.

Modeling an accessory involves two steps that are important for later placing and orienting it correctly for all human templates. First, we identify the joint to which we want to attach the accessory. In most cases, we select the same joint for all human templates. However, when two human templates differ too much in size, the best-adapted joint can differ—for example, a backpack would attach to a different vertebra on a child and an adult template. Second, we transform the accessory to coincide perfectly with each human template's mesh. These changes are expressed relative to the attach joint as a 4 × 4 transformation matrix Taccessory, saved for each human template and accessory combination.

At the crowd simulation's initialization, all characters are assigned various accessories that are then displayed at runtime. These assignments aren't random. Each accessory is categorized according to a specific type (such as backpack, hat, or glasses) and a specific theme (such as casual, old-fashioned, or funny). The designer can choose the human templates for which each accessory type will be available. This gives designers a wide variety of possibilities within each application.

We compute the final 4 × 4 transformation matrix T in world space at each time step with this equation:

$$T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} = T_{\mathrm{char}}\, T_{\mathrm{joint}}\, T_{\mathrm{accessory}}, \qquad (1)$$

where Tchar is the character transformation matrix in world coordinates, and Tjoint is the attach joint deformation matrix, relative to Tchar.
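To make Equation 1 concrete, here is a minimal CPU-side sketch of the composition. The matrix type, column-major layout, and helper names are assumptions of this sketch rather than part of the authors' system; a math library such as GLM or Eigen would normally supply them.

```cpp
#include <array>

// Minimal 4x4 column-major matrix; a real engine would use a math library instead.
struct Mat4 {
    std::array<float, 16> m{};  // element (row r, column c) stored at m[c * 4 + r]
};

// Standard 4x4 matrix product.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r;
    for (int c = 0; c < 4; ++c)
        for (int row = 0; row < 4; ++row) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a.m[k * 4 + row] * b.m[c * 4 + k];
            r.m[c * 4 + row] = s;
        }
    return r;
}

// Equation 1: T = T_char * T_joint * T_accessory.
// T_char      : character placement in world space,
// T_joint     : attach-joint deformation relative to the character,
// T_accessory : per-template offset stored at modeling time.
Mat4 accessoryWorldTransform(const Mat4& Tchar, const Mat4& Tjoint, const Mat4& Taccessory) {
    return mul(mul(Tchar, Tjoint), Taccessory);
}
```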

Figure 1. A crowd scene. Five human templates taking full advantage of accessories and segmentation maps.


Accessories are scalable to all LODs commonly used in crowd simulations. For deformable meshes, Equation 1 is computed directly at runtime. For rigid meshes and impostors, we developed dedicated methods, detailed in the following subsections. Note that switching from an accessorized deformable mesh to a rigid one is unnoticeable because they share the exact same appearance. To ensure transparent switching between accessorized rigid meshes and impostors, we use the pixel-to-texel ratio metrics described by Simon Dobbyn and his colleagues (see the "Related Work in Crowd Simulations" sidebar).
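Dobbyn and his colleagues' pixel-to-texel metric isn't detailed in this article, but one plausible way to evaluate such a ratio is sketched below; the formulation, parameter names, and threshold are assumptions of this sketch, not values taken from their paper.

```cpp
#include <cmath>

// One possible pixel-to-texel test (assumed formulation): switch to the impostor
// once a screen pixel covers at least one texel of the impostor texture, so the
// impostor's resolution is sufficient and the switch is visually transparent.
bool impostorIsSufficient(float characterHeightMeters,    // height of the template
                          float impostorTileHeightTexels, // texels covering that height
                          float distanceToCamera,         // meters
                          float verticalFovRadians,
                          float viewportHeightPixels,
                          float switchThreshold = 1.0f)   // hypothetical tuning value
{
    // World-space size of one screen pixel at the character's distance.
    float pixelSize = 2.0f * distanceToCamera * std::tan(verticalFovRadians * 0.5f)
                      / viewportHeightPixels;
    // World-space size of one impostor texel on the character.
    float texelSize = characterHeightMeters / impostorTileHeightTexels;
    // Ratio >= threshold: each screen pixel covers at least one texel.
    return (pixelSize / texelSize) >= switchThreshold;
}
```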

Figure 2. Three human levels of detail (LODs): (a) a deformable mesh, (b) a rigid mesh, and (c) an impostor.

Accessories for Rigid Meshes

Rigid meshes are precomputed postures, or keyframes, of deformable meshes performing a given animation sequence. Their main advantage is that no dynamic deformation of the skeleton is necessary at runtime. However, this also implies that matrix Tjoint in Equation 1 isn't available to place an accessory. We could take a naive approach to this problem by storing the accessory animation as we do for a rigid mesh. However, we instead exploit our assumption that the mesh is never deformed and save the Tjoint matrices to which accessories are attached every time we store a rigid-mesh keyframe.


Related Work in Crowd Simulations

Previous work directly related to this article includes work on how to introduce per-pixel depth computation for impostors. We also present work on segmenting a human texture into body parts to vary the color of characters in a crowd.

Depth Computation for Impostors

Impostors let us represent a 3D object with only a 2D quad. Although impostors remain advantageous in terms of rendering efficiency,1 they present a major drawback when two of them intersect. Indeed, the visual result obtained when two impostors intersect rarely corresponds to the result obtained if the real 3D objects were used. Researchers have studied several approaches to solving this problem. Gernot Schaufler introduced nailboards, which eliminated visibility artifacts by storing small depth offsets in the alpha channel of their texture.2 However, this method is limited to orthographic or near-orthographic projections. Amaury Aubel and his colleagues also introduced a technique to avoid visibility issues by dividing a human impostor into a series of body parts—that is, into several quads, each assigned a specific depth value.3 This technique is suitable when used only on body parts, but it doesn't adapt to subtle cases, where pixels must be processed individually. The approach we detail in this article offers an accurate solution completely independent of the projection used.

Color Variety for Crowds

Previous work on color variety is based on dividing a human template into several body parts, identified by specific intensities in the template texture's alpha channel. At runtime, each body part of each character is assigned a color to modulate the texture. Franco Tecchia and his colleagues used several passes to render each impostor body part.4 Simon Dobbyn and his colleagues extended the method to 3D meshes and avoided multipass rendering with programmable graphics hardware.5 Although these methods offer nice results from a reasonable distance, they produce sharp transitions between body parts. Adopting the same idea, David Gosselin and his colleagues showed how to vary characters with the same texture by changing their tinting.6 They also presented a method to selectively add decals to the characters' uniforms. However, their approach applies only to crowds of similar characters in uniforms. The differences they introduce aren't sufficient for working with civilian crowds. Cloth simulation is another approach to further vary characters,7 but for real-time crowd applications, the animation remains too expensive to be used. Dobbyn and his colleagues recently proposed presimulating cloth mesh deformation and using the resulting animation at runtime on impostors.8

References
1. E. Millan and I. Rudomin, "Impostors and Pseudo-instancing for GPU Crowd Rendering," Proc. 4th Int'l Conf. Computer Graphics and Interactive Techniques in Australasia and Southeast Asia (Graphite 06), ACM Press, 2006, pp. 49–55.
2. G. Schaufler, "Nailboards: A Rendering Primitive for Image Caching in Dynamic Scenes," Proc. Eurographics Workshop Rendering Techniques, Springer, 1997, pp. 151–162.
3. A. Aubel, R. Boulic, and D. Thalmann, "Real-Time Display of Virtual Humans: Levels of Detail and Impostors," IEEE Trans. Circuits and Systems for Video Technology, vol. 10, no. 2, 2000, pp. 207–217.
4. F. Tecchia, C. Loscos, and Y. Chrysanthou, "Image-Based Crowd Rendering," IEEE Computer Graphics and Applications, vol. 22, no. 2, 2002, pp. 36–43.
5. S. Dobbyn et al., "Geopostors: A Real-Time Geometry/Impostor Crowd Rendering System," Proc. Symp. Interactive 3D Graphics and Games, ACM Press, 2005, pp. 95–102.
6. D. Gosselin, P.V. Sander, and J.L. Mitchell, "Drawing a Crowd," ShaderX3: Advanced Rendering Techniques in DirectX and OpenGL, W. Engel, ed., Charles River Media, 2004, pp. 505–517.
7. G. Ryder and A.M. Day, "Survey of Real-Time Rendering Techniques for Crowds," Computer Graphics Forum, vol. 24, no. 2, 2005, pp. 203–215.
8. S. Dobbyn et al., "Clothing the Masses: Real-Time Clothed Crowds with Variation," Proc. Eurographics Short Papers, Eurographics Assoc., 2006, pp. 103–106.

We have identified six potentially "attachable" joints: the skull for glasses and hats, a vertebra for backpacks, elbows for watches and bracelets, and wrists for objects in the character's hands, given there's no finger animation. So, for one rigid-mesh keyframe, we need to store only six matrices representing Tjoint. The storage cost for a single keyframe is thus independent of the number of accessories, whereas the naive approach would require saving one keyframe for each of them.
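A minimal sketch of what this per-keyframe storage might look like follows; the struct layout and names are assumptions of this sketch, but the fixed set of six attachable joints mirrors the text.

```cpp
#include <array>
#include <vector>

struct Mat4 { std::array<float, 16> m{}; };  // column-major 4x4 matrix, as before

// The six "attachable" joints identified in the text.
enum AttachJoint {
    Skull = 0,        // glasses, hats
    UpperVertebra,    // backpacks
    LeftElbow,        // watches, bracelets
    RightElbow,
    LeftWrist,        // hand-held objects (no finger animation)
    RightWrist,
    AttachJointCount  // = 6
};

// One precomputed posture of the animation sequence. Besides the baked geometry,
// only six matrices are stored, independent of how many accessories are worn.
struct RigidMeshKeyframe {
    std::vector<float> bakedVertexPositions;                  // posed mesh (x, y, z, ...)
    std::array<Mat4, AttachJointCount> attachJointTransforms; // T_joint per attachable joint
};
```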


Accessories for Impostors

Our approach generates accessories for impostors in a preprocess that samples image tiles all around the object in orthographic mode and saves them in 512 × 512 normal and UV maps. The usual approach to generate an impostor is to sample an object's circumscribed sphere with spherical coordinates, although this leads to an excessive number of samples near the poles as compared to the equator. This approach also requires costly trigonometric computations to find the correct tile at runtime.

Figure 3. Impostor generation: (a) sampled tiles for a hat impostor creation, (b) the resulting UV map, and (c) the normal map.

As Figure 3a shows, we instead apply a Sukharev grid on each spherical face of a cube.2 With this method, samples are more uniformly distributed, and finding the correct tile online reduces to a fast cube map lookup.3 Figures 3b and 3c illustrate a UV map and a normal map for a hat.

The memory storage cost for an accessory impostor is constant and independent of the number of keyframes generated for a human impostor. More precisely, we store one UV map and one normal map per accessory in the DirectDrawSurface (DDS) format, DXTC5, for a total of only 680 Kbytes (including MIP maps). Some channels have more precision than others. So, to make sure the compressed maps have as few artifacts as possible, we use the most accurate channels to store the data we need.

To correctly place an accessory impostor, our runtime pipeline (see Figure 4a) takes five steps.

Step 1. Compute the accessory transformation matrix T with Equation 1. T provides the accessory's orientation and position, should it be rendered as a 3D mesh. To obtain T, we precompute and save the six matrices Tjoint for each impostor keyframe. This approach is similar to the rigid-mesh approach because, like rigid meshes, human impostors involve no online skeletal deformation.
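Before moving to the remaining runtime steps, here is a minimal sketch of the tile-sampling preprocess described above: an n × n grid is laid out on each cube face and its cell centers are projected onto the unit sphere, giving one viewing direction per tile. The face orientation conventions and helper types are assumptions of this sketch, not the authors' exact implementation.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// Enumerate the sampling directions of the preprocess: an n-by-n grid on each of
// the six cube faces, projected onto the unit sphere (Sukharev-style layout).
// Each returned direction corresponds to one tile of the UV/normal maps.
std::vector<Vec3> impostorSampleDirections(int n) {
    std::vector<Vec3> dirs;
    for (int face = 0; face < 6; ++face) {
        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < n; ++i) {
                // Cell centers in [-1, 1] on the face plane.
                float u = -1.0f + 2.0f * (i + 0.5f) / n;
                float v = -1.0f + 2.0f * (j + 0.5f) / n;
                Vec3 p;
                switch (face) {
                    case 0:  p = { 1.f,   v,  -u }; break;  // +X
                    case 1:  p = {-1.f,   v,   u }; break;  // -X
                    case 2:  p = {  u,  1.f,  -v }; break;  // +Y
                    case 3:  p = {  u, -1.f,   v }; break;  // -Y
                    case 4:  p = {  u,   v,  1.f }; break;  // +Z
                    default: p = { -u,   v, -1.f }; break;  // -Z
                }
                dirs.push_back(normalize(p));               // camera direction for this tile
            }
        }
    }
    return dirs;
}
```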


[Figure 4a shows the runtime pipeline: 1. compute T; 2. find correct tile; 3. offset tile; 4. orient tile; 5. compute fragment depth.]

Step 2. Retrieve a normal and a UV map tile, representing the accessory from the correct viewpoint. We deduce the adequate view by expressing the current camera position relative to T—that is, in accessory space. The resulting direction vector determines the tile to use in the cube map.3
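A minimal sketch of such a lookup follows: the dominant axis of the accessory-space view direction selects a cube face, and the two remaining coordinates index a cell of that face's grid. The face and axis conventions match the sampling sketch above and are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

struct TileIndex {
    int face;   // 0..5, which cube face
    int i, j;   // cell inside the face's n-by-n grid
};

// Map a view direction (in accessory space) to the tile it was sampled from.
// n is the grid resolution used in the preprocess.
TileIndex findImpostorTile(Vec3 d, int n) {
    float ax = std::fabs(d.x), ay = std::fabs(d.y), az = std::fabs(d.z);
    int face;
    float u, v, major;
    if (ax >= ay && ax >= az) {        // +X or -X face
        face = d.x > 0 ? 0 : 1;
        major = ax; u = (d.x > 0 ? -d.z : d.z); v = d.y;
    } else if (ay >= az) {             // +Y or -Y face
        face = d.y > 0 ? 2 : 3;
        major = ay; u = d.x; v = (d.y > 0 ? -d.z : d.z);
    } else {                           // +Z or -Z face
        face = d.z > 0 ? 4 : 5;
        major = az; u = (d.z > 0 ? d.x : -d.x); v = d.y;
    }
    // Project onto the face plane: coordinates in [-1, 1].
    u /= major; v /= major;
    // Convert to a grid cell, clamping to stay inside the face.
    int i = std::max(0, std::min(n - 1, static_cast<int>((u + 1.0f) * 0.5f * n)));
    int j = std::max(0, std::min(n - 1, static_cast<int>((v + 1.0f) * 0.5f * n)));
    return {face, i, j};
}
```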

Step 3. Compute the exact position of the accessory impostor—that is, the quad. This position doesn't correspond to the one where the 3D mesh would be rendered because the centers of the quad and the 3D mesh don't necessarily correspond. To correctly position an impostor at runtime, we save the offset (X, Y) between the 2D quad center and the 3D accessory center, projected onto the quad. This offset is computed in a preprocess, when generating each tile. At runtime, the translation part t of T is transformed into camera space and offset by (X, Y) to properly place the impostor. In this process, t is offset only on the X and Y axes in camera space. At this stage, the depth Z, where the quad must be rendered, is still the same as originally computed in T. (Step 5 details the algorithm to compute each fragment depth.)

Step 4. Rotate the quad around the camera Z axis to orient it correctly. Figure 4b illustrates the necessity of this stage. The figure shows a side view of a hat positioned on a human impostor. The character wears the hat inclined backwards, so the quad must be rotated to correctly imitate the 3D hat orientation. We compute this rotation at runtime by first transforming the rotation part R of T into camera space and then extracting its Z axis component.




Figure 4. Placing an accessory impostor: (a) pipeline of accessory impostor positioning, (b) impostors with no occlusion treatment, and (c) impostors with our fragment depth computation.



Figure 5. Close-up on the transition between skin and hair. Artifacts appear using previous methods with (a) bilinear filtering and (b) nearest filtering. (c) Segmentation maps and bilinear filtering yield smooth transitions. (d) The corresponding segmentation map, showing the detailed transition between skin (red channel) and hair (green channel).

Step 5. Compute the depth of each impostor pixel (or fragment). This avoids visual artifacts, such as those in Figure 4b. The main problem in addressing such an issue with a perspective projection is that depth values in the Z buffer don't vary linearly. Our dedicated algorithm solves visibility problems inherent in impostors. Moreover, it's robust and supports any projection, even perspective. Here, we explain first the important values to compute in a preprocess and then how to exploit these values in real time to determine each fragment's depth.

When each tile is generated offline, the near and far clipping planes are set as close to the object as possible. This gives the near plane the same depth as the accessory vertex closest to the camera. Then, the GPU passes the 3D accessory vertex depth values in camera space from the vertex to the fragment shader. These values are interpolated for each fragment. We then compute, for each fragment, a final depth zacc as the normalized distance between the fragment and the near plane:

$$z_{\mathrm{acc}} = \frac{z_{\mathrm{frag}} - z_n}{z_f - z_n}, \qquad (2)$$

where zn and zf are the near and far clipping-plane depths, and zfrag is the fragment depth value interpolated from the vertices. For each fragment, zacc is saved in one of the unused channels of the UV map (see Figure 3b). In addition, for each tile, two important parameters must be saved at this stage, still in camera space. The first is the distance zf − zn between the near and far planes, which later lets us denormalize zacc. The second is the distance zc−n between the depth of the 3D accessory center and the near plane.

Step 5 of the runtime pipeline sends the four quad vertices untouched to the vertex shader, which transforms them into camera space. At this moment, these vertices all have the same depth as the 3D accessory center, defined in T. They then go to the fragment shader, where we compute each fragment's final depth in two operations. First, we retrieve zacc from the UV map to be denormalized. Then, we compute the fragment's depth in camera space (zcamera):

$$z_{\mathrm{camera}} = z_{\mathrm{frag}} + z_{c-n} - \bigl(z_{\mathrm{acc}} \cdot (z_f - z_n)\bigr), \qquad (3)$$

where zfrag is the fragment depth value in camera space, interpolated from the depth of the quad vertices in the GPU. The first part of Equation 3 shows that the nearest possible depth for any fragment is zfrag + zc−n. If we wanted to render the accessory with its 3D mesh, this value would correspond to the depth of its vertex closest to the camera. From this distance, the second part of Equation 3 offsets each fragment by the denormalized value of zacc. On the basis of its initial computation (see Equation 2), this offset is at most equal to the distance between zn and zf—that is, it reaches the 3D accessory vertex that's farthest from the camera.

Finally, we use the perspective transform to express zcamera in the canonical view volume—specifically, between −1.0 and 1.0:

$$\mathrm{depth}_{\mathrm{frag}} = \frac{s_f + s_n}{s_{f-n}} + \frac{2.0 \cdot s_f \cdot s_n}{s_{f-n} \cdot z_{\mathrm{camera}}}, \qquad (4)$$

where sn and sf are the depths of the scene's near and far planes in camera space, and sf−n is the distance between them. In the OpenGL Shading Language, the depth assigned to each fragment ranges between 0.0 and 1.0. So, we must normalize Equation 4's result.

This method produces optimal results when the computation of depth values is performed for both human impostors and their accessories, as Figure 4c shows. Our accessory impostor method is also compatible with more sophisticated virtual-human impostor methods, such as the one that Ladislav Kavan and his colleagues developed.4
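Pulling Equations 2 through 4 together, the sketch below shows the preprocess normalization and the runtime reconstruction side by side. In the actual system the runtime part lives in the fragment shader; plain C++, the struct name, and the function names are assumptions used only to make the arithmetic explicit, and the sign conventions simply follow the article's equations.

```cpp
// Per-tile constants saved during the preprocess (camera space).
struct TileDepthInfo {
    float depthRange;    // z_f - z_n, used to denormalize z_acc at runtime
    float centerToNear;  // z_{c-n}, distance between the 3D accessory center and the near plane
};

// Preprocess (Equation 2): normalize a fragment's depth against the tight
// near/far planes of the tile so it fits in a spare channel of the UV map.
float normalizedAccessoryDepth(float zFrag, float zNear, float zFar) {
    return (zFrag - zNear) / (zFar - zNear);  // z_acc in [0, 1]
}

// Runtime (Equations 3 and 4): rebuild the fragment's true depth from the quad
// depth and the stored z_acc, then map it to the [0, 1] range GLSL expects.
float impostorFragmentDepth(float zFragQuad,           // quad depth in camera space
                            float zAcc,                // value fetched from the UV map
                            const TileDepthInfo& tile,
                            float sceneNear,           // s_n
                            float sceneFar)            // s_f
{
    // Equation 3: nearest possible depth plus the denormalized per-fragment offset.
    float zCamera = zFragQuad + tile.centerToNear - zAcc * tile.depthRange;

    // Equation 4: perspective mapping into the canonical view volume [-1, 1].
    float sRange = sceneFar - sceneNear;               // s_{f-n}
    float depthNdc = (sceneFar + sceneNear) / sRange
                   + (2.0f * sceneFar * sceneNear) / (sRange * zCamera);

    // Final normalization to [0, 1] for gl_FragDepth.
    return depthNdc * 0.5f + 0.5f;
}
```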

Figure 6. Appearance variety with segmentation maps: (a) a human template's original texture; (b) several pairs of segmentation maps sharing the same parameterization; (c) examples of unique sets of eight colors for each character; and (d) detailed effects obtained with segmentation maps and specularity parameters on faces (make-up, freckles, glossy lips, and so on) and on the whole body (cloth patterns, shiny shoes, and so on).

Segmentation Maps

Previous research on color variety offers methods to vary the instances of the same human template for viewing crowds at far distances (see the "Related Work in Crowd Simulations" sidebar). However, using a single alpha layer to segment body parts has two main drawbacks. It rules out using bilinear filtering on the texture, because incorrect interpolated values would be fetched in the alpha channel at body part borders, as Figure 5a shows. Moreover, for individuals close to the camera, the transitions between body parts tend to be too sharp—for example, between skin and hair, as Figure 5b shows. This is because you can't associate a texel with several body parts at the same time.

Character close-ups also require a new method capable of handling detailed color variety. Subtle makeup or detailed patterns on clothes greatly increase the variety of a single human template. Furthermore, changing illumination parameters of materials—for example, their specularity—provides more realistic results. Previous methods would require costly fragment-shader branching to achieve such effects.

To overcome those methods' drawbacks, we apply a versatile solution based on segmentation maps. For each texture of a human template, we create a series of segmentation maps. Each map is an RGBA image, delimiting four body parts—one per channel—and sharing the same parameterization as the human template texture, as shown in Figure 6a. This method allows each texel to belong partially to several body parts at the same time through its channel intensities. As a result, we can design transitions between body parts that are much smoother than in previous approaches, as shown in the Figure 5c and 5d close-ups.

We empirically defined eight body parts per human template, requiring the use of pairs of segmentation maps—specifically, two additional RGBA textures. This lets us define several pairs per human template texture, creating different patterns as illustrated in Figure 6b. When a character is created, it's assigned a unique set of eight RGB colors (Figure 6c) for its eight body parts (Figure 6d), randomly selected within constrained color spaces. For each pixel of a body, we compute a final color as a combination of these eight colors, weighted by the channel intensities of the segmentation maps. For instance, for a body pixel p with texture coordinates (u, v), we compute its final color cp as

$$c_p = c_t \cdot \sum_{s \in (S_1, S_2)} \; \sum_{a \in (R, G, B, A)} I_{s,a}(u, v) \cdot c_{bp}(s, a). \qquad (5)$$

Here, ct is the color of the original texture (Figure 6a) at coordinates (u, v). The identifier s represents either the first or the second segmentation map from the selected pair (S1, S2). The function I(s,a)(u, v) is the intensity of the texel with coordinates (u, v) for channel a of segmentation map s, and cbp(s, a) is the corresponding body part's color. Equation 5 thus computes a sum over two segmentation maps and four channels per segmentation map, totaling eight components for eight body parts. To compute each character's final pixel color, we implemented a dedicated fragment shader.
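Written out on the CPU for clarity, the shader's per-pixel work of Equation 5 looks roughly like the sketch below; the types and the map-major ordering of the eight body-part colors are assumptions of this sketch.

```cpp
#include <array>

struct RGB  { float r, g, b; };
struct RGBA { float r, g, b, a; };

// Equation 5, evaluated for one body pixel. 'tex' is the original texture color at
// (u, v), 'seg' holds the two segmentation-map texels at (u, v), and
// 'bodyPartColors' is the character's unique set of eight colors, ordered by
// segmentation map first, then by channel (R, G, B, A).
RGB finalPixelColor(RGB tex,
                    const std::array<RGBA, 2>& seg,
                    const std::array<RGB, 8>& bodyPartColors)
{
    RGB sum{0.0f, 0.0f, 0.0f};
    for (int s = 0; s < 2; ++s) {
        const float intensity[4] = {seg[s].r, seg[s].g, seg[s].b, seg[s].a};
        for (int a = 0; a < 4; ++a) {
            const RGB& c = bodyPartColors[s * 4 + a];  // color of body part (s, a)
            sum.r += intensity[a] * c.r;
            sum.g += intensity[a] * c.g;
            sum.b += intensity[a] * c.b;
        }
    }
    // Modulate the original texture by the weighted body-part colors.
    return {tex.r * sum.r, tex.g * sum.g, tex.b * sum.b};
}
```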


Figure 7. Performance results: the number of displayable characters at 30 frames per second, with 0 to 3 accessories, for (a) deformable meshes, (b) rigid meshes, (c) impostors without depth, and (d) impostors with depth computation.

For each character, this shader must know the corresponding set of eight colors cbp to be applied to the eight body parts. Sending this data as uniforms would be time-consuming. Instead, we precompute one set of eight unique body part colors per character and store them all in contiguous texels of a 1,024 × 1,024 lookup image. When filled, this lookup image is sent only once to the shader. To arrange as many sets of eight RGB colors as possible, we also exploit the lookup image's alpha channel. Thus, we use only six RGBA texels per virtual human. So, we can store over 130,000 unique color combinations. Previous methods were limited to 4,096 combinations because they couldn't address every row of the lookup image by using only the human texture alpha channel.

Finally, to further improve detailed variety, we assign each body part specularity parameters, which are sent directly to the GPU.

Each segmentation map pair corresponds to two 1,024 × 1,024 textures of about 1 Mbyte each, compressed in DDS format (DXTC5). The designer chooses how many different pairs to create for each human template texture. To apply the correct colors to body parts, we must read the lookup image pixel by pixel. To ensure that the colors are correctly read, we don't compress the lookup image, which is thus 4 Mbytes. The total memory storage cost for using our approach therefore depends only on the number of human templates exploited and the segmentation map pairs defined for each of them.
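Returning to the lookup image for a moment: the packing itself is straightforward, since eight RGB colors are 24 channels, which exactly fill six RGBA texels. A minimal sketch follows, assuming a flat RGBA8 byte buffer for the lookup image; the function name and layout are assumptions.

```cpp
#include <array>
#include <cstdint>

// Pack one character's eight RGB body-part colors into six contiguous RGBA texels
// of the 1,024 x 1,024 lookup image (24 channels = 6 texels x 4 channels).
// 'lookupImage' is assumed to be an uncompressed RGBA8 buffer of 1,024 x 1,024 texels,
// and characterIndex * 6 must stay within that texel count.
void packBodyPartColors(const std::array<uint8_t, 24>& colors,  // 8 x (R, G, B)
                        uint8_t* lookupImage,
                        int characterIndex)
{
    const int texelsPerCharacter = 6;
    const int firstTexel = characterIndex * texelsPerCharacter;
    for (int c = 0; c < 24; ++c) {
        int texel   = firstTexel + c / 4;   // which of the six texels
        int channel = c % 4;                // R, G, B, or A within that texel
        lookupImage[texel * 4 + channel] = colors[c];
    }
}
```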

Table 1. Performance data for Figure 7 (displayable characters at 30 frames per second).

                                       0 accessories      1 accessory        2 accessories      3 accessories
Segmentation map                       Without   With     Without   With     Without   With     Without   With
Deformable meshes (Figure 7a)               95     87          90     86          87     84          86     80
Rigid meshes (Figure 7b)                   532    493         420    373         348    334         311    300
Impostors without depth (Figure 7c)     16,950 12,103       7,600  6,000       4,850  4,345       3,948  3,418
Impostors with depth (Figure 7d)           n/a    n/a       7,800  5,160       4,800  4,158       3,948  2,990

Segmentation maps are generic. You can use them in any number and on any textured object to vary its appearance—for instance, to segment building instances in a city model or to vary accessories (see Figure 1). They're also scalable to any LOD, so they keep the character's appearance consistent (see Figure 2, which illustrates this scalability on a single character rendered at the three LODs).

Results

We produced the following results on an AMD64 X2 5200 with 2 Gbytes of RAM and an Nvidia 7900 GTX 512-Mbyte graphics board. To render virtual humans and accessories, we used OpenGL pseudo-instancing.

Figure 7 shows the number of characters, each wearing 0 to 3 accessories, that we can render at 30 fps. For deformable meshes (see Figure 7a), the number of displayable characters changes only slightly, because the pipeline bottleneck remains the mesh skinning. Also, each character has over 6,000 triangles—that is, 5 to 12 times the triangle count of an accessory. With rigid meshes (see Figure 7b), accessories are proportionally more costly. This is because there's no skinning phase, and rendering an accessory triangle costs the same as rendering a rigid-mesh triangle. As for impostors (see Figure 7c), displaying an accessory or a character has the same fixed cost—specifically, two triangles. The performance loss is therefore sizable. However, we can often avoid rendering the impostors of accessories when they are very small and indistinguishable at far distances. This is the case for glasses and jewelry, for instance. In this manner, we can render many more human impostors than depicted in Figure 7c.

The fragment depth computation implies disabling the early culling optimizations that the GPU performs. However, as Table 1 shows, the more accessories rendered as impostors, the less this computation affects the frame rate. Figure 7 and Table 1 also depict results with and without segmentation maps. For all LODs, using segmentation maps implies fewer characters because of the additional pixel color computation. However, this cost isn't prohibitive.

Finally, we can render large crowds with no accessories or segmentation maps, but they're composed of numerous similar instances, which isn't desirable. With our methods, instead of huge crowds of mirror-image characters, we offer large crowds of unique instances. In Figure 1, we used only five human templates and instantiated them several times, fully exploiting their textures, accessories, and segmentation maps. We can also segment accessories to increase their variety, which illustrates the segmentation map method's versatility.


A companion video to this article (http://vrlab.epfl.ch/umaim/cga.html) demonstrates the integration of our methods into complex real-time crowd applications. We simulate over 5,000 characters in an environment of 50,000 triangles, computing dynamic shadows for the characters and the scene, enabling navigation and collision avoidance, and using the three LODs common to crowd simulations. The scene represents a theme park with doorways partitioning the environment into several atmospheres. When a character passes through one of them, its colors and accessories change according to the area theme. Figure 8 illustrates crowds benefiting from our techniques in the theme park. With accessories and segmentation maps, we obtain unique, visually appealing individuals.

Figure 8. Two theme park scenes. The accessories and segmentation maps in scenes (a) and (b) change when a character passes from one environment to the next.

We've developed both the accessory and segmentation map methods so that they are applicable to offline as well as real-time crowd simulations. Moreover, segmentation maps are sufficiently generic to be applied to accessories as well as characters. Currently, we simply frustum-cull virtual humans and accessories. The next step would be to exploit hardware occlusion queries to avoid rendering unseen objects in the view frustum.

To keep the accessorizing process as simple as possible, we made assumptions that limit our technique.


First, we presume that the mesh is attached to a single joint. We could skin an accessory with more joints, but adapting it to any template without changing the mesh vertices would prove difficult. Second, to attach accessories to moving characters, we assume we are working with skeletons and skeletal animations, which isn't necessarily the case. Nevertheless, we can easily overcome this limitation by attaching an accessory, for instance, to one of the character's vertices instead of to a joint. Third, the technique doesn't provide solutions for simulating movements independent of the attach joint, such as a too-big hat sliding on a child's head. However, we could implement this as a supplementary layer on top of the current positioning algorithm.

Finally, we can't use some accessories if their weight would alter the character's animation. For example, placing a handbag in a virtual human's hand is easy. But if the animation sequence isn't altered, the bag seems weightless, and the resulting effect isn't realistic. However, the range of accessories unaffected by this limitation is large enough to obtain unique instances in crowds. Overcoming this issue is an interesting topic for future work and would be a first step toward characters interacting with their environment.

Acknowledgments

We thank Mireille Clavien and Renaud Krummenacher for their work on designing the virtual humans, accessories, and scenes. The Swiss National Research Foundation sponsored this project.

References
1. B. Ulicny, P. De Heras Ciechomski, and D. Thalmann, "Crowdbrush: Interactive Authoring of Real-Time Crowd Scenes," Proc. 2004 ACM SIGGRAPH/Eurographics Symp. Computer Animation (SCA 04), ACM Press, 2004, pp. 243–252.
2. A. Yershova and S.M. LaValle, "Deterministic Sampling Methods for Spheres and SO(3)," Proc. IEEE Int'l Conf. Robotics and Automation, IEEE CS Press, 2004, pp. 3974–3980.
3. N. Greene, "Environment Mapping and Other Applications of World Projections," IEEE Computer Graphics and Applications, vol. 6, no. 11, 1986, pp. 21–29.
4. L. Kavan et al., "Polypostors: 2D Polygonal Impostors for 3D Crowds," Proc. Symp. Interactive 3D Graphics and Games, ACM Press, 2008, pp. 149–155.

Jonathan Maïm is a research assistant and PhD candidate at the VRlab at the Swiss Federal Institute of Technology in Lausanne (EPFL). His research focuses on building an architecture for simulating real-time crowds of thousands of realistic virtual humans. Maïm has a master's in computer science from EPFL, after completing his master's project at the University of Montreal. Contact him at jonathan.maim@epfl.ch.

Barbara Yersin is a research assistant and PhD candidate at the VRlab at the Swiss Federal Institute of Technology in Lausanne (EPFL). Her research interests focus on real-time applications, particularly crowd simulations. Her PhD thesis topic is real-time motion planning and behavior of crowds of virtual humans. Yersin has a master's in computer science from EPFL, after performing her master's project at the University of Montreal. Contact her at barbara.[email protected].

Daniel Thalmann is a professor and the director of the VRlab at the Swiss Federal Institute of Technology in Lausanne (EPFL). He's a pioneer in virtual-human research. Thalmann has an honorary doctorate from University Paul-Sabatier in Toulouse. He's coeditor in chief of the Journal of Computer Animation and Virtual Worlds and coauthor of Crowd Simulation (Springer, 2007). Contact him at daniel.thalmann@epfl.ch.


