Saturday, December 22, 2012

Small Object Allocator


Prior to the latest update of Granny Smith, we ran into memory problems when loading levels. This was due to the unfortunate choice of using XML and TinyXML for level data. I'm generally very positive to both XMl as a data format (as opposed to most game developers) and TinyXML as a library. However, since we don't use instancing, a level can easily be 500 kb or more and TinyXMl will parse the entire file into a DOM structure before it can be read. This results in a myriad of small allocations ranging from a few bytes up to about 100 bytes. I'm not sure exactly what the per-allocation overhead is on iOS and Android, but I suppose it's at least 16 or 32 bytes. In addition the LUA library also makes a fair amount of small allocations.

So far I had not bothered using a custom allocator for anything since I usually try to avoid allocations alltogether. Strings, dynamic arrays, stream buffers, etc, all have a templated built-in storage, so you can reserve a specific amount when they are declared:

So instead of:

std::vector<QiVec3> tmpVerts;

I use:

QiArrayInplace<QiVec3, 128> tmpVerts;

Which means I reserve memory for 128 verts on the stack before any dynamic allocation happen. Anyway, I decided to try a small object allocator for the TinyXML and LUA allocations and my choice fell on the simplest possible solution - a fixed memory area for small allocations.

The idea is to reserve a fixed amount of memory at application startup, say 2 Mb, devide it into sections of different sizes. In Granny Smith I ended up using the following sizes: 8, 16, 32, 48, 64, 96 and 128 bytes, but they can of course be chosen to match the most common object sizes etc. The number of slots for each size is based on real statistics from the game. In my case, the 48 byte allocation was the most common, so almost half of the preallocated space is reserved for that size.

Now, for allocating out of this pool I use a packed bitfield for each allocation size to track which slots are in use. I also track the lowest free slot for each size. In the worst case scenario the whole bitfield needs to be searched. This is a linear operation, but in practice it is very cheap - one can search 32 slots per instruction (actually 64 nowadays) and it's a contiguous memory block, so very cache-friendly. I use 12.800 slots for 48 byte allocations, which translates to 400 32-bit integer comparisons all lined up in memory. This is the worst case scenario. On average I typically only need to search a few bytes. Deallocation is basically fo free. The pointer adress reveals if it's a small object allocation (if the pointer lies within the preallocated memory block) and also what size the allocation is (all allocations of the same size are grouped together). The actual deallocation is done by clearing the corresponding bit and potentially update the lowest free slot.

The most obvious win here is not speed, but low memory overhead. It uses only one bit of memory overhead per allocation (plus potential padding bytes up to the smallest matching size), so doing 1.000 allocations of 8 byte each will require exactly 8.125 bytes of memory. Performance is great, but there might be other allocators out there that are even better. Since it is all allocated out of a contiguous memory block it is cache friendly and does not cause any fragmentation.

It all sounds great, but there is of course one huge downside to it - when you run out of slots your're out. The allocator can only be made this efficient by using a single preallocated memory block, so you need to know the memory requirements at load time. However, when running out of slots one can just start to use the regular malloc instead, also for small objects. These will of course have the regular overhead, but at least it "fails" gracefully. I'm really happy with the results so far. Levels load very noticably faster and the overall memory usage went down. For games in particular, where the memory allocation pattern and total memory usage is known at compile time, I think this is a pretty good choice for general purpose small object allocator.

Tuesday, December 4, 2012

Programmers are people


Something that annoyed me for many years now is how the games industry (well, software development community in general) treat individual developers as "resources" in a project schedule. It doesn't just sound awful and is a direct insult to the profession, it's also plain wrong, stupid and counter-productive.

Yeah, I know what you're thinking.. but yes I would argue programming is a creative line of work, you are designing and building something. Do you think a car manufacturer talks about "designer resources" when coming up with a new model? Does a TV show producer have "screenplay writing resources" (well maybe they do, I don't have a clue). Of course it matters who designs and builds something. You can throw hundreds of bad composers on a soundtrack, but it still sounds horrible.

How many weeks does task X take to complete? The answer should always be the same - it depends on who does it. There are no man-hours, man-weeks or man-months in this industry. Building software is not like building a house. There are always tons of vastly different solutions to a given task. There might even be a tool out there that already solved the problem. Maybe the task itself is an effect of a previous design mistake?

When loading boxes of oranges a good worker might be twice as fast a bad worker. An awesome worker might be three times faster, but that's really pushing it. In software a highly motivated, awesome programmer can easily replace a whole room full of mediocre (not Mediocre!) programmers and still produce way better results, fewer errors, more maintainable code at a fraction of the cost.

Finding such talented programmers is of course very hard, but I wouldn't say they are quite as rare as most people think - keeping an awesome programmer motivated is the bigger challenge. The magic only happens when a great programmer is given freedom and responsibility. Unfortunately very few people seem to have realized this - the typical scenario being a big company headhunting some of the best programmers in the world and pay them tons of money to sit in a cubicle and munch dull scrum tasks off a backlog. Not so awesome anymore? Seriously, what did you expect? If you treat great programmers as generic programming resources they will become just that.

Thursday, October 25, 2012

How Granny Got the Look


In my last post I mentioned briefly how all graphical objects in Granny Smith are made out of 2D polygons which are transformed into 3D at load time. It was never initially designed for the sometimes complex environments in the game, but we decided to stick to this method instead of involving a separate 3D modelling software. I think, at times real 3D objects could have come in handy, but overall the current workflow is preferable since it's much more efficient. There is no need to track separate files or assets - every level is totally self-contained. Because the 2D data is so small, we don't even use instancing, so there is no risk of trashing another level when altering objects.

This is how a factor level looks in the editor.


The most fundamental transform is a simple extrude, but we can also apply a chamfer or fillet in the process. This is used extensively, especially for round hills and other natural shapes in the game. This beveling is done by gradually shrinking the polygon while extruding it, in one or more iterations.

Shrinking or expanding a polygon is not as easy as it sounds. For "well-behaved" polygons there is no problem, but sometimes vertices disappear in tight corners, or new ones must be added. I tried several different ways of implementing this myself, but it's really hard to find a robust method that never fails, or even one that failes gracefully. After a few failed attempts, I found the awesome Clipper library, which can do all kinds of polygon operations, including expanding and shrinking. It's reasonably fast, very robust and super-easy to use, I highly recommend. Even when using the Clipper library it was not trivial to get the beveling to work correctly. The Clipper library does not track or correlate vertices, so after a shrink operation you have no idea how the new vertices correlate to the old ones, hence it is really hard to stitch the two polygons together with triangles. It took me a few tries to implement a robust stitching algorithm, but I finally came up with one that deals with all well-behaved polygons, and most (but not all) tricky corner cases.




Shadows also deserves a mention. I started experimenting with different methods for smooth soft shadows very early on. The traditional way of using precomputed low resolution shadow maps didn't quite fit our needs, because all geometry is generated on the fly, and levels can be quite large. Since most geometry in Granny Smith is front-facing, extruded, flat, 2D polygons I came up with a scheme where the shadows are semi-transparent triangles based on projecting the 2D polygons onto each other. Thanks to Clipper, I already had a great toolbox for this. Each polygon is expanded and clipped against overlapping polygons in the background. The resulting "shadow" polygon then uses vertex coloring to smooth out the penumbra and a special shadow shader to achieve quadratic fall-off. All these shadow triangles are put in separate vertex buffers and rendered simultaneously. The engine supports dynamic soft shadows as well, but it's rather expensive, because all the expensive geometry clipping needs to happen every frame, so it's only being used in a very few places in the game. Because shadows are computed from the 2D polygons, there will only be shadows in the XY plane. To somewhat overcome this I also added self-shadowing along the extruded surface using the same vertex coloring scheme, so creases get a darker tint.
The anatomy of Granny Smith

The characters are basically a composite of 2D sprites, but slightly rotated to compensate for the camera angle, so it's somewhere in between an oriented sprite and a billboard. There was never a discussion about using "real" 3D characters for this game. My personal opinion is that 2D character are highly underrated for this type of game. 2D characters obviously yield better performance, but I'd also argue they also look better in many cases, especially on low resolution devices. Polygon aliasing completely disappear, because all edges are drawn into the alpha channel of the texture and rendered with mipmapping, so the characters always look crisp and sharp.


Characters is not the only area where we combine 2D graphics with 3D objects. Grass and foliage are two examples. The grass is added procedurally, while the decals for trees and bushes are placed manually along the rim of the object to hide sharp polygon edges and add extra detal. The grass is rendered with a special vertex shader to sway in the wind.


After completing a level, you get a replay in vignette and sepia color and with occasional bluring plus added dust and scratches to give the impression of an old movie. The vintage effect is just a shader effect in a post pass, but the replay itself features alternative camera angles and slow-motion effects which took some time to get right. The camera angles are determined by what's going on in the game. For example, it often switches to slow-motion close up right before fracturing an object. The replay data is basically just recorded input for the player, plus position correction data in case of divergence, similar to a networked multiplayer game, but I also record "special events" that are used to trigger camera angles. The replay data is analyzed and all the camera angles are decided before the replay starts. Choosing camera angles on the fly wouldn't work since you want to switch camera before the action happens. Getting everything in the game to support slow-motion playback without diverging too much was a real challenge, and you can still see artefacts here and there. The replay system is also driving the apple thief playback (in very subtle slow-motion) during regular gameplay.


Tuesday, October 23, 2012

The Physics of an Old Lady


It's been almost two months since we released our second title - Granny Smith (available on App Store and Google Play). It's a platform racer starring an old lady chasing an apple thief through different environments. Our ambition was to make a smaller game than Sprinkle, but it turned out to be a lot more ambitious. I think mostly due to graphics. It started off as a 2D platformer with 3D-ish graphics - a few layers of parallaxed polygons, but the editor got gradually more advanced, and even though all graphics are still drawn as 2D polygons or splines, they can be extruded, beveled, rotated, textured, etc in heaps of different ways. In fact, all level data is still stored as 2D polygons and then transformed into 3D object at load time, making the level data extremely small. Even a complex level in the city environment is typically less than 30 kb in compressed form (not including textures).



I will talk about the physics in this blog post and save the graphics details for later. Just as with Sprinkle, we use Box2D for gameplay physics, with the addition of a special "motion joint", described in my previous blog post. In Sprinkle, each level was small enough keep all dynamic objects awake at all times, but in Granny Smith, the levels are much bigger, so objects are sleeping until touched and then completely disabled at a certain distance behind the player. Some of the more complex objects use fixed joints to glue objects together. It's a rather expensive method instead of just merging several objects into a single body, but 2D physics was never a bottleneck in this game, so I just went with the simplest solution.

Using a physics-based character for a fast-paced racing platformer was in retrospect quite a bold move, and we ended up implementing certain "speed normalization", "jumping aid" and "grabbing aid" sensors, manually placed in the level editor to help the player time jumps and release handles at exactly the right moment (pushing it more and more in the direction of an arcade game). This was by far the most tricky part of the whole project - to give the impression of a physics-based game while still making it playable and enjoyable. Without all the sensors, levels were practically unbeatable unless you followed the intended path very precisely.



The most interesting physics in Granny Smith is the fracture feature. We realized early on that we wanted a "wow"-feature in the game, similar to the water in Sprinkle, and our choice fell on fracture and breakable objects, mostly because it has not been used much in mobile games and it fits the setting nicely. I have always argued that when fracturing objects in games, it is very important that the fracture pattern is determined by the point of impact. Exactly how it breaks is less important. I also think it's very important that the broken pieces sum up to the original object. It sound obvious, but many games just replace the broken object with some random pieces when it breaks, which creates a very noticable pop. Lastly, I think the motion of the broken pieces must respond physically to the impact. To sum it up, it is very hard to cheat with breakage - at least from a geometric perspective. The exact breakage pattern is not that important, but it's highly relevant that it a) breaks at the point of impact, b) that the pieces sum up to the original object, and c) that they follow a realistic trajectory after the impact.

In Granny Smith, we only break flat objects, so the breakage actually happens in 2D. This simplifies things a lot. Breakable objects are special objects in the editor, and from a level design perspective they are always a straight line, which is then extruded and can be broken along the extruded surface - a quad. The quad is broken in real time, based on the point of impact. The breakage algorithm itself is rather dumb, basically just slicing polygons with random lines recursively (while preserving texture coordinates) until there is a certain number of pieces. Pieces that fall within a certain radius from the impact are made dynamic and the rest are marked static. Impulses from the impact are then applied to the pieces to ensure that they fly off in a natural trajectory. It took some tweaking to get it all right, but it turned out quite nicely in the end.



The broken off pieces are simulated using my own 3D rigid body solver and collision detection (see previous blog posts) in the simplest way possible. There is no contact manifold, no pair tracking, no friction tracking, just a single contact point per piece and no collision between pieces. There is a very lightweight sleeping algorithm to disable objects that fall below a velocity threshold. Once they fall asleep they never wake up, and objects that fall outside the screen are immediately removed. Because Granny Smith is a fast-paced game you typically only see a breakable object for a second or two, so the relative inacurracy in the brekable physics is very hard to notice. I think this is a quite good example of "effect physics" in 3D that don't need a full-fledged, generic rigid body simulator.

All pieces from a breakable oject are rendered in a single draw call using a dynamic vertex buffer. This is far more efficient than drawing each piece separately, even though dynamic buffers have a some overhead. On multi-core devices (most phones and tablets nowadays!) the 3D physics calculations take place on a separate thread.

Wednesday, May 2, 2012

Jump start jumps


I once heard that elegant code is in some respect contradictory to useful software, because it's often the sum of all exceptions that make a program useful, rather than it being strictly logical. Do you remember the frustration back when the javascript of slow-loading web pages was allowed to switch input focus to a login input box while you were typing a URL? The software was just following it's strict logic, switching input when it was told, while the better way of handling it would be a not-so-elegant check if the user have manually focused something else in between the start loading and finished loading, or a timer checking if the input focus change is still relevant.

I wouldn't say that this contradiction applies to all aspects of software development, but certainly to most parts that interface with people. In a game, the most common part of the code that interface with people, apart from the GUI, is caracter controllers.

One would think that modeling a character with physics would automatically yield natural and intuitive behavior. In reality it's almost the complete opposite - the more real physics you use to model your character, the harder they are to control. The most common way to write a character controller is probably to not use physics at all, but to just use one or several raycasts or shapecasts. But what if you want the character to interact physically with the environment?

I recently spent some time writing and tweaking the character controller of our next game. Without revealing too much, I can say it's not a sequel to Sprinkle and that it includes a character which can interact with the environment. I decided early on that a pure raycast/shapecast based method wouldn't suffice, because then the environment wouldn't respond. One can cheat to some extent by applying downward forces etc, but there will always be situations where it doesn't fully work.

Instead of representing the character with detailed geometry I use two circles stacked on top of each other, slightly overlapping. It's nice to have a round shape for the character, since you want it to slide easily over obstacles. These circles only represent the torso, so in addition I also do a couple of downward raycasts for the legs. The raycasts control a special joint that keeps the character floating above ground. You don't want to rely on regular collision detection and friction for character motion, mostly because you need more control than that, so the spheres are really only for contacts on the sides or above the character.

Since the character can stand on any object, the special joint is reinserted every frame, attaching to whatever the character happens to be standing on. The joint itself constrains the motion of the character relative the other object using the regular constraint solver. It has one target relative velocities and one maximum force to reach that target for each degree of freedom (x, y and rotation in 2D). To me, this is a very intuitive interface to a character controller, instead of using forces. For example, if we want the character to walk left we set the X target relative velocity to -2 m/s and the maximum force to however strong we want the character to be. If we were using explicit forces we would need to adapt the force to the resistance (which of course at the time we don't know). The Y component gets a relative velocity based on the current distance to the ground. If we're below our rest-distance then it should be slightly positive, otherwise slightly negative. The maximum force is however strong the legs should be.

These very useful, one-dimensional velocity constraints are something I have always missed from off-the-shelf physics engines. All physics engines use them internally, but for some reason they are never exposed through the public API (well, Meqon was the exception here of course :)

For jumping, I also use the character joint, but with a higher positive Y relative velocity. If the character is standing on a dynamic object it will automatically get a downward push through the constraint solver. While play testing I noticed that some players consistently press the jump button a few frames too late, falling off a cliff instead of jumping to the next. I'm not sure if this is due to input lag or something else, but I added a few frames of grace period, allowing mid-air jumping if ground contact was recently lost. I'm also recording the ground normal continuously in a small ring buffer, so when a jump occurs I can go back and look for the best (most upward) jump direction. I also implemented an early abort mode, so that if the player releases the jump button quickly the jump terminates, and the character returns to the ground more quickly. All these special cases are supposed to feel intuitive to the extent where they are not even recognized as features. They just make the game easier to play, but has no logical explanation and certainly do not yield more elegant code.