Saturday, December 22, 2012

Small Object Allocator


Prior to the latest update of Granny Smith, we ran into memory problems when loading levels. This was due to the unfortunate choice of using XML and TinyXML for level data. I'm generally very positive to both XMl as a data format (as opposed to most game developers) and TinyXML as a library. However, since we don't use instancing, a level can easily be 500 kb or more and TinyXMl will parse the entire file into a DOM structure before it can be read. This results in a myriad of small allocations ranging from a few bytes up to about 100 bytes. I'm not sure exactly what the per-allocation overhead is on iOS and Android, but I suppose it's at least 16 or 32 bytes. In addition the LUA library also makes a fair amount of small allocations.

So far I had not bothered using a custom allocator for anything since I usually try to avoid allocations alltogether. Strings, dynamic arrays, stream buffers, etc, all have a templated built-in storage, so you can reserve a specific amount when they are declared:

So instead of:

std::vector<QiVec3> tmpVerts;

I use:

QiArrayInplace<QiVec3, 128> tmpVerts;

Which means I reserve memory for 128 verts on the stack before any dynamic allocation happen. Anyway, I decided to try a small object allocator for the TinyXML and LUA allocations and my choice fell on the simplest possible solution - a fixed memory area for small allocations.

The idea is to reserve a fixed amount of memory at application startup, say 2 Mb, devide it into sections of different sizes. In Granny Smith I ended up using the following sizes: 8, 16, 32, 48, 64, 96 and 128 bytes, but they can of course be chosen to match the most common object sizes etc. The number of slots for each size is based on real statistics from the game. In my case, the 48 byte allocation was the most common, so almost half of the preallocated space is reserved for that size.

Now, for allocating out of this pool I use a packed bitfield for each allocation size to track which slots are in use. I also track the lowest free slot for each size. In the worst case scenario the whole bitfield needs to be searched. This is a linear operation, but in practice it is very cheap - one can search 32 slots per instruction (actually 64 nowadays) and it's a contiguous memory block, so very cache-friendly. I use 12.800 slots for 48 byte allocations, which translates to 400 32-bit integer comparisons all lined up in memory. This is the worst case scenario. On average I typically only need to search a few bytes. Deallocation is basically fo free. The pointer adress reveals if it's a small object allocation (if the pointer lies within the preallocated memory block) and also what size the allocation is (all allocations of the same size are grouped together). The actual deallocation is done by clearing the corresponding bit and potentially update the lowest free slot.

The most obvious win here is not speed, but low memory overhead. It uses only one bit of memory overhead per allocation (plus potential padding bytes up to the smallest matching size), so doing 1.000 allocations of 8 byte each will require exactly 8.125 bytes of memory. Performance is great, but there might be other allocators out there that are even better. Since it is all allocated out of a contiguous memory block it is cache friendly and does not cause any fragmentation.

It all sounds great, but there is of course one huge downside to it - when you run out of slots your're out. The allocator can only be made this efficient by using a single preallocated memory block, so you need to know the memory requirements at load time. However, when running out of slots one can just start to use the regular malloc instead, also for small objects. These will of course have the regular overhead, but at least it "fails" gracefully. I'm really happy with the results so far. Levels load very noticably faster and the overall memory usage went down. For games in particular, where the memory allocation pattern and total memory usage is known at compile time, I think this is a pretty good choice for general purpose small object allocator.

Tuesday, December 4, 2012

Programmers are people


Something that annoyed me for many years now is how the games industry (well, software development community in general) treat individual developers as "resources" in a project schedule. It doesn't just sound awful and is a direct insult to the profession, it's also plain wrong, stupid and counter-productive.

Yeah, I know what you're thinking.. but yes I would argue programming is a creative line of work, you are designing and building something. Do you think a car manufacturer talks about "designer resources" when coming up with a new model? Does a TV show producer have "screenplay writing resources" (well maybe they do, I don't have a clue). Of course it matters who designs and builds something. You can throw hundreds of bad composers on a soundtrack, but it still sounds horrible.

How many weeks does task X take to complete? The answer should always be the same - it depends on who does it. There are no man-hours, man-weeks or man-months in this industry. Building software is not like building a house. There are always tons of vastly different solutions to a given task. There might even be a tool out there that already solved the problem. Maybe the task itself is an effect of a previous design mistake?

When loading boxes of oranges a good worker might be twice as fast a bad worker. An awesome worker might be three times faster, but that's really pushing it. In software a highly motivated, awesome programmer can easily replace a whole room full of mediocre (not Mediocre!) programmers and still produce way better results, fewer errors, more maintainable code at a fraction of the cost.

Finding such talented programmers is of course very hard, but I wouldn't say they are quite as rare as most people think - keeping an awesome programmer motivated is the bigger challenge. The magic only happens when a great programmer is given freedom and responsibility. Unfortunately very few people seem to have realized this - the typical scenario being a big company headhunting some of the best programmers in the world and pay them tons of money to sit in a cubicle and munch dull scrum tasks off a backlog. Not so awesome anymore? Seriously, what did you expect? If you treat great programmers as generic programming resources they will become just that.