Wednesday, October 13, 2010

Schedulicious

I've run into a pretty interesting/annoying problem while working on parallelization of the physics pipeline.

I started off with a pretty standard thread pool, but proper physics requires quite a bit of synchronization. Especially in the solver, you need several full barriers to coordinate the threads. This is not so bad when you run thousands of objects, because the overhead of the synchronization is dwarfed by the massive simulation, but what would be more impressive is to go from say 3 ms per frame down to 1 ms.

Ideally you want your physics engine, rendering pipeline, AI, etc., all data-parallel, and run them sequentially every frame. The more common scenario of running physics on a separate thread and rendering on a separate thread introduces two extra frames of lag, which combined with double or triple buffering on the graphics card, plus another frame of delay in the display (at least some modern TV sets do this), gives you very sluggish response. A secondary benefit of data-parallelism is better cache utilization. It's more efficient to process all physics at once, and keep it in the L2, than to run all subsystems simultaneously and have them thrash the cache for each other along the way. Anyway, that's why I want the physics pipe to be super freakin data-parallel, with a good speedup even for moderate scenes.

My problem with the standard synchronization primitives is not excessive locking. All tasks are pretty independent and data is locked very sparingly. My problem is the barrier sync, and especially getting the worker threads to start processing again after the sync. I use a standard semaphore which I release when there is a new job coming in. The worker threads then start munching jobs off a queue in any order they like. When there are no more jobs, they go to sleep again on the semaphore. Now, doing a barrier sync involves putting all worker threads to sleep, since I have to wait for the last job to finish; then I immediately feed in a new job and start processing again.

When a worker thread hits the semaphore, Windows first goes into kernel mode, then realizes that the thread should go to sleep, starts looking for other threads to run, and in most cases there won't be any, so it switches to the idle process, clocks down the core frequency, takes out a good book and starts preparing a cup of coffee. Then, one microsecond later (who knew!), the main thread releases the semaphore and the whole process is reversed. Well, at least this is what I think it does, but I'm no kernel expert...

Now, since I know exactly what needs to be computed, I can tell from the beginning when this is going to happen. Which is every barrier sync, every iteration, every frame. That seems somewhat unnecessary. So I started ripping out synchronization code from the thread pool and replacing it with interlocked functions and spinlocks, having the workers spin until there is more work, but only during the simulation step, and then putting them properly back to sleep when we're done with physics.
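For the record, the core of the worker loop looks something like the sketch below. I'm using C++11 atomics here rather than the Win32 interlocked functions, and runNextJob()/waitOnSemaphore() are stand-ins for the real queue and semaphore:

#include <atomic>

void runNextJob();      // stand-in: pops one job off the queue and runs it
void waitOnSemaphore(); // stand-in: proper kernel sleep between physics steps

std::atomic<int> gJobCount(0);
std::atomic<bool> gSpinMode(false); // true only during the physics step

void workerLoop()
{
    for (;;) {
        int jobs = gJobCount.load();
        if (jobs > 0 && gJobCount.compare_exchange_weak(jobs, jobs - 1)) {
            runNextJob();
        } else if (gSpinMode.load()) {
            // Busy-wait through the barrier; the next batch of jobs is
            // only microseconds away, so don't touch the kernel.
        } else {
            waitOnSemaphore();
        }
    }
}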

This new strategy works perfectly and gives a really nice speedup on small scenes. The one annoying problem is that if there is anything, really anything, else running on the machine, the Windows scheduler gives my process a mad penalty, even if the workers only spin for a small fraction of a frame. For some reason it chooses to swap out the entire process for several time quanta, leaving me with a 60 ms hiccup every second or so. No cookie for the Windows scheduler! Two questions:

A) Why does this bother Windows? Even on very small simulations where all workers sleep most of the frame this happens. That's right, most of my threads sleep on a semaphore most of the time, but I still get a penalty.

B) How the heck does Windows even know that I'm using a spinlock? Does modern hardware have some sort of spinlock detection layer? I can't imagine the scheduler would be so severely bothered by any process that happens to run four threads simultaneously, especially not for such a short time.

Seriously, what's going on? The problem goes away if I do a Sleep(0) in the spinlock, but that kind of defeats the purpose. Even though Sleep(0) doesn't switch to the idle process, it's still a rather costly operation, and an open invitation to the OS to go do something else, which is exactly what I'm trying to prevent in the first place! Sigh.


Friday, August 20, 2010

Milkman's mistake

After two posts full of boring text and graphs I feel like I have to give away a crowd pleaser. Again, not a realtime simulation. About 120,000 particles, running at almost one FPS.

Wednesday, August 18, 2010

The life of an impulse

For a long time I've wondered how impulses develop inside a rigid body solver, mainly in the hope of predicting the final magnitude as early as possible. I dumped the data for all impulses out to a CSV file and graphed it for a couple of simple test scenarios. Let's start with a small test case, a box pyramid of two layers (three boxes). All dirty tricks are switched off, so this is a plain vanilla solver. The Y axis is impulse magnitude for the normal impulses and the X axis is iterations.



At first glance, there seem to be no surprises here. Impulses form asymptotic curves towards their solution and there is not a lot of action after about twelve iterations. But look again, especially at the bright yellow curve, which seems to steadily decrease and form a non-horizontal asymptote, and what about the deviation around iterations eight and nine? Let's try a more complex example, with a box pyramid of five rows (15 boxes). There are around 90 contacts, and it would be too messy to show them all in the same graph, so I chose ten of them from the middle of the stack:



This is quite interesting - most of the curves are asymptotic to a non-horizontal line. Even at 64 iterations they have not found a stable configuration. Most interesting is maybe contact 45, which first goes up and then steadily decreases, while most other contacts increase. Somehow, the solver is shifting around the weight. Maybe just because it can. Another graph we could study is the relative velocity along the normal direction (v from my previous post) and how it changes while iterating:




This graph is from the same scenario, but not from the same run, so contact numbering might not match exactly. Interesting here is that all curves steadily decrease after just a few iterations and seem asymptotic to a horizontal line, so our odd graph fellow above doesn't actually mean that the velocity is shifting similarly. Most likely it is an effect of an overdetermined system (with four co-planar contact points, we can shift the impulses around a bit and still get the same result; not that the solver really has a good reason to do this, but it seems to be doing it anyway).

I also tried a variant of the solver that doesn't clamp accumulated normal impulses. Hence, the delta impulse is just clamped to be positive at each iteration. This prevents the accumulated impulse from ever having a negative inclination:
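In code, the difference between the two clamping strategies boils down to something like this (a minimal sketch with my own names):

#include <algorithm>

// Standard accumulated clamping: an individual delta may be negative,
// as long as the running total stays non-negative.
float applyClamped(float& accumImpulse, float delta)
{
    float newAccum = std::max(0.0f, accumImpulse + delta);
    float applied = newAccum - accumImpulse; // what actually gets applied
    accumImpulse = newAccum;
    return applied;
}

// The variant above: clamp each delta to be positive on its own, so the
// accumulated impulse can never decrease.
float applyClampedVariant(float& accumImpulse, float delta)
{
    float applied = std::max(0.0f, delta);
    accumImpulse += applied;
    return applied;
}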



This is more what I expected. Each impulse is asymptotic to a horizontal line. Now if there only was a way to predict that asymptote and use it as an initial guess, we would end up with a really stiff solver without having to keep track of contact points from the previous frame. I'm personally quite skeptical that it can be done just from looking at the curves, but it might be worth a shot. What I find most interesting is how good visual results you can get from just terminating the whole process after three or four iterations, considering how far off the curves are.


Sunday, August 1, 2010

Explaining the rigid body solver

Following my last post about scientific papers not being written for engineers, I will attempt to explain a rigid body solver without equations:

Even though a rigid body scene may consist of hundreds of objects and thousands of contact points, a popular way to solve the problem is to solve each contact point in sequential order, one at a time. It sounds kind of lame, and compared to other methods it is, but if iterated a couple of times it gives really good results, and this is what most games are actually using. So let's focus on solving one contact without friction first:

So you have two objects and one contact point with a contact normal. Start by computing the velocity at the contact point for both objects and compute the difference between those vectors. Project that difference onto the contact normal (dot product). This is the contact's relative velocity along the contact normal and it indicates how much the objects are moving towards each other or away from each other at the contact point. Let's call this velocity v. If v is positive, the objects are moving away from each other and we're done. If v is negative we need to proceed onto computing and applying an impulse.

This is the key computation in the solver. The ultimate question we want to answer is - how big of an impulse do we need to apply to make the objects stop moving towards each other? The direction of the impulse is going to be the contact normal (since there is no friction yet), so we're only looking for the magnitude - a scalar quantity. The velocity v that we just computed is also a scalar quantity. The impulse magnitude will be proportional to v - the more the objects are approaching each other, the bigger the impulse we need to apply. Easy! So what we're really looking for is the ratio between those two (another scalar quantity).

Let's assume we're applying an impulse of magnitude 1.0 (unit impulse) and see how big of a velocity change that would cause. The beauty of linear systems is that everything scales.. well.. linearly, so if we know how much of a velocity change a unit impulse causes we can just compare it to the desired velocity change, v, and adjust it accordingly. Say a unit impulse would cause a velocity change of 0.25 and v is -0.5, then we just apply two unit impulses (magnitude 2.0) and we'll reach our target relative normal velocity (zero). So make a copy of the linear and angular velocities for the two objects, apply an equal but opposite unit impulse and measure the relative contact velocity again following the exact same procedure as described above. Now that you know how much velocity change a unit impulse causes, the final computation boils down to a simple division. That's it folks. Apply the impulse you just computed on both objects in opposite directions and they will not move towards each other any more at the contact point.
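To tie the whole procedure together, here it is as a code sketch. It assumes a small math library (Vec3, Mat3, dot, cross) and deliberately simplified body types; it illustrates the text above, not any particular engine:

// Assumes a small vector math library: Vec3, Mat3, dot(), cross().
struct Body {
    Vec3 vel, angVel; // linear and angular velocity
    float invMass;    // inverse mass
    Mat3 invInertia;  // inverse world-space inertia tensor
};

// Velocity of a body at a contact point, r = point - center of mass.
Vec3 velocityAt(const Body& b, const Vec3& r)
{
    return b.vel + cross(b.angVel, r);
}

// Apply impulse j at offset r from the center of mass.
void applyImpulse(Body& b, const Vec3& j, const Vec3& r)
{
    b.vel += j * b.invMass;
    b.angVel += b.invInertia * cross(r, j);
}

// Solve one contact with normal n pointing from a to b.
void solveContact(Body& a, Body& b, const Vec3& rA, const Vec3& rB, const Vec3& n)
{
    // Relative velocity along the normal (v in the text).
    float v = dot(velocityAt(b, rB) - velocityAt(a, rA), n);
    if (v >= 0.0f)
        return; // moving apart, we're done

    // Apply an equal but opposite unit impulse to copies of the bodies
    // and measure how much the relative normal velocity changes.
    Body ca = a, cb = b;
    applyImpulse(ca, -n, rA);
    applyImpulse(cb, n, rB);
    float vPerUnit = dot(velocityAt(cb, rB) - velocityAt(ca, rA), n) - v;

    // Scale the unit impulse so the target velocity (zero) is reached.
    float magnitude = -v / vPerUnit;
    applyImpulse(a, -n * magnitude, rA);
    applyImpulse(b, n * magnitude, rB);
}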

Friction can be handled in the exact same way as described above, but in the direction of the tangential relative velocity at the contact point. Friction impulses need to be capped to the magnitude of the normal impulse scaled by the friction coefficient, so that if the normal impulse is 2.0 and the friction coefficient is 0.9 your maximum friction impulse is 1.8. After you compute the friction impulse magnitude using the above method, if it's more than 1.8, just apply 1.8.

Now, to make the solver stable you need to go over all contacts several times, and there is also a method called "accumulated impulses" that improves accuracy a lot, which is not covered here. But most importantly, you need to compensate for penetration. This is usually done in a very pragmatic way - if the objects are in penetration, adjust the target relative contact velocity so that it is not zero, but slightly positive (the more penetration, the more positive). This means that after we're done solving, the objects will move slightly away from each other along the contact normal instead of not moving at all.
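In code, that bias might look something like this (a sketch; the constants are typical tuning values, not anything specific):

#include <algorithm>

// Instead of solving for zero relative normal velocity, solve for a
// small positive one when the objects are in penetration.
float penetrationBias(float penetration, float dt)
{
    const float beta = 0.2f;   // how aggressively to push out per step
    const float slop = 0.005f; // small tolerance to avoid jitter at rest
    return (beta / dt) * std::max(0.0f, penetration - slop);
}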

Friday, July 30, 2010

Academic frustration and running liquids

I experimented with fluids about five years ago, at Meqon. Back then I didn't really have a clue about fluid dynamics. I guess it's fair to say I still don't have a clue about fluid dynamics, but I did read a few papers lately on SPH.

It's kind of remarkable, really, how complicated the writing in those papers is, just to land at something quite simple. Sometimes I get the feeling that the research community is intentionally making scientific papers look more complicated to boost their egos. I wish there were research papers written for engineers, not other researchers, where you could read stuff like "And when this little fellow gets squished, it kind of pushes outwards on all the neighboring particles, proportional to their respective distance and density."

I don't like formulas. I can read them, but they are very unintuitive to me, and probably to many other programmers. I tend to believe that 3D computer graphics programmers are especially prone to finding formulas unintuitive, since we are trained to visualize complex things in our minds. It's easier to visualize a description in plain English than a mathematical formula. For me at least. Maybe I just suck at math :)

A good engineering resource for physics is Danny Chapman's rowlhouse. There is source code to his physics project, jiglib, including SPH simulation and also his comments and ideas.

Back in 2005 I couldn't simulate running liquids, but only goo and clay-like fluids. I realize now that it was related to my urge to express everything through pairwise interactions. One would believe that a particle-based fluid is just a bunch of point masses interacting, but in SPH, a particle represents a sample of the implicit field, not just a point mass.

Anyway, so I implemented fluids in my solver, first as plain SPH, but I found it difficult to tweak, quite unstable, and it didn't interact well with rigid bodies, so I went searching for a semi-implicit formulation instead. It was actually quite tricky to come up with the right formulation, but eventually I think I nailed it. It looks good, it's unconditionally stable and it interacts naturally with rigid bodies without any hacks. Since it's semi-implicit it does get squishy at low iteration counts, but hey - fluids are no more incompressible than rigid bodies, right?






Oh, and a note on performance - no, these are NOT real-time simulations. Quite far from it actually. The second one runs at maybe one fps. Surface generation takes a good amount of time, but the number one bottleneck is actually neighbor finding. I haven't looked into it that much, since it's madness doing this on a CPU nowadays anyway, but it's kind of frustrating that something that sounds so simple is the most time consuming part. Finding the neighboring particles is usually three to four times as expensive as running the solver. What's even more frustrating is that it's probably mostly due to cache misses, so laying out the whole arsenal of SIMD tools is not going to help even the tiniest bit... The whole pipeline should scale well with cores though, so could someone please get that Larrabee going again?
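For reference, the textbook approach to neighbor finding is a uniform grid or spatial hash, roughly like the sketch below (my own naming, and the hash function is the classic one from Teschner et al., not necessarily what I use):

#include <cmath>
#include <unordered_map>
#include <vector>

struct SpatialHash {
    float h; // cell size = smoothing radius
    std::unordered_map<long long, std::vector<int>> cells;

    static long long key(int x, int y, int z)
    {
        return ((long long)x * 73856093LL) ^ ((long long)y * 19349663LL) ^
               ((long long)z * 83492791LL);
    }

    void insert(int particle, float px, float py, float pz)
    {
        int cx = (int)std::floor(px / h);
        int cy = (int)std::floor(py / h);
        int cz = (int)std::floor(pz / h);
        cells[key(cx, cy, cz)].push_back(particle);
    }
};

Each query then scans the 27 cells around a particle, and those scattered bucket lookups are exactly the kind of memory access pattern that murders the cache.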


Saturday, June 5, 2010

Practicing quadruple Windsor



Starting to get soft body stacking and interaction in place. Especially the friction between two soft bodies is a little tricky. This is not a fake rigid body interpolation, but a true particle representation, which can compress to a single point and then return to its original shape. It's quite expensive to be honest, since it takes more than just the regular verlet loop to get this kind of stability, but the demo runs at 60 FPS.
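For context, the "regular verlet loop" mentioned above is just this (a generic sketch; Vec3 and the particle arrays are assumed):

// Basic position verlet: velocity is implicit in pos - prevPos.
void verletStep(Vec3* pos, Vec3* prevPos, const Vec3* acc,
                int numParticles, float dt)
{
    for (int i = 0; i < numParticles; i++) {
        Vec3 temp = pos[i];
        pos[i] += (pos[i] - prevPos[i]) + acc[i] * (dt * dt);
        prevPos[i] = temp;
    }
}

Everything beyond that - the constraint solving that keeps the body from collapsing - is where the cost is.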

Thursday, May 27, 2010

Mesh collisions

This is usually where physics gets messy. It's all fun and games until someone suggests that maybe all objects are not perfectly convex, such as... the game level? There is convex decomposition, yes, but I'm not totally convinced that's a silver bullet. Convex decomposition is awfully complicated, and requires heavy preprocessing. It's definitely a good idea in many cases, but there will always be raw triangles.

Convex decomposition or not, you need some sort of mid-phase, finding which triangles/primitives collide with a specific object. A lot of work has been put into this, and I think most people today agree that a quantized, binary AABB tree is the ideal solution.

What's interesting here is how people usually query these AABB trees. The output from the broad phase is a list of pairs with overlapping bounding volumes. These pairs are then processed one by one, and in the case of a triangle mesh, the mid-phase is engaged to find the relevant triangles/primitives. After that, the near phase finds the actual contacts.

What I would like to suggest is to query the whole dynamic scene against the AABB tree all at once. That is, instead of colliding the AABB tree with a single object (single AABB) multiple times, you collide it with another AABB tree, representing all moving objects. This is especially relevant if combined with a Dynamic Bounding Volume Tree broad phase, as suggested by Erwin Coumans. In this case, all moving objects are already in a dynamic AABB tree of their own. Objects tend to appear in clusters, so objects close together also tend to collide with the same triangles. Doing the mid-phase this way saves you from drilling down the compressed AABB tree multiple times, which gives an instant performance gain. The tree/tree traversal is a bit of a mind job at first, but the Bullet DBVT implementation is a really good reference.
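The traversal itself boils down to a double recursion, something like this sketch (AABB, overlap() and the node layout are simplified assumptions, not the Bullet code):

#include <utility>
#include <vector>

struct Node {
    AABB bounds;
    Node* left;  // null for leaf nodes
    Node* right;
    int payload; // leaf: object index or triangle index
};

bool isLeaf(const Node* n) { return n->left == nullptr; }

void collide(const Node* a, const Node* b,
             std::vector<std::pair<int, int> >& pairsOut)
{
    if (!overlap(a->bounds, b->bounds))
        return;
    if (isLeaf(a) && isLeaf(b)) {
        pairsOut.push_back(std::make_pair(a->payload, b->payload));
    } else if (isLeaf(a)) {
        collide(a, b->left, pairsOut);
        collide(a, b->right, pairsOut);
    } else if (isLeaf(b)) {
        collide(a->left, b, pairsOut);
        collide(a->right, b, pairsOut);
    } else {
        // Both are internal: descend both sides. A cluster of objects
        // only drills down a given mesh subtree once, which is the win.
        collide(a->left, b->left, pairsOut);
        collide(a->left, b->right, pairsOut);
        collide(a->right, b->left, pairsOut);
        collide(a->right, b->right, pairsOut);
    }
}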

As usual, I'm really lazy and not doing my side-by-side comparisons, but it might show up later.

Monday, May 10, 2010

Movie time

It's been a long time since I posted a movie, so I thought I'd record a little snippet of what I'm working on at the moment - mesh collisions. Without the fancy SSAO renderer it runs at 60 FPS. Stay tuned for details!


Tuesday, May 4, 2010

STL

I really like the idea of using standardized versions of simple data structures. Small snippets of code, rewritten over and over again by thousands of people. I bet there is a handful of programmers world-wide at this very moment implementing a dynamic array or a hash table. As much as I like the idea, I hate the only standard out there - STL.

There are several problems:

1) The interface is horribly over-engineered and just plain sucks. Even the simplest of simple operations turns into three or four lines of code. Really long ones. Say I want to remove an element from a vector. Instead of just offering a dead simple myVector.erase(element), I have to bother with iterators and find functions (of course defined in another include file), ending up with this:

#include <algorithm> // where find() lives, of course

std::vector<int>::iterator it = std::find(myVector.begin(), myVector.end(), element);
if (it != myVector.end())
    myVector.erase(it);

This only removes the first occurrence of "element", which is understandable. If you want to remove all of them you need to use a whole different method... Another option is to iterate through the vector myself and remove the desired elements, basically implementing the erase algorithm on the spot, which completely takes away the benefit of using a standardized vector in the first place. Just as a bonus, you need to take special care of what happens to the iterator as you erase the element it's pointing at. Sigh.
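For the record, that "whole different method" for removing all occurrences is the erase-remove idiom:

#include <algorithm>

// remove() shuffles the kept elements to the front and returns the new
// logical end; erase() then trims off the tail.
myVector.erase(std::remove(myVector.begin(), myVector.end(), element),
               myVector.end());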

2) It is not minimal, nor consistent. There are heaps of ways to do everything, none of them seems more commonly used than any other, and the naming of operations varies between classes. Removing elements is a good example: it can be done with the member functions pop_back, erase or remove (available on lists but not vectors). There are also the global algorithms remove and remove_if, which don't actually remove anything but merely scramble elements around a bit for easier removal later. Confusing, aye?

3) Because of its severe complexity it is not consistent across platforms. Hence, porting your code to another platform involves small tweaks. There are projects like stlport that make life easier, but it kind of takes away the purpose of a standard if there is actually only one implementation that is portable.

4) Debugging STL code makes you suicidal. You know exactly what I mean.

5) It claims to be designed for performance, but only considers academic-style algorithmic complexity, not real-world performance, which is often dominated by memory allocations, locking and memory access patterns. Some implementations use pool allocators to speed things up, but it's still allocations. I personally use my own data structures, which include an initial template-specified storage as part of the container and only allocate memory if they grow beyond that initial size. I suppose STL could be configured to do similar things by a template wizard, but I wouldn't want to be the guy maintaining that mess.
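The idea behind those containers is roughly this (a simplified sketch, not the actual implementation; it assumes copyable, default-constructible elements and omits copying):

template <typename T, int N>
class InlineVector {
public:
    InlineVector() : mData(mStorage), mSize(0), mCapacity(N) {}
    ~InlineVector() { if (mData != mStorage) delete[] mData; }

    void pushBack(const T& value)
    {
        if (mSize == mCapacity)
            grow();
        mData[mSize++] = value;
    }

    T& operator[](int i) { return mData[i]; }
    int size() const { return mSize; }

private:
    void grow()
    {
        // Only here, when the inline storage is outgrown, do we allocate.
        mCapacity *= 2;
        T* newData = new T[mCapacity];
        for (int i = 0; i < mSize; i++)
            newData[i] = mData[i];
        if (mData != mStorage)
            delete[] mData;
        mData = newData;
    }

    T mStorage[N]; // inline storage, part of the container itself
    T* mData;
    int mSize;
    int mCapacity;
};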



Wednesday, April 14, 2010

Visualizing complex algorithms

I have had this idea for a long time about some tool for following the steps of a complicated algorithm in some way other than stepping through it in the debugger, trying to mentally visualize what's going on and frenetically scribbling unrecognizable geometrical figures on envelopes and old receipts.

A simple visualization sandbox is useful for visualizing the result, but it's really hard to visualize intermediate steps. GJK is a good example of a complex, iterative algorithm that has a visual representation at every step. A typical demo application would visualize the resulting closest points, normal, etc, but not the internal steps of the algorithm itself, so if something goes wrong, you're screwed. The typical way to deal with it is to write parts of the algorithm in an isolated visualization environment and then lift them over to the real implementation when they are "tested". However, for visualization code to be useful, it needs to stay in the actual implementation so when real-world problems start coming in, they can be tracked down quickly.

To do this properly, you would need to step the algorithm from the outside - that is, make an extra API to control it and poll the relevant information from the outside. Such an API would likely be substantially larger than the original API itself, and it's definitely not something you would want to ship.

Another approach would be to add visualization code to the algorithm implementation, but this does not tie in very nicely with modern rendering engines. Say you want to use OpenGL for rendering. It would be convenient to draw lines, points, text, etc and then wait for some input event before going any further. However, the way OGL (or D3D) works, you fill a back buffer and won't see it until the entire frame is drawn. There might be a way to get around this, but it would be hard to integrate into any other application. I was for a moment considering running the rendering loop in a separate thread, feeding the visualization thread with commands from the algorithm thread, and also including a pause command that would block until the user wants to step further.

The solution I finally settled on was to use an event stream that is filled in by the algorithm implementation using macros and a visualization framework for rendering the stream. Hence, the algorithm runs all the way from beginning to end, recording visualization events along the way. There is a pause event which indicates that the visualization framework should wait for user input.

I also needed something for removing graphics that was previously added. Say you have the normal of a plane, and the plane is changing every iteration. One way to do that would be to start referencing all the events and add explicit remove events, but that would quickly get quite messy. Instead, I added push and pop commands, so a subset of the graphics can be visualized and then removed before going any further. My visualization app can walk both forwards and backwards in the recorded event stream, which makes it easy to follow what's going on. It turned out to be quite useful and it doesn't bloat the code too much, typically something like this:

DEBUG_PRINT("Computing stuff");
for(int i=0; i<10; i++)
{
...
computePoint(p);
computeNormal(n);
DEBUG_PUSH();
DEBUG_PRINT("Point and normal at iteration " + i);
DEBUG_POINT(p);
DEBUG_LINE(p, p+n);
DEBUG_PAUSE();
DEBUG_POP();
}
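Conceptually, the macros just append to a global event stream that the visualization app plays back. A simplified guess at what the machinery could look like (all names hypothetical, Vec3 assumed, and the real thing surely records more):

#include <string>
#include <vector>

enum EventType { EV_PRINT, EV_POINT, EV_LINE, EV_PUSH, EV_POP, EV_PAUSE };

struct DebugEvent {
    EventType type;
    Vec3 a, b;        // geometry payload, where applicable
    std::string text; // message payload, where applicable
};

std::vector<DebugEvent> gDebugStream;

#define DEBUG_PRINT(msg) gDebugStream.push_back({EV_PRINT, Vec3(), Vec3(), (msg)})
#define DEBUG_POINT(p)   gDebugStream.push_back({EV_POINT, (p), Vec3(), ""})
#define DEBUG_LINE(p, q) gDebugStream.push_back({EV_LINE, (p), (q), ""})
#define DEBUG_PUSH()     gDebugStream.push_back({EV_PUSH, Vec3(), Vec3(), ""})
#define DEBUG_POP()      gDebugStream.push_back({EV_POP, Vec3(), Vec3(), ""})
#define DEBUG_PAUSE()    gDebugStream.push_back({EV_PAUSE, Vec3(), Vec3(), ""})

In a release build the macros can of course be defined away to nothing.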

This has proven to be really useful for debugging GJK, EPA, MPR, etc. I use automatic unit testing on distance queries, save the state for failing configurations and load them up in the debugger to manually inspect what went wrong. It gives a pretty clear view of various precision problems, termination criteria, etc. that would otherwise be very hard to get an intuitive understanding of.


My visualization app in the middle of a penetration depth calculation.

The downside of my approach is that you can't combine it with stepping through a regular debugger. It would be quite nifty to see the graphical representation updating while stepping through the code in the debugger. I guess I could pass the event stream over a network socket to a different process, but that sounds just a tad too ambitious for my needs at the moment :)

Tuesday, March 30, 2010

Thoughts on shape casting

I implemented a different type of shape casting yesterday. The most common way, to my knowledge at least, is to do it with conservative advancement (sketched in code after the steps below):

1) Run the GJK algorithm to find closest point to the origin in configuration space
2) Use the distance and angle to compute a safe estimate to move (cast) the configuration space object towards the origin, in the cast direction.
3) Repeat until below a certain threshold, or the direction to the origin is pointing away from the cast direction (in which case it is a miss).
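In rough code, the loop looks something like this sketch (names are made up; gjkClosest() stands in for a GJK query against the CSO translated t units along the cast direction, warmstarted with the previous simplex):

// Returns the hit distance along castDir, or -1 on a miss.
float conservativeAdvancement(const Vec3& castDir, float maxDist)
{
    const float tolerance = 1.0e-4f;
    float t = 0.0f;
    Simplex simplex; // reused between iterations for warmstarting
    for (int i = 0; i < 32; i++) {
        // Closest point to the origin on the CSO translated by t * castDir.
        Vec3 p = gjkClosest(t, castDir, simplex);
        float dist = length(p);
        if (dist < tolerance)
            return t; // converged: hit
        // How fast does the cast close the remaining gap?
        float closing = -dot(castDir, p) / dist;
        if (closing <= 0.0f)
            return -1.0f; // direction to origin points away: miss
        t += dist / closing; // safe step: cannot tunnel past the surface
        if (t > maxDist)
            return -1.0f;
    }
    return t;
}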

Even though this is an iterative method, relying on GJK, which is itself iterative, it is actually pretty fast, because it converges quickly - like two or three iterations - and you can benefit from warmstarting GJK with the simplex from the previous iteration.

However, there is a completely different method, which I thought I came up with, but after googling for a while it seems Jay Stelly has mentioned similar methods before on the Bullet and MollyRocket forums. The method uses MPR on a swept CSO. The idea is to sweep the CSO in the cast direction far enough to enclose the origin, and then raycast from the origin to find the distance to the surface.

1) Construct the swept CSO (see the sketch after this list). This may sound complicated, but since support functions are additive it's as easy as adding a vector to the existing support function. The resulting shape will be a swept version of the original CSO. The direction of the vector should be the sweep direction, and the length should be "long enough" to pass the origin. As shown later on, the distance here doesn't really matter much, so just keep it reasonable to avoid numerical issues. I use the distance from the geometric center of the CSO to the origin.

2) Use MPR, or XenoCollide, to find the distance from the origin to the surface of the swept CSO, in the sweep direction. If the origin is not inside the swept CSO, we know there is a miss.

3) Subtract the distance found in step 2) from the distance computed in step 1).

4) That's it! The difference from step 3) is the final sweep distance. Imagine a line going from the original CSO through the origin to the point computed using MPR in step 2). The line will have the same direction as the sweep, and the origin divides it into two parts: the first part, from the original CSO to the origin, and the second part, from the origin to the surface of the swept CSO. You know the total length of the line (determined in step 1) and you know the length of the second part of the line (computed in step 2), so the first part must be the difference. I know, I know, a picture would be ideal, but I'm lazy. Bug me about it and I might post one :)
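Here is step 1) in code, to show just how cheap the swept support function is (a sketch; supportA/supportB are assumed support functions for the two objects):

// Standard CSO (Minkowski difference) support function.
Vec3 supportCSO(const Vec3& dir)
{
    return supportA(dir) - supportB(-dir);
}

// Sweeping the CSO along a vector is a Minkowski sum with the segment
// [0, sweep], and support functions of Minkowski sums simply add.
Vec3 supportSweptCSO(const Vec3& dir, const Vec3& sweep)
{
    Vec3 s = supportCSO(dir);
    return dot(dir, sweep) > 0.0f ? s + sweep : s;
}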

Although I haven't done any side-by-side comparisons yet, this second approach seems much more efficient to me, since it is much simpler, less prone to numerical precision issues and (yet to be proved) faster. So if this is a known method, how come everybody is still talking about conservative advancement?

[EDIT]

I quickly profiled the new method against the old one (conservative advancement). Each session is 10 000 sweep queries with different angles, computing closest points, sweep distance and contact normal.


Box - Box (old)
Min 10 ms
Avg 16 ms
Max 30 ms

Box - Box (new)
Min 11 ms
Avg 14 ms
Max 16 ms

Cylinder - Cylinder (old)
Min 14 ms
Avg 40 ms
Max 99 ms

Cylinder - Cylinder (new)
Min 20 ms
Avg 35 ms
Max 53 ms

Polyhedron - Polyhedron 128 verts (old)
Min 99 ms
Avg 130 ms
Max 252 ms

Polyhedron - Polyhedron 128 verts (new)
Min 98 ms
Avg 130 ms
Max 185 ms

It turns out that conservative advancement is slightly faster in the best case, slightly slower on average, but much slower in the worst case, which makes sense, since it is doubly iterative. I have not noticed any differences in robustness so far, but I haven't stressed it either.


Sunday, March 21, 2010

Greetings from the inside!

So after spending a few weeks outside the CSO, I got around to stepping inside and experimenting with a number of different penetration distance algorithms this week. However, I failed to come up with an alternative to EPA that can actually guarantee a global minimum separation. These are the ones I tried:

1) Use MPR to raycast the inside of the CSO in a direction estimated by the geometric center of the CSO, use the resulting normal and iterate until convergence. This works in 95% of all cases and is really fast. The downside is that there is no way to guarantee or measure if this really is the global minimum.

2) Same method as in 1) but with multiple search directions. I was for a while under the impression that two local minima had to be at least 90 degrees apart, so casting six rays should be safe (of which three can be optimized away). This works in 99% of all cases, but there is still that annoying one percent where it finds a local minimum.

3) Same method as in 1) but tracing the edges of the last portal. After the initial surface point has been found, tilt the search direction in the direction of each edge, so that it falls outside the last portal, choose the best direction and iterate until convergence. This way we can sample the neighboring triangles and search those directions. Again, there is a 99% success rate. Unfortunately, in some cases all neighboring triangles will lean towards the initial triangle, so we just end up at the same local minimum again.

4) Broken EPA. I tried using EPA without tracking connectivity and without keeping the polytope convex. Simply a triangle soup, where the closest triangle is split and valid triangles (the ones that are portals) are kept. It doesn't work.

5) Full EPA. I used the one found in Bullet as inspiration and it works really well in all cases. I'm really impressed by the Bullet implementation by Nathanael Presson. EPA is an awfully complicated algorithm to implement, but it seems very stable and quite fast.

So all this fuss to come up with the conclusion that EPA is the way to go? I'm still contemplating whether there is a way to combine MPR and EPA to make an early out and just find the surface using MPR. It might not be possible.

I'm still not using EPA for all penetrations. In most cases using MPR with the last known separation axis works excellently; even just sampling and projecting support mappings on the last axis works really well. Physics is surprisingly forgiving about something, for once.

Tuesday, March 16, 2010

Trapped in configuration space

I've spent the last couple of weeks tinkering with GJK and ended up rewriting it from scratch. Ever since I saw Casey's video about SAT-GJK a couple of years ago I've wanted to try it, so I finally did. Now, Casey's video only deals with binary overlap status, not closest points, but it was relatively easy to extend his approach and even do a linear sweep operation. I especially like how his method does not include any divisions or square roots, which makes it very robust. I must add though that it's not straightforward to get a robust implementation that works on both polyhedra and quadric objects. Especially the termination criteria needed a lot of attention.

I knew from the beginning that decent visualization tools would be required to pull off a new GJK implementation, and it's certainly not trivial to visualize all the information needed. There are the two objects, the configuration space object, the simplex which comes in four flavors, normals, closest points, etc, etc. Here's a screenshot from my test bed in the middle of a GJK distance calculation. The algorithm can be stepped and visualized at both the add and the reduce step of each iteration:


On the left is the configuration space object; the red triangle is the current simplex, with closest point and normal. The corresponding object space closest points are also visualized on the two objects. I could have spent even more time on visualization, and I might just do that, because next I want to try out a couple of ideas for penetration depth computation.

The new implementation is both more robust and faster than the old one from Meqon, which uses Johnson's distance algorithm from the original paper. I'm not sure how it compares to the Bullet implementation, which uses a geometric distance algorithm. Part of the performance increase is due to a more efficient caching scheme. The old version only supported an initial 1-simplex, which was the previous frame's closest points, while the new one uses the whole previous simplex. This is a bit more complicated, since the simplex can deform badly due to motion, but it really pays off, and most of the time it will terminate immediately after the first iteration.

Here are some performance comparisons:

Scene with 16-vert polyhedra
Old: 22.5 ms
New: 17.1 ms
New+caching: 9.3 ms

Scene with cylinders
Old: 2.6 ms
New: 2.5 ms
New+caching: 1.7 ms

Scene with boxes
Old: 10.7 ms
New: 8.8 ms
New+caching: 6.3 ms

Scene with mixed objects
Old: 19.3 ms
New: 13.9 ms
New+caching: 6.9 ms

So, depending on the scenario, and how much motion there is, the new version ranges from slightly faster to almost three times faster. More importantly though, the new version is more numerically robust.

I find it quite fascinating how Casey's approach with caching is very similar to the old Lin-Canny closest feature algorithm. Both methods are based on Voronoi regions, but Casey's operates in configuration space, while Lin-Canny uses object space. Casey's approach uses implicit Voronoi regions determined by support functions, while Lin-Canny uses explicit pre-computed Voronoi regions, but the underlying principle is the same: the closest distance is always found within the Voronoi regions of a particular feature pair.

What currently keeps me awake at night is whether there might be an implicit configuration space counterpart to V-Clip, which would then work for penetrating objects as well. The only implicit penetration depth method I'm aware of that has been proven to work is EPA, but it's so awkward I'm not even going to try implementing it. Intuitively I feel like there must be a simpler way to compute exact penetration depth that doesn't involve dynamic data structures.

Tuesday, February 9, 2010

Oh how hard can it be

Funny, I'm falling into the very how-hard-can-it-be convex hull trap I described just three posts ago! It turns out my idea for a support mapping-based convex hull generator isn't that great after all. It works perfectly in 2D, but in 3D it turns out the surface can get concave during the process, which of course messes up the whole thing, so just forget everything I ever said about convex hulls now, would you (except perhaps the part about them always being harder than you think)? Thanks. Back to the drawing board. There must be some way to compute convex hulls using only the support direction...