Tuesday, March 13, 2018

Hot reloading hardcoded parameters

Here is a trick that I cannot take any credit for, but that I finally implemented. I remember reading about it online several years ago, but I cannot find the reference again (it might have been on mollyrocket), so I'll write up the idea:

Everyone uses hardcoded parameters sometimes because it's fast and easy:

float someValue = 5.0f;

Once you have a parameter in the code, it's likely that you sooner or later want to tune that into some kind of sweet spot. With a hardcoded parameter the process often involves recompiling and restarting (unless you implemented code hot reloading, in which case it still involves recompiling) many times to try out different values. A popular approach is to add some form of config file to get rid of the recompile step. Config files can be hot reloaded to also get rid of the restart step, but config files require some extra work for each parameter. You need to name the parameter, and you need to add it to the config file.

The idea of parameter hot reloading is to use the code itself as config file. The file is already there, the context and naming is already there, the initial value is already there, and once you're done tweaking, the result is already in your code!

This can be done by wrapping the tweakable, hardcoded parameter in a macro:

float someValue = TWEAK(5.0f);

The macro expands to a function call that looks something like this, using the __FILE__ and __LINE__ built-ins:

float TweakValue(float v, const char* fileName, int lineNumber);

This function stores each tweakable value in a hash table keyed on the file name and line number, so that it can be looked up quickly. The really sweet part is that since we know the file and line number, we can periodically (say once a frame, or using some file modification event) check if the source file has changed, and when it has, just parse out the new value. Note that this is rather trivial, since at this point we know exactly which line to look at and how to parse it, because the value will be enclosed in parentheses right after the word TWEAK.
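A minimal sketch of how this could fit together, not the actual implementation. It assumes float values only, leaves change detection to the caller (ReloadTweaks is meant to be called whenever a source file is known to have changed), and the names TweakEntry, gTweaks, ParseTweakLine, ReloadTweaks and PRODUCTION_BUILD are made up for illustration:

```cpp
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <string>
#include <unordered_map>

// Hypothetical storage: one entry per TWEAK call site, keyed by "file:line".
struct TweakEntry { float value; };
static std::unordered_map<std::string, TweakEntry> gTweaks;

// Parse the float inside "TWEAK(...)" on the given 1-based line of a file.
// Returns false if the file, the line or the pattern cannot be found.
static bool ParseTweakLine(const char* fileName, int lineNumber, float& out)
{
    std::ifstream file(fileName);
    std::string line;
    for (int i = 1; std::getline(file, line); i++)
    {
        if (i != lineNumber)
            continue;
        std::size_t start = line.find("TWEAK(");
        if (start == std::string::npos)
            return false;
        const char* begin = line.c_str() + start + 6; // skip past "TWEAK("
        char* end = nullptr;
        float v = std::strtof(begin, &end);
        if (end == begin)
            return false; // nothing numeric after the parenthesis
        out = v;
        return true;
    }
    return false;
}

// Returns the current value for a call site. The first call registers the
// compiled-in value, later calls return whatever was parsed most recently.
float TweakValue(float v, const char* fileName, int lineNumber)
{
    std::string key = std::string(fileName) + ":" + std::to_string(lineNumber);
    auto it = gTweaks.find(key);
    if (it == gTweaks.end())
        it = gTweaks.emplace(key, TweakEntry{v}).first;
    return it->second.value;
}

// Re-parse every registered call site from its source file.
void ReloadTweaks()
{
    for (auto& kv : gTweaks)
    {
        std::size_t colon = kv.first.rfind(':');
        std::string file = kv.first.substr(0, colon);
        int line = std::atoi(kv.first.c_str() + colon + 1);
        float v;
        if (ParseTweakLine(file.c_str(), line, v))
            kv.second.value = v;
    }
}

// PRODUCTION_BUILD is an assumed name for whatever flag marks a release build.
#ifdef PRODUCTION_BUILD
#define TWEAK(v) (v)
#else
#define TWEAK(v) TweakValue((v), __FILE__, __LINE__)
#endif
```

Keying on a string keeps the sketch short. A real version would probably hash the file pointer and line number directly and only re-read files whose modification time changed.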

One limitation is that it only works for one tweakable parameter per line. It's probably possible to make it work for more than one, but that requires a lot more parsing. Note that this can be done for more than just floats. I've also added support for booleans, vectors and colors. The boolean especially can be useful to toggle between two implementations at run time:

if (TWEAK(true))

Needless to say, in production builds, the TWEAK macro is just bypassed, adding zero overhead. Pretty neat, isn't it?

Friday, February 9, 2018

Header file dependencies

Ten years ago, I wrote my own C++ software framework and it was probably one of the best moves in my career as a software developer. It has been immensely useful for every little project I have done ever since, but adding bits and pieces and modifying it down the road has made the software quality slowly degrade. I'm half-way through a rewrite, not from scratch but a pretty serious overhaul. One thing I've spent a lot of time on is reducing header file dependencies to improve compile times. It is one of those strangely satisfying things that you can never really motivate to spend time on while in production. So far I've managed to cut the compile time in half (from 17 seconds for a full rebuild down to 9, so it really wasn't that bad before either), mostly by eliminating system headers.

A very accessible tool for this is actually GCC. Just add the -H flag and it will print out a hierarchical header dependency graph, including system headers. Using this I found out that the system header <new>, required for the standard placement new operator, included over 50 system headers. I could get rid of it thanks to a tip I got on twitter, by just declaring my own placement new operator like this:

enum TMemType { MEMTYPE_DUMMY };
inline void* operator new(size_t, TMemType, void* ptr) { return ptr; }

Then I just use T_PNEW instead of new in all of my templated containers.
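The definition of T_PNEW is not shown here, so the following is only a guess at what it could look like. The macro body, the matching placement delete, the Particle type and constructParticle are all assumptions made for the sake of a complete example:

```cpp
#include <cstddef>

// From the post: a private placement new overload, tagged with a dummy enum
// so it cannot collide with the standard one declared in <new>.
enum TMemType { MEMTYPE_DUMMY };
inline void* operator new(std::size_t, TMemType, void* ptr) { return ptr; }

// Matching placement delete, only called if a constructor throws.
inline void operator delete(void*, TMemType, void*) {}

// Assumed definition of the macro, not taken from the original code.
#define T_PNEW(ptr) new (MEMTYPE_DUMMY, (ptr))

// Hypothetical element type, standing in for whatever a container stores.
struct Particle
{
    float x, y;
    Particle(float px, float py) : x(px), y(py) {}
};

// Construct a Particle in caller-provided storage, the way a templated
// container would construct elements in its own buffer.
inline Particle* constructParticle(void* storage, float x, float y)
{
    return T_PNEW(storage) Particle(x, y);
}
```

As with the standard placement new, the caller is responsible for providing suitably aligned storage and for calling the destructor explicitly.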

I've also moved over to private implementation as a standard practice. It's a bit clunky and requires an extra pointer dereference for every call, but it reduces dependencies a lot. Most of my headers now only include the "tconfig.h" with standard types, and sometimes containers. The only place I found this extra dereference to be unacceptable was the mutex, but I also didn't want to include system headers in my own headers, so the ugly but functional compromise I settled on was to reserve memory using a char mMutexStorage[T_MUTEX_SIZE] in the header and then do a placement new in the implementation file. The value of T_MUTEX_SIZE will be platform dependent, but can easily be compared to the actual mutex size at runtime to avoid memory corruption.
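The mutex compromise could be sketched roughly like this, with the header and implementation parts shown in one listing. It assumes std::mutex as the underlying type, a T_MUTEX_SIZE of 128 and a 16 byte alignment, all of which are placeholders. It also checks the size at compile time in the implementation file, which is one alternative to the runtime comparison mentioned above:

```cpp
// --- header part (hypothetical tmutex.h): no system headers needed ---
#define T_MUTEX_SIZE 128 // assumed value, platform dependent in practice

class TMutex
{
public:
    TMutex();
    ~TMutex();
    void lock();
    void unlock();

private:
    TMutex(const TMutex&) = delete;
    TMutex& operator=(const TMutex&) = delete;

    // Raw storage for the real mutex, constructed in the implementation file.
    alignas(16) char mMutexStorage[T_MUTEX_SIZE];
};

// --- implementation part (hypothetical tmutex.cpp) ---
#include <mutex>
#include <new>

// Catch a too small or misaligned buffer before it can corrupt memory.
static_assert(sizeof(std::mutex) <= T_MUTEX_SIZE, "T_MUTEX_SIZE is too small");
static_assert(alignof(std::mutex) <= 16, "mutex storage is under-aligned");

// Helper that views the raw storage as the real mutex.
static std::mutex* asMutex(char* storage)
{
    return reinterpret_cast<std::mutex*>(storage);
}

TMutex::TMutex() { new (mMutexStorage) std::mutex(); }
TMutex::~TMutex() { asMutex(mMutexStorage)->~mutex(); }
void TMutex::lock() { asMutex(mMutexStorage)->lock(); }
void TMutex::unlock() { asMutex(mMutexStorage)->unlock(); }
```

Locking goes straight to the mutex inside the object, with no pointer to follow, and code that includes the header never sees <mutex>.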

Finally, I must mention something that took me a long time as a developer to realize – return types can be forward declared even when returned by value. It makes sense if you know how they are passed on the stack, but I somehow always just assumed the compiler needed to know the size unless it was a pointer or reference. I learned this maybe five years ago, and all compilers I have come across since then accept forward declarations for return types. That makes a huge difference, because you can do this:

class TString getSomeInformation();
class TVec3 getWorldPosition();

Or any type you like, without including the actual header for it. This means you can get away with pretty much zero includes as long as the implementation is private without compromising your API. I'm pretty sure this works for parameters as well, but since they are usually passed as const references anyway (mine are at least) it's not such a big deal.
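A small self-contained illustration, with the header and implementation parts in one listing. The TVec3 layout and the getWorldHeight function are made up for the example:

```cpp
// --- header part: TVec3 is only forward declared, no include needed ---
class TVec3 getWorldPosition(); // declares both the class and the function
float getWorldHeight();

// --- implementation part: the complete type is only needed here ---
class TVec3
{
public:
    float x, y, z;
};

TVec3 getWorldPosition()
{
    TVec3 p;
    p.x = 1.0f;
    p.y = 2.0f;
    p.z = 3.0f;
    return p;
}

// Calling the function requires the complete type, declaring it does not.
float getWorldHeight()
{
    return getWorldPosition().y;
}
```

Only code that actually calls getWorldPosition or touches the returned value needs the full definition of TVec3.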

The only exception I found was MSVC, which does not allow this for cast operators, but luckily that can be bypassed by forward declaring the type outside the class instead of inlining it in the method declaration.

Speaking of forward declarations, this is also why I don't use namespaces. I always make inline forward declarations like the ones above, never at the top. It would have been so convenient to make inline forward declarations of namespace members, like this:

class mylib::math::Vec3 getWorldPosition();

But there is no such thing and I hate those clunky forward namespace declarations at the top of the header, so I'll stick with prefixes for now.

Tuesday, January 2, 2018

Screen Space Path Tracing – Diffuse

The last few posts have been about my new screen space renderer. Apart from a few details I haven't really described how it works, so here we go. I split up the entire pipeline into diffuse and specular light. This post will focus on diffuse light, which is the hard part.

My method is very similar to SSAO, but instead of doing a number of samples on the hemisphere at a fixed distance, I raymarch every sample against the depth buffer. Note that the depth buffer is not a regular, single value depth buffer, but each pixel contains front and back face depth for the first and second layer of geometry, as described in this post.

The increment for each step is not view dependent, but fixed in world space, otherwise shadows would move with the camera. I start with a small step and then increase the step exponentially until I reach a maximum distance, at which point the ray is considered a miss. Needless to say, raymarching multiple samples for every pixel is very costly, and this is without a doubt the most time consuming part of the renderer. However, since light is usually fairly low frequency information, it can be done at lower resolution and upscaled using this technique. I was also surprised that the number of steps needed for each ray is quite low, as long as the step size grows exponentially. This captures fine detail near creases while also preserving occlusion from bigger obstacles nearby.

In the case of a ray miss, I fetch light from a low resolution environment map and in case of a hit, I fetch light from the hit pixel reprojected from the previous frame. This creates a very crude approximation of global illumination since light is able to bounce between surfaces across multiple frames. This is also what enables me to light objects using emissive materials as shown in this video. Here is pseudo code for the light sampler:

function sampleLight(pixel, dir)
  stepSize = smallDistance
  pos = worldPos(pixel)
  for each step s
    pos += dir * stepSize
    if pos depth is occluded by first or second layer of pixel(pos) then
      return fraction of light from pixel(pos) in previous frame
    stepSize *= gamma (greater than one to increase step size)
  return fraction of light transfer from cubeMap[dir]
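
The marching loop from the pseudo code can be tried out on the CPU with a sketch like the one below. It is not the renderer's shader code: the occluded callback is a stand-in for the test against the two-layer depth buffer, and Vec3, MarchResult and marchRay are names invented for the example:

```cpp
#include <functional>

struct Vec3 { float x, y, z; };

struct MarchResult
{
    bool hit;  // true if the ray was occluded before running out of steps
    Vec3 pos;  // hit position, or the last position reached on a miss
    int steps; // number of steps taken
};

// March from pos along dir with an exponentially growing step size.
// occluded(pos) stands in for the layered depth buffer lookup.
MarchResult marchRay(Vec3 pos, Vec3 dir, float firstStep, float gamma, int maxSteps,
                     const std::function<bool(const Vec3&)>& occluded)
{
    float stepSize = firstStep;
    for (int s = 1; s <= maxSteps; s++)
    {
        pos.x += dir.x * stepSize;
        pos.y += dir.y * stepSize;
        pos.z += dir.z * stepSize;
        if (occluded(pos))
            return MarchResult{true, pos, s};
        stepSize *= gamma; // greater than one, so the steps grow
    }
    return MarchResult{false, pos, maxSteps};
}
```

On a hit the caller would fetch light from the previous frame at the hit position, and on a miss from the environment map. With a first step of 0.1 and gamma of 1.5, twelve steps already cover a distance of roughly 26 units, which shows why so few steps are enough.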

Like any path tracing, what comes out will contain a certain amount of noise. Even on a powerful machine and at half resolution, there isn't time for more than a handful of samples per pixel, so noise reduction is a very important step. This is what it looks like without any noise reduction at sixteen samples per pixel, with each sample marched in twelve steps.

After applying a temporal reprojection filter, I get rid of a lot of the noise. Note that the filter runs on the light buffer before it gets applied to the scene using object color, etc. The problem with temporal filters is of course that areas where information is missing will be very noticeable during motion. Therefore, this filter cannot be too aggressive. I keep it around 70% and I also compare depth values in order to avoid ghosting.
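Per pixel, a temporal filter of this kind could look like the sketch below. It assumes that the 70% is the weight given to the reprojected history, that depth is compared against a fixed tolerance, and that light is a single float. The function name and the tolerance value are placeholders:

```cpp
#include <cmath>

// Blend the new light sample with the reprojected value from the previous
// frame. History is rejected when the depths disagree, to avoid ghosting.
float temporalFilter(float current, float history,
                     float depth, float historyDepth,
                     float historyWeight = 0.7f,
                     float depthTolerance = 0.05f)
{
    if (std::fabs(depth - historyDepth) > depthTolerance)
        return current; // different surface, history cannot be trusted
    return history * historyWeight + current * (1.0f - historyWeight);
}
```

A higher history weight removes more noise but makes newly revealed areas take longer to converge, which is the trade-off described above.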

After the temporal filter, I also apply a spatial filter that uses object ID, or more specifically smoothing group ID, to not blur across different surfaces. Again, this filter cannot be too aggressive, or fine detail will get lost. I use a 7x7 pixel kernel in two passes.
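One pass of such a filter might be sketched as follows. This reads "two passes" as one horizontal and one vertical pass, uses plain box weights, and works on a single row of float values. All of that is assumed for illustration, and filterRow is a made-up name:

```cpp
#include <vector>

// One horizontal pass over a single row of the light buffer. Neighbours
// with a different smoothing group ID are skipped, so light never blurs
// across surface boundaries. A radius of 3 gives a 7 pixel wide kernel.
std::vector<float> filterRow(const std::vector<float>& light,
                             const std::vector<int>& groupId,
                             int radius = 3)
{
    const int n = static_cast<int>(light.size());
    std::vector<float> out(light.size());
    for (int i = 0; i < n; i++)
    {
        float sum = 0.0f;
        int count = 0;
        for (int j = i - radius; j <= i + radius; j++)
        {
            if (j < 0 || j >= n || groupId[j] != groupId[i])
                continue;
            sum += light[j];
            count++;
        }
        out[i] = sum / count; // count is at least 1, since j == i always passes
    }
    return out;
}
```

Running the same function over columns of the result would complete the 7x7 footprint.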

There is still some low frequency noise present, but it gets eaten by temporal anti-aliasing, and the final image looks like this.