Friday, July 6, 2012

Rod Marching

Fractal 3D fly-throughs are just perfect thought experiments.

Have you seen how beautiful they look?  Take the mandelbox (left) for example… boxplorer renders it at 1.2 fps on my creaky laptop with its integrated Intel GPU.

These days these are drawn on the GPU using a pixel shader.  The classic algorithm is ray marching.

Normal ray casting works by casting a ray for each screen pixel and testing all objects to see which it strikes first.

Ray marching, on the other hand, walks a ray forward towards the scene.  At each step, it computes the distance from the current point to the closest part of the scene, then moves forward by exactly that much.  A picture (src) makes it much clearer:
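The per-pixel loop is tiny.  Here's a minimal sketch in Python (the names, parameters, and the `scene_dist` distance-estimator callback are my own; this isn't the post's actual code):

```python
def march(origin, direction, scene_dist, max_steps=20, eps=1e-3, max_t=100.0):
    # Walk the ray forward.  At each step we can safely advance by the
    # distance to the nearest surface anywhere in the scene.
    t = 0.0
    for step in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = scene_dist(p)
        if d < eps:              # close enough: call it a hit
            return t, step
        t += d                   # safe step: nothing is nearer than d
        if t > max_t:            # ray escaped the scene
            break
    return None, max_steps       # missed, or fell foul of max_steps
```

Note that `scene_dist` is the only thing that knows about the scene; the marcher itself is scene-agnostic.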

This would be fairly slow on the CPU.  The benefit, however, is that if the scene can be expressed formulaically you can do this in a GPU’s pixel (“fragment”) shader!

The general issue, however, is that each ray takes a variable number of marching steps before it gets close enough to the scene to stop.  As GPU fragment shaders operate in lock-step in batches, pixels that need many steps get mixed into the same batch as pixels that need few, and the shallow pixels’ shaders end up waiting for the deep ones.

Of course, I had an idea…

I made a test scene (a couple of spheres) and rendered it on the CPU at 800x800 and specified a maximum of 20 iterations on each march.
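A couple of spheres is easy to express formulaically.  A sketch of such a scene's distance function in Python (the positions and radii here are invented; the post doesn't give the actual scene parameters):

```python
import math

def sphere_dist(p, centre, radius):
    # Signed distance from point p to the surface of a sphere:
    # negative inside, zero on the surface, positive outside.
    return math.dist(p, centre) - radius

def scene_dist(p):
    # Distance to the nearest of the two spheres: the minimum of
    # the individual distances.
    return min(sphere_dist(p, (-1.5, 0.0, 6.0), 1.0),
               sphere_dist(p, (1.5, 0.0, 8.0), 1.0))
```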

My Core2 Duo T9400 @ 2.53GHz CPU can (on one core) do this in 150 milliseconds (6.6 fps) and it looks like this:

That’s coloured by number of march steps; the white is just one march step away.

And here it is coloured by z:

(The black between the two spheres is where a ray passing close to the first sphere takes ever-smaller steps, uses up all 20 iterations, and falls foul of the max-iterations counter.)

Think back to how ray marching works; for each pixel, we walk forward.

Early on, pixels that are adjacent are going to make much the same steps…

On the GPU, a pixel knows nothing about the marches of its neighbours.

But on the CPU, that’s an artificial restriction.

Imagine splitting the scene into some number of tiles.  These tiles cover many pixels; they are like rays with width.  I call them rods.  And in each tile, walk forward until you are closer to the scene than the radius of the rod.  At this point, divide the rod into four child rods - one in each quadrant - and recurse.
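That recursion can be sketched in Python, orthographic (no perspective), with all names and parameters my own; it records each pixel's final depth as the seed for a later per-pixel pass:

```python
def rod_march(x, y, size, t, scene_dist, out, max_steps=20, eps=1e-3):
    # March a square rod whose footprint is the pixel square
    # [x, x+size) x [y, y+size), starting from depth t.  Orthographic
    # projection along +z, with pixel units doubling as world units.
    # Assumes size is a power of two so the quadrant split is even
    # (the post's 50x50 rods would need slightly unequal children).
    radius = size * 0.5
    cx, cy = x + radius, y + radius          # centre of the rod
    for _ in range(max_steps):
        d = scene_dist((cx, cy, t))
        if d < max(radius, eps):
            break                            # too close to step as a fat rod
        t += d                               # one shared step for every pixel
    if size == 1 or d < eps:
        # Leaf: record the depth reached for each covered pixel.
        for px in range(x, x + size):
            for py in range(y, y + size):
                out[(px, py)] = t
    else:
        # Divide into four child rods, one per quadrant, and recurse,
        # each child continuing from the depth the parent reached.
        half = size // 2
        for dx in (0, half):
            for dy in (0, half):
                rod_march(x + dx, y + dy, half, t, scene_dist, out,
                          max_steps, eps)
```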

In this way, you trivially share the early marches and dramatically reduce the number of computed positions.  On the test scene with a couple of spheres, it’s two orders of magnitude fewer.

For testing, I used 256 starting rods (each 50x50 pixels, so 16x16 divisions).  (I did not experiment to determine the right trade-off for this scene.)  In each of those rods, I marched forward and split as appropriate.  It takes under 10 milliseconds (100+ fps) and looks like this:

It’s much darker and blockier because it’s coloured by march steps.

Here it is coloured by z instead:

It’s a fuzzy approximation of the per-pixel one, but it’s a fast approximation.

This is excellent input to a GPU renderer.  Give this distance map (for each pixel, two floats; the last point in the march and the distance to impact) to a GPU fragment shader to continue with some small max-iteration setting!
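The continuation step could look like this (Python standing in for the fragment shader; the names are hypothetical):

```python
def refine_pixel(px, py, seed_t, scene_dist, max_steps=4, eps=1e-3):
    # Continue one pixel's march from the depth its rod reached,
    # with only a small iteration budget, as the fragment shader
    # would.  Orthographic along +z to match the rod marcher.
    t = seed_t
    for _ in range(max_steps):
        d = scene_dist((px + 0.5, py + 0.5, t))
        if d < eps:
            break                # hit: t is the impact depth
        t += d                   # same safe step as before
    return t
```

Because the rod pass has already done the long shared walk, a handful of iterations per pixel is enough to sharpen the fuzzy depth map.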

This is using the CPU and GPU in synergy!

The CPU task is embarrassingly parallel and can be divided between cores (even laptops have several cores now).  And because each rod subdivides into four children, you can imagine some fancy SSE marching the children in parallel too.

You can imagine an auto-throttling that profiles and adjusts the CPU max-iterations and size and the GPU max-iterations to get best frame-rate; adapting even as the viewpoint changes.

UPDATE: this idea keeps on reappearing, and has been called cone marching; I’d doubtless have called it cone marching too once I’d added perspective.


