Friday, May 2, 2014

Making Image Mosaics

imageAs followers of the blog know (thx for the mails!), I’ve been making mosaics of Ludum Dare contest entries.  It started basic, and its got better.  Each of the ‘pixels’ in the Mona Lisa is actually a screenshot of a game!  Here’s how:

We have an target image, for example the Mona Lisa, and we have some large number of small input images which we want to arrange so that, when squinted at from a distance, it approximates the target image.

Tiling

The first step is to determine the tiling of the input images.  I wanted all images to be a uniform size.  But the small input images are a variety of sizes.  So I averaged their aspect ratio.  Aspect ratio is width / height, and is a floating point number (so be careful of integer / integer which usually gives integer!)

Some images will be distorted more than others when they are scaled to the output tile size, but so be it.  Packing non-uniform-sized images is non-trivial - we can’t even decide how best to stack oranges ;)

 Its then to decide an arbitrary width for the input image; the height will be derived from the average aspect ratio.  Lets call this thumb_width and thumb_height.

So we have some number of input images, and we want to work out how many rows and how many columns best fit.  The simple way is to start with 1 row and 1 column, and add a row or column - whichever is closest to the aspect ratio of the output image - until rows x columns is more than the number of input images:


target_aspect_ratio = target_width / target_height
rows = cols = 1
aspect = thumb_width / thumb_height
while rows * cols < num_images:
    add_col_aspect = ((cols + 1) * thumb_width) / (num_images / (cols + 1))
    if abs(add_col_aspect - target_aspect_ratio) < abs(aspect - target_aspect_ratio):
        cols += 1
        aspect = add_col_aspect
    else:
        rows += 1
    

(When working in Python, it’s important to ensure you get floats from those divides; in the real code, I wrap the lhs with float())

So now we know we have a grid of cols*rows tiles to place.  There may be more grid spots than source images, and you can decide what to do with those extra tiles.  You may add finesse by choosing to add the very last row or column based on which leaves the least empty tiles, although I chose aspect ratio over minimizing empties.

And you have some options with how to present those empty tiles in the final output image.  I decided to make them black, and not to account for this blackness in the image matching described below.  You may decide to copy in the original target image into those empty tiles, or whatever.

Matching / Scoring / Fitness

Now we know the source image grid we have to determine how well each input image matches each tile in the target image.

The simplest approach is Mean Square Error (MSE).  To compute the MSE for a thumb, you scale target image and the thumb and then compute the squared Euclidean distance of each corresponding pixel. You then average these squared distances to get the mean, but if you think about it there’s a fixed number of pixels in this grid we are using so we can skip the Mean -part of the calculation.

I did this on a ‘patch’ scale, so a few hundred samples per thumb per tile.  I didn’t really experiment nor reason about how many samples per thumb per tile would be optimal.  The number of samples per thumb per tile has a direct linear outcome on the time it takes to compute all the similarities, so you can trade speed for accuracy here.

There are better approaches to image similarity, such as PSNR and SSIM.  SSIM tries to capture the structural similarity too, which sounds like a good thing.  A simple MSE doesn’t quite capture contour matching.  You could also investigate merging scores from edge detection into the fitness metric e.g. mix in the MSE of the Sorbel or Canny filter of the image too.

You also have to pick the colour space.  I chose RGB.  This may not be a very good choice.  There are probably better colour spaces to use that more closely approximate how humans perceive colours.

This is really not practicing what I preach; I’ve been guilty of reading Charles Blooms blog, and commenting on StackOverflow about image matching metrics, and worked a lot with video transcoding and stuff, and yet when making my quick and dirty script I took the quick and dirty approaches.

The actual computation of the match for each thumb with each tile can take a considerable time.  Its embarrassingly parallel, but I did not invest the time in making Python parallelize it.  I could have speeded up the actual computation using numpy, which has a MSE function too, and I could have tried to use multiprocessing to do things in parallel (although pushing bytes to each worker is significant overhead and work would need to be invested in ensuring the underlying bitmaps were shared).  I just made it so that the MSE results were serialized (and gzipped; the cartesian product of all thumbs and all tiles is quite a lot of data) so that the scores could be re-used as I tweaked the placement algorithm.

Placement

Once you have some score for how similar each thumb is with each tile, you have to decide how to place them!

The simplest approach is ‘greedy’.  That is, you place the closest-matching thumb on its best tile.  And then the next and the next.

imageOne way to implement this is to put each match score, tile and thumb into a list and sort it so best score is first.  Then you iterate through this list, and if the tile is still unoccupied and the thumb still unplaced, you place it.  Even in Python, it takes just a few seconds.  Here’s the result on the right →

Why all those white blocks?  Well, in the mona lisa there isn’t any very white tiles, but we have lots of very white thumbs to place.  There are patches of skin in the source image that are white-ish, but we also have some skin-coloured tiles that are much closer matches and which we’ve already placed.

When placing greedily, we quickly place the very closest matches.  This leaves us with the increasingly poorly matched tiles and thumbs.  The end result is that the last-placed tiles are the worst possible matches!

After the first Ludum Dare contest, I left it here.  But when I run the mosaic script on the newest contest thumbnails, I was unimpressed and itched to fix this.

imageWhat’s the quickest possible fix?  Take the output of the greedy algorithm and add Random Swaps.  You pick two placements at random and compute if swapping them lowers the overall image similarity score.  And you make as many swap improvements as you can find, until the person running the script gets tired of waiting and interrupts.  I called this the ‘timed’ algorithm in the script.  Result on the right again :) →

Hi-Res images here.

So those pesky white thumbnails have migrated to her skin patches because whilst there are closer skin-coloured matches, placing a skin-coloured patch on the black cloth is less overall error than placing the very white.  And so, with time and millions of random swap checks and thousands of actual improving swaps, tiles migrate towards a global optimum.

This is missing a few tricks.  There might be three tiles A B and C which, when their placements are rotated, lowers the overall score.  However, none of the pairs A B, B C nor A C are themselves a better swap.  This pair swapping algorithm will not find these.

When entering rec.math contests, I would start thinking about how to better optimize placement.  But as it is, random swaps actually works quite well.  I’d love to hear your ideas for improvements!

(Mosaic script source-code here)

And special thanks to Roy van Rijn - a rec.math kind of contestant - for the ball-plank and encouragement :)

Notes

  1. williamedwardscoder posted this

 ↓ click the "share" button below!