Saturday, November 26, 2011


What should a ‘cache’ be? The word means a lot of things, but to my mind the default meaning in programming should be:

keep around expensive-to-generate bits of read-only data in case we need them again, or until the computer really needs that RAM for something else

I was writing a custom video editing program in Python (an interesting choice of language for that problem) and I wanted to cache decoded frames; but I just wasn’t happy with the memory management of explicit caches.  An explicit cache means deciding how much RAM to ring-fence for something without having global system knowledge.

The problem with most caching systems is that they aren’t elastic: they don’t dynamically reflect the demands of the other applications you share the computer with.  And in my case, I wanted the cache to work across processes too.

Python has weak references, but they are something else entirely: a weakly-referenced object disappears as soon as the last strong reference to it goes, so you have to keep a strong reference to something to keep it around at all.
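To make that concrete, here is a minimal sketch (the `Frame` class is just a stand-in for an expensive decoded video frame): a `WeakValueDictionary` evicts the moment the last strong reference dies, not when memory gets tight.

```python
import weakref

class Frame:
    """Stand-in for an expensive-to-decode video frame."""
    def __init__(self, pixels):
        self.pixels = pixels

cache = weakref.WeakValueDictionary()

frame = Frame(b"\x00" * 4096)
cache["frame-0"] = frame
assert cache.get("frame-0") is frame   # alive only while 'frame' exists

del frame  # drop the last strong reference
# In CPython the refcount hits zero immediately and the entry evaporates,
# no matter how much free RAM the machine has.
assert cache.get("frame-0") is None
```

So weak references give you the opposite of what a cache wants: eviction is tied to your program’s references, not to system memory pressure.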

Java has soft references for caching purposes, but they are objects living inside the small slice of your RAM assigned to your JVM heap.

My inspiration was the Operating System File System’s “Buffer Cache”:

All mainstream OSes have much the same strategy: unused pages of RAM are used to cache disk IO automatically. If your application is reading and/or writing the same bits of a file, it’s likely getting reads served straight from the OS ‘buffer cache’ without hitting disk. And this happens much more than you might think, since all your code and libraries are, at this level, just files too.

Imagine you could share that buffer cache?  Imagine you could write data that the kernel only kept around if it didn’t have a better use for the RAM?  And imagine you could share those reads and writes between processes too if you wanted, with the kernel managing permissions, locking and sharing?

So I made a quick-and-dirty filesystem to (ab)use the OS VM system for caching arbitrary objects, without them ever being written through to an actual file. As the buffer cache sits in front of the filesystem, the filesystem’s read and write handlers can look like this:

int read(file *f, off_t offset, size_t length, char *bytes) {
    return -EIO;      /* report an error */
}
int write(file *f, off_t offset, size_t length, const char *bytes) {
    return length;    /* report success, but don't store the bytes anywhere */
}

Now it’s a case of tracking where you have written your cached items within some file on this filesystem; if reading a cached value back at some later time succeeds, those bytes were served from the OS’s cached pages!
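A sketch of that bookkeeping, with names of my own invention: each item stores a digest of its key alongside the value, so a read that comes back failed, zeroed or recycled can be detected and treated as a miss. Run against an ordinary file, as below, every read simply hits; on the forgetful filesystem a read of evicted pages would error or mismatch.

```python
import hashlib
import os
import tempfile

PAGE = 4096  # items are aligned on page boundaries, as described below

class Buffcacher:
    """Tracks where cached items were written in a backing file.

    On the real forgetful filesystem the backing file's pages live only in
    the OS buffer cache; here any ordinary path works for illustration."""

    def __init__(self, backing_path):
        self.fd = os.open(backing_path, os.O_RDWR | os.O_CREAT, 0o600)
        self.index = {}       # key -> (offset, length)
        self.next_offset = 0

    def put(self, key, value):
        # Prefix the value with a digest of (key, value) so a later read
        # can prove the pages still hold what we wrote.
        blob = hashlib.sha1(key.encode() + value).digest() + value
        offset = self.next_offset
        os.pwrite(self.fd, blob, offset)
        self.index[key] = (offset, len(blob))
        # advance to the next page boundary
        self.next_offset = offset + ((len(blob) + PAGE - 1) // PAGE) * PAGE

    def get(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        try:
            blob = os.pread(self.fd, length, offset)
        except OSError:        # the filesystem returned -EIO: a miss
            return None
        value = blob[20:]      # a SHA-1 digest is 20 bytes
        if len(blob) != length or blob[:20] != hashlib.sha1(key.encode() + value).digest():
            return None        # pages were dropped or recycled: a miss
        return value

cache = Buffcacher(os.path.join(tempfile.mkdtemp(), "store"))
cache.put("frame-1", b"decoded pixels")
```

The caller only ever sees “hit” or “miss”; whether the miss came from never-written pages, evicted pages or a read error is all the same to it.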

In this way it’s easy to cache items in the OS and let them compete for free RAM with the IO pages. As caching has diminishing returns, there’s likely plenty of free RAM to hold the hot IO and these cached values, with the remainder of free RAM churning with the speculative storage of soon-discarded cold pages.

It would be a worthwhile finesse to regularly pack popular small items near each other so they share pages, but I took the easy way out and aligned all items on page boundaries; after all, uncompressed video frames are several pages each anyway.

I used FUSE to get this working, and it was quite slow. Though it would be unfair to say extremely slow, since it was still faster than memcached running on the same machine :)

But imagine a kernel-module filesystem doing this… that’d be blazingly fast. The key lookup could live inside the kernel too; I could imagine a lookup being a dreaded ioctl that returned the file offset and length of existing items, or reserved space for storing new ones. And if a read does hit the filesystem callback, the filesystem can tell whether it’s a plain miss or whether it explicitly evicted the value because another value replaced it after the offset/length was fetched with the ioctl. In this way, multiple users could share the same store (perfect for forking and other cooperative things; what would your GNOME or KDE desktop want to cache that wasn’t naturally a file?). How fast might a kernel module be? It’s basically the same as a RAM disk. That’s fast :)
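Purely as illustration of that lookup, the userspace half of such an ioctl might pack and unpack a small fixed struct. Everything here — the struct layout, the names, the miss convention — is invented, since no such kernel module exists:

```python
import struct

# Hypothetical wire format: a lookup request carries a 64-bit key hash,
# and the reply carries the item's file offset and length (length 0 = miss).
LOOKUP_REQUEST = struct.Struct("<Q")    # key_hash
LOOKUP_REPLY = struct.Struct("<QQ")     # offset, length

def encode_lookup(key_hash):
    return LOOKUP_REQUEST.pack(key_hash)

def decode_reply(buf):
    offset, length = LOOKUP_REPLY.unpack(buf)
    return (offset, length) if length else None

# The real call might then look something like:
#   fcntl.ioctl(store_fd, BUFFCACHER_LOOKUP, encode_lookup(hash(key)))
# with the ordinary read()/pread() of those offsets doing the rest.
```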

You could go further. If you are using some NoSQL server, or even memcached, you likely have it on another node. As sets are a fraction of the gets, you could have a single thread on each client that listens to the NoSQL server for the hashcodes of keys as they are set, and invokes some ioctl to invalidate them in the common cache. This way, all your redis (or whatever) reads on each of your webserver nodes could be cached locally.
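That listener could be a single thread draining a subscription and poking the invalidation call. In this sketch a `queue.Queue` stands in for the real pub/sub channel (e.g. a redis subscription) and a plain dict stands in for the shared store — both assumptions of mine:

```python
import queue
import threading

class LocalReadCache:
    """Per-node cache of values read from the remote store."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}
    def get(self, key):
        with self._lock:
            return self._data.get(key)
    def set(self, key, value):
        with self._lock:
            self._data[key] = value
    def invalidate(self, key):
        with self._lock:
            self._data.pop(key, None)

def invalidation_listener(cache, channel):
    """Drain announcements of freshly-set keys, invalidating them locally.

    In production 'channel' would be a pub/sub subscription that the
    NoSQL server publishes set keys to; a Queue stands in here."""
    while True:
        key = channel.get()
        if key is None:   # sentinel: shut down
            return
        cache.invalidate(key)

# Usage: cache a read, then simulate the server announcing a new set.
cache = LocalReadCache()
cache.set("user:42", b"cached reply")
channel = queue.Queue()
listener = threading.Thread(target=invalidation_listener, args=(cache, channel))
listener.start()
channel.put("user:42")   # another node overwrote this key
channel.put(None)
listener.join()
```

Because sets are rare relative to gets, one listener thread per node keeps the whole local cache coherent at very little cost.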

Or you could take a classic cache like memcached, which could itself use buffcacher instead of implementing its own eviction policy and pinning a static amount of RAM on each node.

I wish that all operating systems had a straightforward, even standardised, way to cache data dynamically.  And I wish that all programming languages came with standard libraries to utilise it.

Whilst programs are not particularly getting larger, the data they act upon is.  The future will be full of data, and it’ll be increasingly important to compress that data on storage, as storage is not getting faster even as it gets bigger.

I philosophised on whether my buffcacher is the first step towards proper explicit OS support for this; after reflection I decided that yes, files are a good abstraction for it.  The OS doesn’t need to learn new tricks, it just needs a file system that is forgetful.  A good library can do the rest.  Just as long as it’s installed by default ;)

Your browser probably reserves 60% or so of your RAM just to fill with pages you’ve visited or might soon visit.  (There’s plenty wrong with that statement, but it’s true in spirit.)  This is because there isn’t a system-wide caching service it can use.  It ought to be using buffcacher!

