Thursday, January 26, 2012

How I got massively faster DB with async batching

As said in my moreSQL article:

I found databases to be fastest when my own code buffers DB interaction and reorders things and coalesces things so the DB is getting a few queries each with more payload.

And I was rather disillusioned with ORMs.

ORMs are about fast-prototyping code.  However, I discovered - in my DB usage patterns - that it was extra development cost when we discovered that the performance cost was prohibitive we had to chop it all out and replace it with a custom pattern.  It would have been massively faster development to just have done DB asynchronous in the first place…

What I built is so generally applicable I offer it here as a useful pattern for your projects when you hit the same scaling issues we did:

I use asynchronous IO loop servers all the time.  Likely you do too.  I’ve used a lot of twisted (so named because that’s what it does to your sanity), more recently tornado (saner and tidier; possibly abandon-ware, and the non-existent error-handling is surely malicious; rightly popular); I’ve contributed the inner IO loop to deft and I’ve written my own hellepoll (leanest and meanest).  I haven’t used node.js; that’s the one you’ve been using, right? :)  So that’s my CV and context.

And if you are building an IO loop, you are likely still doing your DB blocking.  When your DB is slow, you’re told to just make your DB faster, or to spin your DB out to a separate server and bounce your DB calls via HTTP!

Doing blocking IO to the DB in an IO loop means your web-server goes as fast as the DB.  Requests that arrive whilst waiting for the DB to do its thing have to wait.

You typically have several web-server instances and incoming requests are being load-balanced between them; so its a race which web-requests get serviced first; they can reach the DB in arbitrary order.

Some IO loops come with a DB connection pool worker threads concept.  This means you can service non-DB-touching requests whilst the DB thread(s) are busy, but otherwise you get queued up.  This is a major improvement:

You have a thread pool of DB connection worker threads, each eating incoming requests, submitting them to the database, and then calling back to the web request when complete.  So the race between web requests can happen within the single server, and you have to be aware of that.

(I’ve heard more recently that tornado is adding or someone’s contributed something like twisted’s db connection pool with queues)

Now I had a particular app and it was using twisted’s db-connection-pool system, as above.

I’ll bowdlerize a bit but imagine that lots of simultaneous web-requests all need the database to insert a row a database.

Now this will typically mean each web-request makes a blocking call to the DB.  Or a blocking call to a NoSQL or memcache system - its all much the same here.

In my tests, my MySQL+innodb instance running on local host could take about 1000 such discrete SQL INSERT statements per second (and every few minutes it paused for rather longer, which was very nasty for responsiveness!).  Your faster disks aren’t going to budge that number much.

A single statement inserting all 1000 rows at once takes just milliseconds.

Basically, the throughput of the database is better measured in number of requests rather than size of those requests.

I speculate that its the row-count being returned that causes it to be sequential and slow.  Each statement is a request-response trip; it needs acknowledging before the DB connection handles its next request.

Now at this point, we could have tried all kinds of things.  We don’t want to give up on ACID, and our code is already in the DB connection pool worker thread with callbacks architecture.  Our database is largely prescribed from outside.  So the thing we did try is novel and new though:

I created a database worker queue that greedily ate the queued up statements (these statements were not SQL, but rather commands that it understood; they could have been SQL, but parsing SQL is not necessary if you write the code that submits to the queue too).  It then grouped them, sorted them, and turned them into just a few big SQL queries.  It executed them, put the corresponding callbacks into the main IO loop, and by this time there was a whole bunch more commands for it to reorder and coalesce.

The speedup comes by streaming to the DB rather than each individual statement from a web-request being an independent ping-pong.

On our hardware, INSERTs topped out at about 5K/sec sustained.  Obviously better hardware is available; on the other hand, some back-ends, especially shared or remote resources like Amazon EC2 has likely got substantially slower disk even.

Now this is a high-cost initial start development-wise.  You have to know the application-specific data-specific reordering rules.  Is it safe to put this before that if the CustomerID is different, for example?  And there’s opportunities to spot and cope with writes that cancel others and normalizing them out before writing to DB.  You can mix updates with inserts using MySQL’s REPLACE INTO and so on, although decoding the rows-affected count afterwards takes some care.  You can even batch inserts that return IDs if you try super hard ;)

I can imagine more general ways of inferring this, perhaps by standard annotations by the callee, but it does put a correctness strain on the developers.

But oh boy was the performance worth it!  Absolutely no trouble making a single Python(!) server saturate both network IO and DB disk.

This design has exactly the same access schematics as the DB connection pool approach.  The only ACID you give up is a tiny bit of Isolation.

By putting many statements into a single transaction, this breaks isolation.  As wikipedia says, I am not the first to do so.  Isolation is only broken if there is a validation error in some of the data; that would zap the other statements that chance put in the same transaction.  However, philosophically, if you consider a validation error from the DB to be a programming error, because you are doing your tainted validation client-side, then this can be no big loss.

And you can obviously also have a pool of such workers.  But with that becomes diminishing returns because a single thread (even with GIL contention) is going to easily saturate your database :)

From the perspective of each web-request, all statements happen in order if you have a single worker thread.  As each web-request must necessarily make no assumptions about the order of other independent web-requests, it is safe to reorder given the rules and constraints of your application.  (I don’t think this so straightforward to generalise; the code that decides the reordering rules has to grok the data.)

I took this one step further and added an additional performance improvement: I found was that the set of hot SELECT data was sufficiently small in our application (you measure server RAM in GB these days) that it made sense to store it Python-side.  So SELECTs actually resolved completely client-side; its important to update the cache atomically when the write has succeeded on the DB, but its straightforward.  That’s obviously not super-scalable; but you’d be surprised how far that can take you.

Still talking about Python specifically and most DB connections in general, they always push exceptions like the DB connection dying or a deadlock onto the caller.  And this is not the right decision.  The DB queue ought to contain the retry logic.  So I put that into mine; you’d have difficulty predicting the various transient error codes you can get back from MySQL; you learn them in production, sadly.

Twisted’s db-api has a very nice feature where you can write a method to be executed in the context of the blocking DB thread, termed an interaction.  I use this quite a bit, so I made it that if the DB call fails because of these transient connection or deadlocking problems, my wrapper throws a special ‘rolling-back exception’ so the interaction code can clean up any non-DB state that it has modified.  More complexity, all worth it if you need the performance of asynchronous DB calls.

I have not experimented with multiple transaction statements in a single request; I can imagine that working, and there being ways of inferring which transactions succeeded and which failed, so as not to lose a whole transaction if one component part fails.  However, I’ve always taken the strong position that the DB validation is for catching programming errors, not for catching bad input.  So you shouldn’t lose batches like that.

So this is what I’ve been meaning by reordering and coalescing database reads at an application level.

And to get it best, you have to know what the data is.

If there was an ORM that was async, then perhaps - just perhaps - it could work on top of it.   If writing asynchronous IO looping code is second nature to you, so is understanding that the DB interaction is also spread out through many IO loop iterations.  And throughput and responsiveness times for requests that include DB access are massively faster when the server is under pathological load.

You could also imagine an interesting blocking DB API built on top of Stackless Python perhaps?  I don’t know anything about Stackless, but I have a gut feeling its precisely for this.

Its not a one-size fits all solution.  It did completely change the performance of our app, though.

You may also like: How I got massively faster DB with in-process replication


  1. jonoit reblogged this from williamedwardscoder
  2. williamedwardscoder posted this

 ↓ click the "share" button below!