Sunday, November 17, 2013

PyParallel getting around the GIL

This is a fascinating and exciting talk (slides) about a patch/extension to make CPython use IOCP on Windows for completion-based asynchronous IO.

It’s a direct response to the recent async IO work in Python, which I’ve looked at previously.  There’s some coverage on Proggit.

PyParallel lets you run Python protocol handlers and worker tasks in lightweight processes.  It’s a bit like goroutines really, and it could be suitable for computational tasks too: share by communicating, don’t communicate by sharing.
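To make "share by communicating" concrete, here is a minimal sketch using only the standard library.  It is not PyParallel's API (the `worker` and `run_squares` names are mine, for illustration), and these are ordinary threads that still run under the GIL; the point is only the shape of the code, where workers never touch shared state and all data moves through queues.

```python
import queue
import threading


def worker(inbox, outbox):
    # Hypothetical worker: it owns no shared state; data arrives and
    # leaves only through the two queues.
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work for this worker
            break
        outbox.put((item, item * item))


def run_squares(values, n_workers=4):
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for v in values:
        inbox.put(v)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All workers have finished, so the outbox can be drained safely.
    results = {}
    while not outbox.empty():
        key, square = outbox.get()
        results[key] = square
    return results
```

Calling `run_squares([1, 2, 3])` hands each value to whichever worker is free and collects the answers from the outbox; nothing is mutated in place by more than one thread.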

The key idea is that you can utilise all CPU cores without spawning a whole bunch of processes.

I think the exact same approach can be ported to Linux, using edge-triggered epoll consumed by a set of worker threads.  (IIRC FreeBSD’s kqueue is not thread safe though; so it may be challenging to be efficient on all platforms?)
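A rough sketch of what I mean by that, assuming Linux (`select.epoll` does not exist elsewhere) and using socket pairs in place of real client connections.  The function name `drain_with_workers` is made up for this post, and I have added `EPOLLONESHOT` so that each readiness event is handed to exactly one worker; that is my own choice for the sketch, not something taken from PyParallel.

```python
import select
import socket
import threading


def drain_with_workers(messages, n_workers=2):
    ep = select.epoll()
    pairs = [socket.socketpair() for _ in messages]
    by_fd = {}
    for reader, _writer in pairs:
        reader.setblocking(False)
        by_fd[reader.fileno()] = reader
        # EPOLLET: edge-triggered. EPOLLONESHOT: only one worker sees the event.
        ep.register(
            reader.fileno(),
            select.EPOLLIN | select.EPOLLET | select.EPOLLONESHOT,
        )

    received = []
    lock = threading.Lock()
    done = threading.Event()
    if not messages:
        done.set()

    def worker():
        while not done.is_set():
            for fd, _events in ep.poll(0.1):
                sock = by_fd[fd]
                chunks = []
                while True:
                    # Edge-triggered: keep reading until the socket would block.
                    try:
                        data = sock.recv(4096)
                    except BlockingIOError:
                        break
                    if not data:
                        break
                    chunks.append(data)
                with lock:
                    received.append(b"".join(chunks))
                    if len(received) == len(messages):
                        done.set()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for (_reader, writer), msg in zip(pairs, messages):
        writer.sendall(msg)
    for t in threads:
        t.join()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    ep.close()
    return sorted(received)
```

All the worker threads block in `poll` on the same epoll object, and the kernel decides which one wakes up for each socket.  In this sketch the handlers still run under the GIL, of course; the interesting part of PyParallel is removing that restriction.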

These lightweight processes run outside the GIL, so they can all run at once.  While they are running, though, the main loop cannot run.  I am unsure of the details of this and it raises red flags for me; I’d really want the main Python loop to run truly in parallel.

What my mind wanders to is having a CPython where the multiprocessing module does not use processes, but instead uses the same VM with separate lightweight stacks/heaps in a work-stealing approach.  This gives up fair preemptive scheduling (by the kernel) in exchange for cheaper passing between tasks.  It’s the same tradeoff that is serving Go so well, and I think the PyParallel prototype shows how this can fit into Python and make Python perform better.  The GIL would become per conceptual process, rather than truly global.

For example, I expect most SQL statements are not part of a transaction; web apps and so on are continuously just SELECTing the user and so forth.  You could imagine a web app that isn’t particularly aware that its blocking DB calls are being multiplexed over a set of worker threads and that the DB layer is aware of this and using a connection pool approach itself.  You might even imagine it trying to do something clever like my asynchronous batching.
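Here is a small sketch of that idea using `sqlite3` and a thread pool from the standard library.  The `PooledDB` class and the `demo` function are hypothetical names of mine, and SQLite stands in for whatever database the web app really talks to.  The caller just makes what looks like a blocking `query` call; underneath, the DB layer borrows a connection from a pool and runs the statement on a worker thread.

```python
import os
import queue
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


class PooledDB:
    """Hypothetical DB layer: blocking-looking queries run on pooled workers."""

    def __init__(self, path, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            # check_same_thread=False lets a connection be used by any worker.
            self._pool.put(sqlite3.connect(path, check_same_thread=False))
        self._executor = ThreadPoolExecutor(max_workers=size)

    def _run(self, sql, params):
        conn = self._pool.get()  # borrow a connection
        try:
            return conn.execute(sql, params).fetchall()
        finally:
            self._pool.put(conn)  # always hand it back

    def query(self, sql, params=()):
        # To the caller this is an ordinary blocking call.
        return self._executor.submit(self._run, sql, params).result()

    def close(self):
        self._executor.shutdown()
        while not self._pool.empty():
            self._pool.get().close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.db")
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        setup.executemany(
            "INSERT INTO users (name) VALUES (?)", [("ada",), ("bob",)]
        )
        setup.commit()
        setup.close()

        db = PooledDB(path, size=2)
        try:
            return [
                db.query("SELECT name FROM users WHERE id = ?", (i,))[0][0]
                for i in (1, 2)
            ]
        finally:
            db.close()
```

Because the SELECTs are outside any transaction, it does not matter which pooled connection serves which call, which is exactly why this kind of multiplexing can stay invisible to the app.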

You might even move away from CSP and allow code in a worker to mutate shared state by acquiring the GIL of the thread that created the variable in the first place; mutating shared state could be slightly slower but effective.

And what about yielding when it is used in a coroutine manner as in Tornado?

From all this you can imagine a new dialect of Python emerging, where multiprocessing and goroutines are built in?
