Friday, November 2, 2012

Callback hell for Python 3.x?

We know that OS-level threads don’t scale; says Guido succinctly:

the number of sockets you can realistically have open in a single process or machine is always 1-2 orders of magnitude larger than the number of threads you can have

So, at a runtime level, using OS threads (or processes) and blocking IO doesn’t scale for common server use-cases (which are not CPU bound).

So a runtime must synthesise blocking IO using green threads or expose an asynchronous IO API.

Guido has asked for input on standardising asynchronous IO in Python 3.x.

Previously I’ve found green threads mis-sold; they are still threads, and they still need locks.  We know that multi-threaded programming is hard.  And I stand by my preference for callback hell because we’re talking about libraries not runtimes.

Or are we?  Is this a chance for Python to change fundamentally?  Are green-threads back on the agenda?

Guido says no, on portability grounds:

I’d be leaning towards rejecting greenlets, for the same reasons I’ve kept the doors tightly shut for Stackless — I like it fine as a library, but not as a language feature, because I don’t see how it can be supported on all platforms where Python must be supported.

Consider for a minute some asynchronous Scala code:

for {
  user <- getUserById(id)
  orders <- getOrdersForUser(user)
  products <- getProductsForOrders(orders)
  stock <- getStockForProducts(products)
} yield stock

versus its Java equivalent:

Promise<User> user = getUserById(id);
Promise<List<Order>> orders = user.flatMap(new Function<User, Promise<List<Order>>>() {
  public Promise<List<Order>> apply(User user) {
    return getOrdersForUser(user);
  }
});
Promise<List<Product>> products = orders.flatMap(new Function<List<Order>, Promise<List<Product>>>() {
  public Promise<List<Product>> apply(List<Order> orders) {
    return getProductsForOrders(orders);
  }
});
Promise<List<Stock>> stock = products.flatMap(new Function<List<Product>, Promise<List<Stock>>>() {
  public Promise<List<Stock>> apply(List<Product> products) {
    return getStockForProducts(products);
  }
});

You know what?  I don’t like either of them!  Yield is evil.  It’s about the most complicated part of Python, and it really ought not to be the operator needed to write scalable code, in my opinion.  What I want to write is:

def get_stock(id):
    user = get_user_by_id(id)
    orders = get_orders_for_user(user)
    products = get_products_for_orders(orders)
    stock = get_stock_for_products(products)
    return stock

And for that to not block other requests!  And yet not a yield nor yield from in view.  I want green threads, and I want it to just work!
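For contrast, here is roughly what the same function looks like in the generator style, as a minimal sketch.  The four helpers are made-up stubs (a bare yield stands in for waiting on IO), their arguments and return values are assumptions, and run is a toy driver rather than a real event loop:

```python
# Stub "asynchronous" helpers: each bare yield stands in for an IO wait.
# Names mirror the example above; bodies and data shapes are invented.
def get_user_by_id(user_id):
    yield
    return {"id": user_id}


def get_orders_for_user(user):
    yield
    return ["order-%d" % user["id"]]


def get_products_for_orders(orders):
    yield
    return [order + "/product" for order in orders]


def get_stock_for_products(products):
    yield
    return {product: 1 for product in products}


def get_stock(user_id):
    # Every call that might wait needs an explicit "yield from".
    user = yield from get_user_by_id(user_id)
    orders = yield from get_orders_for_user(user)
    products = yield from get_products_for_orders(orders)
    stock = yield from get_stock_for_products(products)
    return stock


def run(coroutine):
    """Toy driver: resume the generator until it returns a value."""
    try:
        while True:
            next(coroutine)
    except StopIteration as stop:
        return stop.value
```

The logic is the same four lines, but every caller up the stack has to know that it is calling a generator and has to say yield from, which is exactly the leakage I would like the runtime to hide.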

(And don’t get me started on concurrent.futures.Future and its callback in an arbitrary thread anti-usability… ;)
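To make that complaint concrete, here is a small sketch using only the standard library.  The function name callback_thread_name and the "worker" thread prefix are made up for illustration.  The task is held open until the callback is attached, so the callback is guaranteed to be run by the pool thread that completes the future, not by the thread that registered it:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def callback_thread_name():
    """Return the name of the thread that ran the done-callback."""
    registered = threading.Event()  # set once the callback is attached
    seen = []                       # thread names recorded by the callback

    def work():
        # Wait until the callback is registered, so the future cannot
        # finish before add_done_callback() has been called.
        registered.wait()
        return 42

    def on_done(future):
        seen.append(threading.current_thread().name)

    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="worker") as pool:
        future = pool.submit(work)
        future.add_done_callback(on_done)
        registered.set()
        future.result()
    # Leaving the with-block joins the worker, so the callback has run.
    return seen[0]
```

Had the future already finished when add_done_callback was called, the callback would instead have run immediately in the calling thread, so code in the callback cannot assume either.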

What I’d really like is for Python to tame that GIL so that many Python threads can run in parallel; the recipe I provisionally project would be:

  • to use escape analysis to make Python code do far fewer reference-count changes,
  • to make it so that Python code will relinquish the GIL while running,
  • to make it so that each thread has its own heap queue, objects being owned by the heap belonging to the thread they were allocated in, and refcount-to-zero not causing immediate reclaim if in the wrong thread,
  • to frown harder upon C code that assumes there’s a GIL

And of course it can’t be so easy :)

And on top of that, build green threads into CPython.  Make it so that the threading module does what people would expect it to do.  Threads should preemptively yield using Python’s instruction counter logic; CPU-intensive Python-side tasks should not stop yielding.

Of course, asynchronous APIs have some technical advantages over blocking APIs; you can ask for things to be done in parallel.  With blocking IO you must do A, then B, then C.  But, well, actually… imagine asynchronous functions return a result that, internally, blocks (as in cooperatively yields) only when you dereference it…
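A rough sketch of that idea, with OS threads standing in for the green threads I actually want.  Lazy, fetch and demo are hypothetical names; the barrier is only there to prove that all three calls really are in flight at once before anything is dereferenced:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Lazy:
    """Hypothetical wrapper: the call starts at once, but the caller
    only blocks when it reads .value."""

    def __init__(self, future):
        self._future = future

    @property
    def value(self):
        return self._future.result()  # blocks here, and only here


def fetch(name, barrier):
    # Passes only if all three calls are running at the same time;
    # if they ran one after another this would time out and raise.
    barrier.wait(timeout=5)
    return name.upper()


def demo():
    barrier = threading.Barrier(3)
    with ThreadPoolExecutor(max_workers=3) as pool:
        a = Lazy(pool.submit(fetch, "a", barrier))
        b = Lazy(pool.submit(fetch, "b", barrier))
        c = Lazy(pool.submit(fetch, "c", barrier))
        # Straight-line code; the waits happen on dereference.
        return a.value + b.value + c.value
```

The calling code reads as A, then B, then C, yet the three requests overlap.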

And on all this, standardise on multiprocessing (as in contractually isolated) or some Go-style channels laid over an actor-style task queue, yet using threads.  Steer slightly wider of needing developers to get their locking right when doing their multi-tasking.
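As a minimal sketch of the shape I mean, assuming nothing beyond the standard library: queue.Queue plays the part of a channel, and a thread that only talks through its queues plays the part of an actor.  The names squarer, run and _DONE are invented for the example:

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def squarer(inbox, outbox):
    """Actor-style worker: shares nothing, communicates only via queues."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(item * item)


def run(numbers):
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=squarer, args=(inbox, outbox))
    worker.start()

    for n in numbers:
        inbox.put(n)
    inbox.put(_DONE)

    results = []
    while True:
        item = outbox.get()
        if item is _DONE:
            break
        results.append(item)
    worker.join()
    return results
```

No locks appear in user code; the only synchronisation is inside the queues.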

I said:

I’m very comfortable with callback-style reactor-style code.  But I am becoming more and more allergic to complex APIs.  I look with envy at the goroutine approach where things are hidden.  I look forward to the Go crowd improving their throughput, improving their scheduling, improving scalability across cores etc and all without the programmers using the language having to adapt, migrate, rewrite.

My sincere hope is that Python integrates the gevent approach so that monkey-patching becomes unnecessary and the yielding in (seemingly) blocking IO calls and locks works properly for the new ‘task’ way and in the classic thread way and generic code does not have to understand the distinction.

Said Guido:

I agree that Goroutines are nice but the entire Go language was designed around them. gevent at this time is not sufficiently cross-platform and cross-Python-implementation to make it a valid approach.

Well, bah humbug.  Yeah, what I’m imagining isn’t Python.
