I recently spotted that Python 3.5 has added yet more features to make coroutines more straightforward to implement and use. Since I’m well behind the curve I thought I’d bring myself back up to date over a series of blog posts, each going over some functionality added in successive Python versions — this one covers the facilities up to and including the yield from
syntax added in Python 3.3.
This is the 1st of the 4 articles that currently make up the “State of Python Coroutines” series.
I’ve always thought that coroutines are an underused paradigm.
Multithreading is great for easily expanding single threaded approaches to make better use of modern hardware with minimal changes; multiprocess is great for enforcement of interfaces and also extending across multiple machines. In both cases, however, the premise is on performance at the expense of simplicity.
To my mind, coroutines offer the flip side of the coin — perhaps performance isn’t critical, but your approach is just more naturally expressed as a series of cooperative processes. You don’t want to wade through a sea of memory barriers to implement such things, you just want to divide up your responsibilities and let the data flow through.
In this short series of posts I’m going to explore what facilities we have available for implementing coroutines in Python 3, and in the process catch myself up developments in that area.
Before looking at Python 3 it’s worth having a quick refresher on the options for implementing coroutines in Python 2, not least because many programmers will still be constrained to use this version in many commercial environments.
The genesis of coroutines was when generators were added to
the language in Python 2.2 — these are essentially lazily-evaluated lists.
One defines what looks like a
normal function but instead of a return
statement yield
is used. This
has the effect of emitting a value from your generator but — and this is
crucial — it also suspends execution of your generator in place and returns
the flow of execution back to the calling code. This continues until the caller
requests the next value from the generator at which point it resumes execution
just after the yield
statement.
For a real-world example consider the following implementation of the sieve of Eratosthenes:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Generators are, of course, fantasically useful on their own. In terms of coroutines, however, they’re only half the story — they can yield outputs but they can only take their initial inputs, they can’t be updated during their execution.
To address this Python 2.5 extended generators in several ways which allow them to be turned into general purpose coroutines. A quick summary of these enhancements is:
yield
, which was previously a statement, was redefined to be an expression.send()
method to inject inputs during execution.throw()
method to inject exceptions.close()
method to allow the caller to terminate a generator early.There are a few other tweaks, but those are the main points. The net result
of these changes is that one could now write a generator where new values can
be injected, via the send()
method, and these are returned within the
generator as the value of the yield
expression.
As a simple example of this, consider the code below which implements a coroutine that accepts a number as a parameter and returns back the average of all the numbers up to that point.
1 2 3 4 5 6 7 |
|
The conversion of generators to true coroutines was the final development
in this story in Python 2 and development of the language long ago moved
to Python 3. In this vein there was another advancement of coroutines
added in Python 3.3 which was the yield from
construction.
This stemmed from the observation that it was quite cumbersome to refactor
generators into several smaller units. The complication is that a generator
can only yield
to its immediate caller — if you want to split generators
up for reasons of code reuse and modularity, the calling generator would
have to manually iterate the sub-generator and re-yield all the results.
This is tedious and inefficient.
The solution was to add a yield from
statement to delegate control entirely
to another generator. The subgenerator is run to completion, with results
being passed directly to the original caller without involvement from the
calling generator. In the case of coroutines, sent values and thrown exceptions
are also propogated directly to the currently executing subgenerator.
At its simplest this allows a more natural way to express solutions where
generators are delegated. For a really simple example, compare these two
sample1 implementations of
itertools.chain()
:
1 2 3 4 5 6 7 8 9 10 |
|
Right now, of course, this looks somewhat handy but a fairly minor improvement. But when you consider general coroutines, it becomes a great mechanism for transferring control. I think of them a bit like a state machine where each state can have its own coroutine, so the concerns are kept separate, and where the whole thing just flows data through only as fast as required by the caller.
I’ve illustrated this below by writing a fairly simple parser for expressions in Polish Notation — this is just like Reverse Polish Notation only backwards. Or perhaps I mean forwards. Well, whichever way round it is, it really lends itself to simple parsing because the operators precede their arguments which keeps the state machine nice and simple. As long as the arity of the operators is fixed, no brackets are required for an unambiguous representation.
First let’s see the code, then I’ll discuss its operation below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
|
The main entrypoint is the parse_expression()
generator. In this case it’s
necessary to have a single parent because we want the behaviour of the top-level
expressions to be fundamentally different — in this case, we want it to yield
the result of the expression, whereas intermediate values are instead consumed
internally within the set of generators and not exposed to calling code.
We use the parse_argument()
generator to calculate the result of an expression
and return it — it can use a return value since it’s called as a subgenerator
of parse_expression()
(and others). This determines whether each token is
an operator or numeric literal — in the latter case it just returns the
literal as a float
. In the former case it delegates to a subgenerator based
on the operator type — here I just have unary and binary operators as simple
illustrative cases. Note that one could easily implement an operator of
variable arity here, however, since the delegate generator makes its own
decision of when to relinquish control back to the caller — this is an
important property when modularising code.
Hopefully this example is otherwise quite clear — the parse_expression()
generator simply loops and yields the values of all the top-level expressions
that it encounters. Note that because there’s no filtering of the results by
the calling generator (since it’s just delegating) then it will yield lots
of None
results as it consumes inputs until the result of a top-level
expression can be yielded — it’ll be up to the calling code to ignore these.
This is just a consequence of the way send()
on a generator always yields
a value even if there isn’t a meaningful value.
The only other slight wrinkle is that you might see some
excessive bracketing around the yield
operators — this is typically a good
idea. PEP 342 describes the parsing rules, but if you just
remember to always bracket the expression then that’s one less thing to worry about.
One thing that’s worth noting is that this particular example is quite wasteful
for deeply nested expressions in the same way that recursive functions can be.
This is because it constructs two new generators for each nested expression —
one for parse_argument()
and one for whichever operator-specific subgenerator
this delegates to.
Whether this is acceptable depends on your use-cases and the extent which you
want to trade off the code expressiveness against space and time complexity.
Below is an example of how you might use parse_expression()
:
1 2 3 4 5 6 7 8 9 |
|
Here I’ve defined a convenience wrapper generator which accepts the expression
as a whitespace-delimited string and strips out the intermediate None
values
that are yielded. If you run that you should see there’s two top-level
expressions which yield the same result.
That wraps it up for this post — I hope it’s been a useful summary of where
things stand in terms of coroutines as far as Python 3.3. In future posts I’ll
discuss the asyncio
library that was added in Python 3.4, and the additional
async
keyword that was added in Python 3.5.
Neither of these are anything to do with the official Python library, of course — they’re just implementations off the top of my head. I chose itertools.chain()
purely because it’s very simple. ↩