In this series looking at features introduced by every version of Python 3, we continue our look at Python 3.9 by going through the notable changes in the standard library. These include concurrency improvements with changes to asyncio, concurrent.futures, and multiprocessing; networking features with enhancements to ipaddress, imaplib, and socket; and some additional OS features in os and pathlib.
This is the 20th of the 35 articles that currently make up the “Python 3 Releases” series.
Following on from the previous article looking at new features in Python 3.9, this one discusses changes to existing library modules. There’s a smaller set of changes here than in some of the previous articles, partly because this release seemed to have fewer, and partly because I’ve tried to be a little more selective about which ones are useful enough to be worth covering.
A few small changes in math, the first of which is simply that math.gcd() now accepts any number of arguments rather than exactly two. It returns the largest integer which is a divisor of all the arguments, unless all of them are zero, in which case zero is returned.

On a related note there’s a new math.lcm() function which returns the lowest common multiple of a specified set of integers — that is, the smallest value of which every specified integer is a divisor.
Another new function is math.nextafter(), which returns the next distinctly representable float value after a specified one in a specified direction. It takes two parameters, the first being the starting point and the second being a target value to indicate the direction in which the next value should be located. This mirrors the nextafter() function in libc.
>>> import math
>>> math.nextafter(0.0, 1.0)
5e-324
>>> math.nextafter(5.0, 10.0)
5.000000000000001
>>> math.nextafter(3000.0, -math.inf)
2999.9999999999995
Finally, there is a third new function math.ulp() which returns the value of the least-significant bit of the specified value — in other words, the difference that flipping the least-significant bit of the mantissa makes to the magnitude of the float. Due to the nature of floating point values, this delta becomes larger with the magnitude of the value. The name of the function comes from the commonly used term for this: unit in the last place.
>>> math.ulp(1.0)
2.220446049250313e-16
>>> math.ulp(1000000.0)
1.1641532182693481e-10
>>> math.ulp(1.0e10)
1.9073486328125e-06
>>> math.ulp(1.0e100)
1.942668892225729e+84
On Linux platforms only, the os.pidfd_open() function and os.P_PIDFD constant have been added to support a newish feature in the Linux kernel called pidfds. A detailed discussion is outside the scope of this article2, but suffice to say you can pass the PID of an existing process to pidfd_open() to obtain a file descriptor that refers to that process and can be monitored with the usual poll() and select() mechanisms, so it’s useful for integrating into existing IO loops. Also see the discussion of PidfdChildWatcher in asyncio later in this article.
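Here’s a brief sketch of what this enables; it needs a recent Linux kernel, and the exact file descriptor number shown is just illustrative.

>>> import os, select, sys, time
>>> if (pid := os.fork()) == 0:
...     time.sleep(2)
...     sys.exit(7)
...
>>> pidfd = os.pidfd_open(pid)
>>> select.select([pidfd], [], [])  # blocks until the child exits
([4], [], [])
>>> os.waitid(os.P_PIDFD, pidfd, os.WEXITED).si_status
7
>>> os.close(pidfd)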
For users of other platforms, the os.putenv() and os.unsetenv() calls are now always available — the unsetenv() call has been added to Windows, where it was previously missing, and Python now requires all other platforms to provide setenv() and unsetenv() calls for it to build successfully.
Finally, there’s a new function os.waitstatus_to_exitcode() which translates the status code returned from os.wait() and os.waitpid() into the actual exit code of the process, if the process terminated normally, or the negation of the signal used to terminate the process otherwise.
>>> import os
>>> import signal
>>> import sys
>>> import time
>>>
>>> if (pid := os.fork()) == 0:
... sys.exit(13)
...
>>> waited_pid, status = os.wait()
>>> waited_pid == pid
True
>>> status
3328
>>> os.waitstatus_to_exitcode(status)
13
>>>
>>> if (pid := os.fork()) == 0:
... time.sleep(3600)
... sys.exit(10)
...
>>> os.kill(pid, signal.SIGTERM)
>>> waited_pid, status = os.wait()
>>> waited_pid == pid
True
>>> signal.SIGTERM
<Signals.SIGTERM: 15>
>>> status
15
>>> os.waitstatus_to_exitcode(status)
-15
This is definitely less tedious than the usual if/else dance one has to perform using os.WIFSIGNALED(), os.WIFEXITED() and os.WEXITSTATUS() to get this information, and it proves handy if you don’t use the higher-level abstractions provided by subprocess and the like.
One caveat worth noting is that if a process receives SIGSTOP then it hasn’t actually terminated, but merely been suspended until it receives SIGCONT. If you’re using os.wait() then it won’t return, as there’s no process termination and no status to return. However, if you use os.waitpid(), os.wait3() or os.wait4() and you pass os.WUNTRACED in the options argument then these functions will return in the case of a suspended process. The os.waitstatus_to_exitcode() function doesn’t handle this case properly, so you’ll need to first check the returned status with os.WIFSTOPPED() and, if it returns True, skip the call to os.waitstatus_to_exitcode().
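A minimal sketch of that check, assuming pid refers to a child process which has just been sent SIGSTOP (on Linux SIGSTOP is signal 19):

>>> waited_pid, status = os.waitpid(pid, os.WUNTRACED)
>>> if os.WIFSTOPPED(status):
...     print("Stopped by signal", os.WSTOPSIG(status))
... else:
...     print("Exit code", os.waitstatus_to_exitcode(status))
...
Stopped by signal 19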
A handy handful of improvements to the useful concurrent.futures module. First up, the behaviour of the shutdown() method of the Executor class has hitherto been to wait for all futures passed to the executor to complete execution before freeing the associated resources. The wait parameter doesn’t affect this: it merely controls whether the method returns immediately or only once the shutdown is complete, but either way the shutdown itself doesn’t happen until the futures are complete. The set of futures includes those which are currently executing, as you’d expect, but also futures which haven’t yet started, despite the fact these could often be safely cancelled. As of Python 3.9 there’s a new cancel_futures parameter to this method which will cause futures not yet started to be cancelled instead of executed. Any futures which have already started execution will still be waited on, however.
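Here’s a quick sketch of the effect; the exact pattern of cancellations depends on timing, but with a single worker only the first future typically gets to run:

>>> import time
>>> from concurrent.futures import ThreadPoolExecutor
>>> executor = ThreadPoolExecutor(max_workers=1)
>>> futures = [executor.submit(time.sleep, 1) for _ in range(5)]
>>> executor.shutdown(cancel_futures=True)
>>> [fut.cancelled() for fut in futures]
[False, True, True, True, True]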
Secondly, both ThreadPoolExecutor and ProcessPoolExecutor have been updated to no longer use daemon threads3, but instead use an internal function hook that’s similar to atexit.register(), but called at threading shutdown instead of interpreter shutdown. This change was motivated by the fact that subinterpreters no longer support daemon threads — you can find more discussion in bpo-39812.
Finally there’s a performance improvement to ProcessPoolExecutor to ensure that it always reuses idle worker processes where possible and only spawns new ones when necessary. The change also addresses an issue where the max_workers parameter was regarded as the number of initial processes to spawn regardless of need, rather than the maximum number as it should be.
There are a number of changes to asyncio in this release.
When using asyncio.loop.create_datagram_endpoint() the reuse_address parameter has defaulted to True, which sets the SO_REUSEADDR socket option on the UDP socket thus created for the AF_INET and AF_INET6 families. However, at least on Linux this option creates a serious security hole, as it allows other processes to bind to the same address and port and the kernel will randomly distribute received packets between the bound processes. As a result, in the 3.9 release1 this option defaults to False, and any attempt to set it to True will raise an exception.
>>> import asyncio
>>> import socket
>>>
>>> async def func():
... await asyncio.get_running_loop().create_datagram_endpoint(
... lambda: asyncio.Protocol(),
... family=socket.AF_INET,
... reuse_address=True)
...
>>> asyncio.run(func())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File ".../lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "<stdin>", line 2, in func
File ".../lib/python3.9/asyncio/base_events.py", line 1331, in create_datagram_endpoint
raise ValueError("Passing `reuse_address=True` is no "
ValueError: Passing `reuse_address=True` is no longer supported, as the usage
of SO_REUSEPORT in UDP poses a significant security concern.
The next change addresses the issue that the BaseEventLoop.close() method can leak dangling threads — it doesn’t wait for the default executor to close, and hence the threads in the associated ThreadPoolExecutor don’t get joined. To resolve this, a new coroutine loop.shutdown_default_executor() has been added which calls shutdown() on the executor and waits for it to complete. This new coroutine is now also called by asyncio.run() right after shutting down the async generators, so you’ll benefit from the safety without code changes as long as you’re using that.
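If you do manage the event loop yourself rather than using asyncio.run(), a sketch of where the new coroutine fits in might look something like this (main() here is just a placeholder coroutine):

import asyncio

async def main():
    # Run a blocking call on the default executor so that its threads exist.
    await asyncio.get_running_loop().run_in_executor(None, print, "hello")

loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.run_until_complete(loop.shutdown_asyncgens())
    # New in 3.9: wait for the default executor's threads to be joined.
    loop.run_until_complete(loop.shutdown_default_executor())
    loop.close()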
Next up we look at the addition of PidfdChildWatcher to asyncio. This is a new process watcher — process watchers are the policies asyncio uses to monitor child processes. Prior to 3.9 there were four choices of policy for this purpose, offering different tradeoffs between performance overhead and the chance of conflicting with third party code which manages its own subprocesses.
I won’t go into them in detail here, but before you worry too much about the differences you might like to check out issue #94597 where there are plans for removing most of these options in release 3.12. For the purposes of this article, I’ll just concentrate on the change in release 3.9.
The new PidfdChildWatcher strikes a good balance of performance and safety by using the Linux kernel’s pidfds feature, described earlier in this article in the os section. Since it doesn’t use signals or threads, it’s both safe and performant, and it scales linearly with the number of child processes. The downside is that it’s a Linux-specific extension, so it’s not available on other platforms.
When running code asynchronously, the main hazard to performance is IO-bound blocking functions. In general this issue is avoided by using non-blocking alternatives, but some libraries may not offer such options. For these cases the new asyncio.to_thread() coroutine can come in handy.

All this does is execute the specified function in a separate thread, returning a coroutine which can be awaited to get the result when the function returns. Hopefully the following example makes it fairly clear.
>>> import asyncio, time
>>>
>>> def blocking_function():
... print("Starting blocking function")
... time.sleep(3)
... print("Ending blocking function")
... return 123
...
>>> async def machine_goes_ping():
... for n in range(8):
... await asyncio.sleep(0.5)
... print("Ping!")
...
>>> async def main():
... print("Main started")
... results = await asyncio.gather(
... asyncio.to_thread(blocking_function),
... machine_goes_ping()
... )
... print(f"Results: {results}")
...
>>> asyncio.run(main())
Main started
Starting blocking function
Ping!
Ping!
Ping!
Ping!
Ping!
Ending blocking function
Ping!
Ping!
Ping!
Results: [123, None]
This isn’t actually IO-specific and could, in principle, also be used for CPU-intensive functions which would otherwise block the event loop. The main caveat here is that the GIL tends to prevent such parallelisation in Python code anyway, so the separate thread doesn’t provide any benefit. However, for third party extensions which release the GIL themselves, the benefit can apply here too.
A brace of changes to IMAP support. Firstly, the IMAP4 and IMAP4_SSL constructors now allow a timeout to be specified. Yes, it’s 2020 and Python is still adding timeouts to its network-oriented blocking functions. I’m really trying very hard not to rant about why people don’t always add timeouts to every single blocking function they ever implement — it’s like implementing a file format without including a “version” field, you just don’t do it4.
Secondly, the IMAP4 class, and hence its subclasses, now support an unselect() method. Like the close() method, this frees up any resources the server has allocated for the current mailbox and returns the connection to the freshly authenticated state. Unlike close(), however, it does not also remove any messages marked “deleted” from the mailbox.
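Here’s a rough sketch of both changes together; the hostname, credentials and mailbox are placeholders, and unselect() assumes the server advertises the UNSELECT capability:

import imaplib

# The timeout (in seconds) can now be passed straight to the constructor.
with imaplib.IMAP4_SSL("imap.example.com", timeout=30) as conn:
    conn.login("user", "password")
    conn.select("INBOX")
    # ... search and fetch messages here ...
    # Deselect the mailbox without expunging messages flagged \Deleted.
    conn.unselect()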
The ipaddress module now supports IPv6 scoped address literals. Without wanting to dive into a detailed summary of IPv6, I’ll try and explain what this means as briefly as I can5. Every IPv6 address has a scope which defines where that address is valid. For unicast and anycast addresses, there are only two scopes: global and link-local. Global addresses are potentially globally routable, and are what most people will think of as IPv6 addresses. Link-local addresses are only valid on a specific network interface, and so they’re only unique if combined with the interface identifier. This is appended to the address with a suffix using a % separator, as in fe80::1ff:fe23:4567:890a%eth2 6 — it’s this format which is now supported.
>>> import ipaddress
>>> addr = ipaddress.ip_address("fe80::1ff:fe23:4567:890a%eth2")
>>> addr.is_link_local
True
>>> addr.scope_id
'eth2'
A couple of small changes to importlib in this release. First up, importlib.util.resolve_name() now raises ImportError instead of ValueError for invalid relative imports, for consistency with the builtin import statement.
A new importlib.resources.files() function has also been added which allows access to resources stored in nested containers. It returns a Traversable object which has methods to access information about itself and any objects nested within it, whether they are resources (“files”) or containers (“subdirectories”). The contents are directly available via the read_bytes() and read_text() methods, but there’s also an importlib.resources.as_file() function if a real file is required — this is used as a context manager and extracts the resource to the filesystem, returning the path of the file. It will automatically clean up any temporary files at the end of the context block.
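A short sketch of how this might be used; the package name mypackage and the nested data/config.json resource are hypothetical:

from importlib.resources import as_file, files

# Traversable objects support joinpath() and the / operator for navigation.
resource = files("mypackage") / "data" / "config.json"
print(resource.read_text())

# If an API insists on a real filesystem path, as_file() extracts the
# resource to a temporary file where necessary and cleans it up afterwards.
with as_file(resource) as path:
    print(path)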
The Abstract Syntax Trees module ast has acquired some useful changes for those who find themselves parsing or analysing Python source code. The ast.dump() function can now be passed an indent argument to produce a more readable multiline output. There’s also an ast.unparse() function which re-creates Python source code that would have produced the supplied syntax tree.
>>> import ast
>>> tree = ast.parse("""
... def my_function(arg):
... x = arg**2
... return (x+10)
... """)
>>> print(ast.dump(tree, indent=True))
Module(
body=[
FunctionDef(
name='my_function',
args=arguments(
posonlyargs=[],
args=[
arg(arg='arg')],
kwonlyargs=[],
kw_defaults=[],
defaults=[]),
body=[
Assign(
targets=[
Name(id='x', ctx=Store())],
value=BinOp(
left=Name(id='arg', ctx=Load()),
op=Pow(),
right=Constant(value=2))),
Return(
value=BinOp(
left=Name(id='x', ctx=Load()),
op=Add(),
right=Constant(value=10)))],
decorator_list=[])],
type_ignores=[])
>>> print(ast.unparse(tree))
def my_function(arg):
x = arg ** 2
return x + 10
A few more useful changes too small to warrant their own sections.
isocalendar() now returns a namedtuple: the isocalendar() method of both datetime.date and datetime.datetime now returns a namedtuple instead of a plain tuple, with fields year, week and weekday.

New fcntl constants: the fcntl module now offers the constants F_OFD_GETLK, F_OFD_SETLK and F_OFD_SETLKW, which are used for Linux-specific open file description (OFD) locks. These contrast with regular fcntl() locks, which are per-process rather than per-descriptor.

New HTTP status codes: 103 EARLY_HINTS and 425 TOO_EARLY, as well as the ever-useful 418 IM_A_TEAPOT, are now available in http.HTTPStatus.

close() added to multiprocessing.SimpleQueue: there’s a new close() method on SimpleQueue which closes the file descriptors used for the associated pipe. This is useful to avoid leaking file descriptors if, for example, a reference to the queue persists for longer than expected.

readlink() method added to pathlib.Path: an analogue of the os.readlink() function.

random.randbytes() added: this is similar to random.getrandbits(), but instead of returning an int of the specified length it returns a bytes object of the specified length7.

pidfd_send_signal() added to signal: the signal module has a new function pidfd_send_signal() which sends a signal to a process specified by a pidfd, as opposed to os.kill() which expects a PID.

send_fds() and recv_fds() added to socket: for sockets of family AF_UNIX, it’s long been possible to send and receive file descriptors over them. However, the sendmsg() and particularly recvmsg() invocations required to do so are a little fiddly. The socket module has now acquired new send_fds() and recv_fds() functions to wrap this up more conveniently (there’s a brief sketch of using them at the end of this section).

sys.stderr is now always line-buffered: previously sys.stderr would be line-buffered if connected to a TTY, and block-buffered otherwise. To ensure errors are displayed more promptly, it’s now always line-buffered by default. If this causes problems you can either run python with the -u switch to force stdout and stderr to be unbuffered, call flush() when required, or reopen sys.stderr with os.fdopen() passing an appropriate value for the buffering parameter.

$ python3.8 -c 'import sys; print(sys.stderr.line_buffering)'
True
$ python3.8 -c 'import sys; print(sys.stderr.line_buffering)' 2>/tmp/output
False
$ python3.9 -c 'import sys; print(sys.stderr.line_buffering)' 2>/tmp/output
True

reset_peak() added to tracemalloc: this resets the recorded peak of traced memory blocks to the current size, which makes it possible to measure the peak memory usage of an individual section of code rather than of the whole run so far.
A fairly incremental release all round, this one, but it’s good to see asyncio continues to improve at a reasonable pace. Access to pidfds on Linux also looks pretty handy, especially when multiple threads within a process need to interact with child processes — it’s a shame that this feature isn’t more portable, but I daresay these days there are quite a few commercial programmers who can assume the use of the Linux platform for in-house code.
In the next article I’ll move on to Python 3.10, just as Python 3.11 gets released — at least catching up feels more feasible than it did when I started this whole series!
Actually this was also introduced into Python 3.8.1, but since I’m only looking at the initial version of each major release in these articles this is where it lands. ↩
If you want to find out a lot about the development of pidfds, there’s an in-depth video from the Kernel Recipes 2019 conference. ↩
Normally the Python interpreter won’t exit until all threads have completed, even after the main thread has ended. A thread marked as a daemon thread, however, doesn’t count for this purpose and will be forcefully terminated at shutdown instead. ↩
Though of course people do still do it, as the evidence clearly indicates. ↩
But as any regular readers will know, brevity is not one of my core competencies. ↩
Strictly speaking that’s not necessarily valid since support for string-based suffixes is optional, but it’s typically the way it’s done on Unix. On Windows the interface identifiers are numerical, and since support for numerical identifiers is mandatory, unlike strings, then they should also work on Unix as well. ↩
Note that as with other functions in random, this function generates pseudorandom values which aren’t suitable for cryptographic purposes. If you want more secure bytes, use either os.urandom() or the token_bytes() function from the secrets module that was added in Python 3.6. ↩