What’s New in Python 3.12 - Library Changes

17 Mar 2024 at 2:58PM in Software

In this series looking at features introduced by every version of Python 3, we take a look at the new features added in Python 3.12 in the standard library, as well as a few other minor language improvements I missed in previous articles.

This is the 32nd of the 36 articles that currently make up the “Python 3 Releases” series.


This is my third and final article looking at the changes in Python 3.12. In the last couple of articles we looked at improvements to type hinting and some language and interpreter changes. In this article we’ll look at a handful of smaller changes to the core language, and also some improvements to the standard library.

Other Language Enhancements

Let’s kick off with the language changes. The summary of the changes I’m discussing is:

  • No longer returning an ExceptionGroup for a single exception in some cases
  • Safer garbage collection, triggered by the eval breaker
  • Hashable slices
  • Better accuracy using sum() with floats
  • Support for the Linux perf profiler
  • Changes to assignment expressions in comprehensions

Some of these are fairly niche changes, especially the last one, and I’ve tried to call these out where I think they may not be of interest to everyone. But if you’ve read my previous articles, you’ll know that I do love to dive into the details a little—there’s always the table of contents if you want to skip around!

Without further ado, let’s jump right in.

One Exception to Rule The ExceptionGroup

We’ll start with perhaps a bit of a niche one, but I think it’s something that might genuinely come up occasionally, so it’s worth being aware of. I covered use of ExceptionGroup in one of my 3.11 articles, so I’m going to assume you’re familiar with that and the except* syntax—if not, do go take a read because it’s useful stuff.

With that in mind, consider the following code:

try:
    try:
        raise ExceptionGroup(
            "example",
            [ValueError(1), ValueError(2), TypeError("foo")]
        )
    except* ValueError as exc_group:
        print(f"Inner catch: {exc_group!r}")
        raise RuntimeError("oops")
except Exception as exc:
    print(f"Outer catch: {exc!r}")

You can try to figure out what’s going on here yourself if you like, before reading on, because I’ll spoil it for you now. What happens is that we’re raising a heterogeneous ExceptionGroup which contains two ValueErrors and a TypeError. The following except* ValueError separates out the ValueError instances and passes them to the handler as exc_group, which is printed.

From within the handler we then raise a RuntimeError, but since there’s still an extant ExceptionGroup with the remaining TypeError left to handle, this gets raised as a new ExceptionGroup containing the new RuntimeError plus the original ExceptionGroup with the ValueError instances removed, since those were already handled by the except* ValueError clause.

The output is thus as follows:

Inner catch: ExceptionGroup('example', [ValueError(1), ValueError(2)])
Outer catch: ExceptionGroup('', [RuntimeError('oops'), ExceptionGroup('example', [TypeError('foo')])])

This much hasn’t changed in Python 3.12. But now let’s add another handler:

try:
    try:
        raise ExceptionGroup(
            "example",
            [ValueError(1), ValueError(2), TypeError("foo")]
        )
    except* ValueError as exc_group:
        print(f"Inner catch #1: {exc_group!r}")
    except* TypeError as exc_group:
        print(f"Inner catch #2: {exc_group!r}")
        raise RuntimeError("oops")
except* Exception as exc_group:
    print(f"Outer catch: {exc_group!r}")

So this time, what happens? We’ve exhausted the original ExceptionGroup, so there’s no reason why we shouldn’t just raise RuntimeError as a bare exception, right? Well, not in Python 3.11 as it turns out. Even though the group has been fully handled, a new group is still created, despite the fact that it only has a single exception in it:

Python 3.11.3
Inner catch #1: ExceptionGroup('example', [ValueError(1), ValueError(2)])
Inner catch #2: ExceptionGroup('example', [TypeError('foo')])
Outer catch: ExceptionGroup('', [RuntimeError('oops')])

In Python 3.12 this has been changed to the more intuitive behaviour of just raising the bare RuntimeError as a single exception, not a group. This change has also been backported to 3.11.4 and later.

Python 3.12
Inner catch #1: ExceptionGroup('example', [ValueError(1), ValueError(2)])
Inner catch #2: ExceptionGroup('example', [TypeError('foo')])
Outer catch: RuntimeError('oops')

Collecting Garbage on The Eval Breaker

This one is just for interest, really, because it probably shouldn’t change runtime behaviour in a noticeable fashion, so I’ll keep it brief.

Historically, Python garbage collection could be run on any object allocation. This meant that a garbage collection could run halfway through a bytecode instruction, whilst the VM is in an arbitrary and potentially inconsistent state—this has been the source of a number of issues, as you might expect.

As of Python 3.12, the garbage collector is now only run between bytecode instructions, using the eval breaker mechanism. This is a flag which is set during points of execution to indicate that the current flow of bytecode execution should be interrupted once the current instruction completes. This has hitherto been used for things like when another thread is requesting the GIL, or to run signal handlers.

So, where previously a garbage collection would immediately have been triggered by calling gc_collect_generations(), now the code simply sets a new flag gc_scheduled as well as the eval_breaker flag in the interpreter state. In the later code which is triggered when eval_breaker is set, there’s a new check for this gc_scheduled flag which triggers the same gc_collect_generations() function.

Making a Hash Of Slices

You’re probably familiar with slice notation for taking a potentially non-contiguous subset of a sequence.

>>> values = list(range(100))
>>> values[20:80:7]
[20, 27, 34, 41, 48, 55, 62, 69, 76]

What you might be less familiar with is that you can represent the parameters of this slice using a slice object, and pass this around as a first-class object, using it to extract slices from sequences as needed.

>>> s = slice(20, 80, 7)
>>> values[s]
[20, 27, 34, 41, 48, 55, 62, 69, 76]

What you can’t do, at least in Python 3.11, is store these slice objects in a dict or set, as they’re not hashable.

Python 3.11
>>> {s: True}
TypeError: unhashable type: 'slice'

In Python 3.12, however, they are.

Python 3.12
>>> s = slice(20, 80, 7)
>>> {s: True}
{slice(20, 80, 7): True}
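
One practical consequence is that anything which needs hashable arguments can now accept slices directly. As a rough illustration (the function and values here are made up purely for the example), functools.lru_cache requires its arguments to be hashable, so in 3.11 this would raise TypeError, but in 3.12 it just works:

>>> from functools import lru_cache
>>> values = list(range(1_000_000))
>>> @lru_cache
... def expensive_summary(region):
...     return sum(values[region])
...
>>> expensive_summary(slice(1000, 2000))
1499500
>>> expensive_summary(slice(1000, 2000))   # served from the cache this time
1499500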

Sum Things Better

Prior to Python 3.12, the humble sum() was implemented in a very simple fashion, just progressively summing values as you’d expect. For lossless values like integers, this is fine, but for floating point values this naive approach can impact precision.

If you really care about precision, there’s a function math.fsum() which does a very careful job to maintain as much accuracy as possible, based on guarantees that the IEEE-754 floating point standard makes. Unfortunately this function isn’t particularly well known, takes about 10X as long as sum(), and also coerces everything to float. As a result, some improvements were made in Python 3.12 to make built-in sum() somewhat better, but without making it as slow as math.fsum().

This is done using a technique called compensated summation, where an additional running compensation value is maintained during the summation, which attempts to track the errors in the low-order bits that have crept into the calculation. This compensation is then folded back into the total to keep it more accurate.

Actually, this change to sum() uses a refinement by Arnold Neumaier, which handles the case where the next item to be added is larger in magnitude, ignoring the sign, than the running sum so far. This matters because if a large value is added to a small sum then it’s the low-order bits of the sum which are lost, rather than those of the value being added.
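
To give a flavour of the technique, here’s a pure Python sketch of Neumaier’s approach. This is just an illustration of the algorithm, not the actual C implementation used by sum():

def neumaier_sum(values):
    total = 0.0
    compensation = 0.0          # running estimate of the lost low-order bits
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            compensation += (total - t) + x      # low-order bits of x were lost
        else:
            compensation += (x - t) + total      # low-order bits of total were lost
        total = t
    return total + compensation

For example, neumaier_sum([1e16, 1.0, -1e16]) gives 1.0, where a naive left-to-right sum loses the 1.0 entirely and returns 0.0; this is exactly the kind of case where the builtin sum() should now do better in 3.12.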

I’m honestly not nearly enough of an expert in floating point error accumulation to offer an opinion on how often this change will make a noticeable difference in accuracy, but I can say the difference in performance is fairly minimal, only taking about 1.5% longer based on some basic measurements which I ran on my Apple M2 MacBook Pro. This doesn’t quite agree with the discussion on the Python issue ticket, which claims the overhead is zero, but at worst it’s very minimal.

$ python3.11 -m timeit -s 'import random' -s 'values = [random.expovariate(10.0) for i in range(10_000)]' 'sum(values)'
10000 loops, best of 5: 33.4 usec per loop
$ python3.12 -m timeit -s 'import random' -s 'values = [random.expovariate(10.0) for i in range(10_000)]' 'sum(values)'
10000 loops, best of 5: 33.9 usec per loop

Perf-ect Profiling on Linux

Profiling your code is really helpful when you want to do any optimisation—if you’re trying to speed things up without seeing where your code is spending most of its time, then you’re at serious risk of wasting a lot of your effort.

There are several ways to profile code, but I think of them in two basic categories: deterministic profilers involve changes to instrument your code (or the interpreter, in the case of Python) to track the exact sequence of events when running your code; whereas statistical profilers take periodic samples of where your code is while it’s running and see which functions account for most of those samples. Python’s profile and cProfile modules are examples of the former.

As useful as deterministic profiling can be, however, it typically has some effect on the performance of the code you’re profiling, and it also typically means running the code under controlled conditions. Sometimes, however, you want to profile code in a production environment, running under a real-world load and with minimal overhead incurred. This is where statistical profilers come into their own.

Linux has had good support for statistical profiling since OProfile was added in 2001. This uses a system timer to briefly interrupt execution at fairly frequent intervals and take samples of which process is running, the state of the program counter, and so on. These samples can later be post-processed and filtered down to a specific application to see which parts of that code were taking the most time.

More recently, a tool called perf was added to Linux in 2009, which uses the same mechanism but has a different userspace component. The events are captured in an in-kernel buffer, and the userspace component periodically empties this to disk as it becomes full, so the overhead added by profiling is quite low. This tool has more features than OProfile, and seems to now be the de facto statistical profiler on Linux. I’m not going to talk about the use of perf directly in this article, but if you want to play with it then their documentation has a good tutorial.

The reason this is all of interest to this article is that Python 3.12 has added features which make perf much more useful for profiling Python applications. Normally, perf would only get information about the C functions in the Python interpreter, but not the bytecode that’s being executed—this is probably not particularly useful because you’ll just have a lot of calls to _PyEval_EvalFrameDefault() and not a lot else. That’ll tell you that bytecode is being executed, but not what it’s doing.

In Python 3.12 you can enable perf support, which dynamically generates small pieces of trampoline code before the execution of each Python function, and uses special mapping files to help perf translate these into the names of the actual Python functions being executed.

To enable the support, your Python needs to be compiled with the correct option, which it should be on Linux (you can check with python -m sysconfig | grep HAVE_PERF_TRAMPOLINE). You also need to enable the support at runtime by either passing the option -X perf or setting the environment variable PYTHONPERFSUPPORT to a non-zero value.

To try this out, I simply used the sample code from the Python documentation:

perftest.py
def foo(n):
    result = 0
    for _ in range(n):
        result += 1
    return result

def bar(n):
    foo(n)

def baz(n):
    bar(n)

if __name__ == "__main__":
    baz(1000000)

I also had to compile a special version of the Python interpreter, tweaking the compilation options to pass -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer to get things to work properly. This is because the dynamic code that’s inserted doesn’t have access to DWARF debug information, so it needs the frame pointers to be intact to follow the call stack. The frame pointer is often omitted because it frees up an additional register which can improve performance, particularly on x86 architectures.

Because I use pyenv to manage multiple Python versions on my machine, I therefore installed a version of 3.12 with these compile flags set like this:

PYTHON_CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer" pyenv install 3.12

All this done, it was time to give it a try. Since the output is so verbose I’m not going to do a comparison with/without the support enabled at runtime, I’ll just show you a snippet of the output with support enabled to show you what’s available.

$ perf record -F 9999 -g -o perf.data /home/andy/.pyenv/versions/3.12.2/bin/python -X perf perftest.py
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.154 MB perf.data (1015 samples) ]

Here you can see I’m passing the specific Python binary to the call—I didn’t want to take any chances that the pyenv shim system would mess things up. I’m passing -F 9999 to sample at 9999Hz, -g to enable call graph recording, and -o perf.data to specify the output filename where the data will be stored.

Then to see the results I did this:

(Apologies to anyone on a mobile device, I couldn’t really reformat this without making it totally unreadable—scroll to the right!)

$ perf report --stdio -n -g
# Samples: 1K of event 'cpu-clock:pppH'
# Event count (approx.): 102010200
#
# Children      Self       Samples  Command  Shared Object         Symbol
# ........  ........  ............  .......  ....................  ..............................................................
#
    92.94%     0.00%             0  python   libc.so.6             [.] 0x00007f2699adfd90
            |
            ---0x7f2699adfd90
               Py_BytesMain
               |
               |--78.43%--Py_RunMain
               |          |
               |          |--75.59%--pymain_run_python.constprop.0
               |          |          |
               |          |           --75.29%--_PyRun_AnyFileObject
               |          |                     _PyRun_SimpleFileObject
               |          |                     |
               |          |                      --74.90%--run_mod
               |          |                                |
               |          |                                 --74.71%--run_eval_code_obj
               |          |                                           PyEval_EvalCode
               |          |                                           py::<module>:/tmp/perftest.py
               |          |                                           _PyEval_EvalFrameDefault
               |          |                                           PyObject_Vectorcall
               |          |                                           |
               |          |                                            --74.61%--py::baz:/tmp/perftest.py
               |          |                                                      _PyEval_EvalFrameDefault
               |          |                                                      PyObject_Vectorcall
               |          |                                                      py::bar:/tmp/perftest.py
               |          |                                                      _PyEval_EvalFrameDefault
               |          |                                                      PyObject_Vectorcall
               |          |                                                      py::foo:/tmp/perftest.py
               |          |                                                      |
               |          |                                                      |--67.06%--_PyEval_EvalFrameDefault
               |          |                                                      |          |
               |          |                                                      |          |--16.18%--_PyObject_Free
               |          |                                                      |          |
               |          |                                                      |          |--14.41%--_PyLong_Add
               |          |                                                      |          |          |
               |          |                                                      |          |          |--6.37%--_PyObject_Malloc
               |          |                                                      |          |          |
               |          |                                                      |          |          |--2.16%--__tls_get_addr
               |          |                                                      |          |          |
               |          |                                                      |          |           --0.78%--_Py_NewReference

You can see the percentage of cumulative time spent in each of those functions, and if you see what’s buried across to the right there, you can see this part:

|
 --74.61%--py::baz:/tmp/perftest.py
           _PyEval_EvalFrameDefault
           PyObject_Vectorcall
           py::bar:/tmp/perftest.py
           _PyEval_EvalFrameDefault
           PyObject_Vectorcall
           py::foo:/tmp/perftest.py
           |
           |--67.06%-- ...
...

Those py::baz:/tmp/perftest.py entries are from the dynamic code that the support has inserted—you can see they show you the function name and the Python source file, which should make it quite possible to do some fairly effective profiling.

I’ve only scratched the surface of the options available here, but even just explaining this much has got quite involved, so I’ll leave it there. That said, the documentation all seems to be pretty good if you want to take this further, and it looks like a great way to do some non-invasive profiling of your Python code.

Assignments in Comprehensions

This is a real niche one, so feel free to skip to the next section if it gets too esoteric for you. I know I cover some niche features, but this one is pushing the envelope, in my view, and I’m entirely unconvinced it has any particularly practical use-cases which wouldn’t be better written another way.

You’ve decided to brave the esotericism? Great, let’s dive in—but first I need to add some background.

When PEP 572 introduced the walrus operator, it placed some restrictions on its use within comprehensions. In particular, although it’s valid to assign to non-local values within a comprehension (i.e. those defined outside the comprehension itself), it’s not valid to assign to the target variables of the comprehension itself.

So this is valid:

>>> total = 0
>>> [total := total + i for i in range(5)]
[0, 1, 3, 6, 10]

… but this is not:

>>> [i := i + 1 for i in range(5)]
  File "<stdin>", line 1
SyntaxError: assignment expression cannot rebind comprehension iteration variable 'i'

These examples would yield the same results in Python 3.12, but there is a difference in a more subtle case. To understand what this difference is, first let’s look again at that case where the targets aren’t local, i.e. aren’t declared within the comprehension. This is something I suspect a lot of people never considered about comprehensions, so let’s cement it with a more complex example.

>>> some_dicts = [{} for i in range(5)]
>>> max(i for i, some_dicts[i]["id"] in ((j, j) for j, _ in enumerate(some_dicts)))
4
>>> some_dicts
[{'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]

You might need a moment to unpick what’s going on here. To help, let’s pull it into two halves—first, there’s a nested generator expression on the right-hand side of it:

((j, j) for j, _ in enumerate(some_dicts))

… which yields the values (0, 0), (1, 1), (2, 2) etc. up to the length of some_dicts. This repetition may seem pointless, but you’ll see why it’s there in a moment. Then the outer generator expression consumes this:

(i for i, some_dicts[i]["id"] in (...))

The esoteric aspect here is that we’re using some_dicts[i]["id"] as one of the targets for the expression. So we iterate through the values yielded by that nested generator expression mentioned above, and for each one we assign the 2-tuple to (i, some_dicts[i]["id"]). Fortunately for us, this assignment is performed in a left-to-right fashion, and the net result is the following sequence of assignments:

i, some_dicts[i]["id"] = (0, 0)
i, some_dicts[i]["id"] = (1, 1)
...

If you saw this written out, it might look a bit odd but it makes sense. But you may find the fact that you can do this in a comprehension rather surprising—I know I did, when I first came across it.

Right, so that’s the context—what’s changed in Python 3.12? Well, consider this rather odd sequence in Python 3.11.

>>> index = 0
>>> values = ["A", "B", "C", "D", "E", "F"]
>>> [(index := i) for i, values[index] in enumerate(values)]
  File "<stdin>", line 1
SyntaxError: assignment expression cannot rebind comprehension iteration variable 'index'

Now technically the assignment to index should work, because index isn’t an iteration variable bound by the comprehension itself—it’s only read from within the target expression. This is exactly what’s changed in Python 3.12, where this is now valid:

>>> index = 0
>>> values = ["A", "B", "C", "D", "E", "F"]
>>> [(index := i) for i, values[index] in enumerate(values)]
[0, 1, 2, 3, 4, 5]
>>> values
['B', 'C', 'D', 'E', 'F', 'F']
>>> index
5

The key to understanding what’s going on here is that each assignment to values[index] uses the value of index set by the walrus on the previous iteration. The net effect is a kind of weird shift operation, where each element is moved down one place in the list and the last element ends up duplicated.

So that’s it, a lot of context for a simple change. Look, I did warn you it was esoteric! Let’s move on to the standard library changes, which are always useful.

Data Types — Calendar

An easy but convenient one to kick off: the calendar module now offers two new enumerations, Day and Month. These represent, respectively, days of the week and months of the year. The calendar module already had constants for the days of the week, and these are still available—similarly, the months of the year are also available as module-level constants, which are the same as the enumeration values.

>>> import calendar
>>> calendar.Month.MARCH
calendar.MARCH
>>> calendar.Month.MARCH.value
3
>>> calendar.Month.MARCH.name
'MARCH'
>>> calendar.month_name[calendar.Month.MARCH]
'March'
>>> calendar.Day.WEDNESDAY
calendar.WEDNESDAY
>>> calendar.Day.WEDNESDAY.value
2
>>> calendar.Day.WEDNESDAY.name
'WEDNESDAY'
>>> calendar.day_name[calendar.Day.WEDNESDAY]
'Wednesday'

Numeric and Mathematical Modules

fractions

Objects of type fractions.Fraction have had a __format__() method added which supports almost exactly the same semantics as float formatting. It’s worth noting that the implementation doesn’t simply convert to float, though; it uses the full precision of the Fraction value.

>>> import fractions
>>> format(fractions.Fraction(355,113), "03.32f")
'3.14159292035398230088495575221239'
>>> format(fractions.Fraction(7, 2401), ".3e")
'2.915e-03'

math

There’s a new sumprod() function which calculates the dot product of two equal-length iterables. If the iterables are of different lengths, ValueError is raised. This is useful in more cases than you might imagine, such as totalling the price on an invoice when given a list of prices and a list of quantities, or computing a weighted average from a list of values and a list of weights. The implementation also takes some pains to maintain as much accuracy as possible when dealing with floating point values.

>>> import math
>>> prices = [399, 499, 1795, 850]
>>> quantities = [5, 2, 1, 3]
>>> math.sumprod(prices, quantities)
7338
>>> math.sumprod(prices[:-1], quantities)
ValueError: Inputs are not the same length
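
The weighted average case mentioned above is just as much of a one-liner. Here’s a small illustrative example with made-up scores and credit weightings:

>>> scores = [4, 3, 5]
>>> credits = [3, 4, 2]
>>> math.sumprod(scores, credits) / sum(credits)
3.7777777777777777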

The other change in math is an update to the nextafter() function, which I covered back in an article on 3.9. This function returns the next distinctly representable float value after a specified one in a specified direction. The change in 3.12 is that there’s a new steps argument that moves more than one step along the list of uniquely representable floating point values.

If you’re wondering why you’d need this, then you don’t.

>>> math.nextafter(0.001, 0.0)
0.0009999999999999998
>>> math.nextafter(0.001, 0.0, steps=1000000)
0.0009999999997831596

random

In the random module this release we have a new binomialvariate() function to return values modelled with a binomial distribution. If you’re a little rusty on your maths classes, this represents the number of successes out of a specified number of independent trials, where the probability of success in each trial is the same specified value.

Surprisingly, I believe this is the first discrete probability distribution in the random module. It’s a bit of a modest start, since the documentation describes it as being mathematically equivalent to something very simple:

import random

def binomialvariate(n=1, p=0.5):
    return sum(random.random() < p for _ in range(n))
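
Usage is straightforward. For example, simulating the number of heads in 100 tosses of a fair coin (the results below are just one possible outcome, of course):

>>> import random
>>> random.binomialvariate(n=100, p=0.5)
53
>>> random.binomialvariate(n=100, p=0.5)
47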

statistics

We looked at the addition of statistics.correlation() back in 3.10, which calculates the Pearson correlation coefficient. This is great when applicable, but it has some limitations—in particular it only reflects linear correlations, and it’s not suitable for ordinal data.

In 3.12, however, your ordinal correlating prayers are answered by the addition of a method keyword parameter which defaults to "linear" for a Pearson analysis, but can also be set to "ranked" to instead calculate the Spearman’s rank correlation coefficient.

This takes me back to the early ’90s, when I learned about Spearman’s rank in Geography lessons at school. It’s a really useful generalised way to detect correlation between two data sets without knowing what type of correlation exists. What it tries to show is whether there is any monotonic function that relates the two data sets, and it does this by calculating the correlation of the ranks of the data sets instead of their values.

The example below shows that checking for a linear correlation between exponentially related values offers poor results, but Spearman’s rank correctly shows a perfect positive correlation.

>>> import statistics
>>> x = list(range(100))
>>> y = [2**i for i in x]
>>> statistics.correlation(x, y)
0.2954805084486747
>>> statistics.correlation(x, y, method="ranked")
1.0

Functional Programming Modules — itertools

The itertools module has grown another handy utility function called batched(), which simply accumulates values from an iterable into tuples of a specified length, which can then be processed collectively. All the tuples will be the requested length except, potentially, the last one, if it’s mopping up a few remaining values.

This could be useful for, say, allocating chunks of small work items to worker threads for processing, where assigning individual items would incur too much overhead. However, the implementation is simple and potentially not suitable for realtime generation of items which might block for a long time—in such cases you might want some sort of timeout to generate a partial batch early, which isn’t offered here.

But the simplicity is fine in many cases, and it’s certainly another useful little tool in the already crowded toolbox of the itertools module.

>>> import itertools
>>> groups = itertools.batched(range(50), 8)
>>> print("\n".join(repr(i) for i in groups))
(0, 1, 2, 3, 4, 5, 6, 7)
(8, 9, 10, 11, 12, 13, 14, 15)
(16, 17, 18, 19, 20, 21, 22, 23)
(24, 25, 26, 27, 28, 29, 30, 31)
(32, 33, 34, 35, 36, 37, 38, 39)
(40, 41, 42, 43, 44, 45, 46, 47)
(48, 49)
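
To illustrate the worker-pool use case mentioned above, here’s a rough sketch which farms batches of items out to a thread pool; process_batch() is just a stand-in for whatever real per-batch work you’d be doing:

import itertools
from concurrent.futures import ThreadPoolExecutor

def process_batch(batch):
    # Stand-in for some real per-batch work
    return sum(batch)

work_items = range(1000)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_batch, itertools.batched(work_items, 50)))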

File and Directory Access

os.path

A couple of new functions in os.path, probably mostly of interest to Windows users. Firstly, there’s a new isjunction() function which returns True if the specified path refers to a junction point. In case you’re unaware, this is a feature of the Windows NTFS filesystem which is somewhat similar to a symbolic link to a directory. On platforms that don’t support junctions, you don’t get any exception; the function simply always returns False.

The second new function is splitroot(), which works a bit like splitdrive() but splits the path into three components: the drive specification, the root directory and the remainder of the path.

On Posix systems, the drive portion will always be empty and the root will typically be "/", although double-slashes are preserved as these can trigger implementation-specific behaviour, as per the POSIX specification. For example, they might trigger a Samba lookup on Linux.

Linux
>>> import os.path
>>> os.path.splitroot("/usr/share/dict/words")
('', '/', 'usr/share/dict/words')
>>> os.path.splitroot("//usr/share/dict/words")
('', '//', 'usr/share/dict/words')

On Windows, the drive portion can be a local drive such as "C:", as you’d expect, but can also be the name of a network share.

Windows
>>> import os.path
>>> os.path.splitroot("C:/Users/andy/Downloads/python-3.12.0-amd64.exe")
('C:', '/', 'Users/andy/Downloads/python-3.12.0-amd64.exe')
>>> os.path.splitroot("//HomeNAS/Backup/Hosts/Penfold")
('//HomeNAS/Backup', '/', 'Hosts/Penfold')

pathlib

The pathlib module has seen a few improvements:

  • Ability to subclass all the path types
  • A new Path.walk() method as a more convenient os.walk()
  • Handling of non-child directories in PurePath.relative_to()
  • Case-insensitive globbing functions

One that I won’t cover in detail is a new Path.is_junction() method, which just returns the result of os.path.isjunction() described above. The remaining changes are discussed in the subsections below.

Subclassing PurePath and Friends

It’s now possible to subclass PurePath, Path, and the Posix and Windows variants. There’s also a new PurePath.with_segments() method which is now used whenever a modified version of the path is created, such as by parent or relative_to(). If you create a derived class which maintains its own state, you can override this method to pass that state along to the modified instances.

>>> import pathlib
>>>
>>> class MyPath(pathlib.Path):
...     def __init__(self, *path_segments, some_context):
...         super().__init__(*path_segments)
...         self.some_context = some_context
...     def with_segments(self, *path_segments):
...         return type(self)(*path_segments, some_context=self.some_context)
...
>>>
>>> x = MyPath("/Users/andy/project/src/sourcefile.py", some_context=123)
>>> y = x.parent / "anotherfile.py"
>>> y
MyPath('/Users/andy/project/src/anotherfile.py')
>>> y.some_context
123

New Path.walk() Method

For convenience, Path now provides a walk() method, so you don’t have the slight annoyance of using os.walk() with the usual os.path.join(root, filename) dance to form a full path from its results. The same parameters are accepted, except that some have slightly different names (e.g. follow_symlinks instead of the followlinks which os.walk() accepts).

The only slightly surprising thing to me was that the yielded type from Path.walk() is tuple[Path, list[str], list[str]], whereas I would have expected the second and third items to be list[Path]. This is more consistent with the original os.walk(), I suppose, and it’s arguably more efficient—it doesn’t carry the overhead of constructing a Path for items which the developer may not need to use. Because the root directory is a pathlib.Path, however, at least you can easily use the / operator to append the entries to the root directory to form a full path.

>>> print("\n".join(repr(i) for i in pathlib.Path("/etc/ssh").walk()))
(PosixPath('/etc/ssh'), ['ssh_config.d', 'sshd_config.d'], ['sshd_config',
'ssh_config', 'moduli'])
(PosixPath('/etc/ssh/ssh_config.d'), [], [])
(PosixPath('/etc/ssh/sshd_config.d'), [], ['100-macos.conf'])
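
As an aside, because the first item is a Path, joining things up with the / operator to get full paths looks something like this (using the same /etc/ssh tree as above):

>>> for dirpath, dirnames, filenames in pathlib.Path("/etc/ssh").walk():
...     for name in filenames:
...         print(dirpath / name)
...
/etc/ssh/sshd_config
/etc/ssh/ssh_config
/etc/ssh/moduli
/etc/ssh/sshd_config.d/100-macos.conf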

Sibling Directories Handled in PurePath.relative_to()

The relative_to() method on PurePath is a handy way of working out the relative path within a specified root directory.

>>> from pathlib import PurePath
>>> PurePath("/aaa/bbb/ccc/ddd/eee").relative_to("/aaa/bbb")
PurePosixPath('ccc/ddd/eee')

However, if you try to use it with a path which isn’t a direct child of the one specified, you’ll get an exception.

>>> PurePath("/aaa/bbb/ccc").relative_to("/aaa/ddd/eee")
ValueError: '/aaa/bbb/ccc' is not in the subpath of '/aaa/ddd/eee'

As of 3.12, however, this is no longer the case if you specify the new walk_up parameter as True, which will give you the relative path even between unrelated directories.

>>> PurePath("/aaa/bbb/ccc").relative_to("/aaa/ddd/eee", walk_up=True)
PurePosixPath('../../bbb/ccc')

Case-Insensitive Globbing

When using filename globbing, the rules about the case-sensitivity are platform-dependent—POSIX systems are typically case-sensitive when matching filenames, Windows systems are typically case-insensitive.

As of 3.12, however, the globbing methods of Path (namely glob(), rglob(), and match()) have a new case_sensitive parameter to allow you to specify this. It defaults to None to engage the default platform behaviour, but you can set it to True or False to override this.

>>> from pathlib import Path
>>> from pprint import pprint
>>>
>>> p = Path("pelican-plugins")
>>>
>>> pprint(list(p.rglob("readme*")))
[PosixPath('pelican-plugins/post_stats/readme.rst'),
 PosixPath('pelican-plugins/interlinks/readme.md'),
 PosixPath('pelican-plugins/glossary/readme.md'),
 PosixPath('pelican-plugins/css-html-js-minify/readme.rst')]
>>>
>>> pprint(list(p.rglob("readme*", case_sensitive=False)))
[PosixPath('pelican-plugins/Readme.rst'),
 PosixPath('pelican-plugins/representative_image/Readme.md'),
 PosixPath('pelican-plugins/series/Readme.md'),
 PosixPath('pelican-plugins/plantuml/Readme.rst'),
 PosixPath('pelican-plugins/gzip_cache/Readme.rst'),
 PosixPath('pelican-plugins/simple_footnotes/README.md'),
...

shutil

The shutil module has seen some useful changes in this release:

  • Avoid need for working directory changes in make_archive()
  • Better error handling in rmtree()
  • Windows-specific enhancement to which()

There are also some changes to shutil.unpack_archive() but those are discussed in the section on tarfile below. The remaining shutil changes are covered in more detail in the following subsections.

Root Directory Passed to make_archive()

Prior to 3.12, the implementation of shutil.make_archive() used os.chdir() when using custom archivers, registered with register_archive_format(), changing the current working directory prior to calling them. This is generally a poor choice, as it makes functions like this inherently unsafe for concurrent execution—not just multithreaded situations, but any case where other code might be interspersed, such as async functions.

In 3.10.6 the need for this was removed for standard zip and tar archives, and as of 3.12 it’s now possible for custom archivers to support a root_dir argument which avoids this need. To declare this support, you attach the attribute supports_root_dir to the function itself with a value of True. If you do this, the change of working directory is skipped and instead your archiver receives an additional root_dir keyword parameter which it is expected to respect.

If you don’t add this attribute to custom archivers, the change of working directory will still happen, for backwards compatibility, and your archiving process will not be concurrency safe.

>>> import os, pprint, shutil
>>>
>>> def dummy_archiver(base_name, base_dir, **kwargs):
...     print(f"base_name={base_name} base_dir={base_dir} cwd={os.getcwd()}")
...     pprint.pprint(kwargs)
...
>>> shutil.register_archive_format("dummy", dummy_archiver)
>>> os.chdir("/")
>>>
>>> shutil.make_archive("sample", "dummy", root_dir="/Users/andy")
base_name=/sample base_dir=. cwd=/Users/andy
{'dry_run': 0, 'group': None, 'logger': None, 'owner': None}
>>>
>>> dummy_archiver.supports_root_dir = True
>>> shutil.make_archive("sample", "dummy", root_dir="/Users/andy")
base_name=sample base_dir=. cwd=/
{'dry_run': 0,
 'group': None,
 'logger': None,
 'owner': None,
 'root_dir': '/Users/andy'}

Error Handling in rmtree()

The handy shutil.rmtree() function deletes a directory and all its contents recursively, similar to the rm -r command on Unix and the rd /s command on Windows.

It has long accepted an onerror parameter which is a callback to invoke on errors—as parameters it receives a reference to the function that caused the issue, the path of the file that it was processing at the time, and the 3-tuple that sys.exc_info() returns, containing information about the exception that’s triggered the callback.

The 3-tuple isn’t usually particularly helpful, however, and as of 3.12 you can specify a callback using the onexc parameter instead, and this receives simply the exception itself as the third argument. The previous onerror should be considered deprecated and will probably be removed in some future release.

>>> import shutil
>>>
>>> def handler(function, path, exc):
...     print(f"function={function.__name__} path={path} exc={exc!s}")
...
>>> shutil.rmtree("/tmp/foo", onexc=handler)
function=open path=/tmp/foo/dir_two exc=[Errno 13] Permission denied: 'dir_two'
function=rmdir path=/tmp/foo exc=[Errno 66] Directory not empty: '/tmp/foo'

Enhancements to which() on Windows

There have been a few improvements to the which() function on the Windows platform—this searches the current PATH environment variable for a specified command, and returns the first match that would be executed by a shell if that command were run.

The improvements on Windows are all intended to more closely match the behaviour of the Windows where.exe utility, and the specific changes are:

  • The PATHEXT variable, which specifies all the file extensions which should be considered as executables, is now respected even if you pass the full path of a command to which(). For example, if you pass "C:/windows/system32/cmd" then it will now find C:/windows/system32/cmd.exe, whereas previously it would not.
  • To determine whether the current directory should be automatically prepended to the search path, which() now calls NeedCurrentDirectoryForExePathW() on Windows.
  • If a match is found earlier in the search path by adding a component from PATHEXT, this will now be returned in preference to an exact match, not using anything from PATHEXT, from later in the search path.
  • If an exact match is found against a command that has no extension, or an extension that’s not in PATHEXT then it will now be returned—previously this would require a ‘.’ entry to be added to PATHEXT.

tempfile

A couple of small changes in the tempfile module.

Firstly, there’s always been some fiddliness when using NamedTemporaryFile on Windows. This is because for the file to be opened by name by another reader, the original handle must first be closed. Unfortunately this renders the delete option, which deletes the file on close(), essentially useless on Windows, or in cross-platform code which must support Windows.

There’s been a long running issue on this since 2012, but it finally has some resolution as there’s now a delete_on_close parameter which enables some behaviour that’s useful cross-platform. If you specify delete=True then delete_on_close=True (the default) maintains the old behaviour where the file is immediately deleted on close(). If you specify delete_on_close=False, however, the file will not be deleted on close(), but only on the exit of the associated context manager or, failing that, on __del__() of the object. Of course, the usual caveats and concerns with relying on __del__() apply here, so always use a context manager.
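
To make that concrete, here’s a small sketch of the cross-platform pattern this enables: write the file, close it so another reader can open it by name, and rely on the context manager to clean up.

import tempfile

with tempfile.NamedTemporaryFile(mode="w", delete=True, delete_on_close=False) as tmp:
    tmp.write("some data\n")
    tmp.close()                      # not deleted yet, because delete_on_close=False
    with open(tmp.name) as reader:   # safe to reopen by name, even on Windows
        print(reader.read())
# the file is deleted here, when the context manager exits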

The other small change in this release is that tempfile.mkdtemp() now always returns an absolute path, even if the dir parameter provided was a relative path. This is handy, as keeping your paths separate from your working directory is generally a good idea.

Data Persistence — sqlite3

There are a few handy changes to sqlite3 in this release:

  • A new command-line interface
  • New autocommit parameter on connection
  • Query and set connection configuration options

Command-Line Interface

This is a simple one: running python -m sqlite3 drops you into an interactive SQLite shell. This is convenient when the native SQLite command-line utility may not always be installed.

You can specify an SQLite database filename as a parameter, or you can omit this and a transient in-memory database will be used which persists only for that particular session—this is equivalent to specifying :memory: as the filename in code. There’s also a handy -v option to show the version of the underlying SQLite library in use.

$ python -m sqlite3 -v
SQLite version 3.43.2
$ python -m sqlite3
sqlite3 shell, running on SQLite version 3.43.2
Connected to a transient in-memory database

Each command will be run using execute() on the cursor.
Type ".help" for more information; type ".quit" or CTRL-D to quit.
sqlite> CREATE TABLE foo (id INT, name VARCHAR);
sqlite> INSERT INTO foo VALUES (1, "Andy");
sqlite> INSERT INTO foo VALUES (2, "Mike");
sqlite> INSERT INTO foo VALUES (3, "Karen");
sqlite> SELECT * FROM foo;
(1, 'Andy')
(2, 'Mike')
(3, 'Karen')

Autocommit Control

The behaviour of the sqlite3 module has always been somewhat inconsistent with both the requirements of PEP 249 and the standard SQLite auto-commit behaviour.

Prior to 3.12, on connection you could specify an isolation_level parameter which could be set to:

  • None to never implicitly open a transaction—SQLite is in auto-commit mode, but applications can also manually open their own transactions.
  • "DEFERRED" (the default) to start an implicit deferred transaction on any execute().
  • "IMMEDIATE" to start an implcit immediate transaction on any execute().
  • "EXCLUSIVE" to start an implcit excluseive transaction on any execute().

You can see the differences between these transaction types in the SQLite documentation.

The issue here is that this is inconsistent, because whilst it starts an implicit transaction before data manipulation (DML) statements like INSERT, UPDATE and DELETE, it does not start a transaction before data query (DQL) statements like SELECT, nor does it start one before data definition (DDL) statements like CREATE and DROP. This is not particularly intuitive.

In 3.12, therefore, there’s a new autocommit parameter to the connect() method, which can take one of three values:

  • False to enable PEP 249-compliant transaction control—this is the suggested value going forward.
  • True to enable full SQLite autocommit mode with implicit transactions. In this mode SQLite itself handles implicit transactions, rather than the sqlite3 module, and the commit() and rollback() methods do nothing.
  • sqlite3.LEGACY_TRANSACTION_CONTROL to maintain the pre-3.12 behaviour. This is the default, at least for now, and isolation_level is only respected if this value is provided.

Note that with autocommit=False, sqlite3 still ensures that a transaction is always open, so immediately after connection, and after every subsequent commit() or rollback(), a new transaction will be opened. Deferred mode is always used for these transactions. However, explicit calls to commit() and rollback() are required to end these transactions. If the database connection is closed with pending changes, an implicit rollback() is performed.

Well, that felt more complicated to explain than it should have been. My advice: adopt autocommit=False in your code, and be rigorous about always using commit() and rollback(). Based on my experience, this is the best way to write database code—not being explicit about transactions has always been a recipe for confusion and bugs for me.
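
As a brief sketch of what that advice looks like in practice (the table and values here are invented purely for illustration):

import sqlite3

conn = sqlite3.connect(":memory:", autocommit=False)   # PEP 249-style transaction control
cur = conn.cursor()
cur.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO foo VALUES (1, 'Andy')")
conn.commit()       # the CREATE and INSERT only take effect at this point
cur.execute("INSERT INTO foo VALUES (2, 'Mike')")
conn.rollback()     # discards the second insert
conn.close()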

Connection Configuration Options

The SQLite engine provides a slew of low-level options that you can set on a per-connection basis. Some examples include:

SQLITE_DBCONFIG_DEFENSIVE
The “defensive” flag disables some SQL which can corrupt databases, such as modifying some pragmas related to making the schema writable and changing the journalling mode, and direct writes to implementation details of the engine.
SQLITE_DBCONFIG_WRITEABLE_SCHEMA
This controls whether updates to the sqlite_schema system table are permitted. Normally you should have this turned off.
SQLITE_DBCONFIG_ENABLE_FKEY
Enables or disables enforcement of foreign key constraints.

If you’re interested you can see the full list supported by the engine.

The change in Python 3.12 is that there are now setconfig() and getconfig() methods on an sqlite3.Connection object to set and get the subset of these flags which are boolean. As an example, the code below executes a statement that violates a foreign key constraint, first with SQLITE_DBCONFIG_ENABLE_FKEY set to True, then False.

>>> import sqlite3
>>> conn = sqlite3.Connection(":memory:")
>>> cur = conn.cursor()
>>> cur.execute("""
... CREATE TABLE year_group (
...     id integer PRIMARY KEY,
...     name text NOT NULL
... )""")
<sqlite3.Cursor object at 0x1002e7f40>
>>> cur.execute("""
... CREATE TABLE students (
...     id integer PRIMARY KEY,
...     username text NOT NULL,
...     year_group_id integer NOT NULL,
...     FOREIGN KEY (year_group_id)
...         REFERENCES year_group (id)
... )""")
<sqlite3.Cursor object at 0x1002e7f40>
>>>
>>> conn.setconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FKEY, True)
>>> cur.execute("INSERT INTO students VALUES (1, 'andy', 123)")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.IntegrityError: FOREIGN KEY constraint failed
>>>
>>> conn.setconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FKEY, False)
>>> cur.execute("INSERT INTO students VALUES (1, 'andy', 123)")
<sqlite3.Cursor object at 0x1002e7f40>

Data Compression and Archiving — tarfile

There are some handy changes to tarfile as outlined by PEP 706—as well as tarfile they also impact shutil.unpack_archive(), but since the changes are the same I’m just discussing them here.

These changes are motivated by the fact that the tar utility is extremely powerful and flexible, as it’s intended for performing system-level backups of Unix machines. This means that it’s able to faithfully reproduce permissions, symbolic links and other special files such as devices. This is sometimes useful, but can also be dangerous as it’s quite easy for a maliciously modified archive to sneak in things such as a symbolic link to /etc/passwd, or use relative paths to escape from the directory into which files are being extracted.

The main change here is to add extraction filters, which are functions that are invoked on each file just prior to extracting it. The function is passed a TarInfo object which describes the file, and the filter can decide to:

  • Return the TarInfo it is passed to extract the file as specified.
  • Modify the TarInfo, or return a new one, to be used instead of the one in the archive. This PEP adds a replace() method to TarInfo to allow attributes to be modified.
  • Return None to block this file being extracted entirely.
  • Raise an exception, generally the new FilterError or ExtractError, to abort the operation or skip the file depending on the value of the TarFile.errorlevel setting.

This callable can be passed using the filter parameter to TarFile.extract() or TarFile.extractall().

In addition to passing your own filter function, you can pass a string specifying one of a set of builtin filter functions:

fully_trusted
Extracts the files as specified, with all features enabled.
tar
Honours most features, but limits some of the most dangerous ones. This filtering is performed with the tarfile.tar_filter() function described in a moment.
data
Ignores or blocks most features specific to Unix filesystems—this is intended for cross-platform data archiving usage, and is performed with the tarfile.data_filter() function.
None
If you pass None, or don’t specify the filter parameter at all, the default filter is used. This can be set with TarFile.extraction_filter.

It’s worth being aware of the default behaviour in 3.12, which is that TarFile.extraction_filter defaults to None. If you don’t specify a filter during extraction, and you don’t override this default for extraction_filter, then you’ll get a DeprecationWarning telling you that the default filter will change in Python 3.14, and then the extraction will proceed as if you’d specified fully_trusted as the filter. As of Python 3.14, the plan is instead that data will be the default filter—this is safer, but may break existing code which is relying on the more dangerous tar features. To fix such broken code, however, you simply need to set a filter.
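
To give a flavour of how this looks in code, here’s a hedged sketch; example.tar.gz is a made-up filename, and the custom filter just shows the shape of the API by skipping temporary files and delegating everything else to the builtin data filter:

import tarfile

def drop_temp_files(member, dest_path):
    if member.name.endswith(".tmp"):
        return None                               # returning None skips the member
    return tarfile.data_filter(member, dest_path) # apply the standard 'data' checks

with tarfile.open("example.tar.gz") as archive:
    archive.extractall("output_dir", filter="data")   # a builtin filter, by name
    # ...or pass your own callable:
    # archive.extractall("output_dir", filter=drop_temp_files)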

Finally, to add some colour around the builtin filters, the tar_filter() function, unsurprisingly used to implement the tar filter, performs the following:

  • Strip leading slashes from filenames.
  • Refuse to extract files with absolute filenames (raises AbsolutePathError).
  • Refuse to extract files whose fully resolved absolute path (including symlinks) would be outside the destination root directory (raises OutsideDestinationError).
  • Clear high mode bits (setuid, setgid and sticky) on all extracted files.

The data_filter() function does everything that tar_filter() does, but additionally:

  • Refuse to extract hard or symbolic links that point to absolute paths, or those whose fully resolved absolute destination path would be outside the destination root directory (raises AbsoluteLinkError or LinkOutsideDestinationError).
  • Refuse to extract device files and named pipes (raises SpecialFileError).
  • For regular files:
    • Ensure owner read and write permissions are set.
    • Remove group and other execute permission bits unless the owner also has execute permission set.
  • For directories, don’t set the mode at all, so the default umask is used.
  • Don’t set user or group, so the defaults for the executing process are used.

Note that all the raised exceptions mentioned above are subclasses of the new FilterError.

Generic Operating System Services — os

There are some smallish changes in the os module which I’ll run through very briefly.

Non-blocking os.pidfd_open()
On Linux, the pidfd_open() function opens a file descriptor on another process, which can be used for monitoring and managing it. As of 3.12, there’s a new os.PIDFD_NONBLOCK flag which opens this descriptor in non-blocking mode.
os.DirEntry.is_junction()
This method is available as a convenience for calling os.path.isjunction(), described earlier.
Listing Drives on Windows
Python on Windows (only) now provides os.listdrives(), os.listvolumes() and os.listmounts() for enumerating drives, volumes and mounts respectively.
File Creation Time Changes on Windows
On Windows (only) there are some changes to the stat_result returned by os.stat() and similar functions. The st_ctime field is deprecated on Windows, and st_birthtime should be used instead—this frees up st_ctime to show the most recent metadata change instead, as it does on other platforms.

Networking and Interprocess Communication — asyncio

There’s quite a cluster of changes to asyncio, namely:

  • Performance of writing to sockets has improved
  • Support for eager task construction, which can improve performance
  • Platform-specific improvements to performance of monitoring child processes
  • Custom event loop factories
  • Removal of legacy support for generator coroutines
  • Can now pass task-yielding generators to asyncio.wait() and asyncio.as_completed()
  • Implementation of asyncio.current_task() is now in C for performance

I’ll run through each of these in the subsections below, aside from the last couple which are fairly self-explanatory.

Performance of Socket Writing

Prior to 3.12, the transport code in asyncio for sockets would create a bytearray buffer into which it would copy any data which was queued for transmission. This data copying can add significant overhead when sending larger amounts of data, so in 3.12 a zero-copy approach has been implemented.

Under the hood this creates a collections.deque of individual buffers, using memoryview and the buffer protocol to save unsent portions of buffers without the need for copying. Python’s use of immutable bytes objects and reference counting makes this all quite safe and performant.

In addition to zero copy, the code also now uses the sendmsg() call where available, which gathers data from disparate buffers in memory. This also eliminates the need to either copy these buffers into a single buffer, or to incur the overhead of multiple system calls, one per buffer.

Eager Task Construction

Based on a request from developers at Instagram, there is now support for “eager” tasks, where execution of the new task starts synchronously during Task construction, rather than being scheduled on the event loop and run on the next context switch. This can be a particular performance boost for tasks which might be able to use caching to avoid performing any actual blocking I/O, as they may be able to avoid the overhead of being scheduled on the event loop at all.

To use this, you can register asyncio.eager_task_factory against your event loop using the set_task_factory() method. There’s also create_eager_task_factory() to create a version of such a factory which uses your own custom task constructor instead of the default Task.
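
Here’s a minimal sketch of how that looks in practice; the caching coroutine is invented purely to show a task that can complete without ever blocking:

import asyncio

async def fetch(key, cache={"x": 42}):
    if key in cache:             # cache hit: we never need to await anything
        return cache[key]
    await asyncio.sleep(0.1)     # cache miss: pretend to do some real I/O
    return None

async def main():
    asyncio.get_running_loop().set_task_factory(asyncio.eager_task_factory)
    task = asyncio.create_task(fetch("x"))
    print(task.done())           # True -- it ran to completion inside create_task()
    print(await task)            # 42

asyncio.run(main())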

The reason that this is opt-in, rather than just being a behind-the-scenes change, is that this does actually change the semantics. For example, task execution order will probably change, and if your task raises an exception or returns without blocking then it won’t ever actually be scheduled to the event loop, which could break some existing code.

Child Process Monitoring

The asyncio module has provided multiple ways of watching child processes, added in various releases. However, this complexity shouldn’t really be exposed to the developer—instead each platform should just select the most performant option available.

As a result, all but two of the watchers are deprecated in 3.12. The two remaining are:

  • ThreadedChildWatcher starts a new thread per child process which uses os.waitpid()—since the thread remains blocked it doesn’t add CPU load, but it does add memory overhead.
  • PidfdChildWatcher uses pidfds where available (i.e. on Linux kernels 5.3+) to monitor processes without signals or threads. This is very lightweight and efficient.

Essentially, PidfdChildWatcher will be used where available, otherwise ThreadedChildWatcher will be used instead. This shouldn’t really have a great deal of impact unless you’re already using the set_child_watcher() function to choose one manually, in which case you should just strip that out and let the Python libraries do what they do best.

Custom Event Loop Factories

There has been a long-term goal for asyncio to deprecate the policy system and the child watchers system—if you want some context you can read the discussion on this issue.

The child watchers have already been deprecated as mentioned above, and in 3.12 there’s another small step towards policy-free usage which is to support a loop_factory parameter on asyncio.run(), which will be used in favour of asyncio.new_event_loop(). This allows developers to supply their own factory to configure the event loop instead of using policies, which allows people to start migrating their code before, presumably in some future release, policies become more formally deprecated.
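
For example, something like the following (the factory here is purely illustrative) avoids going anywhere near the policy system:

import asyncio

def loop_factory():
    loop = asyncio.SelectorEventLoop()
    loop.set_debug(True)            # whatever bespoke configuration you need
    return loop

async def main():
    ...

asyncio.run(main(), loop_factory=loop_factory)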

Deprecating Legacy Generator Coroutines

I’m old enough to remember the days before the async keyword, waaay back in the mists of time when Python 2.5 was released which extended generators to add the send(), throw() and close() methods to them, turning them into basic coroutines.

Fast forward all the way to Python 3.4, when asyncio was added, which provided much better facilities for managing coroutines. This allowed you to “bless” a generator into a “proper” coroutine using the @asyncio.coroutine decorator—this caused asyncio.iscoroutinefunction() to return True for them, and allowed create_task() to wrap them in a Task structure.

Then in Python 3.5, the async and await keywords were added, among other changes, and these were immediately the best and recommended way to define coroutines.

Now we’re getting on for a decade since Python 3.5 was released, and generator-based coroutines are no longer well supported and probably won’t work correctly. This is being made formal in 3.12 with an update to asyncio.iscoroutine(), which no longer returns True for generator-based coroutines.
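
The change is easy to demonstrate: on 3.12 a plain generator object is no longer mistaken for a coroutine, whereas my understanding is that earlier releases would report True here.

import asyncio
import inspect

def legacy_style():
    # A bare generator, the way coroutines looked before async/await.
    yield

gen = legacy_style()
print(asyncio.iscoroutine(gen))   # False on 3.12; True on 3.11 and earlier
print(inspect.iscoroutine(gen))   # False everywhere -- only native coroutines count here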

On the face of it this is a very small change, but symbolically it’s nice to cut off this old legacy approach for coroutines entirely. That said, I still know of teams with code still running on Python 2.7, so I don’t think we can rule out that this will break something somewhere out there!

Development Tools — unittest

There’s now a handy --durations option when running unittest on the command-line, which shows the N slowest running test cases. This is helpful for figuring out why your unit tests take ten minutes to run, so you can focus your efforts on addressing the slowest tests.

The output below is from the Pelican unit tests, which are actually pretty fast as you can see.

$ python -m unittest --durations=5
----8<----< Output skipped for brevity >----8<----
Slowest test durations
----------------------------------------------------------------------
0.219s     test_period_archives_context (pelican.tests.test_generators.TestArticlesGenerator.test_period_archives_context)
0.172s     test_custom_locale_generation_works (pelican.tests.test_pelican.TestPelican.test_custom_locale_generation_works)
0.169s     test_period_in_timeperiod_archive (pelican.tests.test_generators.TestArticlesGenerator.test_period_in_timeperiod_archive)
0.168s     test_basic_generation_works (pelican.tests.test_pelican.TestPelican.test_basic_generation_works)
0.167s     test_custom_generation_works (pelican.tests.test_pelican.TestPelican.test_custom_generation_works)

Python Runtime Services

inspect

A couple of small changes to inspect in this release.

Added markcoroutinefunction()
For reasons which are fairly muddled and seem mostly related to Django and other asynchronous web frameworks, there’s a new markcoroutinefunction() function which tags a given callable so that inspect.iscoroutinefunction() returns True for it even when it wouldn’t otherwise. You can wade through the related issue if you want to understand why this matters so much to some people. I wouldn’t bother unless you’re really interested or really bored.
Inspecting Async Generators
There’s a new function getasyncgenstate(), which is like getgeneratorstate() but for asynchronous generators. There’s also getasyncgenlocals(), which does the same as getgeneratorlocals() but for asynchronous generators. There are quick sketches of both additions after this list.
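
First, a minimal sketch of markcoroutinefunction(). The logged() wrapper is my own contrived example of a synchronous function that returns an awaitable, which is the pattern the marker exists for.

import functools
import inspect

def logged(func):
    # A plain (non-async) wrapper that returns the coroutine object created
    # by the wrapped async function, so iscoroutinefunction() would normally
    # report False for it.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return inspect.markcoroutinefunction(wrapper)

@logged
async def handler():
    return 42

print(inspect.iscoroutinefunction(handler))   # True, thanks to the marker

And here's a quick look at the async generator helpers, with ticker() as a purely illustrative generator of my own.

import asyncio
import inspect

async def ticker():
    value = 0
    while True:
        yield value
        value += 1

async def main():
    agen = ticker()
    print(inspect.getasyncgenstate(agen))    # AGEN_CREATED
    print(await anext(agen))                 # 0
    print(inspect.getasyncgenstate(agen))    # AGEN_SUSPENDED
    print(inspect.getasyncgenlocals(agen))   # {'value': 0}
    await agen.aclose()

asyncio.run(main())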

sys

Aside from the addition of sys.monitoring, which we already covered in the previous article, there are a couple of other changes in the sys module:

Added sys.last_exc For Unhandled Exception
This holds the exception instance itself, carrying the same information that sys.last_type, sys.last_value and sys.last_traceback currently hold as three separate values—the type and traceback are available from the instance via type() and its __traceback__ attribute. The three older attributes are now deprecated. There's a short interactive example after this list.
Recursion Limits
The limit set by sys.setrecursionlimit() now only applies to Python code. Builtin functions now have their own separate limit, which seems to be 1500 levels of recursion hard-coded in the interpreter. I believe this decoupling was due to the fact that Python function calls don’t consume the C stack any more, so it didn’t make sense to keep these values coupled together. In practice, if you hit either limit then you’re doing something quite wrong somewhere, in my view.
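
Here's a quick interactive session showing the new attribute. Note that sys.last_exc is only set once an unhandled exception has been reported at the top level of the interpreter, which is mostly useful for post-mortem debugging with tools like pdb.pm().

>>> 1 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>> import sys
>>> sys.last_exc
ZeroDivisionError('division by zero')
>>> sys.last_exc is sys.last_value   # the older attribute holds the same object
True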

Conclusions

So that’s it for Python 3.12. It’s been another sizeable release, with some welcome enhancements to type hints, some handy new options for interpreter monitoring and debugging, a useful simplification of the rules around f-strings, and a host of useful library changes on top of that.

Some of the longer-term efforts also seem to be making promising progress, such as the deprecation of policies within asyncio—I’ve always found these to be overly complicated. The continuing background improvement of useful libraries like pathlib and the mathematical libraries is also nice to see, as cumulatively these enhancements go a long way to making Python the convenient and concise language that it is today.

So that’s it from me on Python features for another 6 months until Python 3.13 goes final. I’ll be interested to see what else they can find to squeeze in!


  1. Yes, I’m sure I’ll regret not putting these in their own article, as they’re bound to take longer to explain than I’m anticipating as I glibly write this introduction. But there aren’t so many standard library changes in this release, so I’m hoping it all balances out. I mean, I know full well they won’t balance out and this article will end up being an unreadably huge glob of text, but I’m an eternal optimist (this is a polite way of saying that I have a pathological inability to learn from past experience). 

  2. Actually, the change I’m covering in Python 3.12 has already been back-ported to 3.11.4 and later, so if you do this test yourself you’ll need to be using 3.11.3 or earlier. 

  3. If you want more details, follow the link at the start of the paragraph for the Wikipedia page that describes things much more comprehensively. 

  4. Modern CPUs have hardware performance counters which can record events without actively running code being required, although my familiarity with CPU architectures pre-dates these sorts of innovations. Suffice to say that the sampling can these days be done quite efficiently. 

  5. Really, it’s not too late to skip to the next section… 

  6. To be honest, even if you’re not wondering then you probably still don’t. But it gives one a nice warm, fuzzy feeling to know that these niche cases are there—it gives you confidence that niche cases you do need some time might also be there when you need to go looking. 

  7. Also known as a probability mass function. 

  8. I’ve left this deliberately vague because Linux, for example, does have some NTFS support, but from my very brief reading of the Python source code on Linux the path module will always be posixpath.py, and in this module isjunction() is hard-coded to return False. This means that even if Linux does offer some way to identify junctions through the filesystem APIs, Python will not use it on Linux systems. Usual caveats: I may be missing some subtlety here, and this behaviour may change in future Python versions. 
