What’s New in Python 3.13 - Library Changes Part 1

23 Feb 2025 at 1:58PM in Software

In this series looking at features introduced by every version of Python 3, we take a look at some of the new features added in Python 3.13. In this article we look at the first half of the changes in the standard library, comprising a new exception, as well as changes to regular expression support, data types, mathematical modules, data persistence, the configparser module, file & directory access, and operating system services.

This is the 35th of the 35 articles that currently make up the “Python 3 Releases” series.


Continuing my look at what’s new in Python 3.13, I’m going to cover a small language feature which I didn’t get to yet, and then the first half of the standard library changes. In the following article in this series, I’ll complete my look at Python 3.13 by going through the remaining library changes.

In both articles I’ve stuck to my approach of grouping them as the Python library documentation does, and in this article we’ll be touching on the following sections:

  • Text Processing Services
  • Data Types
  • Numeric and Mathematical
  • File and Directory Access
  • Data Persistence
  • File Formats
  • Generic Operating System Services

We’ll kick off with a minor language change, however, which isn’t in the standard library but didn’t deserve its own article.

Mutated locals()

In the early days of Python, looking up a variable name always meant looking up a value in a special dict, which was often available as the __dict__ dunder attribute. An odd side-effect of this, which people occasionally relied upon, was that you could modify this dict to create or change variables.

>>> print(xyz)
Traceback (most recent call last):
  File "<python-input-5>", line 1, in <module>
    print(xyz)
          ^^^
NameError: name 'xyz' is not defined
>>>
>>> locals()["xyz"] = 123
>>>
>>> print(xyz)
123

Later on, however, this was changed in optimised scopes (i.e. functions, generators, coroutines, comprehensions and generator expressions) to improve performance—these were no longer simple dictionaries under the hood. To keep locals() working, it returned a dict that was created on-the-fly from the underlying variable storage. If locals() was called again in the same frame, the same dict instance was updated and returned. This behaviour varied across Python implementations, however, and so the feature became quite unreliable.

This led to some pretty weird behaviour—unless you’re familiar with the details, the output of the function below, which was run in CPython 3.12, might seem surprising.

Python 3.12
>>> def func():
...     xyz = 111
...     my_locals = locals()
...     print("[1]", my_locals)
...     xyz = 222
...     print("[2]", my_locals)
...     locals()
...     print("[3]", my_locals)
...
>>> func()
[1] {'xyz': 111}
[2] {'xyz': 111}
[3] {'xyz': 222, 'my_locals': {...}}

So, in Python 3.13 this behaviour has been clarified and standardised across all implementations, with the details as specified in PEP 667.

The new behaviour doesn’t alter the fact that mutating the result of locals() will persist in some contexts and not in others. However, the behaviour is now at least consistent in optimised scopes (i.e. functions, generators, coroutines, comprehensions and generator expressions), such that each call to locals() returns an independent snapshot of the underlying local variables.

So, the above snippet is at least somewhat more sensible when run in Python 3.13:

Python 3.13
>>> def func():
...     xyz = 111
...     my_locals = locals()
...     print("[1]", my_locals)
...     xyz = 222
...     print("[2]", my_locals)
...     locals()
...     print("[3]", my_locals)
...
>>> func()
[1] {'xyz': 111}
[2] {'xyz': 111}
[3] {'xyz': 111}

It’s worth noting that if you do want to modify the locals of a function, you can use sys._getframe() to obtain the current stack frame object and access the f_locals attribute of it. As of Python 3.13, this will be a dict-like write-through proxy and writes to it will directly update the underlying variable itself, so everything remains consistent.

Python 3.13
>>> import sys
>>> def func():
...     xyz = 111
...     print(xyz)
...     sys._getframe().f_locals["xyz"] = 222
...     print(xyz)
...     print(locals())
...
>>> func()
111
222
{'xyz': 222}

Although I feel it would be a stretch to call this behaviour wholly intuitive, it’s at least rather more predictable and comprehensible than it was previously.

New Exception: PythonFinalizationError

The first change to the library is a new exception type PythonFinalizationError, which is raised when certain forbidden operations are carried out during finalisation (i.e. interpreter shutdown). Currently these operations are starting a new thread or forking a new child process.

The code below illustrates one example of this, where the call to os.fork() raises the new exception inside the __del__() method of an object which is finalised during interpreter shutdown.

>>> import os
>>> import sys
>>> class Foo:
...     def __del__(self):
...         try:
...             os.fork()
...         except Exception as exc:
...             print("Caught:", repr(exc))
...
>>> instance = Foo()
>>> sys.exit()
Caught: PythonFinalizationError("can't fork at interpreter shutdown")

Text Processing Services

A very simple change in the re module to kick off with—the re.error exception has been renamed to PatternError for better clarity. The issue explains that the lowercase, generic-seeming name was not consistent with much of the rest of the library, which seems reasonable. It’s also an example of how even the simplest change can trigger a lot of discussion, in this case over which new name to use.

The old name will continue to work, and both names are actually just references to the same underlying concrete exception class. But I would suggest it makes sense to start renaming occurrences of this in your code as you’re making changes, as I do think the new name is a little more readable.

Data Types

Next up we have some minor enhancements to the array and types modules, but we’ll start by looking at a more significant change in the copy module to add a generalised replace() method.

copy

Various object types in Python support a replace() method, which generates a new object as a copy of the original but with specified fields modified with new values. Examples include the time, date and datetime objects from the datetime module, and the dataclasses.replace() function for classes defined with the @dataclass decorator.

As an aside, it’s important to note that this is quite different from methods such as str.replace() and bytes.replace(), which don’t modify attributes but instead return a copy of the string with occurrences of a substring replaced. This overloading of the name is unfortunate, but at this point it feels like we’re rather stuck with it—also, all the alternatives I could think of (e.g. modified_copy()) were more cumbersome.

As of Python 3.13, a new function copy.replace() has been added which is a generalised form of this operation, although it does need cooperation from the objects in question so only a limited number of object types are supported right now. The new attribute values are specified using keyword parameters, as in the example below for namedtuple.

>>> from collections import namedtuple
>>> import copy
>>>
>>> MyObject = namedtuple("MyObject", ["one", "two", "three"])
>>> instance = MyObject(111, 222, 333)
>>> instance
MyObject(one=111, two=222, three=333)
>>> new_instance = copy.replace(instance, two="XXX", three="YYY")
>>> new_instance
MyObject(one=111, two='XXX', three='YYY')
>>> instance
MyObject(one=111, two=222, three=333)

Support for this operation is added to an object by implementing the __replace__() method, which will be passed the same keyword parameters as are passed to copy.replace() and is expected to return a new instance, or raise an exception. This complements the existing __copy__() and __deepcopy__() methods called by copy.copy() and copy.deepcopy() respectively.
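
To show what opting in looks like, here’s a minimal sketch of a hypothetical class providing its own __replace__() method (the Point class and its fields are made up for illustration):

import copy

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"

    def __replace__(self, **changes):
        # Start from the current attribute values and overlay any changes.
        attrs = {"x": self.x, "y": self.y}
        attrs.update(changes)
        return type(self)(**attrs)

original = Point(1, 2)
print(copy.replace(original, y=20))   # Point(x=1, y=20)
print(original)                       # Point(x=1, y=2) -- the original is untouched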

The object types that currently support this new method in the initial release are somewhat limited:

  • collections.namedtuple
  • dataclasses.dataclass
  • datetime.datetime, datetime.date, and datetime.time
  • inspect.Signature and inspect.Parameter
  • types.SimpleNamespace
  • code objects

However, I expect that this set will grow over time, if there are other cases where this would be useful. At least now there’s some sort of consistency imposed across multiple object types.

I was a little surprised to see the last of those included, as applying this functionality to change things like co_varnames and co_nlocals of code objects seems a little obscure—perhaps there are unexpectedly common cases where this is useful of which I’m ignorant. Looking at the pull request, however, it looks like it was just an easy case where there was already a replace() method that could be used, rather than necessarily being a particularly common use-case.

array

A couple of small changes here. Firstly, the 'u' type code for Unicode characters, corresponding to C type wchar_t, has been deprecated since Python 3.3—its size is 16-bit or 32-bit depending on the platform, which makes it a source of unpredictable bugs. In Python 3.13, this has finally been replaced by the 'w' code, which is always 32-bit and corresponds to the C type Py_UCS4. The 'u' code is still supported, but now raises a DeprecationWarning and is likely to be removed in Python 3.16, or perhaps later if there is sufficient complaint. My advice is to change over as soon as you’re conveniently able to do so.
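
As a quick illustration of the new type code—this is just a sketch, and the printed values are what I’d expect on a typical CPython build:

import array

chars = array.array('w', 'naïve')   # 'w' items are fixed-width 32-bit Py_UCS4 characters
print(chars.itemsize)               # 4, regardless of platform
print(chars.tounicode())            # 'naïve'

legacy = array.array('u', 'naïve')  # still works, but triggers a DeprecationWarning in 3.13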

The other change in Python 3.13 is that the array.array type now has a clear() method, and so now properly meets the contract of the MutableSequence abstract base class in collections.abc.

types

The humble SimpleNamespace, added back in Python 3.3, has had a simple addition. It’s always been possible to initialise it with keyword parameters, but now it’s also possible to provide a single parameter for this purpose which is either a mapping, or an iterable of (name, value) 2-tuples.
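
A brief sketch of the three equivalent spellings this now allows:

from types import SimpleNamespace

# Keyword arguments (as before), a mapping, or an iterable of (name, value) pairs.
a = SimpleNamespace(one=1, two=2)
b = SimpleNamespace({"one": 1, "two": 2})
c = SimpleNamespace([("one", 1), ("two", 2)])
print(a == b == c)   # True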

Numeric and Mathematical

Moving on to the numerical modules, there are a few assorted minor goodies:

  • Better string formatting of Fraction.
  • math.fma() for fused multiply-add.
  • A CLI for random.
  • Kernel density estimation functions in statistics.

fractions

The Fraction type provided by this module now supplements the __format__() method added in Python 3.12 with the remaining formatting operations that hadn’t been added, such as padding and alignment.

For example, the format below would have raised a ValueError in Python 3.12.

>>> import fractions
>>> x = fractions.Fraction(5, 12)
>>> f"{x:-^10}"
'---5/12---'

math

The math module has sadly not yet grown a trailing letter ‘s’ that would make it correct British English. It has, however, grown an fma() function, which stands for fused multiply-add. Calling fma(x, y, z) is arithmetically equivalent to x * y + z, except that the intermediate result is computed as if to infinite precision, with only a single rounding back to a float performed at the end.

This is a special case of the multiply-accumulate operations which are common in digital signal processing, and is one of those specialised operations that exists to reduce accumulated inaccuracies. The operation is also required for compliance with the IEEE 754 floating point standard.
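
To see the single rounding in action, here’s a small sketch—on a typical IEEE 754 double-precision setup, the naive expression loses the tiny error term that fma() is able to preserve:

import math

x = 1 / 3                      # not exactly representable, so 3 * x carries a tiny error
print(x * 3.0 - 1.0)           # 0.0: the product rounds to exactly 1.0 before the subtraction
print(math.fma(x, 3.0, -1.0))  # a tiny negative value: the rounding error survives the single rounding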

random

The random module has acquired a command-line interface. It has three main operations:

  • Selecting a random choice from a list of strings.
  • Generating a random integer between 1 and a specified limit, inclusive.
  • Generating a uniformly distributed random float between 0 and a specified limit, inclusive.

You can force which is performed by providing --choice / -c, --integer / -i or --float / -f, respectively. Alternatively, it decides which you want based on whether you provided one or more strings, an integer, or a decimal value.

The examples below illustrate the behaviour when none of the three options are provided.

❯ python3.13 -m random one two three four five
three
❯ python3.13 -m random one two three four five
two
❯ python3.13 -m random one two three four five
three
❯ python3.13 -m random 100
60
❯ python3.13 -m random 100
44
❯ python3.13 -m random 10.0
3.2605013874775954

statistics

There are a pair of new functions for kernel density estimation1 in the statistics module. This is a way of using kernel smoothing, a kind of weighted moving average, to estimate a continuous probability density function2 from a number of discrete samples. It’s similar in concept to a histogram, but is flexible enough to produce smooth distributions.

This is something that could probably fill a whole article in itself, even assuming I was sufficiently knowledgeable in the topic3, but I’ll try to just give you enough of an idea of what it’s about to see if it’s likely to be useful to you. I’m going to assume you have at least a high level understanding of what a PDF is, or I suspect you might find the rest of this section tricky to follow.

To understand the KDE technique, let’s first understand kernel smoothing. This is a technique to estimate the value of a continuous function at a given point based on the weighted average of sample values which are centred around the point being estimated. The kernel is a weighting function which assigns a weight to each data sample based on the offset from the point being estimated, and the weighted average of those points is the estimated value. The Gaussian function is often used as a kernel, which assigns weights based on a normal distribution around the point being estimated, but there are a number of other functions which are also sometimes used with different characteristics.

Next let’s look at how to apply a kernel function $K()$ to generate the value of a PDF at point $x$. We take a sampling window known as the bandwidth, which is represented as $h$ in the equation below, and this is used to define the degree of smoothing. Choosing larger values will yield a smoother and more averaged result, whereas smaller values will emphasise localised variations.

To factor in the bandwidth, we use what’s called the scaled kernel $K_{h}(x) = \frac{1}{h}K(\frac{x}{h})$. We’ll see how this implements the effect of the bandwidth in a moment—let’s look at the formula:

$$f_{h}(x) = \frac{1}{n}\displaystyle\sum_{i=1}^n K_{h}(x - x_{i})= \frac{1}{nh}\displaystyle\sum_{i=1}^n K(\frac{x - x_{i}}{h})$$

The first version is in terms of the scaled kernel $K_{h}()$, the second is rewritten in terms of the plain kernel function $K()$, and just has the $\frac{1}{h}$ terms factored outside of the sum.

By way of explanation, we’re taking each sample in turn and deriving its offset from $x$, which is where the $x - x_{i}$ comes from. We then divide each offset by the bandwidth $h$, which means that as $h$ gets larger, the scaled offset becomes closer to zero—the centre of the kernel function—and hence the sample is assigned a higher weight. To normalise this we then divide back down by $h$ again. So you can see how, as $h$ gets larger, the samples further away from $x$ have a greater impact than they would for smaller $h$, but they are also scaled down by a larger amount. Thus, a higher $h$ has the effect of averaging over a wider range of samples. Once we have these kernel function values for each sample, we sum them and divide by $n$ to find the mean.

Note that some kernel functions apply a weight to every sample, even for narrow bandwidths, but that weight may be vanishingly small for samples far away. Other functions have a strict window and assign zero weight to samples outside this window.
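
To make the formula concrete, here’s a hand-rolled sketch of the estimator using a Gaussian kernel—just a direct translation of the maths above, not how statistics.kde() is actually implemented:

import math

def gaussian_kernel(u):
    # The standard normal PDF, used as the weighting function K().
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def kde_estimate(x, samples, h, kernel=gaussian_kernel):
    # f_h(x) = 1/(n*h) * sum(K((x - x_i) / h))
    return sum(kernel((x - x_i) / h) for x_i in samples) / (len(samples) * h)

samples = [0.2, 0.3, 0.35, 0.5]
print(kde_estimate(0.3, samples, h=0.1))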

New Functions

Hopefully that made some kind of sense, even though I’m well aware it wasn’t what you’d call rigorous. Now let’s look at the new functions themselves that have been added to the statistics module.

The first is a function kde() which takes an iterable of sample values, along with a value for the bandwidth and also, optionally, a kernel function. If no kernel is specified, a default of "normal" is used, which uses the Gaussian function as a kernel. I won’t talk specifically about the other possible values, but the documentation lists all the options.

There’s also a cumulative parameter which, if set to True, returns a cumulative distribution function4. Whilst the integral of the PDF between two values yields the probability of a single sampled value lying between those two limits, the CDF represents the probability that the value will be at or lower than the point of the curve.

The result of calling kde() is the estimated probability density function itself—when passed a single parameter indicating the value, it returns the value of the PDF at that point.

The other new function is kde_random(), which is similar but instead returns a function that generates random values following the distribution estimated from the specified data. This function also takes a seed parameter, which allows for repeatable sequences, but it’s worth noting that the sequence might change between Python updates if a more accurate process for determining the CDF is implemented.
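
Here’s a brief sketch of both functions in use—the sample values and bandwidth are just illustrative:

import statistics

samples = [1.2, 1.9, 2.1, 2.3, 3.4, 3.6, 4.0]

# The estimated PDF is returned as a callable we can evaluate at any point.
pdf = statistics.kde(samples, h=0.75)
print(pdf(2.0))

# Cumulative variant: probability of a value being at or below 2.0.
cdf = statistics.kde(samples, h=0.75, cumulative=True)
print(cdf(2.0))

# A callable generating random values drawn from the estimated distribution.
rand = statistics.kde_random(samples, h=0.75, seed=1234)
print([round(rand(), 2) for _ in range(5)])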

Illustration

To illustrate this, first I generated 20 random samples distributed with the beta distribution. From those samples, I then generated three variants of the PDF by calling kde() with different parameters for kernel and bandwidth. I’ve shown my code below, which also then uses the third party matplotlib and seaborn libraries to plot these functions on a graph for comparison5.

import matplotlib.pyplot as pyplot
import seaborn
import statistics

# Generated from: [round(random.betavariate(2, 4), 2) for i in range(20)]
# Included as literals so the results can be reproduced.
samples = [0.46, 0.43, 0.37, 0.16, 0.44, 0.34, 0.6, 0.36, 0.11, 0.24,
           0.37, 0.3, 0.16, 0.55, 0.46, 0.48, 0.44, 0.38, 0.36, 0.05]

# Three KDE functions using the same samples, but different parameters.
func_blue = statistics.kde(samples, h=0.2, kernel="normal")
func_green = statistics.kde(samples, h=0.05, kernel="normal")
func_red = statistics.kde(samples, h=0.2, kernel="uniform")

# Set up some colours that work in both dark and light mode.
_, ax = pyplot.subplots()
[i.set_color("#008F8F") for i in ax.spines.values()]
ax.tick_params(axis='x', colors='#008F8F')
ax.tick_params(axis='y', colors='#008F8F')

# Generate 100 points for sampling the PDF from 0 to 1.
x_points = [i/100 for i in range(100)]

# Generate one plot for each of the functions created above.
seaborn.lineplot(
    x=x_points,
    y=[func_blue(x) for x in x_points],
    color="#3282C3",
    ax=ax)
seaborn.lineplot(
    x=x_points,
    y=[func_red(x) for x in x_points],
    color="#DD4444",
    ax=ax)
seaborn.lineplot(
    x=x_points,
    y=[func_green(x) for x in x_points],
    color="#228D33",
    ax=ax)

# Finally, export the result as an SVG file.
pyplot.savefig("kde-plots.svg", format='svg', transparent=True)

Here’s the image that code generates, which shows all three functions plotted against each other:

KDE plots

The blue and green curves are generated with the "normal" kernel, but with different bandwidths. You can see that the wider bandwidth of the blue curve has led to more smoothing and a lower peak, whereas the green curve’s very narrow bandwidth has meant its shape reflects the clustering of samples more closely. I would reiterate that I’m no expert, but it seems to me that the selection of the bandwidth is quite crucial, since undersmoothing will result in a jagged shape that hides the overall pattern, especially for sparse samples; whereas oversmoothing will stretch things out and obscure the underlying shape of the distribution.

The red curve uses the "uniform" kernel, which I believe just assigns equal weight to all the points within a limited window around the point being estimated. You can see this leads to a jagged result, as sample points come in and out of the window as we slide it along the axis. As a good example of how a poor choice of bandwidth can destroy the results, using "uniform" with a sufficiently wide bandwidth will just result in a flat line, because the value at all points will simply be the mean of all the values.

KDE: Conclusion

I understand these functions enough to see their utility in certain applications, even if I’m not sufficiently expert to necessarily use them correctly. I believe KDE has applications in improving the accuracy of Bayes classifiers, and more generally in signal processing and econometrics, and I’m sure in other statistical applications as well.

File and Directory Access

There are a number of smaller enhancements in modules for filesystem manipulation:

  • Added glob.translate() to turn glob expressions into regular expressions.
  • Some Windows-focused enhancements to os.path.
  • A number of assorted improvements to pathlib.
  • Support for dir_fd and follow_symlinks parameters in shutil.chown().
  • A fix for CVE-2024-4030, by supporting mode 0o700 in os.mkdir() on Windows.

All but the last two are covered in a little more detail in the sections below.

glob

Glob patterns can be a convenient way to express filename patterns, but in some contexts what you really need is a regular expression. As of this release you can have both, using glob.translate()—you pass in a glob pattern string as input, and it returns the equivalent regular expression string as output.

One detail which is important to note is that wildcards in the translated pattern only match within a single path component—they will not span across directory separators (i.e. slashes).

Here are a couple of examples to illustrate:

>>> import glob
>>> glob.translate("*.log")
'(?s:(?!\\.)[^/]*\\.log)\\Z'
>>> glob.translate("app???-[0-9]*.log")
'(?s:app[^/][^/][^/]\\-[0-9][^/]*\\.log)\\Z'
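
The resulting string can then be compiled with re as usual—a quick sketch showing that the wildcard refuses to cross a slash:

import glob
import re

regex = re.compile(glob.translate("*.log"))
print(bool(regex.match("server.log")))        # True
print(bool(regex.match("logs/server.log")))   # False: '*' won't match across '/'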

os.path

A collection of smaller Windows-focused changes in os.path.

Added isreserved()
Only available on Windows. Returns True if the specified name is one of the reserved names on Windows, False otherwise.
isabs() and leading slashes
On Windows, paths with unix-style leading slashes are no longer regarded as absolute by isabs(), because their lack of a drive letter means they’re still relative to the current drive.
realpath() resolves inaccessible files
On Windows, when using old MS-DOS style filenames (e.g. FULLFI~1.TXT), a file which wasn’t accessible would previously break realpath() unnecessarily—this is no longer the case.

pathlib

A number of changes in pathlib. First up there’s a new UnsupportedOperation exception, which is now raised instead of NotImplementedError for operations which are not supported on a given path object. This inherits from NotImplementedError, so any code which catches that should continue to work.

Next is a new constructor for Path objects, Path.from_uri(), which, as the name implies, is designed to accept a URI using the file:// scheme.

>>> import pathlib
>>> pathlib.Path.from_uri("file:///home/andy/somefile.txt")
PosixPath('/home/andy/somefile.txt')

Then a couple of changes specifically to PurePath. First of these is a full_match() method, which accepts a glob-style pattern and returns True if the path in question matches it. This also supports ** for recursive matching, where ** will match across directory boundaries. The second change is a new parser attribute which stores the underlying module used for parsing paths—currently this will be posixpath or ntpath.

>>> from pathlib import PurePath
>>> PurePath("/one/two/three/four.txt").full_match("/one/*")
False
>>> PurePath("/one/two/three/four.txt").full_match("/one/two/*/*")
True
>>> PurePath("/one/two/three/four.txt").full_match("/one/**/four.*")
True
>>> PurePath("/one/two/three/four.txt").full_match("/one/**/four/*")
False
>>> PurePath("/").parser
<module 'posixpath' (frozen)>

The behaviour of the Path methods glob() and rglob() has also subtly changed. Previously, if a glob pattern ended with ** then these methods would only return entries which were directories; now files are returned as well.

>>> from pathlib import Path
>>> ssl_entries = Path("/etc").glob("./ssl/**")
>>> print("\n".join(repr(i) for i in ssl_entries))
PosixPath('/etc/ssl')
PosixPath('/etc/ssl/cert.pem')
PosixPath('/etc/ssl/x509v3.cnf')
PosixPath('/etc/ssl/openssl.cnf')
PosixPath('/etc/ssl/certs')

Finally, some new symlink-handling behaviours. The recurse_symlinks keyword parameter has been added to the glob() and rglob() methods of Path, which follows symlinks even when expanding a ** glob term—by default, symlinks are not followed in this case. Also, the follow_symlinks keyword parameter has been added to is_file(), is_dir(), owner() and group() methods of Path—this parameter defaults to True if not specified.
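
A short sketch of these new keyword parameters (the paths here are just hypothetical examples):

from pathlib import Path

# Follow symlinks while expanding '**' (they are not followed by default).
for entry in Path("/srv/data").glob("**/*.cfg", recurse_symlinks=True):
    print(entry)

# Query the symlink itself rather than whatever it points at.
link = Path("/etc/localtime")
print(link.is_file(follow_symlinks=False))   # False if the path really is a symlink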

Data Persistence

A significant change in the dbm module, and a handful of smaller updates to marshal and sqlite3.

dbm

The dbm module provides a generic interface to the DBM family of database engines, which are a series of key-value stores that trace their lineage back to the original dbm library written by Ken Thompson back in 1979.

Until Python 3.13, this module supported three engines6:

  • dbm.ndbm or “new DBM” was a rewrite of the original—still standard on Solaris, I think.
  • dbm.gnu is the GNU re-implementation of ndbm, now ubiquitous on Linux.
  • dbm.dumb is a low performance option in pure Python which is not really intended for production use, but as a fallback for when no better engine is available.

The main change in Python 3.13 is that an SQLite backend has been added, dbm.sqlite3, which uses the existing sqlite3 module to create the backend database—this is also now the default backend when creating new databases. This is a good choice, as SQLite is a great little engine for simpler structured data storage cases.

The implementation seems to create a database with a single table called Dict, with two columns key and value, both of type BLOB. Given the lack of assumptions that dbm can make about data types, this is probably about the best you can do without a lot of effort. The snippet below illustrates creating a new database through dbm and then interrogating it directly with sqlite3.

>>> import dbm
>>> db = dbm.open("/tmp/mydb", "c")
>>> db["hello"] = "world"
>>> db["one"] = 123
>>> db["two"] = 2.34
>>> db["hello"]
b'world'
>>> db["one"]
b'123'
>>> db["two"]
b'2.34'
>>> db.close()
>>>
>>> import sqlite3
>>> conn = sqlite3.connect("/tmp/mydb")
>>> cur = conn.cursor()
>>> cur.execute("SELECT * FROM Dict")
<sqlite3.Cursor object at 0x10506cdc0>
>>> for row in cur:
...     print(row)
...
(b'hello', b'world')
(b'one', b'123')
(b'two', b'2.34')

One detail that stands out, if you haven’t used dbm before, is that keys and values are always stored as bytes objects, being converted as required on storage. I would suggest always converting them yourself, especially from str, to prevent unexpected bugs around encoding/decoding when running on different platforms.

One last additional change in Python 3.13 is that the GDBM and NDBM database objects now also support a clear() method, which deletes all existing keys.

marshal

For general persistence of Python objects, the usual suggestion is to use the pickle module, or possibly shelve (which is built on top of pickle) if you need to serialise many items.

But there’s also been the marshal module for a long time. This is intended for internal usage, primarily the reading and writing of pre-compiled .pyc files. Because it’s internal, there’s no guarantee that the format will stay stable or compatible between versions, and it only handles a subset of simple types—those are the reasons why pickle and shelve are generally better options. In my experience pickle is also faster than marshal in the cases where I’ve tested them7.

Still, there are cases where marshal may be useful, and if you have one of those cases then you may be interested in a change in Python 3.13 which makes it a little safer. One of the things that’s not supported is using marshal to serialise code objects, and then load them in a different version of Python. Hence in Python 3.13 there’s now an allow_code parameter on the functions—setting this to False will prevent serialisation and deserialisation of code objects, reducing the risk of cross version incompatibilities.
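
A quick sketch of the new parameter—code objects are refused, while ordinary data still round-trips (I’d expect a ValueError here, as for other unsupported types):

import marshal

def func():
    return 42

# With allow_code=False, attempting to serialise a code object fails...
try:
    marshal.dumps(func.__code__, allow_code=False)
except ValueError as exc:
    print("Refused:", exc)

# ...but plain data types are unaffected.
data = marshal.dumps({"key": [1, 2, 3]}, allow_code=False)
print(marshal.loads(data, allow_code=False))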

Whilst we’re on the subject, I’ll reiterate something that’s clearly stated in the documentation but people often seem to overlook for some reason: you must never under any circumstances use any of these modules on data from an external source that you don’t trust—doing so introduces huge security holes, such as arbitrary code execution. If you truly must load data that’s passed through any stage that you don’t 100% control, do yourself a favour and use hmac or something to protect its integrity, or you’re quite likely to seriously regret it someday.

sqlite3

Now we’ll finish off the data persistence changes with a couple of small changes to the sqlite3 module itself. The first is that failing to close a database connection will now emit a ResourceWarning.

One thing you should always do when using sqlite3, or almost any filesystem-backed data persistence layer, is make sure you cleanly close your connection before you shut down. This is surprisingly easy to forget, since Python generally does a great job of closing things for you, but it’s not impossible to suffer occasional data loss if you’re not careful with this, particularly if the interpreter should crash.

To help you catch these errors, if the connection is still open when the interpreter is finalised then as of Python 3.13, you’ll get a ResourceWarning—the code still goes ahead and tries to close, so this shouldn’t break anything. As with all warnings, these are typically silenced in production code, but should show up in your unit tests to help you avoid those incredibly frustrating occasional production faults we all know and love.

The other change is that the iterdump() method on the Connection object, which dumps a series of SQL commands to re-create the current database schema and contents, has acquired a filter parameter to control which database objects are included. The value should be a string, which is interpreted as if it was a LIKE pattern (e.g. foo_%), or None for the default behaviour of dumping everything.
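
Here’s a minimal sketch of the new parameter, using an in-memory database and made-up table names:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("CREATE TABLE logs_2024 (entry TEXT)")
conn.execute("CREATE TABLE logs_2025 (entry TEXT)")

# Only dump database objects whose names match the LIKE pattern.
for statement in conn.iterdump(filter="logs_%"):
    print(statement)

conn.close()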

File Formats

Another smaller change here, in that the configparser.ConfigParser class now has support for unnamed sections in configuration files. This isn’t enabled by default, and you’ll still get a MissingSectionHeaderError if your configuration file contains any directives which aren’t in sections.

However, if you pass allow_unnamed_section=True to the ConfigParser constructor, then any directives before the first section header will be placed in a special unnamed section, which can be accessed by passing configparser.UNNAMED_SECTION as the section name.

>>> import configparser
>>> configfile = """
... first: one
... second: two
...
... [MySection]
... third: three
... """
>>>
>>> parser = configparser.ConfigParser(allow_unnamed_section=True)
>>> parser.read_string(configfile)
>>> parser.get(configparser.UNNAMED_SECTION, "second")
'two'
>>> configparser.UNNAMED_SECTION
<UNNAMED_SECTION>

I can imagine this being useful for simple cases where you’d like to use configparser, but where the need to insert a single redundant section name is a little onerous.

One thing that might be worth highlighting is that when creating a configuration file, it doesn’t look like programmatic support for adding an unnamed section exists—so this is mostly a read-only feature for now.

>>> import configparser
>>> parser = configparser.ConfigParser(allow_unnamed_section=True)
>>> parser.set(configparser.UNNAMED_SECTION, "name", "value")
  # Traceback omitted for brevity
configparser.NoSectionError: No section: <UNNAMED_SECTION>
>>> parser.add_section(configparser.UNNAMED_SECTION)
  # Traceback omitted for brevity
TypeError: section names must be strings

Generic Operating System Services

To finish this article off, we have a raft of changes in system level modules. These comprise:

  • ctypes
    • Initialisation of metaclasses in __init__() instead of __new__()
    • A new _align_ attribute on Structure objects.
  • io
    • Logging of errors raised by close().
  • os
    • A new function to obtain the number of logical CPU cores
    • Linux timer file descriptors
    • Enhancements to the chmod() family on Windows
    • Enhancements to posix_spawn()
  • time
    • Some functions now use higher-precision clocks on Windows.

So let’s run through these in a little more detail.

ctypes

A couple of internal refactoring changes8 mean that internal metaclasses are now initialised during __init__() rather than during __new__() as previously. This has a couple of consequences:

  • When instantiating the metaclass, you should pass parameters directly to the class rather than using __new__() yourself.
    • This also works in previous Python versions, so doesn’t break compatibility with those versions.
  • If you’ve added custom logic in __new__() after the chained call to super().__new__(), then you should move this logic to __init__().
    • I believe this doesn’t work in previous versions, so if you need your code to be compatible with those you may need to do this conditionally on sys.version_info.

A second change is an enhancement to the Structure object, which is an abstract base class for structures in native byte order. Until now it’s been impossible to define a Structure which maps onto a structure in a language with forced alignment. This might be specified with #pragma align N, with GCC’s __attribute__ ((aligned(N))), or with the C++11 alignas specifier.

As of Python 3.13, however, you can define the _align_ attribute, with the byte alignment specified as the value. For example, setting _align_ = 0x10 should place fields on 16-byte boundaries. This might be done in the underlying C data structure if, say, it will be processed by SIMD instructions.
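
For example, here’s a minimal sketch of a structure that might need to match a 16-byte-aligned C counterpart (the field layout is entirely made up):

import ctypes

class Vec4(ctypes.Structure):
    _align_ = 16                    # force 16-byte alignment of the structure
    _fields_ = [
        ("x", ctypes.c_float),
        ("y", ctypes.c_float),
        ("z", ctypes.c_float),
        ("w", ctypes.c_float),
    ]

print(ctypes.alignment(Vec4))   # 16
print(ctypes.sizeof(Vec4))      # padded to a multiple of the alignment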

io

Prior to Python 3.13, any exceptions raised by the close() method of io.IOBase would just be swallowed if called by the __del__() finaliser, unless you’ve enabled development mode with -X dev. As of this release, however, they will be passed to sys.unraisablehook(), even during interpreter shutdown—the default implementation writes details of the exception to stderr.

>>> import os
>>> x = open("/tmp/foo", "w")
>>> os.close(x.fileno())
>>> exit()
Exception ignored in: <_io.TextIOWrapper name='/tmp/foo' mode='w' encoding='UTF-8'>
OSError: [Errno 9] Bad file descriptor

os

There are actually a few new goodies in os in this release, so let’s break out the subsections.

Available CPU Cores

There’s a new process_cpu_count() function, which returns the number of logical CPU cores which are available to the calling thread—this will typically be an int, but could be None if it couldn’t be determined on the current platform.

The observant among you may wonder how this differs from the existing os.cpu_count(), which was added way back in Python 3.4. Well, on at least some platforms it’s possible to set a CPU affinity9 for a process, and this can restrict the number of cores available. This won’t change the result of os.cpu_count(), since that’s the number of logical cores available in the system, but this might throw off any automatic scaling behaviours by making a process act as if it has more CPU cores than the OS is allowing it to use.

The process_cpu_count() function is aware of these restrictions and returns the actual number of cores available, which is probably going to be what you want for most practical purposes. On a related note, there have also been updates to three other modules to use this new function instead of os.cpu_count() to select the default number of worker threads and processes—these are:

  • compileall
  • concurrent.futures
  • multiprocessing

As well as this new function, there’s also new support for a PYTHON_CPU_COUNT environment variable, and a corresponding -X cpu_count=N option, which will override the limit returned by both os.cpu_count() and os.process_cpu_count(). This could be useful for easily enforcing CPU limits on a Python process, particularly if it’s difficult for you to modify the environment in which it’s running.
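
As a quick hedged illustration, on a machine with 16 logical cores where the process has been restricted to four of them (say via a hypothetical taskset -c 0-3 python3.13 ... invocation), I’d expect something like:

import os

print(os.cpu_count())           # 16: logical cores in the system
print(os.process_cpu_count())   # 4: cores this process is actually allowed to use

# Both values can be capped by setting PYTHON_CPU_COUNT=2 or passing -X cpu_count=2.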

Linux Timer File Descriptors

Linux has supported receiving timer events via file descriptors since 2.6.25, released back in 2008. In Python 3.13, support for this low-level interface has now been added.

The basic approach is that you create a new timer descriptor with timerfd_create(), and then you start a countdown timer on this descriptor using timerfd_settime(). Calling read() on this descriptor before the timer has expired will typically cause it to block, and select() and poll() can also wait for it as with any other file descriptor.

You can see this basic operation illustrated in the code below:

>>> import os, time
>>> timer_fd = os.timerfd_create(time.CLOCK_MONOTONIC)
>>> start_time = time.time(); os.timerfd_settime(timer_fd, initial=30.0)
(0.0, 0.0)
>>> print(os.read(timer_fd, 8)); end_time = time.time()
# ... blocks here until timer expires...
b'\x01\x00\x00\x00\x00\x00\x00\x00'
>>> end_time - start_time
30.000555515289307

There are some details that are important to note here. Firstly, the timer is quite flexible—you can set an initial delay using the initial parameter, but you can also (optionally) pass an interval parameter to cause the timer to repeatedly fire at this interval beyond the initial delay. This defaults to zero, which means the timer only fires once.

The second detail which isn’t immediately obvious is that the read() call must be passed a value of 8 and it always returns 8 bytes of data, which is the number of times the timer has fired since the last read(). This value is a 64-bit value in the host’s native endianness, which you can decode using int.from_bytes() as follows:

>>> import os, sys, time
>>> timer_fd = os.timerfd_create(time.CLOCK_MONOTONIC)
>>> os.timerfd_settime(timer_fd, initial=5.0, interval=2.0)
(0.0, 0.0)
>>> value = os.read(timer_fd, 8)
>>> int.from_bytes(value, byteorder=sys.byteorder)
18

The next thing you may have noticed is that timerfd_settime() returns a 2-tuple, which is (next_expiration, interval). The first value is the time that remained until the previously configured timer would have expired, and the second is the interval which was set at that time.

Also, some more details on the initial timerfd_create() call. You’ll notice that you have to pass the identifier of a clock to use—these are defined in time and the choices are CLOCK_REALTIME, CLOCK_MONOTONIC, or CLOCK_BOOTTIME10. My suggestion is to go for CLOCK_MONOTONIC unless you have reason to believe one of the others is a better choice.

There’s a flags parameter to timerfd_create() which you can set to TFD_NONBLOCK to make the descriptor non-blocking. This means whenever a read() would have otherwise blocked, it will instead raise OSError with the errno attribute set to errno.EAGAIN.

The flags parameter is technically a bitwise OR of multiple flags, but the only other one currently available is TFD_CLOEXEC, which sets close-on-exec behaviour—this is always set by Python itself and there’s no way to stop it. Hence, TFD_NONBLOCK or not is really the only choice you have in this release.
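
As a brief sketch, with a non-blocking descriptor a premature read() raises immediately instead of waiting:

import errno
import os
import time

fd = os.timerfd_create(time.CLOCK_MONOTONIC, flags=os.TFD_NONBLOCK)
os.timerfd_settime(fd, initial=5.0)
try:
    os.read(fd, 8)
except OSError as exc:
    if exc.errno == errno.EAGAIN:
        print("Timer hasn't fired yet")
os.close(fd)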

The timerfd_settime() function also has a flags parameter, and this genuinely does have two flags, although only one of them can be used on its own. The flags are:

TFD_TIMER_ABSTIME
Instead of interpreting initial as a relative offset from the current moment, this flag causes it to be interpreted as an absolute value of the selected clock. This is useful if you want the timer to trigger at a specific time relative to other events for which you already have a clock timestamp. The interval value is still interpreted as relative to the last trigger.
TFD_TIMER_CANCEL_ON_SET
This flag only has meaning if bitwise ORed with TFD_TIMER_ABSTIME and is only useful if the clock used is CLOCK_REALTIME. If specified, then should the clock value change discontinuously (e.g. the system clock being stepped manually or by NTP), the timer is cancelled. Any read of a timer so cancelled will raise OSError with errno set to errno.ECANCELED.

In closing I’ll also briefly mention the timerfd_gettime() function which, when passed a timer file descriptor, returns the same (next_expiration, interval) value that we saw earlier when we reset an existing timer. One further point to note is that next_expiration is always relative to the current time, even if TFD_TIMER_ABSTIME was used.

Also, there are timerfd_settime_ns() and timerfd_gettime_ns() variants where the times are specified in nanoseconds instead of seconds, in cases where a float() representation of seconds loses accuracy. My suspicion is that if you’re using a high-level language like Python then you’re unlikely to be worrying about this level of timer accuracy, but they’re there if you need them.

One final note—as with any file descriptor opened with os.open() do remember that you should os.close() timer file descriptors once you’re finished using them. As these descriptors are just integers, they don’t have finalisers which close them when they go out of scope. If you don’t close them, your process will leak file descriptors, and if it’s running for a long time then this can definitely cause major headaches11.

chmod() on Windows

Next up we have some changes on the Windows platform. The os.lchmod() function is now supported on Windows, which changes the permissions of the source of a symlink rather than its target. This is equivalent to passing follow_symlinks=False to os.chmod(), and that parameter is also now supported on Windows. Note that on Windows follow_symlinks defaults to False, whereas it’s True on other platforms.

Also on Windows, os.fchmod() has been added, to set the permissions by file descriptor as opposed to by filename. As with other platforms (since Python 3.3) it’s also possible to pass a file descriptor directly to os.chmod() instead of a path, and this has the same effect.
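
A short sketch of these calls—this assumes a Windows system, and the paths are hypothetical:

import os
import stat

link = "C:\\temp\\mylink"

# Change the link itself rather than its target (on Windows follow_symlinks
# defaults to False anyway).
os.chmod(link, stat.S_IREAD, follow_symlinks=False)

# Equivalent spelling, newly available on Windows in 3.13.
os.lchmod(link, stat.S_IREAD)

# Setting permissions via an already-open file descriptor.
fd = os.open("C:\\temp\\somefile.txt", os.O_RDWR)
os.fchmod(fd, stat.S_IWRITE)
os.close(fd)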

posix_spawn() Changes

The os.posix_spawn() function was added in Python 3.8 and provides low-level access to the system call of the same name. This is a way to spawn a child process, and is essentially a merged version of the fork() and exec() calls. It’s useful on small, embedded systems which lack support for these as individual operations, but even on larger systems it can be more convenient12 and reduce the scope for programming errors.

Now it has to be said that most Python users should be using the functionality of the subprocess module instead of calling os.posix_spawn() directly—but Python is a flexible language, and it’s here if you need it.

The first change in Python 3.13 is that you can pass env=None, which copies the current process’s environment into the newly spawned child process. This is convenient, as it saves you having to manually copy the existing environment and pass it in.

The second change is support for the POSIX_SPAWN_CLOSEFROM action passed in the file_actions sequence. This parameter indicates actions on existing file descriptors that posix_spawn() should perform in the child process before running the new executable—i.e. between fork() and exec(). The POSIX_SPAWN_CLOSEFROM action performs the equivalent of os.closerange(fd, ∞)—in other words, it closes every file descriptor from the one specified in the file action tuple up to the highest open file descriptor in the process. This is only supported on platforms which provide posix_spawn_file_actions_addclosefrom_np(), so it’s best avoided in code that needs to be portable.
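
Here’s a hedged sketch of both changes together—remember that POSIX_SPAWN_CLOSEFROM will only be present on platforms whose C library supports it:

import os
import sys

# env=None (new in 3.13) inherits this process's environment; the file action
# closes every descriptor from 3 upwards in the child before the exec step.
pid = os.posix_spawn(
    sys.executable,
    [sys.executable, "-c", "print('hello from the child')"],
    None,
    file_actions=[(os.POSIX_SPAWN_CLOSEFROM, 3)],
)
os.waitpid(pid, 0)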

time

In closing, a couple of final Windows-specific changes for better time resolution. The time.monotonic() function now uses the QueryPerformanceCounter() Windows call for a resolution of 1µs, and the time.time() function now uses GetSystemTimePreciseAsFileTime(), also for a resolution of 1µs.

The previously used calls, GetTickCount64() and GetSystemTimeAsFileTime() respectively, had a resolution of 15.6ms so this represents a significant improvement.

Conclusions

Well, we’ve reached the end at last—that felt like a much longer article than I thought it would be, but I struggle to think of any of these Python updates I’ve written where I couldn’t have said that.

Some more esoteric details in there, but also some changes which I think will be genuinely useful. The generalisation of replace() methods stands out as something that should help library consistency over time. One thing that really helps when using a large system is having patterns that you can apply repeatedly in different situations, rather than having to go and consult the documentation for each and every little module that you need to deal with to see how it specifically solves the same issue. Although the coverage of copy.replace() is somewhat limited at present, having a standard way to support these operations is what matters—coverage can be extended in future as it becomes useful to do so.

The statistics.kde() function is interesting, although I appreciate this type of analysis may have a limited audience. It’s one of those little “batteries included” things with Python, where you surprisingly often come across some functionality which you were convinced you’d need to implement yourself. When you find it in the library, not only do you avoid the effort of doing so, but more importantly you avoid the potential bugs. It was also quite an interesting learning experience for me—not only the statistical process itself, but also poking around with matplotlib and seaborn. I’ve also discovered these integrate quite nicely with JupyterLab, which makes it convenient to play with them.

I’ve no doubt that the filename matching improvements to glob and pathlib will prove useful, providing that I can remember they’re there the next time I need them. I will say this is one of several areas where I struggle slightly with the way that the Python library breaks functionality down into modules—the fact that pathlib, glob and fnmatch are all separate makes it slightly hard to discover some of this functionality, in my view. Clearly they can’t go around just gratuitously breaking people by moving things around, but equally it does add friction. I’m hoping the addition of full_match() to pathlib.PurePath implies that perhaps they’re treating this module as the place to centralise the useful parts of this functionality, and avoid the need for people to be so familiar with those other modules.

The shift of dbm to using sqlite3 is definitely a good move—it’s a flexible and efficient format, and the value of having a single cross-platform backend outweighs any minor concern I might have had with using a powerful engine for simple cases. There’s a reason why many applications use this even for simple configuration files and other state.

On the subject of configuration files, the addition of unnamed sections to configparser is also welcome, although in practice I suspect the addition of tomllib in Python 3.11 means that many people will be slowly switching to that format for better application of transferable skills and tooling.

Finally, all of the operating system changes are good to see—as someone who used to do embedded development, I still have a soft spot for system level programming, and I’m happy to be able to use these primitives in a highly productive language like Python. The Linux timer file descriptor support in particular is likely to be useful, especially in existing cases where you have a poll() loop around, say, a set of sockets, and you just need a way to insert a regular timer for background tasks. This allows you to avoid the need for background threads in heavily IO-bound applications, which can avoid all sorts of irritating complications. Being Linux-specific could make things trickier for use in libraries, but I think a lot of application developers will be happy to make their applications Linux-specific.

So that’s it for this article—I’m going to try to squeeze all the other library changes in the following article, and hopefully there won’t be quite such a delay until that one. I console myself with the fact that at least I’ve still got a decent interval before Python 3.14 is released!


  1. Kernel density estimation will hereafter be referred to as KDE, not to be confused with the KDE GUI framework

  2. Probability density function will hereafter be referred to as PDF, not to be confused with the PDF file format

  3. For the avoidance of doubt, I’m absolutely not knowledgeable in this topic—this description was put together from my rather misty recollections of probability density functions from school, and a quick trawl through Wikipedia. I find it interesting to learn about these things, but I’m under no illusions that it’s possible to gain anything more than the most superficial understanding in the limited time I spend on these articles. 

  4. Cumulative distribution function will hereafter be referred to as CDF, not to be confused with Cardiff Central railway station

  5. Please note that I’ve barely used matplotlib or seaborn, so please do not regard my use of it as illustrating any best practices or good examples. I got this working by poking through a few examples online, like most people do, rather than spending time to get a deep understanding of these powerful libraries. 

  6. You may well wonder why the most famous and well-maintained such database, Berkeley DB, is not supported. Indeed, it used to be supported in Python 2.x as the bsddb module, but support was dropped in Python 3.x. As discussed briefly in PEP 3108, I believe the justification was the combination of high maintenance burden with comparatively low benefit. 

  7. A quick note of caution, however. In Python 2.x there always used to be a pure Python pickle module, and later an optimised C implementation called cPickle was added. In Python 3 these were rolled together and the C implementation is automatically used if present, and the Python one is a fallback option. If you’re on a platform where the Python one is used, for some reason, you might find your experience of pickle performance very different to mine. 

  8. The first change was #114314, which converted the use of static types in ctypes to be heap types. The second change was #117142, which converted the _ctypes extension module to use multi-phase init

  9. If you want to know more about CPU affinity on Linux, check out the man pages for taskset(), sched_setaffinity() and sched_getaffinity(). For an overview of scheduling in general, the sched (7) man page has a good overview (not to be confused with Shed Seven). 

  10. CLOCK_BOOTTIME was added in Linux kernel 3.15 (released June 2014) and it’s essentially the same as CLOCK_MONOTONIC except that while the system is suspended then CLOCK_MONOTONIC is paused whereas CLOCK_BOOTTIME continues to tick. 

  11. Your symptom for this would be getting errno.EMFILE, which means too many open file descriptors for your process; or in principle you might get errno.ENFILE, which means too many open file descriptors across the system as a whole—this can also be the result of other processes. Typically you’d expect to hit your per-process limit first, however, unless you’re doing something a little extreme. 

  12. Whilst only providing a subset of the functionality you can achieve with separate fork() and exec() calls, posix_spawn() does make it more convenient to perform additional housekeeping tasks between the two steps by passing in certain parameters. It handles things like updating signal masks and default handlers, changing the scheduling algorithm, changing process group and effective user and group IDs, closing existing file descriptors, etc. 
