In this series looking at features introduced by every version of Python 3, we continue our look at the new Python 3.11 release, this time examining the new language features around type hints.
This is the 26th of the 35 articles that currently make up the “Python 3 Releases” series.
Having looked at Python 3.11’s performance improvements and new exception support features in previous articles, now we turn our attention to some enhancements to type hinting support in this release.
There are a number of changes in this area, which I've summarised below.

- Variadic generics with the new TypeVarTuple.
- Individual items in a TypedDict can now be marked as required or not required.
- The new Self type annotation.
- The new LiteralString type annotation.
- dataclass_transform for decorators which preserve dataclass semantics.
- The future of from __future__ import annotations is uncertain.

As usual, we'll look at all of these in more detail in the following sections.
Since type hinting was first formalised in Python 3.5, the typing
module has contained TypeVar
to represent type variables. This type has some flexibility to support multiple types, but it only ever represents a single type at a time. With a language like Python which supports fully heterogeneous containers, however, that can be quite an onerous limitation. As a result, Python 3.11 brings us typing.TypeVarTuple
, which is equivalent to TypeVar
except that it can represent a tuple of arbitrary types.
I'll try to drill into it in some detail here, but there's a lot more detailed analysis in PEP 646, so if you really want the nitty gritty details, go check it out.
This is perhaps best illustrated with a code snippet. Here we see a function which returns whatever tuple
it’s given but with the first element converted to an int
.
from typing import TypeVarTuple

Ts = TypeVarTuple("Ts")

def convert_first_int(values: tuple[int|str|float, *Ts]) -> tuple[int, *Ts]:
    first, *rest = values
    return (int(first), *rest)
There’s a bit to unpack in the example above, pun intended with apologies, so let’s go through it line by line.
After the imports, on line 3 we declare a type variable using the new TypeVarTuple
— this represents an arbitrary tuple of potentially different types. We need this because our function only deals with the first item of the tuple, and so we need a way to express that we allow any remaining types, and whatever they are in the input they will also be the same in the output.
Now we get to the signature of convert_first_int()
on line 5. The values
parameter is declared with type tuple[int|str|float, *Ts]
. That first type hint of int|str|float
is normal enough, it means that first item must be one of int
, str
or float
— the use of the |
operator is an application of the more concise format for typing.Union
that was added in Python 3.10.
The second clause *Ts
is a new use of the *
operator to “unpack” the tuple of types represented by the type variable Ts
. Thus, the whole specification tuple[int|str|float, *Ts]
represents a tuple
where the first type is one of the three listed, and the remaining types can be anything. The return type of tuple[int, *Ts]
therefore indicates that a tuple
will be returned where the first item is always an int
, and the remaining types will be whatever they were in the original values
.
So TypeVarTuple
is just like Tuple[T1, T2, ...]
where the number of types is arbitrary. An important point to note, however, is that it must always be unpacked with the *
operator when used — it would not be valid to use something like values: Ts
or values: tuple[Ts]
in the example above.
Since this use of *
requires a grammar change, in earlier versions of Python it’s not available — the TypeVarTuple
type is available via the typing_extensions
backport package, but without this grammar change it’s not particularly useful. As a result, there’s also typing.Unpack
which has the same effect. So instead of tuple[*Ts]
you’d write tuple[Unpack[Ts]]
— but I wouldn't bother in Python 3.11: the asterisk syntax is more concise, and it's clearly what the PEP authors intended people to use where possible.
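For illustration, here's roughly how such a signature would be spelled with the backport, assuming the typing_extensions package is installed from PyPI. The first_to_int() name here is just a placeholder:

from typing_extensions import TypeVarTuple, Unpack

Ts = TypeVarTuple("Ts")

# The Python 3.11 spelling tuple[int, *Ts] becomes tuple[int, Unpack[Ts]].
def first_to_int(values: tuple[int, Unpack[Ts]]) -> tuple[int, Unpack[Ts]]:
    ...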
Also it’s worth noting that, unlike TypeVar
, TypeVarTuple
doesn’t yet support constrained types or the keyword parameters like bound
or covariant
. These are likely to come in a future PEP and release, but things have been kept simple for now.
Another important point is that every occurrence of the same TypeVarTuple
variable within a given context must represent the same types. For example, the following code would not be valid:
Ts = TypeVarTuple("Ts")
def my_function(arg1: tuple[*Ts], arg2: tuple[*Ts]) -> None:
...
my_function((1, 2), ("3", "4")) # NOT valid
Finally, only one unpacking is allowed in a given tuple.
Xs = TypeVarTuple("Xs")
Ys = TypeVarTuple("Ys")
a: tuple[*Xs] # Valid
b: tuple[int, *Ys] # Valid
c: tuple[*Xs, *Ys] # NOT valid
*args

The final aspect of TypeVarTuple
I’d like to cover is its use with *args
. According to PEP 484 if *args
is annotated with a type then all the arguments are expected to have the same type.
def my_function(*args: int) -> None:
...
my_function(1, 2, 3) # Valid
my_function(1, 2, "3") # NOT valid
With TypeVarTuple
, however, we can now properly annotate heterogeneous type specifications. Unlike the other examples above, we don’t need to unpack within a tuple
because *args
is already a tuple
— this is the only instance where a type variable like *Ts
can be used directly, as opposed to parameterising something else (e.g. tuple[*Ts]
).
Ts = TypeVarTuple("Ts")
def my_function(*args: *Ts) -> None:
...
my_function(1, 2, "3") # Inferred as tuple[int, int, str]
This extension of the use of *
doesn’t just apply to TypeVarTuple
either — other tuple
type specifications can be used. Take this code, for example.
def my_function(*args: *tuple[int, *tuple[str, ...], int]) -> None:
...
This annotation expects my_function()
to be called with a single int
, followed by zero or more str
values and then a final int
at the end. Of course, at runtime they’ll all be passed in args
as normal.
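To make that concrete, here's how I'd expect a type checker to treat a few calls under that annotation:

my_function(1, 2)              # Valid: zero str values in the middle
my_function(1, "a", "b", 2)    # Valid
my_function(1, "a", "b")       # NOT valid: the final argument must be an int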
TypedDict

Python 3.8 introduced the TypedDict
class as a way to add type hints to specific keys within a dictionary. As described in PEP 589 the keys can either be all required, which is the default, or made all optional by passing total=False
to the base class [1], as in the snippet below.
class AllOptional(TypedDict, total=False):
one: int
two: str
three: float
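With that definition a type checker will allow any of the keys to be omitted, but it still checks the types of those which are supplied. The assignments below show how I'd expect a checker such as mypy to respond:

a: AllOptional = {}                  # Valid: every key is optional
b: AllOptional = {"one": 1}          # Valid
c: AllOptional = {"one": "hello"}    # NOT valid: "one" must be int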
In Python 3.11 this has been further enhanced, as per PEP 655, to allow some fields to be marked as optional whilst allowing others to be required. This has been achieved by adding two new identifiers to the typing
module which are Required
and NotRequired
— these might seem a slightly convoluted choice, but given that Optional
already means “a type or None
“, different nomenclature was needed. The reason both are required is to cater for cases where total=True
as well as total=False
.
The snippet below contains two class definitions which are equivalent in their notions of which fields are required and which are optional.
from typing import NotRequired, Required, TypedDict
class First(TypedDict):
one: int
two: str|None
three: NotRequired[float]
class Second(TypedDict, total=False):
one: Required[int]
two: Required[str|None]
three: float
This is all fairly straightforward, but one point to note is the use of str|None
instead of the more usual Optional[str]
. This is recommended by the PEP because of the understandable confusion should one write Required[Optional[str]]
, though it would still be both syntactically and semantically correct.
Let’s see an example of mypy
identifying a violation of these rules. Firstly, here’s the code.
typeddict.py

from typing import NotRequired, TypedDict


class MyStructure(TypedDict):
    one: int
    two: str
    three: NotRequired[float]


def print_structure(arg: MyStructure) -> None:
    print(arg["one"])
    print(arg["two"])
    print(arg.get("three", "<unset>"))


print_structure({"two": "hello"})
And here’s the result of running mypy
over it.
$ mypy --python-version 3.11 typeddict.py
typeddict.py:16: error: Missing key "one" for TypedDict "MyStructure" [typeddict-item]
Found 1 error in 1 file (checked 1 source file)
get_type_hints()

One final point is that these hints are filtered out of the results of typing.get_type_hints()
, unless you specify include_extras=True
in the call.
>>> from typing import NotRequired, Required, TypedDict, get_type_hints
>>>
>>> class First(TypedDict):
... one: int
... two: str|None
... three: NotRequired[float]
...
>>> get_type_hints(First)
{'one': <class 'int'>, 'two': str | None, 'three': <class 'float'>}
>>> get_type_hints(First, include_extras=True)
{'one': <class 'int'>, 'two': str | None, 'three': typing.NotRequired[float]}
Self Type Annotation

It's quite often the case that one has to refer to the current class in the signature of a method, often when a method needs to return a new instance of the class. This can sometimes be the case for normal methods, and also for class methods which act as alternative constructors. As of Python 3.11, this has been made more convenient and intuitive by adding typing.Self
, whose use I’ll illustrate below.
Consider adding type annotations to a normal method which must return an instance of the class. To see how the new typing.Self
helps us, let’s first see how to do this without it.
A simple approach to this is just to annotate with the name of the class itself, as in the example below.
from __future__ import annotations


class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    # Return a new instance which is a copy of this one.
    def copy(self) -> Person:
        return Person(self.name, self.age)
This works, but has two issues. Firstly it only works using from __future__ import annotations
, whose future is somewhat in doubt (as discussed later in this article) — you could work around that by using a string literal instead, though. Secondly, if Person
is subclassed then this method is still annotated as returning the base class, which is going to cause type checkers some problems.
An option which is better in some ways is to use a TypeVar
which is bound to the class, as follows.
from typing import TypeVar

PersonT = TypeVar("PersonT", bound="Person")


class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    # Note both the bound TypeVar and the annotation on self.
    def copy(self: PersonT) -> PersonT:
        return type(self)(self.name, self.age)
This works, but it’s fiddly — you need to remember to bind the TypeVar
, and you also need to annotate self
which isn’t normally done, hence is easy to forget.
Now we can look at the new annotation typing.Self
that has been added in Python 3.11. It is essentially just an alias for a TypeVar
bound to the current class, as in the example immediately above, but it’s significantly simpler to use — you can see how in the updated example below.
from typing import Self


class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    # No TypeVar, and no annotation on self required.
    def copy(self) -> Self:
        return type(self)(self.name, self.age)
Self
can be used in most places you'd expect, but class methods will be another common case, so let's see an example of that as well. In the code snippet below, the Person
class has been updated to include a from_csv()
method which is passed a string which contains a comma-separated row of values, and is expected to construct a Person
instance from it and return it.
from typing import Self


class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    @classmethod
    def from_csv(cls, row: str) -> Self:
        name, age = row.split(",")
        return cls(name.strip(), int(age))
Importantly, the code here continues to work if included in a subclass, as nothing here specifically mentions the Person
base class in the code or annotations.
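To illustrate that point, here's a brief hypothetical continuation of the example above, where a subclass picks up the from_csv() method and a checker infers the subclass as the return type:

class Employee(Person):
    ...

worker = Employee.from_csv("Bobby Tables,13")
reveal_type(worker)    # A checker such as mypy reports "Employee", not "Person"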
Injection attacks are one of the most common ways of subverting software. This is where carefully crafted user input is provided in such a way as to cause software to process it in a way which wasn't anticipated by the author. A classic example is SQL injection [2], where user input is used unchecked in a query string.
To prevent this you need to sanitise your inputs somehow before using them. There are various domain-specific ways of doing this, such as parameterised queries in SQL. In addition, some languages also offer more general approaches to this problem, such as the taint checking which is offered by languages like Perl and Ruby.
Python doesn’t offer generalised taint checks, though there are some static analysis tools which claim to do this — the Pyre type checker also includes the Pysa static analyser which performs a taint analysis. Exploring Pyre in detail is still on my “to do” list, however, and may well form the topic of a future blog article.
That said, in Python 3.11 there’s a new type annotation which will be of some help in preventing injection attacks in cases like passing SQL queries. This is the typing.LiteralString
annotation, which is described in PEP 675.
You may remember that a somewhat similar-sounding annotation, typing.Literal
, was added back in Python 3.8 — I briefly described it in an earlier article. This allowed you to annotate that a particular parameter must have one of a pre-determined set of specific literal values.
LiteralString
is a generalisation of this, which permits any string value but only if it has been constructed from literals which are sourced from within the code, and not any user input.
So, if you consider the execute()
method of the sqlite3.Cursor
object, you could annotate it as follows:
from collections.abc import Iterable
from typing import LiteralString

class Cursor:
    ...
    def execute(
        self,
        sql: LiteralString,
        parameters: Iterable[str]|dict[str, str] = (),
        /):
        ...
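With that annotation in place, a type checker can distinguish safe calls from unsafe ones. The fetch_user() function below is purely illustrative, using the annotated Cursor above:

def fetch_user(cursor: Cursor, user_id: str) -> None:
    # Parameterised query: the SQL string is built entirely from literals.
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))    # OK
    # Interpolating user input means the SQL is no longer a LiteralString.
    cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")       # Flagged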
At runtime this will have no effect, as with other type annotations — indeed, at runtime a literal string is just a plain str
like any other, so this is too late to apply any checks. Within the type checker, this information is already tracked in order to implement the checks for Literal
, so LiteralString
uses this same machinery — in essence it’s a superset of all possible Literal
values.
For completeness, all of the following cases are compliant with LiteralString:

- A string literal in the source (e.g. x = "hello").
- Addition of two values which are both LiteralString (e.g. y = x + " world").
- sep.join(items) is compliant if sep is a LiteralString and items is an Iterable[LiteralString].
- Formatting with the % operator where all the inputs are LiteralString, and similarly with str.format().
- Other str methods which preserve the LiteralString status of the string.
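Here's a short sketch of how those rules play out in practice, where run_query() is just a hypothetical function taking a LiteralString parameter:

from typing import LiteralString

def run_query(sql: LiteralString) -> None:
    ...

table: LiteralString = "users"
query = "SELECT * FROM " + table              # Still a LiteralString
run_query(query)                              # OK

user_input = input("Enter a table name: ")
run_query("SELECT * FROM " + user_input)      # NOT valid: includes user input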
Whilst full-scale taint checking would be nice, this probably covers a pretty high proportion of the common cases all on its own, since it's mostly string inputs from users which cause the problems. Hence, I'm glad to see pragmatic steps being taken to allow these security flaws to be detected earlier.
Note: This is a bit of an obscure feature, and unlikely to be of interest unless you want to implement your own equivalent of dataclasses.dataclass
in the Python library. I mention this now so you can skip to the next section if you’re not interested.
In Python 3.7 the dataclasses
module was introduced, offering the @dataclass
class decorator which made a class into a sort of a mutable namedtuple
. I went through this in an earlier article in the series.
Type checkers generally have good support for this module and its decorator, since it's part of the standard library. However, there are also popular third party libraries which offer similar facilities, and these are less well supported — one well-known example is pydantic, which adds runtime checks on class attributes based on annotations, typically to implement services such as HTTP-based APIs.
The main change here is the addition of a typing.dataclass_transform
decorator. This can be applied to a decorator function, class or metaclass, and hints to a type checker that this decorator endows classes it creates with additional dataclass-like behaviours.
Since this is potentially a little confusing, just to clarify — the @dataclass_transform
decorator is applied to the decorator that you write which itself is used to decorate classes. Perhaps an example might make this clearer — I won’t give an implementation of the decorator function, since it would unnecessarily complicate this example with a lot of additional code.
from typing import dataclass_transform
# This is the equivalent of dataclasses.dataclass().
@dataclass_transform()
def my_dataclass_decorator(cls):
"""Adds __init__() etc. and returns updated class."""
...
# And this is how you would use the decorator defined above.
@my_dataclass_decorator
class MyClass:
one: int
two: str
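With that in place, a type checker should treat MyClass much as if it had been decorated with @dataclass itself. For example:

obj = MyClass(one=1, two="hello")    # Synthesised __init__() is understood
bad = MyClass(one=1)                 # NOT valid: missing argument "two"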
The specific changes that are assumed to be supported by such decorators are:

- Synthesis of an __init__() method based on the declared data fields.
- Synthesis of the rich comparison methods (__eq__(), __gt__(), etc.).

However, not all of these are necessarily always implemented by such a decorator. As with @dataclass
, it’s assumed that the decorator can take parameters to customise these changes. For example, @dataclass
can accept init=False
to disable the generation of an __init__()
method, or order=True
to add additional rich comparison operators like __gt__()
, whereas by default only an equality comparison is added.
Type checkers are expected to honour the same parameters as accepted by @dataclass
, and assume that they provide the same function. Also, since the default values of these may differ between @dataclass
and the third party decorator, the @dataclass_transform
decorator itself can take parameters to specify the default in use. For example, if called as @dataclass_transform(eq_default=False)
then if the caller of the third party decorator doesn’t provide an eq
argument then the value will be assumed to be False
— in the standard library @dataclass
, the default would be True
.
The parameters that @dataclass_transform
accepts are listed below, along with the decorator parameter whose default each one sets. The meanings of the decorator parameters can be found in the standard library documentation for @dataclass.
@dataclass_transform param | Decorator param
---|---
eq_default | eq
order_default | order
kw_only_default | kw_only
As well as these, there’s also a field_specifiers
parameter to @dataclass_transform
, which specifies a list of supported classes to describe fields — i.e. classes which provide equivalent functionality to that of dataclasses.field
in the standard library.
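To put those parameters together, here's a hedged sketch of how a third party library might declare its decorator. The model() and model_field() names are hypothetical stand-ins for a library's equivalents:

from typing import dataclass_transform

def model_field(*, default=None, kw_only=False):
    """Hypothetical equivalent of dataclasses.field()."""
    ...

@dataclass_transform(kw_only_default=True, field_specifiers=(model_field,))
def model(cls):
    """Adds __init__() etc., treating fields as keyword-only by default."""
    ...
    return cls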
The only other aspect that I want to mention here is that there is a small runtime change as well as the annotation aspect — a new attribute __dataclass_transform__
is added to the decorator function or class for introspection purposes. This will be a dict
containing the parameters which were passed to @dataclass_transform
.
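As a quick illustration, inspecting the decorator from the earlier sketch might look something like this. The exact dict layout is an implementation detail of the 3.11 typing module, so treat this output as indicative rather than verbatim:

>>> my_dataclass_decorator.__dataclass_transform__
{'eq_default': True, 'order_default': False, 'kw_only_default': False, 'field_specifiers': (), 'kwargs': {}}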
That’s about as much detail as I’d like to go into, but do have a read through PEP 681 if you want the full details. You might also like to peruse the original PEP 557 which described the original dataclasses
module.
The final change I’d like to discuss here isn’t really a change but a lack of a change — or perhaps a change of plans. First some context, for those who don’t keep up with Python development closely.
Back in Python 3.7, PEP 563 was introduced which postponed evaluation of annotations — instead of being processed at parsing time, they’re preserved in __annotations__
in string form for later use. The main goals of this PEP were twofold:

- To allow forward references to be used in annotations, without needing to quote them as string literals.
- To remove the runtime cost of evaluating annotations at module import time.
These changes were present in Python 3.7, but had to be activated with the use of from __future__ import annotations
at the start of the source file. The original intention was to then make this behaviour the default in Python 3.10 — i.e. the __future__
import would no longer be required, and the change would impact everyone.
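As a quick refresher, this snippet shows the postponed behaviour in action. With the __future__ import, the forward reference to Person is fine even though the class is defined further down the module, and the annotations are stored as plain strings:

from __future__ import annotations

def greet(who: Person) -> str:
    return "Hello, " + who.name

class Person:
    def __init__(self, name: str) -> None:
        self.name = name

print(greet.__annotations__)    # {'who': 'Person', 'return': 'str'}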
Around April 2021, however, this change was deferred and moved out of Python 3.10. This was done because it transpired that various people had started to use annotations for purposes other than type hints, and deferring their execution would break this code. The example which seemed to be creating the most noise was the pydantic project, and its use in the FastAPI framework — this is apparently a Python framework for building HTTP-based APIs.
Roll on Python 3.11, and the decision appears to have been deferred again. However, this time the announcement explicitly mentioned the possibility that PEP 563 may never be accepted as the default.
In the interest of full transparency, we want to let the Python community know that the Steering Council continues to discuss PEP 563 (Postponed Evaluation of Annotations) and PEP 649 (Deferred Evaluation Of Annotations Using Descriptors). We have concluded that we lack enough detailed information to make a decision in favor of either PEP. As we see it, adopting either PEP 563 or PEP 649 as the default would be insufficient. They will not fully resolve the existing problems these PEPs intend to fix, will break some existing code, and likely don’t address all the use cases and requirements across the static and dynamic typing constituents. We are also uncertain as to the best migration path from the current state of affairs.
The PEP 649 to which they refer is an alternative approach which involves deferring the construction of __annotations__
to the point where it’s first queried, after which point forward references will have most likely been resolved. It also means the overhead of constructing it is only incurred when it’s actually queried — the value is cached, so the overhead is still only incurred once.
You might also like to read this message from Łukasz Langa, the author of PEP 563, where he discusses his take on the situation (as of April 2021). I think it’s a really clear summary of the issues, and a great way to catch up.
So what are we to do with all this right now?
Well, it seems to me that it’s unlikely that PEP 563 will be accepted to become the default at this point — it breaks things for a sizeable community of users, and there doesn’t seem to be any way of preventing that without a major change on one side or the other. None of us have a crystal ball, mind you, that’s just my opinion.
That said, PEP 649 isn’t accepted at all as yet, so the from __future__ import co_annotations
it uses can't be used even in Python 3.11. As a result, as I see it there are only two options for most developers who want to use type hints:

1. Carry on using from __future__ import annotations where needed, and accept the risk that its behaviour may change in a future release.
2. Avoid forward references wherever possible, falling back on string literal annotations in the few places they can't be avoided.
The second option did get significantly easier in a few common cases with the addition of typing.Self
in Python 3.11, so maybe that’s the way I’ll be going — I can usually find ways to structure my code around any other use of forward references, since I’m rather used to doing the same sort of thing in C/C++ anyway.
Frankly, it's not a great situation. If you can avoid use of PEP 563 features for now, I would suggest your life is potentially going to be easier in future — but I wouldn't hold back from using type hints just for that reason, as it's probably not a massively difficult change to update later, particularly if you only use from __future__ import annotations
in the specific source files where it’s needed, because then you have an easier thing to search for to find which source files might need updates later as things change.
Overall, though, I definitely hope some conclusive decision is taken before Python 3.12 — I suspect in terms of total overall pain caused, the current indecision is probably hurting more than either of the choices would do.
That's about it for this topic. The subject of type hints is starting to get into the weeds now, but that's a promising sign — it probably means that the simple problems are all solved and there's relatively little to hold people back from type hinting almost all of their codebase.
Of the items I’ve discussed above, I have to say that typing.Self
, arguably the simplest feature, is the one I’m going to most appreciate. The rest of them are a little more niche, but mostly I’m glad that type hinting is still getting a good deal of attention as I feel it’s a really solid step any developer can take to not only catch bugs earlier, but make their code more comprehensible to newcomers.
I think that's about it for the major new features in Python 3.11, so in the next article I'll pick out any of the smaller changes I think are worth highlighting, take a look at the new modules added, and make a start on the updates to existing modules.
[1] More accurately, keyword parameters in the base class specification are passed to the __init_subclass__()
class method of the base class, which was introduced in Python 3.6 and which I discussed in a previous article in this series. ↩
[2] If you're writing code which uses SQL and you don't know about SQL injection, go read up on it right now (or at least before you write any more code) before some Little Bobby Tables teaches you the lesson in a more painful way. ↩