The Python Language Reference

I’ve been reading the Python 3.7 Language Reference and will use this post to write up bits that were new to me so that I don’t forget them immediately. Sections are named after the corresponding section of the language reference.

2 Lexical analysis

Bytes literals are produced like string literals, with a prefix b or B:

bytesliteral = b"abc"
type(bytesliteral) # bytes

Adding an r prefix to a byte or string literal makes it a raw string: backslashes are treated literally.
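A quick check of what "raw" means in practice. The variable names are just for illustration:

```python
# Without r, "\n" is a single newline character.
plain = "a\nb"
# With r, the backslash and the "n" are kept as two separate characters.
raw = r"a\nb"
raw_bytes = rb"a\nb"  # the prefixes combine for bytes literals too

print(len(plain), len(raw), len(raw_bytes))  # 3 4 4
```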

Multiple adjacent string or bytes literals are concatenated: "hello" ' world' is equivalent to "hello world". This concatenation happens at compile time, as opposed to +, which concatenates at run time.
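A small sketch of where implicit concatenation is handy, such as splitting a long literal across lines inside parentheses. The names are illustrative only:

```python
# Adjacent literals are joined by the compiler into one constant.
greeting = "hello" ' world'
# Splitting a long literal across lines inside parentheses.
message = (
    "first part, "
    "second part"
)
data = b"ab" b"cd"  # bytes literals concatenate the same way
# Mixing str and bytes literals is a SyntaxError, so "a" b"b" will not compile.
```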

In f-strings, we can use !r or !s to control whether the substituted expression is stringified using repr() or str():

state = "pure"
f"I scarce believe my love to be so {state!r}"
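Comparing the two conversions side by side shows the difference; there is also a third one, !a, which applies ascii():

```python
state = "pure"
# !s applies str(), which for a str is the same as no conversion.
with_str = f"I scarce believe my love to be so {state!s}"
# !r applies repr(), which adds the quotes.
with_repr = f"I scarce believe my love to be so {state!r}"
print(with_str)   # I scarce believe my love to be so pure
print(with_repr)  # I scarce believe my love to be so 'pure'
```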

3 Data model

The description of exactly which things are called mutable is a little confusing. At first: “Objects whose values can change are called mutable, objects whose value is unchangeable once they are created are called immutable.” But then (in the same paragraph!) “mutability is not strictly the same as having an unchangeable value”, because an “immutable” container type like a tuple that contains (references to) mutable objects such as lists can change value when those objects are modified. At a guess, the best way to understand the usage of mutable/immutable here is that certain types are simply declared to be mutable and others immutable.

There is a hint that it’s possible to modify the type of an object “under certain controlled conditions” but this “can lead to some very strange behaviour”.
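One such controlled condition, as far as I can tell, is reassigning __class__ between two plain user-defined classes. The class names below are hypothetical, and this is only a sketch of the mechanism, not a recommendation:

```python
class Caterpillar:  # hypothetical example class
    def describe(self):
        return "crawls"

class Butterfly:  # hypothetical example class
    def describe(self):
        return "flies"

insect = Caterpillar()
# Reassigning __class__ changes the object's type in place; CPython allows this
# when both classes are plain user-defined classes with compatible layouts.
insect.__class__ = Butterfly
print(type(insect).__name__, insect.describe())  # Butterfly flies
```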

The standard type hierarchy

There’s an Ellipsis type whose single value you can write as ... (or Ellipsis), but it doesn’t seem to have any uses beyond serving as syntactic sugar in NumPy slices.
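To see how ... reaches a library's slicing code, here is a hypothetical class whose __getitem__ just returns the key it receives:

```python
# ... is simply a literal for the singleton Ellipsis object.
print(... is Ellipsis, type(...).__name__)  # True ellipsis

class ShowKey:
    """Hypothetical helper: returns whatever key indexing passes in."""
    def __getitem__(self, key):
        return key

# Inside subscripts, ... is passed through like any other value,
# which is how libraries such as NumPy can give it a meaning.
print(ShowKey()[1, ..., 2])  # (1, Ellipsis, 2)
```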

Bool really is a subtype of int with different string conversion behaviour.
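A few quick checks of what "subtype of int" means in practice:

```python
print(isinstance(True, int))     # True
print(True + True)               # 2: arithmetic treats True as 1
print(str(True), str(1))         # True 1: the string conversion differs
print(sum([True, False, True]))  # 2, a common idiom for counting
```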

You can create tuples without any parens: a = 1, 2, 3 is fine.
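It is the commas that make the tuple, which also explains the one-element case:

```python
a = 1, 2, 3    # the commas, not the parentheses, make the tuple
b = 1,         # a trailing comma gives a one-element tuple
c = (1)        # parentheses alone do not: this is just the int 1
x, y, z = a    # the same comma syntax works for unpacking
```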

Function types

There are many interesting attributes for user-defined functions.

__doc__ is the docstring, or None
__defaults__ is a tuple of default argument values for the parameters that have them, or None if no parameter has a default. Defaults for keyword-only parameters are stored separately, in the dict __kwdefaults__.
__code__ is the code object, representing byte-compiled Python code (bytecode).
__globals__ refers to the dictionary holding the function’s global variables.
__annotations__ is a dict containing type annotations of parameters, if there were any.

def f(x : int) -> str:
    return str(x + 2)
ann = f.__annotations__
ann # {'x': <class 'int'>, 'return': <class 'str'>}
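The other attributes from the list can be inspected the same way. The function h below is a made-up example:

```python
def h(a, b=2, *, c=3):
    """Add three numbers."""
    return a + b + c

print(h.__doc__)                # Add three numbers.
print(h.__defaults__)           # (2,): only parameters that have defaults appear
print(h.__kwdefaults__)         # {'c': 3}: keyword-only defaults live in a dict
print(h.__code__.co_varnames)   # ('a', 'b', 'c'): local names recorded in the code object
print(h.__globals__ is globals())  # True when h is defined at module level

def no_defaults(a):
    pass

print(no_defaults.__defaults__)  # None
```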

__closure__ is None or a tuple of cells holding the bindings of the function’s free variables, i.e. non-global variables taken from an enclosing scope. These come up with a function defined inside another function, for example:

def a(x):
    t = 2
    def b(y):
        return x + t + y
    return b

Now f = a(3) is a function whose __closure__ is not None: for example, f.__closure__[0].cell_contents is 2, the value of the variable t, which is local to a and used as a free variable in the definition of b.
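Rather than relying on the position of each cell, one way to see which cell belongs to which variable is to pair __closure__ with the code object's co_freevars, which lists the free variable names in the same order:

```python
def a(x):
    t = 2
    def b(y):
        return x + t + y
    return b

f = a(3)
# One cell per name in co_freevars, in the same order.
cells = dict(zip(f.__code__.co_freevars, (c.cell_contents for c in f.__closure__)))
print(cells["t"], cells["x"])  # 2 3
print(a.__closure__)           # None: a itself uses no free variables
```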

Coroutine functions and asynchronous generator functions

Defined with async def, these are for concurrency. They’re above my pay grade right now.

4 Execution model

Blocks

The following things constitute a Python code block:

modules
function bodies
class definitions
single commands when interacting with the interpreter
a script file passed to the interpreter
a script command passed to the interpreter with -c, e.g. python -c 'print("hello")'
a module run via python -m modulename
the argument to eval or exec

Naming and binding

nonlocal makes an identifier refer to the variable with the same name in the nearest enclosing non-global scope. It’s an error if no such variable exists.
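The classic use is a counter kept in an enclosing function. This is a minimal sketch with hypothetical names; the last part checks that a nonlocal with no matching binding is rejected when the code is compiled:

```python
def make_counter():
    count = 0
    def increment():
        # Without nonlocal, "count += 1" would make count local to increment
        # and fail with UnboundLocalError.
        nonlocal count
        count += 1
        return count
    return increment

counter = make_counter()
counter()
print(counter())  # 2

# No enclosing function binds z, so this fails at compile time.
try:
    compile("def f():\n    nonlocal z\n", "<example>", "exec")
    raised = False
except SyntaxError:
    raised = True
print(raised)  # True
```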

If a local variable is defined in a block, its scope includes that block and, for a class block, is limited to it: names bound in a class body are not visible inside the methods defined there. If the variable is defined in a function block, its scope also extends to any blocks nested inside the function (unless a nested block binds the same name again).
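The contrast between function blocks and class blocks, sketched with hypothetical names:

```python
def outer():
    n = 1
    def inner():
        return n  # n, bound in the enclosing function block, is visible here
    return inner()

class Config:  # hypothetical example class
    size = 10
    doubled = size * 2  # fine: still directly inside the class block

    def get_size(self):
        return self.size  # class attributes must be reached through self

class Broken:  # hypothetical: shows the failure mode
    size = 10

    def get_size(self):
        return size  # NameError when called: class-level names are not in scope here

try:
    Broken().get_size()
    broken_raised = False
except NameError:
    broken_raised = True
```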

You can’t use a variable from an enclosing scope in a block and then create a local with the same name. For example,

a = 42

def g(x):
    b = a+2
    a = 99
    return a * 1000*b + 100000*x

is an error: defining g is fine, but calling it raises UnboundLocalError (“local variable 'a' referenced before assignment” in Python 3.7). Binding the name a anywhere inside the function body makes a local throughout that body, so all references to a refer to the new local, even those which appear before the binding takes place.
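A sketch showing that the error only surfaces on the call, plus one possible fix. The renamed local a_local is a hypothetical choice; declaring global a would be another option if rebinding the global is really intended:

```python
a = 42

def g(x):
    b = a + 2  # a is local to g (it is assigned below) and not yet bound here
    a = 99
    return a * 1000 * b + 100000 * x

def g_fixed(x):
    b = a + 2        # no assignment to a in this function, so a is the global
    a_local = 99     # hypothetical rename avoids shadowing the global
    return a_local * 1000 * b + 100000 * x

try:
    g(1)
    raised_unbound = False
except UnboundLocalError:
    raised_unbound = True

print(raised_unbound, g_fixed(1))  # True 4456000
```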

Free variable name resolution happens at runtime: the example in the docs is that

i = 10
def f():
    print(i)
i = 42
f()

prints 42, not 10.

6 Expressions

Generator expressions

A generator expression looks like a list comprehension but uses parentheses instead of square brackets:

a = (x + 2 for x in range(100))

produces a generator object. You can iterate over it with for i in a, or call next(a) (equivalent to a.__next__()) to get the next value by hand. Either way, it can only be consumed once.
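A short sketch of stepping through the generator by hand and then exhausting it:

```python
a = (x + 2 for x in range(100))
first, second = next(a), next(a)   # 2, 3
rest = sum(a)                      # consumes the remaining 98 values
leftover = list(a)                 # []: the generator is now exhausted
# When a generator expression is the only argument, the extra parentheses can go.
total = sum(x * x for x in range(4))  # 0 + 1 + 4 + 9
```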

Special syntax for unpacking

A formal parameter written as **name receives a dict mapping names to values for any keyword arguments in the call that don’t match other formal parameters. For example, in

def f(x, y = 2, **d):
    print(d)
    return x + y + d["z"]

f(1, z=4, y=5)

the dict {'z': 4} will be printed and the result 10 returned.

The double asterisk syntax can be used in calls to pass a dict of keyword arguments, e.g. f(1, 4, **{'z' : 50}) works for the function above. A single asterisk just unpacks an iterable into positional arguments in place, e.g. if y were an iterable with two values y1 and y2 then g(x, *y, z) would be equivalent to g(x, y1, y2, z).
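Both kinds of unpacking in action. The g here is a hypothetical function that just returns its positional arguments, collected with the matching *args form:

```python
def f(x, y=2, **d):
    return x + y + d["z"]

print(f(1, 4, **{"z": 50}))  # 55: x=1, y=4, d={'z': 50}

def g(*args):
    # *args collects positional arguments into a tuple.
    return args

y = [10, 20]
print(g(0, *y, 30))          # (0, 10, 20, 30)
print(g(*"ab", *range(2)))   # ('a', 'b', 0, 1): several * unpackings in one call
```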

Weird NaN behaviour

float('NaN') and its decimal equivalent, decimal.Decimal('NaN'), give false when compared with <, > or == to any number, themselves included, so a NaN is not equal to itself. On the other hand, built-in containers such as lists assume that identical objects (ones for which x is y is true) are equal. [float('NaN')] == [float('NaN')] is false: the two calls produce distinct objects, so the elements are compared with ==, which gives false. But if nan = float('NaN'), then [nan] == [nan] is true…
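The same behaviour checked directly, with math.isnan as the reliable NaN test:

```python
import math

nan = float("nan")
print(nan == nan, nan != nan, nan < 1, nan > 1)  # False True False False
print([nan] == [nan])                      # True: the identity check comes first
print([float("nan")] == [float("nan")])    # False: distinct objects, so == is used
print(nan in [nan])                        # True, for the same identity reason
print(math.isnan(nan))                     # True
```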

Left to right evaluation

Expressions are evaluated from left to right. In each of the following lines, the subexpressions would be evaluated in the numerical order of their suffixes:

e1 + e2 * (e3 + e4)
e1(e2, e3, *e4, e5)
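One way to observe this is a hypothetical helper e that records when each numbered subexpression runs:

```python
order = []

def e(n, value=None):
    """Hypothetical helper: record that subexpression n ran, then return a value."""
    order.append(n)
    return n if value is None else value

e(1) + e(2) * (e(3) + e(4))
first_order = list(order)    # [1, 2, 3, 4]

order.clear()
# e(1, ...) returns the function to call; its arguments are then evaluated left to right.
result = e(1, lambda *args: args)(e(2), e(3), *e(4, [40, 41]), e(5))
second_order = list(order)   # [1, 2, 3, 4, 5]
```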

Boolean quirks

not x always returns a bool, whatever x is, but x or y returns x itself if x is truthy and y otherwise. For example, if s is a string then s or "default" is s if it is nonempty and "default" otherwise.
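A few checks, including the mirror-image behaviour of and, which returns its first falsy operand (or the last one):

```python
s = ""
print(not s)               # True: not always returns a bool
print(s or "default")      # default: s is falsy, so the right operand is returned
print("abc" or "default")  # abc
print(0 and 1 / 0)         # 0: and stops at the first falsy value, so 1 / 0 never runs
print([] and "x", [1] and "x")  # [] x
```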

Mutable default arguments

A great way to mess up. Default values for function parameters are calculated once, when the function is defined.

def f(x):
    print("f called")
    return x

def g(a, b=f(4)):
    return a + b

Interactively, you will see "f called" as soon as you execute the definition of g.

This means if you have a mutable default, awful things can happen:

def f(x = []):
    return x

If you append 1 to the list returned by f(), for example, subsequent calls to f with no argument return [1], since every call returns the same list object, until you modify that list in some other way.
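The bug reproduced, next to the common workaround of using None as a sentinel (f_safe is a hypothetical name):

```python
def f(x=[]):
    return x

f().append(1)
print(f())             # [1]: the same list object is returned on every call
print(f.__defaults__)  # ([1],): the shared list lives in the function's defaults

def f_safe(x=None):
    # Build a fresh list on each call instead of sharing one default.
    if x is None:
        x = []
    return x

f_safe().append(1)
print(f_safe())        # []
```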

7 Simple statements

The built-in constant __debug__ controls whether asserts are checked. It is normally True and you can’t assign to it; the only way to make it False is to request optimization with -O on the command line. assert e1, e2 is equivalent to raising AssertionError(e2) when __debug__ is true and e1 is false.
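A sketch comparing an assert with the expansion the reference gives for it; check and check_expanded are hypothetical names, and both stay silent if the script is run with -O:

```python
def check(value):
    assert value > 0, "value must be positive"

def check_expanded(value):
    # Roughly what the assert statement above does.
    if __debug__:
        if not value > 0:
            raise AssertionError("value must be positive")

messages = []
for fn in (check, check_expanded):
    try:
        fn(-1)
    except AssertionError as err:
        messages.append(str(err))
print(messages)  # the same message twice (empty under -O)
```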

8 Compound statements

The official terminology is that these are made up of one or more clauses. A clause has a header and a suite. The header is a keyword, maybe some other stuff, then a colon. The suite is usually the indented block coming after the header, though it doesn’t have to be on a separate line: it can also be one or more simple statements on the same line as the header, separated with semicolons.
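One-line suites in practice. Note that all the semicolon-separated statements belong to the suite, so they run or are skipped together:

```python
x = 5
if x > 0: sign = "positive"; parity = x % 2   # both statements form the if suite
for i in range(3): last = i                    # works for loops too

ran = []
if x < 0: ran.append("first"); ran.append("second")   # both skipped together
# A compound statement cannot follow the colon on the same line,
# e.g. "if x: for i in y: ..." is a SyntaxError.
```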

Else in while and for

If a while or for loop has an else clause, it runs when the loop terminates normally (the condition becomes false or the iterable is exhausted) but is skipped if the loop is left via break.
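The typical use is a search loop where else handles "not found". first_divisor is a hypothetical example function:

```python
def first_divisor(n):
    for d in range(2, n):
        if n % d == 0:
            result = d
            break
    else:
        # Runs only if the loop finished without hitting break.
        result = None
    return result

print(first_divisor(15))  # 3
print(first_divisor(13))  # None: no divisor found, so else ran

i = 0
while i < 3:
    i += 1
else:
    finished = True  # the condition became false, so else runs here too
```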

8.5 with

The point of with is to encapsulate common try-except-finally patterns.
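A minimal sketch: a hypothetical context manager that logs its calls, used once with with and once through a hand-written expansion modelled on the semantic equivalent given in later versions of the reference:

```python
import sys

class Managed:
    """Hypothetical context manager that records what happens to it."""
    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append("enter")
        return self  # bound to the name after "as"

    def __exit__(self, exc_type, exc, tb):
        self.log.append("exit")
        return False  # a false value means exceptions are not suppressed

m = Managed()
with m as resource:
    resource.log.append("body")

# Roughly the same thing written out as try/except/finally.
manager = Managed()
enter = type(manager).__enter__
exit_ = type(manager).__exit__
value = enter(manager)
hit_except = False
try:
    resource = value
    resource.log.append("body")
except:
    hit_except = True
    if not exit_(manager, *sys.exc_info()):
        raise
finally:
    if not hit_except:
        exit_(manager, None, None, None)
```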