The Python Language Reference
I’ve been reading the Python 3.7 Language Reference and will use this post to write up bits that were new to me so that I don’t forget them immediately. Sections are named after the corresponding section of the language reference.
2 Lexical analysis
Bytes literals are produced like string literals, with a prefix b or B:
bytesliteral = b"abc"
type(bytesliteral) # bytes
Adding an r prefix to a bytes or string literal makes it a raw string: backslashes are treated literally.
Multiple adjacent string or bytes literals concatenate: "hello" ' world' is equivalent to "hello world". This concatenation happens at compile time, unlike concatenation with the + operator.
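A quick sketch to check the raw-literal and adjacent-literal rules for myself. The variable names are my own and nothing here comes from the reference beyond the rules themselves:

```python
# A raw literal keeps the backslash; the ordinary literal turns \n into a newline.
raw = r"a\nb"
cooked = "a\nb"
print(len(raw), len(cooked))     # 4 3

# Adjacent literals are joined into a single string.
joined = "hello" ' world'
print(joined == "hello world")   # True

# Bytes literals accept the r prefix too.
raw_bytes = rb"\x00"
print(len(raw_bytes))            # 4
```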
In f-strings, we can use !r or !s to control whether the substituted expression is stringified using repr() or str():
state = "pure"
f"I scarce believe my love to be so {state!r}"
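To see the difference between the two conversions side by side, here is a small sketch (the shortened strings and variable names are mine):

```python
state = "pure"
with_repr = f"so {state!r}"   # uses repr(state), so the quotes are kept
with_str = f"so {state!s}"    # uses str(state)
print(with_repr)  # so 'pure'
print(with_str)   # so pure
```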
3 Data model
The description of exactly what things are called mutable is a little confusing. At first: “Objects whose values can change are called mutable, objects whose value is unchangeable once they are created are called immutable.” But then (in the same paragraph!) “immutability is not strictly the same as having an unchangeable value” because an “immutable” container type like a tuple which contains (references to) mutable objects (like lists) can have its value change when we modify the latter. At a guess, the best way to understand the usage of mutable/immutable here is just to declare that certain types are going to be called mutable and others immutable.
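A minimal sketch of the tuple case, with names of my own choosing: the tuple's items cannot be rebound, yet what the tuple prints changes because the list inside it changes.

```python
inner = [1, 2]
t = (inner, "fixed")

inner.append(3)   # mutate the list that the tuple refers to
print(t)          # ([1, 2, 3], 'fixed')

# Rebinding an item of the tuple itself is what is actually forbidden.
try:
    t[0] = []
except TypeError as exc:
    rebinding_error = exc
    print(type(exc).__name__)  # TypeError
```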
There is a hint that it’s possible to modify the type of an object “under certain controlled conditions” but this “can lead to some very strange behaviour”.
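My assumption is that this hint is about assigning to an object's __class__ attribute, which works between ordinary user-defined classes. The classes Cat and Dog below are hypothetical and this is only a sketch of that guess:

```python
class Cat:
    def speak(self):
        return "meow"

class Dog:
    def speak(self):
        return "woof"

pet = Cat()
before = pet.speak()
pet.__class__ = Dog      # the same object now has a different type
after = pet.speak()
print(before, after)     # meow woof
print(type(pet).__name__)  # Dog
```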
The standard type hierarchy
There’s an Ellipsis object, which you can write as ..., but it doesn’t seem to have any use beyond syntactic sugar for numpy slices, see here.
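Without pulling in numpy, a small sketch shows how ... reaches a container's indexing code. ShowIndex is a made-up class that just returns whatever index it is given:

```python
print(... is Ellipsis)       # True
print(type(...).__name__)    # ellipsis

class ShowIndex:
    def __getitem__(self, index):
        # Return the index unchanged so we can inspect it.
        return index

idx = ShowIndex()[..., 0]
print(idx)                   # (Ellipsis, 0)
```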
Bool really is a subtype of int with different string conversion behaviour.
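A few lines to make that concrete (the examples are mine):

```python
print(issubclass(bool, int))    # True
print(True + True)              # 2
print(["no", "yes"][True])      # yes, because True indexes like 1
print(True == 1)                # True
print(str(True), str(1))        # True 1, only the string conversion differs
```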
You can create tuples without any parens: a = 1, 2, 3 is fine.
Function types
There are many interesting attributes for user-defined functions.
- __doc__ is the docstring, or None
- __defaults__ is a tuple containing the default values of those parameters that have them, or None if no parameter has a default. There is a similar attribute, __kwdefaults__, for keyword-only parameters.
- __code__ is the code object, representing byte-compiled Python code (bytecode).
- __globals__ refers to the dictionary holding the function’s global variables.
- __annotations__ is a dict containing type annotations of parameters, if there were any.
def f(x: int) -> str:
    return str(x + 2)
ann = f.__annotations__
ann # {'x': int, 'return': str}
__closure__ has information about the non-global free variables used by a function, such as variables of an enclosing function that a nested function refers to. For example:
def a(x):
    t = 2
    def b(y):
        return x + t + y
    return b
Now f = a(3) is a function whose __closure__ attribute is not None, e.g. f.__closure__[0].cell_contents will be 2, the value of the variable t local to a and used as a free variable in the definition of b.
Details here.
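To poke at the other attributes, here is a sketch with a hypothetical function scale that has one ordinary default and one keyword-only default:

```python
def scale(x, factor=2, *, offset=0):
    """Return x * factor + offset."""
    return x * factor + offset

print(scale.__doc__)                 # Return x * factor + offset.
print(scale.__defaults__)            # (2,)
print(scale.__kwdefaults__)          # {'offset': 0}
print(scale.__code__.co_varnames)    # ('x', 'factor', 'offset')

def no_defaults(x):
    return x

print(no_defaults.__defaults__)      # None
print(no_defaults.__doc__)           # None
```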
Coroutine functions and asynchronous generator functions
Defined with async def, these are for concurrency. They’re above my pay grade right now.
4 Execution model
Blocks
The following things constitute a Python code block:
- modules
- function bodies
- class definitions
- single commands when interacting with the interpreter
- a script file passed to the interpreter
- a script command passed to the interpreter with -c, e.g. python -c 'print("hello")'
- a module run via python -m modulename
- the argument to eval or exec
Naming and binding
nonlocal makes an identifier refer to the variable with the same name in the nearest enclosing non-global scope. It’s an error if no such variable exists.
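The usual illustration is a counter closure. This is a sketch with names I made up (make_counter, increment), not something taken from the reference:

```python
def make_counter():
    count = 0
    def increment():
        nonlocal count   # rebind make_counter's count instead of creating a local
        count += 1
        return count
    return increment

counter = make_counter()
print(counter(), counter(), counter())  # 1 2 3
```

Without the nonlocal line, count += 1 would make count a local of increment and the first call would fail.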
If a local variable is defined in a block, its scope includes (and is sometimes limited to) that block, but if it is defined in a function block its scope extends to any blocks contained in the function definition (unless it is re-defined there).
You can’t use a variable from an enclosing scope in a block and then create a local with the same name. For example,
a = 42
def g(x):
    b = a+2
    a = 99
    return a * 1000*b + 100000*x
is an error when g is called (UnboundLocalError: “local variable 'a' referenced before assignment”) because binding the name a inside the function definition means all references to a must refer to the new local - even those which appear before the new binding takes place.
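Here is the same example made runnable, catching the error so the script continues. The function g_fixed and the name local_a are my additions, showing one way round the problem by not reusing the name:

```python
a = 42

def g(x):
    b = a + 2      # a is local throughout g because of the assignment below
    a = 99
    return a * 1000 * b + 100000 * x

try:
    g(1)
except UnboundLocalError as exc:
    error = exc
    print(type(exc).__name__)  # UnboundLocalError

def g_fixed(x):
    b = a + 2        # reads the outer a
    local_a = 99     # a different name, so a is never local here
    return local_a * 1000 * b + 100000 * x

print(g_fixed(1))    # 4456000
```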
Free variable name resolution happens at runtime: the example in the docs is that
i = 10
def f():
    print(i)
i = 42
f()
prints 42, not 10.
6 Expressions
Generator expressions
Generator expressions (the generator version of a comprehension) are written with parentheses:
a = (x + 2 for x in range(100))
produces a generator object. Now for i in a is OK, or you can call next(a) (equivalently a.__next__()) manually to get the next output.
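A short sketch of consuming one by hand, using a smaller range than above so the output fits on a line:

```python
gen = (x + 2 for x in range(3))
first = next(gen)        # same as gen.__next__()
rest = list(gen)         # consumes whatever is left
print(first, rest)       # 2 [3, 4]

# Once exhausted, a generator raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)         # True
```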
Special syntax for unpacking
Formal parameters named with the **name syntax get passed a dict of parameter names and values, containing any keyword arguments specified in the call which don’t match explicit keyword formal parameters. For example, in
def f(x, y = 2, **d):
    print(d)
    return x + y + d["z"]
f(1, z=4, y=5)
the dict {'z': 4} will be printed and the result 10 returned.
The double asterisk syntax can be used in calls to pass a named parameter-value dict, e.g. f(1, 4, **{'z' : 50}) works for the function above. A single asterisk just unpacks iterables into positional arguments in-place, e.g. if y were an iterable with two values y1 and y2 then g(x, *y, z) is equivalent to g(x, y1, y2, z).
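Both kinds of unpacking in a call can be checked with a sketch like this. The function g that simply returns its positional arguments is my own stand-in, since the text never defines g:

```python
def f(x, y=2, **d):
    return x + y + d["z"]

print(f(1, 4, **{"z": 50}))            # 55

def g(*args):
    # Return the positional arguments so we can see what was passed.
    return args

x, y, z = 0, [1, 2], 3
print(g(x, *y, z))                     # (0, 1, 2, 3)
print(g(x, *y, z) == g(x, 1, 2, z))    # True
```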
Weird NaN behaviour
float('NaN') and its decimal equivalent give false when compared with < or > or == to any number, themselves included, so they are not equal to themselves. On the other hand list comparison assumes that identical objects (ones that make x is y true) are equal. [float('NaN')] == [float('NaN')] is false (the two NaNs are distinct objects, i.e. float('NaN') is float('NaN') is false, so they’re compared for equality), but if nan = float('NaN') then [nan] == [nan] is true…
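The float case is easy to reproduce. This sketch sticks to float and adds math.isnan, which is the reliable way to test for NaN given that == can't be used:

```python
import math

nan = float("nan")
print(nan == nan)                          # False
print(nan < 1, nan > 1)                    # False False
print([nan] == [nan])                      # True, same object so the identity shortcut applies
print([float("nan")] == [float("nan")])    # False, two distinct NaN objects
print(math.isnan(nan))                     # True
```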
Left to right evaluation
Expressions are evaluated from left to right. In each of the following, the subexpressions are evaluated in the numerical order of their suffixes:
e1 + e2 * (e3 + e4)
e1(e2, e3, *e4, e5)
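One way to watch the order is to route each subexpression through a helper that records when it runs. The helper e and the function collect are hypothetical, made up for this sketch:

```python
order = []

def e(n, value):
    # Record that subexpression n was evaluated, then hand back its value.
    order.append(n)
    return value

result = e(1, 10) + e(2, 3) * (e(3, 1) + e(4, 1))
arithmetic_order = list(order)
print(arithmetic_order, result)   # [1, 2, 3, 4] 16

def collect(*args):
    return args

order.clear()
called = e(1, collect)(e(2, "a"), e(3, "b"), *e(4, ["c"]), e(5, "d"))
call_order = list(order)
print(call_order, called)         # [1, 2, 3, 4, 5] ('a', 'b', 'c', 'd')
```

Note that the multiplication is still performed before the addition; it is only the evaluation of the operands that goes left to right.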
Boolean quirks
not x is a bool regardless of x, but x or y returns the value of x or that of y. For example, if s is a string then s or "default" is s if it is nonempty and "default" otherwise.
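A few checks of my own. The line using and is an addition that the text above doesn't mention, but it behaves the same way, returning one of its operands rather than a bool:

```python
print(not "text", not [])      # False True
print("" or "default")         # default
print("given" or "default")    # given
print(0 or None)               # None, the last operand even though it is falsy
print([] and "unreached")      # []
```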
Mutable default arguments
A great way to mess up. Default values for function parameters are calculated once, when the function is defined.
def f(x):
    print("f called")
    return x
def g(a, b=f(4)):
    return a + b
Interactively, you will see "f called" as soon as you execute the definition of g.
This means if you have a mutable default, awful things can happen:
def f(x = []):
    return x
If you append 1 to f(), for example, subsequent calls to f with no argument return [1], until you modify it in some other way, that is.
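Here is the trap in action next to the common workaround of using None as the default and building the list inside the function. Both function names are mine:

```python
def append_shared(item, bucket=[]):
    # bucket is the same list object on every call that omits it
    bucket.append(item)
    return bucket

print(append_shared(1))   # [1]
print(append_shared(2))   # [1, 2]

def append_fresh(item, bucket=None):
    # A new list is created on each call that omits bucket
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_fresh(1))    # [1]
print(append_fresh(2))    # [2]
```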
7 Simple statements
The built-in __debug__ variable controls whether asserts are checked; you can’t modify it except by requesting optimization with -O on the command line. assert e1, e2 is equivalent to raising an AssertionError(e2) when __debug__ is true and e1 is false.
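Spelled out as code, the equivalence looks roughly like this. The two functions are hypothetical, and the expected output assumes the script is run without -O:

```python
def checked_sqrt(x):
    assert x >= 0, "x must be non-negative"
    return x ** 0.5

def checked_sqrt_expanded(x):
    # What the assert statement above amounts to.
    if __debug__:
        if not x >= 0:
            raise AssertionError("x must be non-negative")
    return x ** 0.5

messages = []
for func in (checked_sqrt, checked_sqrt_expanded):
    try:
        func(-1)
    except AssertionError as exc:
        messages.append(str(exc))
print(messages)  # ['x must be non-negative', 'x must be non-negative']
```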
8 Compound statements
The official terminology is that these are made up of one or more clauses. A clause has a header and a suite. The header is a keyword, maybe some other stuff, then a colon. The suite is usually the indented block coming after the header, though it doesn’t have to be on a separate line. Multiple statements in the suite can be separated with semicolons.
Else in while and for
If these have an else clause, it is run when the loop terminates normally but is skipped if the loop is exited with a break.
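The typical use is a search loop, where else handles the "nothing found" case. A sketch with a made-up function:

```python
def find_first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            result = n
            break            # skips the else clause
    else:
        result = None        # runs only if the loop finished without break
    return result

print(find_first_even([1, 3, 4, 5]))   # 4
print(find_first_even([1, 3]))         # None
```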
8.5 with
The point of with is to encapsulate common try-except-finally patterns.
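A minimal sketch of a context manager, to see that the cleanup step runs even when the body raises. Tracker is a hypothetical class that just logs what happens to it:

```python
class Tracker:
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs whether or not the body raised, like a finally clause.
        self.events.append("exit")
        return False   # do not suppress the exception

tracker = Tracker()
try:
    with tracker:
        tracker.events.append("body")
        raise ValueError("boom")
except ValueError:
    tracker.events.append("handled")

print(tracker.events)  # ['enter', 'body', 'exit', 'handled']
```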