A toy model

Consider a system made of several parts. A complete description could be given by knowing the precise state of each of these parts and how they will evolve in the future, but often this is unnecessary and not even useful. To illustrate the concept, let us begin by considering a system made of two coins, each displaying either head ($H$) or tail ($T$). The state of the system is then completely specified by assigning the value $H$ or $T$ to coin 1 and $H$ or $T$ to coin 2. There are therefore 4 possible configurations, or microstates, each one identified by a row in Tab. [*].

Table: Status of coin 1 and coin 2 and the corresponding value of the variable $f = (N_H - N_T)/2$, with $N_H$ and $N_T$ the number of coins displaying $H$ or $T$, respectively.

 Coin 1   Coin 2   $f$
   H        H       1
   H        T       0
   T        H       0
   T        T      -1


Each of the 4 possibilities obviously has the same probability of occurring, so the postulate of equal probability of microstates is not difficult to accept for this system. If we throw the coins many times, we expect each of these four states to occur roughly 25% of the time. Now define the variable $f = (N_H - N_T)/2$, with $N_H$ and $N_T$ the number of coins displaying $H$ or $T$, respectively. The value of $f$ can be -1, 0 or 1, depending on the combination of $H$'s and $T$'s shown by the two coins; note that the value 0 can be obtained with two microstates, while -1 and 1 with only one each. Therefore, if we throw the coins many times, we expect to see the value $f = 0$ more often than the values $f=1$ or $f=-1$ (twice as often, to be precise). If we use the microstates we have a complete description of the system; in any realisation the system will choose one of these microstates at random. If instead we choose to use the value of $f$, the description of the system is incomplete, as it has been coarse-grained to be represented by a collective variable.
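The two-coin counting above is easy to check numerically. As a minimal sketch (the function name and trial count are ours, not from the text), the following simulation throws two fair coins many times and tallies the observed values of $f$; the value $f = 0$ should appear roughly twice as often as $f = 1$ or $f = -1$:

```python
import random

random.seed(0)  # reproducible illustration

def flip_two_coins(trials=100_000):
    """Throw two fair coins `trials` times and tally f = (N_H - N_T)/2."""
    counts = {-1: 0, 0: 0, 1: 0}
    for _ in range(trials):
        n_heads = sum(random.random() < 0.5 for _ in range(2))
        f = (n_heads - (2 - n_heads)) // 2  # f = (N_H - N_T)/2
        counts[f] += 1
    return counts

counts = flip_two_coins()
print({f: c / 100_000 for f, c in counts.items()})
```

With 100,000 throws the observed frequencies sit close to the exact probabilities 1/4, 1/2, 1/4 for $f = -1, 0, 1$.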

Now imagine we have a system formed by $N$ coins. We could write a table equivalent to Tab. [*], by specifying all possible combinations of $H$'s and $T$'s. Such a table would have $2^N$ rows, and the value of $f(N_H,N) = (N_H - N_T)/N$ would again range between -1 and 1. As in the example with only two coins, the values $f=1$ and $f=-1$ can only be realised by a single microstate, all $H$'s or all $T$'s, but the intermediate values can be realised by many combinations. The number of combinations $\omega(N_H,N)$ for which there are exactly $N_H$ heads can be easily worked out. The number of possible ways of arranging $N$ distinct objects in $N$ different positions is $N!$ (the first object can go in any of the $N$ positions, then for any of these choices the second object has $N-1$ possibilities and so on, so the total number of possibilities is $N(N-1)\dots 1$). However, we do not have $N$ distinct objects but two groups of $N_H$ and $N_T$ identical objects, and so any permutation of the identical objects within each group does not result in a different configuration. As a consequence, the total number of possibilities is $N!/(N_H!\,N_T!)$, which can be rewritten as:

$\displaystyle \omega(N_H,N) = \frac{N!}{N_H!(N-N_H)! } = \binom{N}{N_H}.$ (3.1)
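The counting in Eq. (3.1) can be checked directly. As a small sketch (the function name `omega` is ours), Python's `math.comb` computes the binomial coefficient, and for $N = 2$ it reproduces the rows of the table: one microstate each for $f = \pm 1$ and two for $f = 0$:

```python
from math import comb, factorial

def omega(n_heads, n):
    """Number of microstates with exactly n_heads heads among n coins,
    i.e. the binomial coefficient C(n, n_heads) of Eq. (3.1)."""
    return comb(n, n_heads)

# N = 2 reproduces the table: one way each for f = +1 and f = -1, two for f = 0
assert omega(0, 2) == 1 and omega(1, 2) == 2 and omega(2, 2) == 1

# comb(n, k) is exactly n! / (k! (n-k)!), the form derived in the text
assert omega(30, 100) == factorial(100) // (factorial(30) * factorial(70))
```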

We can easily verify that the total number of possibilities, obtained by summing $\omega(N_H,N)$ over all possible values of $N_H$, is indeed $2^N$:

$\displaystyle \Omega(N) = \sum_{N_H=0}^N \omega(N_H,N) = \sum_{N_H=0}^N \binom{N}{N_H} = (1+1)^N = 2^N,$ (3.2)
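The normalisation of Eq. (3.2) can likewise be verified by brute force for moderate $N$; this short sketch (the helper name `total_weight` is ours) sums the weights over all $N_H$ and compares against $2^N$:

```python
from math import comb

def total_weight(n):
    """Sum omega(N_H, N) over N_H = 0..N; equals 2**N by the binomial theorem."""
    return sum(comb(n, n_h) for n_h in range(n + 1))

for n in (2, 10, 50):
    assert total_weight(n) == 2 ** n
```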

where the last equality comes from the usual expansion of the $N^{th}$ power of the sum of two numbers:

$\displaystyle (a+b)^N = a^N + N a^{N-1} b + \dots + N a b^{N-1} + b^N = \sum_{M=0}^N \binom{N}{M}a^{N-M} b^M,$ (3.3)

in which we have chosen $a = b = 1$. The quantity $\Omega(N)$ is known as the statistical weight of the system. The weight function $\omega(N_H,N)$ tells us the number of ways we can realise a configuration with exactly $N_H$ heads, which is equal to the number of ways we can obtain the corresponding value $f(N_H,N)$, and is the statistical weight of the system with the constraint of having $N_H$ heads. If we divide this number by the total number of possibilities $\Omega(N)$, we obtain the probability $p(N_H,N)$ of observing the value $f(N_H,N)$, which is correctly normalised to 1. This probability function $p(N_H,N)$ has structure: it is easy to verify that it has a maximum at $N_H = N/2$ and is symmetric around this value. In Fig. [*] we show the probability $p(N_H,N)$ divided by its value at $N_H = N/2$, for three different system sizes. The figure shows how this probability becomes more localised around $f = 0$ as $N$ increases.
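The properties of $p(N_H,N)$ just stated, the maximum at $N_H = N/2$, the symmetry, and the localisation shown in Fig. [*], can all be checked numerically. A minimal sketch (the function name `p` and the sample value $f = 0.2$ are our choices):

```python
from math import comb

def p(n_heads, n):
    """Probability of observing exactly n_heads heads among n fair coins:
    p(N_H, N) = omega(N_H, N) / Omega(N)."""
    return comb(n, n_heads) / 2 ** n

N = 100
# maximum at N_H = N/2 ...
assert p(N // 2, N) == max(p(k, N) for k in range(N + 1))
# ... and symmetry around it
assert p(40, N) == p(60, N)

# localisation: the ratio p/p_max at fixed f = 0.2 shrinks rapidly with N
for n in (10, 100, 1000):
    n_h = int(0.6 * n)  # N_H corresponding to f = (N_H - N_T)/N = 0.2
    print(n, p(n_h, n) / p(n // 2, n))
```

The printed ratios fall off dramatically with $N$, which is exactly the sharpening of $p(f)/p(0)$ displayed in the figure.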

Figure: The probability function $p(f)/p(0)$ as a function of $f$ for three different values of $N$.
\includegraphics[width=8cm]{f.pdf}

The meaning of the maximum at $N_H = N/2$ is that if we assume that the coins have been thrown at random and we make a measurement of $f$, the most likely value we would find is $f \simeq 0$ (indeed, if we found a value very different from 0 we would suspect that the coins are loaded). In a dynamic situation, in which the coins are continuously flipped at random, say one flip per second, we would expect $f$ to be almost zero all the time, although of course its value would fluctuate.

We could start with a situation in which $f$ is very different from zero, but as we start randomly flipping coins we would see the value of $f$ approach zero. During this initial transient we would say that the system is not in equilibrium, and we would expect equilibrium to be established after a sufficiently large number of flips. Note that at any instant in time the corresponding snapshot configuration is simply one of the possible $2^N$, as they all have the same probability of occurring. There is nothing special about a configuration for which $f \simeq 0$ compared to a configuration for which $f\simeq 1$. All microstates have the same chance to occur. However, $f \simeq 0$ can be realised by many more configurations than $f\simeq 1$, and this is the only reason why it is more likely to be observed, and hence why it is what we observe at equilibrium. If we waited for long enough we would observe any of the possible $2^N$ configurations, including those that give e.g. $f\simeq 1$, but the system would not stay for long with such a value of $f$. The value $f=1$, for example, would be observed for a fraction of the time equal to $1/2^N$, which quickly becomes negligible as $N$ grows. Most of the time we will observe a value $f \simeq 0$, and since large deviations from this value are exceedingly unlikely, once the system has reached equilibrium it will stay in equilibrium if no external perturbations move it away from it. Conversely, if the system is not in equilibrium it will inevitably move towards it.
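The relaxation towards equilibrium described above is easy to simulate. As a sketch under our own assumptions (one randomly chosen coin flipped per step, $N = 1000$, names `relax` and `traj` ours), we start from the extreme microstate with all heads ($f = 1$) and watch $f$ decay towards zero:

```python
import random

random.seed(1)  # reproducible illustration

def relax(n=1000, flips=20_000):
    """Start from all heads (f = 1), then flip one randomly chosen coin per
    step, recording f = (N_H - N_T)/N along the trajectory."""
    coins = [1] * n          # 1 = head, 0 = tail: far from equilibrium
    n_heads = n
    trajectory = []
    for _ in range(flips):
        i = random.randrange(n)
        n_heads += 1 if coins[i] == 0 else -1
        coins[i] ^= 1        # flip coin i
        trajectory.append((2 * n_heads - n) / n)  # N_T = n - n_heads
    return trajectory

traj = relax()
print("f after first flip:", traj[0], "  f at the end:", traj[-1])
```

After the transient, $f$ fluctuates around zero with excursions of order $1/\sqrt{N}$, matching the statement that large deviations become exceedingly unlikely as $N$ grows.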

Note that we associated the concept of equilibrium to the value of $f$, i.e. to the value of a macroscopic variable obtained as average over the many degrees of freedom of the system. If we insisted on a microscopic description of the system then we would not observe anything special, only the system moving from one microstate to another.

Now let us discuss another important principle. Imagine we impose some constraint, for example that a fraction of the coins will not be flipped and all of them show heads. We can still apply the arguments developed above, and the only difference would be that equilibrium is now achieved for a different value of $f$, which takes the constraint into account. The other important effect of the presence of the constraint is that, since we are not free to flip all coins, the total number of possible configurations is reduced, $\Omega(N,\alpha) < \Omega(N)$ (here $\alpha$ is a parameter describing the constraint). We could continue this process by adding a second constraint $\beta$, which will give $\Omega(N,\alpha, \beta) < \Omega(N,\alpha)$ because we would be reducing the freedom of the system even further. We see that any additional constraint that reduces the freedom of the system also reduces its statistical weight.
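For the frozen-coin constraint just described the counting is explicit: if $\alpha$ freezes some coins, only the remaining ones are free, so the weight drops to $2^{N - N_{\text{frozen}}}$. A minimal sketch of this inequality chain (the function name and the particular numbers of frozen coins are our illustrative choices):

```python
def omega_constrained(n, n_frozen):
    """Statistical weight when n_frozen coins are held fixed (all heads, say):
    only the remaining n - n_frozen coins are free, giving 2**(n - n_frozen)
    microstates."""
    return 2 ** (n - n_frozen)

N = 20
unconstrained = 2 ** N                          # Omega(N)
with_alpha = omega_constrained(N, 5)            # Omega(N, alpha): 5 coins frozen
with_alpha_beta = omega_constrained(N, 5 + 3)   # Omega(N, alpha, beta): 3 more
assert with_alpha_beta < with_alpha < unconstrained
```

Each added constraint strictly shrinks the weight, mirroring $\Omega(N,\alpha,\beta) < \Omega(N,\alpha) < \Omega(N)$.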

Let us go back to the initial situation in which we only have one constraint, $\alpha$, and let us remove it. By removing the constraint we increase the number of possible configurations from $\Omega(N,\alpha)$ to $\Omega(N)$. This shows that the statistical weight is maximum with respect to any internal constraint. Sometimes this statement is presented in terms of equilibrium, i.e. that the statistical weight increases as the system moves towards equilibrium, but in fact $\Omega$ is defined by the physical constraints that act on the system, not by its state of equilibrium. The difference is subtle but important, as a state of non-equilibrium must include a discussion of the mechanisms that make the system evolve towards equilibrium, which also involve time in some form. These mechanisms, and indeed any discussion involving time, are not part of equilibrium thermodynamics.