2.4 Row operations
When we want to find solutions to a system of linear equations, we usually go about it by adding or subtracting multiples of one equation from another in order to eliminate variables one by one. These operations on equations correspond to row operations on the matrix form (2.2) of the equations.
Definition 2.8 A row operation, or row op, is one of the following procedures we can apply to a matrix.
- \(r_i(\lambda)\): multiply each entry in row \(i\) by the number \(\lambda \neq 0\). This is also written \(r_i \mapsto \lambda r_i\).
- \(r_{ij}\): swap rows \(i\) and \(j\). This is also written \(r_i \leftrightarrow r_j\).
- \(r_{ij}(\lambda)\): add \(\lambda\) times row \(i\) to row \(j\), where \(i \neq j\). This is also written \(r_j \mapsto r_j + \lambda r_i\).
Sometimes these are called elementary row operations, but we’ll just call them row operations. If \(r\) is a row operation and \(A\) a matrix we write \(r(A)\) for the result of applying \(r\) to \(A\).
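As a concrete illustration (not part of the text's development), the three row operations can be sketched in Python with NumPy. The function names are our own, and rows are indexed from 0 rather than 1:

```python
import numpy as np

def r_scale(A, i, lam):
    """r_i(lam): multiply row i by lam != 0."""
    B = A.astype(float)          # work on a copy, leaving A unchanged
    B[i] *= lam
    return B

def r_swap(A, i, j):
    """r_ij: swap rows i and j."""
    B = A.astype(float)
    B[[i, j]] = B[[j, i]]
    return B

def r_add(A, i, j, lam):
    """r_ij(lam): add lam times row i to row j (i != j)."""
    B = A.astype(float)
    B[j] += lam * B[i]
    return B

A = np.array([[1., 2.], [3., 4.]])
print(r_add(A, 0, 1, -3))        # subtract 3 times row 0 from row 1
```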
Often people write \((A | \mathbf{b})\) or put a dotted line before the last column of an augmented matrix to emphasise where it came from.
Doing a row operation to the augmented matrix of a system corresponds to manipulating the system of equations in a way that doesn’t change the solutions. Precisely: if \((A'|\mathbf{b}')\) is obtained from \((A|\mathbf{b})\) by a row operation, then the systems \(A\mathbf{x}=\mathbf{b}\) and \(A'\mathbf{x}=\mathbf{b}'\) have exactly the same solutions.
Suppose we have a matrix equation \(A \mathbf{x}=\mathbf{b}\), where \(\mathbf{x}\) is a matrix of indeterminates. A solution \(\mathbf{v}\) of this matrix equation is defined to be a column vector of numbers such that \(A \mathbf{v}=\mathbf{b}\).
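Before the proof, a quick numerical sanity check of this claim (a NumPy sketch, applying a single operation of type \(r_{12}(\lambda)\) to an invertible system):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
b = np.array([5., 6.])
v = np.linalg.solve(A, b)            # the unique solution of Ax = b

# apply r_{12}(7): add 7 times row 1 to row 2 of the augmented matrix (A|b)
aug = np.hstack([A, b[:, None]])
aug[1] += 7 * aug[0]
A2, b2 = aug[:, :2], aug[:, 2]

assert np.allclose(A2 @ v, b2)       # v solves the new system as well
```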
Proof. The only row operation for which this is not obvious is \(r_{ij}(\lambda)\), which adds \(\lambda\) times row \(i\) to row \(j\).
First suppose \(\mathbf{v}=\begin{pmatrix} v_1 \\ \vdots \\ v_m \end{pmatrix}\) is a solution of the system \(A \mathbf{x}=\mathbf{b}\). We have to show it is a solution to the new system \(A' \mathbf{x}=\mathbf{b}'\). Since \(\mathbf{v}\) is a solution of \(A \mathbf{x}=\mathbf{b}\) we have \[\begin{align*} a_{i1}v_1 + \cdots + a_{im}v_m &= b_i \\ a_{j1}v_1 + \cdots + a_{jm}v_m &= b_j . \end{align*}\] It follows by adding \(\lambda\) times the first equation to the second that \[ (a_{j1}+\lambda a_{i1})v_1 + \cdots + (a_{jm}+\lambda a_{im})v_m = b_j + \lambda b_i, \] and therefore \(\mathbf{v}\) is a solution of \(A'\mathbf{x}=\mathbf{b}'\).
Conversely, suppose \(\mathbf{v}=\begin{pmatrix} v_1 \\ \vdots \\ v_m \end{pmatrix}\) is a solution of the system \(A' \mathbf{x}=\mathbf{b}'\). We have to show \(A \mathbf{v}=\mathbf{b}\). First note that only the \(j\)th equation of \(A \mathbf{x}=\mathbf{b}\) is a problem, since all the other equations are the same as those of the system \(A' \mathbf{x}=\mathbf{b}'\). So we only need to show that the \(j\)th equation holds, that is, that \[ a_{j1}v_1 + \cdots + a_{jm}v_m = b_j . \] But the \(i\)th and \(j\)th equations of the system \(A' \mathbf{x}=\mathbf{b}'\) tell us \[\begin{align*} a_{i1}v_1 + \cdots + a_{im}v_m &= b_i \\ (a_{j1}+\lambda a_{i1})v_1 + \cdots + (a_{jm}+\lambda a_{im})v_m & = b_j + \lambda b_i. \end{align*}\] Subtracting \(\lambda\) times the first equation from the second gives us what we need.

The next proposition shows that doing a row operation to a matrix has the same effect as multiplying it by an elementary matrix.
Proof. It’s enough to check the case when \(A=\begin{pmatrix} a_1\\ \vdots \\ a_m \end{pmatrix}\) is a column vector, because of the columnwise nature of matrix multiplication. There are three cases according to the three types of row operation. In each case we will work out \(r(I_m)A\) and show that it is the same as \(r(A)\). For any \(l\), the \(l\)th entry of \(r(I_m)A\) is
\[\begin{equation} \tag{2.3} \sum_{k=1}^m s_{lk}a_k \end{equation}\]
where \(s_{lk}\) is the \(l,k\) entry of \(r(I_m)\).
- \(r=r_i(\lambda)\). For \(l\neq i\), the \(l\)th row of \(r(I_m)\) is the same as the \(l\)th row of \(I_m\), so \(s_{lk}=0\) unless \(k=l\) in which case it is 1. So from (2.3), the \(l\)th entry of \(r(I_m)A\) is \(a_l\). If \(l=i\) then \(s_{lk}=\lambda\) if \(k=l\) and 0 otherwise, so by (2.3) the \(i\)th entry of \(r(I_m)A\) is \(\lambda a_i\). Therefore \(r(I_m)A\) is the same as \(A\), except that the \(i\)th entry is multiplied by \(\lambda\). This is \(r(A)\).
- \(r=r_{ij}\). There are three cases for \(l\).
- \(l\neq i,j\). In this case \(s_{lk}=0\) unless \(k=l\), in which case it is 1, so from (2.3) the sum equals \(a_l\).
- \(l=i\). Since \(r_{ij}(I_m)\) swaps rows \(i\) and \(j\) of \(I_m\), \(s_{ik}=1\) if \(k=j\) and 0 otherwise. So the sum is \(a_j\).
- \(l=j\). Similar to the last case, the sum equals \(a_i\). Therefore \(r(I_m)A\) is the same as \(A\), except in row \(i\) where \(a_j\) appears and row \(j\) where \(a_i\) appears. This is \(r(A)\).
- \(r=r_{ij}(\lambda)\). This time the rows of \(r(I_m)\) are the same as the rows of \(I_m\), except the \(j\)th, which has \(j,j\) entry 1 and \(j,i\) entry \(\lambda\), so \(s_{jj}=1, s_{ji}=\lambda\), and all other \(s_{jk}=0\). From (2.3), every entry of \(r(I_m)A\) is the same as the corresponding entry of \(A\) except the \(j\)th, where we get \(a_j+\lambda a_i\), and again, \(r(I_m)A\) is the same as \(r(A)\).
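This can be checked numerically. The sketch below (Python/NumPy, our own helper function) forms the elementary matrix \(r(I_m)\) by applying a row operation to the identity, verifies that \(r(I_m)A = r(A)\), and also checks that the inverse row operation gives the inverse elementary matrix:

```python
import numpy as np

def r_add(A, i, j, lam):
    """r_ij(lam): add lam times row i to row j (rows indexed from 0)."""
    B = A.astype(float)
    B[j] += lam * B[i]
    return B

A = np.array([[1., 2., 0.], [0., 1., 3.], [4., 0., 1.]])
E = r_add(np.eye(3), 0, 2, 5.0)                  # elementary matrix for r_{13}(5)
assert np.allclose(E @ A, r_add(A, 0, 2, 5.0))   # r(I_m) A equals r(A)

# the inverse row operation r_{13}(-5) gives the inverse elementary matrix
E_inv = r_add(np.eye(3), 0, 2, -5.0)
assert np.allclose(E @ E_inv, np.eye(3))
```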
Proof. Each row operation \(r\) has an inverse \(r^{-1}\) by Lemma 2.1. Then
\[ r(I_m) r^{-1}(I_m) = r(r^{-1}(I_m)) \]
by Proposition 2.4, and this equals \(I_m\). Similarly \(r^{-1}(I_m)r(I_m)=I_m\), so the elementary matrix \(r(I_m)\) is invertible with inverse \(r^{-1}(I_m)\).

2.4.1 Row reduced echelon form
Because doing row operations to the augmented matrix \((A|\mathbf{b})\) doesn’t change the solutions of the matrix equation \(A\mathbf{x}=\mathbf{b}\), we have a method of solving such a system of linear equations: keep doing row ops until the equations are so simple that the solutions can be easily read off. The ‘simple’ form we are looking for is called row reduced echelon form.
Definition 2.11
- A zero row of a matrix is one in which every entry is 0.
- The leading entry in a row of a matrix is the first non-zero entry, starting from the left.
- A matrix is in row reduced echelon form, or RRE form or RREF for short, if:
- all leading entries are 1,
- any leading entry is strictly to the right of any leading entry in the row above it,
- if a column contains a leading entry, every other entry in that column is 0, and
- any zero rows are below any non-zero rows.
Example 2.3
- \(\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}\) isn’t in RRE form: the zero row is at the top.
- \(\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}\) isn’t in RRE form: there is a row in which the left-most non-zero entry is not 1.
- \(\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\) isn’t in RRE form: the left-most 1 in row 2 is not to the right of the left-most 1 in the row above it.
- \(\begin{pmatrix} 1 &\alpha &\beta & 3 \\ 0 &0 & 1 & -2 \end{pmatrix}\) isn’t in RRE form unless \(\beta = 0\): the left-most 1 in row 2 is in column 3, but it is not the only non-zero entry in column 3 unless \(\beta = 0\).
- \(\begin{pmatrix} 1 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\) is in RRE form.
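The four conditions of Definition 2.11 can be checked mechanically. Here is a small checker (Python/NumPy, our own helper written for illustration), run on the examples above:

```python
import numpy as np

def is_rref(A):
    """Check the four RRE form conditions from Definition 2.11."""
    A = np.asarray(A, dtype=float)
    pivots = []                       # column of the leading entry in each non-zero row
    seen_zero_row = False
    for row in A:
        nz = np.flatnonzero(row)
        if len(nz) == 0:
            seen_zero_row = True
            continue
        if seen_zero_row:             # a non-zero row below a zero row
            return False
        lead = nz[0]
        if row[lead] != 1:            # leading entries must be 1
            return False
        if pivots and lead <= pivots[-1]:  # must be strictly to the right
            return False
        pivots.append(lead)
    # each leading-entry column must be zero apart from its leading 1
    return all(np.count_nonzero(A[:, c]) == 1 for c in pivots)

assert not is_rref([[0, 0], [1, 0]])
assert not is_rref([[2, 0], [0, 0]])
assert not is_rref([[0, 1], [1, 0]])
assert is_rref([[1, 0, 0, 3], [0, 0, 1, 0], [0, 0, 0, 0]])
```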
If \((A|\mathbf{b})\) is the augmented matrix of a system of linear equations and \(A\) is in RRE form, we can easily read off the solutions (if any) of the system. For example, if we think of the fourth example above with \(\alpha=1, \beta=0\) as the augmented matrix of a system of linear equations, those equations are
\[\begin{align*} x_1 + x_2&= 3 \\ x_3 &= -2 \end{align*}\]
and we can see that the general solution is \(x_3=-2\), \(x_2\) can be anything, and \(x_1 = 3-x_2\). Similarly if we had a system whose augmented matrix was \[ \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 5 \\ 0 & 0 & 2 \end{pmatrix} \] then we can see that the system has no solutions, since the last equation says \(0 = 2\) which is impossible.
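We can spot-check the read-off general solution numerically (a NumPy sketch, sampling the free variable \(x_2\) at a few values):

```python
import numpy as np

# the system x1 + x2 = 3, x3 = -2 from the example above
A = np.array([[1., 1., 0.],
              [0., 0., 1.]])
b = np.array([3., -2.])

# general solution read off from the RRE form: x3 = -2, x2 free, x1 = 3 - x2
for t in [0.0, 1.5, -7.0]:            # any choice of the free variable x2
    v = np.array([3.0 - t, t, -2.0])
    assert np.allclose(A @ v, b)
```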
Proof. We’ll first show that a matrix can be put into echelon form by doing row operations. The proof is by induction on the number of columns. It is easy to get a one-column matrix into echelon form: if all its entries are zero then it is in echelon form already. If not, swap the zero rows to the bottom, multiply each non-zero row by the reciprocal of its entry, and then subtract the top row from each of the non-zero rows below it. The matrix then looks like \(\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}\), which is in echelon form.
Now suppose \(A\) is a matrix with \(n>1\) columns. By induction there is a sequence of row operations we can do that puts the matrix formed by the first \(n-1\) columns of \(A\) into echelon form. Let \(B\) be the result of doing those row operations to \(A\), so that the first \(n-1\) columns of \(B\) form a matrix in echelon form, but \(B\) itself may not be in echelon form because of its final column.
Suppose that the echelon form matrix formed by the first \(n-1\) columns of \(B\) has \(k\) zero rows at the bottom. The row operations we are about to do will affect only those \(k\) rows. If any of the bottom \(k\) rows of \(B\) have a zero entry in the final column, swap those rows to the bottom. Multiply each of the last \(k\) rows whose final entry is non-zero by the reciprocal of that entry. Finally, subtract the top of these \(k\) rows from each of the non-zero rows below it. The resulting matrix is in echelon form.
Now we have shown that any matrix \(A\) can be reduced to echelon form by doing row operations. To get it into RRE form using row operations, for each row which contains a leading entry, subtract multiples of that row from the others so that the leading entry is the only non-zero entry in its column. It is then in RRE form.

Here is the key result on RRE form and the solution of linear equations:
Proposition 2.6 Let \(B\) and \(C\) be matrices in RRE form, each obtained by doing row operations to a matrix \(A\). Then \(B = C\). In other words, the RRE form of a matrix is unique.
Proof. (Not examinable; adapted from Yuster, Mathematics Magazine 57, no. 2 (1984), pp. 93–94.) The proof is by induction on the number \(m\) of columns of \(A\); for \(m=1\) the result is clear. Now suppose \(A\) has \(m>1\) columns, and that \(B\) and \(C\) are different matrices in RRE form obtained by doing row operations to \(A\). The first \(m-1\) columns of \(B\) and \(C\) are RRE forms for the matrix formed by the first \(m-1\) columns of \(A\), so by the inductive hypothesis they are equal. Thus \(B\) and \(C\) can only differ in their last columns; suppose they differ in row \(j\).
Let \(\mathbf{u}\) be any solution of \(B \mathbf{x}=\mathbf{0}\). By the previous proposition, \(B \mathbf{u} = \mathbf{0}\) iff \(A \mathbf{u} = \mathbf{0}\) iff \(C \mathbf{u} = \mathbf{0}\), so we have \((B-C)\mathbf{u}=\mathbf{0}\). As the first \(m-1\) columns of \(B\) and \(C\) are the same, this last equation is equivalent to \((b_{im}-c_{im})u_m = 0\) for all \(i\). As \(b_{jm}\neq c_{jm}\) we must have \(u_m=0\).
This means that the last column of each of \(B\) and \(C\) contains a leading 1: otherwise the variable \(x_m\) corresponding to the last column could be chosen freely (in particular, to be non-zero) when solving \(B \mathbf{x}=\mathbf{0}\) and \(C \mathbf{x}= \mathbf{0}\), contradicting \(u_m=0\). Since \(B\) and \(C\) are in RRE form, this leading 1 is the only non-zero entry in the last column. But the first \(m-1\) columns of a matrix in RRE form determine the row in which a leading 1 in column \(m\) must appear: it goes in the first row whose first \(m-1\) entries are all zero. Since the first \(m-1\) columns of \(B\) and \(C\) are equal, their last columns are equal too, so \(B = C\), a contradiction.
This last result means that we can talk about the row reduced echelon form of a matrix, because it tells us that given any matrix \(A\) there is one and only one matrix in RRE form that can be obtained from \(A\) by doing row operations.
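The reduction procedure described above amounts to the usual Gauss–Jordan elimination. Here is a compact sketch in Python/NumPy (our own function, with rows and columns indexed from 0 and a small tolerance for floating-point zeros):

```python
import numpy as np

def rref(A, tol=1e-12):
    """Reduce A to row reduced echelon form by row operations (Gauss-Jordan)."""
    B = A.astype(float)
    m, n = B.shape
    row = 0
    for col in range(n):
        # find a row at or below `row` with a non-zero entry in this column
        pivot = next((r for r in range(row, m) if abs(B[r, col]) > tol), None)
        if pivot is None:
            continue
        B[[row, pivot]] = B[[pivot, row]]      # r_ij: swap the pivot row up
        B[row] /= B[row, col]                  # r_i(lam): make the leading entry 1
        for r in range(m):                     # r_ij(lam): clear the rest of the column
            if r != row:
                B[r] -= B[r, col] * B[row]
        row += 1
    return B

M = np.array([[0., 1., 1., 0.],
              [1., 2., 0., 3.],
              [2., 4., 0., 6.]])
print(rref(M))    # [[1, 0, -2, 3], [0, 1, 1, 0], [0, 0, 0, 0]]
```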
2.4.2 Solving systems in RRE form
Our method for solving linear systems whose matrix form is \(A\mathbf{x}=\mathbf{b}\) is then:
- Form the augmented matrix \((A|\mathbf{b})\), and do row operations until it is in RRE form \((A'| \mathbf{b}')\). This new system has the same solutions as the old system, by Proposition 2.3.
- Read off the solutions, if there are any.
We can obtain the solutions of a set of equations whose augmented matrix is in row reduced echelon form as follows. Recall that the final column corresponds to the constant term of the equations, and each column before that corresponds to a variable.
- If the final column has a leading entry, there are no solutions.
- Otherwise, variables which correspond to columns with no leading entry (if any) can be chosen freely, and the remaining variables are uniquely determined in terms of these.
The first bullet point is illustrated by what happens in the second example above when \(d\neq 0\). In the first example the \(z\) column has no leading entry, so we were able to choose \(z\) freely, therefore getting infinitely many solutions. In the second, each of the \(x,y,z\) columns had leading entries, so there was no free choice and the equations had only one solution.
2.4.3 Invertibility and RRE form
We want to prove that a matrix is invertible if and only if its RRE form is the identity, but we need two preliminary lemmas.
Proof. Suppose the RRE form of \(A\) is \(I_n\), so that there are row ops \(r_1,\ldots,r_m\) with \[ r_1(r_2(\cdots (r_m (A)))\cdots ) = I_n.\] By Proposition 2.4, \[ r_1(I_n)r_2(I_n)\cdots r_m(I_n) A = I_n.\] By Corollary 2.1, the \(r_i(I_n)\) are invertible, and multiplying the above equation on the left by the inverse of \(r_1(I_n)\), then the inverse of \(r_2(I_n)\), and so on, \[ A = r_m(I_n)^{-1}r_{m-1}(I_n)^{-1} \cdots r_1(I_n)^{-1}.\] It follows that \(A\) is a product of invertible matrices, so by Lemma 2.2 \(A\) is invertible.
Conversely suppose the RRE form of \(A\) is not \(I_n\), so that by Lemma 2.3 it has a row of zeroes. There are therefore fewer than \(n\) leading entries in the RRE form, so there’s a column with no leading entry, say column \(j\). The matrix equation \(A\mathbf{x}=\mathbf{0}\) then has infinitely many solutions, since we can choose the value of the variable corresponding to column \(j\) freely. But if \(A\) is invertible, \(A\mathbf{x}=\mathbf{0}\) implies \(\mathbf{x}=A^{-1}\mathbf{0}= \mathbf{0}\), so there is only one solution.

If \(A\) is invertible and \(\mathbf{c}_j\) is the \(j\)th column of \(A^{-1}\), the fact that
\[ A A^{-1}=I_n\]
and the fact that matrix multiplication works column-by-column means that \(A \mathbf{c}_j=\mathbf{e}_j\), where \(\mathbf{e}_j\) is the \(j\)th column of \(I_n\) (that is, the column vector with zeroes everywhere except for a 1 in row \(j\)). This means \(\mathbf{c}_j\) is the solution of the equation \(A\mathbf{x}=\mathbf{e}_j\). But we can solve this by putting \((A \, \mathbf{e}_j)\) into RRE form: we know that the RRE form of \(A\) is \(I_n\), so the result will be \((I_n \, \mathbf{c}_j)\).
Rather than do this \(n\) times to work out the \(n\) columns of \(A^{-1}\), we can simply start with the matrix \((A\,\,\, I_n)\) and put it into RRE form — the result will be \((I_n\,\,\, A^{-1})\).
We can use this method as a test of the invertibility of \(A\) which produces at the same time the inverse of \(A\), if it exists: start with \((A\,\,\, I_n)\) and put it into RRE form, getting \((C\,\,\, D)\) say. If \(C\neq I_n\) then \(A\) isn’t invertible, by Theorem 2.1. If \(C=I_n\), then \(A\) is invertible and the remaining part \(D\) of this matrix is \(A^{-1}\).
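This invertibility test can be sketched in code (Python/NumPy; the function name is our own). It row reduces \((A\,\,\, I_n)\) by Gauss–Jordan elimination and inspects the left block at the end:

```python
import numpy as np

def inverse_by_row_reduction(A, tol=1e-12):
    """Put (A | I_n) into RRE form; if the left block is I_n, the right block is A^{-1}."""
    n = A.shape[0]
    B = np.hstack([A.astype(float), np.eye(n)])
    row = 0
    for col in range(2 * n):                   # Gauss-Jordan on the augmented matrix
        pivot = next((r for r in range(row, n) if abs(B[r, col]) > tol), None)
        if pivot is None:
            continue
        B[[row, pivot]] = B[[pivot, row]]      # swap the pivot row up
        B[row] /= B[row, col]                  # make the leading entry 1
        for r in range(n):                     # clear the rest of the column
            if r != row:
                B[r] -= B[r, col] * B[row]
        row += 1
    C, D = B[:, :n], B[:, n:]
    if not np.allclose(C, np.eye(n)):
        return None                            # C != I_n, so A is not invertible
    return D

A = np.array([[2., 1.], [1., 1.]])
Ainv = inverse_by_row_reduction(A)
print(Ainv)                                    # [[1, -1], [-1, 2]]
assert np.allclose(A @ Ainv, np.eye(2))
```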