Introduction to Linear Systems of Equations

Our study of linear algebra begins with systems of linear equations. Such equations appear frequently in applied mathematics when modelling phenomena. For example, in linear programming, profit is typically maximized subject to constraints on labour, time availability, and so on. These constraints can be put in the form of a linear system.

Although we have already studied the contents of this chapter, it is a good idea to reread this page quickly to refresh the definitions.

Linear systems

[Figure: graph of sample linear equations]

A linear equation is an equation in which each term is either a constant or the product of a constant times the first power of a variable. Such an equation is equivalent to equating a first-degree polynomial to zero. Some examples of linear equations are as follows:

$x+3y=-4$
$7x_{1}=15+x_{2}$
$z{\sqrt {2}}+e=\pi$

The term linear comes from basic algebra and plane geometry, where the standard algebraic representation of a line in the real plane is $ax+by=c$, with a, b, c real constants and x, y real variables. Each of the examples above can be rearranged into a form of this kind.

The following are not linear equations:

$xy+7=2$
${\sqrt {x_{1}}}+x_{2}=11$
$x^{3}=6-12z$

For an equation to be linear, it does not have to be in standard form (all terms with variables on the left-hand side). The constants in linear equations need not be integers (or even rational numbers).

Linear equations are classified by the number of variables they involve. The classification is straightforward -- an equation with n variables is called a linear equation in n variables. If n is 2, the linear equation is geometrically a straight line, and if n is 3, it is a plane. The geometrical shape for a general n is sometimes referred to as an affine hyperplane. We will, however, simply use the word n-plane for all n.

For clarity and simplicity, a linear equation in n variables is written in the form $a_{1}x_{1}+a_{2}x_{2}+a_{3}x_{3}+\cdots +a_{n}x_{n}=b$, where $a_{1},a_{2},\ldots ,a_{n}$ are constants (called the coefficients), and $b$ is the constant term.

[Figure: a linear system in three variables determines a collection of planes.]

A linear system (or system of linear equations) is a collection of linear equations involving the same set of variables. For example,

${\begin{alignedat}{7}3x&&\;+\;&&2y&&\;-\;&&z&&\;=\;&&1&\\2x&&\;-\;&&2y&&\;+\;&&4z&&\;=\;&&-2&\\-x&&\;+\;&&{\tfrac {1}{2}}y&&\;-\;&&z&&\;=\;&&0&\end{alignedat}}$

is a system of three equations in the three variables $x,y,z$.

A general system of m linear equations with n unknowns (or variables) can be written as

${\begin{alignedat}{7}a_{11}x_{1}&&\;+\;&&a_{12}x_{2}&&\;+\cdots +\;&&a_{1n}x_{n}&&\;=\;&&&b_{1}\\a_{21}x_{1}&&\;+\;&&a_{22}x_{2}&&\;+\cdots +\;&&a_{2n}x_{n}&&\;=\;&&&b_{2}\\\vdots \;\;\;&&&&\vdots \;\;\;&&&&\vdots \;\;\;&&&&&\;\vdots \\a_{m1}x_{1}&&\;+\;&&a_{m2}x_{2}&&\;+\cdots +\;&&a_{mn}x_{n}&&\;=\;&&&b_{m}\\\end{alignedat}}$

Here $x_{1},\ x_{2},...,x_{n}$ are the unknowns, $a_{11},\ a_{12},...,\ a_{mn}$ are the coefficients of the system, and $b_{1},\ b_{2},...,b_{m}$ are the constant terms.
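As a rough illustration of this notation, the coefficients and constant terms of a system can be stored as nested lists and joined into a single array with the constants as an extra last column (commonly called the augmented matrix). The function name `augmented_matrix` and the variable names `A` and `b` are chosen here for illustration only; the data is the three-equation example above.

```python
from fractions import Fraction


def augmented_matrix(coefficients, constants):
    """Append each constant term b_i to the end of coefficient row i."""
    if len(coefficients) != len(constants):
        raise ValueError("need exactly one constant term per equation")
    return [list(row) + [b] for row, b in zip(coefficients, constants)]


# Coefficients a_ij and constant terms b_i of the example system.
# Fraction keeps the coefficient 1/2 exact (illustrative choice).
A = [[3, 2, -1],
     [2, -2, 4],
     [-1, Fraction(1, 2), -1]]
b = [1, -2, 0]

for row in augmented_matrix(A, b):
    print(row)
```

Each row of the result corresponds to one equation, and each column except the last corresponds to one unknown.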

Solutions of linear systems

A solution of a linear equation is any n-tuple of values $(s_{1},s_{2},\ldots ,s_{n})$ which satisfies the linear equation when substituted for $x_{1},x_{2},\ldots ,x_{n}$. For example, $(-1,-1)$ is a solution of the linear equation $x+3y=-4$ since $-1+(3\times -1)=-1+(-3)=-4$, but $(1,5)$ is not, since $1+(3\times 5)=16\neq -4$.

Similarly, a solution to a linear system is any n-tuple of values $(s_{1},s_{2},\ldots ,s_{n})$ which simultaneously satisfies all the linear equations given in the system.
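The check above is just a substitution followed by a comparison, which can be sketched in a few lines of Python. The helper name `satisfies` is hypothetical and introduced only for this illustration.

```python
def satisfies(coefficients, constant, point):
    """Return True if a_1*s_1 + ... + a_n*s_n equals the constant term b."""
    if len(coefficients) != len(point):
        raise ValueError("need one value per variable")
    return sum(a * s for a, s in zip(coefficients, point)) == constant


# The equation x + 3y = -4 has coefficients (1, 3) and constant term -4.
print(satisfies([1, 3], -4, (-1, -1)))  # True:  -1 + 3*(-1) = -4
print(satisfies([1, 3], -4, (1, 5)))    # False:  1 + 3*5   = 16
```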

For example,

${\begin{alignedat}{7}3x&&\;+\;&&2y&&\;-\;&&z&&\;=\;&&1&\\2x&&\;-\;&&2y&&\;+\;&&4z&&\;=\;&&-2&\\-x&&\;+\;&&{\tfrac {1}{2}}y&&\;-\;&&z&&\;=\;&&0&\end{alignedat}}$

has as its solution $(1,-2,-2)$. This can also be written as:

${\begin{alignedat}{2}x&=&1\\y&=&-2\\z&=&-2\end{alignedat}}$
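To confirm a claimed solution of a system, every equation has to be checked, not just one. The following is a minimal sketch of that check; `is_solution`, `rows`, and `constants` are illustrative names, and `Fraction` is used only so that the coefficient 1/2 is handled exactly.

```python
from fractions import Fraction


def is_solution(rows, constants, point):
    """Return True only if the point satisfies every equation of the system."""
    return all(
        sum(a * s for a, s in zip(row, point)) == b
        for row, b in zip(rows, constants)
    )


rows = [[3, 2, -1],
        [2, -2, 4],
        [-1, Fraction(1, 2), -1]]
constants = [1, -2, 0]

print(is_solution(rows, constants, (1, -2, -2)))  # True
# (1, -1, 0) satisfies the first equation but not the second,
# so it is not a solution of the system.
print(is_solution(rows, constants, (1, -1, 0)))   # False
```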

We also refer to the collection of all possible solutions as the solution set.

In general, for any linear system of equations there are three possibilities regarding solutions:

A unique solution: In this case the solution set contains exactly one n-tuple. Geometrically, the n-planes specified by the equations of the linear system all intersect at a single point in the space specified by the variables of the system.

No solution: The equations are termed inconsistent. They specify n-planes that have no point common to all of them, so no n-tuple satisfies every equation of the system.

Infinitely many solutions: The n-planes specified by the equations have more than one point in common. Their common intersection is then a line, a plane, or a higher-dimensional analogue, so infinitely many n-tuples satisfy every equation of the system.

[Figure: three panels illustrating these cases: a unique solution, no solution, and infinitely many solutions.]

Why are there only these three cases and no others? Although a justification shall be provided in the next chapter, it is a good exercise for you to figure it out now.

A linear system is said to be inconsistent if it has no solution. If there exists at least one solution, then the system is said to be consistent.
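For the smallest interesting case, two equations in two unknowns, the three possibilities can be told apart with a few multiplications. The sketch below is an illustration only, not a general method: the function name `classify_2x2` is hypothetical, and it assumes each equation has at least one nonzero coefficient, so that each equation really describes a line.

```python
def classify_2x2(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x + b1*y = c1, a2*x + b2*y = c2.

    Assumption: each equation has at least one nonzero coefficient.
    """
    if (a1 == 0 and b1 == 0) or (a2 == 0 and b2 == 0):
        raise ValueError("each equation must describe a line")
    # A nonzero determinant means the two lines are not parallel.
    if a1 * b2 - a2 * b1 != 0:
        return "unique solution"
    # The lines are parallel; they coincide exactly when the constant
    # terms are in the same proportion as the coefficients.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "infinitely many solutions"
    return "no solution"


print(classify_2x2(1, 1, 2, 1, -1, 0))  # x + y = 2, x - y = 0
print(classify_2x2(1, 1, 2, 1, 1, 3))   # x + y = 2, x + y = 3
print(classify_2x2(1, 1, 2, 2, 2, 4))   # x + y = 2, 2x + 2y = 4
```

The first system is consistent with a unique solution, the second is inconsistent, and the third is consistent with infinitely many solutions.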

Methods of solving linear systems

We know that linear systems in 2 or 3 variables can be solved using techniques such as elimination and substitution. Applied ad hoc, however, these techniques become impractical for systems with a large number of variables. They are therefore generalized into a systematic procedure called Gaussian elimination, which is what is usually used in practice. A variant of this technique known as the Gauss-Jordan method is also used.

We are often required to solve many linear systems that differ only in their constant terms, while the coefficients of the variables all remain the same. A technique called LU decomposition is used in this case. A variant called Cholesky factorization is also used when possible. We will study these techniques in later chapters.

Gauss-Jordan elimination

Performing row operations on a matrix is the method we use for solving a system of equations. In order to solve the system, we want to convert the matrix to row echelon form, in which the first nonzero entry of every nonzero row is a 1 lying to the right of the leading 1 in the row above, so that every entry below a leading 1 is zero. If, in addition, every entry above a leading 1 is zero, the matrix is in reduced row echelon form ("rref"). Here are some examples of matrices in row echelon form; the third and the last are also in reduced row echelon form:

${\begin{bmatrix}1&3&1\\0&1&-1\\0&0&1\end{bmatrix}}$ , ${\begin{bmatrix}1&3&1\\0&1&-1\\\end{bmatrix}}$ , ${\begin{bmatrix}1&0&0&3\\0&1&0&0\\0&0&1&2\\\end{bmatrix}}$ , ${\begin{bmatrix}1&7\\0&1\\\end{bmatrix}}$ , ${\begin{bmatrix}1&3&0\\0&1&1\\0&0&0\end{bmatrix}}$ , ${\begin{bmatrix}1&0&0\\0&0&0\\0&0&0\end{bmatrix}}$

We use row operations, which correspond to operations on the equations, to obtain a row-equivalent matrix in a simpler form. A matrix is in row echelon form when it satisfies the first three conditions below, and in reduced row echelon form when it also satisfies the fourth.

In any nonzero row, the first nonzero number is a 1. It is called a leading 1.
Any all-zero rows are placed at the bottom of the matrix.
Any leading 1 is below and to the right of a previous leading 1.
Any column containing a leading 1 has zeros in all other positions in the column.

To solve a system of equations, we can perform the following row operations to convert the augmented matrix (the coefficient matrix with the constant terms appended as an extra column) to row echelon form, and then use back-substitution to find the solution.

Interchange a row $R_{i}$ with another row $R_{j}$ . (Notation: $R_{i}\longleftrightarrow R_{j}$ )
Multiply a row by a nonzero constant. (Notation: $cR_{i}$ )
Add the product of a row multiplied by a constant to another row. (Notation: $R_{i}+cR_{j}$ )
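The three row operations translate directly into short functions acting on a matrix stored as a list of rows. This is only a sketch: the function names are made up for illustration, the matrix is modified in place, and rows are numbered from 0 as is usual in Python, so $R_{1}$ in the text is row index 0 here.

```python
def swap_rows(m, i, j):
    """R_i <-> R_j: interchange two rows."""
    m[i], m[j] = m[j], m[i]


def scale_row(m, i, c):
    """c R_i: multiply row i by a nonzero constant c."""
    if c == 0:
        raise ValueError("scaling by zero would destroy an equation")
    m[i] = [c * x for x in m[i]]


def add_multiple(m, i, j, c):
    """R_i + c R_j: add c times row j to row i (row j is unchanged)."""
    if i == j:
        raise ValueError("use scale_row to rescale a single row")
    m[i] = [x + c * y for x, y in zip(m[i], m[j])]


# Illustrative augmented matrix; two eliminations clear its first column.
m = [[1, 3, 1, 9],
     [1, 1, -1, 1],
     [3, 11, 5, 35]]
add_multiple(m, 1, 0, -1)  # second row minus the first row
add_multiple(m, 2, 0, -3)  # third row minus 3 times the first row
print(m)  # [[1, 3, 1, 9], [0, -2, -2, -8], [0, 2, 2, 8]]
```

None of these operations changes the solution set, which is why a system may be simplified this way before it is solved.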

Each of the row operations corresponds to an operation we have already used to solve systems of equations in three variables. With these operations, a few key moves quickly bring a matrix to row echelon form. Gaussian elimination uses row operations to obtain a 1 as the first entry of row 1, so that row 1 can be used to clear the entries below it, and then repeats the process on the remaining rows. Gauss-Jordan elimination continues further, clearing the entries above each leading 1 as well, until the matrix is in reduced row echelon form.

The steps to perform Gaussian elimination on an augmented matrix (and thus convert the matrix to row echelon form) are as follows; clearing the entries above each leading 1 afterwards gives reduced row echelon form.

The first equation should have a leading coefficient of 1. Interchange rows or multiply by a constant, if necessary.
Use row operations to obtain zeros down the first column below the first entry of 1.
Use row operations to obtain a 1 in row 2, column 2.
Use row operations to obtain zeros down column 2, below the entry of 1.
Use row operations to obtain a 1 in row 3, column 3.
Continue this process for all rows until there is a 1 in every entry down the main diagonal and there are only zeros below.
If any rows contain all zeros, place them at the bottom.
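The steps above, followed by back-substitution, can be sketched as below. This is one possible arrangement under stated assumptions, not a definitive implementation: `row_echelon` and `back_substitute` are illustrative names, exact `Fraction` arithmetic is used to avoid rounding, and `back_substitute` deliberately handles only the case of a unique solution.

```python
from fractions import Fraction


def row_echelon(aug):
    """Return a row echelon form of an augmented matrix (last column = constants)."""
    m = [[Fraction(x) for x in row] for row in aug]
    rows, cols = len(m), len(m[0])
    r = 0  # row that will receive the next leading 1
    for c in range(cols - 1):
        if r == rows:
            break
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]   # interchange rows
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]   # make the leading entry 1
        for i in range(r + 1, rows):      # zeros below the leading 1
            factor = m[i][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return m


def back_substitute(ref):
    """Solve from row echelon form, assuming the solution is unique."""
    n = len(ref[0]) - 1
    if len(ref) < n or any(ref[i][i] != 1 for i in range(n)):
        raise ValueError("this sketch only handles a unique solution")
    if any(row[n] != 0 for row in ref[n:]):
        raise ValueError("the system is inconsistent")
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = ref[i][n] - sum(ref[i][j] * x[j] for j in range(i + 1, n))
    return x


# The system 3x + 2y - z = 1, 2x - 2y + 4z = -2, -x + y/2 - z = 0.
aug = [[3, 2, -1, 1],
       [2, -2, 4, -2],
       [-1, Fraction(1, 2), -1, 0]]
solution = back_substitute(row_echelon(aug))
print([str(v) for v in solution])  # ['1', '-2', '-2']
```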

Example of Gauss-Jordan elimination:

${\begin{bmatrix}1&3&1&9\\1&1&-1&1\\3&11&5&35\end{bmatrix}}\to {\begin{bmatrix}1&3&1&9\\0&-2&-2&-8\\0&2&2&8\end{bmatrix}}\to {\begin{bmatrix}1&3&1&9\\0&-2&-2&-8\\0&0&0&0\end{bmatrix}}\to {\begin{bmatrix}1&0&-2&-3\\0&1&1&4\\0&0&0&0\end{bmatrix}}$

First, we subtract the first row from the second row ( $R_{2}-R_{1}$ ) and 3 times the first row from the third row ( $R_{3}-3R_{1}$ ). Then, we add the new second row to the new third row ( $R_{3}+R_{2}$ ). Lastly, we multiply the second row by $-{\frac {1}{2}}$ to make its leading entry equal to 1 ( $-{\frac {1}{2}}R_{2}$ ) and subtract 3 times the resulting row from the first row ( $R_{1}-3R_{2}$ ) to clear the entry above that leading 1. Our matrix is now in reduced row echelon form: each nonzero row has a leading 1, each column containing a leading 1 has zeros in all other positions, and the all-zero row is at the bottom of the matrix.
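The same reduction can be reproduced with a short routine. The sketch below is one common way to organize Gauss-Jordan elimination rather than a transcription of the steps in the text: it rescales each pivot row before eliminating, so its intermediate matrices differ from those shown above, although the final matrix is the same. The name `rref` and the use of exact `Fraction` arithmetic are illustrative choices.

```python
from fractions import Fraction


def rref(matrix):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0  # row that will receive the next leading 1
    for c in range(cols):
        if r == rows:
            break
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue                      # no leading 1 in this column
        m[r], m[pivot] = m[pivot], m[r]   # interchange rows
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]   # make the leading entry 1
        for i in range(rows):             # zeros above and below it
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return m


example = [[1, 3, 1, 9],
           [1, 1, -1, 1],
           [3, 11, 5, 35]]
for row in rref(example):
    print([str(x) for x in row])
# ['1', '0', '-2', '-3']
# ['0', '1', '1', '4']
# ['0', '0', '0', '0']
```

Reading the result back as equations in unknowns $x,y,z$ gives $x-2z=-3$ and $y+z=4$, with no condition on $z$, so this particular system has infinitely many solutions.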