Linear algebra is undoubtedly the most useful field in all of algebra. It finds applications in all kinds of science and engineering, like quantum mechanics, graphics programming, and machine learning. It is the “most well-behaved” algebraic theory, in that other abstract algebra topics often try to approximate linear algebra, when possible.
For many students, linear algebra means vectors and matrices and determinants, and complex formulas for computing them. Matrices, in particular, come equipped with a fairly complicated, even convoluted-looking, multiplication operation.
This is not the only way to teach linear algebra, of course. Matrices and their multiplication appear complicated, but are actually a natural and compact way to represent a particular type of function, i.e., a linear map (or linear transformation).
This article is a short introduction to viewing linear algebra from the perspective of abstract algebra, from which matrices arise as a computational tool, rather than an object of study in and of themselves. I do assume some degree of familiarity with the idea of a matrix.
Most linear algebra courses open with a description of vectors in Euclidean space: $\mathbb{R}^n$. Vectors there are defined as tuples of real numbers that can be added, subtracted, and scaled. Two vectors can be combined into a number through the dot product. Vectors come equipped with a notion of magnitude and direction.
However, this highly geometric picture can be counterproductive, since it is hard to apply geometric intuition directly to higher dimensions. It also obscures how this connects to working over a different number system, like the complex numbers.
Instead, I’d like to open with the concept of a linear space, which is somewhat more abstract than a vector space¹.
First, we will need a notion of a “coefficient”, which is essentially something that you can do arithmetic with. We will draw coefficients from a designated ground field $F$. A field is a setting for doing arithmetic: a set of objects that can be added, subtracted, multiplied, and divided in the “usual fashion”, along with special $0$ and $1$ values. E.g. $a + b = b + a$, $a \cdot 1 = a$, $a(b + c) = ab + ac$, and so on.
Not only are the real numbers $\mathbb{R}$ a field, but so are the complex numbers $\mathbb{C}$, and the rational numbers $\mathbb{Q}$. If we drop the “division” requirement, we can also include the integers $\mathbb{Z}$, or polynomials with rational coefficients $\mathbb{Q}[x]$, for example.
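As a concrete aside, the rationals $\mathbb{Q}$ are a pleasant field to compute with, because their arithmetic is exact. A minimal sketch in Python, using the standard library’s `fractions.Fraction`:

```python
# Q as a field: fractions.Fraction gives exact addition, subtraction,
# multiplication, and division, with 0 and 1 behaving as expected.
from fractions import Fraction

a = Fraction(1, 3)
b = Fraction(1, 6)

assert a + b == Fraction(1, 2)   # addition
assert a * b == Fraction(1, 18)  # multiplication
assert a / b == 2                # division (impossible in the integers Z)
assert a - a == 0                # the special "zero" value
```

Dropping down to the integers `int` loses only the division axiom, matching the remark above.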
Having chosen our coefficients $F$, a linear space $V$ over $F$ is another set of objects that can be added and subtracted (and including a special value $0$)², along with a scaling operation, which takes a coefficient $c \in F$ and one of our objects $v \in V$ and produces a new $cv \in V$.
The important part of the scaling operation is that it’s compatible with addition: if we have $a, b \in F$ and $u, v \in V$, we require that

$$(a + b)(u + v) = au + av + bu + bv$$
This is what makes a linear space “linear”: you can write expressions that look like first-degree polynomials (e.g. $au + bv + cw$), and which can be manipulated like first-degree polynomials.
These polynomials are called linear because their graph looks like a line. There’s no multiplication of our objects, so we can’t have $v^2$ or $uv$, but we do have multiplication by a coefficient. This is what makes linear algebra “linear”.
Some examples: $n$-tuples of elements drawn from any field $F$ are a linear space over that field, by componentwise addition and scalar multiplication; e.g., $c(a_1, \ldots, a_n) = (ca_1, \ldots, ca_n)$. Setting $n = 1$ shows that every field is a linear space over itself.
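The tuple space can be sketched directly in code. Here $F$ is modeled by Python floats, and `add` and `scale` are hypothetical helper names of mine, not anything from the text:

```python
# The tuple space F^n: n-tuples over a field, added and scaled componentwise.

def add(u, v):
    """Componentwise addition of two n-tuples."""
    assert len(u) == len(v)
    return [a + b for a, b in zip(u, v)]

def scale(c, v):
    """Scale an n-tuple by a coefficient c drawn from the ground field."""
    return [c * a for a in v]

u, v = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
assert add(u, v) == [5.0, 7.0, 9.0]
assert scale(2.0, u) == [2.0, 4.0, 6.0]
```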
Polynomials in one variable over some field, $F[x]$, are also a linear space, since polynomials can be added together and scaled by any value in $F$ (since lone coefficients are degree-zero polynomials). Real-valued functions also form a linear space over $\mathbb{R}$ in a similar way.
A linear map is a function $f: V \to W$ between two linear spaces $V$ and $W$ over $F$ which “respects” the linear structure in a particular way. That is, for any $a \in F$ and $u, v \in V$,

$$f(u + v) = f(u) + f(v), \qquad f(av) = a f(v)$$
We call this type of relationship (respecting addition and scaling) “linearity”. One way to think of this relationship is that $f$ is kind of like a different kind of coefficient: it distributes over addition, and it commutes with the “ordinary” coefficients from $F$. However, applying $f$ produces a value from $W$ rather than $V$.
Another way to think of it is that if we have a linear polynomial like $au + bv$ in $V$, then $f(au + bv) = af(u) + bf(v)$. We say that $f$ commutes with all linear polynomials.
The most obvious sort of linear map is scaling. Given any coefficient $c \in F$, it defines a “scaling map”:

$$v \mapsto cv$$
It’s trivial to check that this is a linear map by plugging it into the above equations: it’s linear because scaling is distributive and commutative.
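Here is that check as a quick sketch in Python, modeling $F^n$ with lists of floats; `scaling_map` is a name of mine, not standard terminology:

```python
# The scaling map v ↦ c·v on F^n, checked against the two linearity equations.

def scaling_map(c):
    return lambda v: [c * x for x in v]

f = scaling_map(3.0)
u, v = [1.0, 2.0], [4.0, -1.0]

# f(u + v) == f(u) + f(v)
lhs = f([a + b for a, b in zip(u, v)])
rhs = [a + b for a, b in zip(f(u), f(v))]
assert lhs == rhs

# f(a·v) == a·f(v), here with a = 2
assert f([2.0 * x for x in v]) == [2.0 * x for x in f(v)]
```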
Linear maps are the essential thing we study in linear algebra, since they describe all the different kinds of relationships between linear spaces.
Some linear maps are complicated. For example, a function from $\mathbb{R}^2$ to $\mathbb{R}^2$ that rotates the plane by some angle is linear, as are operations that stretch or shear the plane. However, linear maps can’t “bend” or “fold” the plane: they are all fairly rigid motions. In the linear space of rational polynomials $\mathbb{Q}[x]$, multiplication by any fixed polynomial, such as $x^2$ or $x + 1$, is a linear map. The notion of “linear map” depends heavily on the space we’re in.
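The rotation example can be checked numerically. A sketch in Python (helper names are mine), verifying the two linearity equations for a rotation of $\mathbb{R}^2$, up to floating-point rounding:

```python
# Rotation of the plane by an angle theta, as a map R^2 -> R^2.
import math

def rotate(theta):
    c, s = math.cos(theta), math.sin(theta)
    return lambda v: (c * v[0] - s * v[1], s * v[0] + c * v[1])

f = rotate(math.pi / 6)
u, v, a = (1.0, 2.0), (-3.0, 0.5), 4.0

add = lambda p, q: (p[0] + q[0], p[1] + q[1])
smul = lambda c, p: (c * p[0], c * p[1])

# f(u + v) == f(u) + f(v), and f(a·v) == a·f(v), up to rounding
assert all(abs(x - y) < 1e-12 for x, y in zip(f(add(u, v)), add(f(u), f(v))))
assert all(abs(x - y) < 1e-12 for x, y in zip(f(smul(a, v)), smul(a, f(v))))
```

Note that no such check would pass for a map that “bends” the plane, such as $v \mapsto (v_0^2, v_1)$.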
Unfortunately, linear maps, as defined, are quite opaque, and do not lend themselves well to calculation. However, we can build an explicit representation using a linear basis.
For any linear space, we can construct a relatively small set of elements such that any element of the space can be expressed as some linear function of these elements.
Explicitly, for any $V$, we can construct a sequence³ $e_1, e_2, \ldots \in V$ such that for any $v \in V$, we can find coefficients $a_1, a_2, \ldots \in F$ such that

$$v = a_1 e_1 + a_2 e_2 + \cdots = \sum_i a_i e_i$$
Such a set is called a basis if it is linearly independent: no one $e_i$ can be expressed as a linear function of the rest. The dimension of $V$, denoted $\dim V$, is the number of elements in any choice of basis. This value does not depend on the choice of basis⁴.
Constructing a basis for any $V$ is easy: we can do this recursively. First, pick a random element $e$ of $V$, and define a new linear space $V/e$ where we have identified all elements that differ by a multiple of $e$ as equal (i.e., if $u = v + ce$ for some $c \in F$, we treat $u$ and $v$ as equal in $V/e$).
Then, a basis for $V$ is a basis of $V/e$ with $e$ added. The construction of $V/e$ essentially “collapses” the direction $e$ “points” in, giving us a new space where we’ve “deleted” all of the elements that have a nonzero $e$ component.
However, this only works when the dimension is finite; more complex methods must be used for infinite-dimensional spaces. For example, the polynomials $F[x]$ are an infinite-dimensional space, with basis elements $1, x, x^2, x^3, \ldots$. In general, for any linear space $V$, it is always possible to arbitrarily choose a basis, although it may be infinite⁵.
Bases are useful because they give us a concrete representation of any element of $V$. Given a fixed basis $e_1, \ldots, e_n$, we can represent any $v \in V$ by the coefficients $a_1, \ldots, a_n$ themselves. For a finite-dimensional $V$, this brings us back to column vectors: $n$-tuples of coefficients from $F$ that are added and scaled componentwise.
The $i$th basis element $e_i$ is represented as the vector whose entries are all $0$ except for the $i$th one, which is $1$. E.g.,

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
It is important to recall that the choice of basis is arbitrary. From the mathematical perspective, any basis is just as good as any other, although some may be more computationally convenient.
Over $\mathbb{R}^2$, $(1, 0)$ and $(0, 1)$ are sometimes called the “standard basis”, but $(1, 1)$ and $(1, -1)$ are also a basis for this space. One easy mistake to make, particularly when working over the tuple space $F^n$, is to confuse the actual elements of the linear space with the coefficient vectors that represent them. Working with abstract linear spaces eliminates this source of confusion.
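To illustrate basis-dependence concretely: take $\mathbb{R}^2$ with the (non-standard) basis $(1, 1)$, $(1, -1)$. Solving $v = a(1, 1) + b(1, -1)$ for the coefficients gives $a = (x + y)/2$ and $b = (x - y)/2$. A sketch in Python (the helper name is mine):

```python
# Coefficients of a vector in R^2 with respect to the basis (1, 1), (1, -1).

def coords_nonstandard(v):
    x, y = v
    return ((x + y) / 2, (x - y) / 2)

v = (3.0, 1.0)
a, b = coords_nonstandard(v)
assert (a, b) == (2.0, 1.0)

# Reconstruct v from the coefficients: a·(1, 1) + b·(1, -1)
assert (a * 1 + b * 1, a * 1 + b * -1) == v
```

The same element $v$ is represented by the coefficient vector $(3, 1)$ in the standard basis but $(2, 1)$ in this one.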
Working with finite-dimensional linear spaces $V$ and $W$, let’s choose bases $e_1, \ldots, e_m$ and $d_1, \ldots, d_n$ for them, and let’s consider a linear map $f: V \to W$.
The powerful thing about bases is that we can more compactly express the information content of $f$. Given any $v \in V$, we can decompose it into a linear function of the basis (for some coefficients $a_i$), so we can write

$$f(v) = f\left(\sum_i a_i e_i\right) = \sum_i a_i f(e_i)$$
In other words, to specify $f$, we only need to specify what it does to each of the basis elements. But what’s more, because $W$ also has a basis, we can write

$$f(e_i) = \sum_j M_{ji} d_j$$

for some coefficients $M_{ji}$.
Putting these two formulas together, we have an explicit closed form for $f(v)$, given the coefficients $a_i$ of $v$, and the coefficients $M_{ji}$ of $f$:

$$f(v) = \sum_j \left( \sum_i M_{ji} a_i \right) d_j$$
Alternatively, we can express $v$ and $f(v)$ as column vectors, and $f$ as the matrix $M$ with entries $M_{ji}$. The entries of the resulting column vector are given by the above explicit formula for $f(v)$, fixing the value of $j$ in each entry.
(Remember, this is all dependent on the choices of bases $e_i$ and $d_j$!)
Behold, we have derived the matrix-vector multiplication formula: the $j$th entry of the result is the dot product of the vector $v$ and the $j$th row of the matrix $M$.
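The derivation can be replayed in code. A sketch in Python (names are mine, and I use a shear of $\mathbb{R}^2$ as the sample map, an assumption for illustration): the $i$th column of $M$ is $f(e_i)$, and the row-dot-vector formula then reproduces $f$ exactly:

```python
# Recover the matrix of a linear map f: R^2 -> R^2 by applying f to each
# basis vector, then check the derived matrix-vector formula against f.

def f(v):
    # a shear: (x, y) -> (x + 2y, y) — a linear map
    x, y = v
    return (x + 2 * y, y)

basis = [(1, 0), (0, 1)]
cols = [f(e) for e in basis]                # column i of M is f(e_i)
M = [[cols[i][j] for i in range(2)] for j in range(2)]

def matvec(M, v):
    # jth entry is the dot product of the jth row of M with v
    return tuple(sum(M[j][i] * v[i] for i in range(len(v)))
                 for j in range(len(M)))

assert matvec(M, (3, 4)) == f((3, 4))       # both give (11, 4)
```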
But it is crucial to keep in mind that we had to choose bases for $V$ and $W$ to be entitled to write down a matrix for $f$. The values of the coefficients $M_{ji}$ depend on the choice of basis.
If your linear space happens to be $F^n$, there is an “obvious” choice of basis, but not every linear space over $F$ is $F^n$! Importantly, the actual linear algebra does not change depending on the basis⁶.
So, where does matrix multiplication come from? An $n \times m$⁷ matrix $M$ represents some linear map $f: V \to W$, where $\dim V = m$, $\dim W = n$, and appropriate choices of basis ($e_i$, $d_j$) have been made.
Keeping in mind that linear maps are supreme over matrices, suppose we have a third linear space $X$, and a map $g: W \to X$, and let $l = \dim X$. Choosing a basis for $X$, we can represent $g$ as a matrix $N$ of dimension $l \times n$.
Then, we’d like for the matrix product $NM$ to be the same matrix we’d get from representing the composite map $g \circ f$ as a matrix, using the aforementioned choices of bases for $V$ and $X$ (the basis choice for $W$ should “cancel out”).
Recall our formula for $f(v)$ in terms of its matrix coefficients $M_{ji}$ and the coefficients of the input $v$, which we call $a_i$. We can produce a similar formula for $g(w)$, giving it matrix coefficients $N_{kj}$, and coefficients $b_j$ for $w \in W$. (I apologize for the number of indices and coefficients here.)
If we write $w = f(v)$, then $b_j$ is the coefficient $d_j$ is multiplied by; i.e., we fix $j$, and drop it from the summation: $b_j = \sum_i M_{ji} a_i$.
Substituting that into the formula for $g(w)$, we now have something like the following.

$$g(f(v)) = \sum_k \left( \sum_j N_{kj} b_j \right) x_k = \sum_k \left( \sum_j N_{kj} \sum_i M_{ji} a_i \right) x_k = \sum_k \sum_i \left( \sum_j N_{kj} M_{ji} \right) a_i \, x_k$$

where $x_1, \ldots, x_l$ is our chosen basis for $X$.
In the last step, we’ve rearranged things so that the sum in parentheses is the $ki$th matrix coefficient of the composite $g \circ f$. Because we wanted $NM$ to represent $g \circ f$, it must be an $l \times m$ matrix whose entries are

$$(NM)_{ki} = \sum_j N_{kj} M_{ji}$$
This is matrix multiplication. It arises naturally out of composition of linear maps. In this way, the matrix multiplication formula is not a definition, but a theorem of linear algebra!
Given an $n \times m$ matrix $A$ and an $m \times l$ matrix $B$, both with coefficients in $F$, $AB$ is an $n \times l$ matrix with entries

$$(AB)_{ij} = \sum_k A_{ik} B_{kj}$$
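The theorem can be checked mechanically. A sketch in Python (helper names are mine, and the sample matrices are arbitrary): `matmul` implements the formula above, and the final assertion confirms that multiplying matrices agrees with composing the maps they represent:

```python
# Matrix multiplication from the formula (AB)_ij = sum_k A_ik B_kj, and a
# check that A·B acts on column vectors as "apply B, then A".

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def matmul(A, B):
    n, m, l = len(A), len(B), len(B[0])
    assert len(A[0]) == m  # inner dimensions must match
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(l)]
            for i in range(n)]

A = [[1, 2], [0, 1]]    # a shear of R^2
B = [[0, -1], [1, 0]]   # rotation of R^2 by 90 degrees
v = [3, 4]

assert matvec(matmul(A, B), v) == matvec(A, matvec(B, v))  # both give [2, 3]
```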
If the matrix dimension is read as “$m \to n$” instead of “$n \times m$”, the shape requirements are more obvious: two matrices $A$ and $B$ can be multiplied together only when they represent a pair of maps $f: V \to W$ and $g: U \to V$, so that $AB$ represents the composite $f \circ g: U \to W$.
The identity matrix $I_n$ is an $n \times n$ matrix:

$$I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
We want it to be such that for any appropriately-sized matrices $A$ and $B$, it has $AI_n = A$ and $I_nB = B$. Lifted up to linear maps, this means that $I_n$ should represent the identity map $v \mapsto v$, when $\dim V = n$. This map sends each basis element to itself, so the columns of $I_n$ should be the basis vectors, in order:

$$I_n = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix}$$
If we shuffle the columns, we’ll get a permutation matrix, which shuffles the coefficients of a column vector. For example, consider this matrix:

$$P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
This is the identity, but with the first two columns swapped. Thus, it will swap the first two coefficients of any column vector.
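A quick sketch in Python of such a permutation matrix at work; `matvec` is a helper name of mine:

```python
# A 3x3 permutation matrix: the identity with its first two columns swapped.
# Applied to a column vector, it swaps the first two coefficients.

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

P = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]

assert matvec(P, [10, 20, 30]) == [20, 10, 30]
```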
Matrices may seem unintuitive when they’re introduced as a subject of study. Every student encountering matrices for the first time may ask “If they add componentwise, why don’t they multiply componentwise too?”
However, approaching matrices as a computational and representational tool shows that the convoluted-looking matrix multiplication formula is a direct consequence of linearity.
In actual modern mathematics, the objects I describe are still called vector spaces, which I think generates unnecessary confusion in this case. “Linear space” is a bit more on the nose for what I’m going for. ↩
This type of structure (just the addition part) is also called an “abelian group”. ↩
Throughout, $i$, $j$, and $k$ are indices in some unspecified but ordered indexing set, usually $\{1, \ldots, n\}$. I will not bother giving this index set a name. ↩
An example of a messy infinite-dimensional basis is $\mathbb{R}$ considered as a linear space over $\mathbb{Q}$ (in general, every field is a linear space over its subfields). The basis for this space essentially has to be “$1$, and all irrational numbers”, except that if we include e.g. $\pi$ and $\pi + 1$ we can’t include $\pi + 2$, which is a $\mathbb{Q}$-linear combination of $\pi$ and $\pi + 1$.
On the other hand, $\mathbb{C}$ is two-dimensional over $\mathbb{R}$, with basis $\{1, i\}$.
Incidentally, this idea of “view a field $K$ as a linear space over its subfield $F$” is such a useful concept that the dimension of this space is called the “degree of the field extension $K/F$”, and given the symbol $[K : F]$.
This, and . ↩
You may recall from linear algebra class that two matrices $A$ and $B$ of the same shape are similar if there are two appropriately-sized square matrices $P$ and $Q$ such that $B = PAQ$. These matrices $P$ and $Q$ represent a change of basis, and indicate that the linear maps these matrices come from do “the same thing” to elements of $V$.
Over an algebraically closed field like $\mathbb{C}$ (i.e., one where all polynomials have roots), there is an even stronger way to capture the information content of a linear map, via Jordan canonicalization: it takes any square matrix $A$ and produces an almost-diagonal square matrix that only depends on the eigenvalues of $A$, which are the same for similar matrices, and thus basis-independent. ↩
Here, as always, matrix dimensions are given in RC (row-column) order. You can think of this as being “output dimension” by “input dimension”. ↩