What is a Matrix? A Miserable Pile of Coefficients!

Linear algebra is undoubtedly the most useful field in all of algebra. It finds applications in all kinds of science and engineering, like quantum mechanics, graphics programming, and machine learning. It is the “most well-behaved” algebraic theory, in that other abstract algebra topics often try to approximate linear algebra, when possible.

For many students, linear algebra means vectors and matrices and determinants, and complex formulas for computing them. Matrices, in particular, come equipped with a fairly complicated, and a fortiori convoluted, multiplication operation.

This is not the only way to teach linear algebra, of course. Matrices and their multiplication appear complicated, but actually are a natural and compact way to represent a particular type of function, i.e., a linear map (or linear transformation).

This article is a short introduction to viewing linear algebra from the perspective of abstract algebra, from which matrices arise as a computational tool, rather than an object of study in and of themselves. I do assume some degree of familiarity with the idea of a matrix.

Linear Spaces

Most linear algebra courses open with a description of vectors in Euclidean space: $\R^n$. Vectors there are defined as tuples of real numbers that can be added, multiplied, and scaled. Two vectors can be combined into a number through the dot product. Vectors come equipped with a notion of magnitude and direction.

However, this highly geometric picture can be counterproductive, since it is hard to apply geometric intuition directly to higher dimensions. It also obscures how this connects to working over a different number system, like the complex numbers.

Instead, I’d like to open with the concept of a linear space, which is somewhat more abstract than a vector space[1].

First, we will need a notion of a “coefficient”, which is essentially something that you can do arithmetic with. We will draw coefficients from a designated ground field $K$. A field is a setting for doing arithmetic: a set of objects that can be added, subtracted, multiplied, and divided in the “usual fashion”, along with special $0$ and $1$ values. E.g. $a + 0 = a$, $1a = a$, $a(b + c) = ab + ac$, and so on.

Not only are the real numbers $\R$ a field, but so are the complex numbers $\C$ and the rational numbers $\Q$. If we drop the “division” requirement, we can also include the integers $\Z$, or polynomials with rational coefficients $\Q[x]$, for example.

Having chosen our coefficients $K$, a linear space $V$ over $K$ is another set of objects that can be added and subtracted (and including a special value $0$)[2], along with a scaling operation, which takes a coefficient $c \in K$ and one of our objects $v \in V$ and produces a new $cv \in V$.

The important part of the scaling operation is that it’s compatible with addition: if we have $a, b \in K$ and $v, w \in V$, we require that

\begin{gather*}
a(v + w) = av + aw \\
(a + b)v = av + bv
\end{gather*}

This is what makes a linear space “linear”: you can write equations that look like first-degree polynomials (e.g. $ax + b$), and which can be manipulated like first-degree polynomials.

These polynomials are called linear because their graph looks like a line. There’s no multiplication, so we can’t have $x^2$, but we do have multiplication by a coefficient. This is what makes linear algebra “linear”.

Some examples: $n$-tuples of elements drawn from any field are a linear space over that field, by componentwise addition and scalar multiplication; e.g., $\R^3$. Setting $n = 1$ shows that every field is a linear space over itself.

Polynomials in one variable over some field, $K[x]$, are also a linear space, since polynomials can be added together and scaled by any value in $K$ (since lone coefficients are degree zero polynomials). Real-valued functions also form a linear space over $\R$ in a similar way.
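
To make the tuple example concrete, here is a minimal sketch of my own (not code from the article or any library) of the tuple space $K^n$ with $K = \Q$, spot-checking the two compatibility axioms below:

```python
# A minimal sketch of the tuple space K^n: elements are n-tuples over a field
# K, added componentwise and scaled by a coefficient from K. Here K is the
# rationals, via Python's Fraction type.
from fractions import Fraction as F

def vadd(v, w):
    return tuple(a + b for a, b in zip(v, w))

def vscale(c, v):
    return tuple(c * a for a in v)

# Spot-check the two compatibility axioms stated in the text.
v, w = (F(1), F(2), F(3)), (F(0), F(-1), F(5))
a, b = F(2, 3), F(7)
assert vscale(a, vadd(v, w)) == vadd(vscale(a, v), vscale(a, w))
assert vscale(a + b, v) == vadd(vscale(a, v), vscale(b, v))
```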

Linear Transformations

A linear map is a function $f: V \to W$ between two linear spaces $V$ and $W$ over $K$ which “respects” the linear structure in a particular way. That is, for any $c \in K$ and $v, w \in V$,

\begin{gather*}
f(v + w) = f(v) + f(w) \\
f(cv) = c \cdot f(v)
\end{gather*}

We call this type of relationship (respecting addition and scaling) “linearity”. One way to think of this relationship is that $f$ is kind of like a different kind of coefficient: it distributes over addition and commutes with the “ordinary” coefficients from $K$. However, applying $f$ produces a value from $W$ rather than $V$.

Another way to think of it is that if we have a linear polynomial like $p(x) = ax + b$ in $x$, then $f(p(x)) = p(f(x))$. We say that $f$ commutes with all linear polynomials.

The most obvious sort of linear map is scaling. Given any coefficient $c \in K$, it defines a “scaling map”:

\begin{gather*}
\mu_c: V \to V \\
v \mapsto cv
\end{gather*}

It’s trivial to check this is a linear map, by plugging it into the above equations: it’s linear because scaling is distributive and commutative.
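
Writing that check out explicitly (a small verification of my own, using only the field axioms and the compatibility rules above):

\begin{gather*}
\mu_c(v + w) = c(v + w) = cv + cw = \mu_c(v) + \mu_c(w) \\
\mu_c(av) = c(av) = (ca)v = (ac)v = a(cv) = a \cdot \mu_c(v)
\end{gather*}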

Linear maps are the essential thing we study in linear algebra, since they describe all the different kinds of relationships between linear spaces.

Some linear maps are complicated. For example, a function from $\R^2 \to \R^2$ that rotates the plane by some angle $\theta$ is linear, as are operations that stretch or shear the plane. However, they can’t “bend” or “fold” the plane: they are all fairly rigid motions. In the linear space $\Q[x]$ of rational polynomials, multiplication by any polynomial, such as $x$ or $x^2 - 1$, is a linear map. The notion of “linear map” depends heavily on the space we’re in.
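
As a quick numeric illustration of the rotation example (a sketch of my own, not from the article), one can spot-check that rotating the plane respects addition and scaling, up to floating-point error:

```python
# Rotation of R^2 by an angle theta, spot-checked against the two linearity
# equations. The specific numbers are arbitrary.
import math

def rotate(v, theta):
    x, y = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def add(v, w):
    return (v[0] + w[0], v[1] + w[1])

def scale(c, v):
    return (c * v[0], c * v[1])

theta, v, w, c = 0.7, (1.0, 2.0), (-3.0, 0.5), 4.2
assert all(math.isclose(p, q) for p, q in
           zip(rotate(add(v, w), theta), add(rotate(v, theta), rotate(w, theta))))
assert all(math.isclose(p, q) for p, q in
           zip(rotate(scale(c, v), theta), scale(c, rotate(v, theta))))
```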

Unfortunately, linear maps as described so far are quite opaque, and do not lend themselves well to calculation. However, we can build an explicit representation using a linear basis.

Linear Basis

For any linear space, we can construct a relatively small set of elements such that any element of the space can be expressed as some linear function of these elements.

Explicitly, for any $V$, we can construct a sequence[3] $e_i$ such that for any $v \in V$, we can find $c_i \in K$ such that

v = \sum_i c_i e_i.

Such a set $e_i$ is called a basis if it is linearly independent: no one $e_i$ can be expressed as a linear function of the rest. The dimension of $V$, denoted $\dim V$, is the number of elements in any choice of basis. This value does not depend on the choice of basis[4].

Constructing a basis for any $V$ is easy: we can do this recursively. First, pick any nonzero element $e_1$ of $V$, and define a new linear space $V/e_1$ where we have identified all elements that differ by a multiple of $e_1$ as equal (i.e., if $v - w = ce_1$, we treat $v$ and $w$ as equal in $V/e_1$).

Then, a basis for $V$ is a basis of $V/e_1$ with $e_1$ added. The construction of $V/e_1$ is essentially “collapsing” the dimension $e_1$ “points” in, giving us a new space where we’ve “deleted” all of the elements that have a nonzero $e_1$ component.

However, this only works when the dimension is finite; more complex methods must be used for infinite-dimensional spaces. For example, the polynomials $\Q[x]$ are an infinite-dimensional space, with basis elements $\{1, x, x^2, x^3, ...\}$. In general, for any linear space $V$, it is always possible to arbitrarily choose a basis, although it may be infinite[5].
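
For instance, any individual polynomial only needs finitely many of these basis elements:

x^2 - 3x + 2 = 2 \cdot 1 + (-3) \cdot x + 1 \cdot x^2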

Bases are useful because they give us a concrete representation of any element of $V$. Given a fixed basis $e_i$, we can represent any $w = \sum_i c_i e_i$ by the coefficients $c_i$ themselves. For a finite-dimensional $V$, this brings us back to column vectors: $(\dim V)$-tuples of coefficients from $K$ that are added and scaled componentwise.

\Mat{c_0 \\ c_1 \\ \vdots \\ c_n} \,\underset{\text{given } e_i}{:=}\, \sum_i c_i e_i

The $i$th basis element is represented as the vector whose entries are all $0$ except for the $i$th one, which is $1$. E.g.,

\Mat{1 \\ 0 \\ \vdots \\ 0} \,\underset{\text{given } e_i}{=}\, e_1, \,\,\, \Mat{0 \\ 1 \\ \vdots \\ 0} \,\underset{\text{given } e_i}{=}\, e_2, \,\,\, ...

It is important to recall that the choice of basis is arbitrary. From the mathematical perspective, any basis is just as good as any other, although some may be more computationally convenient.

Over $\R^2$, $(1, 0)$ and $(0, 1)$ are sometimes called the “standard basis”, but $(1, 2)$ and $(3, -4)$ are also a basis for this space. One easy mistake to make, particularly when working over the tuple space $K^n$, is to confuse the actual elements of the linear space with the coefficient vectors that represent them. Working with abstract linear spaces eliminates this source of confusion.
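
For instance (a worked example of my own), since $(5, 0) = 2\,(1, 2) + 1\,(3, -4)$, the same element of $\R^2$ gets two different coefficient vectors depending on the basis:

\Mat{5 \\ 0} \,\underset{\text{given } (1,0),\,(0,1)}{=}\, (5, 0) \,\underset{\text{given } (1,2),\,(3,-4)}{=}\, \Mat{2 \\ 1}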

Representing Linear Transformations

Working with finite-dimensional linear spaces $V$ and $W$, let’s choose bases $e_i$ and $d_j$ for them, and let’s consider a linear map $f: V \to W$.

The powerful thing about bases is that we can more compactly express the information content of $f$. Given any $v \in V$, we can decompose it into a linear function of the basis (for some coefficients), so we can write

f(v) = f\left(\sum_i c_i e_i\right) = \sum_i f(c_i e_i) = \sum_i c_i \cdot f(e_i)

In other words, to specify $f$, we only need to specify what it does to each of the $\dim V$ basis elements. But what’s more, because $W$ also has a basis, we can write

f(e_i) = \sum_j A_{ij} d_j

Putting these two formulas together, we have an explicit closed form for $f(v)$, given the coefficients $A_{ij}$ of $f$ and the coefficients $c_i$ of $v$:

f(v) = \sum_{i,j} c_i A_{ij} d_j

Alternatively, we can express $v$ and $f(v)$ as column vectors, and $f$ as the matrix $A$ with entries $A_{ij}$. The entries of the resulting column vector are given by the above explicit formula for $f(v)$, fixing the value of $j$ in each entry.

\underbrace{\Mat{
A_{0,0} & A_{1,0} & \cdots & A_{n,0} \\
A_{0,1} & A_{1,1} & \cdots & A_{n,1} \\
\vdots & \vdots & \ddots & \vdots \\
A_{0,m} & A_{1,m} & \cdots & A_{n,m}
}}_A \,
\underbrace{\Mat{c_0 \\ c_1 \\ \vdots \\ c_n}}_v =
\underbrace{\Mat{
\sum_i c_i A_{i,0} \\
\sum_i c_i A_{i,1} \\
\vdots \\
\sum_i c_i A_{i,m}
}}_{Av}

(Remember, this is all dependent on the choices of bases $e_i$ and $d_j$!)

Behold, we have derived the matrix-vector multiplication formula: the $j$th entry of the result is the dot product of the vector and the $j$th row of the matrix.

But it is crucial to keep in mind that we had to choose bases $e_i$ and $d_j$ to be entitled to write down a matrix for $f$. The values of the coefficients depend on the choice of basis.

If your linear space happens to be $\R^n$, there is an “obvious” choice of basis, but not every linear space over $\R$ is $\R^n$! Importantly, the actual linear algebra does not change depending on the basis[6].
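
As a computational sanity check (a sketch of my own, with an arbitrarily chosen map, not code from the article), we can build the matrix of a linear map from its action on the basis and confirm that the summation formula above reproduces applying the map directly:

```python
# f: R^2 -> R^3, f(x, y) = (x + y, 2x, 3y), with the standard bases on both
# sides. A[i] stores the coefficients of f(e_i) in the output basis, so
# A[i][j] plays the role of A_ij in the text, and the formula
# (Av)_j = sum_i c_i * A[i][j] should agree with f itself.

def f(v):
    x, y = v
    return (x + y, 2 * x, 3 * y)

basis = [(1, 0), (0, 1)]        # e_1, e_2
A = [f(e) for e in basis]       # the matrix of f, one basis image per entry

def apply_matrix(A, coeffs):
    m = len(A[0])               # output dimension
    return tuple(sum(c * A[i][j] for i, c in enumerate(coeffs))
                 for j in range(m))

v = (3, -2)
assert apply_matrix(A, v) == f(v)   # both give (1, 6, -6)
```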

Matrix Multiplication

So, where does matrix multiplication come from? An $n \times m$[7] matrix $A$ represents some linear map $f: V \to W$, where $\dim V = n$, $\dim W = m$, and appropriate choices of basis ($e_i$, $d_j$) have been made.

Keeping in mind that linear maps are supreme over matrices, suppose we have a third linear space $U$, and a map $g: U \to V$, and let $\ell = \dim U$. Choosing a basis $h_k$ for $U$, we can represent $g$ as a matrix $B$ of dimension $\ell \times n$.

Then, we’d like for the matrix product $AB$ to be the same matrix we’d get from representing the composite map $fg: U \to W$ as a matrix, using the aforementioned choices of bases for $U$ and $W$ (the basis choice for $V$ should “cancel out”).

Recall our formula for $f(v)$ in terms of its matrix coefficients $A_{ij}$ and the coefficients of the input $v$, which we call $c_i$. We can produce a similar formula for $g(u)$, giving it matrix coefficients $B_{ki}$, and coefficients $b_k$ for $u$. (I apologize for the number of indices and coefficients here.)

\begin{align*}
f(v) &= \sum_{i,j} c_i A_{ij} d_j \\
g(u) &= \sum_{k,i} b_k B_{ki} e_i
\end{align*}

If we write $f(g(u))$, then $c_i$ is the coefficient $e_i$ is multiplied by; i.e., we fix $i$, and drop it from the summation: $c_i = \sum_k b_k B_{ki}$.

Substituting that into the above formula, we now have something like the following.

\begin{align*}
f(g(u)) &= \sum_{i,j} \sum_{k} b_k B_{ki} A_{ij} d_j \\
f(g(u)) &= \sum_{k,j} b_k \left(\sum_{i} A_{ij} B_{ki}\right) d_j & (\star)
\end{align*}

In $(\star)$, we’ve rearranged things so that the sum in parentheses is the $(k,j)$th matrix coefficient of the composite $fg$. Because we wanted $AB$ to represent $fg$, it must be an $\ell \times m$ matrix whose entries are

(AB)_{kj} = \sum_{i} A_{ij} B_{ki}

This is matrix multiplication. It arises naturally out of composition of linear maps. In this way, the matrix multiplication formula is not a definition, but a theorem of linear algebra!

Theorem (Matrix Multiplication)

Given an $n \times m$ matrix $A$ and an $\ell \times n$ matrix $B$, both with coefficients in $K$, the product $AB$ is an $\ell \times m$ matrix with entries

(AB)_{kj} = \sum_{i} A_{ij} B_{ki}

If the matrix dimension is read as $n \to m$ instead of $n \times m$, the shape requirements are more obvious: two matrices $A$ and $B$ can be multiplied together only when they represent a pair of maps $V \to W$ and $U \to V$.
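
The theorem can likewise be spot-checked numerically. The sketch below is my own (not from the article), following the same index convention as the previous sketch, where the first index of a matrix entry is the input basis index; it multiplies two small matrices by the formula above and confirms that applying the product agrees with applying the two maps one after the other:

```python
# B represents a map U -> V (dimensions 3 -> 2) and A a map V -> W (2 -> 3);
# entries are indexed input-first, so A[i][j] is A_ij and B[k][i] is B_ki.
# matmul implements (AB)_kj = sum_i A_ij * B_ki.

def matmul(A, B):
    ell, n, m = len(B), len(A), len(A[0])
    return [[sum(A[i][j] * B[k][i] for i in range(n)) for j in range(m)]
            for k in range(ell)]

def apply_matrix(M, coeffs):
    return tuple(sum(c * M[i][j] for i, c in enumerate(coeffs))
                 for j in range(len(M[0])))

A = [[1, 2, 0], [1, 0, 3]]      # arbitrary map V -> W
B = [[2, 1], [0, 1], [1, -1]]   # arbitrary map U -> V

u = (1, 2, 3)
assert apply_matrix(matmul(A, B), u) == apply_matrix(A, apply_matrix(B, u))
```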

Other Consequences, and Conclusion

The identity matrix is an $n \times n$ matrix:

I_n = \Mat{ 1 \\ & 1 \\ && \ddots \\ &&& 1 }

We want it to be such that for any appropriately-sized matrices $A$ and $B$, we have $AI_n = A$ and $I_n B = B$. Lifted up to linear maps, this means that $I_n$ should represent the identity map $V \to V$, when $\dim V = n$. This map sends each basis element $e_i$ to itself, so the columns of $I_n$ should be the basis vectors, in order:

\Mat{1 \\ 0 \\ \vdots \\ 0} \Mat{0 \\ 1 \\ \vdots \\ 0} \cdots \Mat{0 \\ 0 \\ \vdots \\ 1}

If we shuffle the columns, we’ll get a permutation matrix, which shuffles the coefficients of a column vector. For example, consider this matrix.

\Mat{ 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 }

This is similar to the identity, but we’ve swapped the first two columns. Thus, it will swap the first two coefficients of any column vector.
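
Concretely, multiplying it against an arbitrary column vector:

\Mat{ 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 } \Mat{a \\ b \\ c} = \Mat{b \\ a \\ c}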

Matrices may seem unintuitive when they’re introduced as a subject of study. Every student encountering matrices for the first time may ask: “If they add componentwise, why don’t they multiply componentwise too?”

However, approaching matrices as a computational and representational tool shows that the convoluted-looking matrix multiplication formula is a direct consequence of linearity.

\begin{gather*}
f(v + w) = f(v) + f(w) \\
f(cv) = c \cdot f(v)
\end{gather*}
  1. In actual modern mathematics, the objects I describe are still called vector spaces, which I think generates unnecessary confusion in this case. “Linear space” is a bit more on the nose for what I’m going for. 

  2. This type of structure (just the addition part) is also called an “abelian group”. 

  3. Throughout, $i$, $j$, and $k$ are indices in some unspecified but ordered indexing set, usually $\{1, 2, ..., n\}$. I will not bother giving this index set a name.

  4. This is sometimes called the dimension theorem, which is somewhat tedious to prove. 

  5. An example of a messy infinite-dimensional basis is $\R$ considered as a linear space over $\Q$ (in general, every field is a linear space over its subfields). The basis for this space essentially has to be “$1$, and all the irrational numbers”, except that if we include e.g. $e$ and $\pi$, we can’t include $e + \frac{1}{2}\pi$, which is a $\Q$-linear combination of $e$ and $\pi$.

    On the other hand, $\C$ is two-dimensional over $\R$, with basis $\{1, i\}$.

    Incidentally, this idea of viewing a field $K$ as a linear space over its subfield $F$ is such a useful concept that the resulting dimension is called the “degree of the field extension $K/F$”, and given the symbol $[K : F]$.

    Thus, $[\R : \Q] = \infty$ and $[\C : \R] = 2$.

  6. You may recall from linear algebra class that two matrices $A$ and $B$ of the same shape are equivalent if there are two appropriately-sized invertible square matrices $S$ and $R$ such that $SAR = B$. These matrices $S$ and $R$ represent a change of basis, and indicate that the linear maps $A, B: V \to W$ these matrices come from do “the same thing” to elements of $V$.

    Over an algebraically closed field like $\C$ (i.e., one where all polynomials have roots), there is an even stronger way to capture the information content of a linear map: the Jordan canonical form, which takes any square matrix $A$ and produces an almost-diagonal square matrix determined by the eigenvalues of $A$ and the structure of its generalized eigenspaces. This form is the same for similar matrices, and thus basis-independent.

  7. Here, as always, matrix dimensions are given in “input × output” order: an $n \times m$ matrix represents a map from an $n$-dimensional space to an $m$-dimensional one, so it has $m$ rows and $n$ columns. Note that this is the reverse of the usual row-column convention.
