# What is a Matrix? A Miserable Pile of Coefficients!

Linear algebra is undoubtedly the most useful field in all of algebra. It finds applications in all kinds of science and engineering, like quantum mechanics, graphics programming, and machine learning. It is the “most well-behaved” algebraic theory, in that other abstract algebra topics often try to approximate linear algebra, when possible.

For many students, linear algebra means vectors and matrices and determinants, and complex formulas for computing them. Matrices, in particular, come equipped with a fairly complicated, and a fortiori convoluted, multiplication operation.

This is not the only way to teach linear algebra, of course. Matrices and their multiplication appear complicated, but actually are a natural and compact way to represent a particular type of function, i.e., a linear map (or linear transformation).

This article is a short introduction to viewing linear algebra from the perspective of abstract algebra, from which matrices arise as a computational tool, rather than an object of study in and of themselves. I do assume some degree of familiarity with the idea of a matrix.

## Linear Spaces

Most linear algebra courses open with a description of vectors in Euclidean space: $\R^n$. Vectors there are defined as tuples of real numbers that can be added, multiplied, and scaled. Two vectors can be combined into a number through the dot product. Vectors come equipped with a notion of magnitude and direction.

However, this highly geometric picture can be counterproductive, since it is hard to apply geometric intuition directly to higher dimensions. It also obscures how this connects to working over a different number system, like the complex numbers.

Instead, I’d like to open with the concept of a linear space, which is somewhat more abstract than a vector space1.

First, we will need a notion of a “coefficient”, which is essentially something that you can do arithmetic with. We will draw coefficients from a designated ground field $K$. A field is a setting for doing arithmetic: a set of objects that can be added, subtracted, and multiplied, and divided in the “usual fashion” along with special $0$ and $1$ values. E.g. $a + 0 = a$, $1a = a$, $a(b + c) = ab + ac$, and so on.

Not only are the real numbers $\R$ a field, but so are the complex numbers $\C$, and the rational numbers $\Q$. If we drop the “division” requirement, we can also include the integers $\Z$, or polynomials with rational coefficients $\Q[x]$, for example.

Having chosen our coefficients $K$, a linear space $V$ over $K$ is another set of objects that can be added and subtracted (and including a special value $0$)2, along with a scaling operation, which takes a coefficient $c \in K$ and one of our objects $v \in V$ and produces a new $cv \in V$.

The important part of the scaling operation is that it’s compatible with addition: if we have $a, b \in K$ and $v, w \in V$, we require that

$\begin{gather*}a (v + w) = av + aw \\ (a + b) v = av + bv\end{gather*}$

This is what makes a linear space “linear”: you can write equations that look like first-degree polynomials (e.g. $ax + b$), and which can be manipulated like first-degree polynomials.

These polynomials are called linear because their graph looks like a line. There’s no multiplication, so we can’t have $x^2$, but we do have multiplication by a coefficient. This is what makes linear algebra is “linear”.

Some examples: $n$-tuples of elements drawn from any field are a linear space over that field, by componentwise addition and scalar multiplication; e.g., $R^3$. Setting $n = 1$ shows that every field is a linear space over itself.

Polynomials in one variable over some field, $K[x]$, are also a linear space, since polynomials can be added together and scaled by a any value in $K$ (since lone coefficients are degree zero polynomials). Real-valued functions also form a linear space over $\R$ in a similar way.

### Linear Transformations

A linear map is a function $f: V \to W$ between two linear spaces $V$ and $W$ over $K$ which “respects” the linear structure in a particular way. That is, for any $c\in K$ and $v, w \in V$,

$\begin{gather*}f(v + w) = f(v) + f(w) \\ f(cv) = c \cdot f(v)\end{gather*}$

We call this type of relationship (respecting addition and scaling) “linearity”. One way to think of this relationship is that $f$ is kind of like a different kind of coefficient, in that it distributes over addition, which commutes with the “ordinary” coefficients from $K$. However, applying $f$ produces a value from $W$ rather than $V$.

Another way to think of it is that if we have a linear polynomial like $p(x) = ax + b$ in $x$, then $f(p(x)) = p(f(x))$. We say that $f$ commutes with all linear polynomials.

The most obvious sort of linear map is scaling. Given any coefficient $c \in K$, it defines a “scaling map”:

$\begin{gather*}\mu_c: V \to V \\ v \mapsto cv\end{gather*}$

It’s trivial to check this is a linear map, by plugging it into the above equations: it’s linear because scaling is distributive and commutative.

Linear maps are the essential thing we study in linear algebra, since they describe all the different kinds of relationships between linear spaces.

Some linear maps are complicated. For example, a function from $\R^2 \to \R^2$ that rotates the plane by some angle $\theta$ is linear, as are operations that stretch or shear the plane. However, they can’t “bend” or “fold” the plane: they are all fairly rigid motions. In the linear space $\Q[x]$ of rational polynomials, multiplication by any polynomial, such as $x$ or $x^2 - 1$, is a linear map. The notion of “linear map” depends heavily on the space we’re in.

Unfortunately, linear maps as they are quite opaque, and do not lend themselves well to calculation. However, we can build an explicit representation using a linear basis.

## Linear Basis

For any linear space, we can construct a relatively small of elements such that any element of the space can be expressed as some linear function of these elements.

Explicitly, for any $V$, we can construct a sequence3 $e_i$ such that for any $v \in V$, we can find $c_i \in K$ such that

$v = \sum_i c_i e_i.$

Such a set $e_i$ is called a basis if it is linearly independent: no one $e_i$ can be expressed as a linear function of the rest. The dimension of $V$, denoted $\dim V$, is the number of elements in any choice of basis. This value does not depend on the choice of basis4.

Constructing a basis for any $V$ is easy: we can do this recursively. First, pick a random element $e_1$ of $V$, and define a new linear space $V/e_1$ where we have identified all elements that differ by a factor of $e_1$ as equal (i.e., if $v - w = ce_1$, we treat $v$ and $w$ as equal in $V/e_1$).

Then, a basis for $V$ is a basis of $V/e_1$ with $e_1$ added. The construction of $V/e_1$ is essentially “collapsing” the dimension $e_1$ “points” in, giving us a new space where we’ve “deleted” all of the elements that have a nonzero $e_1$ component.

However, this only works when the dimension is finite; more complex methods must be used for infinite-dimensional spaces. For example, the polynomials $\Q[x]$ are an infinite-dimensional space, with basis elements $\\{1, x, x^2, x^3, ...\\}$. In general, for any linear space $V$, it is always possible to arbitrarily choose a basis, although it may be infinite5.

Bases are useful because they give us a concrete representation of any element of $V$. Given a fixed basis $e_i$, we can represent any $w = \sum_i c_i e_i$ by the coefficients $c_i$ themselves. For a finite-dimensional $V$, this brings us back column vectors: $(\dim V)$-tuples of coefficients from $K$ that are added and scaled componentwise.

$\Mat{c_0 \\ c_1 \\ \vdots \\ c_n} \,\underset{\text{given } e_i}{:=}\, \sum_i c_i e_i$

The $i$th basis element is represented as the vector whose entries are all $0$ except for the $i$th one, which is $1$. E.g.,

$\Mat{1 \\ 0 \\ \vdots \\ 0} \,\underset{\text{given } e_i}{=}\, e_1, \,\,\, \Mat{0 \\ 1 \\ \vdots \\ 0} \,\underset{\text{given } e_i}{=}\, e_2, \,\,\, ...$

It is important to recall that the choice of basis is arbitrary. From the mathematical perspective, any basis is just as good as any other, although some may be more computationally convenient.

Over $\R^2$, $(1, 0)$ and $(0, 1)$ are sometimes called the “standard basis”, but $(1, 2)$ and $(3, -4)$ are also a basis for this space. One easy mistake to make, particularly when working over the tuple space $K^n$, is to confuse the actual elements of the linear space with the coefficient vectors that represent them. Working with abstract linear spaces eliminates this source of confusion.

### Representing Linear Transformations

Working with finite-dimensional linear spaces $V$ and $W$, let’s choose bases $e_i$ and $d_j$ for them, and let’s consider a linear map $f: V \to W$.

The powerful thing about bases is that we can more compactly express the information content of $f$. Given any $v \in V$, we can decompose it into a linear function of the basis (for some coefficients), so we can write

$f(v) = f\left(\sum_i c_i e_i\right) = \sum_i f(c_i e_i) = \sum_i c_i \cdot f(e_i)$

In other words, to specify $f$, we only need to specify what it does to each of the $\dim V$ basis elements. But what’s more, because $W$ also has a basis, we can write

$f(e_i) = \sum_j A_{ij} d_j$

Putting these two formulas together, we have an explicit closed form for $f(v)$, given the coefficients $A_{ij}$ of $f$, and the coefficients $c_i$ of $v$:

$f(v) = \sum_{i,j} c_i A_{ij} d_j$

Alternatively, we can express $v$ and $f(v)$ as column vectors, and $f$ as the $A$ matrix with entires $A_{ij}$. The entries of the resulting column vector are given by the above explicit formula for $f(v)$, fixing the value of $j$ in each entry.

$\underbrace{\Mat{ A_{0,0} & A_{1,0} & \cdots & A_{n,0} \\ A_{1,0} & A_{1,1} & \cdots & A_{n,1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{0,m} & A_{1,m} & \cdots & A_{n,m} }}_A \, \underbrace{\Mat{c_0 \\ c_1 \\ \vdots \\ c_n}}_v = \underbrace{\Mat{ \sum_i c_i A_{i,0} \\ \sum_i c_i A_{i,1} \\ \vdots \\ \sum_i c_i A_{i,m} }}_{Av}$

(Remember, this is all dependent on the choices of bases $e_i$ and $d_j$!)

Behold, we have derived the matrix-vector multiplication formula: the $j$th entry of the result is the dot product of the vector and the $j$th row of the matrix.

But it is crucial to keep in mind that we had to choose bases $e_i$ and $d_j$ to be entitled to write down a matrix for $f$. The values of the coefficients depend on the choice of basis.

If your linear space happens to be $\R^n$, there is an “obvious” choice of basis, but not every linear space over $\R$ is $\R^n$! Importantly, the actual linear algebra does not change depending on the basis6.

## Matrix Multiplication

So, where does matrix multiplication come from? An $n \times m$7 matrix $A$ represents some linear map $f: V \to W$, where $\dim V = n$, $\dim W = m$, and appropriate choices of basis ($e_i$, $d_j$) have been made.

Keeping in mind that linear maps are supreme over matrices, suppose we have a third linear space $U$, and a map $g: U \to V$, and let $\ell = \dim U$. Choosing a basis $h_k$ for $U$, we can represent $g$ as a matrix $B$ of dimension $\ell \times n$.

Then, we’d like for the matrix product $AB$ to be the same matrix we’d get from representing the composite map $fg: U \to W$ as a matrix, using the aforementioned choices of bases for $U$ and $W$ (the basis choice for $V$ should “cancel out”).

Recall our formula for $f(v)$ in terms of its matrix coefficients $A_{ij}$ and the coefficients of the input $v$, which we call $c_i$. We can produce a similar formula for $g(u)$, giving it matrix coefficients $B_{ki}$, and coefficients $b_k$ for $u$. (I appologize for the number of indices and coefficients here.)

\begin{align*}f(v) &= \sum_{i,j} c_i A_{ij} d_j \\ g(u) &= \sum_{k,i} b_k B_{ki} e_i\end{align*}

If we write $f(g(u))$, then $c_i$ is the coefficient $e_i$ is multiplied by; i.e., we fix $i$, and drop it from the summation: $c_i = \sum_k b_k B_{ki}$.

Substituting that into the above formula, we now have something like the following.

\begin{align*}f(g(u)) &= \sum_{i,j} \sum_{k} b_k B_{ki} A_{ij} d_j \\ f(g(u)) &= \sum_{k,j} b_k \left(\sum_{i} A_{ij} B_{ki} \right) d_j &(\star)\end{align*}

In $(\star)$, we’ve rearranged things so that the sum in parenthesis is the $(k,j)$th matrix coefficient of the composite $fg$. Because we wanted $AB$ to represent $fg$, it must be an $\ell \times m$ matrix whose entries are

$(AB)_{kj} = \sum_{i} A_{ij} B_{ki}$

This is matrix multiplication. It arises naturally out of composition of linear maps. In this way, the matrix multiplication formula is not a definition, but a theorem of linear algebra!

#### Theorem (Matrix Multiplication)

Given an $n \times m$ matrix $A$ and an $\ell \times n$ matrix $B$, both with coefficients in $K$, then $AB$ is an $\ell \times m$ matrix with entires

$(AB)_{kj} = \sum_{i} A_{ij} B_{ki}$

If the matrix dimension is read as $n \to m$ instead of $n \times m$, the shape requirements are more obvious: two matrices $A$ and $B$ can be multiplied together only when they represent a pair of maps $V \to W$ and $U \to V$.

### Other Consequences, and Conclusion

The identity matrix is an $n \times n$ matrix:

$I_n = \Mat{ 1 \\ & 1 \\ && \ddots \\ &&& 1 }$

We want it to be such that for any appropriately-sized matrices $A$ and $B$, it has $AI_n = A$ and $I_n B = B$. Lifted up to linear maps, this means that $I_n$ should represent the identity map $V \to V$, when $\dim V = n$. This map sends each basis element $e_i$ to itself, so the columns of $I_n$ should be the basis vectors, in order:

$\Mat{1 \\ 0 \\ \vdots \\ 0} \Mat{0 \\ 1 \\ \vdots \\ 0} \cdots \Mat{0 \\ 0 \\ \vdots \\ 1}$

If we shuffle the columns, we’ll get a permutation matrix, which shuffles the coefficients of a column vector. For example, consider this matrix.

$\Mat{ 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 }$

This is similar to the identity, but we’ve swapped the first two columns. Thus, it will swap the first two coefficients of any column vector.

Matrices may seem unintuitive when they’re introduced as a subject of study. Every student encountering matrices for the same time may ask “If they add componentwise, why don’t they multiply componentwise too?”

However, approaching matrices as a computational and representational tool shows that the convoluted-looking matrix multiplication formula is a direct consequence of linearity.

$\begin{gather*}f(v + w) = f(v) + f(w) \\ f(cv) = c \cdot f(v)\end{gather*}$
1. In actual modern mathematics, the objects I describe are still called vector spaces, which I think generates unnecessary confusion in this case. “Linear space” is a bit more on the nose for what I’m going for.

2. This type of structure (just the addition part) is also called an “abelian group”.

3. Throughout $i$, $j$, and $k$ are indices in some unspecified but ordered indexing set, usually $\{1, 2, ..., n\}$. I will not bother giving this index set a name.

4. This is sometimes called the dimension theorem, which is somewhat tedious to prove.

5. An example of a messy infinite-dimensional basis is $\R$ considered as linear space over $\Q$ (in general, every field is a linear space over its subfields). The basis for this space essentially has to be “$1$, and all irrational numbers” except if we include e.g. $e$ and $\pi$ we can’t include $e + \frac{1}{2}\pi$, which is a $\Q$-linear combination of $e$ and $\pi$.

On the other hand, $\C$ is two-dimensional over $\R$, with basis $\\{1, i\\}$.

Incidentally, this idea of “view a field $K$ as a linear space over its subfield $F$” is such a useful concept that it is called the “degree of the field extension $K/F$”, and given the symbol $[K : F]$.

This, $[\R : \Q] = \infty$ and $[\C : \R] = 2$

6. You may recall from linear algebra class that two matrices $A$ and $B$ of the same shape are similar if there are two appropriately-sized square matrices $S$ and $R$ such that $SAR = B$. These matrices $S$ and $R$ represent a change of basis, and indicate that the linear maps $A, B: V \to W$ these matrices come from do “the same thing” to elements of $V$.

Over an algebraically closed field like $\C$ (i.e. all polynomials have solutions), there is an even stronger way to capture the information content of a linear map via Jordan canonicalization, which takes any square matrix $A$ and produces an almost-diagonal square matrix that only depends on the eigenvalues of $A$, which is the same for similar matrices, and thus basis-independent.

7. Here, as always, matrix dimensions are given in RC (row-column) order. You can think of this as being “input dimension” to “output dimension”.

# What is a Matrix? A Miserable Pile of Coefficients!

Linear algebra is undoubtedly the most useful field in all of algebra. It finds applications in all kinds of science and engineering, like quantum mechanics, graphics programming, and machine learning. It is the “most well-behaved” algebraic theory, in that other abstract algebra topics often try to approximate linear algebra, when possible.

For many students, linear algebra means vectors and matrices and determinants, and complex formulas for computing them. Matrices, in particular, come equipped with a fairly complicated, and a fortiori convoluted, multiplication operation.

This is not the only way to teach linear algebra, of course. Matrices and their multiplication appear complicated, but actually are a natural and compact way to represent a particular type of function, i.e., a linear map (or linear transformation).

This article is a short introduction to viewing linear algebra from the perspective of abstract algebra, from which matrices arise as a computational tool, rather than an object of study in and of themselves. I do assume some degree of familiarity with the idea of a matrix.

## Linear Spaces

Most linear algebra courses open with a description of vectors in Euclidean space: $\R^n$. Vectors there are defined as tuples of real numbers that can be added, multiplied, and scaled. Two vectors can be combined into a number through the dot product. Vectors come equipped with a notion of magnitude and direction.

However, this highly geometric picture can be counterproductive, since it is hard to apply geometric intuition directly to higher dimensions. It also obscures how this connects to working over a different number system, like the complex numbers.

Instead, I’d like to open with the concept of a linear space, which is somewhat more abstract than a vector space1.

First, we will need a notion of a “coefficient”, which is essentially something that you can do arithmetic with. We will draw coefficients from a designated ground field $K$. A field is a setting for doing arithmetic: a set of objects that can be added, subtracted, and multiplied, and divided in the “usual fashion” along with special $0$ and $1$ values. E.g. $a + 0 = a$, $1a = a$, $a(b + c) = ab + ac$, and so on.

Not only are the real numbers $\R$ a field, but so are the complex numbers $\C$, and the rational numbers $\Q$. If we drop the “division” requirement, we can also include the integers $\Z$, or polynomials with rational coefficients $\Q[x]$, for example.

Having chosen our coefficients $K$, a linear space $V$ over $K$ is another set of objects that can be added and subtracted (and including a special value $0$)2, along with a scaling operation, which takes a coefficient $c \in K$ and one of our objects $v \in V$ and produces a new $cv \in V$.

The important part of the scaling operation is that it’s compatible with addition: if we have $a, b \in K$ and $v, w \in V$, we require that

$\begin{gather*}a (v + w) = av + aw \\ (a + b) v = av + bv\end{gather*}$

This is what makes a linear space “linear”: you can write equations that look like first-degree polynomials (e.g. $ax + b$), and which can be manipulated like first-degree polynomials.

These polynomials are called linear because their graph looks like a line. There’s no multiplication, so we can’t have $x^2$, but we do have multiplication by a coefficient. This is what makes linear algebra is “linear”.

Some examples: $n$-tuples of elements drawn from any field are a linear space over that field, by componentwise addition and scalar multiplication; e.g., $R^3$. Setting $n = 1$ shows that every field is a linear space over itself.

Polynomials in one variable over some field, $K[x]$, are also a linear space, since polynomials can be added together and scaled by a any value in $K$ (since lone coefficients are degree zero polynomials). Real-valued functions also form a linear space over $\R$ in a similar way.

### Linear Transformations

A linear map is a function $f: V \to W$ between two linear spaces $V$ and $W$ over $K$ which “respects” the linear structure in a particular way. That is, for any $c\in K$ and $v, w \in V$,

$\begin{gather*}f(v + w) = f(v) + f(w) \\ f(cv) = c \cdot f(v)\end{gather*}$

We call this type of relationship (respecting addition and scaling) “linearity”. One way to think of this relationship is that $f$ is kind of like a different kind of coefficient, in that it distributes over addition, which commutes with the “ordinary” coefficients from $K$. However, applying $f$ produces a value from $W$ rather than $V$.

Another way to think of it is that if we have a linear polynomial like $p(x) = ax + b$ in $x$, then $f(p(x)) = p(f(x))$. We say that $f$ commutes with all linear polynomials.

The most obvious sort of linear map is scaling. Given any coefficient $c \in K$, it defines a “scaling map”:

$\begin{gather*}\mu_c: V \to V \\ v \mapsto cv\end{gather*}$

It’s trivial to check this is a linear map, by plugging it into the above equations: it’s linear because scaling is distributive and commutative.

Linear maps are the essential thing we study in linear algebra, since they describe all the different kinds of relationships between linear spaces.

Some linear maps are complicated. For example, a function from $\R^2 \to \R^2$ that rotates the plane by some angle $\theta$ is linear, as are operations that stretch or shear the plane. However, they can’t “bend” or “fold” the plane: they are all fairly rigid motions. In the linear space $\Q[x]$ of rational polynomials, multiplication by any polynomial, such as $x$ or $x^2 - 1$, is a linear map. The notion of “linear map” depends heavily on the space we’re in.

Unfortunately, linear maps as they are quite opaque, and do not lend themselves well to calculation. However, we can build an explicit representation using a linear basis.

## Linear Basis

For any linear space, we can construct a relatively small of elements such that any element of the space can be expressed as some linear function of these elements.

Explicitly, for any $V$, we can construct a sequence3 $e_i$ such that for any $v \in V$, we can find $c_i \in K$ such that

$v = \sum_i c_i e_i.$

Such a set $e_i$ is called a basis if it is linearly independent: no one $e_i$ can be expressed as a linear function of the rest. The dimension of $V$, denoted $\dim V$, is the number of elements in any choice of basis. This value does not depend on the choice of basis4.

Constructing a basis for any $V$ is easy: we can do this recursively. First, pick a random element $e_1$ of $V$, and define a new linear space $V/e_1$ where we have identified all elements that differ by a factor of $e_1$ as equal (i.e., if $v - w = ce_1$, we treat $v$ and $w$ as equal in $V/e_1$).

Then, a basis for $V$ is a basis of $V/e_1$ with $e_1$ added. The construction of $V/e_1$ is essentially “collapsing” the dimension $e_1$ “points” in, giving us a new space where we’ve “deleted” all of the elements that have a nonzero $e_1$ component.

However, this only works when the dimension is finite; more complex methods must be used for infinite-dimensional spaces. For example, the polynomials $\Q[x]$ are an infinite-dimensional space, with basis elements $\\{1, x, x^2, x^3, ...\\}$. In general, for any linear space $V$, it is always possible to arbitrarily choose a basis, although it may be infinite5.

Bases are useful because they give us a concrete representation of any element of $V$. Given a fixed basis $e_i$, we can represent any $w = \sum_i c_i e_i$ by the coefficients $c_i$ themselves. For a finite-dimensional $V$, this brings us back column vectors: $(\dim V)$-tuples of coefficients from $K$ that are added and scaled componentwise.

$\Mat{c_0 \\ c_1 \\ \vdots \\ c_n} \,\underset{\text{given } e_i}{:=}\, \sum_i c_i e_i$

The $i$th basis element is represented as the vector whose entries are all $0$ except for the $i$th one, which is $1$. E.g.,

$\Mat{1 \\ 0 \\ \vdots \\ 0} \,\underset{\text{given } e_i}{=}\, e_1, \,\,\, \Mat{0 \\ 1 \\ \vdots \\ 0} \,\underset{\text{given } e_i}{=}\, e_2, \,\,\, ...$

It is important to recall that the choice of basis is arbitrary. From the mathematical perspective, any basis is just as good as any other, although some may be more computationally convenient.

Over $\R^2$, $(1, 0)$ and $(0, 1)$ are sometimes called the “standard basis”, but $(1, 2)$ and $(3, -4)$ are also a basis for this space. One easy mistake to make, particularly when working over the tuple space $K^n$, is to confuse the actual elements of the linear space with the coefficient vectors that represent them. Working with abstract linear spaces eliminates this source of confusion.

### Representing Linear Transformations

Working with finite-dimensional linear spaces $V$ and $W$, let’s choose bases $e_i$ and $d_j$ for them, and let’s consider a linear map $f: V \to W$.

The powerful thing about bases is that we can more compactly express the information content of $f$. Given any $v \in V$, we can decompose it into a linear function of the basis (for some coefficients), so we can write

$f(v) = f\left(\sum_i c_i e_i\right) = \sum_i f(c_i e_i) = \sum_i c_i \cdot f(e_i)$

In other words, to specify $f$, we only need to specify what it does to each of the $\dim V$ basis elements. But what’s more, because $W$ also has a basis, we can write

$f(e_i) = \sum_j A_{ij} d_j$

Putting these two formulas together, we have an explicit closed form for $f(v)$, given the coefficients $A_{ij}$ of $f$, and the coefficients $c_i$ of $v$:

$f(v) = \sum_{i,j} c_i A_{ij} d_j$

Alternatively, we can express $v$ and $f(v)$ as column vectors, and $f$ as the $A$ matrix with entires $A_{ij}$. The entries of the resulting column vector are given by the above explicit formula for $f(v)$, fixing the value of $j$ in each entry.

$\underbrace{\Mat{ A_{0,0} & A_{1,0} & \cdots & A_{n,0} \\ A_{1,0} & A_{1,1} & \cdots & A_{n,1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{0,m} & A_{1,m} & \cdots & A_{n,m} }}_A \, \underbrace{\Mat{c_0 \\ c_1 \\ \vdots \\ c_n}}_v = \underbrace{\Mat{ \sum_i c_i A_{i,0} \\ \sum_i c_i A_{i,1} \\ \vdots \\ \sum_i c_i A_{i,m} }}_{Av}$

(Remember, this is all dependent on the choices of bases $e_i$ and $d_j$!)

Behold, we have derived the matrix-vector multiplication formula: the $j$th entry of the result is the dot product of the vector and the $j$th row of the matrix.

But it is crucial to keep in mind that we had to choose bases $e_i$ and $d_j$ to be entitled to write down a matrix for $f$. The values of the coefficients depend on the choice of basis.

If your linear space happens to be $\R^n$, there is an “obvious” choice of basis, but not every linear space over $\R$ is $\R^n$! Importantly, the actual linear algebra does not change depending on the basis6.

## Matrix Multiplication

So, where does matrix multiplication come from? An $n \times m$7 matrix $A$ represents some linear map $f: V \to W$, where $\dim V = n$, $\dim W = m$, and appropriate choices of basis ($e_i$, $d_j$) have been made.

Keeping in mind that linear maps are supreme over matrices, suppose we have a third linear space $U$, and a map $g: U \to V$, and let $\ell = \dim U$. Choosing a basis $h_k$ for $U$, we can represent $g$ as a matrix $B$ of dimension $\ell \times n$.

Then, we’d like for the matrix product $AB$ to be the same matrix we’d get from representing the composite map $fg: U \to W$ as a matrix, using the aforementioned choices of bases for $U$ and $W$ (the basis choice for $V$ should “cancel out”).

Recall our formula for $f(v)$ in terms of its matrix coefficients $A_{ij}$ and the coefficients of the input $v$, which we call $c_i$. We can produce a similar formula for $g(u)$, giving it matrix coefficients $B_{ki}$, and coefficients $b_k$ for $u$. (I appologize for the number of indices and coefficients here.)

\begin{align*}f(v) &= \sum_{i,j} c_i A_{ij} d_j \\ g(u) &= \sum_{k,i} b_k B_{ki} e_i\end{align*}

If we write $f(g(u))$, then $c_i$ is the coefficient $e_i$ is multiplied by; i.e., we fix $i$, and drop it from the summation: $c_i = \sum_k b_k B_{ki}$.

Substituting that into the above formula, we now have something like the following.

\begin{align*}f(g(u)) &= \sum_{i,j} \sum_{k} b_k B_{ki} A_{ij} d_j \\ f(g(u)) &= \sum_{k,j} b_k \left(\sum_{i} A_{ij} B_{ki} \right) d_j &(\star)\end{align*}

In $(\star)$, we’ve rearranged things so that the sum in parenthesis is the $(k,j)$th matrix coefficient of the composite $fg$. Because we wanted $AB$ to represent $fg$, it must be an $\ell \times m$ matrix whose entries are

$(AB)_{kj} = \sum_{i} A_{ij} B_{ki}$

This is matrix multiplication. It arises naturally out of composition of linear maps. In this way, the matrix multiplication formula is not a definition, but a theorem of linear algebra!

#### Theorem (Matrix Multiplication)

Given an $n \times m$ matrix $A$ and an $\ell \times n$ matrix $B$, both with coefficients in $K$, then $AB$ is an $\ell \times m$ matrix with entires

$(AB)_{kj} = \sum_{i} A_{ij} B_{ki}$

If the matrix dimension is read as $n \to m$ instead of $n \times m$, the shape requirements are more obvious: two matrices $A$ and $B$ can be multiplied together only when they represent a pair of maps $V \to W$ and $U \to V$.

### Other Consequences, and Conclusion

The identity matrix is an $n \times n$ matrix:

$I_n = \Mat{ 1 \\ & 1 \\ && \ddots \\ &&& 1 }$

We want it to be such that for any appropriately-sized matrices $A$ and $B$, it has $AI_n = A$ and $I_n B = B$. Lifted up to linear maps, this means that $I_n$ should represent the identity map $V \to V$, when $\dim V = n$. This map sends each basis element $e_i$ to itself, so the columns of $I_n$ should be the basis vectors, in order:

$\Mat{1 \\ 0 \\ \vdots \\ 0} \Mat{0 \\ 1 \\ \vdots \\ 0} \cdots \Mat{0 \\ 0 \\ \vdots \\ 1}$

If we shuffle the columns, we’ll get a permutation matrix, which shuffles the coefficients of a column vector. For example, consider this matrix.

$\Mat{ 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 }$

This is similar to the identity, but we’ve swapped the first two columns. Thus, it will swap the first two coefficients of any column vector.

Matrices may seem unintuitive when they’re introduced as a subject of study. Every student encountering matrices for the same time may ask “If they add componentwise, why don’t they multiply componentwise too?”

However, approaching matrices as a computational and representational tool shows that the convoluted-looking matrix multiplication formula is a direct consequence of linearity.

$\begin{gather*}f(v + w) = f(v) + f(w) \\ f(cv) = c \cdot f(v)\end{gather*}$
1. In actual modern mathematics, the objects I describe are still called vector spaces, which I think generates unnecessary confusion in this case. “Linear space” is a bit more on the nose for what I’m going for.

2. This type of structure (just the addition part) is also called an “abelian group”.

3. Throughout $i$, $j$, and $k$ are indices in some unspecified but ordered indexing set, usually $\{1, 2, ..., n\}$. I will not bother giving this index set a name.

4. This is sometimes called the dimension theorem, which is somewhat tedious to prove.

5. An example of a messy infinite-dimensional basis is $\R$ considered as linear space over $\Q$ (in general, every field is a linear space over its subfields). The basis for this space essentially has to be “$1$, and all irrational numbers” except if we include e.g. $e$ and $\pi$ we can’t include $e + \frac{1}{2}\pi$, which is a $\Q$-linear combination of $e$ and $\pi$.

On the other hand, $\C$ is two-dimensional over $\R$, with basis $\\{1, i\\}$.

Incidentally, this idea of “view a field $K$ as a linear space over its subfield $F$” is such a useful concept that it is called the “degree of the field extension $K/F$”, and given the symbol $[K : F]$.

This, $[\R : \Q] = \infty$ and $[\C : \R] = 2$

6. You may recall from linear algebra class that two matrices $A$ and $B$ of the same shape are similar if there are two appropriately-sized square matrices $S$ and $R$ such that $SAR = B$. These matrices $S$ and $R$ represent a change of basis, and indicate that the linear maps $A, B: V \to W$ these matrices come from do “the same thing” to elements of $V$.

Over an algebraically closed field like $\C$ (i.e. all polynomials have solutions), there is an even stronger way to capture the information content of a linear map via Jordan canonicalization, which takes any square matrix $A$ and produces an almost-diagonal square matrix that only depends on the eigenvalues of $A$, which is the same for similar matrices, and thus basis-independent.

7. Here, as always, matrix dimensions are given in RC (row-column) order. You can think of this as being “input dimension” to “output dimension”.