A concise, beginner-friendly introduction to the core ideas of linear algebra.
A scalar is a single numerical quantity, most often taken from the real numbers, denoted by $\mathbb{R}$.
An element of $\mathbb{R}^n$ is called an $n$-dimensional vector: an ordered list of $n$ real numbers.
Example 1.1.1.
- A 2-dimensional vector: $(3, -1) \in \mathbb{R}^2$.
- A 3-dimensional vector: $(2, 0, 5) \in \mathbb{R}^3$.
- A 1-dimensional vector: $(7) \in \mathbb{R}^1$, which corresponds to the scalar $7$ itself.
Vectors are often written vertically in column form, which emphasizes their role in matrix multiplication:
The vertical layout makes the structure clearer when we consider linear combinations or multiply matrices by vectors.
In $\mathbb{R}^n$, a vector can be interpreted in several equivalent ways:
- As a point in space, described by its coordinates.
- As a displacement or arrow, described by a direction and a length.
- As an abstract element of a vector space, whose properties follow algebraic rules independent of geometry.
- Vectors are written in boldface lowercase letters: $\mathbf{v}, \mathbf{w}, \mathbf{x}$.
- The $i$-th entry of a vector $\mathbf{v}$ is written $v_i$, where indices begin at 1.
- The set of all $n$-dimensional vectors over $\mathbb{R}$ is denoted $\mathbb{R}^n$.
- Column vectors will be the default form unless otherwise stated.
Scalars and vectors form the atoms of linear algebra. Every structure we will build (vector spaces, linear transformations, matrices, eigenvalues) relies on the basic notions of number and ordered collection of numbers. Once vectors are understood, we can define operations such as addition and scalar multiplication, then generalize to subspaces, bases, and coordinate systems. Eventually, this framework grows into the full theory of linear algebra, with powerful applications to geometry, computation, and data.
- Write three different vectors in $\mathbb{R}^2$ and sketch them as arrows from the origin. Identify their coordinates explicitly.
- Give an example of a vector in $\mathbb{R}^4$. Can you visualize it directly? Explain why high-dimensional visualization is challenging.
- Let $\mathbf{v} = (4, -3, 2)$. Write $\mathbf{v}$ in column form and state $v_1, v_2, v_3$.
- In what sense is the set $\mathbb{R}^1$ both a line and a vector space? Illustrate with examples.
- Consider the vector $\mathbf{u} = (1,1,\dots,1) \in \mathbb{R}^n$. What is special about this vector when $n$ is large? What might it represent in applications?
Vectors in linear algebra are not static objects; their power comes from the operations we can perform on them. Two fundamental operations define the structure of vector spaces: addition and scalar multiplication. These operations satisfy simple but far-reaching rules that underpin the entire subject.
Given two vectors of the same dimension, their sum is obtained by adding corresponding entries. Formally, if
then their sum is
Example 1.2.1.
Let
Geometrically, vector addition corresponds to the parallelogram rule. If we draw both vectors as arrows from the origin, then placing the tail of one vector at the head of the other produces the sum. The diagonal of the parallelogram they form represents the resulting vector.
Multiplying a vector by a scalar stretches or shrinks the vector while preserving its direction, unless the scalar is
negative, in which case the vector is also reversed. If
then
Example 1.2.2.
Let
This corresponds to flipping the vector through the origin and doubling its length.
The interaction of addition and scalar multiplication allows us to form linear combinations. A linear combination of vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$ with scalars $c_1, \dots, c_k$ is an expression of the form $c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k$.
Linear combinations are the mechanism by which we generate new vectors from existing ones. The span of a set of vectors (the collection of all their linear combinations) will later lead us to the idea of a subspace.
Example 1.2.3.
Let
Thus
- Addition: $\mathbf{u} + \mathbf{v}$ means component-wise addition.
- Scalar multiplication: $c\mathbf{v}$ scales each entry of $\mathbf{v}$ by $c$.
- Linear combination: a sum of the form $c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k$.
Vector addition and scalar multiplication are the defining operations of linear algebra. They give structure to vector spaces, allow us to describe geometric phenomena like translation and scaling, and provide the foundation for solving systems of equations. Everything that follows (basis, dimension, transformations) builds on these simple but profound rules.
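For readers who want to experiment, here is a short sketch in Python with NumPy (an assumption made only for illustration; the text itself does not prescribe any software) showing component-wise addition, scalar multiplication, and a linear combination:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, -1, 0])

print(u + v)      # component-wise addition        -> [5 1 3]
print(3 * v)      # scalar multiplication          -> [12 -3  0]
print(2 * u - v)  # linear combination 2u + (-1)v  -> [-2  5  6]
```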
- Compute $\mathbf{u} + \mathbf{v}$ where $\mathbf{u} = (1,2,3)$ and $\mathbf{v} = (4, -1, 0)$.
- Find $3\mathbf{v}$ where $\mathbf{v} = (-2,5)$. Sketch both vectors to illustrate the scaling.
- Show that $(5,7)$ can be written as a linear combination of $(1,0)$ and $(0,1)$.
- Write $(4,4)$ as a linear combination of $(1,1)$ and $(1,-1)$.
- Prove that if $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$, then $(c+d)(\mathbf{u}+\mathbf{v}) = c\mathbf{u} + c\mathbf{v} + d\mathbf{u} + d\mathbf{v}$ for scalars $c,d \in \mathbb{R}$.
The dot product is the fundamental operation that links algebra and geometry in vector spaces. It allows us to measure lengths, compute angles, and determine orthogonality. From this single definition flow the notions of norm and angle, which give geometry to abstract vector spaces.
For two vectors in
Equivalently, in matrix notation:
Example 1.3.1.
Let
The dot product outputs a single scalar, not another vector.
The Euclidean norm of a vector is the square root of its dot product with itself:
This generalizes the Pythagorean theorem to arbitrary dimensions.
Example 1.3.2.
For
This is exactly the length of the vector as an arrow in the plane.
The dot product also encodes the angle between two vectors. For nonzero vectors
where
Example 1.3.3.
Let
Hence
The vectors are perpendicular.
Two vectors are said to be orthogonal if their dot product is zero:
Orthogonality generalizes the idea of perpendicularity from geometry to higher dimensions.
- Dot product: $\mathbf{u} \cdot \mathbf{v}$.
- Norm (length): $|\mathbf{v}|$.
- Orthogonality: $\mathbf{u} \perp \mathbf{v}$ if $\mathbf{u} \cdot \mathbf{v} = 0$.
The dot product turns vector spaces into geometric objects: vectors gain lengths, angles, and notions of perpendicularity. This foundation will later support the study of orthogonal projections, Gram–Schmidt orthogonalization, eigenvectors, and least squares problems.
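A quick numerical check of these definitions, sketched in Python with NumPy (assumed only for illustration): the dot product, the Euclidean norm, and the angle formula, applied to a pair of perpendicular vectors.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([2.0, -1.0])

dot = np.dot(u, v)                        # 1*2 + 2*(-1) = 0
norm_u = np.linalg.norm(u)                # sqrt(5)
cos_theta = dot / (norm_u * np.linalg.norm(v))
angle = np.degrees(np.arccos(cos_theta))  # 90 degrees: u and v are orthogonal

print(dot, norm_u, angle)
```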
- Compute $\mathbf{u} \cdot \mathbf{v}$ for $\mathbf{u} = (1,2,3)$, $\mathbf{v} = (4,5,6)$.
- Find the norm of $\mathbf{v} = (2, -2, 1)$.
- Determine whether $\mathbf{u} = (1,1,0)$ and $\mathbf{v} = (1,-1,2)$ are orthogonal.
- Let $\mathbf{u} = (3,4)$, $\mathbf{v} = (4,3)$. Compute the angle between them.
- Prove that $|\mathbf{u} + \mathbf{v}|^2 = |\mathbf{u}|^2 + |\mathbf{v}|^2 + 2\mathbf{u}\cdot \mathbf{v}$. This identity is the algebraic version of the Law of Cosines.
Orthogonality captures the notion of perpendicularity in vector spaces. It is one of the most important geometric ideas in linear algebra, allowing us to decompose vectors, define projections, and construct special bases with elegant properties.
Two vectors
This condition ensures that the angle between them is
Example 1.4.1.
In
A collection of vectors is called orthogonal if every distinct pair of vectors in the set is orthogonal. If, in addition, each vector has norm 1, the set is called orthonormal.
Example 1.4.2.
In
form an orthonormal set: each has length 1, and their dot products vanish when the indices differ.
Orthogonality makes possible the decomposition of a vector into two components: one parallel to another vector, and one
orthogonal to it. Given a nonzero vector
The difference
is orthogonal to
Example 1.4.3.
Let
Thus
where
In general, if
where the first term is parallel to
- $\mathbf{u} \perp \mathbf{v}$: vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal.
- An orthogonal set: vectors pairwise orthogonal.
- An orthonormal set: pairwise orthogonal, each of norm 1.
Orthogonality gives structure to vector spaces. It provides a way to separate independent directions cleanly, simplify computations, and minimize errors in approximations. Many powerful algorithms in numerical linear algebra and data science (QR decomposition, least squares regression, PCA) rely on orthogonality.
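The parallel/orthogonal decomposition is easy to verify numerically. A minimal sketch in Python with NumPy (assumed only for illustration), decomposing $(5,2)$ relative to $(2,1)$ as in Exercise 4 below:

```python
import numpy as np

u = np.array([2.0, 1.0])   # direction to project onto
v = np.array([5.0, 2.0])   # vector to decompose

proj = (np.dot(v, u) / np.dot(u, u)) * u   # component parallel to u
perp = v - proj                            # component orthogonal to u

print(proj, perp)
print(np.dot(perp, u))                     # 0 (up to rounding): perp is orthogonal to u
```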
- Verify that the vectors $(1,2,2)$ and $(2,0,-1)$ are orthogonal.
- Find the projection of $(3,4)$ onto $(1,1)$.
- Show that any two distinct standard basis vectors in $\mathbb{R}^n$ are orthogonal.
- Decompose $(5,2)$ into components parallel and orthogonal to $(2,1)$.
- Prove that if $\mathbf{u}, \mathbf{v}$ are orthogonal and nonzero, then $(\mathbf{u}+\mathbf{v})\cdot(\mathbf{u}-\mathbf{v}) = 0$.
Matrices are the central objects of linear algebra, providing a compact way to represent and manipulate linear transformations, systems of equations, and structured data. A matrix is a rectangular array of numbers arranged in rows and columns.
An
Each entry
- If $m = n$, the matrix is square.
- If $m = 1$, the matrix is a row vector.
- If $n = 1$, the matrix is a column vector.
Thus, vectors are simply special cases of matrices.
Example 2.1.1. A
Here,
Example 2.1.2. A
This will later serve as the representation of a linear transformation on
- Matrices are denoted by uppercase bold letters: $A, B, C$.
- Entries are written as $a_{ij}$, with the row index first and the column index second.
- The set of all real $m \times n$ matrices is denoted $\mathbb{R}^{m \times n}$.
Thus, a matrix is a function
Matrices generalize vectors and give us a language for describing linear operations systematically. They encode systems of equations, rotations, projections, and transformations of data. With matrices, algebra and geometry come together: a single compact object can represent both numerical data and functional rules.
- Write a $3 \times 2$ matrix of your choice and identify its entries $a_{ij}$.
- Is every vector a matrix? Is every matrix a vector? Explain.
- Which of the following are square matrices: $A \in \mathbb{R}^{4\times4}$, $B \in \mathbb{R}^{3\times5}$, $C \in \mathbb{R}^{1\times1}$?
- Let $D = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. What kind of matrix is this?
- Consider the matrix $E = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Express $e_{11}, e_{12}, e_{21}, e_{22}$ explicitly.
Once matrices are defined, the next step is to understand how they combine. Just as vectors gain meaning through addition and scalar multiplication, matrices become powerful through two operations: addition and multiplication.
Two matrices of the same size are added by adding corresponding entries. If
then
Example 2.2.1. Let
Then
$$ \begin{bmatrix} 0 & 2 \\ 8 & 6 \end{bmatrix}. $$
Matrix addition is commutative ($A + B = B + A$).
For a scalar
This stretches or shrinks all entries of the matrix uniformly.
Example 2.2.2. If
then
The defining operation of matrices is multiplication. If
then their product is the
Thus, the entry in the
Example 2.2.3. Let
Then
$$ AB = \begin{bmatrix} 1\cdot4 + 2\cdot2 & 1\cdot(-1) + 2\cdot5 \\ 0\cdot4 + 3\cdot2 & 0\cdot(-1) + 3\cdot5 \end{bmatrix} = \begin{bmatrix} 8 & 9 \\ 6 & 15 \end{bmatrix}. $$
Notice that matrix multiplication is not commutative in general:
Matrix multiplication corresponds to the composition of linear transformations. If
- Matrix sum: $A+B$.
- Scalar multiple: $cA$.
- Product: $AB$, defined only when the number of columns of $A$ equals the number of rows of $B$.
Matrix multiplication is the core mechanism of linear algebra: it encodes how transformations combine, how systems of equations are solved, and how data flows in modern algorithms. Addition and scalar multiplication make matrices into a vector space, while multiplication gives them an algebraic structure rich enough to model geometry, computation, and networks.
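A short sketch in Python with NumPy (an illustrative assumption, not part of the text) of the three operations, reusing the matrices of Example 2.2.3 and confirming that $AB \neq BA$:

```python
import numpy as np

A = np.array([[1, 2],
              [0, 3]])
B = np.array([[4, -1],
              [2,  5]])

print(A + B)   # entry-wise sum
print(2 * A)   # scalar multiple
print(A @ B)   # matrix product -> [[ 8  9], [ 6 15]]
print(B @ A)   # a different result: multiplication is not commutative
```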
- Compute $A+B$ for
- Find $3A$ where
- Multiply
- Verify with an explicit example that $AB \neq BA$.
- Prove that matrix multiplication is distributive: $A(B+C) = AB + AC$.
Two special operations on matrices, the transpose and the inverse, give rise to deep algebraic and geometric properties. The transpose rearranges a matrix by flipping it across its main diagonal, while the inverse, when it exists, acts as the undo operation for matrix multiplication.
The transpose of an
Formally,
Example 2.3.1. If
then
Properties of the Transpose.
- $(A^T)^T = A$.
- $(A+B)^T = A^T + B^T$.
- $(cA)^T = cA^T$, for any scalar $c$.
- $(AB)^T = B^T A^T$.
The last rule is crucial: the order reverses.
A square matrix
where
Not every matrix is invertible. A necessary condition is that
Example 2.3.2. Let
Its determinant is
$$ A^{-1} = \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix}. $$
Verification:
$$ AA^{-1} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. $$
- The transpose corresponds to reflecting a linear transformation across the diagonal. For vectors, it switches between row and column forms.
- The inverse, when it exists, corresponds to reversing a linear transformation. For example, if $A$ scales and rotates vectors, $A^{-1}$ rescales and rotates them back.
- Transpose: $A^T$.
- Inverse: $A^{-1}$, defined only for invertible square matrices.
- Identity: $I_n$, acts as the multiplicative identity.
The transpose allows us to define symmetric and orthogonal matrices, central to geometry and numerical methods. The inverse underlies the solution of linear systems, encoding the idea of undoing a transformation. Together, these operations set the stage for determinants, eigenvalues, and orthogonalization.
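These rules are easy to confirm numerically. A hedged sketch in Python with NumPy, reusing the matrix of Example 2.3.2 and checking the reverse-order rule for transposes (the second matrix $B$ is an arbitrary choice for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

A_inv = np.linalg.inv(A)                  # [[-2. , 1. ], [1.5, -0.5]] as in Example 2.3.2
print(np.allclose(A @ A_inv, np.eye(2)))  # True: A A^{-1} = I

B = np.array([[0.0, 1.0],
              [5.0, 2.0]])
print(np.allclose((A @ B).T, B.T @ A.T))  # True: (AB)^T = B^T A^T
```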
- Compute the transpose of
- Verify that $(AB)^T = B^T A^T$ for
- Determine whether
is invertible. If so, find
- Find the inverse of
and explain its geometric action on vectors in the plane.
- Prove that if $A$ is invertible, then so is $A^T$, and $(A^T)^{-1} = (A^{-1})^T$.
Certain matrices occur so frequently in theory and applications that they are given special names. Recognizing their properties allows us to simplify computations and understand the structure of linear transformations more clearly.
The identity matrix
It acts as the multiplicative identity:
Geometrically,
A diagonal matrix has all off-diagonal entries zero:
Multiplication by a diagonal matrix scales each coordinate independently:
Example 2.4.1. Let
Then
A permutation matrix is obtained by permuting the rows of the identity matrix. Multiplying a vector by a permutation matrix reorders its coordinates.
Example 2.4.2. Let
Then
Thus,
Permutation matrices are always invertible; their inverses are simply their transposes.
A matrix is symmetric if $A^T = A$, and skew-symmetric if $A^T = -A$.
Symmetric matrices appear in quadratic forms and optimization, while skew-symmetric matrices describe rotations and cross products in geometry.
A square matrix $Q$ is orthogonal if its transpose is its inverse: $Q^T Q = Q Q^T = I$.
Equivalently, the rows (and columns) of $Q$ form an orthonormal set.
Example 2.4.3. The rotation matrix in the plane:
is orthogonal, since
Special matrices serve as the building blocks of linear algebra. Identity matrices define the neutral element, diagonal matrices simplify computations, permutation matrices reorder data, symmetric and orthogonal matrices describe fundamental geometric structures. Much of modern applied mathematics reduces complex problems to operations involving these simple forms.
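The following sketch in Python with NumPy (illustrative only) builds an identity, a diagonal, a permutation, and a rotation matrix, and checks the properties described above:

```python
import numpy as np

I = np.eye(3)                                   # identity
D = np.diag([1.0, 2.0, 3.0])                    # diagonal: scales each coordinate
P = np.array([[0, 1, 0],                        # permutation: reorders coordinates
              [0, 0, 1],
              [1, 0, 0]])
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],  # rotation by theta
              [np.sin(theta),  np.cos(theta)]])

x = np.array([10.0, 20.0, 30.0])
print(D @ x)                                    # [10. 40. 90.]
print(P @ x)                                    # coordinates reordered
print(np.allclose(P.T, np.linalg.inv(P)))       # permutation inverse is its transpose
print(np.allclose(R.T @ R, np.eye(2)))          # R is orthogonal
```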
- Show that the product of two diagonal matrices is diagonal, and compute an example.
- Find the permutation matrix that cycles $(a,b,c)$ into $(b,c,a)$.
- Prove that every permutation matrix is invertible and its inverse is its transpose.
- Verify that
is orthogonal. What geometric transformation does it represent?
- Determine whether
are symmetric, skew-symmetric, or neither.
One of the central motivations for linear algebra is solving systems of linear equations. These systems arise naturally in science, engineering, and data analysis whenever multiple constraints interact. Matrices provide a compact language for expressing and solving them.
A linear system consists of equations where each unknown appears only to the first power and with no products between
variables. A general system of
Here the coefficients
The system can be expressed compactly as:
where
- $A \in \mathbb{R}^{m \times n}$ is the coefficient matrix $[a_{ij}]$,
- $\mathbf{x} \in \mathbb{R}^n$ is the column vector of unknowns,
- $\mathbf{b} \in \mathbb{R}^m$ is the column vector of constants.
This formulation turns the problem of solving equations into analyzing the action of a matrix.
Example 3.1.1. The system
can be written as
$$ \begin{bmatrix} 1 & 2 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ 4 \end{bmatrix}. $$
A linear system may have:
- No solution (inconsistent): The equations conflict. Example: $\begin{cases} x + y = 1 \\ x + y = 2 \end{cases}$ has no solution.
- Exactly one solution (unique): The system’s equations intersect at a single point. Example: The above system with coefficient matrix $\begin{bmatrix} 1 & 2 \\ 3 & -1 \end{bmatrix}$ has a unique solution.
- Infinitely many solutions: The equations describe overlapping constraints (e.g., multiple equations representing the same line or plane).
The nature of the solution depends on the rank of
- In $\mathbb{R}^2$, each linear equation represents a line. Solving a system means finding intersection points of lines.
- In $\mathbb{R}^3$, each equation represents a plane. A system may have no solution (parallel planes), one solution (a unique intersection point), or infinitely many (a line of intersection).
- In higher dimensions, the picture generalizes: solutions form intersections of hyperplanes.
Linear systems are the practical foundation of linear algebra. They appear in balancing chemical reactions, circuit analysis, least-squares regression, optimization, and computer graphics. Understanding how to represent and classify their solutions is the first step toward systematic solution methods like Gaussian elimination.
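In code, writing a system as $A\mathbf{x} = \mathbf{b}$ and solving it is a one-liner. A sketch in Python with NumPy (assumed only for illustration), using the small $2 \times 2$ system with a unique solution shown in Example 3.1.1 above:

```python
import numpy as np

A = np.array([[1.0,  2.0],     # coefficient matrix
              [3.0, -1.0]])
b = np.array([5.0, 4.0])       # right-hand side

x = np.linalg.solve(A, b)      # unique solution since det(A) != 0
print(x)                       # [1.857..., 1.571...]
print(np.allclose(A @ x, b))   # True: the solution satisfies every equation
```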
- Write the following system in matrix form: $\begin{cases} 2x + 3y - z = 7, \\ x - y + 4z = 1, \\ 3x + 2y + z = 5 \end{cases}$
- Determine whether the system $\begin{cases} x + y = 1, \\ 2x + 2y = 2 \end{cases}$ has no solution, one solution, or infinitely many solutions.
- Geometrically interpret the system $\begin{cases} x + y = 3, \\ x - y = 1 \end{cases}$ in the plane.
- Solve the system $\begin{cases} 2x + y = 1, \\ x - y = 4 \end{cases}$ and check your solution.
- In $\mathbb{R}^3$, describe the solution set of $\begin{cases} x + y + z = 0, \\ 2x + 2y + 2z = 0 \end{cases}$. What geometric object does it represent?
To solve linear systems efficiently, we use Gaussian elimination: a systematic method of transforming a system into a simpler equivalent one whose solutions are easier to see. The method relies on elementary row operations that preserve the solution set.
On an augmented matrix $[A \mid \mathbf{b}]$, we may perform three types of elementary row operations:
- Row swapping: interchange two rows.
- Row scaling: multiply a row by a nonzero scalar.
- Row replacement: replace one row by itself plus a multiple of another row.
These operations correspond to re-expressing equations in different but equivalent forms.
A matrix is in row echelon form (REF) if:
- All nonzero rows are above any zero rows.
- Each leading entry (the first nonzero number from the left in a row) is to the right of the leading entry in the row above.
- All entries below a leading entry are zero.
Further, if each leading entry is 1 and is the only nonzero entry in its column, the matrix is in reduced row echelon form (RREF).
- Write the augmented matrix for the system.
- Use row operations to create zeros below each pivot (the leading entry in a row).
- Continue column by column until the matrix is in echelon form.
- Solve by back substitution: starting from the last pivot equation and working upward.
If we continue to RREF, the solution can be read off directly.
Example 3.2.1. Solve
Step 1. Augmented matrix
Step 2. Eliminate below the first pivot
Subtract 2 times row 1 from row 2, and 3 times row 1 from row 3:
Step 3. Pivot in column 2
Divide row 2 by -3:
Add 7 times row 2 to row 3:
Step 4. Pivot in column 3
Divide row 3 by -2:
Step 5. Back substitution
From the last row: $ z = \tfrac{11}{3}. $
Second row: $ y - z = -\tfrac{1}{3} \implies y = -\tfrac{1}{3} + \tfrac{11}{3} = \tfrac{10}{3}. $
First row: $ x + 2y - z = 3 \implies x + 2\cdot\tfrac{10}{3} - \tfrac{11}{3} = 3. $
So $ x + \tfrac{20}{3} - \tfrac{11}{3} = 3 \implies x + 3 = 3 \implies x = 0. $
Solution: $ (x,y,z) = \big(0, \tfrac{10}{3}, \tfrac{11}{3}\big). $
Gaussian elimination is the foundation of computational linear algebra. It reduces complex systems to a form where solutions are visible, and it forms the basis for algorithms used in numerical analysis, scientific computing, and machine learning.
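The following Python/NumPy sketch implements the procedure above for a square, nonsingular system: forward elimination with partial pivoting, then back substitution. It is a teaching sketch under those assumptions, not production code, and it is checked against `numpy.linalg.solve` on the small system of Exercise 1 below.

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve A x = b by elimination with partial pivoting and back substitution.
    A sketch that assumes A is square and nonsingular."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        # partial pivoting: move the largest available pivot into position
        p = k + np.argmax(np.abs(A[k:, k]))
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]          # multiplier for row replacement
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # back substitution, starting from the last pivot equation
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[1.0, 1.0], [2.0, -1.0]])
b = np.array([2.0, 0.0])
print(gaussian_elimination(A, b))          # [0.666..., 1.333...]
print(np.linalg.solve(A, b))               # same answer
```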
- Solve by Gaussian elimination: $\begin{cases} x + y = 2, \\ 2x - y = 0. \end{cases}$
- Reduce the following augmented matrix to REF: $\left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 2 & -1 & 3 & 14 \\ 1 & 4 & -2 & -2 \end{array}\right].$
- Show that Gaussian elimination always produces either:
  - a unique solution,
  - infinitely many solutions, or
  - a contradiction (no solution).
- Use Gaussian elimination to find all solutions of $\begin{cases} x + y + z = 0, \\ 2x + y + z = 1. \end{cases}$
- Explain why pivoting (choosing the largest available pivot element) is useful in numerical computation.
Gaussian elimination not only provides solutions but also reveals the structure of a linear system. Two key ideas are the rank of a matrix and the consistency of a system. Rank measures the amount of independent information in the equations, while consistency determines whether the system has at least one solution.
The rank of a matrix is the number of leading pivots in its row echelon form. Equivalently, it is the maximum number of linearly independent rows or columns.
Formally,
The rank tells us the effective dimension of the space spanned by the rows (or columns).
Example 3.3.1. For
row reduction gives
Thus,
Consider the system
where
- If $\text{rank}(A) = \text{rank}(A|\mathbf{b}) = n$ (number of unknowns), the system has a unique solution.
- If $\text{rank}(A) = \text{rank}(A|\mathbf{b}) < n$, the system has infinitely many solutions.
Example 3.3.2. Consider
The augmented matrix is
Row reduction gives
Here,
Example 3.3.3. For
the augmented matrix reduces to
Here,
Rank is a measure of independence: it tells us how many truly distinct equations or directions are present. Consistency explains when equations align versus when they contradict. These concepts connect linear systems to vector spaces and prepare for the ideas of dimension, basis, and the Rank–Nullity Theorem.
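Rank and consistency can be checked numerically with `numpy.linalg.matrix_rank`. A small sketch (the matrices are illustrative choices, not the examples above) comparing $\text{rank}(A)$ with $\text{rank}(A|\mathbf{b})$ for a consistent and an inconsistent right-hand side:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
b = np.array([3.0, 6.0])

aug = np.column_stack([A, b])             # augmented matrix [A | b]
print(np.linalg.matrix_rank(A))           # 1
print(np.linalg.matrix_rank(aug))         # 1 -> consistent, infinitely many solutions

b_bad = np.array([3.0, 7.0])
aug_bad = np.column_stack([A, b_bad])
print(np.linalg.matrix_rank(aug_bad))     # 2 > rank(A) -> inconsistent
```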
- Compute the rank of
- Determine whether the system
is consistent.
- Show that the rank of the identity matrix $I_n$ is $n$.
- Give an example of a system in $\mathbb{R}^3$ with infinitely many solutions, and explain why it satisfies the rank condition.
- Prove that for any matrix $A \in \mathbb{R}^{m \times n}$, $\text{rank}(A) \leq \min(m,n)$.
A homogeneous system is a linear system in which all constant terms are zero:
where
Every homogeneous system has at least one solution:
This is called the trivial solution. The interesting question is whether nontrivial solutions (nonzero vectors) exist.
Nontrivial solutions exist precisely when the number of unknowns exceeds the rank of the coefficient matrix:
In this case, there are infinitely many solutions, forming a subspace of
where $\text{null}(A)$ is the set of all solutions to $A\mathbf{x} = \mathbf{0}$.
Example 3.4.1. Consider
The augmented matrix is
Row reduction:
So the system is equivalent to:
From the second equation,
Thus solutions are:
The null space is the line spanned by the vector
The solution set of a homogeneous system is always a subspace of $\mathbb{R}^n$:
- If $\text{rank}(A) = n$, the only solution is the zero vector.
- If $\text{rank}(A) = n-1$, the solution set is a line through the origin.
- If $\text{rank}(A) = n-2$, the solution set is a plane through the origin.
More generally, the null space has dimension $n - \text{rank}(A)$.
Homogeneous systems are central to understanding vector spaces, subspaces, and dimension. They lead directly to the concepts of kernel, null space, and linear dependence. In applications, homogeneous systems appear in equilibrium problems, eigenvalue equations, and computer graphics transformations.
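Numerically, an orthonormal basis for the null space can be read off from the singular value decomposition. A sketch in Python with NumPy; the helper `null_space_basis` is a name introduced here for illustration, not a standard routine:

```python
import numpy as np

def null_space_basis(A, tol=1e-12):
    """Return columns spanning the null space of A, computed from the SVD.
    Each returned column v satisfies A @ v = 0 (up to rounding)."""
    _, s, vt = np.linalg.svd(A)
    rank = np.sum(s > tol)
    return vt[rank:].T            # rows of V^T beyond the rank span the null space

A = np.array([[1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0]])   # rank 1, so the null space has dimension 3 - 1 = 2
N = null_space_basis(A)
print(N.shape)                    # (3, 2)
print(np.allclose(A @ N, 0))      # True
```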
- Solve the homogeneous system
What is the dimension of its solution space?
- Find all solutions of
- Show that the solution set of any homogeneous system is a subspace of $\mathbb{R}^n$.
- Suppose $A$ is a $3 \times 3$ matrix with $\text{rank}(A) = 2$. What is the dimension of the null space of $A$?
- For
compute a basis for the null space of
Up to now we have studied vectors and matrices concretely in
A vector space over the real numbers $\mathbb{R}$ is a set $V$ equipped with two operations:
- Vector addition: For any $\mathbf{u}, \mathbf{v} \in V$, there is a vector $\mathbf{u} + \mathbf{v} \in V$.
- Scalar multiplication: For any scalar $c \in \mathbb{R}$ and any $\mathbf{v} \in V$, there is a vector $c\mathbf{v} \in V$.
These operations must satisfy the following axioms (for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and all scalars $a, b \in \mathbb{R}$):
- Commutativity of addition: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.
- Associativity of addition: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$.
- Additive identity: There exists a zero vector $\mathbf{0} \in V$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$.
- Additive inverses: For each $\mathbf{v} \in V$, there exists $-\mathbf{v} \in V$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$.
- Compatibility of scalar multiplication: $a(b\mathbf{v}) = (ab)\mathbf{v}$.
- Identity element of scalars: $1 \cdot \mathbf{v} = \mathbf{v}$.
- Distributivity over vector addition: $a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v}$.
- Distributivity over scalar addition: $(a+b)\mathbf{v} = a\mathbf{v} + b\mathbf{v}$.
If a set
Example 4.1.1. Standard Euclidean space
Example 4.1.2. Polynomials
The set of all polynomials with real coefficients, denoted
Example 4.1.3. Functions
The set of all real-valued functions on an interval, e.g.
Not every set with operations qualifies. For instance, the set of positive real numbers under usual addition is not a vector space, because additive inverses (negative numbers) are missing. The axioms must all hold.
In familiar cases like
The concept of vector space unifies seemingly different mathematical objects under a single framework. Whether dealing with forces in physics, signals in engineering, or data in machine learning, the common language of vector spaces allows us to use the same techniques everywhere.
- Verify that $\mathbb{R}^2$ with standard addition and scalar multiplication satisfies all eight vector space axioms.
- Show that the set of integers $\mathbb{Z}$ with ordinary operations is not a vector space over $\mathbb{R}$. Which axiom fails?
- Consider the set of all polynomials of degree at most 3. Show it forms a vector space over $\mathbb{R}$. What is its dimension?
- Give an example of a vector space where the vectors are not geometric objects.
- Prove that in any vector space, the zero vector is unique.
A subspace is a smaller vector space living inside a larger one. Just as lines and planes naturally sit inside three-dimensional space, subspaces generalize these ideas to higher dimensions and more abstract settings.
Let
- $\mathbf{0} \in W$ (contains the zero vector),
- For all $\mathbf{u}, \mathbf{v} \in W$, the sum $\mathbf{u} + \mathbf{v} \in W$ (closed under addition),
- For all scalars $c \in \mathbb{R}$ and vectors $\mathbf{v} \in W$, the product $c\mathbf{v} \in W$ (closed under scalar multiplication).
If these hold, then
Example 4.2.1. Line through the origin in
is a subspace of
Example 4.2.2. The x–y plane in
is a subspace of
Example 4.2.3. Null space of a matrix
For a matrix
is a subspace of
Not every subset is a subspace.
- The set $\{ (x,y) \in \mathbb{R}^2 \mid x \geq 0 \}$ is not a subspace: it is not closed under scalar multiplication (a negative scalar breaks the condition).
- Any line in $\mathbb{R}^2$ that does not pass through the origin is not a subspace, because it does not contain $\mathbf{0}$.
Subspaces are the linear structures inside vector spaces.
- In $\mathbb{R}^2$, the subspaces are: the zero vector, any line through the origin, or the entire plane.
- In $\mathbb{R}^3$, the subspaces are: the zero vector, any line through the origin, any plane through the origin, or the entire space.
- In higher dimensions, the same principle applies: subspaces are the flat linear pieces through the origin.
Subspaces capture the essential structure of linear problems. Column spaces, row spaces, and null spaces are all subspaces. Much of linear algebra consists of understanding how these subspaces intersect, span, and complement each other.
- Prove that the set $W = \{ (x,0) \mid x \in \mathbb{R} \} \subseteq \mathbb{R}^2$ is a subspace.
- Show that the line $\{ (1+t, 2t) \mid t \in \mathbb{R} \}$ is not a subspace of $\mathbb{R}^2$. Which condition fails?
- Determine whether the set of all vectors $(x,y,z) \in \mathbb{R}^3$ satisfying $x+y+z=0$ is a subspace.
- For the matrix
describe the null space of
The ideas of span, basis, and dimension provide the language for describing the size and structure of subspaces. Together, they tell us how a vector space is generated, how many building blocks it requires, and how those blocks can be chosen.
Given a set of vectors
The span is always a subspace of
Example 4.3.1.
In
A basis of a vector space $V$ is a set of vectors that:
- Span $V$.
- Are linearly independent (no vector in the set is a linear combination of the others).
If either condition fails, the set is not a basis.
Example 4.3.2.
In
form a basis. Every vector
The dimension of a vector space
Examples 4.3.3.
- $\dim(\mathbb{R}^2) = 2$, with basis $(1,0), (0,1)$.
- $\dim(\mathbb{R}^3) = 3$, with basis $(1,0,0), (0,1,0), (0,0,1)$.
- The set of polynomials of degree at most 3 has dimension 4, with basis $(1, x, x^2, x^3)$.
- The span is like the reach of a set of vectors.
- A basis is the minimal set of directions needed to reach everything in the space.
- The dimension is the count of those independent directions.
Lines, planes, and higher-dimensional flats can all be described in terms of span, basis, and dimension.
These concepts classify vector spaces and subspaces in terms of size and structure. Many theorems in linear algebra, such as the Rank–Nullity Theorem, are consequences of understanding span, basis, and dimension. In practical terms, bases are how we encode data in coordinates, and dimension tells us how much freedom a system truly has.
- Show that $(1,0,0)$, $(0,1,0)$, $(1,1,0)$ span the $xy$-plane in $\mathbb{R}^3$. Are they a basis?
- Find a basis for the line $\{(2t,-3t,t) : t \in \mathbb{R}\}$ in $\mathbb{R}^3$.
- Determine the dimension of the subspace of $\mathbb{R}^3$ defined by $x+y+z=0$.
- Prove that any two different bases of $\mathbb{R}^n$ must contain exactly $n$ vectors.
- Give a basis for the set of polynomials of degree $\leq 2$. What is its dimension?
Once a basis for a vector space is chosen, every vector can be expressed uniquely as a linear combination of the basis vectors. The coefficients in this combination are called the coordinates of the vector relative to that basis. Coordinates allow us to move between the abstract world of vector spaces and the concrete world of numbers.
Let
be an ordered basis for
The scalars
Example 4.4.1. Let the basis be
To find the coordinates of
This gives the system
Adding:
So,
In
Relative to this basis, the coordinates of a vector are simply its entries. Thus, column vectors are coordinate representations by default.
If
with basis vectors as columns. For any vector
Thus, switching between bases reduces to matrix multiplication.
Coordinates are the address of a vector relative to a chosen set of directions. Different bases are like different coordinate systems: Cartesian, rotated, skewed, or scaled. The same vector may look very different numerically depending on the basis, but its geometric identity is unchanged.
Coordinates turn abstract vectors into concrete numerical data. Changing basis is the algebraic language for rotations of axes, diagonalization of matrices, and principal component analysis in data science. Mastery of coordinates is essential for moving fluidly between geometry, algebra, and computation.
- Express $(4,2)$ in terms of the basis $(1,1), (1,-1)$.
- Find the coordinates of $(1,2,3)$ relative to the standard basis of $\mathbb{R}^3$.
- If $\mathcal{B} = \{(2,0), (0,3)\}$, compute $[(4,6)]_{\mathcal{B}}$.
- Construct the change of basis matrix from the standard basis of $\mathbb{R}^2$ to $\mathcal{B} = \{(1,1), (1,-1)\}$.
- Prove that coordinate representation with respect to a basis is unique.
A central theme of linear algebra is understanding linear transformations: functions between vector spaces that preserve their algebraic structure. These transformations generalize the idea of matrix multiplication and capture the essence of linear behavior.
Let
is called a linear transformation (or linear map) if for all vectors
- Additivity:
$$ T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}), $$
- Homogeneity:
$$ T(c\mathbf{u}) = cT(\mathbf{u}). $$
If both conditions hold, then
Example 5.1.1. Scaling in
This doubles the length of every vector, preserving direction. It is linear.
Example 5.1.2. Rotation.
Let
This rotates vectors by angle
Example 5.1.3. Differentiation.
Let
The map
is not linear, because
Linear transformations are exactly those that preserve the origin, lines through the origin, and proportions along those lines. They include familiar operations: scaling, rotations, reflections, shears, and projections. Nonlinear transformations bend or curve space, breaking these properties.
Linear transformations unify geometry, algebra, and computation. They explain how matrices act on vectors, how data can be rotated or projected, and how systems evolve under linear rules. Much of linear algebra is devoted to understanding these transformations, their representations, and their invariants.
- Verify that $T(x,y) = (3x-y, 2y)$ is a linear transformation on $\mathbb{R}^2$.
- Show that $T(x,y) = (x+1, y)$ is not linear. Which axiom fails?
- Prove that if $T$ and $S$ are linear transformations, then so is $T+S$.
- Give an example of a linear transformation from $\mathbb{R}^3$ to $\mathbb{R}^2$.
- Let $T:\mathbb{R}[x] \to \mathbb{R}[x]$ be integration: $T(p(x)) = \int_0^x p(t)\,dt$. Prove that $T$ is a linear transformation.
Every linear transformation between finite-dimensional vector spaces can be represented by a matrix. This correspondence is one of the central insights of linear algebra: it lets us use the tools of matrix arithmetic to study abstract transformations.
Let
The action of
Placing these outputs as columns gives the matrix of
Then for any vector
Example 5.2.1. Scaling in
So the matrix is
Example 5.2.2. Rotation in the plane.
The rotation transformation
Example 5.2.3. Projection onto the x-axis.
The map
Matrix representations depend on the chosen basis. If
Matrices are not just convenient notation: they are linear maps once a basis is fixed. Every rotation, reflection, projection, shear, or scaling corresponds to multiplying by a specific matrix. Thus, studying linear transformations reduces to studying their matrices.
Matrix representations make linear transformations computable. They connect abstract definitions to explicit calculations, enabling algorithms for solving systems, finding eigenvalues, and performing decompositions. Applications from graphics to machine learning depend on this translation.
- Find the matrix representation of $T:\mathbb{R}^2 \to \mathbb{R}^2$, $T(x,y) = (x+y, x-y)$.
- Determine the matrix of the linear transformation $T:\mathbb{R}^3 \to \mathbb{R}^2$, $T(x,y,z) = (x+z, y-2z)$.
- What matrix represents reflection across the line $y=x$ in $\mathbb{R}^2$?
- Show that the matrix of the identity transformation on $\mathbb{R}^n$ is $I_n$.
- For the differentiation map $D:\mathbb{R}_2[x] \to \mathbb{R}_1[x]$, where $\mathbb{R}_k[x]$ is the space of polynomials of degree at most $k$, find the matrix of $D$ relative to the bases $\{1,x,x^2\}$ and $\{1,x\}$.
To understand a linear transformation deeply, we must examine what it kills and what it produces. These ideas are captured by the kernel and the image, two fundamental subspaces associated with any linear map.
The kernel (or null space) of a linear transformation
The kernel is always a subspace of
Example 5.3.1.
Let
In matrix form,
To find the kernel, solve
This gives the equations
a line in
The image (or range) of a linear transformation
Equivalently, it is the span of the columns of the representing matrix. The image is always a subspace of
Example 5.3.2. For the same transformation as above,
the columns are
For a linear transformation
This fundamental result connects the lost directions (kernel) with the achieved directions (image).
- The kernel describes how the transformation flattens space (e.g., projecting a 3D object onto a plane).
- The image describes the target subspace reached by the transformation.
- The rank–nullity theorem quantifies the tradeoff: the more dimensions collapse, the fewer remain in the image.
Kernel and image capture the essence of a linear map. They classify transformations, explain when systems have unique or infinite solutions, and form the backbone of important results like the Rank–Nullity Theorem, diagonalization, and spectral theory.
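A quick numerical illustration of the Rank–Nullity Theorem, in Python with NumPy (assumed only for illustration), using the matrix from Exercise 2 below:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0]])

rank = np.linalg.matrix_rank(A)       # dimension of the image (column space)
nullity = A.shape[1] - rank           # dimension of the kernel
_, _, vt = np.linalg.svd(A)
kernel_basis = vt[rank:].T            # columns spanning the kernel

print(rank, nullity, rank + nullity)  # 2, 1, 3 = number of columns (rank-nullity)
print(np.allclose(A @ kernel_basis, 0))
```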
- Find the kernel and image of $T:\mathbb{R}^2 \to \mathbb{R}^2$, $T(x,y) = (x-y, x+y)$.
- Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \end{bmatrix}$. Find bases for $\ker(A)$ and $\text{im}(A)$.
- For the projection map $P(x,y,z) = (x,y,0)$, describe the kernel and image.
- Prove that $\ker(T)$ and $\text{im}(T)$ are always subspaces.
- Verify the Rank–Nullity Theorem for the transformation in Example 5.3.1.
Linear transformations can look very different depending on the coordinate system we use. The process of rewriting vectors and transformations relative to a new basis is called a change of basis. This concept lies at the heart of diagonalization, orthogonalization, and many computational techniques.
Suppose
If
Equivalently,
Here,
Let
Thus, changing basis corresponds to a similarity transformation of the matrix.
Example 5.4.1.
Let
In the standard basis, its matrix is
Now consider the basis
Then
Computing gives
In this new basis, the transformation is diagonal: one direction is scaled by 4, the other collapsed to 0.
Change of basis is like rotating or skewing your coordinate grid. The underlying transformation does not change, but its description in numbers becomes simpler or more complicated depending on the basis. Finding a basis that simplifies a transformation (often a diagonal basis) is a key theme in linear algebra.
Change of basis connects the abstract notion of similarity to practical computation. It is the tool that allows us to diagonalize matrices, compute eigenvalues, and simplify complex transformations. In applications, it corresponds to choosing a more natural coordinate system-whether in geometry, physics, or machine learning.
- Let $A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$. Compute its representation in the basis $\{(1,0),(1,1)\}$.
- Find the change-of-basis matrix from the standard basis of $\mathbb{R}^2$ to $\{(2,1),(1,1)\}$.
- Prove that similar matrices (related by $P^{-1}AP$) represent the same linear transformation under different bases.
- Diagonalize the matrix $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ in the basis $\{(1,1),(1,-1)\}$.
- In $\mathbb{R}^3$, let $\mathcal{B} = \{(1,0,0),(1,1,0),(1,1,1)\}$. Construct the change-of-basis matrix $P$ and compute $P^{-1}$.
Determinants are numerical values associated with square matrices. At first they may appear as a complicated formula, but their importance comes from what they measure: determinants encode scaling, orientation, and invertibility of linear transformations. They bridge algebra and geometry.
For a
the determinant is defined as
Geometric meaning: If
For
the determinant can be computed as
Geometric meaning: In
For
- If $\det(A) = 0$: the transformation squashes space into a lower dimension, so $A$ is not invertible.
- If $\det(A) > 0$: volume is scaled by $\det(A)$, orientation preserved.
- If $\det(A) < 0$: volume is scaled by $|\det(A)|$, orientation reversed.
- Shear in $\mathbb{R}^2$: $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Then $\det(A) = 1$. The transformation slants the unit square into a parallelogram but preserves area.
- Projection in $\mathbb{R}^2$: $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$. Then $\det(A) = 0$. The unit square collapses into a line segment: area vanishes.
- Rotation in $\mathbb{R}^2$: $R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Then $\det(R_\theta) = 1$. Rotations preserve area and orientation.
The determinant is not just a formula: it is a measure of transformation. It tells us whether a matrix is invertible, how it distorts space, and whether it flips orientation. This geometric insight makes the determinant indispensable in analysis, geometry, and applied mathematics.
- Compute the determinant of $\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$. What area scaling factor does it represent?
- Find the determinant of the shear matrix $\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$. What happens to the area of the unit square?
- For the $3 \times 3$ matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$, compute the determinant. How does it scale volume in $\mathbb{R}^3$?
- Show that any rotation matrix in $\mathbb{R}^2$ has determinant $1$.
- Give an example of a $2 \times 2$ matrix with determinant $-1$. What geometric action does it represent?
Beyond their geometric meaning, determinants satisfy a collection of algebraic rules that make them powerful tools in linear algebra. These properties allow us to compute efficiently, test invertibility, and understand how determinants behave under matrix operations.
Let
- Identity: $\det(I_n) = 1$.
- Triangular matrices: If $A$ is upper or lower triangular, then $\det(A) = a_{11} a_{22} \cdots a_{nn}$.
- Row/column swap: Interchanging two rows (or columns) multiplies the determinant by $-1$.
- Row/column scaling: Multiplying a row (or column) by a scalar $c$ multiplies the determinant by $c$.
- Row/column addition: Adding a multiple of one row to another does not change the determinant.
- Transpose: $\det(A^T) = \det(A)$.
- Multiplicativity: $\det(AB) = \det(A)\det(B)$.
- Invertibility: $A$ is invertible if and only if $\det(A) \neq 0$.
Example 6.2.1. For
Example 6.2.2. Let
Then
Since
This matches the multiplicativity rule:
- Row swaps: flipping orientation of space.
- Scaling a row: stretching space in one direction.
- Row replacement: sliding hyperplanes without altering volume.
- Multiplicativity: performing two transformations multiplies their scaling factors.
These properties make determinants both computationally manageable and geometrically interpretable.
Determinant properties connect computation with geometry and theory. They explain why Gaussian elimination works, why invertibility is equivalent to nonzero determinant, and why determinants naturally arise in areas like volume computation, eigenvalue theory, and differential equations.
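The multiplicativity, transpose, and inverse rules are easy to spot-check numerically. A sketch in Python with NumPy (the matrices are arbitrary illustrative choices):

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [3.0,  4.0]])
B = np.array([[1.0, 2.0],
              [0.0, 3.0]])

print(np.linalg.det(A))                                   # 2*4 - (-1)*3 = 11
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))   # det(A^T) = det(A)
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))    # det(AB) = det(A)det(B)
print(np.isclose(np.linalg.det(np.linalg.inv(A)),
                 1.0 / np.linalg.det(A)))                 # det(A^{-1}) = 1/det(A)
```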
- Compute the determinant of $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 2 \end{bmatrix}$.
- Show that if two rows of a square matrix are identical, then its determinant is zero.
- Verify $\det(A^T) = \det(A)$ for $A = \begin{bmatrix} 2 & -1 \\ 3 & 4 \end{bmatrix}$.
- If $A$ is invertible, prove that $\det(A^{-1}) = \frac{1}{\det(A)}$.
- Suppose $A$ is a $3\times 3$ matrix with $\det(A) = 5$. What is $\det(2A)$?
While determinants of small matrices can be computed directly from formulas, larger matrices require a systematic method. The cofactor expansion (also known as Laplace expansion) provides a recursive way to compute determinants by breaking them into smaller ones.
For an
- The minor $M_{ij}$ is the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting the $i$-th row and $j$-th column of $A$.
- The cofactor $C_{ij}$ is defined by $C_{ij} = (-1)^{i+j} M_{ij}$.
The sign factor $(-1)^{i+j}$ follows a checkerboard pattern:
$$ \begin{bmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}. $$
The determinant of
Example 6.3.1. Compute
Expand along the first row:
- For $C_{11}$: $M_{11} = \det \begin{bmatrix} 4 & 5 \\ 0 & 6 \end{bmatrix} = 24$, so $C_{11} = (+1)(24) = 24$.
- For $C_{12}$: $M_{12} = \det \begin{bmatrix} 0 & 5 \\ 1 & 6 \end{bmatrix} = 0 - 5 = -5$, so $C_{12} = (-1)(-5) = 5$.
- For $C_{13}$: $M_{13} = \det \begin{bmatrix} 0 & 4 \\ 1 & 0 \end{bmatrix} = 0 - 4 = -4$, so $C_{13} = (+1)(-4) = -4$.
Thus,
- Expansion along any row or column yields the same result.
- The cofactor expansion provides a recursive definition of determinant: a determinant of size $n$ is expressed in terms of determinants of size $n-1$.
- Cofactors are fundamental in constructing the adjugate matrix, which gives a formula for inverses: $A^{-1} = \frac{1}{\det(A)} \text{adj}(A)$.
Cofactor expansion breaks down the determinant into contributions from sub-volumes defined by fixing one row or column at a time. Each cofactor measures how that row/column influences the overall volume scaling.
Cofactor expansion generalizes the small-matrix formulas and provides a conceptual definition of determinants. While not the most efficient way to compute determinants for large matrices, it is essential for theory, proofs, and connections to adjugates, Cramer’s rule, and classical geometry.
- Compute the determinant of $\begin{bmatrix} 2 & 0 & 1 \\ 3 & -1 & 4 \\ 1 & 2 & 0 \end{bmatrix}$ by cofactor expansion along the first column.
- Verify that expanding along the second row of Example 6.3.1 gives the same determinant.
- Prove that expansion along any row gives the same value.
- Show that if a row of a matrix is zero, then its determinant is zero.
- Use cofactor expansion to prove that $\det(A) = \det(A^T)$.
Determinants are not merely algebraic curiosities; they have concrete geometric and computational uses. Two of the most important applications are measuring volumes and testing invertibility of matrices.
Given vectors
Then
- In $\mathbb{R}^2$, $|\det(A)|$ gives the area of the parallelogram spanned by $\mathbf{v}_1, \mathbf{v}_2$.
- In $\mathbb{R}^3$, $|\det(A)|$ gives the volume of the parallelepiped spanned by $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$.
- In higher dimensions, it generalizes to $n$-dimensional volume (hypervolume).
Example 6.4.1. Let
Then
So the parallelepiped has volume
A square matrix
- If $\det(A) = 0$: the transformation collapses space into a lower dimension (area/volume is zero). No inverse exists.
- If $\det(A) \neq 0$: the transformation scales volume by $|\det(A)|$, and is reversible.
Example 6.4.2. The matrix
has determinant
Determinants also provide an explicit formula for solving systems of linear equations when the matrix is invertible.
For
where
The sign of
Determinants condense key information: they measure scaling, test invertibility, and track orientation. These insights are indispensable in geometry (areas and volumes), analysis (Jacobian determinants in calculus), and computation (solving systems and checking singularity).
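A sketch in Python with NumPy illustrating both uses: the volume of a parallelepiped via $|\det|$, and Cramer's rule applied to the small system of Exercise 4 below (the code assumes $\det(A) \neq 0$):

```python
import numpy as np

# Volume of the parallelepiped spanned by three vectors (rows of V).
V = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
print(abs(np.linalg.det(V)))          # 1.0

# Cramer's rule for a 2x2 system A x = b: replace one column of A by b.
A = np.array([[1.0,  1.0],
              [2.0, -1.0]])
b = np.array([3.0, 0.0])
det_A = np.linalg.det(A)
x = np.array([np.linalg.det(np.column_stack([b, A[:, 1]])) / det_A,
              np.linalg.det(np.column_stack([A[:, 0], b])) / det_A])
print(x)                              # [1. 2.]
print(np.allclose(A @ x, b))          # True
```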
- Compute the area of the parallelogram spanned by $(2,1)$ and $(1,3)$.
- Find the volume of the parallelepiped spanned by $(1,0,0), (1,1,0), (1,1,1)$.
- Determine whether the matrix $\begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$ is invertible. Justify using determinants.
- Use Cramer’s rule to solve $\begin{cases} x + y = 3, \\ 2x - y = 0. \end{cases}$
- Explain geometrically why a determinant of zero implies no inverse exists.
To extend the geometric ideas of length, distance, and angle beyond
An inner product on a real vector space
that assigns to each pair of vectors
- Symmetry: $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$.
- Linearity in the first argument: $\langle a\mathbf{u} + b\mathbf{w}, \mathbf{v} \rangle = a \langle \mathbf{u}, \mathbf{v} \rangle + b \langle \mathbf{w}, \mathbf{v} \rangle$.
- Positive-definiteness: $\langle \mathbf{v}, \mathbf{v} \rangle \geq 0$, with equality if and only if $\mathbf{v} = \mathbf{0}$.
The standard inner product on
The norm of a vector is its length, defined in terms of the inner product:
For the dot product in
The inner product allows us to define the angle
Thus, two vectors are orthogonal if
Example 7.1.1.
In
So,
Example 7.1.2.
In the function space
defines a length
This generalizes geometry to infinite-dimensional spaces.
- Inner product: measures similarity between vectors.
- Norm: length of a vector.
- Angle: measure of alignment between two directions.
These concepts unify algebraic operations with geometric intuition.
Inner products and norms allow us to extend geometry into abstract vector spaces. They form the basis of orthogonality, projections, Fourier series, least squares approximation, and many applications in physics and machine learning.
- Compute $\langle (2,-1,3), (1,4,0) \rangle$. Then find the angle between them.
- Show that $|(x,y)| = \sqrt{x^2+y^2}$ satisfies the properties of a norm.
- In $\mathbb{R}^3$, verify that $(1,1,0)$ and $(1,-1,0)$ are orthogonal.
- In $C[0,1]$, compute $\langle f,g \rangle$ for $f(x)=x$, $g(x)=1$.
- Prove the Cauchy–Schwarz inequality: $|\langle \mathbf{u}, \mathbf{v} \rangle| \leq |\mathbf{u}| \, |\mathbf{v}|$.
One of the most useful applications of inner products is the notion of orthogonal projection. Projection allows us to approximate a vector by another lying in a subspace, minimizing error in the sense of distance. This idea underpins geometry, statistics, and numerical analysis.
Let
Given a vector
The formula is
The error vector
Let
So
The error vector is
Suppose
This is the unique vector in
Orthogonal projection explains the method of least squares. To solve an overdetermined
system
Thus, least squares is just projection in disguise.
- Projection finds the closest point in a subspace to a given vector.
- It minimizes distance (error) in the sense of Euclidean norm.
- Orthogonality ensures the error vector points directly away from the subspace.
Orthogonal projection is central in both pure and applied mathematics. It underlies the geometry of subspaces, the theory of Fourier series, regression in statistics, and approximation methods in numerical linear algebra. Whenever we fit data with a simpler model, projection is at work.
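The normal-equations route to least squares makes the projection explicit. A sketch in Python with NumPy (the data matrix and vector are arbitrary illustrative choices); the residual comes out orthogonal to every column of $A$, exactly as the projection picture predicts:

```python
import numpy as np

# Project b onto the column space of A by solving the normal equations
# A^T A x = A^T b; the projection is then A x.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 2.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # least squares coefficients
p = A @ x_hat                               # projection of b onto col(A)
residual = b - p

print(x_hat)
print(np.allclose(A.T @ residual, 0))       # residual is orthogonal to col(A)
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))
```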
- Compute the projection of $(2,3)$ onto the vector $(1,1)$.
- Show that $\mathbf{v} - \text{proj}_{\mathbf{u}}(\mathbf{v})$ is orthogonal to $\mathbf{u}$.
- Let $W = \text{span}\{(1,0,0), (0,1,0)\} \subseteq \mathbb{R}^3$. Find the projection of $(1,2,3)$ onto $W$.
- Explain why least squares fitting corresponds to projection onto the column space of $A$.
- Prove that projection onto a subspace $W$ is unique: there is exactly one closest vector in $W$ to a given $\mathbf{v}$.
The Gram–Schmidt process is a systematic way to turn any linearly independent set of vectors into an orthonormal basis. This is especially useful because orthonormal bases simplify computations: inner products become simple coordinate comparisons, and projections take clean forms.
Given a linearly independent set of vectors
We proceed step by step:
- Start with $\mathbf{v}_1$, normalize it to get $\mathbf{u}_1$.
- Subtract from $\mathbf{v}_2$ its projection onto $\mathbf{u}_1$, leaving a vector orthogonal to $\mathbf{u}_1$. Normalize to get $\mathbf{u}_2$.
- For each $\mathbf{v}_k$, subtract projections onto all previously constructed $\mathbf{u}_1, \dots, \mathbf{u}_{k-1}$, then normalize.
For
The result
Take
- Normalize $\mathbf{v}_1$:
- Subtract projection of $\mathbf{v}_2$ on $\mathbf{u}_1$:
So
Normalize:
- Subtract projections from $\mathbf{v}_3$:
After computing, normalize to obtain
The result is an orthonormal basis of the span of
Gram–Schmidt is like straightening out a set of vectors: you start with the original directions and adjust each new vector to be perpendicular to all previous ones. Then you scale to unit length. The process ensures orthogonality while preserving the span.
Orthonormal bases simplify inner products, projections, and computations in general. They make coordinate systems easier to work with and are crucial in numerical methods, QR decomposition, Fourier analysis, and statistics (orthogonal polynomials, principal component analysis).
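A compact implementation sketch of the process in Python with NumPy; the function name `gram_schmidt` is introduced here for illustration, and in numerical practice a QR factorization is usually preferred:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors:
    subtract the projections onto the earlier directions, then normalize."""
    basis = []
    for v in vectors:
        w = v.astype(float)
        for u in basis:
            w = w - np.dot(w, u) * u       # remove the component along u
        basis.append(w / np.linalg.norm(w))
    return basis

vs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])]
u1, u2 = gram_schmidt(vs)
print(np.dot(u1, u2))                          # 0 (orthogonal, up to rounding)
print(np.linalg.norm(u1), np.linalg.norm(u2))  # both 1
```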
- Apply Gram–Schmidt to $(1,0), (1,1)$ in $\mathbb{R}^2$.
- Orthogonalize $(1,1,1), (1,0,1)$ in $\mathbb{R}^3$.
- Prove that each step of Gram–Schmidt yields a vector orthogonal to all previous ones.
- Show that Gram–Schmidt preserves the span of the original vectors.
- Explain how Gram–Schmidt leads to the QR decomposition of a matrix.
An orthonormal basis is a basis of a vector space in which all vectors are both orthogonal to each other and have unit length. Such bases are the most convenient possible coordinate systems: computations involving inner products, projections, and norms become exceptionally simple.
A set of vectors
- $\langle \mathbf{u}_i, \mathbf{u}_j \rangle = 0$ whenever $i \neq j$ (orthogonality),
- $|\mathbf{u}_i| = 1$ for all $i$ (normalization),
- The set spans $V$.
Example 7.4.1. In
is orthonormal under the dot product.
Example 7.4.2. In
is orthonormal.
Example 7.4.3. Fourier basis on functions:
is an orthogonal set in the space of square-integrable functions on
After normalization, it becomes an orthonormal basis.
- Coordinate simplicity: If $\{\mathbf{u}_1,\dots,\mathbf{u}_n\}$ is an orthonormal basis of $V$, then any vector $\mathbf{v}\in V$ has coordinates
$$ [\mathbf{v}] = \begin{bmatrix} \langle \mathbf{v}, \mathbf{u}_1 \rangle \\ \vdots \\ \langle \mathbf{v}, \mathbf{u}_n \rangle \end{bmatrix}. $$
That is, coordinates are just inner products.
- Parseval’s identity: For any $\mathbf{v} \in V$,
$$ |\mathbf{v}|^2 = \sum_{i=1}^n |\langle \mathbf{v}, \mathbf{u}_i \rangle|^2. $$
- Projections: The orthogonal projection onto the span of $\{\mathbf{u}_1,\dots,\mathbf{u}_k\}$ is
$$ \text{proj}(\mathbf{v}) = \sum_{i=1}^k \langle \mathbf{v}, \mathbf{u}_i \rangle \mathbf{u}_i. $$
- Start with any linearly independent set, then apply the Gram–Schmidt process to obtain an orthonormal set spanning the same subspace.
- In practice, orthonormal bases are often chosen for numerical stability and simplicity of computation.
An orthonormal basis is like a perfectly aligned and equally scaled coordinate system. Distances and angles are computed directly using coordinates without correction factors. They are the ideal rulers of linear algebra.
Orthonormal bases simplify every aspect of linear algebra: solving systems, computing projections, expanding functions, diagonalizing symmetric matrices, and working with Fourier series. In data science, principal component analysis produces orthonormal directions capturing maximum variance.
- Verify that $(1/\sqrt{2})(1,1)$ and $(1/\sqrt{2})(1,-1)$ form an orthonormal basis of $\mathbb{R}^2$.
- Express $(3,4)$ in terms of the orthonormal basis $\{(1/\sqrt{2})(1,1), (1/\sqrt{2})(1,-1)\}$.
- Prove Parseval’s identity for $\mathbb{R}^n$ with the dot product.
- Find an orthonormal basis for the plane $x+y+z=0$ in $\mathbb{R}^3$.
- Explain why orthonormal bases are numerically more stable than arbitrary bases in computations.
The concepts of eigenvalues and eigenvectors reveal the most fundamental behavior of linear transformations. They identify the special directions in which a transformation acts by simple stretching or compressing, without rotation or distortion.
Let
for some scalar
Equivalently, if
Example 8.1.1. Let
Then
So
Example 8.1.2.
Rotation matrix in
If
Eigenvalues arise from solving the characteristic equation:
This polynomial in
- Eigenvectors are directions that remain unchanged in orientation under a transformation; only their length is scaled.
- Eigenvalues tell us the scaling factor along those directions.
- If a matrix has many independent eigenvectors, it can often be simplified (diagonalized) by changing basis.
- Stretching along principal axes of an ellipse (quadratic forms).
- Stable directions of dynamical systems.
- Principal components in statistics and machine learning.
- Quantum mechanics, where observables correspond to operators with eigenvalues.
Eigenvalues and eigenvectors are a bridge between algebra and geometry. They provide a lens for understanding linear transformations in their simplest form. Nearly every application of linear algebra (differential equations, statistics, physics, computer science) relies on eigen-analysis.
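Eigenvalues and eigenvectors can be computed with `numpy.linalg.eig` (Python with NumPy is assumed only for illustration). A sketch using the diagonal matrix of Exercise 1 below and verifying $A\mathbf{v} = \lambda\mathbf{v}$ for each pair:

```python
import numpy as np

A = np.array([[4.0,  0.0],
              [0.0, -1.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                        # [ 4. -1.]
print(eigvecs)                        # columns are the corresponding eigenvectors

for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))   # True for every pair
```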
- Find the eigenvalues and eigenvectors of $\begin{bmatrix} 4 & 0 \ 0 & -1 \end{bmatrix}$.
- Show that every scalar multiple of an eigenvector is again an eigenvector for the same eigenvalue.
- Verify that the rotation matrix
$R_\theta$ has no real eigenvalues unless$\theta = 0$ or$\pi$ . - Compute the characteristic polynomial of $\begin{bmatrix} 1 & 2 \ 2 & 1 \end{bmatrix}$.
- Explain geometrically what eigenvectors and eigenvalues represent for the shear matrix $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$.
A central goal in linear algebra is to simplify the action of a matrix by choosing a good basis. Diagonalization is the process of rewriting a matrix so that it acts by simple scaling along independent directions. This makes computations such as powers, exponentials, and solving differential equations far easier.
A square matrix
where
The diagonal entries of
- A matrix is diagonalizable if it has
$n$ linearly independent eigenvectors. - Equivalently, the sum of the dimensions of its eigenspaces equals
$n$ . - Symmetric matrices (over
$\mathbb{R}$ ) are always diagonalizable, with an orthonormal basis of eigenvectors.
Let
- Characteristic polynomial:
So eigenvalues are
- Eigenvectors:
- For
$\lambda = 4$, solve $(A-4I)\mathbf{v}=0$: $\begin{bmatrix} 0 & 1 \\ 0 & -2 \end{bmatrix}\mathbf{v} = 0$, giving $\mathbf{v}_1 = (1,0)$. - For
$\lambda = 2$ :$(A-2I)\mathbf{v}=0$ , giving$\mathbf{v}_2 = (1,-2)$ .
- Construct $P = \begin{bmatrix} 1 & 1 \\ 0 & -2 \end{bmatrix}$. Then
Thus,
-
Computing powers: If
$A = P D P^{-1}$ , then$$ A^k = P D^k P^{-1}. $$
Since
$D$ is diagonal,$D^k$ is easy to compute. -
Matrix exponentials:
$e^A = P e^D P^{-1}$ , useful in solving differential equations. -
Understanding geometry: Diagonalization reveals the directions along which a transformation stretches or compresses space independently.
Not all matrices can be diagonalized.
has only one eigenvalue
Diagonalization means we have found a basis of eigenvectors. In this basis, the matrix acts by simple scaling along each coordinate axis. It transforms complicated motion into independent 1D motions.
Diagonalization is a cornerstone of linear algebra. It simplifies computation, reveals structure, and is the starting point for the spectral theorem, Jordan form, and many applications in physics, engineering, and data science.
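The worked example above is easy to verify numerically. The explicit matrix is reconstructed here from the computation $A - 4I = \begin{bmatrix} 0 & 1 \\ 0 & -2 \end{bmatrix}$ shown in the example (so treat its entries as an inference); the sketch confirms $A = PDP^{-1}$ and the shortcut $A^k = PD^kP^{-1}$.

```python
import numpy as np

# Reconstructed from the example: A - 4I = [[0, 1], [0, -2]]  =>  A = [[4, 1], [0, 2]].
A = np.array([[4.0, 1.0],
              [0.0, 2.0]])

P = np.array([[1.0,  1.0],
              [0.0, -2.0]])        # columns are the eigenvectors (1, 0) and (1, -2)
D = np.diag([4.0, 2.0])            # the corresponding eigenvalues

print(np.allclose(P @ D @ np.linalg.inv(P), A))          # True: A = P D P^{-1}

# Powers are cheap once A is diagonalized: A^k = P D^k P^{-1}.
k = 3
Ak = P @ np.diag(np.diag(D) ** k) @ np.linalg.inv(P)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))     # True
```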
-
Diagonalize
$$ A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}. $$
-
Determine whether
$$ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} $$
is diagonalizable. Why or why not?
-
Find
$A^5$ for $$ A = \begin{bmatrix} 4 & 1 \\ 0 & 2 \end{bmatrix} $$
using diagonalization.
-
Show that any
$n \times n$ matrix with$n$ distinct eigenvalues is diagonalizable. -
Explain why real symmetric matrices are always diagonalizable.
The key to finding eigenvalues is the characteristic polynomial of a matrix. This polynomial encodes the values
of
For an
The roots of
Example 8.3.1. Let
Then
Thus eigenvalues are
Example 8.3.2. For
(rotation by 90°),
Eigenvalues are
Example 8.3.3. For a triangular matrix
the determinant is simply the product of diagonal entries minus
So eigenvalues are
-
The characteristic polynomial of an
$n \times n$ matrix has degree$n$ . -
The sum of the eigenvalues (counted with multiplicity) equals the trace of
$A$ :$$ \text{tr}(A) = \lambda_1 + \cdots + \lambda_n. $$
-
The product of the eigenvalues equals the determinant of
$A$ :$$ \det(A) = \lambda_1 \cdots \lambda_n. $$
-
Similar matrices have the same characteristic polynomial, hence the same eigenvalues.
The characteristic polynomial captures when
Characteristic polynomials provide the computational tool to extract eigenvalues. They connect matrix invariants (trace and determinant) with geometry, and form the foundation for diagonalization, spectral theorems, and stability analysis in dynamical systems.
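A short NumPy check of these facts (the matrix is an illustrative choice): `np.poly` returns the coefficients of the monic polynomial whose roots are the eigenvalues of $A$, and the trace and determinant identities can be compared against the computed eigenvalues.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [4.0, 3.0]])         # illustrative 2x2 matrix: trace 5, determinant 2

print(np.poly(A))                  # coefficients of lambda^2 - 5*lambda + 2, i.e. lambda^2 - trace*lambda + det

lam = np.linalg.eigvals(A)
print(np.isclose(lam.sum(), np.trace(A)))          # True: sum of eigenvalues = trace
print(np.isclose(lam.prod(), np.linalg.det(A)))    # True: product of eigenvalues = determinant
```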
-
Compute the characteristic polynomial of
$$ A = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix}. $$
-
Verify that the sum of the eigenvalues of $\begin{bmatrix} 5 & 0 \\ 0 & -2 \end{bmatrix}$ equals its trace, and their product equals its determinant.
-
Show that for any triangular matrix, the eigenvalues are just the diagonal entries.
-
Prove that if
$A$ and$B$ are similar matrices, then$p_A(\lambda) = p_B(\lambda)$ . -
Compute the characteristic polynomial of $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$.
Eigenvalues and eigenvectors are not only central to the theory of linear algebra-they are indispensable tools across mathematics and applied science. Two classic applications are solving systems of differential equations and analyzing Markov chains.
Consider the system
where
If
is a solution.
-
Eigenvalues determine the growth or decay rate:
- If
$\lambda < 0$ , solutions decay (stable). - If
$\lambda > 0$ , solutions grow (unstable). - If
$\lambda$ is complex, oscillations occur.
- If
By combining eigenvector solutions, we can solve general initial conditions.
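Here is a minimal sketch of this recipe in NumPy (the matrix and initial condition are illustrative, not those of the example that follows): expand $\mathbf{x}(0)$ in the eigenvector basis, attach the factors $e^{\lambda_i t}$, and confirm with a finite-difference check that the result satisfies $\tfrac{d}{dt}\mathbf{x} = A\mathbf{x}$.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])          # illustrative system matrix; eigenvalues 3 and -1
x0 = np.array([1.0, 0.0])           # initial condition x(0)

lam, V = np.linalg.eig(A)           # columns of V are eigenvectors
c = np.linalg.solve(V, x0)          # expand x0 in the eigenvector basis

def x(t):
    # General solution: sum_i c_i * exp(lambda_i * t) * v_i
    return V @ (c * np.exp(lam * t))

t, h = 0.5, 1e-6
dxdt = (x(t + h) - x(t - h)) / (2 * h)       # numerical derivative of the eigen-solution
print(np.allclose(dxdt, A @ x(t)))           # True: the solution satisfies dx/dt = A x
```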
Example 8.4.1. Let
Then eigenvalues are
Thus one component grows exponentially, the other decays.
A Markov chain is described by a stochastic matrix
Iterating gives
Understanding long-term behavior reduces to analyzing powers of
- The eigenvalue
$\lambda = 1$ always exists. Its eigenvector gives the steady-state distribution. - All other eigenvalues satisfy
$|\lambda| \leq 1$; when $|\lambda| < 1$, as is the case for regular chains, their influence decays as $k \to \infty$.
Example 8.4.2. Consider
Eigenvalues are
Thus, regardless of the starting distribution, the chain converges to
- In differential equations, eigenvalues determine the time evolution: exponential growth, decay, or oscillation.
- In Markov chains, eigenvalues determine the long-term equilibrium of stochastic processes.
Eigenvalue methods turn complex iterative or dynamical systems into tractable problems. In physics, engineering, and finance, they describe stability and resonance. In computer science and statistics, they power algorithms from Google’s PageRank to modern machine learning.
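A minimal sketch of the steady-state computation (the matrix below is an illustrative column-stochastic example; the convention $\mathbf{x}_{k+1} = P\mathbf{x}_k$ with columns summing to $1$ is an assumption): extract the eigenvector for $\lambda = 1$ and compare it with brute-force iteration.

```python
import numpy as np

P = np.array([[0.9, 0.5],
              [0.1, 0.5]])            # columns sum to 1 (column-stochastic)

lam, V = np.linalg.eig(P)
i = np.argmin(np.abs(lam - 1.0))      # index of the eigenvalue closest to 1
steady = np.real(V[:, i])
steady = steady / steady.sum()        # rescale into a probability vector

x = np.array([1.0, 0.0])              # an arbitrary starting distribution
for _ in range(200):
    x = P @ x                         # iterate the chain

print(steady)                         # approximately [0.833, 0.167]
print(np.allclose(x, steady))         # True: iteration converges to the same distribution
```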
-
Solve $\tfrac{d}{dt}\mathbf{x} = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}\mathbf{x}$.
-
Show that if
$A$ has a complex eigenvalue$\alpha \pm i\beta$ , then solutions of$\tfrac{d}{dt}\mathbf{x} = A\mathbf{x}$ involve oscillations of frequency$\beta$ . -
Find the steady-state distribution of
$$ P = \begin{bmatrix} 0.7 & 0.2 \\ 0.3 & 0.8 \end{bmatrix}. $$
-
Prove that for any stochastic matrix
$P$ ,$1$ is always an eigenvalue. -
Explain why all eigenvalues of a stochastic matrix satisfy
$|\lambda| \leq 1$ .
A quadratic form is a polynomial of degree two in several variables, expressed neatly using matrices. Quadratic forms appear throughout mathematics: in optimization, geometry of conic sections, statistics (variance), and physics (energy functions).
Let
Expanded,
Because
Example 9.1.1. For
Example 9.1.2. The quadratic form
corresponds to the matrix
Example 9.1.3. The conic section equation
is described by the quadratic form
By choosing a new basis consisting of eigenvectors of
Thus quadratic forms can always be expressed as a sum of weighted squares:
where
Quadratic forms describe geometric shapes:
- In 2D: ellipses, parabolas, hyperbolas.
- In 3D: ellipsoids, paraboloids, hyperboloids.
- In higher dimensions: generalizations of ellipsoids.
Diagonalization aligns the coordinate axes with the principal axes of the shape.
Quadratic forms unify geometry and algebra. They are central in optimization (minimizing energy functions), statistics (covariance matrices and variance), mechanics (kinetic energy), and numerical analysis. Understanding quadratic forms leads directly to the spectral theorem.
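A small NumPy sketch of this change of variables (the symmetric matrix and the test point are illustrative): in the eigenvector coordinates, the quadratic form really is a weighted sum of squares.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # illustrative symmetric matrix

def Q(x):
    return x @ A @ x                  # the quadratic form Q(x) = x^T A x

lam, U = np.linalg.eigh(A)            # spectral decomposition: U orthogonal, lam the weights

x = np.array([1.5, -0.7])             # an arbitrary test point
y = U.T @ x                           # coordinates of x along the principal axes

print(np.isclose(Q(x), np.sum(lam * y**2)))   # True: Q(x) = sum_i lambda_i * y_i^2
```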
- Write the quadratic form
$Q(x,y) = 3x^2 + 4xy + y^2$ as $\mathbf{x}^T A \mathbf{x}$ for some symmetric matrix $A$. - For $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$, compute
$Q(x,y)$ explicitly. - Diagonalize the quadratic form
$Q(x,y) = 2x^2 + 2xy + 3y^2$ . - Identify the conic section given by
$Q(x,y) = x^2 - y^2$ . - Show that if
$A$ is symmetric, quadratic forms defined by$A$ and$A^T$ are identical.
Quadratic forms are especially important when their associated matrices are positive definite, since these guarantee positivity of energy, distance, or variance. Positive definiteness is a cornerstone in optimization, numerical analysis, and statistics.
A symmetric matrix
-
Positive definite if
$$ \mathbf{x}^T A \mathbf{x} > 0 \quad \text{for all nonzero } \mathbf{x} \in \mathbb{R}^n. $$
-
Positive semidefinite if
$$ \mathbf{x}^T A \mathbf{x} \geq 0 \quad \text{for all } \mathbf{x}. $$
Similarly, $A$ is negative definite if $\mathbf{x}^T A \mathbf{x} < 0$ for all nonzero $\mathbf{x}$, and indefinite if the quadratic form takes both positive and negative values.
Example 9.2.1.
is positive definite, since
for all
Example 9.2.2.
has quadratic form
This matrix is not positive definite, since
For a symmetric matrix
-
Eigenvalue test:
$A$ is positive definite if and only if all eigenvalues of$A$ are positive. -
Principal minors test (Sylvester’s criterion):
$A$ is positive definite if and only if all leading principal minors (determinants of top-left $k \times k$ submatrices) are positive. -
Cholesky factorization:
$A$ is positive definite if and only if it can be written as$$ A = R^T R, $$
where
$R$ is an upper triangular matrix with positive diagonal entries.
- Positive definite matrices correspond to quadratic forms that define ellipsoids centered at the origin.
- Positive semidefinite matrices define flattened ellipsoids (possibly degenerate).
- Indefinite matrices define hyperbolas or saddle-shaped surfaces.
- Optimization: Hessians of convex functions are positive semidefinite; strict convexity corresponds to positive definite Hessians.
- Statistics: Covariance matrices are positive semidefinite.
- Numerical methods: Cholesky decomposition is widely used to solve systems with positive definite matrices efficiently.
Positive definiteness provides stability and guarantees in mathematics and computation. It ensures energy functions are bounded below, optimization problems have unique solutions, and statistical models are meaningful.
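All three tests are straightforward to run numerically. The sketch below uses two illustrative symmetric matrices (not the ones from the examples above); note that NumPy’s Cholesky routine returns a lower-triangular factor $L$ with $A = LL^T$, which is the same statement as $A = R^TR$ with $R = L^T$.

```python
import numpy as np

def check_positive_definite(A):
    """Run the eigenvalue, Sylvester, and Cholesky tests on a symmetric matrix A."""
    eig_test = bool(np.all(np.linalg.eigvalsh(A) > 0))
    minors_test = all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))
    try:
        np.linalg.cholesky(A)          # succeeds exactly when A is positive definite
        chol_test = True
    except np.linalg.LinAlgError:
        chol_test = False
    return eig_test, minors_test, chol_test

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])             # positive definite (illustrative)
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])             # indefinite: eigenvalues 3 and -1 (illustrative)

print(check_positive_definite(A))      # (True, True, True)
print(check_positive_definite(B))      # (False, False, False)
```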
-
Use Sylvester’s criterion to check whether
$$ A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} $$
is positive definite.
-
Determine whether
$$ A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} $$
is positive definite, semidefinite, or indefinite.
-
Find the eigenvalues of
$$ A = \begin{bmatrix} 4 & 2 \\ 2 & 3 \end{bmatrix}, $$
and use them to classify definiteness.
-
Prove that all diagonal matrices with positive entries are positive definite.
-
Show that if
$A$ is positive definite, then so is$P^T A P$ for any invertible matrix$P$ .
The spectral theorem is one of the most powerful results in linear algebra. It states that symmetric matrices can always be diagonalized by an orthogonal basis of eigenvectors. This links algebra (eigenvalues), geometry (orthogonal directions), and applications (stability, optimization, statistics).
If
-
All eigenvalues of
$A$ are real. -
There exists an orthonormal basis of
$\mathbb{R}^n$ consisting of eigenvectors of$A$ . -
Thus,
$A$ can be written as$$ A = Q \Lambda Q^T, $$
where
$Q$ is an orthogonal matrix ($Q^T Q = I$ ) and$\Lambda$ is diagonal with eigenvalues of$A$ on the diagonal.
- Symmetric matrices are always diagonalizable, and the diagonalization is numerically stable.
- Quadratic forms
$\mathbf{x}^T A \mathbf{x}$ can be expressed in terms of eigenvalues and eigenvectors, showing ellipsoids aligned with eigen-directions. - Positive definiteness can be checked by confirming that all eigenvalues are positive.
Let
- Characteristic polynomial:
Eigenvalues:
- Eigenvectors:
- For
$\lambda=1$ : solve$(A-I)\mathbf{v} = 0$ , giving$(1,-1)$ . - For
$\lambda=3$ : solve$(A-3I)\mathbf{v} = 0$ , giving$(1,1)$ .
- Normalize eigenvectors:
- Then
So
The spectral theorem says every symmetric matrix acts like independent scaling along orthogonal directions. In geometry, this corresponds to stretching space along perpendicular axes.
- Ellipses, ellipsoids, and quadratic surfaces can be fully understood via eigenvalues and eigenvectors.
- Orthogonality ensures directions remain perpendicular after transformation.
- Optimization: The spectral theorem underlies classification of critical points via eigenvalues of the Hessian.
- PCA (Principal Component Analysis): Data covariance matrices are symmetric, and PCA finds orthogonal directions of maximum variance.
- Differential equations & physics: Symmetric operators correspond to measurable quantities with real eigenvalues (stability, energy).
The spectral theorem guarantees that symmetric matrices are as simple as possible: they can always be analyzed in terms of real, orthogonal eigenvectors. This provides both deep theoretical insight and powerful computational tools.
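The factorization $A = Q\Lambda Q^T$ is easy to confirm numerically. The matrix below is reconstructed from the eigenvalues and eigenvectors listed in the worked example (so its explicit entries are an inference to be checked against the text).

```python
import numpy as np

# Symmetric matrix whose eigen-data matches the worked example:
# eigenvalues 1 and 3 with eigenvectors (1, -1) and (1, 1).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, Q = np.linalg.eigh(A)                       # eigh: real eigenvalues, orthogonal Q
print(lam)                                       # [1. 3.]
print(np.allclose(Q.T @ Q, np.eye(2)))           # True: Q is orthogonal
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))    # True: A = Q Lambda Q^T
```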
-
Diagonalize
$$ A = \begin{bmatrix} 4 & 2 \\ 2 & 3 \end{bmatrix} $$
using the spectral theorem.
-
Prove that all eigenvalues of a real symmetric matrix are real.
-
Show that eigenvectors corresponding to distinct eigenvalues of a symmetric matrix are orthogonal.
-
Explain geometrically how the spectral theorem describes ellipsoids defined by quadratic forms.
-
Apply the spectral theorem to the covariance matrix
$$ \Sigma = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, $$
and interpret the eigenvectors as principal directions of variance.
Principal Component Analysis (PCA) is a widely used technique in data science, machine learning, and statistics. At its core, PCA is an application of the spectral theorem to covariance matrices: it finds orthogonal directions (principal components) that capture the maximum variance in data.
Given a dataset of vectors
-
Center the data by subtracting the mean vector
$\bar{\mathbf{x}}$ . -
Form the covariance matrix
$$ \Sigma = \frac{1}{m} \sum_{i=1}^m (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T. $$
-
Apply the spectral theorem:
$\Sigma = Q \Lambda Q^T$ .- Columns of
$Q$ are orthonormal eigenvectors (principal directions). - Eigenvalues in
$\Lambda$ measure variance explained by each direction.
- Columns of
The first principal component is the eigenvector corresponding to the largest eigenvalue; it is the direction of maximum variance.
Suppose we have two-dimensional data points roughly aligned along the line $y = x$.
Eigenvalues are about
- First principal component: the line
$y = x$ . - Most variance lies along this direction.
- The second principal component is the orthogonal direction $y = -x$, but the variance along it is tiny.
Thus PCA reduces the data to essentially one dimension.
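A compact NumPy sketch of this situation (the two-dimensional dataset is synthetic, generated for illustration): after centering, the covariance matrix is diagonalized, and the first principal component comes out close to the direction $(1,1)/\sqrt{2}$, carrying nearly all of the variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data lying close to the line y = x.
t = rng.normal(size=200)
X = np.column_stack([t, t + 0.05 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)                  # step 1: center the data
Sigma = (Xc.T @ Xc) / len(Xc)            # step 2: covariance matrix
lam, Q = np.linalg.eigh(Sigma)           # step 3: spectral theorem, Sigma = Q Lambda Q^T

order = np.argsort(lam)[::-1]            # sort components by variance explained
lam, Q = lam[order], Q[:, order]

print(Q[:, 0])                           # first principal component, roughly +/-(1, 1)/sqrt(2)
print(lam / lam.sum())                   # variance fractions; the first component dominates
```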
- Dimensionality reduction: Represent data with fewer features while retaining most variance.
- Noise reduction: Small eigenvalues correspond to noise; discarding them filters data.
- Visualization: Projecting high-dimensional data onto top 2 or 3 principal components reveals structure.
- Compression: PCA is used in image and signal compression.
The covariance matrix
PCA demonstrates how abstract linear algebra directly powers modern applications. Eigenvalues and eigenvectors give a practical method for simplifying data, revealing patterns, and reducing complexity. It is one of the most important algorithms derived from the spectral theorem.
- Show that the covariance matrix is symmetric and positive semidefinite.
- Compute the covariance matrix of the dataset
$(1,2), (2,3), (3,4)$ , and find its eigenvalues and eigenvectors. - Explain why the first principal component captures the maximum variance.
- In image compression, explain how PCA can reduce storage by keeping only the top
$k$ principal components. - Prove that the sum of the eigenvalues of the covariance matrix equals the total variance of the dataset.
Linear algebra is the language of modern computer graphics. Every image rendered on a screen, every 3D model rotated or projected, is ultimately the result of applying matrices to vectors. Rotations, reflections, scalings, and projections are all linear transformations, making matrices the natural tool for manipulating geometry.
A counterclockwise rotation by an angle
For any vector
This preserves lengths and angles, since
In three dimensions, rotations are represented by
Similar formulas exist for rotations about the $x$- and $y$-axes.
More general 3D rotations can be described by axis–angle representation or quaternions, but the underlying idea is still linear transformations represented by matrices.
To display 3D objects on a 2D screen, we use projections:
-
Orthogonal projection: drops the
$z$-coordinate, mapping $(x,y,z) \mapsto (x,y)$. $$ P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}. $$
-
Perspective projection: mimics the effect of a camera. A point
$(x,y,z)$ projects to$$ \left(\frac{x}{z}, \frac{y}{z}\right), $$
capturing how distant objects appear smaller.
These operations are linear (orthogonal projection) or nearly linear (perspective projection becomes linear in homogeneous coordinates).
To unify translations and projections with linear transformations, computer graphics uses homogeneous coordinates. A 3D
point
Example: Translation by
- Rotations preserve shape and size, only changing orientation.
- Projections reduce dimension: from 3D world space to 2D screen space.
- Homogeneous coordinates allow us to combine multiple transformations (rotation + translation + projection) into a single matrix multiplication.
Linear algebra enables all real-time graphics: video games, simulations, CAD software, and movie effects. By chaining simple matrix operations, complex transformations are applied efficiently to millions of points per second.
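As a sketch of how these pieces combine (the specific angle, translation, and point are illustrative), homogeneous $4 \times 4$ matrices let a rotation and a translation be chained into a single matrix product.

```python
import numpy as np

def rotation_z(theta):
    """4x4 homogeneous matrix for a rotation by theta about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])

def translation(tx, ty, tz):
    """4x4 homogeneous matrix for a translation by (tx, ty, tz)."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

# Rotate 90 degrees about the z-axis, then translate by (1, 2, 0), as one combined matrix.
M = translation(1, 2, 0) @ rotation_z(np.pi / 2)

p = np.array([1.0, 0.0, 0.0, 1.0])   # the point (1, 0, 0) in homogeneous coordinates
print(M @ p)                          # approximately [1. 3. 0. 1.], i.e. the point (1, 3, 0)
```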
- Write the rotation matrix for a 90° counterclockwise rotation in
$\mathbb{R}^2$ . Apply it to$(1,0)$ . - Rotate the point
$(1,1,0)$ about the$z$ -axis by 180°. - Show that the determinant of any 2D or 3D rotation matrix is 1.
- Derive the orthogonal projection matrix from
$\mathbb{R}^3$ to the$xy$ -plane. - Explain how homogeneous coordinates allow translations to be represented as matrix multiplications.
Linear algebra provides the foundation for many data science techniques. Two of the most important are dimensionality reduction, where high-dimensional datasets are compressed while preserving essential information, and the least squares method, which underlies regression and model fitting.
High-dimensional data often contains redundancy: many features are correlated, meaning the data essentially lies near a lower-dimensional subspace. Dimensionality reduction identifies these subspaces.
-
PCA (Principal Component Analysis): As introduced earlier, PCA diagonalizes the covariance matrix of the data.
- Eigenvectors (principal components) define orthogonal directions of maximum variance.
- Eigenvalues measure how much variance lies along each direction.
- Keeping only the top
$k$ components reduces data from$n$ -dimensional space to$k$ -dimensional space while retaining most variability.
Example 10.2.1. A dataset of 1000 images, each with 1024 pixels, may have most variance captured by just 50 eigenvectors of the covariance matrix. Projecting onto these components compresses the data while preserving essential features.
Often, we have more equations than unknowns-an overdetermined system:
An exact solution may not exist. Instead, we seek
This leads to the normal equations:
The solution is the orthogonal projection of
Fit a line
Matrix form:
Solve
- Dimensionality reduction: Find the best subspace capturing most variance.
- Least squares: Project the target vector onto the subspace spanned by predictors.
Both are projection problems, solved using inner products and orthogonality.
Dimensionality reduction makes large datasets tractable, filters noise, and reveals structure. Least squares fitting powers regression, statistics, and machine learning. Both rely directly on eigenvalues, eigenvectors, and projections-core tools of linear algebra.
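A minimal least-squares sketch in NumPy (the data points are illustrative): build the design matrix for a line $y = c_0 + c_1 t$, solve the normal equations, and compare with the library solver.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])         # illustrative inputs
y = np.array([1.1, 1.9, 3.2, 3.9])         # illustrative noisy outputs

A = np.column_stack([np.ones_like(t), t])  # design matrix with columns [1, t]

# Normal equations: (A^T A) x = A^T y
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# NumPy's built-in least-squares solver, for comparison.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(x_normal)                            # [intercept, slope]
print(np.allclose(x_normal, x_lstsq))      # True
```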
- Explain why PCA reduces noise in datasets by discarding small eigenvalue components.
- Compute the least squares solution to fitting a line through
$(0,0), (1,1), (2,2)$ . - Show that the least squares solution is unique if and only if
$A^T A$ is invertible. - Prove that the least squares solution minimizes the squared error by projection arguments.
- Apply PCA to the data points
$(1,0), (2,1), (3,2)$ and find the first principal component.
Graphs and networks provide a natural setting where linear algebra comes to life. From modeling flows and connectivity to predicting long-term behavior, matrices translate network structure into algebraic form. Markov chains, already introduced in Section 8.4, are a central example of networks evolving over time.
A network (graph) with
For weighted graphs, entries may be positive weights instead of
- The number of walks of length
$k$ from node$i$ to node$j$ is given by the entry$(A^k)_{ij}$ . - Powers of adjacency matrices thus encode connectivity over time.
Another important matrix is the graph Laplacian:
where
-
$L$ is symmetric and positive semidefinite. - The smallest eigenvalue is always
$0$ , with eigenvector$(1,1,\dots,1)$ . - The multiplicity of eigenvalue
$0$ equals the number of connected components in the graph.
This connection between eigenvalues and connectivity forms the basis of spectral graph theory.
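The following NumPy sketch (the 5-node graph is an illustrative example) counts walks with powers of the adjacency matrix and recovers the number of connected components from the multiplicity of the Laplacian eigenvalue $0$.

```python
import numpy as np

# Adjacency matrix of a graph with two components: a triangle {0,1,2} and an edge {3,4}.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

print(np.linalg.matrix_power(A, 2).astype(int))   # entry (i, j) counts walks of length 2

D = np.diag(A.sum(axis=1))            # diagonal matrix of vertex degrees
L = D - A                             # graph Laplacian

eigvals = np.linalg.eigvalsh(L)
print(int(np.sum(np.isclose(eigvals, 0.0))))      # 2: one zero eigenvalue per component
```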
A Markov chain can be viewed as a random walk on a graph. If
describes the distribution of positions after
- The steady-state distribution is given by the eigenvector of
$P$ with eigenvalue$1$ . - The speed of convergence depends on the gap between the largest eigenvalue (which is always
$1$ ) and the second largest eigenvalue.
Consider a simple 3-node cycle graph:
This Markov chain cycles deterministically among the nodes. Eigenvalues are the cube roots of
unity:
- Search engines: Google’s PageRank algorithm models the web as a Markov chain, where steady-state probabilities rank pages.
- Network analysis: Eigenvalues of adjacency or Laplacian matrices reveal communities, bottlenecks, and robustness.
- Epidemiology and information flow: Random walks model how diseases or ideas spread through networks.
Linear algebra transforms network problems into matrix problems. Eigenvalues and eigenvectors reveal connectivity, flow, stability, and long-term dynamics. Networks are everywhere-social media, biology, finance, and the internet-so these tools are indispensable.
-
Write the adjacency matrix of the 4-node cycle graph (a square). Compute
$A^2$ and interpret the entries. -
Show that the Laplacian of a connected graph has exactly one zero eigenvalue.
-
Find the steady-state distribution of the Markov chain with
$$ P = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \end{bmatrix}. $$
-
Explain how eigenvalues of the Laplacian can detect disconnected components of a graph.
-
Describe how PageRank modifies the transition matrix of the web graph to ensure a unique steady-state distribution.
Modern machine learning is built on linear algebra. From the representation of data as matrices to the optimization of large-scale models, nearly every step relies on concepts such as vector spaces, projections, eigenvalues, and matrix decompositions.
A dataset with
$$ X = \begin{bmatrix} - & \mathbf{x}_1^T & - \\ - & \mathbf{x}_2^T & - \\ & \vdots & \\ - & \mathbf{x}_m^T & - \end{bmatrix}, $$
where each row
At the heart of machine learning are linear predictors:
where
This is solved efficiently using matrix factorizations.
The SVD of a matrix
where
- Singular values measure the importance of directions in feature space.
- SVD is used for dimensionality reduction (low-rank approximations), topic modeling, and recommender systems.
- PCA (Principal Component Analysis): diagonalization of the covariance matrix identifies directions of maximal variance.
- Spectral clustering: uses eigenvectors of the Laplacian to group data points into clusters.
- Stability analysis: eigenvalues of Hessian matrices determine whether optimization converges to a minimum.
Even deep learning, though nonlinear, uses linear algebra at its core:
- Each layer is a matrix multiplication followed by a nonlinear activation.
- Training requires computing gradients, which are expressed in terms of matrix calculus.
- Backpropagation is essentially repeated applications of the chain rule with linear algebra.
Machine learning models often involve datasets with millions of features and parameters. Linear algebra provides the algorithms and abstractions that make training and inference possible. Without it, large-scale computation in AI would be intractable.
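As one concrete sketch of these ideas (the data matrix is synthetic, generated to be approximately rank 2), the SVD exposes how few directions carry most of a dataset’s structure, which is exactly what low-rank approximation exploits.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 50 x 20 data matrix that is approximately rank 2 plus a little noise.
X = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 20)) + 0.01 * rng.normal(size=(50, 20))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(s[:4])                          # singular values drop sharply after the first two

k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # best rank-k approximation
rel_error = np.linalg.norm(X - X_k) / np.linalg.norm(X)
print(rel_error)                      # small: two directions capture nearly all the structure
```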
-
Show that ridge regression leads to the normal equations
$$ (X^T X + \lambda I)\mathbf{w} = X^T \mathbf{y}. $$
-
Explain how SVD can be used to compress an image represented as a matrix of pixel intensities.
-
For a covariance matrix
$\Sigma$ , show why its eigenvalues represent variances along principal components. -
Give an example of how eigenvectors of the Laplacian matrix can be used for clustering a small graph.
-
In a neural network with one hidden layer, write the forward pass in matrix form.