Understanding the Gram-Schmidt process

Question

I would like to better understand the Gram-Schmidt process. The statement of the theorem in my textbook is the following:

The Gram-Schmidt sequence $[u_1, u_2,\ldots]$ has the property that $\{u_1, u_2,\ldots, u_k\}$ is an orthonormal basis for the linear span of $\{x_1, x_2, \ldots, x_k\}$ for each $k\geq 1$. The formula for $u_k$ is: \begin{equation} u_k = \left|\left| x_k - \sum\limits_{i<k}\langle x_k, u_i\rangle u_i \right|\right|_2^{-1} \left(x_k - \sum\limits_{i<k}\langle x_k, u_i\rangle u_i\right) \end{equation}

Note that I am primarily interested in why all of the vectors are orthogonal. The norm factor in the above equation makes every vector a unit vector, so we get an orthonormal set. I can see how the orthogonality works algebraically. Let $v = x_k - \sum\limits_{i<k}\langle x_k, u_i\rangle u_i$ and take the inner product $\langle v, u_j\rangle$ for some $j<k$: \begin{equation} \langle v, u_j\rangle = \langle x_k, u_j\rangle - \sum\limits_{i<k}\langle x_k, u_i\rangle\langle u_i, u_j\rangle \end{equation} By the induction hypothesis, $\{u_1, \ldots, u_{k-1}\}$ is orthonormal, so $\langle u_i, u_j\rangle$ is $0$ for $i \neq j$ and $1$ for $i = j$; every term of the sum vanishes except the one with $i=j$. This leaves us with: \begin{equation} \langle v, u_j\rangle = \langle x_k, u_j\rangle - \langle x_k, u_j\rangle = 0 \end{equation}

OK, I can follow the algebra, but how can I see this geometrically? Can someone provide both 2D and 3D examples or plots? I am specifically interested in seeing how all the vectors meet at 90 degrees.

The idea is straightforward. At each stage you have a subspace (with an orthonormal basis). You have some vector that does not lie in this subspace. Then subtract off any component of this vector that lies in the subspace so that the resultant is perpendicular to the subspace. Now normalize the resulting vector and create a new, larger subspace, and repeat... — copper.hat, Mar 02 '14 at 05:49
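The procedure in this comment, written with the textbook formula for $u_k$, can be sketched in plain Python. This is a minimal sketch, not a reference implementation: the helper names `dot` and `gram_schmidt` are illustrative, and the input vectors are assumed to be linearly independent (otherwise the norm being divided by is zero). It uses the example vectors $x_1 = (1,1,0)$, $x_2 = (1,0,1)$, $x_3 = (0,1,1)$, chosen here for illustration, and checks that every pair of outputs meets at 90 degrees.

```python
import math

def dot(x, y):
    """Real inner product <x, y> = sum_i x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(xs):
    """Classical Gram-Schmidt, following the textbook formula:
    v = x_k - sum_{i<k} <x_k, u_i> u_i,   u_k = v / ||v||.
    Assumes the vectors in xs are linearly independent.
    """
    us = []
    for x in xs:
        v = list(x)
        for u in us:
            c = dot(x, u)  # coefficient <x_k, u_i> uses the ORIGINAL x_k
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        us.append([vi / norm for vi in v])
    return us

us = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])

# <u_i, u_j> should be 1 on the diagonal and 0 off it (up to rounding).
print(all(abs(dot(us[i], us[j]) - (1.0 if i == j else 0.0)) < 1e-12
          for i in range(3) for j in range(3)))  # True
```

This computes every coefficient from the original $x_k$, exactly as the formula is written. The "modified" variant, which uses the partially reduced $v$ in `dot(v, u)` instead, is mathematically equivalent and usually behaves better in floating point.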

score 18 · Answer 1 · answered Mar 02 '14 at 06:04

Consider the following diagram, courtesy of mathinsight.org:

[Figure: dot product diagram]

You can think of $(a \cdot u) u$, where $u$ is a unit vector, as the piece of $a$ that lies in the direction of $u$. The part that is left over, $a - (a \cdot u) u$, is the remaining side of the right triangle, and hence is perpendicular to $u$: indeed, $\bigl(a - (a\cdot u)u\bigr)\cdot u = a\cdot u - (a\cdot u)(u\cdot u) = 0$ because $u \cdot u = 1$. So at each step of the Gram-Schmidt process, the formula

$$ v_{n+1} = a - \sum_{j=1}^n \langle a, u_j \rangle u_j, \quad u_{n+1} = v_{n+1}/ \|v_{n+1} \|$$

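does the following: it first subtracts from $a$ the piece that lies in the direction of each $u_j$, then it normalizes. The resulting vector must be orthogonal to every $u_j$, since you have just subtracted out all the pieces that were not perpendicular. This works because the $u_j$ are already mutually orthonormal, so removing the piece along one $u_j$ does not reintroduce a piece along another.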
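A small 2D instance of this picture, with vectors chosen here purely for illustration: take $a = (2,2)$ and the unit vector $u_1 = (3,1)/\sqrt{10}$, and apply the formula with $n = 1$.

$$\begin{aligned}
\langle a, u_1\rangle &= \tfrac{2\cdot 3 + 2\cdot 1}{\sqrt{10}} = \tfrac{8}{\sqrt{10}}, \qquad \langle a, u_1\rangle\, u_1 = \tfrac{8}{10}(3,1) = (2.4,\ 0.8),\\
v_2 &= a - \langle a, u_1\rangle\, u_1 = (-0.4,\ 1.2), \qquad \langle v_2, u_1\rangle = \tfrac{-1.2 + 1.2}{\sqrt{10}} = 0,\\
u_2 &= \frac{v_2}{\|v_2\|} = \frac{(-0.4,\ 1.2)}{0.4\sqrt{10}} = \tfrac{1}{\sqrt{10}}(-1,\ 3).
\end{aligned}$$

The three vectors $a$, $\langle a, u_1\rangle u_1 = (2.4, 0.8)$ and $v_2 = (-0.4, 1.2)$ form the right triangle of the diagram, and $\{u_1, u_2\}$ is an orthonormal basis of $\mathbb{R}^2$.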

Callus - Reinstate Monica · Answer 2 · 2014-03-03T07:22:49.010

The geometric picture from Gram-Schmidt is this:

You start with a basis. Take the first vector. Scale it so that it's a unit vector. Good start. Take the second vector. If it's orthogonal to the first vector, great. Otherwise, subtract off a multiple of the first vector until it is. Then scale it so that it's a unit vector. Moving on, take the third vector. Subtract off enough of the first vector from it so that it's orthogonal to the first vector now. Then subtract off enough of the second vector so it's orthogonal to that one, too. Now scale it so that it's a unit vector. Keep going like this, by taking the next vector and subtracting off bits of the previous vectors so that it's orthogonal to all of them, and then rescale so that it's a unit vector. The "bit" that you have to subtract off is the projection of the vector you're currently working on onto each previous unit vector, and the formula for that is given by the dot product: the projection of $x$ onto a unit vector $u$ is $(x \cdot u)\,u$.
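Carrying out these steps by hand on an illustrative basis of $\mathbb{R}^3$ (chosen here, not taken from the question), $x_1 = (1,1,0)$, $x_2 = (1,0,1)$, $x_3 = (0,1,1)$, where $v_k$ denotes the vector before rescaling and $u_k$ the resulting unit vector:

$$\begin{aligned}
u_1 &= \tfrac{1}{\sqrt{2}}(1,1,0),\\
v_2 &= x_2 - \langle x_2,u_1\rangle u_1 = (1,0,1) - \tfrac{1}{2}(1,1,0) = \tfrac{1}{2}(1,-1,2), \qquad u_2 = \tfrac{1}{\sqrt{6}}(1,-1,2),\\
v_3 &= x_3 - \langle x_3,u_1\rangle u_1 - \langle x_3,u_2\rangle u_2 = (0,1,1) - \tfrac{1}{2}(1,1,0) - \tfrac{1}{6}(1,-1,2) = \tfrac{2}{3}(-1,1,1), \qquad u_3 = \tfrac{1}{\sqrt{3}}(-1,1,1).
\end{aligned}$$

One can check directly that $(1,1,0)\cdot(1,-1,2) = 0$, $(1,1,0)\cdot(-1,1,1) = 0$ and $(1,-1,2)\cdot(-1,1,1) = 0$, so the three unit vectors meet pairwise at 90 degrees.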

You might find this animation helpful, but I actually found it a little difficult to follow. http://www.youtube.com/watch?v=pIy8xqh9sWs

score 1 · Answer 3 · answered Dec 19 '20 at 03:23

The question has already been answered, but I want to make an important point. Gram-Schmidt uses the fact that to project a vector $v$ onto the span of two other vectors $a_1, a_2$, you just add the projections of $v$ onto each of these vectors separately. That is,

$$\text{proj}_{\text{span}(a_1, a_2)} v = \text{proj}_{a_1}v + \text{proj}_{a_2}v$$ BE CAREFUL!!! This formula only works if $a_1, a_2$ are already orthogonal to each other.

To see why this formula fails if they're not, consider $a_1 = (0, 1000, 0)$, $a_2 = (1, 1000, 0)$ and $v = (0, 1000, 50)$. Then $$\text{proj}_{\text{span}(a_1, a_2)} v = (0, 1000, 0).$$

But $$\text{proj}_{a_1}v = (0, 1000, 0)$$ and $$\text{proj}_{a_2}v = \frac{10^6}{10^6+1}(1, 1000, 0) \approx (1, 1000, 0),$$ so adding these would give $$\text{proj}_{a_1}v + \text{proj}_{a_2}v \approx (1, 2000, 0),$$ which is far from the true projection $(0, 1000, 0)$.
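A quick numerical check of this counterexample in plain Python (the helper names `dot`, `proj_onto` and `add` are illustrative). It also shows the fix that Gram-Schmidt applies: first replace $a_2$ by its component orthogonal to $a_1$, which here is $w_2 = (1, 0, 0)$, and only then add the one-dimensional projections.

```python
def dot(x, y):
    """Real dot product."""
    return sum(a * b for a, b in zip(x, y))

def proj_onto(v, a):
    """Projection of v onto the line spanned by a: (<v, a> / <a, a>) a."""
    c = dot(v, a) / dot(a, a)
    return [c * ai for ai in a]

def add(x, y):
    return [xi + yi for xi, yi in zip(x, y)]

a1 = [0.0, 1000.0, 0.0]
a2 = [1.0, 1000.0, 0.0]
v = [0.0, 1000.0, 50.0]

# Naive: add the two 1D projections although a1 and a2 are not orthogonal.
naive = add(proj_onto(v, a1), proj_onto(v, a2))

# One Gram-Schmidt step: remove from a2 its component along a1, leaving w2 = (1, 0, 0).
w2 = [x - y for x, y in zip(a2, proj_onto(a2, a1))]
correct = add(proj_onto(v, a1), proj_onto(v, w2))

print([round(x, 3) for x in naive])  # [1.0, 1999.999, 0.0]
print(correct)                       # [0.0, 1000.0, 0.0]
```

Since $a_1$ and $w_2$ are orthogonal and span the same plane as $a_1, a_2$, adding their individual projections gives the true projection onto that plane.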

score 1 · Answer 4 · answered Mar 02 '14 at 06:11

Let $S_{j}$ be the space spanned by $\{\mathbf{x}_{1},\ldots,\mathbf{x}_{j}\}$.

We use an inductive approach, and assume that $\{\mathbf{u}_{1},\ldots,\mathbf{u}_{j}\}$ is an orthonormal basis for $S_{j}$, with the goal of proving that $\{\mathbf{u}_{1},\ldots,\mathbf{u}_{j+1}\}$ is an orthonormal basis for the space $S_{j+1}$.

The method is easiest to understand geometrically if you think in terms of projections. Let $P_{j}$ be the orthogonal projection onto $S_{j}$.

The main insight is that $$P_{j}\mathbf{x}_{j+1}=\sum_{i\leq j}\langle \mathbf{x}_{j+1},\mathbf{u}_{i}\rangle\mathbf{u}_{i},$$ so this sum is the decomposition of $P_{j}\mathbf{x}_{j+1}$ in the orthonormal basis of $S_{j}$. Furthermore, the vector $\left(\mathbf{x}_{j+1}-P_{j}\mathbf{x}_{j+1}\right)$ is the error term associated with this projection, and as such is orthogonal to each of the $\mathbf{u}_{i}$'s in $S_{j}$. It is also nonzero, because $\mathbf{x}_{j+1}$ does not lie in $S_{j}$ when the $\mathbf{x}_i$ are linearly independent. It follows that $\left\{\mathbf{u}_{1},\ldots, \mathbf{u}_{j},\left(\mathbf{x}_{j+1}-P_{j}\mathbf{x}_{j+1}\right)\right\}$ is an orthogonal basis for $S_{j+1}$. All that remains to generate our orthonormal basis is to divide $\left(\mathbf{x}_{j+1}-P_{j}\mathbf{x}_{j+1}\right)$ by its length. This normalized vector is our $\mathbf{u}_{j+1}$.

score 0 · Answer 5 · answered Jun 22 '24 at 11:09

Let $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$ be either the set of real numbers $\mathbb{R}$ or the set of complex numbers $\mathbb{C}$ (with their respective algebra).

Definition (Standard inner product on $\mathbb{K}^n$). The standard inner product between two vectors $\mathbf{x},\mathbf{y} \in \mathbb{K}^n$ is the scalar \begin{equation} \langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^* \mathbf{y} = \sum_{i=1}^n \overline{x}_i y_i, \end{equation} where $\bullet^*$ denotes the conjugate transpose and $\overline{\bullet}$ the complex conjugate. In particular, for $\mathbf{x},\mathbf{y} \in \mathbb{R}^n$, we have \begin{equation} \langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^\mathrm{T} \mathbf{y} = \sum_{i=1}^n x_i y_i. \end{equation}

Definition (Standard norm on $\mathbb{K}^n$). The standard norm on $\mathbb{K}^n$ is the norm induced by the standard inner product on $\mathbb{K}^n$, i.e. $\Vert{\mathbf{x}}\Vert := \langle \mathbf{x}, \mathbf{x} \rangle^{\tfrac{1}{2}}$.

Definition (Orthogonal projection). The orthogonal projection of a vector $\mathbf{x} \in \mathbb{R}^n$ onto the subspace $\mathrm{span}(\mathbf{e}_i)_{i=1}^m$ spanned by the orthonormal vectors $(\mathbf{e}_i)_{i=1}^m$, with \begin{equation} \langle \mathbf{e}_i, \mathbf{e}_j \rangle = \delta_{ij} := \begin{cases} 1 & i = j \\ 0 & \text{otherwise} \end{cases}, \end{equation} is the vector \begin{equation} \underset{\mathrm{span}(\mathbf{e}_i)_{i=1}^m}{\mathrm{proj}} \mathbf{x} := \sum_{i=1}^m \langle \mathbf{x}, \mathbf{e}_i \rangle \mathbf{e}_i. \end{equation} (For $\mathbb{K} = \mathbb{C}$, with the inner product defined above, the coefficients are $\langle \mathbf{e}_i, \mathbf{x} \rangle$ instead.)

Proposition (Complement). For some vector $\mathbf{x} \in \mathbb{R}^n$ and an orthonormal system $(\mathbf{e}_i)_{i=1}^m$, if \begin{equation} \hat{\mathbf{x}} = \underset{\mathrm{span}(\mathbf{e}_i)_{i=1}^m}{\mathrm{proj}} \mathbf{x}, \end{equation} then, for any vector $\mathbf{y} \in \mathrm{span}(\mathbf{e}_i)_{i=1}^m$, we have \begin{equation} \langle \mathbf{x} - \hat{\mathbf{x}}, \mathbf{y} \rangle = 0. \end{equation}

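Proof. As $\mathbf{y} \in \mathrm{span}(\mathbf{e}_i)_{i=1}^m$ and the $\mathbf{e}_i$ are orthonormal, we have \begin{equation} \mathbf{y} = \sum_{i=1}^m \langle \mathbf{y}, \mathbf{e}_i \rangle \mathbf{e}_i, \qquad \text{so} \qquad \langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^m \langle \mathbf{y}, \mathbf{e}_i \rangle \langle \mathbf{x}, \mathbf{e}_i \rangle. \end{equation} On the other hand, \begin{equation} \langle \hat{\mathbf{x}}, \mathbf{y} \rangle = \left\langle \sum_{i=1}^m \langle \mathbf{x}, \mathbf{e}_i \rangle \mathbf{e}_i, \mathbf{y} \right\rangle = \sum_{i=1}^m \langle \mathbf{x}, \mathbf{e}_i \rangle \langle \mathbf{y}, \mathbf{e}_i \rangle = \langle \mathbf{x}, \mathbf{y} \rangle. \end{equation} Hence $\langle \mathbf{x} - \hat{\mathbf{x}}, \mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{y} \rangle - \langle \hat{\mathbf{x}}, \mathbf{y} \rangle = 0$. $\qquad \square$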
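A numerical sanity check of the Complement proposition, assuming the illustrative orthonormal system $\mathbf{e}_1 = (0.6, 0.8, 0)$, $\mathbf{e}_2 = (0, 0, 1)$ in $\mathbb{R}^3$ and random test vectors (the helper names `dot` and `proj` are hypothetical). The residual $\mathbf{x} - \hat{\mathbf{x}}$ should be orthogonal, up to rounding, to every $\mathbf{y}$ in the span, not just to the $\mathbf{e}_i$ themselves.

```python
import random

def dot(x, y):
    """Real inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def proj(x, es):
    """Orthogonal projection onto span(es) for an orthonormal list es: sum_i <x, e_i> e_i."""
    out = [0.0] * len(x)
    for e in es:
        c = dot(x, e)
        out = [o + c * ei for o, ei in zip(out, e)]
    return out

es = [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]  # orthonormal: 0.36 + 0.64 = 1, and e_1 . e_2 = 0

random.seed(0)
worst = 0.0
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(3)]
    s, t = random.uniform(-1, 1), random.uniform(-1, 1)
    y = [s * e1 + t * e2 for e1, e2 in zip(*es)]    # arbitrary y in span(e_1, e_2)
    r = [xi - pi for xi, pi in zip(x, proj(x, es))]  # residual x - x_hat
    worst = max(worst, abs(dot(r, y)))

print(worst < 1e-12)  # True
```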

Proposition (Gram-Schmidt). Let $(\mathbf{a}_i)_{i=1}^n$ be a basis for $\mathbb{K}^n$. Define the system $(\mathbf{e}_i)_{i=1}^n$ recursively by the Gram-Schmidt procedure, \begin{equation} \mathbf{u}_1 := \mathbf{a}_1, \qquad \mathbf{e}_1 := \frac{\mathbf{u}_1}{\Vert \mathbf{u}_1 \Vert} \end{equation} and \begin{equation} \mathbf{u}_j := \mathbf{a}_j - \underset{\mathrm{span}(\mathbf{e}_k)_{k=1}^{j-1}}{\mathrm{proj}} \mathbf{a}_j, \qquad \mathbf{e}_j := \frac{\mathbf{u}_j}{\Vert \mathbf{u}_j \Vert} \qquad (\,{j=2,3,\ldots,n}\,). \end{equation} Then, $(\mathbf{e}_i)_{i=1}^n$ is an orthonormal system and \begin{equation} \mathrm{span}(\mathbf{e}_i)_{i=1}^m = \mathrm{span}(\mathbf{a}_i)_{i=1}^m \qquad {(\,m=1,2,\ldots,n\,)}, \end{equation} and in particular $\mathbb{K}^n = \mathrm{span}(\mathbf{a}_i)_{i=1}^n = \mathrm{span}(\mathbf{e}_i)_{i=1}^n$ so $(\mathbf{e}_i)_{i=1}^n$ is an orthonormal basis for $\mathbb{K}^n$.

Proof. We wish to show that, during the Gram-Schmidt process, we do not divide by zero and that the procedure arrives at the desired orthonormal system.
Let $(\mathbf{a}_i)_{i=1}^n$ be a basis for $\mathbb{K}^n$. We proceed by induction on the size $m = 1,2,\ldots,n$ of the subsystem $(\mathbf{a}_i)_{i=1}^m$.
( Base case ) Consider the subsystem $(\mathbf{a}_i)_{i=1}^1 = (\mathbf{a}_1)$ of size $m=1$. All vectors in $(\mathbf{a}_i)_{i=1}^n$ are non-zero, otherwise $(\mathbf{a}_i)_{i=1}^n$ would not be linearly independent. Hence, $\mathbf{u}_1 := \mathbf{a}_1 \neq \mathbf{0}$, the norm $\Vert \mathbf{u}_1 \Vert \neq 0$, and we can divide $\mathbf{u}_1$ by its norm to get $\mathbf{e}_1 := \frac{\mathbf{u}_1}{\Vert \mathbf{u}_1 \Vert}$. The system $(\mathbf{e}_1) = (\mathbf{e}_i)_{i=1}^1$ is trivially orthonormal and $\mathrm{span}(\mathbf{a}_1) = \mathrm{span}(\mathbf{u}_1) = \mathrm{span}(\mathbf{e}_1)$.
( Induction hypothesis ) Suppose that, for some subsystem $(\mathbf{a}_j)_{j=1}^{m}$ of size $m = 1,2,\ldots,n-1$, we have that the corresponding system $(\mathbf{e}_j)_{j=1}^m$ is orthonormal and that $\mathrm{span}(\mathbf{e}_j)_{j=1}^{m} = \mathrm{span}(\mathbf{a}_j)_{j=1}^{m}$.
( Induction step ) Consider the next system $(\mathbf{a}_j)_{j=1}^{m+1}$. By definition, \begin{equation} \mathbf{u}_{m+1} = \mathbf{a}_{m+1} - \underset{\mathrm{span}(\mathbf{e}_k)_{k=1}^{m}}{\mathrm{proj}} \mathbf{a}_{m+1}. \end{equation} Suppose towards a contradiction that $\mathbf{u}_{m+1} = \mathbf{0}$, so that we cannot divide by $\Vert \mathbf{u}_{m+1} \Vert$. Then, \begin{equation} \mathbf{u}_{m+1} = \mathbf{0} \quad \implies \quad \mathbf{a}_{m+1} = \underset{\mathrm{span}(\mathbf{e}_k)_{k=1}^{m}}{\mathrm{proj}} \mathbf{a}_{m+1} \in \mathrm{span}(\mathbf{e}_k)_{k=1}^m = \mathrm{span}(\mathbf{a}_k)_{k=1}^m, \end{equation} which contradicts the linear independence of $(\mathbf{a}_k)_{k=1}^{m+1}$. Hence, $\mathbf{u}_{m+1} \neq \mathbf{0}$, and $\mathbf{e}_{m+1} = \frac{\mathbf{u}_{m+1}}{\Vert \mathbf{u}_{m+1}\Vert}$ is well-defined with $\Vert \mathbf{e}_{m+1} \Vert = 1$. Since each $\mathbf{e}_j$ with $j \leq m$ lies in $\mathrm{span}(\mathbf{e}_k)_{k=1}^m$, the Complement proposition above gives \begin{equation} \langle \mathbf{u}_{m+1}, \mathbf{e}_j \rangle = 0 \qquad (\,j=1,2,\ldots,m\,), \end{equation} so \begin{equation} \langle \mathbf{e}_{m+1}, \mathbf{e}_j \rangle = \frac{1}{\Vert \mathbf{u}_{m+1} \Vert} \langle \mathbf{u}_{m+1}, \mathbf{e}_j \rangle = 0 \qquad (\,j=1,2,\ldots,m\,). \end{equation} Together with the induction hypothesis, this shows that $(\mathbf{e}_j)_{j=1}^{m+1}$ is an orthonormal system. Furthermore, since $\underset{\mathrm{span}(\mathbf{e}_k)_{k=1}^{m}}{\mathrm{proj}}\,\mathbf{a}_{m+1} \in \mathrm{span}(\mathbf{e}_k)_{k=1}^m = \mathrm{span}(\mathbf{a}_k)_{k=1}^m$ by the induction hypothesis, \begin{equation} \mathrm{span}(\mathbf{e}_i)_{i=1}^{m+1} = \mathrm{span}(\mathbf{e}_1,\ldots,\mathbf{e}_m,\mathbf{u}_{m+1}) = \mathrm{span}(\mathbf{a}_1,\ldots,\mathbf{a}_m,\mathbf{a}_{m+1}-\mathrm{proj}\,\mathbf{a}_{m+1}) = \mathrm{span}(\mathbf{a}_i)_{i=1}^{m+1}. \end{equation} ( Conclusion ) By induction, we have shown that the Gram-Schmidt procedure does not produce division by zero, and that it yields the desired orthonormal system $(\mathbf{e}_i)_{i=1}^n$. $\qquad \square$
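A minimal sketch of the procedure in the Proposition over $\mathbb{K}^n$ in plain Python; the names and the tolerance `tol` are illustrative assumptions, not part of the proof. Two points from the text show up directly. First, with the inner product $\langle \mathbf{x}, \mathbf{y}\rangle = \sum_i \overline{x}_i y_i$, the projection coefficient for complex vectors is $\langle \mathbf{e}_k, \mathbf{a}_j\rangle$, which equals $\langle \mathbf{a}_j, \mathbf{e}_k\rangle$ in the real case. Second, the division by $\Vert \mathbf{u}_j \Vert$ is guarded, because the proof shows $\mathbf{u}_j = \mathbf{0}$ exactly when $\mathbf{a}_j$ lies in the span of the earlier vectors; in floating point, "zero" has to be tested against a tolerance.

```python
import math

def inner(x, y):
    """Standard inner product on K^n: <x, y> = sum_i conj(x_i) * y_i."""
    return sum(complex(xi).conjugate() * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

def gram_schmidt(a, tol=1e-12):
    """Orthonormalize a_1, ..., a_n as in the Proposition (Gram-Schmidt).

    Raises ValueError if some u_j is (numerically) zero, i.e. a_j lies in
    span(a_1, ..., a_{j-1}). `tol` is an illustrative relative threshold.
    """
    es = []
    for j, aj in enumerate(a, start=1):
        # u_j = a_j - proj a_j; for complex vectors the coefficient is <e_k, a_j>.
        u = list(aj)
        for e in es:
            c = inner(e, aj)
            u = [ui - c * ei for ui, ei in zip(u, e)]
        n = norm(u)
        if n <= tol * norm(aj):
            raise ValueError(f"u_{j} = 0: a_{j} lies in the span of the previous vectors")
        es.append([ui / n for ui in u])
    return es

# A basis of C^2: det [[1, i], [i, 2]] = 2 - i*i = 3, which is nonzero.
es = gram_schmidt([[1, 1j], [1j, 2]])
print(all(abs(inner(es[i], es[k]) - (1 if i == k else 0)) < 1e-12
          for i in range(2) for k in range(2)))  # True

# A dependent system: a_3 = 2 a_1 + 3 a_2, so u_3 vanishes.
try:
    gram_schmidt([[1, 0, 0], [0, 1, 0], [2, 3, 0]])
except ValueError as err:
    print(err)  # u_3 = 0: a_3 lies in the span of the previous vectors
```

In the complex example, $\mathbf{e}_1 = (1, i)/\sqrt{2}$ and $\mathbf{e}_2 = (i, 1)/\sqrt{2}$; using $\langle \mathbf{a}_j, \mathbf{e}_k\rangle$ as the coefficient there would produce the conjugated coefficient and a $\mathbf{u}_2$ that is not orthogonal to $\mathbf{e}_1$.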
