
I am taking a basic course on linear algebra, where the instructor says $a^Ta$ is a number but $aa^T$ is a matrix. Why?

Background:

Say we are projecting a vector $b$ onto a vector $a$.

By the orthogonality condition, the dot product is zero: $$a^T(b-xa)=0,$$ which gives $$x=\frac{a^Tb}{a^Ta}.$$ Since the projection vector $p$ lies along $a$, we have $$p=ax=a\frac{a^Tb}{a^Ta}$$

$$p=\frac{aa^T} {a^Ta}b$$ To me, both $aa^T$ and $a^Ta$ are dot products, so the order shouldn't matter. But then we would get $p=b$, which is wrong. Why?

Abhishek Bhatia

2 Answers


You appear to be conflating the dot product $a\cdot b$ of two column vectors with the matrix product $a^Tb$, which computes the same value. The dot product is symmetric, but matrix multiplication is in general not commutative. Indeed, unless $A$ and $B$ are both square matrices of the same size, $AB$ and $BA$ don’t even have the same shape.

In the derivation that you cite, the vectors $a$ and $b$ are being treated as $n\times1$ matrices, so $a^T$ is a $1\times n$ matrix. By the rules of matrix multiplication, $a^Ta$ and $a^Tb$ result in a $1\times1$ matrix, which is equivalent to a scalar, while $aa^T$ produces an $n\times n$ matrix: $$ a^Tb = \begin{bmatrix}a_1&a_2&\cdots&a_n\end{bmatrix}\begin{bmatrix}b_1\\b_2\\ \vdots\\b_n\end{bmatrix} = \begin{bmatrix}a_1b_1+a_2b_2+\cdots+a_n b_n\end{bmatrix} \\ a^Ta = \begin{bmatrix}a_1&a_2&\cdots&a_n\end{bmatrix}\begin{bmatrix}a_1\\a_2\\ \vdots\\a_n\end{bmatrix} = \begin{bmatrix}a_1^2+a_2^2+\cdots+a_n^2\end{bmatrix} $$ so $a^Tb$ is equivalent to $a\cdot b$, while $$aa^T = \begin{bmatrix}a_1\\a_2\\ \vdots\\a_n\end{bmatrix}\begin{bmatrix}a_1&a_2&\cdots&a_n\end{bmatrix} = \begin{bmatrix}a_1^2&a_1a_2&\cdots&a_1a_n \\ a_2a_1&a_2^2&\cdots&a_2a_n \\ \vdots&\vdots&\ddots&\vdots \\ a_na_1&a_na_2&\cdots&a_n^2\end{bmatrix}.$$ Note in particular that $b\cdot a=b^Ta$, not $ba^T$, as the latter is also an $n\times n$ matrix.
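A quick numerical check of the shapes (plain Python, with made-up vectors for illustration):

```python
# Inner product a^T b: sums products of matching entries, yielding a scalar.
def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

# Outer product a b^T: an n x n matrix whose (i, j) entry is a_i * b_j.
def outer(a, b):
    return [[x * y for y in b] for x in a]

a = [1, 2, 3]
b = [4, 5, 6]

print(inner(a, b))  # 1*4 + 2*5 + 3*6 = 32, a single number
print(outer(a, b))  # [[4, 5, 6], [8, 10, 12], [12, 15, 18]], a 3x3 matrix
```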

The derivation of the projection might be easier to understand if you write it slightly differently. Start with dot products: $$p={a\cdot b\over a\cdot a}a={1\over a\cdot a}a(a\cdot b)$$ then replace the dot products with equivalent matrix products: $$p={1\over a^Ta}a(a^Tb).$$ This expression is a product of the scalar ${1\over a^Ta}$ with three matrices. Since matrix multiplication is associative, we can regroup this as $${1\over a^Ta}(aa^T)b.$$ This is a scalar times an $n\times n$ matrix times an $n\times1$ matrix, i.e., a vector.
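The regrouping can be verified numerically; here is a small Python sketch (the vectors are hypothetical, chosen so that $a^Ta=9$):

```python
def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def outer(a, b):
    return [[x * y for y in b] for x in a]

def matvec(M, v):
    return [inner(row, v) for row in M]

a = [1.0, 2.0, 2.0]
b = [3.0, 0.0, 4.0]

# Grouping 1: the scalar (a.b)/(a.a) times the vector a.
x = inner(a, b) / inner(a, a)
p1 = [x * ai for ai in a]

# Grouping 2: the rank-one matrix (a a^T)/(a^T a) applied to b.
s = inner(a, a)
P = [[e / s for e in row] for row in outer(a, a)]
p2 = matvec(P, b)

# Associativity of matrix multiplication makes the two groupings agree.
print(p1)
print(p2)
```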

Addendum: The scalar factor can be absorbed into the $n\times n$ matrix $aa^T$; the resulting matrix $\pi_a$ represents orthogonal projection onto (the span of) $a$. That it is a projection is easy to verify: $$\pi_a^2 = \left({aa^T\over a^Ta}\right)^2 = {(aa^T)(aa^T)\over (a^Ta)(a^Ta)} = {a(a^Ta)a^T\over(a^Ta)(a^Ta)} = {(a^Ta)(aa^T)\over(a^Ta)(a^Ta)} = {aa^T\over a^Ta} = \pi_a,$$ again using associativity of matrix multiplication and the fact that $a^Ta$ is a scalar so commutes with matrices. In addition, $$\pi_aa = {aa^T\over a^Ta}a = {a^Ta\over a^Ta}a = a,$$ as expected.
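The idempotence $\pi_a^2=\pi_a$ and the fixed-point property $\pi_a a=a$ can also be checked numerically (a sketch with a made-up vector $a$):

```python
def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def outer(a, b):
    return [[x * y for y in b] for x in a]

def matvec(M, v):
    return [inner(row, v) for row in M]

def matmul(A, B):
    cols = list(zip(*B))
    return [[inner(row, col) for col in cols] for row in A]

a = [2.0, 1.0, 2.0]
s = inner(a, a)  # a^T a = 9
P = [[e / s for e in row] for row in outer(a, a)]  # pi_a = a a^T / a^T a

P2 = matmul(P, P)  # should equal P (idempotence)
Pa = matvec(P, a)  # should equal a (a is fixed by the projection)
```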

In the above derivation of projection onto $a$, $b$ was an arbitrary vector, so for all $b$, $\pi_ab$ is some scalar multiple of $a$. In other words, the image (column space) of $\pi_a$ is spanned by $a$—it’s the line through $a$—and so the rank of $\pi_a$ is one. This can also be seen by examining $aa^T$ directly: each column is a multiple of $a$.
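The rank-one structure is visible directly in code: column $j$ of $aa^T$ is $a_j$ times $a$ (a sketch with an arbitrary vector):

```python
def outer(a, b):
    return [[x * y for y in b] for x in a]

a = [1.0, 2.0, 3.0]
M = outer(a, a)  # a a^T

# Column j of a a^T is the vector a scaled by a_j, so every column
# lies on the line through a and the matrix has rank one.
columns = [list(col) for col in zip(*M)]
scaled = [[a[j] * ai for ai in a] for j in range(len(a))]
print(columns == scaled)
```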

As a final note, the above derivation requires that the vectors and matrices be expressed relative to a basis that’s orthonormal with respect to the dot product. It’s possible to remove this restriction, but the expression for the projection matrix will be more complex.

amd
  • Can you, if possible, say something about the column space and rank of $aa^T$? The answer is a line through $a$ and $1$, respectively; I'm not sure why. – Abhishek Bhatia Jul 09 '16 at 13:22
  • @AbhishekBhatia See the addendum to my answer. The rank and column space of $aa^T$ are, of course, the same as those of $\pi_a$ since one is a nonzero scalar multiple of the other. Also, I realized that it would be even easier to follow the derivation by starting from $p={a\cdot b\over|a|}{a\over|a|}$ and then replacing $|a|^2$ with $a^Ta$ at the very end, but I didn’t want to change the answer that you had accepted. – amd Jul 09 '16 at 20:35

If $a = \begin{bmatrix} x\\y\\z \end{bmatrix}$ and $b = \begin{bmatrix} u\\v\\w \end{bmatrix}$, then

$a^Tb = [xu + yv + zw] \;\;$ and $\langle a,b \rangle = xu + yv + zw$

The difference between the $1\times1$ matrix $[xu+yv+zw]$ and the scalar $xu+yv+zw$ is so trivial that it is usually ignored.

Note also that $ab^T = \begin{bmatrix} xu & xv & xw\\ yu & yv & yw\\ zu & zv & zw \\ \end{bmatrix}$.

  • I understand this, but I don't understand why you are multiplying matrices. My question is: why is it not a dot product? $a^T\cdot a=a\cdot a^T=$ scalar. In the orthogonality condition, aren't we taking dot products? Does this question make sense? – Abhishek Bhatia Jul 09 '16 at 07:42
  • @AbhishekBhatia You are indeed taking dot products here. Note, though, that $a$ is a column vector, but $a^T$ is a row vector. The dot product is only defined for two vectors of the same type, so your expressions $a^T\cdot a$ and $a\cdot a^T$ are meaningless. However, because of the rules of matrix multiplication and the fact that a $1\times1$ matrix can be treated as a scalar, $a\cdot b=a^Tb$. On the other hand, $b\cdot a\ne ba^T$, which is an $n\times n$ matrix. The correct expression is $b\cdot a=b^T a$. – amd Jul 09 '16 at 08:53
  • Does this actually mean, $ \vec{a}^T\vec{b} $ is a vector while $ \vec{a}\vec{b} $ is a scalar? – Ben Aug 22 '21 at 15:29
  • @Ben: it's complicated. That is to say, there is not one standard for representing and differentiating between the elements of the ring $\Bbb R^r$ and the vector space $\Bbb R^n$. I don't know, but I think the answer is that both "products" are treated as scalars because the difference is trivial. – Steven Alexis Gregory Aug 22 '21 at 17:04
  • @stevengregory Thanks for the fast answer. Tbh, I'm confused now. I understood the above equations such that the one is a vector $ [ xu + yv + zw ] $ and the other is a scalar $ xu + yv + zw $. What else do the brackets $ [] $ stand for? – Ben Aug 22 '21 at 21:14
  • $[x]$ is a 1 by 1 vector. – Steven Alexis Gregory Aug 22 '21 at 22:04