
I am taking a basic course on linear algebra, where the instructor says $a^Ta$ is a number but $aa^T$ is a matrix. Why?

Background:

Say we are projecting a vector $b$ onto a vector $a$.

By the orthogonality condition, the dot product is zero: $$a^T(b-xa)=0,$$ which gives $$x=\frac{a^Tb}{a^Ta}.$$ Since the projection vector $p$ lies along $a$, we have $$p=ax=a\frac{a^Tb}{a^Ta}$$

$$p=\frac{aa^T} {a^Ta}b$$ To me, both $aa^T$ and $a^Ta$ are dot products, so the order shouldn't matter. But then we would get $p=b$, which is wrong. Why?

Abhishek Bhatia

2 Answers


You appear to be conflating the dot product $a\cdot b$ of two column vectors with the matrix product $a^Tb$, which computes the same value. The dot product is symmetric, but matrix multiplication is in general not commutative. Indeed, unless $A$ and $B$ are both square matrices of the same size, $AB$ and $BA$ don’t even have the same shape.

In the derivation that you cite, the vectors $a$ and $b$ are being treated as $n\times1$ matrices, so $a^T$ is a $1\times n$ matrix. By the rules of matrix multiplication, $a^Ta$ and $a^Tb$ result in a $1\times1$ matrix, which is equivalent to a scalar, while $aa^T$ produces an $n\times n$ matrix: $$ a^Tb = \begin{bmatrix}a_1&a_2&\cdots&a_n\end{bmatrix}\begin{bmatrix}b_1\\b_2\\ \vdots\\b_n\end{bmatrix} = \begin{bmatrix}a_1b_1+a_2b_2+\cdots+a_n b_n\end{bmatrix} \\ a^Ta = \begin{bmatrix}a_1&a_2&\cdots&a_n\end{bmatrix}\begin{bmatrix}a_1\\a_2\\ \vdots\\a_n\end{bmatrix} = \begin{bmatrix}a_1^2+a_2^2+\cdots+a_n^2\end{bmatrix} $$ so $a^Tb$ is equivalent to $a\cdot b$, while $$aa^T = \begin{bmatrix}a_1\\a_2\\ \vdots\\a_n\end{bmatrix}\begin{bmatrix}a_1&a_2&\cdots&a_n\end{bmatrix} = \begin{bmatrix}a_1^2&a_1a_2&\cdots&a_1a_n \\ a_2a_1&a_2^2&\cdots&a_2a_n \\ \vdots&\vdots&\ddots&\vdots \\ a_na_1&a_na_2&\cdots&a_n^2\end{bmatrix}.$$ Note in particular that $b\cdot a=b^Ta$, not $ba^T$, as the latter is also an $n\times n$ matrix.
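A quick numerical check of the shapes (plain Python, with made-up vectors for illustration):

```python
# Inner product a^T b: sums products of matching entries, yielding a scalar.
def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

# Outer product a b^T: an n x n matrix whose (i, j) entry is a_i * b_j.
def outer(a, b):
    return [[x * y for y in b] for x in a]

a = [1, 2, 3]
b = [4, 5, 6]

print(inner(a, b))  # 1*4 + 2*5 + 3*6 = 32, a single number
print(outer(a, b))  # [[4, 5, 6], [8, 10, 12], [12, 15, 18]], a 3x3 matrix
```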

The derivation of the projection might be easier to understand if you write it slightly differently. Start with dot products: $$p={a\cdot b\over a\cdot a}a={1\over a\cdot a}a(a\cdot b)$$ then replace the dot products with equivalent matrix products: $$p={1\over a^Ta}a(a^Tb).$$ This expression is a product of the scalar ${1\over a^Ta}$ with three matrices. Since matrix multiplication is associative, we can regroup this as $${1\over a^Ta}(aa^T)b.$$ This is a scalar times an $n\times n$ matrix times an $n\times1$ matrix, i.e., a vector.
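The regrouping can be verified numerically; here is a small Python sketch (the vectors are hypothetical, chosen so that $a^Ta=9$):

```python
def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def outer(a, b):
    return [[x * y for y in b] for x in a]

def matvec(M, v):
    return [inner(row, v) for row in M]

a = [1.0, 2.0, 2.0]
b = [3.0, 0.0, 4.0]

# Grouping 1: the scalar (a.b)/(a.a) times the vector a.
x = inner(a, b) / inner(a, a)
p1 = [x * ai for ai in a]

# Grouping 2: the rank-one matrix (a a^T)/(a^T a) applied to b.
s = inner(a, a)
P = [[e / s for e in row] for row in outer(a, a)]
p2 = matvec(P, b)

# Associativity of matrix multiplication makes the two groupings agree.
print(p1)
print(p2)
```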

Addendum: The scalar factor can be absorbed into the $n\times n$ matrix $aa^T$; the resulting matrix $\pi_a$ represents orthogonal projection onto (the span of) $a$. That it is a projection is easy to verify: $$\pi_a^2 = \left({aa^T\over a^Ta}\right)^2 = {(aa^T)(aa^T)\over (a^Ta)(a^Ta)} = {a(a^Ta)a^T\over(a^Ta)(a^Ta)} = {(a^Ta)(aa^T)\over(a^Ta)(a^Ta)} = {aa^T\over a^Ta} = \pi_a,$$ again using associativity of matrix multiplication and the fact that $a^Ta$ is a scalar so commutes with matrices. In addition, $$\pi_aa = {aa^T\over a^Ta}a = {a^Ta\over a^Ta}a = a,$$ as expected.
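The idempotence $\pi_a^2=\pi_a$ and the fixed-point property $\pi_a a=a$ can also be checked numerically (a sketch with a made-up vector $a$):

```python
def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def outer(a, b):
    return [[x * y for y in b] for x in a]

def matvec(M, v):
    return [inner(row, v) for row in M]

def matmul(A, B):
    cols = list(zip(*B))
    return [[inner(row, col) for col in cols] for row in A]

a = [2.0, 1.0, 2.0]
s = inner(a, a)  # a^T a = 9
P = [[e / s for e in row] for row in outer(a, a)]  # pi_a = a a^T / a^T a

P2 = matmul(P, P)  # should equal P (idempotence)
Pa = matvec(P, a)  # should equal a (a is fixed by the projection)
```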

In the above derivation of projection onto $a$, $b$ was an arbitrary vector, so for all $b$, $\pi_ab$ is some scalar multiple of $a$. In other words, the image (column space) of $\pi_a$ is spanned by $a$—it’s the line through $a$—and so the rank of $\pi_a$ is one. This can also be seen by examining $aa^T$ directly: each column is a multiple of $a$.
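The rank-one structure is visible directly in code: column $j$ of $aa^T$ is $a_j$ times $a$ (a sketch with an arbitrary vector):

```python
def outer(a, b):
    return [[x * y for y in b] for x in a]

a = [1.0, 2.0, 3.0]
M = outer(a, a)  # a a^T

# Column j of a a^T is the vector a scaled by a_j, so every column
# lies on the line through a and the matrix has rank one.
columns = [list(col) for col in zip(*M)]
scaled = [[a[j] * ai for ai in a] for j in range(len(a))]
print(columns == scaled)
```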

As a final note, the above derivation requires that the vectors and matrices be expressed relative to a basis that’s orthonormal with respect to the dot product. It’s possible to remove this restriction, but the expression for the projection matrix will be more complex.

amd
  • Can you, if possible, say something about the column space and rank of $aa^T$? The answer is a line through $a$ and $1$, respectively; I'm not sure why. – Abhishek Bhatia Jul 09 '16 at 13:22
  • @AbhishekBhatia See the addendum to my answer. The rank and column space of $aa^T$ are, of course, the same as those of $\pi_a$ since one is a nonzero scalar multiple of the other. Also, I realized that it would be even easier to follow the derivation by starting from $p={a\cdot b\over|a|}{a\over|a|}$ and then replacing $|a|^2$ with $a^Ta$ at the very end, but I didn’t want to change the answer that you had accepted. – amd Jul 09 '16 at 20:35

If $a = \begin{bmatrix} x\\y\\z \end{bmatrix}$ and $b = \begin{bmatrix} u\\v\\w \end{bmatrix}$, then

$a^Tb = [xu + yv + zw] \;\;$ and $\langle a,b \rangle = xu + yv + zw$

The difference between the $1\times1$ matrix $[xu+yv+zw]$ and the scalar $xu+yv+zw$ is so trivial that it is usually ignored.

Note also that $ab^T = \begin{bmatrix} xu & xv & xw\\ yu & yv & yw\\ zu & zv & zw \\ \end{bmatrix}$.

  • I understand this, but I don't understand why you are multiplying matrices. My question is: why is it not a dot product? $a^T\cdot a=a\cdot a^T=$ scalar. In the orthogonality condition, aren't we taking dot products? Does this question make sense? – Abhishek Bhatia Jul 09 '16 at 07:42
  • @AbhishekBhatia You are indeed taking dot products here. Note, though, that $a$ is a column vector, but $a^T$ is a row vector. The dot product is only defined for two vectors of the same type, so your expressions $a^T\cdot a$ and $a\cdot a^T$ are meaningless. However, because of the rules of matrix multiplication and the fact that a $1\times1$ matrix can be treated as a scalar, $a\cdot b=a^Tb$. On the other hand, $b\cdot a\ne ba^T$, which is an $n\times n$ matrix. The correct expression is $b\cdot a=b^T a$. – amd Jul 09 '16 at 08:53
  • Does this actually mean, $ \vec{a}^T\vec{b} $ is a vector while $ \vec{a}\vec{b} $ is a scalar? – Ben Aug 22 '21 at 15:29
  • @Ben: it's complicated. That is to say, there is not one standard for representing and differentiating between the elements of the ring $\Bbb R^r$ and the vector space $\Bbb R^n$. I don't know, but I think the answer is that both "products" are treated as scalars because the difference is trivial. – Steven Alexis Gregory Aug 22 '21 at 17:04
  • @stevengregory Thanks for the fast answer. Tbh, I'm confused now. I understood the above equations such that the one is a vector $ [ xu + yv + zw ] $ and the other is a scalar $ xu + yv + zw $. What else do the brackets $ [] $ stand for? – Ben Aug 22 '21 at 21:14
  • $[x]$ is a 1 by 1 vector. – Steven Alexis Gregory Aug 22 '21 at 22:04