Notice that
$$w^TX^TXw-2w^TX^Ty+y^Ty=\langle Xw-y,Xw-y\rangle =\|Xw-y\|^2,$$
where $\langle \cdot,\cdot \rangle $ is the inner (dot) product on $\mathbb{R}^n$.
If you denote $$f(w)=\langle Xw-y,Xw-y\rangle,$$ then the bilinearity and symmetry of the inner product lead to
$f(w+h)=f(w)+2\langle Xw-y,Xh\rangle+\langle Xh,Xh\rangle=f(w)+2\langle X^T(Xw-y),h\rangle+\langle Xh,Xh\rangle$.
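The remainder term is of second order in $h$, since $\langle Xh,Xh\rangle=\|Xh\|^2\le \|X\|^2\|h\|^2$. If you use the definition and properties of the directional derivative, you find that $$f(w+h)=f(w)+\langle \nabla f(w),h\rangle + \langle Xh,Xh\rangle,$$ and hence, matching the terms that are linear in $h$, $$\frac{d f}{d w}=\nabla f(w)=2X^T(Xw-y).$$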
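As a quick sanity check, the formula can be compared against a finite-difference approximation. The following is only a minimal NumPy sketch: the random `X`, `y`, `w`, `h`, the seed, the step `eps`, and the helper names `f`, `grad_f`, and `numerical_grad` are assumptions made for illustration and are not part of the derivation.

```python
import numpy as np

def f(w, X, y):
    """f(w) = <Xw - y, Xw - y> = ||Xw - y||^2."""
    r = X @ w - y
    return r @ r

def grad_f(w, X, y):
    """Closed-form gradient 2 X^T (Xw - y)."""
    return 2.0 * X.T @ (X @ w - y)

def numerical_grad(w, X, y, eps=1e-6):
    """Central-difference approximation of the gradient (hypothetical helper)."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e, X, y) - f(w - e, X, y)) / (2 * eps)
    return g

# Arbitrary test data; shapes and seed are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
w = rng.standard_normal(3)
h = 1e-3 * rng.standard_normal(3)

analytic = grad_f(w, X, y)
numeric = numerical_grad(w, X, y)
max_abs_diff = np.max(np.abs(analytic - numeric))

# The expansion f(w+h) = f(w) + <grad f(w), h> + <Xh, Xh> holds exactly,
# so the leftover after the linear term should equal <Xh, Xh>.
remainder = f(w + h, X, y) - f(w, X, y) - analytic @ h
quad = (X @ h) @ (X @ h)

print("max |analytic - numeric| =", max_abs_diff)
print("remainder - <Xh, Xh>     =", remainder - quad)
```

Both printed values should be close to zero up to floating-point rounding. Because $f$ is quadratic, the central difference has no truncation error here, so any discrepancy comes from rounding alone.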
You can find more comments in this thread and in Matrix Calculus. Try also searching for ''\|Xw-y\|^2 derivative'' on SearchOnMath.