In this algorithmic analysis of least squares regression, we discard the big-$O$ terms that are dominated by the largest one and keep only the dominant term.
On the other hand, in this algorithmic analysis of matrix multiplication followed by truncation, the matrix multiplication dominates the truncation, yet we keep the lesser big-$O$ term, and the final bound is the product of the two.
Why the difference? When should we discard non-dominant terms, and when should we keep them?
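
To make the contrast concrete, here is a sketch of the two patterns as I understand them. The cost functions $f$ and $g$ are placeholders of my own, not taken from either analysis, and I assume $g(n) = O(f(n))$, i.e. $g$ is the dominated term:

$$
\begin{aligned}
\text{first analysis (lesser term dropped):} \quad & O(f(n)) + O(g(n)) = O(f(n)) \\
\text{second analysis (lesser term kept):} \quad & O(f(n)) \cdot O(g(n)) = O(f(n)\, g(n))
\end{aligned}
$$

For example, with the illustrative choices $f(n) = n^3$ and $g(n) = n^2$, the first line gives $O(n^3)$ while the second gives $O(n^5)$.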