While studying discriminant functions for linear classification, I encountered the following:
.. if $\textbf{x}$ is a point on the decision surface, then $y(\textbf{x}) = 0$, and so the normal distance from the origin to the decision surface is given by:
$$\frac{\textbf{w}^T \textbf{x}}{\lVert \textbf{w} \rVert} = -\frac{w_0}{\lVert \textbf{w} \rVert} \tag 1 $$
Here $\textbf{w}$ is a weight vector and $w_0$ is a bias. In an attempt to derive the above formula, I tried the following:
\begin{align*} & \textbf{w}^T \textbf{x} + w_0 = 0 \tag 2\\ & \textbf{w}^T \textbf{x} = -w_0 \tag 3 \end{align*}
After this I am basically stuck. I think the author gets from equation $(3)$ to equation $(1)$ by normalising, but isn't calculating the normal (perpendicular) distance quite separate from normalising a vector? Secondly, how does equation $(1)$ say that the normal distance is $-\frac{w_0}{\lVert \textbf{w} \rVert}$? That is, how is the quantity $\frac{\textbf{w}^T \textbf{x}}{\lVert \textbf{w} \rVert}$ the normal distance?
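To make the question concrete, here is a small numeric check I put together (the values of $\textbf{w}$, $w_0$ and $\textbf{x}$ are toy numbers I chose myself, not from the book). It confirms numerically that projecting a point on the decision surface onto $\textbf{w}/\lVert \textbf{w} \rVert$ gives $-w_0/\lVert \textbf{w} \rVert$, but I still don't see why that projection is the perpendicular distance:

```python
import numpy as np

# Toy decision surface w^T x + w_0 = 0 in 2D (values chosen for illustration).
w = np.array([3.0, 4.0])
w0 = -10.0

# A point on the decision surface: 3*2 + 4*1 - 10 = 0.
x = np.array([2.0, 1.0])
assert np.isclose(w @ x + w0, 0.0)

# Projection of x onto the unit vector w / ||w||.
proj = (w @ x) / np.linalg.norm(w)

print(proj)                     # 2.0
print(-w0 / np.linalg.norm(w))  # 2.0 -- matches equation (1)
```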