(I am using the notation $x=(x_1,x_2)$.)
If we pick some direction $h$, then the rate of change of $f$ in the direction $h$ at a point $x$ is given by the (one-sided) directional derivative $df(x,h) = \lim_{t\downarrow 0} { f(x+th)-f(x) \over t}$.
It is easy to show that if $\lambda \ge 0$, then $df(x,\lambda h) = \lambda df(x,h)$ (for $\lambda > 0$, substitute $s = \lambda t$ in the limit; $\lambda = 0$ is trivial), so it suffices to consider directions with $\|h\| = 1$.
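
As a rough numerical illustration of the definition and of the scaling property, the sketch below approximates $df(x,h)$ by the difference quotient with a small fixed $t>0$ instead of taking the limit. The function `f` is a hypothetical choice (an assumption, not taken from the question), picked only so that its gradient is $\binom{8x}{2y+2}$ as in the example at the end; `directional_derivative` and the step `t` are likewise illustrative.

```python
def f(x):
    # Hypothetical function (assumption): chosen so that grad f = (8*x1, 2*x2 + 2).
    x1, x2 = x
    return 4 * x1**2 + x2**2 + 2 * x2

def directional_derivative(f, x, h, t=1e-6):
    # One-sided difference quotient (f(x + t*h) - f(x)) / t with a small fixed t > 0;
    # it approximates df(x, h), the limit as t decreases to 0.
    shifted = [xi + t * hi for xi, hi in zip(x, h)]
    return (f(shifted) - f(x)) / t

x = (1.0, 2.0)
h = (0.6, 0.8)                                           # unit direction: 0.36 + 0.64 = 1
d1 = directional_derivative(f, x, h)
d3 = directional_derivative(f, x, [3 * hi for hi in h])  # scaled direction, lambda = 3
print(d1, d3 / 3)  # both close to 9.6, illustrating df(x, 3h) = 3 df(x, h)
```

Here $\nabla f(1,2) = \binom{8}{6}$, so the exact value is $\langle \binom{8}{6}, \binom{0.6}{0.8} \rangle = 9.6$; the small discrepancy comes from using a finite $t$.
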
A unit direction $\hat{h}$ gives the greatest ascent (the fastest rate of increase) if $df(x,\hat{h}) \ge df(x,h)$ for all other unit length directions $h$.
If $f$ is differentiable at $x$, one can show that $df(x,h) = \langle \nabla f(x), h \rangle$, and the Cauchy-Bunyakovsky-Schwarz inequality gives
$|\langle \nabla f(x), h \rangle| \le \|h\| \|\nabla f(x)\|$, so $df(x,h) \le \|\nabla f(x)\|$ for every unit $h$. Furthermore,
if $\nabla f(x) \neq 0$ (otherwise every direction has rate $0$) and we let $\hat{h} = {1 \over \|\nabla f(x)\|} \nabla f(x)$, we have
$\langle \nabla f(x), \hat{h} \rangle = \| \nabla f(x) \|$, and
so $df(x,h) \le df(x,\hat{h})$ for all other unit length $h$.
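
A brute-force sanity check of this maximization in two dimensions, assuming the gradient value $\binom{8}{6}$ from your example: sample unit vectors $h = (\cos a, \sin a)$ on a grid of angles and keep the one with the largest $\langle \nabla f(x), h \rangle$. The helper `best_unit_direction` and the sample count are illustrative choices; this only illustrates the inequality, it is not a proof.

```python
import math

def best_unit_direction(grad, samples=3600):
    # Brute force: try unit vectors h = (cos a, sin a) on an evenly spaced grid of
    # angles and keep the one with the largest <grad, h>. Illustration only, not a proof.
    best_h, best_val = None, -math.inf
    for k in range(samples):
        a = 2 * math.pi * k / samples
        h = (math.cos(a), math.sin(a))
        val = grad[0] * h[0] + grad[1] * h[1]
        if val > best_val:
            best_h, best_val = h, val
    return best_h, best_val

grad = (8.0, 6.0)                          # gradient value from the example
norm = math.hypot(grad[0], grad[1])        # ||grad||
h_hat = (grad[0] / norm, grad[1] / norm)   # predicted maximizer grad / ||grad||
h_best, val_best = best_unit_direction(grad)
print(h_best, val_best)  # close to h_hat and to ||grad||
print(h_hat, norm)
```

The best sampled direction lands next to $\hat h$, and the best value never exceeds $\|\nabla f(x)\|$, as the Cauchy-Bunyakovsky-Schwarz bound requires.
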
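The particular length doesn't matter since we are only interested in the direction, so we can simply take $h=\nabla f(x)$ (any positive multiple points the same way). Since $df(x,-h) = -df(x,h)$ for differentiable $f$, the direction of greatest descent is $-\nabla f(x)$.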
In your example, writing the point as $(x,y)$, we have $\nabla f(x,y) = \binom{8x}{2y+2}$, so $\nabla f(1,2) = \binom{8}{6}$.
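
To finish the example numerically, a small sketch that evaluates the gradient formula at $(1,2)$, normalizes it to the unit direction $\hat h$, and reports $\|\nabla f(1,2)\|$, which by the argument above is the maximal rate of increase. `grad_f` is just a hypothetical helper encoding the formula $\binom{8x}{2y+2}$.

```python
import math

def grad_f(x, y):
    # Hypothetical helper: the gradient formula (8x, 2y + 2) from the example.
    return (8 * x, 2 * y + 2)

g = grad_f(1, 2)                     # (8, 6)
norm = math.hypot(g[0], g[1])        # ||grad f(1, 2)|| = 10, the maximal rate of increase
h_hat = (g[0] / norm, g[1] / norm)   # unit direction of steepest ascent, (0.8, 0.6)
print(g, h_hat, norm)
```

So the steepest ascent direction at $(1,2)$ is $\hat h = \binom{0.8}{0.6}$, with rate $10$, and the steepest descent direction is $-\hat h$.
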