There is a lot to be said on the subject, but the least technical point of view (in my opinion) is the following:
Consider first the situation in $\mathbb{R}^n$. Let $X,Y \colon \mathbb{R}^n \rightarrow \mathbb{R}^n$ be vector fields. To define the directional derivative of the vector field $X$ in the direction of the vector field $Y$ at a point $p \in \mathbb{R}^n$, we can mimic usual definition of directional derivative:
$$ (\nabla_Y X)(p) := \lim_{t \to 0} \frac{X(p + tY(p)) - X(p)}{t}. $$
The result $(\nabla_Y X)$ is a vector field on $\mathbb{R}^n$. You can check that the operation $\nabla$ defined as above satisfies the following two properties:
- $\nabla_{fY}(X) = f\nabla_Y X$.
- $\nabla_Y(fX) = (Yf)X + f\nabla_YX$.
Here, $X,Y \colon \mathbb{R}^n \rightarrow \mathbb{R}^n$ are vector fields and $f \colon \mathbb{R}^n \rightarrow \mathbb{R}$ is a scalar function. The function $Yf$ (at a point $p$) is the directional derivative of $f$ at $p$ in the direction $Y(p)$.
Now let us try and mimic the above construction on a general manifold. Given vector fields $X,Y \in \mathfrak{X}(M)$, we try to use the same formula and define
$$ (\nabla_Y X)(p) := \lim_{t \to 0} \frac{X(p + tY(p)) - X(p)}{t}. $$
However, we see that there are two problems. First, the expression $X(p + tY(p))$ is not defined because we don't have a way of adding a point $p \in M$ to a tangent vector $tY(p) \in T_pM$. This is not so bad because we can actually replace the expression $p + tY(p)$ with any curve "which goes in the direction $Y(p)$" such as the flow $\varphi_t^Y(p)$. The more serious problem is that we need to subtract the tangent vector $X(p) \in T_pM$ from the tangent vector $X(\varphi_t^Y(p)) \in T_{\varphi_t^Y(p)}$ and those are two tangent vectors that belong to different vector spaces. In general, without any extra data, we have no way of identifying tangent spaces at different points of $M$.
To summarize, we see that we can differentiate vector fields along vector fields without any problem on $\mathbb{R}^n$ but we encounter problems when we try and do it on a general manifold. But $\mathbb{R}^n$ is also a manifold so what makes it special? The fact that it is not only a manifold but a vector space and an affine space and so we can add points to vectors and identify tangent spaces at different points using translations. This is something we don't have on a general manifold.
The definition of an affine connection is meant to supply the manifold $M$ "externally" with an operation $\nabla \colon \mathfrak{X}(M) \times \mathfrak{X}(M) \rightarrow \mathfrak{X}(M)$ which satisfies properties $(1)-(2)$ and so allows us to differentiate vector fields along vector fields. That is, instead of defining the directional derivative of a vector field along a vector field, we require that somebody handles us a mechanism $\nabla$ which satisfies the properties that the familiar derivative satisfied on $\mathbb{R}^n$ and then we will think of it as a directional derivative.
Obviously this raises quite a lot of questions. Does such mechanism always exists? (Yes). Is it unique? (No). Is there a natural choice of such differentiation mechanism? (Yes, under certain circumstances). Can we use this mechanism to recover the ability to identify tangent vectors at different points that was necessary to define the regular directional derivative in $\mathbb{R}^n$? (Yes, at least along curves. This leads to the notion of parallel transport). I refer you to the extensive article on the covariant derivative (which is pretty much another name for an affine connection) on wikipedia for further details.