Fix probability measures $\mu_0,\mu_1,\nu_0,\nu_1$ on a common measurable space with $\mu_0\ll\nu_0$ and $\mu_1\ll\nu_1$; this is the interesting case, since otherwise the right hand side below is infinite and convexity holds trivially. For $\alpha\in[0,1]$ let $\mu_\alpha=(1-\alpha)\mu_0+\alpha\mu_1$ and $\nu_\alpha=(1-\alpha)\nu_0+\alpha\nu_1$, and notice that $\mu_\alpha\ll\nu_\alpha$. The goal is the joint convexity of the relative entropy, $\mathcal H(\mu_\alpha\|\nu_\alpha)\le(1-\alpha)\mathcal H(\mu_0\|\nu_0)+\alpha\mathcal H(\mu_1\|\nu_1)$.
For the following argument, I prefer to work with random variables, so let $B_\alpha\in\{0,1\}$ be Bernoulli with success probability $\alpha$.
Let $X_\alpha$ given $B_\alpha$ have the law $\mu_{B_\alpha}$, and let $Y_\alpha$ given $B_\alpha$ have the law $\nu_{B_\alpha}$.
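In particular, the unconditional law of $X_\alpha$ is exactly the mixture $\mu_\alpha$: for every measurable set $A$,
$$\Pr(X_\alpha\in A)=(1-\alpha)\Pr(X_\alpha\in A\mid B_\alpha=0)+\alpha\Pr(X_\alpha\in A\mid B_\alpha=1)=(1-\alpha)\mu_0(A)+\alpha\mu_1(A)=\mu_\alpha(A),$$
and similarly the law of $Y_\alpha$ is $\nu_\alpha$.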
Using this machinery, I can simply write $\mathcal H(X_\alpha\|Y_\alpha)=\mathcal H(\mu_\alpha\|\nu_\alpha)=\mathcal H((1-\alpha)\mu_0+\alpha\mu_1\|(1-\alpha)\nu_0+\alpha\nu_1)$, which is the left hand side of the convexity inequality.
The right hand side of the inequality is the conditional relative entropy of $X_\alpha$ given $B_\alpha$ with respect to $Y_\alpha$ given $B_\alpha$. However, since $\mathcal H(B_\alpha\|B_\alpha)=0$ trivially, the conditional relative entropy equals the joint relative entropy; that is, we have $\mathcal H(B_\alpha,X_\alpha\|B_\alpha,Y_\alpha)=(1-\alpha)\mathcal H(X_0\|Y_0)+\alpha\mathcal H(X_1\|Y_1)$. We verify this by establishing that the Radon-Nikodym derivative of the law of $(B_\alpha,X_\alpha)$ with respect to the law of $(B_\alpha,Y_\alpha)$ coincides with $d\mu_0/d\nu_0$ on the event $\{B_\alpha=0\}$ and with $d\mu_1/d\nu_1$ on $\{B_\alpha=1\}$.
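Writing $\operatorname{Law}(\cdot)$ for the distribution of a random element, the claim spelled out is
$$\frac{d\operatorname{Law}(B_\alpha,X_\alpha)}{d\operatorname{Law}(B_\alpha,Y_\alpha)}(b,x)=\frac{d\mu_b}{d\nu_b}(x)\qquad\text{for }b\in\{0,1\},$$
since the mixture weights $1-\alpha$ and $\alpha$ cancel between numerator and denominator. Integrating the logarithm of this derivative against $\operatorname{Law}(B_\alpha,X_\alpha)$ and splitting over the events $\{B_\alpha=0\}$ and $\{B_\alpha=1\}$ gives
$$\mathcal H(B_\alpha,X_\alpha\|B_\alpha,Y_\alpha)=(1-\alpha)\int\log\frac{d\mu_0}{d\nu_0}\,d\mu_0+\alpha\int\log\frac{d\mu_1}{d\nu_1}\,d\mu_1=(1-\alpha)\mathcal H(X_0\|Y_0)+\alpha\mathcal H(X_1\|Y_1).$$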
Now, let us introduce some notation for the conditional relative entropy, say $\mathcal H(X_\alpha|B_\alpha\|Y_\alpha|B_\alpha)$; in this notation, what we just discussed reads
$\mathcal H(B_\alpha,X_\alpha\|B_\alpha,Y_\alpha)=\mathcal H(B_\alpha\|B_\alpha)+\mathcal H(X_\alpha|B_\alpha\|Y_\alpha|B_\alpha)=\mathcal H(X_\alpha|B_\alpha\|Y_\alpha|B_\alpha)$, by the chain rule for the relative entropy. This chain rule goes two ways, so we also have
$\mathcal H(B_\alpha,X_\alpha\|B_\alpha,Y_\alpha)=\mathcal H(X_\alpha\|Y_\alpha)+\mathcal H(B_\alpha|X_\alpha\|B_\alpha|Y_\alpha)$.
So, the convexity of the relative entropy is just a special case of the chain rule: combining the two decompositions and using that the conditional relative entropy $\mathcal H(B_\alpha|X_\alpha\|B_\alpha|Y_\alpha)$ is non-negative, we get $\mathcal H(X_\alpha\|Y_\alpha)\le\mathcal H(X_\alpha|B_\alpha\|Y_\alpha|B_\alpha)=(1-\alpha)\mathcal H(X_0\|Y_0)+\alpha\mathcal H(X_1\|Y_1)$, as claimed.
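As a quick sanity check (not part of the proof), here is a minimal numerical verification of the inequality on a finite alphabet; the helper names `kl` and `random_dist` are mine, and the distributions are generated with full support so that absolute continuity holds automatically.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """Discrete relative entropy H(p || q) in nats; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def random_dist(n):
    """A random probability vector with full support."""
    w = rng.random(n) + 1e-3
    return w / w.sum()

n = 5
mu0, mu1 = random_dist(n), random_dist(n)
nu0, nu1 = random_dist(n), random_dist(n)

for alpha in np.linspace(0.0, 1.0, 11):
    mu_a = (1 - alpha) * mu0 + alpha * mu1   # law of X_alpha
    nu_a = (1 - alpha) * nu0 + alpha * nu1   # law of Y_alpha
    lhs = kl(mu_a, nu_a)                     # H(X_alpha || Y_alpha)
    rhs = (1 - alpha) * kl(mu0, nu0) + alpha * kl(mu1, nu1)
    assert lhs <= rhs + 1e-12, (alpha, lhs, rhs)

print("convexity verified at 11 values of alpha")
```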
Unfortunately, the conditional relative entropy is usually defined only for finite supports, via kernels, or under other restrictions, since it seems that one needs access to some kind of conditional distribution. However, this is not the case: the conditional relative entropy can be defined in general, as discussed here, namely by taking the chain rule as the definition. Non-negativity is also shown there, which amounts to a straightforward application of Jensen's inequality.
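To sketch one formalization along these lines (my notation, not necessarily the one used in the linked discussion): the chain rule is promoted to a definition,
$$\mathcal H(X|W\,\|\,Y|V):=\mathcal H(W,X\,\|\,V,Y)-\mathcal H(W\,\|\,V),$$
whenever the difference is well defined, and non-negativity then amounts to the monotonicity $\mathcal H(W,X\|V,Y)\ge\mathcal H(W\|V)$, which one obtains from Jensen's inequality applied to the convex function $t\mapsto t\log t$.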