We state the two definitions of interest as follows.
Mixing: A probability space $(X,\mathscr{A},\mu)$ with measure-preserving map $T:X\rightarrow X$ is said to be mixing if for every $A,B\in \mathscr{A}$, $$\lim_{n\to\infty}\mu(A\cap T^{-n}B)=\mu(A)\mu(B).$$
Alternate Mixing: A probability space $(X,\mathscr{A},\mu)$ with measure-preserving map $T:X\rightarrow X$ is said to be alternate mixing if there is a dense subset $D\subseteq L^2(\mu)$ such that for every $f,g\in D$,
$$\lim_{n\to\infty}\langle U_T^n f,g\rangle=\lim_{n\to\infty}\langle f(T^n),g\rangle=\langle f,1\rangle\langle 1,g\rangle=\mathbb{E}(f)\mathbb{E}(g).$$
As stated by the OP, the implication (Alternate Mixing)$\implies$(Mixing) follows easily by considering indicator functions.
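To spell out the indicator-function step, here is a short worked computation. It assumes that the indicators $\mathbf{1}_A,\mathbf{1}_B$ belong to $D$ (or that the alternate mixing identity has already been extended to all of $L^2(\mu)$), which the definition does not by itself guarantee. For $A,B\in\mathscr{A}$,
$$\langle U_T^n\mathbf{1}_B,\mathbf{1}_A\rangle=\int_X \mathbf{1}_{T^{-n}B}\,\mathbf{1}_A\ d\mu=\mu(A\cap T^{-n}B),\qquad \langle\mathbf{1}_B,1\rangle\langle 1,\mathbf{1}_A\rangle=\mu(B)\mu(A).$$
Letting $n\to\infty$ in the first expression and comparing it with the second gives the mixing condition.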
For the other implication, we will need the following lemma, which follows from Theorem $3.13$ of Rudin's Real and Complex Analysis, $3$rd Ed., p. $69$.
Lemma. Let $(X,\mathscr{A},\mu)$ be any measure space. Then the set $S$ of all measurable simple functions that vanish outside a set of finite measure is dense in $L^2(\mu)$.
(Mixing)$\implies$(Alternate Mixing)
Assume that the mixing condition holds, and take any $r,t\in S$. Write
$$ r(x)=\sum^k_{i=1}a_i\mathbf{1}_{A_i}(x)\qquad t(x)=\sum^m_{j=1}b_j\mathbf{1}_{B_j}(x)$$
where $A_1,\dots,A_k,B_1,\dots,B_m\in\mathscr{A}$ and $a_1,\dots,a_k,b_1,\dots,b_m\in \mathbb{R}$. (The upper limit is called $k$ rather than $n$ because $n$ is reserved for the power of $T$.)
Consider then,
$$\langle r(T^n),t\rangle=\int_X \left(\sum_{i}a_i\mathbf{1}_{T^{-n}A_i}\right) \left(\sum_{j}b_j\mathbf{1}_{B_j}\right)d\mu$$
$$= \int_X \sum_{i,j}a_ib_j\ \mathbf{1}_{T^{-n}A_i\cap B_j}\ d\mu.$$
By the linearity of the integral,
$$\langle r(T^n),t\rangle=\sum_{i,j}a_ib_j \int_X\mathbf{1}_{T^{-n}A_i\cap B_j}\ d\mu=\sum_{i,j}a_ib_j\ \mu(T^{-n}A_i\cap B_j).$$
Therefore,
$$\lim_{n\rightarrow\infty}\langle r(T^n),t\rangle=\lim_{n\rightarrow\infty}\sum_{i,j}a_ib_j\ \mu(T^{-n}A_i\cap B_j)=\sum_{i,j}a_ib_j\ \lim_{n\rightarrow\infty}\mu(T^{-n}A_i\cap B_j).$$
By the mixing assumption, applied with $A=B_j$ and $B=A_i$, we have that,
$$\lim_{n\to\infty}\langle U_T^n r,t\rangle=\lim_{n\rightarrow\infty}\langle r(T^n),t\rangle=\sum_{i,j}a_ib_j\mu(A_i)\mu(B_j)=\left(\sum_{i}a_i\mu(A_i)\right)\left(\sum_{j}b_j\mu(B_j)\right)$$
$$=\mathbb{E}(r)\mathbb{E}(t)=\langle r,1\rangle\langle 1,t\rangle.$$
This is the alternate mixing condition with $D=S$, which is dense in $L^2(\mu)$ by the lemma.
In fact, the same limit holds for all $f,g\in L^2(\mu)$.
Any $f,g\in L^2(\mu)$ can be approximated arbitrarily well in the $L^2$ norm by functions in $S$. Since $T$ is measure preserving, $U_T$ preserves the $L^2$ norm, so the Cauchy-Schwarz inequality bounds the approximation error uniformly in $n$, and the limit for functions in $S$ carries over to $f$ and $g$.
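A sketch of the approximation step, under the assumption that $\mu(X)=1$ (so that the constant function $1$ has $L^2$ norm $1$). The symbol $\varepsilon$ and the notation $\|\cdot\|$ for the $L^2(\mu)$ norm are introduced here for illustration. Fix $\varepsilon>0$ and choose $r,t\in S$ with $\|f-r\|<\varepsilon$ and $\|g-t\|<\varepsilon$. Since $T$ is measure preserving, $\|U_T^n h\|=\|h\|$ for every $h\in L^2(\mu)$, so by the Cauchy-Schwarz inequality, for every $n$,
$$|\langle U_T^n f,g\rangle-\langle U_T^n r,t\rangle|\le|\langle U_T^n(f-r),g\rangle|+|\langle U_T^n r,g-t\rangle|\le\|f-r\|\,\|g\|+\|r\|\,\|g-t\|,$$
$$|\langle f,1\rangle\langle 1,g\rangle-\langle r,1\rangle\langle 1,t\rangle|\le|\langle f-r,1\rangle|\,|\langle 1,g\rangle|+|\langle r,1\rangle|\,|\langle 1,g-t\rangle|\le\|f-r\|\,\|g\|+\|r\|\,\|g-t\|.$$
Combining these two bounds with $\lim_{n\to\infty}\langle U_T^n r,t\rangle=\langle r,1\rangle\langle 1,t\rangle$ and $\|r\|\le\|f\|+\varepsilon$ gives
$$\limsup_{n\to\infty}|\langle U_T^n f,g\rangle-\langle f,1\rangle\langle 1,g\rangle|\le 2\varepsilon\left(\|f\|+\|g\|+\varepsilon\right),$$
and since $\varepsilon>0$ was arbitrary, the limit exists and equals $\langle f,1\rangle\langle 1,g\rangle$.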