Before anything else, we should not forget the point of making predictions in real life. Indeed, if no information were missing, we would have complete knowledge of an object, and hence it would not make sense to speak of a prediction. For example, if you pay a predetermined amount of rent each month, there is nothing to predict about next month's payment. One may compare this obvious fact with $E(X|\mathcal{F})=X$ when $\mathcal{F}$ is the full sigma algebra, i.e. perfect information.
However, how do we know whether it makes sense to predict a random variable $X$ in the context of mathematical probability? Well, you would have to examine what $X$ really represents; in particular, whether you can describe this rv explicitly as a function. For example, if we consider a discrete probability on the outcomes of a die throw, you could write down $X(i)=i$ and analyze how it behaves on each subset of $\{1,2,...,6\}$. The situation becomes much harder if we consider rvs given only implicitly in terms of other rvs (such as a random time). In what follows, we consider the classical example of a sum of a random number of random variables (which naturally occurs in branching processes).
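To make the contrast concrete, here is a minimal sketch (in Python, with hypothetical names) of how an explicitly given rv such as the die throw can be analyzed simply by checking where each outcome lands:

```python
from fractions import Fraction

# Explicit rv on the fair-die sample space {1,...,6}: X(i) = i.
omega = range(1, 7)
X = lambda i: i

# For any event B, P(X in B) is just the mass of the preimage {i : X(i) in B}.
def prob_X_in(B):
    return sum(Fraction(1, 6) for i in omega if X(i) in B)

print(prob_X_in({2, 4, 6}))  # 1/2: X is even
print(prob_X_in({5, 6}))     # 1/3: X is at least 5
```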
Let $Y_1,Y_2,...$ be iid rvs and let $N\in\mathbb{N}$ be a rv independent of the $Y_i$, and let's consider the random sum $X=Y_1+...+Y_N$. Okay, why is this implicit? Well, because there is no single expression mapping each outcome $w\in\Omega$ to a value of $X$; we could easily have $X(w)=Y_1(w)$ for one outcome and $X(w')=Y_1(w')+Y_2(w')+Y_3(w')$ for another, so deciding when $X\in B$ for Borel $B$ is borderline impossible (unless, of course, you check where each $w\in\Omega$ lands, but this goes against the spirit of probability itself). Again, if you could separate the outcomes of $X$ for different outcomes of $N$, our life would become easy, since we would just have to analyze the functions $G_k=Y_1+...+Y_k$ for a fixed integer $k$; however, by the above observation, this is not the case in our problem.
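As an illustration (not needed for the theory), here is a minimal simulation sketch of such a random sum; the choice of a Poisson $N$ and exponential $Y_i$ is purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_random_sum():
    """Draw one outcome w and return (N(w), X(w)) for X = Y_1 + ... + Y_N."""
    n = rng.poisson(3)                 # hypothetical choice: N ~ Poisson(3)
    y = rng.exponential(2.0, size=n)   # hypothetical choice: Y_i ~ Exp(mean 2)
    return n, y.sum()

# Note how the number of Y's entering X depends on the same outcome w through N(w):
for _ in range(5):
    n, x = sample_random_sum()
    print(f"N = {n}, X = {x:.3f}")
```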
Then, what can we do? We normally like to obtain quantities such as $EX$ or $Var X$, but due to the complexity of $X$, it is not immediately clear how to compute these values. As "good" architects, let's put the age-old wisdom of breaking a complex building into smaller blocks to the test:
We don't know $X$ for sure, except on the sections $\{N=n\}$ (think of these as partial knowledge/information). What if we came up with something we do know for sure, in the sense that every section of this something is determined by the partial information we already have? How can we achieve this (i)? Apart from that, we would like this something to resemble $X$, at least on the sections $\{N=n\}$ (ii). If we achieved both, we would have obtained the best guess of $X$.
Question (i) motivates the measurability of our best guess $E(X|\mathcal{F})$ with respect to the sections. That is, our partial knowledge of $X$ becomes definite knowledge of the best guess. For this purpose we take the sub-sigma algebra generated by $N$, $\mathcal{F}=\sigma(N)$. And (ii) motivates the requirement that the best guess and $X$ have to average out to the same number on the sections. As an important remark, it is somewhat of a miracle that these two properties alone guarantee the existence of a well-defined object (by the Radon–Nikodym theorem). This is why, a priori, posing questions (i) and (ii) in conjunction makes sense.
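In symbols, for a sub-sigma algebra $\mathcal{F}$ of the full sigma algebra, the two requirements read as follows (this is just the standard definition of conditional expectation):
$$\text{(i)}\quad E(X|\mathcal{F})\ \text{is}\ \mathcal{F}\text{-measurable},\qquad\qquad \text{(ii)}\quad \int_A E(X|\mathcal{F})\,dP=\int_A X\,dP\quad\text{for all }A\in\mathcal{F}.$$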
Having motivated the definition of conditional expectation, let's now see how it can be applied (elegantly!) to our problem. Assume $EN^2<\infty$, $EY_1^2<\infty$, $EY_1=\mu$, and $Var\,Y_1=\sigma^2$. With minimal effort one may prove the identity:
$$Var(X)=E(Var(X|\mathcal{F}))+Var(E(X|\mathcal{F}))$$ where $Var(X|\mathcal{F})=E(X^2|\mathcal{F})-E(X|\mathcal{F})^2$. Choosing $\mathcal{F}=\sigma(N)$ as above allows you to conclude that on each section $N_i=\{N=i\}$, $E(X|\mathcal{F})$ equals $E(Y_1+...+Y_i)=i\mu$, i.e. $E(X|\mathcal{F})=\mu N$ (check that conditions (i) and (ii) hold!), and similarly $Var(X|\mathcal{F})=\sigma^2 N$. Now, combining the blocks, it's an easy exercise to compute $EX=E(E(X|\mathcal{F}))=\mu EN$ and to verify that $Var X=\sigma^2 EN+\mu^2 Var N$, variants of Wald's identity you might have mysteriously come upon in earlier courses!
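For completeness, here is a sketch of the "combining the blocks" step under the assumptions above:
$$E(Var(X|\mathcal{F}))=E(\sigma^2 N)=\sigma^2 EN,\qquad Var(E(X|\mathcal{F}))=Var(\mu N)=\mu^2 Var N,$$
so the identity gives $Var X=\sigma^2 EN+\mu^2 Var N$. If you prefer to see it numerically, here is a minimal Monte Carlo sketch (the Poisson/exponential choices and parameter values are hypothetical, made only for the illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, scale = 3.0, 2.0            # hypothetical: N ~ Poisson(lam), Y ~ Exp(mean = scale)
mu, sigma2 = scale, scale**2     # EY = 2, Var Y = 4
EN, VarN = lam, lam              # for a Poisson rv both equal lam

n_samples = 100_000
N = rng.poisson(lam, size=n_samples)
# Each X_j is a sum of N_j iid exponentials (the random sum from above).
X = np.array([rng.exponential(scale, size=n).sum() for n in N])

print("empirical Var X        :", X.var())
print("sigma^2*EN + mu^2*VarN :", sigma2 * EN + mu**2 * VarN)
```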