Let $(\Omega,\mathcal{F},\mathbb{P})$ be a nice filtered probability space with an $m$-dimensional standard Brownian motion $W$. Fix a time horizon $T>0$. Let $\mu \colon [0,T] \times \mathbb{R}^d \rightarrow \mathbb{R}^d$ and $\sigma \colon [0,T] \times \mathbb{R}^d \rightarrow \mathbb{R}^{d \times m}$ be nice. Let $X^{t_0,x}$ be the stochastic process that solves the SDE $$ X_t^{t_0,x} = x + \int_{t_0}^t \mu_s(X_s^{t_0,x}) ds + \int_{t_0}^t \sigma_s(X_s^{t_0,x}) dW_s $$ with initial condition $X_{t_0}^{t_0,x} = x$ at time $t_0$. The generator of this SDE is $$ L_tf(x) = \nabla f(x)^T \mu_t(x) + \frac{1}{2} \mathrm{trace}(\sigma_t(x)\sigma_t(x)^T\nabla^2f(x)). $$ The Feynman-Kac formula in backward formulation states that the function $$ u_t(x) = \mathbb{E}[f(X_T^{t,x})] $$ solves the PDE $$ \partial_t u_t(x) = -L_tu_t(x) $$ with terminal condition $u_T(x) = f(x)$. The forward formulation states that $$ v_t(x) = \mathbb{E}[f(X_t^{0,x})] $$ solves the PDE $$ \partial_t v_t(x) = L_tv_t(x) $$ with initial condition $v_0(x) = f(x)$.
On the level of PDEs, it is intuitive and easy to see how the forward and backward formulations interact, namely $u_t(x) = v_{T-t}(x)$. On the level of SDEs, there is also a clear conceptual analogy between $$ X_T^{t,x} = x + \int_t^T \mu_s(X_s^{t,x}) ds + \int_t^T \sigma_s(X_s^{t,x}) dW_s \qquad\qquad (1) $$ and $$ X_t^{0,x} = x + \int_0^t \mu_s(X_s^{0,x}) ds + \int_0^t \sigma_s(X_s^{0,x}) dW_s. \qquad\qquad (2) $$ However, I struggle to grasp the direct link between these two formulations on the level of SDEs. More precisely, it follows from $u_t(x) = v_{T-t}(x)$ that $$ \mathbb{E}[f(X_T^{t,x})] = \mathbb{E}[f(X_{T-t}^{0,x})]. \qquad\qquad(3) $$ But the process $t \mapsto X_{T-t}^{0,x}$ is not even $\mathcal{F}$-adapted. So, question number 1: is there an intuition and a proof for (3) without arguing via the PDEs?
Further, many other results in stochastic analysis are formulated for processes of the form (2) and it seems to me that they are, at least formally, not directly applicable to processes of the form (1). Therefore, question number 2: why would I use the backward formulation? I can use the forward formulation instead, where all the other machinery is applicable, and then convert to the backward formulation at the very end. Are there any advantages/disadvantages that I am missing?
Cheers!
EDIT: I have received some helpful remarks from a colleague, reflected in my last comment below. Let me redirect the question a bit based on those. You may focus your attention on this edit, but any additional input to the original question is also highly appreciated!
I was misled to believe that the so-called forward formulation actually also holds in the case of time-dependent coefficients. Thus, in the time-dependent case, both questions 1 and 2 as posed above seemingly end up being void.
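To make the failure in the time-dependent case concrete, here is a small sanity check. It is my own toy example, not taken from the comments: I assume $d=1$, $\sigma \equiv 0$ (so the SDE degenerates to an ODE), $\mu_t(x) = t + x$ and $f(x) = x$. Then $X^{0,x}$ solves $\dot X_t = t + X_t$, $X_0 = x$, so that $$ v_t(x) = X_t^{0,x} = e^t x + e^t - 1 - t, \qquad \partial_t v_t(x) = e^t x + e^t - 1, $$ whereas $$ L_t v_t(x) = \mu_t(x)\, \partial_x v_t(x) = (t + x)\, e^t = e^t x + t e^t. $$ The two sides differ by $e^t - 1 - t e^t$, which is nonzero for every $t > 0$ (at $t = 1$ it equals $-1$), so $\partial_t v_t = L_t v_t$ fails here. What does hold in this example is $\partial_t v_t(x) = (L_t f)(X_t^{0,x}) = t + X_t^{0,x}$, i.e. the generator is applied to $f$ before evaluating along the path rather than to $v_t$.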
In the time-independent case, Bart and Sebastian in the comments have pointed in the right direction to understand (3). I understand the idea of their solution and can give an informal argument to support it, but this informal argument is based on a non-rigorous time change in the Itô integral. I would appreciate any pointers to a fully rigorous discussion of this time-homogeneity (textbook or directly as an answer).
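For reference, here is a sketch of the kind of time-shift argument I mean. The notation $\tilde W$, $\tilde X$ is mine, and I assume that $\mu$ and $\sigma$ do not depend on time and are nice enough for uniqueness in law to hold. For $s \in [0, T-t]$ set $\tilde W_s = W_{t+s} - W_t$ and $\tilde X_s = X_{t+s}^{t,x}$. Then $\tilde W$ is again a Brownian motion (with respect to the shifted filtration $\mathcal{F}_{t+s}$), and formally $$ \tilde X_s = x + \int_t^{t+s} \mu(X_r^{t,x})\, dr + \int_t^{t+s} \sigma(X_r^{t,x})\, dW_r = x + \int_0^s \mu(\tilde X_r)\, dr + \int_0^s \sigma(\tilde X_r)\, d\tilde W_r. $$ So $\tilde X$ solves the same SDE as $X^{0,x}$, only driven by $\tilde W$ instead of $W$. Uniqueness in law would then give that $\tilde X_{T-t} = X_T^{t,x}$ and $X_{T-t}^{0,x}$ have the same distribution, which is (3). Note that this is only an equality in law, not a pathwise one. The second equality in the display is the step I cannot justify rigorously: it is clear for simple integrands, and I would expect it to extend by approximation.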
Question 2 for the time-independent case still stands in a slightly different way: the proof I have seen for the (backward) Feynman-Kac formula involves the flow property of $X^{t_0,x}_t$. I do not see how the proof would go through for the forward version. So, is there a direct proof of the forward version or does one have to first prove the backward version and then use (3)? (Instead of going the other way around and deducing (3) from the assumed validity of both Feynman-Kac formulas as I had done above.)