I am trying to understand the connection between white noise and the Wiener process in the context of SDEs. One starts with a differential equation driven by white noise $\xi_t$, e.g.,
$\frac{dX_t}{dt}=a(X_t,t)+b(X_t,t)\xi_t$
and some initial value $X_0$. This is equivalent to the integral equation
$X_t=X_0+\int_{0}^{t}a(X_s,s)ds+\int_{0}^{t}b(X_s,s)\xi_sds$.
And now comes the step I do not understand. The second integral is equal to the Itô integral
$\int_{0}^{t}b(X_s,s)dW_s$
where $\{W_t\}_{t\geq0}$ is a Wiener process.
Why is that? How can one explain this in a mathematically rigorous way? Many textbooks just argue that the white noise is some kind of derivative of the Wiener process, i.e., "$\frac{dW}{dt}=\xi_t$", but do not go into more detail. Why can we "replace $\xi_t\,dt$ by $dW_t$"?
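To make the heuristic concrete, here is a minimal numerical sketch of how I read it on a time grid. It is only an illustration under my own assumptions, not a justification: I take $b$ to be a deterministic function of $t$ (hypothetical, to keep things simple), and I define the discretized "white noise" as $\xi_k=\Delta W_k/\Delta t$, so that it has variance $1/\Delta t$. The function name and parameters are made up for this example.

```python
import numpy as np

def discretized_noise_integral(b, T=1.0, n=1000, seed=0):
    """Compare sum b(t_k) xi_k dt with sum b(t_k) dW_k on a uniform grid.

    Assumption: b depends on t only, and xi_k is *defined* as dW_k / dt.
    """
    rng = np.random.default_rng(seed)
    dt = T / n
    t = np.arange(n) * dt                       # left endpoints t_k (Ito convention)
    dW = rng.normal(0.0, np.sqrt(dt), size=n)   # Wiener increments, variance dt
    xi = dW / dt                                # discretized white noise, variance 1/dt
    riemann = np.sum(b(t) * xi * dt)            # Riemann-type sum against xi dt
    ito = np.sum(b(t) * dW)                     # Ito-type sum against dW
    return riemann, ito, xi.var(), dt

riemann, ito, xi_var, dt = discretized_noise_integral(np.cos)
print(riemann, ito)        # the two sums agree up to floating-point rounding
print(xi_var * dt)         # close to 1, i.e. Var(xi_k) is roughly 1/dt
```

On the grid the two sums agree by construction, and the variance of $\xi_k$ blows up like $1/\Delta t$ as the grid is refined. What I am missing is why this identification survives the limit $\Delta t\to0$.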
For reference, here is the definition of a white noise process I am working with:
A white noise process is defined to be a generalized wide-sense stationary Gaussian process $Z_t$ with mean zero and covariance function $E[Z_sZ_t]=\delta_0(t-s)$. Here $\delta_0$ is the Dirac delta function at 0.
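With this definition (taking $Z_t=\xi_t$), a purely formal computation at least makes the claim plausible. Here I treat $\xi$ as if it were an ordinary function and freely exchange expectation and integration, which is exactly the part I cannot justify. Setting $W_t:=\int_0^t\xi_u\,du$, one gets for $s,t\geq0$

$$E[W_sW_t]=\int_0^s\int_0^tE[\xi_u\xi_v]\,dv\,du=\int_0^s\int_0^t\delta_0(v-u)\,dv\,du=\int_0^s\mathbf{1}_{\{u<t\}}\,du=\min(s,t),$$

which is the covariance function of a Wiener process. So the heuristic seems consistent, but I do not see how to turn it into a rigorous statement about the integral $\int_0^tb(X_s,s)\xi_s\,ds$.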