Update:
I am not able to come up with an expression for the desired expected value. One approach that looks promising is to split the region $D_n<1$ by orderings of the $X_i$ (as below), and to integrate in an $n$-D basis containing $\begin{bmatrix}1&1&\cdots&1\end{bmatrix}^T$ and the coefficient vector of $D_n$ for that ordering. This isn't too hard to do for a given permutation, but it turns out that permutations with the same number of terms in $D_n$ can have different hypervolumes. It might be possible to find a stricter notion of equivalence between permutations, but a pattern in the values of the integrals would still need to emerge.
Using simulations, I was able to find that the first few terms of the summation are
$$P[D_n<1]=\frac{a_n}{n!2^{n-3}}$$
$$a_n=1,\ 5,\ 26,\ 133,\ 662,\ 3210,\ 15220,\ 274000\pm8,\ \dots\qquad(n\geq2)$$
Evaluating the sum as described below with these $P[D_n<1]$ up to $n=9$, we get $3.8186$, which is consistent with your simulations.
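For reference, here is a minimal sketch of the kind of Monte Carlo estimate used. It assumes the original problem draws $X_0\sim\mathbb{U}(0,1)$ as well (my assumption about the setup; the variable names are mine, and the $a_n$ indexing follows the formula above):

    %% Monte Carlo sketch for P[D_n < 1] (assumes X_0 ~ U(0,1))
    N_iter = 1e6;
    n_max = 9;
    X = rand(N_iter, n_max+1);              % columns: X_0, X_1, ..., X_{n_max}
    D_tot = cumsum(abs(diff(X,1,2)), 2);    % D_tot(:,n) = D_n
    P_est = mean(D_tot < 1, 1);             % estimates of P[D_n < 1], n = 1..n_max
    a_est = P_est .* factorial(1:n_max) .* 2.^((1:n_max)-3)  % candidate a_n values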
Answer to related problem, assumes $X_0=0$:
I have come up with an analytic expression for the expected value. It may be that the expression can be simplified with a better understanding of group theory, but I highly doubt that the infinite sum can be resolved into a closed-form expression.
First of all, let $D_n=\sum_{i=1}^n\lvert X_i-X_{i-1}\rvert$ be the random variable which is the total distance traveled after $n$ steps, and let $N=\min\{n:D_n\geq1\}$ be the number of steps needed. As a step toward finding $E[N]$, we want to find $P[D_n<1]$.
Now, $D_n$ has all those nasty absolute values, but we can get rid of them by splitting $P[D_n<1]$ into every possible ordering-by-magnitude of $X_i$:
$$P[D_n<1] = P[(X_1>X_2>\cdots>X_n) \wedge (D_n<1)] + \cdots +P[(X_{i_1}>X_{i_2}>\cdots>X_{i_n}) \wedge (D_n<1)]+\cdots$$
Let $S_n$ be the set of permutations of $n$ elements, and let $\sigma=[i_1,i_2,\dots,i_n]\in S_n$. Then we can let $I(\sigma)$ be the event that $X_{i_1}>X_{i_2}>\cdots>X_{i_n}$. Also let $\sigma[j]=i_j$.
Let us now define $\epsilon_i(\sigma)$ to have unit magnitude, positive whenever $X_i>X_{i-1}$ and negative whenever $X_{i-1}>X_i$ (note that $\epsilon_1=1$ always, since $X_1>X_0=0$). This can be done by defining $\sigma^{-1}$ to be the inverse permutation of $\sigma$, and defining
$$\epsilon_1(\sigma)=1;\qquad\epsilon_i(\sigma)=\left\{\begin{matrix}1&{\rm when\ }\sigma^{-1}[i]<\sigma^{-1}[i-1]\\-1&{\rm when\ }\sigma^{-1}[i]>\sigma^{-1}[i-1]\end{matrix}\right.,\ 2\leq i\leq n$$
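To make the definition concrete: take $n=3$ and $\sigma=[2,3,1]$, i.e. the event $X_2>X_3>X_1$. Then $\sigma^{-1}=[3,1,2]$, so $\epsilon_2(\sigma)=1$ (since $\sigma^{-1}[2]=1<3=\sigma^{-1}[1]$, matching $X_2>X_1$) and $\epsilon_3(\sigma)=-1$ (since $\sigma^{-1}[3]=2>1=\sigma^{-1}[2]$, matching $X_3<X_2$).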
This gives us that, on the event $I(\sigma)$, $D_n=\sum_{i=1}^n\epsilon_i(\sigma)(X_i-X_{i-1})$. We can simplify this a little more by collecting the coefficient of each $X_i$ (using $X_0=0$) and defining
$$c_i(\sigma)=\epsilon_i(\sigma)-\epsilon_{i+1}(\sigma),\ 1\leq i<n;\qquad c_n(\sigma)=\epsilon_n(\sigma)$$
Now for a given $\sigma$ we have
$$D_n=\sum_{i=1}^nc_i(\sigma)X_i$$
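Continuing the example $\sigma=[2,3,1]$: $c_1=\epsilon_1-\epsilon_2=0$, $c_2=\epsilon_2-\epsilon_3=2$, and $c_3=\epsilon_3=-1$, so on this event $D_3=2X_2-X_3$, exactly what one gets by expanding $X_1+(X_2-X_1)+(X_2-X_3)$ directly.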
We also have that
$$P[D_n<1]=\sum_{\sigma\in S_n}P\left[I(\sigma)\wedge\sum_{i=1}^nc_i(\sigma)X_i<1\right]$$
Now to find $P\left[I(\sigma)\wedge\sum_{i=1}^nc_i(\sigma)X_i<1\right]$, we observe that the event is a subset of the $n$-D unit hypercube. It turns out that this region is an $n$-simplex (e.g. triangle, tetrahedron, pentachoron) with vertices at $x_0,x_1,\dots,x_n$, where $x_0$ is the origin and $x_k$ has its coordinates in dimensions $\sigma[j],\ j\leq k$, equal to a common constant and the rest zero. Additionally, the vertices $x_1,\dots,x_n$ satisfy the equation
$$\sum_{i=1}^nc_i(\sigma)x_{ki}=1,\qquad x_k=\begin{bmatrix}x_{k1}\\x_{k2}\\\vdots\\x_{kn}\end{bmatrix}$$
Thus we have that
$$x_{kj}=\frac{1}{\sum_{i=1}^kc_{\sigma[i]}(\sigma)}\left\{\begin{matrix}1&{\rm when\ }\sigma^{-1}[j]\leq k\\0&{\rm when\ }\sigma^{-1}[j]> k\end{matrix}\right.$$
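In the running example $\sigma=[2,3,1]$, the partial sums $\sum_{i=1}^kc_{\sigma[i]}(\sigma)$ are $c_2=2$, $c_2+c_3=1$, and $c_2+c_3+c_1=1$, giving vertices $x_1=(0,\tfrac{1}{2},0)$, $x_2=(0,1,1)$, $x_3=(1,1,1)$; each indeed satisfies $2x_{k2}-x_{k3}=1$.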
The hypervolume of the $n$-simplex is known to be $\frac{1}{n!}$ times the volume of the parallelotope (e.g. parallelogram, parallelepiped) with the same edges $x_k-x_0$, so it can be calculated as
$$V_\sigma=\frac{1}{n!}\left\lvert\det\begin{bmatrix}x_1&x_2&\cdots&x_n\end{bmatrix}\right\rvert$$
Now this matrix can be transformed by row rearrangement into a triangular matrix with one non-zero element from each $x_k$ on the diagonal. The determinant is then the product of the diagonal elements, so
$$V_\sigma=\frac{1}{n!}\prod_{k=1}^n\frac{1}{\sum_{i=1}^kc_{\sigma[i]}(\sigma)}$$
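For $\sigma=[2,3,1]$ this evaluates to $V_\sigma=\frac{1}{3!}\cdot\frac{1}{2\cdot1\cdot1}=\frac{1}{12}$, in agreement with the determinant of the vertices found above: $\frac{1}{3!}\left\lvert\det\begin{bmatrix}x_1&x_2&x_3\end{bmatrix}\right\rvert=\frac{1}{6}\cdot\frac{1}{2}$.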
This volume in the event space is the probability $P\left[I(\sigma)\wedge\sum_{i=1}^nc_i(\sigma)X_i<1\right]$, so we can recombine the permutations to get the probability we have not reached the required distance in $n$ steps:
$$P[D_n<1]=\sum_{\sigma\in S_n}\frac{1}{n!\prod_{k=1}^n\sum_{i=1}^kc_{\sigma[i]}(\sigma)}$$
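As a quick sanity check, take $n=2$. For $\sigma=[1,2]$ (the event $X_1>X_2$) we get $c=(2,-1)$ with partial sums $2$ and $1$, contributing $\frac{1}{2!\cdot2\cdot1}=\frac{1}{4}$; for $\sigma=[2,1]$ we get $c=(0,1)$ with partial sums $1$ and $1$, contributing $\frac{1}{2!}=\frac{1}{2}$. Hence $P[D_2<1]=\frac{3}{4}$, which matches direct integration and, together with $P[D_1<1]=1$, gives the $n=2$ entry $1+1+\frac{3}{4}=2.75$ in the table below.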
Now, to determine the expected value of the number of steps needed, we observe that $D_n<1$ holds exactly when more than $n$ steps are needed, so $P[D_n<1]=P[N>n]$. Then
\begin{align*}
E[N]&=\sum_{n=1}^\infty nP[N=n] \\
&=1-P[N>1]+\sum_{n=2}^\infty n(P[N>n-1]-P[N>n]) \\
&=1+\sum_{n=1}^\infty P[N>n] \\
&=\boxed{1+\sum_{n=1}^\infty\sum_{\sigma\in S_n}\frac{1}{n!\prod_{k=1}^n\sum_{i=1}^kc_{\sigma[i]}(\sigma)}}
\end{align*}
Now, I highly doubt that this expression can be reduced to a closed form, but the infinite sum seems to converge fairly quickly. Implementing this expression in MATLAB, I was able to calculate the first 10 terms within a minute, but it seems to run in at least $O(n!)$ time. The results are shown below:
$$\begin{array}{c|c}
\text{Max }n & E[N] \\ \hline
1 & 2.00000 \\
2 & 2.75000 \\
3 & 3.16667 \\
4 & 3.34896 \\
5 & 3.41458 \\
6 & 3.43464 \\
7 & 3.43996 \\
8 & 3.44120 \\
9 & 3.44146 \\
10 & 3.44151
\end{array}$$
From this sequence, we can conclude that the true expected value is probably around $E[N]=3.4416$.
Simulating the process for $10^6$ iterations with 10 trials, we get the sorted results:
$$3.4389,\ 3.4399,\ 3.4410,\ 3.4412,\ 3.4416,\ 3.4417,\ 3.4418,\ 3.4420,\ 3.4421,\ 3.4424$$
which do appear to be centered around the calculated value of 3.4416.
Code:
%% Formula Calculation
N = 10;
P = zeros(N,1);                % P(n) will hold P[D_n < 1]
for n = 1:N
    p = perms(1:n);            % all n! permutations sigma, one per row
    c = zeros(size(p));        % c(s,i) = c_i(sigma) for the permutation in row s
    c(:,1) = 1;                % contribution of epsilon_1 = 1
    for ii = 2:n
        c_plus = zeros(size(p));
        c_plus(:,ii) = 1;
        c_plus(:,ii-1) = -1;
        % epsilon_ii = +1 iff ii appears before ii-1 in sigma,
        % i.e. sigma^{-1}[ii] < sigma^{-1}[ii-1]
        is_pos = find(p'==ii) < find(p'==ii-1);
        c = c + (2*is_pos-1).*c_plus;   % add epsilon_ii*(e_ii - e_{ii-1})
    end
    r = zeros(size(p));
    for ii = 1:size(p,1)
        r(ii,:) = c(ii,p(ii,:));        % r(s,k) = c_{sigma[k]}(sigma)
    end
    P(n) = sum(1./prod(cumsum(r,2),2))/factorial(n);
end
E = 1 + cumsum(P);             % partial sums of E[N] = 1 + sum_n P[D_n < 1]
E(end)
%% Simulation
E_est = zeros(10,1);
N_iter = 1e6;
for ii = 1:10
    X = [zeros(N_iter,1), rand(N_iter,30)];  % X_0 = 0, then 30 uniform steps (more than enough)
    D = abs(diff(X,1,2));                    % step lengths |X_i - X_{i-1}|
    D_tot = cumsum(D,2);                     % running totals D_n
    Y = sum(D_tot<1,2)+1;                    % N = first n with D_n >= 1
    E_est(ii) = mean(Y);
end
sort(E_est)
Although the $X_i$ are independent by definition, $Y_1$ and $Y_2$ can be correlated, since each of them depends on the same random variable $X_1$. In fact, we can show that $E[Y_2]\neq E[Y_2\mid Y_1]$.
In general, we can easily compute $E[Y_2]=\frac{1}{3}$. But suppose we knew $Y_1=1$, and we compute $E[Y_2\mid Y_1=1]$. $Y_1=1\Rightarrow X_1\in\{1,2\}$, so $E[Y_2\mid Y_1=1]=E[Y_2\mid X_1\in\{1,2\}]$. Whether $X_1=1$ or $X_1=2$, we observe that $Y_2\sim\mathbb{U}(0,1)$, meaning $E[Y_2\mid Y_1=1]=\frac{1}{2}\neq E[Y_2]$.