
Let's say we have a bowl containing $n$ many $+1$'s and $n$ many $-1$'s. You sample numbers from the bowl randomly without replacement. Let $k_1^{(n)}, k_2^{(n)}, \dots, k_{2n}^{(n)}$ denote the resulting random sequence of draws.

Define the partial sum process $$S_0=0 \; , \; S_a=\sum_{i=1}^a k_i^{(n)} \; , \; 1\leq a\leq 2n.$$ We can define a sequence of continuous processes by scaling time and space as in Donsker's theorem:

$$X_{\frac{a}{2n}}^{(n)} = \frac{S_a}{\sqrt{n}} \; , \; a=0,1,2,...,2n$$

By linearly interpolating we get

$$X_{t}^{(n)} = X^{(n)}_{\frac{\lfloor 2nt\rfloor}{2n}} + \frac{2nt-\lfloor 2nt\rfloor}{\sqrt{n}}\, k_{\lfloor 2nt\rfloor+1}^{(n)} \; , \; 0 \leq t\leq 1$$
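(Not needed for the proof, but for intuition, here is a minimal simulation sketch of this construction in Python/NumPy; the helper name `bridge_path`, the seed, and the grid size are arbitrary illustrative choices.)

```python
import numpy as np

def bridge_path(n, rng, num_grid=1000):
    """Sample one path of X^{(n)} on a uniform grid in [0, 1].

    Draws a random permutation of n (+1)'s and n (-1)'s (sampling without
    replacement), forms the partial sums S_0, ..., S_{2n}, scales space by
    1/sqrt(n) as in the question, and linearly interpolates in time.
    """
    bowl = np.concatenate([np.ones(n), -np.ones(n)])
    k = rng.permutation(bowl)                       # k_1, ..., k_{2n}
    S = np.concatenate([[0.0], np.cumsum(k)])       # S_0, S_1, ..., S_{2n}
    t_grid = np.linspace(0.0, 1.0, num_grid)
    # piecewise-linear interpolation of a -> S_a / sqrt(n), evaluated at a = 2nt
    X = np.interp(2 * n * t_grid, np.arange(2 * n + 1), S) / np.sqrt(n)
    return t_grid, X

rng = np.random.default_rng(0)
t, X = bridge_path(n=500, rng=rng)
print(X[0], X[-1])   # both endpoints are 0: the path is pinned like a bridge
```

Every sampled path starts and ends at $0$, since the bowl contains equally many $+1$'s and $-1$'s; this pinning is why a bridge rather than a Brownian motion appears in the limit.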

How can we prove that the process $X^{(n)}$ converges weakly to a constant multiple of the standard Brownian bridge?

Eddie E.
    Any individual configuration of $(S_a)_{a \leq 2n}$ has probability $\binom{2n}{n}^{-1}$, same as a simple random walk conditioned to end at $0$ after $2n$ steps. But after this I don't know what to do... – shalin Jan 25 '18 at 03:12
    Not sure if you are allowed to use some advanced results, but notice that $(S_j : 0\leq j \leq 2n)$ has the same distribution as $(\tilde{S}_j : 0\leq j \leq 2n)$ conditioned on $\tilde{S}_{2n} = 0$, where $\tilde{S}$ is a simple random walk on $\mathbb{Z}$. So you can apply the invariance principle for the Brownian bridge. (For instance, check this paper.) – Sangchul Lee Jan 25 '18 at 09:57

1 Answer


I had this question too and wanted to share my attempt at a sketch of a proof. Although it falls short on both completeness and rigor, I hope it moves in a generally correct direction; hopefully someone with greater prowess is willing to help me fill in the missing bits! (On that note, if this doesn't meet the guidelines for answers, I'll gladly remove it or convert it to a comment.)

The goal is to use the characterization of the Brownian bridge as the unique Gaussian process $X_t$, $t\in[0, 1]$, with $\mathop{\mathbf{E}}X_t=0$ and $\mathbf{Cov}[X_s,X_t] = \min(s,t)(1-\max(s,t))$.

Note first that in the above, for fixed $n$, the $k_i$ are identically distributed but not independent two-point random variables with $\mathbf{P}(k_i=1)=\mathbf{P}(k_i=-1)=\frac{1}{2}$, hence $\mathop{\mathbf{E}}k_i=0$ and $\mathop{\mathbf{Var}}k_i=1$. For $i\neq j$, conditioning on $k_i$ leaves $n-1$ matching and $n$ opposite values in the bowl, so the covariance is:

$$\mathbf{Cov}[k_i,k_j]=\mathop{\mathbf{E}}k_ik_j=\frac{n-1}{2n-1}-\frac{n}{2n-1}=-\frac{1}{2n-1}$$
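(A quick Monte Carlo sanity check of this value, purely illustrative and not part of the argument; by exchangeability it suffices to look at the first two draws.)

```python
import numpy as np

# Monte Carlo check of E[k_i k_j] = -1/(2n-1) for i != j, when sampling
# without replacement from a bowl of n (+1)'s and n (-1)'s.
n, trials = 20, 200_000
rng = np.random.default_rng(1)
bowl = np.concatenate([np.ones(n), -np.ones(n)])
prods = np.empty(trials)
for m in range(trials):
    k = rng.permutation(bowl)
    prods[m] = k[0] * k[1]        # any fixed pair i != j works, by exchangeability
print(prods.mean(), -1.0 / (2 * n - 1))   # the two numbers should roughly agree
```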

Redefining the scaled process to be $X_\frac{a}{2n}^{(n)}=\frac{1}{\sqrt{2n}}S_a$, by linearity of expectation we immediately have: $$\mathop{\mathbf{E}}X_\frac{a}{2n}^{(n)} = \frac{1}{\sqrt{2n}}\mathop{\mathbf{E}}\sum_{i=1}^{a}k_i=\frac{1}{\sqrt{2n}}\sum_{i=1}^{a}\mathop{\mathbf{E}}k_i=0$$

As for the covariance, let $a\leq b$ without loss of generality. The double sum $\sum_{i=1}^{a}\sum_{j=1}^{b}k_ik_j$ has $a$ diagonal terms with expectation $1$ each and $a(b-1)$ off-diagonal terms with expectation $-\frac{1}{2n-1}$ each, so: $$\mathbf{Cov}\left[X_\frac{a}{2n}^{(n)},X_\frac{b}{2n}^{(n)}\right]=\frac{1}{2n}\mathop{\mathbf{E}}\sum_{i=1}^{a}k_i\sum_{j=1}^{b}k_j=\frac{1}{2n}\left(a\cdot 1-\frac{a(b-1)}{2n-1}\right)=\frac{a}{2n}\left(1-\frac{b-1}{2n-1}\right)$$
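(Again purely as a sanity check, one can compare this formula against an empirical covariance for a moderate $n$; the particular values of $n$, $a$, $b$ below are arbitrary.)

```python
import numpy as np

# Empirical check of Cov[X_{a/2n}, X_{b/2n}] = (a/2n) * (1 - (b-1)/(2n-1))
# for a <= b, using the scaling X_{a/2n} = S_a / sqrt(2n) from above.
n, a, b, trials = 50, 30, 70, 100_000
rng = np.random.default_rng(2)
bowl = np.concatenate([np.ones(n), -np.ones(n)])
xa, xb = np.empty(trials), np.empty(trials)
for m in range(trials):
    S = np.cumsum(rng.permutation(bowl))            # S_1, ..., S_{2n}
    xa[m], xb[m] = S[a - 1] / np.sqrt(2 * n), S[b - 1] / np.sqrt(2 * n)
empirical = np.cov(xa, xb)[0, 1]
predicted = (a / (2 * n)) * (1 - (b - 1) / (2 * n - 1))
print(empirical, predicted)   # should roughly agree
```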

Passing to the limit (if we are allowed to do so): with $a=\lfloor 2ns\rfloor$ and $b=\lfloor 2nt\rfloor$ for $s\leq t$, the right-hand side converges to $s(1-t)=\min(s,t)(1-\max(s,t))$ as $n\to\infty$. We thus obtain the desired mean and covariances for the limiting process.

It also remains to be shown that the limiting process is Gaussian, which would follow from a suitable central limit theorem, if one exists! Note here that the $k_i^{(n)}$ are "asymptotically independent" in the sense that $\mathbf{P}(k_i^{(n)}=x\mid k_j^{(n)}=y) = \frac{n-\mathbb{1}(x=y)}{2n-1}\rightarrow\frac{1}{2}$ as $n\rightarrow\infty$. Whether that is sufficient for a central limit theorem I unfortunately couldn't determine, but I'll keep poking around! (Note that even pure uncorrelatedness is not sufficient; see e.g. this answer.)
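(As a purely numerical illustration of the Gaussian-limit claim, not a substitute for a CLT: the marginal $X_t^{(n)}$ at a fixed $t$ should have variance close to $t(1-t)$ and roughly normal tail mass; the parameters below are arbitrary.)

```python
import numpy as np

# Numerical illustration only: with the sqrt(2n) scaling, the marginal
# X_t^{(n)} at a fixed t should be approximately N(0, t(1-t)) for large n.
n, trials, t = 200, 20_000, 0.3
a = int(2 * n * t)                                  # time index a = 2nt
rng = np.random.default_rng(3)
bowl = np.concatenate([np.ones(n), -np.ones(n)])
samples = np.array([np.cumsum(rng.permutation(bowl))[a - 1] / np.sqrt(2 * n)
                    for _ in range(trials)])
print(samples.var(), t * (1 - t))                   # empirical vs. limiting variance
sd = np.sqrt(t * (1 - t))
print(np.mean(np.abs(samples) < sd))                # close to 0.683 if roughly Gaussian
```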

Eddie E.