1

Let $a_1, ..., a_N$ denote the alternating sequence $-1, 1, -1, 1, ..., -1, 1$ (assume $N$ is even for simplicity). Let $x_1, ..., x_N$ denote a sequence obtained by random sampling without replacement from the entire alternating sequence. Consider the random variable: \begin{equation} s_N = N^{-1/2} \sum_{n=1}^N x_n a_n . \end{equation} For finite $N$, $s_N$ is a discrete random variable and when $N$ is small it is straightforward to just write the distribution down. For example, if $N=2$, then $\mathbb{P}(s_N = -2/\sqrt{2}) = 1/2$ and $\mathbb{P}(s_N = 2/\sqrt{2}) = 1/2$. As $N$ gets large, the number of possible outcomes for $s_N$ gets very large.

Interestingly, under simulation, when $N$ is large, $s_N$ starts to look an awful lot like a like a Standard Normal random variable.

Question: Does $s_N$ have a CLT as $N \rightarrow \infty$? If so, how would one go about proving this?

UPDATE: md5 method is very nice. However I've also realized that once you write down the finite sample distribution as md5 has, this is pretty much all you need to do, since that probability distribution just so happens to be a special case of the Hypergeometric distribution, which further just so happens to satisfy the assumptions in equation 7.4 in Chapter 7 of Feller's An Introduction to Probability Theory and its Applications: Volume 1 and so the Normal limit is known. See also this question here

2 Answers2

2

Here it is not too hard to explicitly compute the distribution of $s_{2n}$. By counting the number of $x_k$ that match $a_k$, we see that for all $x\in\{0,\ldots,n\}$, $$ P(\sqrt{2n}s_{2n}=2n-4x)=\frac{\binom{n}{x}\binom{n}{n-x}}{\binom{2n}{n}}. $$ Now it suffices to apply some local limit theorem. What we need to prove is a statement of the form: for all $$ x_n=\frac{n}{2}+\frac{\sqrt{n}t}{4}+o(\sqrt{n}), $$ it holds that $P(s_{2n}=x_n)/\sqrt{n}\to e^{-t^2/2\sigma^2}/\sqrt{2\pi\sigma^2}.$ This should not be too hard, using standard estimates such as $$ \binom{n}{\frac{n}{2}+\sqrt{n}t/4+o(\sqrt{n})}2^{-n}\sim\sqrt{\frac{2}{\pi n}}e^{-t^2/2}. $$

md5
  • 2,853
  • This is a great answer, thank you, although I must admit I'm having difficulty finding the asymptotic estimate you've given at the end of your answer. You don't have a reference for it by any chance? – Colin T Bowers Feb 04 '22 at 06:46
  • @ColinTBowers - I don't have any precise reference in mind, but this is how you would get the CLT for i.i.d. random variables uniform in ${0,1}$ using a local limit theorem. In fact I haven't double checked the constants here (looking at it, maybe it should have been $\sqrt{n}t/2$ instead?). Have you tried to estimate the LHS with Stirling? – md5 Feb 04 '22 at 10:03
  • I'd mainly just checked the asymptotic approximations in Shagnik Das's note on binomial coefficients, but I'll have a go at using Stirling on Monday. Thanks again. – Colin T Bowers Feb 05 '22 at 01:09
  • I've added an answer below to give some detail to the method in your answer. It all works perfectly until the final step where we are out by a multiple of 2. Apologies, this is a bit of an imposition, but would you mind having a quick look at my answer and seeing if you can spot where I've gone wrong? I've been banging my head against the wall on this one all day. – Colin T Bowers Feb 17 '22 at 11:47
  • Actually I've just realized the distribution you've written is a special case of the Hypergeometric distribution which satisfies the conditions discussed here and so the Normal limit is known. Thanks again. – Colin T Bowers Feb 18 '22 at 04:27
0

This answer is here to flesh out the detail in the method provided by md5 in the accepted answer.

As md5 notes, given the setup: \begin{equation} \mathbb{P}(s_{2n} = -2n + 4k) = \frac{\binom{n}{k} \binom{n}{n-k}}{\binom{2n}{n}} , \quad k = 0, 1, ..., n . \end{equation} A well known result is that is also straightforward to establish via Stirling's approximation is: \begin{equation} \binom{2n}{n} = 2^{2n} \frac{1}{\sqrt{\pi n}} (1 + O(1/n)) . \end{equation} Substituting this in to the above, and noting that $\binom{n}{n-k} = \binom{n}{k}$ gives: \begin{equation} \mathbb{P}(s_{2n} = -2n + 4k) = \sqrt{\pi n} \binom{n}{k} 2^{-n} \binom{n}{k} 2^{-n} (1 + O(1/n)) . \end{equation} Another well known result sometimes referred to as the Normal approximation to the binomial means that if $d = O(\sqrt{n})$, then: \begin{equation} \binom{n}{\frac{n}{2} - d} 2^{-n} = \frac{1}{\sqrt{\pi \frac{n}{2}}} e^{-2d^2 / n} + O(1/n^{3/2}) \end{equation} Setting $k = \frac{n}{2} - d$, and using this approximation, we get: \begin{equation} f(x) = \mathbb{P}(s_{2n} = -2n + 4k) = \sqrt{\pi n} \frac{2}{\pi n}e^{-4d^2 / n} (1 + O(1/n)) . \end{equation} $x$ denotes outcomes for $s_{2n}$, so $x = -2n + 4k$, which means: \begin{equation} k = \frac{x + 2n}{4} . \end{equation} Also by our definitions $d = \frac{n}{2} - k$, so subbing in for $k$: \begin{equation} d = \frac{n}{2} - \frac{x + 2n}{4} , \end{equation} which simplifies to $d = \frac{x}{4}$. Thus: \begin{equation} -\frac{4d^2}{n} = -\frac{x^2}{2(2n)} . \end{equation} Subbing in to our probability function gives: \begin{equation} f(x) = \sqrt{\pi n} \frac{2}{\pi n}e^{-x^2 / 2(2n)} (1 + O(1/n)) . \end{equation} At this point things are looking good, because our exponent term is correct for a Normal distribution with zero mean and variance $2n$, which is exactly what we want. The problem comes when we evaluate the term in front of the exponent. We want the term in front of the exponent to be: \begin{equation} \frac{1}{\sqrt{2n} \sqrt{2 \pi}} , \end{equation} because the standard deviation is $\sqrt{2n}$. But instead we have: \begin{equation} \frac{2}{\sqrt{n} \sqrt{\pi}} . \end{equation} Everything would be correct if that $2$ was in the denominator instead of the numerator, but it isn't, and I can't see where I went wrong here.