The following question is from Introduction to Probability by Joe Blitzstein and Jessica Hwang:

Let $X_1, \cdots , X_n$ be i.i.d. r.v.s with mean $\mu$ and variance $\sigma^2$, and $n \geq 2$. A bootstrap sample of $X_1, \cdots , X_n$ is a sample of $n$ r.v.s $X_1^{\ast}, \cdots, X_n^{\ast}$ formed from the $X_j$ by sampling with replacement with equal probabilities. Let $\overline{X_n}^{\ast}$ denote the sample mean of the bootstrap sample: $$\overline{X_n}^{\ast} = \frac{1}{n} (X_1^{\ast} + \cdots + X_n^{\ast})$$

(a) Calculate E($X_j^{\ast}$) and Var($X_j^{\ast}$) for each $j$.

(b) Calculate E($\overline{X_n}^{\ast}|X_1, \cdots , X_n$) and Var($\overline{X_n}^{\ast}|X_1, \cdots , X_n$). Hint: Conditional on $X_1, \cdots , X_n$, the $X_j^{\ast}$ are independent, with a PMF that puts probability $\frac{1}{n}$ at each of the points $X_1, \cdots , X_n$. As a check, your answers should be random variables that are functions of $X_1, \cdots , X_n$.

(c) Calculate E($\overline{X_n}^{\ast}$) and Var($\overline{X_n}^{\ast}$).

(d) Explain intuitively why Var($\overline{X_n}$) < Var($\overline{X_n}^{\ast}$).

I thought that for part (a), the mean and variance of each bootstrap r.v. would be the same as the mean and variance of each of the original i.i.d. r.v.s, which would be $\mu$ and $\sigma^2$ respectively. However, I have seen some answers that use Adam's Law and Eve's Law, which confuses me, as I'm not sure why there is a need to condition on the original r.v.s.

Similarly, for part (b), I don't understand how the expectation and variance can be functions of the random variables $X_1, \cdots, X_n$, which in turn would make part (c) a straightforward application of Adam's Law and Eve's Law to the answers of part (b).

Hector Lombard

2 Answers

The precise meaning of "Conditional on $X_1, \cdots , X_n$, the $X_j^{\ast}$ are independent, with a PMF that puts probability $\frac{1}{n}$ at each of the points $X_1, \cdots , X_n$" is the following: for each measurable function $f\colon\mathbb R^n\to\mathbb R$, $$ \mathbb E\left[f\left(X_1^*,\dots,X_n^*\right)\mid X_1,\dots, X_n\right] =\int f(y_1,\dots,y_n)d\mu_{X_1,\dots,X_n}(y_1)\dots d\mu_{X_1,\dots,X_n}(y_n), $$ where $\mu_{X_1,\dots,X_n}(A)=\frac 1n\sum_{i=1}^n\delta_{X_i}(A)$.

In particular, with $f(y_1,\dots,y_n)=y_j$, one finds that $$ \mathbb E\left[X_j^*\mid X_1,\dots, X_n\right]=\int y_j\,d\mu_{X_1,\dots,X_n}(y_j)=\frac 1n\sum_{i=1}^nX_i. $$ For the computation of the conditional variance, you will need to compute terms of the form $\mathbb E\left[X_j^*X_k^*\mid X_1,\dots, X_n\right]$, $j\neq k$, for which you apply the previous result to $f(y_1,\dots,y_n)=y_jy_k$.
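This identity is easy to check numerically: fix one realized sample $x$, resample it many times with replacement, and average the $j$-th draw, which should approach the sample mean of $x$. A minimal sketch (the particular sample, seed, and sizes are arbitrary choices):

```python
import numpy as np

# Numeric check of E[X_j^* | X_1,...,X_n] = (1/n) sum_i X_i.
rng = np.random.default_rng(0)
n = 5
x = rng.normal(loc=2.0, scale=3.0, size=n)   # the conditioning sample

reps = 200_000
# each row is one bootstrap sample drawn from x with replacement
boot = rng.choice(x, size=(reps, n), replace=True)

j = 0
est = boot[:, j].mean()      # Monte Carlo estimate of E[X_j^* | X]
print(est, x.mean())         # the two values should be close
```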

Davide Giraudo

The statement

$X^*_1,\ldots,X^*_n$ are independent when conditioned by $X=(X_1,\ldots,X_n)$

does not seem correct. More specifically, denote by $\nu_j$ the number of times where $X_j$ has been chosen. Then $$\Pr (\nu_1=k_1,\ldots,\nu_n=k_n)=\frac{1}{n^n}\frac{n!}{k_1!\ldots k_n!}$$ for $k_1+\cdots+k_n=n$ while $X_j^*=X_j\nu_j.$ Therefore $$E(e^{s_1\nu_1+\cdots+s_n\nu_n})=\left(\frac{1}{n}(e^{s_1}+\cdots+e^{s_n})\right)^n,\ \ \ \ (*)$$ $$E(e^{s_1X_1^*+\cdots+s_nX_n^*}|X)=E(e^{s_1\nu_1X_1+\cdots+s_n\nu_nX_n}|X)=\left(\frac{1}{n}(e^{s_1X_1}+\cdots+e^{s_nX_n})\right)^n$$ and this last function of $s_1,\ldots,s_n$ is not a product of a function of $s_1$ alone by a function of $s_2$ alone and so on; which means that the said independence does not take place.

I have also computed $E(X_1^*)=E(\nu_1X_1)=E(\nu_1)E(X_1)=E(X_1)$ since $\nu_1\sim \mathrm{Bin}(n,1/n)$ is binomial. Also $E(\nu_1^2)= 2-\frac{1}{n}$, leading to $$\sigma^2(X_1^*)=\sigma^2(X_1)+\left(1-\frac{1}{n}\right)E(X_1^2).$$ Thus your claim that $\sigma^2(X_1^*)=\sigma^2(X_1)$ does not seem correct.

I have also computed $\sigma^2(\overline{X_n})<\sigma^2(\overline{X_n^*}),$ but the calculation is a bit too long to be displayed here. It uses (*) for computing $E(\nu_1\nu_2).$
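For what it is worth, the inequality $\sigma^2(\overline{X_n})<\sigma^2(\overline{X_n^*})$ can also be checked by simulation under the i.i.d.-resampling reading of the bootstrap (a sketch only; the normal population and the parameters are arbitrary choices, and the theoretical values $\sigma^2/n$ and $(2n-1)\sigma^2/n^2$ come from Adam's and Eve's laws under that reading):

```python
import numpy as np

# Compare Var(Xbar_n) with Var(Xbar_n^*) by Monte Carlo.
rng = np.random.default_rng(1)
n, sigma, reps = 4, 2.0, 300_000

samples = rng.normal(0.0, sigma, size=(reps, n))
xbar = samples.mean(axis=1)                  # ordinary sample means

# one bootstrap resample per original sample (resample within each row)
idx = rng.integers(0, n, size=(reps, n))
boot = np.take_along_axis(samples, idx, axis=1)
xbar_star = boot.mean(axis=1)                # bootstrap sample means

print(xbar.var(), sigma**2 / n)                       # both ≈ 1.0
print(xbar_star.var(), (2*n - 1) * sigma**2 / n**2)   # both ≈ 1.75
```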

  • It would be really incredible if the bootstrap random variables are not independent after conditioning on the sample. I don’t see how $X^*_j=\nu_j X_j$? – Andrew Aug 04 '23 at 11:26
  • In fact, I would say that by definition, $X_1^*,\ldots,X_n^*\overset{\text{i.i.d.}}{\sim}\mathrm{Unif}(X_1,\ldots,X_n)$ – Andrew Aug 04 '23 at 12:11
  • so clearly I do not understand what the bootstrap is. Thanks. – Gérard Letac Aug 04 '23 at 12:57
  • ....since $X_1^*$ was interpreted as the number of times where $X_1$ is picked up multiplied by $X_1$ itself, while it seems that $X_1^*$ is only the first $X_j$ picked up. – Gérard Letac Aug 07 '23 at 06:57