
I am trying to learn from the paper
[paper] http://proceedings.mlr.press/v139/karimireddy21a/karimireddy21a.pdf
[proofs] http://proceedings.mlr.press/v139/karimireddy21a/karimireddy21a-supp.pdf ,
where we estimate the average of vectors $\mathbf{x} \in \mathbb{R}^{d}$ in Euclidean space
and derive an upper bound on the variance of this estimate.

Let us denote the average and its expectation as
\begin{align}
(D1) ~~~ \bar{\mathbf{x}} &:= \frac{1}{|\mathcal{G}|}\sum_{i\in \mathcal{G}}\mathbf{x}_{i} \\
\pmb{\mu} &:= \mathbb{E}[\bar{\mathbf{x}}].
\end{align}
The number of samples $|\mathcal{G}|$ is fixed across sampling trials, and the samples $\mathbf{x}_{i} \in \mathbb{R}^{d}$ are
independently and identically distributed (i.i.d.) according to some (unknown) probability density function.

We are given the assumption
\begin{align}
(A1) ~~~~~~~~~~ \mathbb{E}[\|\mathbf{x}_{i} - \mathbf{x}_{j}\|_2^{2}] \leq \rho^{2} &&\forall i,j \in \mathcal{G}
\end{align}
that the squared distance between any two samples is bounded in expectation (Definition C in the paper), and the goal is to show (Appendix D, Proof of Theorem IV)
\begin{align}
(R1) ~~~~~ \mathbb{E}[\|\bar{\mathbf{x}} - \pmb{\mu}\|_{2}^{2}] \leq \frac{\rho^{2}}{|\mathcal{G}|}
\end{align}
that the variance of the mean estimator (D1) is upper-bounded by the right-hand side, using Assumption (A1). The intuition behind (R1) is that the bounded pairwise distances from Assumption (A1) reduce the variance by a factor of $\frac{1}{|\mathcal{G}|}$, the number of samples per trial. However, I only end up with the much looser bound $\rho^{2} = \frac{\rho^{2}}{1} \geq \frac{\rho^{2}}{|\mathcal{G}|}$, so I write down my attempt below to gather advice.
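Before my attempt, here is a quick Monte Carlo sanity check of (R1) that I ran (the standard Gaussian sampler and all constants below are my own arbitrary choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 5, 10, 20000           # dimension, |G|, number of sampling trials

# i.i.d. samples x_i ~ N(mu, I_d); mu is an arbitrary mean vector
mu = rng.normal(size=d)
X = mu + rng.normal(size=(trials, n, d))

# empirical rho^2: E||x_i - x_j||_2^2 for a fixed pair i != j
rho2 = np.mean(np.sum((X[:, 0] - X[:, 1]) ** 2, axis=-1))

# empirical variance of the mean estimator: E||xbar - mu||_2^2
xbar = X.mean(axis=1)
var_mean = np.mean(np.sum((xbar - mu) ** 2, axis=-1))

print(var_mean, rho2 / n)             # ~0.5 vs ~1.0 here, so (R1) holds with slack
```

The simulated variance sits at roughly half of $\rho^{2}/|\mathcal{G}|$ in this Gaussian example, which suggests (R1) is not even tight, yet my derivation below cannot reach it.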

\begin{align}
~&\mathbb{E}[\| \bar{\mathbf{x}} - \pmb{\mu} \|_{2}^{2}] &&...(1)\\
=~&\mathbb{E}\left[\left\| \frac{1}{|\mathcal{G}|} \sum_{i \in \mathcal{G}}\mathbf{x}_{i} - \mathbb{E}[\bar{\mathbf{x}}] \right\|_{2}^{2} \right] &&...(2)\\
=~&\mathbb{E}\left[\left\| \frac{1}{|\mathcal{G}|} \sum_{i \in \mathcal{G}}\mathbf{x}_{i} - \mathbb{E}\left[ \frac{1}{|\mathcal{G}|} \sum_{j \in \mathcal{G}}\mathbf{x}_{j} \right] \right\|_{2}^{2} \right] &&...(3)\\
=~&\mathbb{E}\left[\left\| \mathbb{E}\left[ \frac{1}{|\mathcal{G}|} \sum_{i \in \mathcal{G}}\mathbf{x}_{i} - \frac{1}{|\mathcal{G}|} \sum_{j \in \mathcal{G}}\mathbf{x}_{j} \right] \right\|_{2}^{2} \right] &&...(4)~\text{independence of samples}~\mathbf{x}_{i},\mathbf{x}_{j}\\
\leq~&\mathbb{E}\left[\mathbb{E}\left[ \left\| \frac{1}{|\mathcal{G}|} \sum_{i \in \mathcal{G}}\mathbf{x}_{i} - \frac{1}{|\mathcal{G}|} \sum_{j \in \mathcal{G}}\mathbf{x}_{j} \right\|_{2}^{2} \right] \right] &&...(5)~\text{Jensen's inequality for expectation}\\
=~&\mathbb{E}\left[ \left\| \frac{1}{|\mathcal{G}|} \left( \sum_{i \in \mathcal{G}}\mathbf{x}_{i} - \sum_{j \in \mathcal{G}}\mathbf{x}_{j} \right)\right\|_{2}^{2} \right] &&...(6)\\
=~& \frac{1}{|\mathcal{G}|^{2}} \mathbb{E}\left[ \left\| \sum_{i \in \mathcal{G}}\mathbf{x}_{i} - \sum_{j \in \mathcal{G}}\mathbf{x}_{j} \right\|_{2}^{2} \right] &&...(7)\\
=~& \frac{1}{|\mathcal{G}|^{2}} \mathbb{E}\left[ \left\| (\mathbf{x}_{i,1} + ... + \mathbf{x}_{i,|\mathcal{G}|}) - (\mathbf{x}_{j,1} + ... + \mathbf{x}_{j,|\mathcal{G}|}) \right\|_{2}^{2} \right] &&...(8)~\text{verbose writing}\\
=~& \frac{1}{|\mathcal{G}|^{2}} \mathbb{E}\left[ \left\| \sum_{i,j \in \mathcal{G}}^{|\mathcal{G}|} \left(\mathbf{x}_{i} - \mathbf{x}_{j} \right) \right\|_{2}^{2} \right] &&...(9)~\text{equivalent expression to}~(7)\\
\leq~& \frac{1}{|\mathcal{G}|^{2}} \mathbb{E}\left[ |\mathcal{G}| \sum_{i,j \in \mathcal{G}}^{|\mathcal{G}|} \left\| \mathbf{x}_{i} - \mathbf{x}_{j} \right\|_{2}^{2} \right] &&...(10)~\text{Jensen's inequality}~\left\| \sum_{k=1}^{m} \mathbf{v}_{k} \right\|_{2}^{2} \leq m \sum_{k=1}^{m} \left\| \mathbf{v}_{k} \right\|_{2}^{2}\\
=~& \frac{1}{|\mathcal{G}|} \mathbb{E}\left[ \sum_{i,j \in \mathcal{G}}^{|\mathcal{G}|} \left\| \mathbf{x}_{i} - \mathbf{x}_{j} \right\|_{2}^{2} \right] &&...(11)\\
=~& \frac{1}{|\mathcal{G}|} \sum_{i,j \in \mathcal{G}}^{|\mathcal{G}|} \mathbb{E}\left[ \left\|\mathbf{x}_{i} - \mathbf{x}_{j} \right\|_{2}^{2} \right] &&...(12)~\text{linearity of expectation}\\
\leq~& \frac{1}{|\mathcal{G}|} \sum_{i,j \in \mathcal{G}}^{|\mathcal{G}|} \rho^{2} &&...(13)~\text{using Assumption (A1)}\\
=~& \frac{|\mathcal{G}|}{|\mathcal{G}|} \rho^{2} = \rho^{2} &&...(14)~\text{worse upper bound than (R1)}.
\end{align}

From (7) to (9), $\mathbf{x}_{i}$ and $\mathbf{x}_{j}$ are random variables from the same i.i.d. distribution, so the ordering of indices within each sum does not matter, which is how I collapse the two sums into the single sum over pairs.

In the annotation of (10), I multiplied both sides of Jensen's inequality by $m^{2}$, taking $f(\mathbf{x}) := \|\mathbf{x}\|_{2}^{2}$ and convex-combination weights $\frac{1}{m}$.
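To convince myself of that norm inequality, I also checked it numerically on random vectors (again an arbitrary setup of my own):

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 7, 3
V = rng.normal(size=(m, d))          # m arbitrary vectors v_k in R^d

lhs = np.sum(V.sum(axis=0) ** 2)     # || sum_k v_k ||_2^2
rhs = m * np.sum(V ** 2)             # m * sum_k ||v_k||_2^2
assert lhs <= rhs
print(lhs, rhs)
```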

For the linearity of expectation in (12), I am using
\begin{align}
\mathbb{E}\left[ \sum_{i=1}^{N} a_{i} X_{i} \right] = \sum_{i=1}^{N} a_{i} \mathbb{E}\left[ X_{i} \right]
\end{align}
from Wikipedia (https://en.wikipedia.org/wiki/Expected_value#Properties), where the $X_{i}$ are random variables.
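A tiny check of this property as well (the sample average is itself linear, so the two sides agree up to floating-point rounding):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(size=5)               # coefficients a_i
X = rng.normal(size=(100000, 5))     # rows are joint samples of X_1, ..., X_5

lhs = np.mean(X @ a)                 # estimate of E[sum_i a_i X_i]
rhs = a @ X.mean(axis=0)             # sum_i a_i E[X_i], estimated termwise
print(lhs, rhs)
```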

How should I fix my attempt?
Thanks for your time, and I appreciate your advice.
I have little experience with handling expectations.

– Neustart

1 Answer


For $i \neq j$, as $X_i, X_j$ are independent and identically distributed, we have
$$ \mathbb{E}[\lVert X_i - X_j \rVert_2^2] = \mathbb{E}[\lVert X_i - \mu \rVert_2^2] + \mathbb{E}[\lVert X_j - \mu \rVert_2^2] = 2 \mathbb{E}[\lVert X_i - \mu \rVert_2^2]. $$
Thus, by condition (A1), we must have
$$ \mathbb{E}[\lVert X_i - \mu \rVert^2_2] \le \dfrac{\rho^2}{2}. $$
For simplicity, I denote $n = \vert \mathcal{G}\vert$. Now,
$$ \begin{align*} \mathbb{E}[\lVert \bar{X} - \mu \rVert_2^2] &= \mathbb{E}\left[\left\lVert \dfrac{1}{n}\sum_{i = 1}^n (X_i - \mu)\right\rVert_2^2\right]\\ &= \dfrac{1}{n^2}\mathbb{E}\left[\left\lVert\sum_{i = 1}^n (X_i - \mu)\right\rVert_2^2 \right]\\ &\color{red}{\boldsymbol{=}} \dfrac{1}{n^2}\sum_{i = 1}^n \mathbb{E}[\lVert X_i - \mu \rVert_2^2] = \dfrac{\mathbb{E}[\lVert X_1 - \mu \rVert_2^2]}{n} \le \dfrac{\rho^2}{2n} \end{align*} $$
So,
$$ \mathbb{E}[\lVert \bar{X} - \mu \rVert_2^2] \le \dfrac{\rho^2}{2\vert \mathcal{G} \vert} < \dfrac{\rho^2}{\vert \mathcal{G}\vert}. $$
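For completeness, the first display above follows from the same cross-term argument that is spelled out below:
$$ \begin{align*} \mathbb{E}\lVert X_i - X_j \rVert_2^2 &= \mathbb{E}\lVert (X_i - \mu) - (X_j - \mu) \rVert_2^2 \\ &= \mathbb{E}\lVert X_i - \mu \rVert_2^2 + \mathbb{E}\lVert X_j - \mu \rVert_2^2 - 2\,\mathbb{E}[(X_i - \mu)^T (X_j - \mu)], \end{align*} $$
and the last expectation is $0$ for $i \neq j$ by independence.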


To explain the third equality $\color{red}{\boldsymbol{=}}$, we have
$$ \begin{align*} \mathbb{E}\left[\left\lVert\sum_{i = 1}^n (X_i - \mu)\right\rVert_2^2 \right] &= \mathbb{E}\left[\left(\sum_{i = 1}^n (X_i - \mu)\right)^T \left(\sum_{j = 1}^n (X_j - \mu)\right)\right] \\ &= \sum_{i = 1}^n \sum_{j = 1}^n \mathbb{E}[(X_i - \mu)^T(X_j - \mu)]. \end{align*} $$
If $i \neq j$, by independence, we have
$$ \mathbb{E}[(X_i - \mu)^T(X_j - \mu)] = [\mathbb{E}(X_i - \mu)]^T \mathbb{E}(X_j - \mu) = 0. $$
Otherwise,
$$ \mathbb{E}[(X_i - \mu)^T(X_i - \mu)] = \mathbb{E}\lVert X_i - \mu \rVert_2^2. $$

Therefore, $$ \mathbb{E}\left[\left\lVert\sum_{i = 1}^n (X_i - \mu)\right\rVert_2^2 \right] = \sum_{i = 1}^n \mathbb{E}\lVert X_i - \mu \rVert_2^2 $$
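If it helps, here is a small Monte Carlo illustration of this identity (the Gaussian data and all constants are arbitrary choices for the demo, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, trials = 4, 6, 200000
mu = rng.normal(size=d)
X = mu + rng.normal(size=(trials, n, d))          # i.i.d. X_1, ..., X_n per trial

# LHS: E|| sum_i (X_i - mu) ||_2^2
lhs = np.mean(np.sum((X - mu).sum(axis=1) ** 2, axis=-1))

# RHS: sum_i E|| X_i - mu ||_2^2  (the cross terms vanish by independence)
rhs = np.sum(np.mean(np.sum((X - mu) ** 2, axis=-1), axis=0))

print(lhs, rhs)                                   # both ~ n*d = 24 for N(mu, I_d)
```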

  • How does this equality ${\dfrac{1}{n^2}\mathbb{E}\left[\left\lVert\sum_{i = 1}^n (X_i - \mu)\right\rVert_2^2 \right] = \dfrac{1}{n^2}\sum_{i = 1}^n \mathbb{E}[\lVert X_i - \mu \rVert_2^2]}$ hold? – Neustart May 10 '24 at 04:44
  • @fordicus I have updated my answer explaining this. – Thành Nguyễn May 10 '24 at 05:11