I am trying to learn from the paper
[paper] http://proceedings.mlr.press/v139/karimireddy21a/karimireddy21a.pdf
[proofs] http://proceedings.mlr.press/v139/karimireddy21a/karimireddy21a-supp.pdf ,
where the average of vectors $\mathbf{x} \in \mathbb{R}^{d}$ in Euclidean space is estimated,
and an upper bound on the variance of this estimate is derived.
Let us denote the average and its expectation as
\begin{align}
(D1) ~~~ \bar{\mathbf{x}} &:= \frac{1}{|{\mathcal{G}}|}\sum_{i\in \mathcal{G}}\mathbf{x}_{i}
\\
~~~\pmb{\mu} &:= \mathbb{E}[\bar{\mathbf{x}}].
\end{align}
The number of samples ${|\mathcal{G}|}$ is fixed across sampling trials,
and the samples ${\mathbf{x}_{i} \in \mathbb{R}^{d}}$ are
independently and identically distributed (i.i.d.) according to some
(unknown) probability density function.
We are given an assumption
\begin{align}
(A1) ~~~~~~~~~~
\mathbb{E}[\|\mathbf{x}_{i} - \mathbf{x}_{j}\|_2^{2}] \leq \rho^{2}
&&\forall i,j \in \mathcal{G}
\end{align}
that the squared distance between any two samples is bounded in expectation
(Definition C in the paper), and the goal is to show (Appendix D, Proof of Theorem IV)
\begin{align}
(R1) ~~~~~
\mathbb{E}[\|\bar{\mathbf{x}} - \pmb{\mu}\|_{2}^{2}] \leq \frac{\rho^{2}}{|\mathcal{G}|}
\end{align}
that the variance of the mean estimator (D1) is upper-bounded by the right-hand side, using Assumption (A1).
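For context, the factor $\frac{1}{|\mathcal{G}|}$ is what the classical variance-of-the-mean computation produces for i.i.d. samples (this is just my reference point, not necessarily the route the paper takes): since the samples are independent, the cross terms vanish and
\begin{align}
\mathbb{E}[\|\bar{\mathbf{x}} - \pmb{\mu}\|_{2}^{2}]
= \frac{1}{|\mathcal{G}|^{2}} \sum_{i \in \mathcal{G}} \mathbb{E}[\|\mathbf{x}_{i} - \pmb{\mu}\|_{2}^{2}].
\end{align}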
The intuition behind inequality (R1) is that the boundedness of pairwise distances from Assumption (A1) reduces the variance by a factor of $\frac{1}{|\mathcal{G}|}$, the number of samples per trial. However, I only end up with the much looser bound $\rho^{2} \geq \frac{\rho^{2}}{|\mathcal{G}|}$, so I write down my attempt below to gather advice.
\begin{align}
~&\mathbb{E}[\| \bar{\mathbf{x}} - \pmb{\mu} \|_{2}^{2}] &&...(1)\\
=~&\mathbb{E}\left[\left\| \frac{1}{|\mathcal{G}|} \sum_{i \in \mathcal{G}}\mathbf{x}_{i} - \mathbb{E}[\bar{\mathbf{x}}] \right\|_{2}^{2} \right] &&...(2)\\
=~&\mathbb{E}\left[\left\| \frac{1}{|\mathcal{G}|} \sum_{i \in \mathcal{G}}\mathbf{x}_{i} - \mathbb{E}\left[ \frac{1}{|\mathcal{G}|} \sum_{j \in \mathcal{G}}\mathbf{x}_{j} \right] \right\|_{2}^{2} \right] &&...(3)\\
=~&\mathbb{E}\left[\left\| \mathbb{E}\left[ \frac{1}{|\mathcal{G}|} \sum_{i \in \mathcal{G}}\mathbf{x}_{i} - \frac{1}{|\mathcal{G}|} \sum_{j \in \mathcal{G}}\mathbf{x}_{j} \right] \right\|_{2}^{2} \right] &&...(4)~\text{independence of samples}~\mathbf{x}_{i},\mathbf{x}_{j}\\
\leq~&\mathbb{E}\left[\mathbb{E}\left[ \left\| \frac{1}{|\mathcal{G}|} \sum_{i \in \mathcal{G}}\mathbf{x}_{i} - \frac{1}{|\mathcal{G}|} \sum_{j \in \mathcal{G}}\mathbf{x}_{j} \right\|_{2}^{2} \right] \right] &&...(5)~\text{Jensen's inequality for expectation}\\
=~&\mathbb{E}\left[ \left\| \frac{1}{|\mathcal{G}|} \left( \sum_{i \in \mathcal{G}}\mathbf{x}_{i} - \sum_{j \in \mathcal{G}}\mathbf{x}_{j} \right)\right\|_{2}^{2} \right] &&...(6)\\
=~& \frac{1}{|\mathcal{G}|^{2}} \mathbb{E}\left[ \left\| \sum_{i \in \mathcal{G}}\mathbf{x}_{i} - \sum_{j \in \mathcal{G}}\mathbf{x}_{j} \right\|_{2}^{2} \right] &&...(7)\\
=~& \frac{1}{|\mathcal{G}|^{2}} \mathbb{E}\left[ \left\| (\mathbf{x}_{i,1} + \dots + \mathbf{x}_{i,|\mathcal{G}|}) - (\mathbf{x}_{j,1} + \dots + \mathbf{x}_{j,|\mathcal{G}|}) \right\|_{2}^{2} \right] &&...(8)~\text{verbose writing}\\
=~& \frac{1}{|\mathcal{G}|^{2}} \mathbb{E}\left[ \left\| \sum_{i,j \in \mathcal{G}}^{|\mathcal{G}|} \left(\mathbf{x}_{i} - \mathbf{x}_{j} \right) \right\|_{2}^{2} \right] &&...(9)~\text{equivalent expression to}~(7)\\
\leq~& \frac{1}{|\mathcal{G}|^{2}} \mathbb{E}\left[ |\mathcal{G}| \sum_{i,j \in \mathcal{G}}^{|\mathcal{G}|} \left\| \mathbf{x}_{i} - \mathbf{x}_{j} \right\|_{2}^{2} \right] &&...(10)~\text{Jensen's inequality}~\left\| \sum_{k=1}^{m} \mathbf{v}_{k} \right\|_{2}^{2} \leq m \sum_{k=1}^{m} \left\| \mathbf{v}_{k} \right\|_{2}^{2}\\
=~& \frac{1}{|\mathcal{G}|} \mathbb{E}\left[ \sum_{i,j \in \mathcal{G}}^{|\mathcal{G}|} \left\| \mathbf{x}_{i} - \mathbf{x}_{j} \right\|_{2}^{2} \right] &&...(11)\\
=~& \frac{1}{|\mathcal{G}|} \sum_{i,j \in \mathcal{G}}^{|\mathcal{G}|} \mathbb{E}\left[ \left\|\mathbf{x}_{i} - \mathbf{x}_{j} \right\|_{2}^{2} \right] &&...(12)~\text{linearity of expectation}\\
\leq~& \frac{1}{|\mathcal{G}|} \sum_{i,j \in \mathcal{G}}^{|\mathcal{G}|} \rho^{2} &&...(13)~\text{using Assumption (A1)}\\
=~& \frac{|\mathcal{G}|}{|\mathcal{G}|} \rho^{2} = \rho^{2} &&...(14)~\text{worse upper bound than (R1)}.
\end{align}
From (7) to (9): $\mathbf{x}_{i}, \mathbf{x}_{j}$ are i.i.d. random variables from the same distribution, so the ordering of indices within each sum does not matter, and the two sums can be paired term by term into a single sum.
In the comment on (10), I multiplied both sides of Jensen's inequality by $m^{2}$, using $f(\mathbf{x}) := \|\mathbf{x}\|_{2}^{2}$ and convex-combination weights $\frac{1}{m}$.
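Spelled out, the step I mean is the standard convexity bound with equal weights $\frac{1}{m}$, followed by multiplying both sides by $m^{2}$:
\begin{align}
\left\| \frac{1}{m} \sum_{k=1}^{m} \mathbf{v}_{k} \right\|_{2}^{2} \leq \frac{1}{m} \sum_{k=1}^{m} \left\| \mathbf{v}_{k} \right\|_{2}^{2}
\quad\Longrightarrow\quad
\left\| \sum_{k=1}^{m} \mathbf{v}_{k} \right\|_{2}^{2} \leq m \sum_{k=1}^{m} \left\| \mathbf{v}_{k} \right\|_{2}^{2}.
\end{align}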
For the linearity of expectation in (12), I am using \begin{align} \mathbb{E}\left[ \sum_{i=1}^{N} a_{i} X_{i} \right] = \sum_{i=1}^{N} a_{i} \mathbb{E}\left[ X_{i} \right] \end{align} from Wikipedia (https://en.wikipedia.org/wiki/Expected_value#Properties), where the $X_{i}$ are random variables.
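For a numerical sanity check of (R1) itself, here is a minimal Monte Carlo sketch (assuming Gaussian samples $\mathbf{x}_{i} \sim \mathcal{N}(\pmb{\mu}, \sigma_{0}^{2} I_{d})$, for which $\mathbb{E}[\|\mathbf{x}_{i} - \mathbf{x}_{j}\|_{2}^{2}] = 2 d \sigma_{0}^{2}$ for $i \neq j$ is the tightest $\rho^{2}$ in (A1); all variable names are my own):

```python
import numpy as np

# Minimal Monte Carlo sanity check of (R1), assuming x_i ~ N(mu, sigma0^2 * I_d).
# For this choice E||x_i - x_j||^2 = 2*d*sigma0^2 for i != j, so the tightest
# rho^2 satisfying (A1) is 2*d*sigma0^2.
rng = np.random.default_rng(0)
d, G, trials, sigma0 = 5, 20, 100_000, 1.0
mu = rng.normal(size=d)
rho_sq = 2 * d * sigma0**2

x = rng.normal(loc=mu, scale=sigma0, size=(trials, G, d))  # trials x |G| x d samples
x_bar = x.mean(axis=1)                                     # the average (D1), one per trial
lhs = ((x_bar - mu) ** 2).sum(axis=1).mean()               # empirical E||x_bar - mu||^2

print(f"empirical E||x_bar - mu||^2 : {lhs:.4f}")      # about d*sigma0^2/G = 0.25
print(f"claimed bound rho^2/|G|     : {rho_sq/G:.4f}") # 0.5, so (R1) holds here
```

In this Gaussian example the empirical left-hand side comes out near $\frac{d \sigma_{0}^{2}}{|\mathcal{G}|} = \frac{\rho^{2}}{2|\mathcal{G}|}$, comfortably below $\frac{\rho^{2}}{|\mathcal{G}|}$, consistent with (R1).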
How should I fix my attempt?
I am not very experienced with handling expectations, so I would appreciate any advice.
Thanks for your time.