
Suppose $X\sim\mathcal N(\mu,\sigma^2)$. The first negative moment $\mathsf E(1/X)$ does not exist; however, we can define it in the sense of the Cauchy principal value: $$ \tag{1} \mathsf E(1/X)\overset{PV}{=}\frac{\sqrt{2}}{\sigma}\,\mathcal{D}\left(\frac{\mu}{\sqrt{2}\sigma}\right), $$ where $\mathcal{D}(z)=e^{-z^{2}}\int_{0}^{z}e^{t^{2}}\,\mathrm{d}t$ is the Dawson integral. The nonexistence of $\mathsf E(1/X)$ manifests itself in the fact that the sample mean $$ (\overline{1/X})_n=\frac{1}{n}\sum_{k=1}^n\frac{1}{X_k} $$ never settles down to any particular value as $n$ increases ($1/X$ is in the domain of attraction of the Cauchy law and therefore does not obey the CLT).

For example, consider the running mean generated from sampling $X\sim\mathcal N(1,1)$:

[Figure: running mean $(\overline{1/X})_n$ versus $n$ for samples from $\mathcal N(1,1)$]

Because $|\sigma/\mu|$ is relatively large, we regularly observe $X$ near zero and the sample mean never converges. However, in practice, this behavior is not always observed. Consider the same experiment except with samples drawn from $X\sim\mathcal N(6,1)$:

[Figure: running mean $(\overline{1/X})_n$ versus $n$ for samples from $\mathcal N(6,1)$]

We see the sample mean does settle down with increasing $n$. Moreover, the value the sample mean approaches is the principal value moment $(1)$. In theory, this behavior is just an artifact of finite sampling: if we keep increasing $n$, we should eventually observe values of $X$ close to zero and the sample mean will be disrupted. However, no matter how large I make $n$, I never actually observe such values in practice because $|\sigma/\mu|$ is relatively small.
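For reference, a minimal sketch of the experiment in Python (NumPy/SciPy assumed; `scipy.special.dawsn` implements the Dawson function $\mathcal D$ appearing in $(1)$):

```python
import numpy as np
from scipy.special import dawsn

def pv_mean_inverse(mu, sigma):
    """Principal-value moment (1): sqrt(2)/sigma * D(mu / (sqrt(2)*sigma))."""
    return np.sqrt(2) / sigma * dawsn(mu / (np.sqrt(2) * sigma))

rng = np.random.default_rng(0)
mu, sigma, n = 6.0, 1.0, 100_000

x = rng.normal(mu, sigma, size=n)
running_mean = np.cumsum(1.0 / x) / np.arange(1, n + 1)

print("final running mean:", running_mean[-1])   # settles near the PV value
print("PV expression (1): ", pv_mean_inverse(mu, sigma))
# Rerunning with mu = 1.0 reproduces the erratic, non-convergent behavior.
```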

Does this make $(1)$ a useful analytical expression for the expected value $\mathsf E(1/X)$ so long as $|\sigma/\mu|$ is small? If so, what would be a coherent theoretical justification for such a statement?

For example, if $|\sigma/\mu|$ is small, we may never observe the event $\{|X|<\epsilon\}$ in practice, and since $\mathsf E(1/X\mid |X|\geq\epsilon)$ exists, the sample mean is well behaved. But why should the sample mean converge to $(1)$ in this case?
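One way to probe this numerically is to compute the truncated expectation $\mathsf E(1/X\mid |X|\geq\epsilon)$ by quadrature and compare it with $(1)$; a rough sketch (Python/SciPy assumed, with an arbitrary small $\epsilon$):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import dawsn
from scipy.stats import norm

mu, sigma, eps = 6.0, 1.0, 1e-3

integrand = lambda x: (1.0 / x) * norm.pdf(x, mu, sigma)

# E(1/X ; |X| >= eps): integrate on both sides of the excluded interval.
# Limits of +/- 12 sigma around mu are used since the tails beyond are negligible.
left, _ = quad(integrand, mu - 12 * sigma, -eps)
right, _ = quad(integrand, eps, mu + 12 * sigma)
prob = 1.0 - (norm.cdf(eps, mu, sigma) - norm.cdf(-eps, mu, sigma))
truncated = (left + right) / prob            # conditional expectation

pv = np.sqrt(2) / sigma * dawsn(mu / (np.sqrt(2) * sigma))
print(truncated, pv)   # agree to many digits when |sigma/mu| is small
```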

Edit:

From my previous question we can further define the higher order moments $\mathsf E(1/X^m)$ for $m\in\Bbb N$ with the use of a generating function via: $$ \tag{2} \mathsf E(1/X^m):=\frac{\sqrt 2}{\sigma (m-1)!}\partial_t^{m-1}\mathcal D\left(\frac{\mu-t}{\sqrt 2 \sigma}\right)\bigg|_{t=0}. $$ As before, the sample moments $(\overline{1/X^m})_n=\frac{1}{n}\sum_{k=1}^nX_k^{-m}$ will agree with the "regularized" moments $(2)$ whenever $|\sigma/\mu|$ is small. But why?
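As an illustration of $(2)$, the $t$-derivative can be approximated with a central finite difference and checked against a Monte Carlo sample moment; a sketch for $m=2$ (Python/SciPy assumed, step size $h$ chosen ad hoc):

```python
import numpy as np
from scipy.special import dawsn

mu, sigma = 6.0, 1.0

def D(t):
    # Dawson function evaluated at (mu - t) / (sqrt(2) * sigma), as in (2)
    return dawsn((mu - t) / (np.sqrt(2) * sigma))

# m = 2: need the first t-derivative at t = 0; use a central difference
h = 1e-5
d1 = (D(h) - D(-h)) / (2 * h)
moment2_regularized = np.sqrt(2) / (sigma * 1) * d1    # (2) with m = 2, (m-1)! = 1

rng = np.random.default_rng(1)
x = rng.normal(mu, sigma, size=1_000_000)
moment2_sample = np.mean(1.0 / x**2)

print(moment2_regularized, moment2_sample)   # both close to 0.030 here
```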

  • It seems confusing to keep saying "the sample mean converges" when you know very well that it does not. Isn't it just that when $\sigma/\mu$ is small, the sample means diverge more slowly, slowly enough that your tests are too short to observe it? There should be results about the rate of divergence. – Nate Eldredge May 11 '22 at 17:06
  • If you condition on $|X| \ge \epsilon$, you've effectively clipped out a small symmetric neighborhood of 0 from the density function. And isn't that exactly the definition of the principal value integral? – Nate Eldredge May 11 '22 at 17:16
  • @NateEldredge In theory it doesn't converge. I agree. I think the issue may lie in the model $X\sim\mathcal N(\mu,\sigma)$ in practical situations...we just don't observe $X$ out in the extreme tails of the distribution when simulating data or modeling real random processes with the normal. Maybe that's the answer but I don't know. Also, I can think of several real world examples where parameters are estimated with estimators that have undefined moments under a specified theoretical model. But it's useful regardless... – Aaron Hendrickson May 11 '22 at 17:24
  • @NateEldredge ...The question is, how can we assign analytical expressions for moments in these cases. Maybe the divergence from the normal model is just a deficiency of the model itself. – Aaron Hendrickson May 11 '22 at 17:24
  • @NateEldredge M. H. Quenouille, Notes on Bias in Estimation, Biometrika, Volume 43, Issue 3-4, December 1956, Pages 353–360. This paper is a good example of what I'm talking about. – Aaron Hendrickson May 11 '22 at 17:35
  • @AaronHendrickson On that vein: several concentration results exist when one uses the trimmed mean to estimate a centrality parameter (even in heavy-tailed cases). Perhaps using these ideas one can arrive at a concentration result for the trimmed mean of $1/X$ concentrating around the PV of $E(1/X)$? Once you realise that in practice the sample mean of $1/X$ is a trimmed mean, you get concentration around the PV. – Jose Avilez May 11 '22 at 17:37
  • @JoseAvilez This may be going in the direction I'm looking for. Could you elaborate more? In an example I'm particularly interested in, I have noisy sensor data that only takes values in a finite range yet it's very well modeled by the normal and $1/\mu$ is a parameter of interest. So long as my signal is high (large $\mu$) it seems like the PV of $1/X$ does a much better job of describing my estimator mean than Taylor series and I'm really trying to understand why. – Aaron Hendrickson May 11 '22 at 17:49
  • @AaronHendrickson Unfortunately, I don't know of any specific results that may be useful in your situation. The estimator I was thinking about was this estimator for the centrality parameter in a Cauchy distribution. The bigger $\mu / \sigma$ is, the smaller you can pick $\alpha$ and maintain a suitable estimator. A small $\alpha$ requires a lot of data before you start dropping observations. – Jose Avilez May 11 '22 at 18:09
  • As a rough indication why you don't see the big sample that blows up the mean: the $n$th sample will move the sample mean by a unit amount if $|X_n| < 1/n$. This has probability about $f_X(0)/n$. So the expected number of times this happens within $N$ samples is $\sum_{n=1}^N \frac{1}{n} f_X(0) \approx f_X(0) \log N$. For $X \sim N(6,1)$, we have $f_X(0) \approx 10^{-8}$. Thus if you want to see this happen 1 time on average, you need to have about $e^{10^8}$ samples. I guess you can see why the $10^5$ in your graph wasn't nearly enough. – Nate Eldredge May 11 '22 at 18:21
  • Even if you just want to see a movement of $0.01$ to make it visible on the graph you drew, that still needs $e^{10^6}$ samples (the first few don't count since my approximations are not good for small $n$). Something like $10^{430000}$. – Nate Eldredge May 11 '22 at 18:26
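For what it's worth, the numbers in the last two comments are easy to reproduce (a rough Python sketch for the $\mathcal N(6,1)$ example; the estimates are order-of-magnitude only):

```python
import numpy as np

mu, sigma = 6.0, 1.0

# density of N(6, 1) at zero
f0 = np.exp(-mu**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

# expected number of "disruptive" draws (|X_n| < 1/n) among N samples is
# roughly f0 * log(N), so seeing one on average requires log(N) ~ 1/f0
print(f0)        # ~ 6e-9, i.e. about 1e-8
print(1 / f0)    # ~ 1.6e8, so N ~ exp(1e8) samples are needed
```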

1 Answer


If $X \sim N(\mu,\sigma)$, then $X \in (\mu - k\sigma, \mu+k\sigma)$ with probability $\Phi(k)-\Phi(-k)$, which for $k>5$ is more than $99.9999 \%$. This means that it is very unlikely, even in big samples, to observe values outside of this interval, which would also exclude values close to $0$ if $0\notin (\mu-k\sigma,\mu+k\sigma)$.

If, rather than sampling from $X$, we sample from $X\mid X\in (\mu-k\sigma,\mu+k\sigma)$ (which in practice is equivalent for large $k$), then we would expect that $$\frac1n \sum_{i=1}^n g(X_i) \rightarrow \frac{1}{\Phi(k)-\Phi(-k)} \frac{1}{\sqrt{2\pi}\sigma}\int_{\mu-k\sigma}^{\mu+k\sigma}g(x)e^{-(x-\mu)^2/(2\sigma^2)} \:dx$$ for any $g \in L^1((\mu-k\sigma,\mu+k\sigma))$.

So to answer your question, we must see why $\frac{1}{\sqrt{2\pi}\sigma}\int_{\mu-k\sigma}^{\mu+k\sigma}\frac1x e^{-(x-\mu)^2/(2\sigma^2)} \:dx$ is close to the principal value of $\frac{1}{\sqrt{2\pi}\sigma}\int_{-\infty}^{\infty}\frac1x e^{-(x-\mu)^2/(2\sigma^2)} \:dx$. Clearly the tails are not important, so what really matters is the principal value of $\frac{1}{\sqrt{2\pi}\sigma}\int_{-\epsilon}^{\epsilon}\frac1x e^{-(x-\mu)^2/(2\sigma^2)} \:dx$ for some $\epsilon > 0$. Using the Laurent series representation of $\frac1x e^{-(x-\mu)^2/(2\sigma^2)}$ we can write $$\frac1x e^{-(x-\mu)^2/(2\sigma^2)}=\frac{e^{-\mu^2/(2\sigma^2)}}{x} + \frac{\mu e^{-\mu^2/(2\sigma^2) }}{\sigma^2}+ O(|x|)$$ and we can thus see that $$\int_{-\epsilon}^\epsilon \frac1x e^{-(x-\mu)^2/(2\sigma^2)} \: dx \overset{PV}= 2\epsilon \frac{\mu e^{-\mu^2/(2\sigma^2) }}{\sigma^2} + O(\epsilon^2),$$ which goes to $0$ as $\epsilon$ goes to $0$.
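The last step can also be checked numerically: fold the principal value over $(-\epsilon,\epsilon)$ into a proper integral on $(0,\epsilon)$ and compare it with the leading Laurent term; a small sketch (Python/SciPy assumed, $\epsilon$ arbitrary):

```python
import numpy as np
from scipy.integrate import quad

mu, sigma, eps = 6.0, 1.0, 0.05

g = lambda x: np.exp(-(x - mu)**2 / (2 * sigma**2))

# PV of int_{-eps}^{eps} g(x)/x dx equals the proper integral of (g(x) - g(-x))/x on (0, eps)
pv_near_zero, _ = quad(lambda x: (g(x) - g(-x)) / x, 0, eps)

# leading term of the Laurent-series estimate above
leading = 2 * eps * mu * np.exp(-mu**2 / (2 * sigma**2)) / sigma**2

print(pv_near_zero, leading)   # both of order 1e-8, i.e. negligible for N(6, 1)
```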

  • Remark: I did not see the edit about higher order moments when I started writing the answer, so it is not included. I suspect that you can make similar arguments here, though you would need to argue that the regularized moments agree with $\frac{1}{\sqrt{2\pi}\sigma}\int_{\mu-k\sigma}^{\mu+k\sigma} \frac{1}{x^m} e^{-(x-\mu)^2/(2\sigma^2)}\:dx$. – Leander Tilsted Kristensen May 11 '22 at 18:50