8

I have encountered the following two definitions of heavy-tailedness (right tail) for a $[0,\infty)$-valued random variable $X$ satisfying $\mathbb{E}[X]<\infty$:

(i) $\limsup_{x\to\infty}\frac{\mathbb{P}(X>x)}{e^{-\lambda x}}>0$ for all $\lambda>0$,

(ii) $\mathbb{E}[X-u|X>u]\to\infty$ as $u\to\infty$.

Are these two notions equivalent? If yes, how to prove equivalence? Thank you in advance!

JohnSmith
  • 1,544
  • Thanks, but the Cauchy distribution is defined on the whole real line rather than on $[0,\infty)$. Integrability is a good point; I've added this assumption to my question. – JohnSmith Jan 09 '15 at 23:00
  • Cheers, I didn't read properly. Great question btw. Deleting my comment now – Chinny84 Jan 09 '15 at 23:07
  • As an aside to my answer: Wikipedia gives a nice little overview of heavy-tailed distributions. http://en.wikipedia.org/wiki/Heavy-tailed_distribution. This would suggest that (i) is the more common definition of heavy tailed. In any case, $(i) \nLeftrightarrow (ii)$ –  Jan 10 '15 at 07:43

2 Answers

4

We need to check whether $(i)\implies (ii)$ and whether $(ii) \implies (i)$.


Part 1: $(i) \implies (ii)$

Let's assume that $P(X>x)$ is a smoothly decreasing function bounded below by $0$.

Assume (i): $\limsup_{x\to\infty}\frac{P(X>x)}{e^{-\lambda x}} >0 \;\forall \lambda>0.$ Since $e^{-\lambda x},P(X>x)$ are positive, monotonic decreasing functions, $(i) \implies \forall \lambda>0,\{\exists c>0:P(X>x)\geq e^{-\lambda x}\;\forall x>c\}$

Now, we also know that $P(X>y|X>u)=\mathbf{1}_{<u}(y)+\frac{\mathbf{1}_{\geq u}(y)P(X>y)}{P(X>u)}$

Let $g(u):=E[X-u|X>u]$, where $g(u)\geq 0$ because $X-u>0$ on the event $\{X>u\}$. The above result, combined with the fact that $X\in [0,\infty)$, means that $g(u)=\int_0^{u}\mathbf{1}_{<u}(x)dx+\frac{\int_u^{\infty}P(X>x)dx}{P(X>u)}-u=\frac{\int_u^{\infty}P(X>x)dx}{P(X>u)}$
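
The tail-integral step can be justified briefly. The sketch below assumes $P(X>u)>0$ and uses Tonelli's theorem to exchange expectation and integral; neither detail is spelled out above.

$$\begin{aligned}
E\left[(X-u)\mathbf{1}_{X>u}\right] &= E\left[\int_u^{\infty}\mathbf{1}_{X>x}\,dx\right] = \int_u^{\infty}P(X>x)\,dx,\\
g(u) &= \frac{E\left[(X-u)\mathbf{1}_{X>u}\right]}{P(X>u)} = \frac{\int_u^{\infty}P(X>x)\,dx}{P(X>u)}.
\end{aligned}$$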

Combining this with our implication from $(i)$ gives us:

$\forall (\lambda>0),\exists c: \left\{g(u):=\frac{\int_u^{\infty}P(X>x)dx}{P(X>u)}\geq\frac{\int_u^{\infty} e^{-\lambda x}dx}{e^{-\lambda u}}=\int_u^{\infty} e^{-\lambda (x-u)} dx=\frac{1}{\lambda}\;\forall u>c\right\}$

However, $\lim_{\lambda \to 0} \frac{1}{\lambda} = \infty$; therefore, $g(u)$ exceeds any bound, and hence $\lim_{u\to\infty} g(u) = \infty$.

Thus $(i)\implies (ii)$
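
The function $g(u)=\frac{\int_u^{\infty}P(X>x)dx}{P(X>u)}$ can also be evaluated numerically. The following is a minimal sketch, not part of the argument: the helper names `mean_excess`, `exp_tail` and `pareto_tail` are made up for illustration, and the integral is approximated with a midpoint rule after the substitution $x=u+t/(1-t)$. For an exponential tail $e^{-\lambda x}$ it should return roughly the constant $1/\lambda$ computed above, while for a Pareto tail $x^{-\alpha}$ with $\alpha>1$ the exact value is $u/(\alpha-1)$, which grows without bound.

```python
import math


def mean_excess(survival, u, n=20000):
    """Approximate g(u) = (integral from u to infinity of survival(x) dx) / survival(u).

    The half-line [u, inf) is mapped to t in (0, 1) via x = u + t / (1 - t),
    so dx = dt / (1 - t)^2, and the result is integrated with the midpoint rule.
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h  # midpoint of the k-th subinterval, never equal to 1
        x = u + t / (1.0 - t)
        total += survival(x) / (1.0 - t) ** 2
    return total * h / survival(u)


def exp_tail(lam):
    """Survival function P(X > x) = exp(-lam * x) of an exponential variable."""
    return lambda x: math.exp(-lam * x)


def pareto_tail(alpha, x_min=1.0):
    """Survival function P(X > x) = (x_min / x)^alpha for x >= x_min, else 1."""
    return lambda x: (x_min / x) ** alpha if x >= x_min else 1.0


if __name__ == "__main__":
    # Columns: u, mean excess of Exp(lambda=2), mean excess of Pareto(alpha=3)
    for u in (2.0, 5.0, 10.0, 20.0):
        print(u, mean_excess(exp_tail(2.0), u), mean_excess(pareto_tail(3.0), u))
```

The substitution keeps the example free of external libraries; any quadrature routine for improper integrals would serve equally well.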


Part 2: $(ii) \implies (i)$

Now, let $X$ be an arbitrary positive, continuous, integrable random variable, as you've specified in your post. Define the shifted conditional tail expectation $E[X-u|X>u]=E[X|X>u]-u=\frac{\int_u^{\infty}x f_X(x)dx}{P(X>u)}-u:=g(u)$

Assuming $(ii)$, we know that $g(u)$ must grow without bound, hence $g'(u)=O(\frac{1}{u^p}), p\leq1$. The most stringent value for $p$ is $p=1$, as it is on the threshold of convergence. Then we get:

$g'(u)=O(u^{-1})\implies g(u)=O(\ln(u))$. For concreteness, let's take $g(u)=\ln(u)$. This implies:

$\frac{\int_u^{\infty}x f_X(x)dx}{P(X>u)}-u = \ln(u) \implies \int_u^{\infty}x f_X(x)dx=(1-F_X(u))(\ln(u)+u)=\ln(u)+u -F_X(u)\ln(u)-uF_X(u)$.

Taking the derivative wrt $u$, we get:

$-uF_X'(u)=u^{-1}+1-\frac{F_X(u)}{u}-\ln(u)F_X'(u)-uF_X'(u)-F_X(u)$

Simplifying, we get:

$F_X'(u)=(1-F_X(u))\left[\frac{1}{u\ln(u)}+\frac{1}{\ln(u)}\right]$
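
The simplification can be spelled out as follows. This only restates the two displayed lines, writing $F=F_X(u)$ and $F'=F_X'(u)$ for brevity (shorthand introduced here):

$$\begin{aligned}
-uF' &= (1-F)\left(\frac{1}{u}+1\right) - F'\left(\ln(u)+u\right) \\
\Rightarrow\quad F'\ln(u) &= (1-F)\,\frac{1+u}{u} \\
\Rightarrow\quad F' &= (1-F)\left[\frac{1}{u\ln(u)}+\frac{1}{\ln(u)}\right].
\end{aligned}$$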

Suppressing the fact that the distribution is for $X$ with argument $u$ and separating differentials, we get:

$\frac{dF}{1-F}=\frac{1+u}{u\ln(u)}du$

Integrating both sides we get:

$-\ln(1-F)+A=\ln(\ln(u))-\Re(\Gamma(0,-\ln(u)))+B$
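
For reference, the right-hand side comes from splitting the integrand. The identification with the incomplete gamma function below assumes $u>1$ and the principal branch of $\Gamma(0,\cdot)$; $\operatorname{li}$ and $\operatorname{Ei}$ denote the logarithmic and exponential integrals, notation not used elsewhere in this answer.

$$\int\frac{1+u}{u\ln(u)}\,du = \int\frac{du}{u\ln(u)} + \int\frac{du}{\ln(u)} = \ln(\ln(u)) + \operatorname{li}(u) + \text{const},\qquad \operatorname{li}(u)=\operatorname{Ei}(\ln(u)) = -\Re(\Gamma(0,-\ln(u))).$$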

Exponentiating both sides and simplifying, we get:

$\frac{e^A}{1-F}=\frac{e^B\ln(u)}{e^{\Re(\Gamma(0,-\ln(u)))}}$

A little more algebra to isolate $(1-F)$, and we get (after combining the two unknown constants of integration):

$\frac{Ce^{\Re(\Gamma(0,-\ln(u)))}}{\ln(u)}=1-F=P(X>u)$

Thus $F(u) = 1-\frac{Ce^{\Re(\Gamma(0,-\ln(u)))}}{\ln(u)}$. If we set $C=1$, this function is a valid distribution function for a random variable defined on $[1.72719,\infty)$. Its plot is shown below:

Note: This is no

Note that it crosses the $x$-axis at $x\approx 1.72719$, so the random variable's distribution function would be $P(X<x)= \mathbf{1}_{>1.72719}(x)F(x)$

Unfortunately, $P(X>x)=1-\mathbf{1}_{>1.72719}(x)F(x)$ shrinks faster than an exponential, hence it will not satisfy $(i)$. Thus, $(ii) \nRightarrow (i)$

$\square$


Therefore, it appears that $(i)$ is a stronger definition than $(ii)$, in the sense that it implies $(ii)$. That makes sense for a definition of heavy-tailedness (i.e., $(ii)$ should apply to all heavy-tailed distributions). However, $(ii)$ also applies to distributions that do not satisfy $(i)$, which is itself a reasonable criterion for a heavy-tailed distribution.

The definition I've seen most often is $(i)$, and I think the above demonstration shows why that's the case. It's a difference between what properties are necessary vs sufficient. $(i)$ is sufficient while $(ii)$ appears merely necessary.

Please let me know if any of the above steps do not make sense, or, as per @Did, there are gaps or oversights...I'm not a professional "proof-writer", but this problem seemed very interesting so I wanted to give it a go, especially since you haven't received other answers.

  • 1
    Your proof of the fact that (i) does not imply (ii) is flawed since (i) assumes that some limsup is positive for every $\lambda\gt0$ and you are only assuming that some limsup is positive for some $\lambda\gt0$. Your proof that (ii) does not imply (i) is incomplete since one does not know if the formula for F you arrive at, indeed defines a CDF. – Did Jan 10 '15 at 09:37
  • @Did Thanks for pointing this out. I'll complete them when I have a moment. –  Jan 10 '15 at 13:19
  • @Did I expanded the proof. Indeed, I found that $(i)\implies (ii)$ after using the fact that $(i)$ applies to all $\lambda$. I also demonstrated that $F$ is indeed a valid distribution function for some random variable. –  Jan 11 '15 at 20:14
-1

A comment about the given "definition" of heavy-tailedness. One can define anything however one wants for local use, but the general application of heavy-tailed distributions is to model outlier-prone processes. For that purpose, the stated "definition" of heavy-tailedness is too limited. After all, one can have outlier-prone processes that are bounded, e.g., mix a U(-1,1) with a U(-100, 100), with mixing probabilities .99 and .01.