1

This question is linked to the normal distribution for a random variable. The probability density function (pdf) is expressed as:

\begin{equation} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2 \sigma^2}} \end{equation}

and the cumulative distribution function (cdf) is expressed as:

\begin{equation} \frac12\left[1 + \operatorname{erf}\left( \frac{x-\mu}{\sigma\sqrt{2}}\right)\right] \end{equation}

pdf or cdf functions might take extremely low values. When computing logarithm of pdf,

\begin{equation} - \operatorname{log}(\sigma\sqrt{2\pi}) - \frac{(x - \mu)^2}{2 \sigma^2} \end{equation}

is used directly, instead of first computing eq. 1, then taking the log.

Currently I compute the log of the cdf by first computing eq. 2, then taking its log. This creates numerical underflows in my program. I would like to compute the log directly, using a formula similar to eq. 3.

Is it possible ?

vkubicki
  • 1,954
  • I think this depends on what your numerical problem is: is it full underflows (numbers going below $2^{-1024}$) or is it roundoff errors (numbers on the order of $1$ getting added to numbers on the order of $2^{-53}$)? The solutions to these two kinds of problems are rather different. – Ian Jun 30 '15 at 15:45
  • It is a bit of both. I need to add various log. The majority of which are close to 0., but some are under 1.7E- 308, which is the limit of the double type I use to compute the cdf. When I compute the sum I get $-\infty$. – vkubicki Jun 30 '15 at 15:51
  • When adding numbers of enormously varying magnitude, it's generally a good idea to find the summand of largest magnitude, divide all the terms by that, add them, and then multiply by what you divided by. – Ian Jun 30 '15 at 16:30
  • I do this in other part of the code. The problem here is that can not even compute the value of log(cdf). – vkubicki Jun 30 '15 at 17:20
  • You can't compute individual summands? How small should your actual answer be, more or less? It seems that if you can't compute individual summands and all of your summands have the same sign then your answer might not even be a floating point representable number. In this case you would need to use a higher precision arithmetic library to do anything. – Ian Jun 30 '15 at 17:23
  • I can sum logs and get values like -100000 for example. The problem is that I must be able to distinguish between near 0 and 0 probabilities. If I can not compute log(cdf) directly, I will have to use tricks like considering a cdf inferior to 1.7E- 308 as 1.7E- 308, and then forcing log(cdf) = log(1.7E- 308) = -708 . I wanted to avoid this and compute the real value of log(cdf) directly. – vkubicki Jun 30 '15 at 17:40
  • The problem is that you have log of an integral, which is like log of a sum, and which therefore has no simple algebraic tricks for its calculation. (By contrast the log of a product, like the log of the pdf in your question, has a simple algebraic trick to help us calculate it.) About the best thing you can do with the log of a sum is to try to write it as $\log(X+x)=\log(X)+\log(1+x/x) \approx \log(X)+x/X$, where $X$ is "big" and $x$ is "small". So in estimating $\log(P(X \leq a))$, you might choose $b$ so small that $\frac{P(X \leq b)}{P(b \leq X \leq a)}$ is less than some tolerance. – Ian Jun 30 '15 at 18:26
  • (Cont.) and then replace $P(X \leq a)$ with $P(b \leq X \leq a)$. – Ian Jun 30 '15 at 18:29
  • Another idea: when your probabilities get small, approximate the logarithm using http://math.stackexchange.com/questions/28751/proof-of-upper-tail-inequality-for-standard-normal-distribution Then try to analytically see how bad your errors are. – Ian Jun 30 '15 at 18:40

0 Answers0