
Discrete Case: Assume I have $N$ data points $y_i$, all finite. I can take the interval $[\min_i(y_i), \max_i(y_i)]$, split it into $M$ equal-width bins, count the number of points in each bin, and divide each count by $N$. I call the resulting bin values $p_i$. As far as I am aware, the $p_i$ form a discrete probability distribution, since they are finite, non-negative and sum to 1.
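The discrete construction above can be sketched in a few lines of NumPy. This is only an illustration: the function name `bin_probabilities` and the normally distributed sample data are assumptions made here, not part of the question.

```python
import numpy as np

def bin_probabilities(y, M):
    """Return the M bin probabilities p_i and the bin edges for data y.

    Hypothetical helper: bins are equal-width on [min(y), max(y)].
    """
    y = np.asarray(y, dtype=float)
    # np.histogram includes the right edge in the last bin, so every
    # data point (including max(y)) is counted exactly once.
    counts, edges = np.histogram(y, bins=M, range=(y.min(), y.max()))
    return counts / y.size, edges

# Illustrative data only (assumed here, not taken from the question).
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
p, edges = bin_probabilities(y, M=20)
```

Because every point falls in exactly one bin, the returned `p` is non-negative and sums to 1 up to floating-point rounding.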

Continuous Case: Now I have a curve $y(x)$, where $x\in[0,1]$. It may be assumed that $y$ is finite, continuous and differentiable within this interval.

Question: Does there exist a probability density $\rho(y)$ that extends the discrete probability distribution defined above to an infinite number of points and is computable for the curve $y(x)$? If yes, please tell me what it is called and how to construct it. In particular, I am interested in an extension that does not require specifying the number of bins $M$, i.e. the limit $M\rightarrow \infty$.

Edit: Following some feedback, I have realised that the function I seek could be constructed using the following algorithm.

  1. Subdivide the curve into monotonic segments $y_i(x)$
  2. Invert each segment to get $x_i(y)$
  3. Add the segments together
  4. Normalize
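A minimal numerical sketch of these steps, under two assumptions made here: that "adding the segments" means summing the contributions $|\mathrm{d}x_i/\mathrm{d}y|$ of the inverted pieces, and that $x$ is uniform on $[0,1]$. The curve is replaced by its piecewise-linear interpolant on a fine grid, so every grid cell is a monotonic segment that contributes $\Delta x/|\Delta y|$ to the density for the $y$ values it covers. The name `density_of_curve` and the test curve $\sin(\pi x)$ are hypothetical choices for illustration.

```python
import numpy as np

def density_of_curve(y_func, y_query, n_grid=200_001):
    """Approximate rho(y) for y = y_func(x), with x uniform on [0, 1].

    Hypothetical helper: each grid cell is treated as a linear, hence
    monotonic, segment of the curve.
    """
    x = np.linspace(0.0, 1.0, n_grid)
    y = y_func(x)
    dx = x[1] - x[0]
    # Range of y covered by each cell [x_k, x_{k+1}].
    lo = np.minimum(y[:-1], y[1:])
    hi = np.maximum(y[:-1], y[1:])
    y_query = np.atleast_1d(np.asarray(y_query, dtype=float))
    rho = np.zeros(len(y_query))
    for j, y0 in enumerate(y_query):
        # Cells whose segment passes through the level y0 (hi > lo here).
        crossing = (lo <= y0) & (y0 < hi)
        # Sum |dx/dy| over all segments that reach y0.
        rho[j] = np.sum(dx / (hi[crossing] - lo[crossing]))
    return rho

y_query = np.array([0.25, 0.5, 0.75])
rho = density_of_curve(lambda x: np.sin(np.pi * x), y_query)
# Closed form for this particular test curve, used only as a check.
exact = 2.0 / (np.pi * np.sqrt(1.0 - y_query**2))
```

With $x$ uniform on $[0,1]$ each cell carries probability mass $\Delta x$, so the result already integrates to 1 (apart from exactly flat cells) and the separate normalization step is not needed in this sketch.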

The question remains whether this is indeed how the problem is addressed in practice and, if not, what the state of the art is. Methods that approximate this function are thus welcome, as long as it is clear that they are more practical than the exact solution.

  • Other than normalizing by $\int_0^1 y(x) \,\mathrm{d}x$, why not use $y(x)$ as your PDF? – Eric Towers May 29 '19 at 06:03
  • @EricTowers If I normalize $y(x)$, I would get a probability density over $x$. I need a probability density over $y$. Also, I can't just invert $y(x)$ because it is not monotonic in general. – Aleksejs Fomins May 29 '19 at 07:33

1 Answer


I'm not sure I understood the question, so apologies if this answer misses the point.

As I understand it, your function $y(x)$ is the same $y$ from which you build your histogram, and you are asking for the probability distribution $\mathcal{P}(y)$ that $y$ follows. One way to go, valid only if $y(x)$ is differentiable and one-to-one, is to relate the probability distributions of $x$ and $y$: $$\left| \mathcal{P}(y)\, dy \right| = \left| \mathcal{P}(x)\, dx \right| = \left| dx \right|,$$ because $\mathcal{P}(x) = 1$ if $x$ is drawn uniformly from $[0,1]$. Then: $$ \mathcal{P}(y) = \left| \frac{dx}{dy} \right| = \frac{1}{\left| y'(x(y)) \right|}.$$ You still need to invert $y(x)$ to obtain $x(y)$, which you could do numerically.
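As a quick check of the one-to-one formula, take $y(x) = x^2$ on $[0,1]$ (an example chosen here for illustration, not the curve from the question). It is one-to-one on this interval, with $x(y) = \sqrt{y}$ and $y'(x) = 2x$, so $$\mathcal{P}(y) = \frac{1}{\left| 2\,x(y) \right|} = \frac{1}{2\sqrt{y}}, \qquad y \in (0,1],$$ and the result is correctly normalized: $$\int_0^1 \frac{dy}{2\sqrt{y}} = \left[ \sqrt{y} \right]_0^1 = 1.$$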

Edit: If your function $y(x)$ is not one-to-one, the way to go is to compute the CDF and then differentiate it: $$ \mathcal P (y) = \frac{d\mathcal F_Y (y)}{dy}.$$ An example of how to find $\mathcal F_Y (y)$ from a known probability distribution $ \mathcal P (x) $ (in your case uniform and equal to $1$) and $y(x)$ is: Transformation of Random Variable $Y = X^2$
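The CDF route can also be sketched numerically, assuming $x$ is uniform on $[0,1]$: $\mathcal F_Y(y)$ is then the fraction of the interval on which $y(x) \le y$, which can be estimated on a fine grid and differentiated by finite differences. The function name `cdf_of_curve` and the test curve $\sin(\pi x)$ are assumptions for illustration; for that curve $\mathcal F_Y(y) = 2\arcsin(y)/\pi$, which gives a value to compare against.

```python
import numpy as np

def cdf_of_curve(y_func, y_grid, n_x=100_001):
    """Estimate F_Y(y) = fraction of x in [0, 1] with y_func(x) <= y.

    Hypothetical helper: uses a uniform grid in x as a stand-in for
    the uniform distribution of x.
    """
    x = np.linspace(0.0, 1.0, n_x)
    y_sorted = np.sort(y_func(x))
    # Number of grid values <= y, divided by the total number of values.
    return np.searchsorted(y_sorted, y_grid, side="right") / n_x

# Stay away from y = 1, where the density of this test curve diverges.
y_grid = np.linspace(0.05, 0.95, 181)
F = cdf_of_curve(lambda x: np.sin(np.pi * x), y_grid)
# Differentiate the CDF numerically to get the density.
rho = np.gradient(F, y_grid)
```

This avoids inverting $y(x)$ explicitly, at the cost of finite-difference noise in `rho`, so the spacing of `y_grid` should stay well above the CDF resolution $1/n_x$.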

Puco4