
I'm wondering what's going on here. Consider an experiment where we generate X samples from a uniform distribution, and then take the difference between two consecutive samples from that set, let's say the 9th and 10th. If we repeat this process N times, we get a distribution for the difference between the 10th and 9th sample.

What is the resulting distribution, and why does it have a tail to the left as well? Can someone help me explain this?

skidjoe
  • If the distribution is uniform, then you can consider taking a circle and $11$ points on it, taking one as the beginning and end (the $0$th and $11$th) and the others as your ten samples. Symmetry/exchangeability will tell you all $11$ gaps have the same distribution, and it is particularly easy to find the distribution function and density of the first gap. It should be intuitively clear that small gaps are more likely than large gaps, since a small gap gives more possibilities for the other gaps than a large gap does. – Henry Aug 14 '22 at 21:47
  • I'm not sure I understand your answer. I'm just asking about the resulting distribution from the difference of two consecutive points from a uniform distribution. Is it a Beta, Exponential or something else? And how can you know? – skidjoe Aug 15 '22 at 14:44
  • If the distribution is uniform on $[0,1]$ you in effect have a uniform Dirichlet distribution for the $11$ gaps (including below the minimum and above the maximum of the sample). For each gap, the marginal CDF is $1-(1-x)^{10}$ and so the marginal density $10(1-x)^9$ when $0\le x\le 1$. With a larger sample and suitable rescaling, the distribution converges towards an exponential distribution, though this is only an approximation for a finite sample: note $10(1-x)^9 \approx 10e^{-10x}$ – Henry Aug 15 '22 at 15:27
  • I just tried doing this sampling using R and it seems that you get a Normal distribution. Now I'm wondering: what happens to the resulting distribution if you sort from largest to smallest or vice versa? – skidjoe Aug 23 '22 at 16:46

1 Answer


Based on comments:

If the distribution is uniform, then you can consider taking a circle and $11$ points on it, taking one as the beginning and end (the $0$th and $11$th) and the others as your ten samples. Symmetry/exchangeability will tell you all $11$ gaps have the same distribution, and it is particularly easy to find the distribution function and density of the first gap. It should be intuitively clear that small gaps are more likely than large gaps, since a small gap gives more possibilities for the other gaps than a large gap does.
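The circle argument can be tried out numerically. The following is only a sketch under my own assumptions: the helper name `circle_gaps` is made up for illustration, and the first of the $11$ points is used as the cut point, which is equivalent to picking one uniformly because the points are iid.

```r
set.seed(2022)

# Hypothetical helper: place npoints iid uniform points on a circle of
# circumference 1, cut the circle at the first point, and return the arcs.
circle_gaps <- function(npoints) {
  pts <- runif(npoints)            # positions around the circle
  rel <- (pts[-1] - pts[1]) %% 1   # remaining points, measured from the cut
  diff(sort(c(0, rel, 1)))         # the npoints arcs, in order from the cut
}

arcs <- replicate(20000, circle_gaps(11))

# Each of the 11 arcs should have a mean close to 1/11
round(rowMeans(arcs), 3)
```

After the cut, the remaining $10$ points play the role of the ten sorted samples on $[0,1]$, so the rows of `arcs` are the $11$ gaps.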

If the distribution is uniform on $[0,1]$ you in effect have a uniform Dirichlet distribution for the $11$ gaps (including below the minimum and above the maximum of the sample). For each gap, the marginal CDF is $1-(1-x)^{10}$ and so the marginal density $10(1-x)^9$ when $0\le x\le 1$.

This is a Beta distribution with parameters $\alpha=1$ and $\beta=10$. With a larger sample and suitable rescaling, the distribution converges towards an exponential distribution, though this is only an approximation for a finite sample: note $10(1-x)^9\approx 10e^{-10x}$.
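One way to see where these formulas come from is to take the first gap as the representative, since it is just the minimum of the ten samples. The symbols $G_1$ (first gap), $U_i$ (the iid uniform samples) and $G^{(n)}$ (a gap from a sample of size $n$) are my own labels for this sketch:

```latex
\begin{align*}
P(G_1 > x) &= P(U_1 > x, \dots, U_{10} > x) = (1-x)^{10}, \\
F(x) &= 1 - (1-x)^{10}, \qquad
f(x) = F'(x) = 10(1-x)^{9}, \qquad 0 \le x \le 1, \\
P\bigl(n\,G^{(n)} > y\bigr) &= \Bigl(1 - \frac{y}{n}\Bigr)^{n} \longrightarrow e^{-y}
\quad \text{as } n \to \infty .
\end{align*}
```

The second line is the Beta$(1,10)$ density, and the last line is the rescaling under which the gap approaches an exponential distribution.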

You said in a comment "I just tried doing this sampling using R and it seems that you get a Normal distribution". That is not what should happen and the following simulation suggests my result is correct, with small values more likely than large values and a close to exponential distribution. It takes $10^5$ simulations of sample size $10$ and looks at the gap between the $9$th and $10$th values, both as a CDF and as a density. (You could look at all the gaps by plotting sims rather than sims[10,] but you would get the same sort of thing.)

Simulated data is in black, theoretical curves in red (essentially overlapping the simulated data) and approximate exponential curves in blue:

set.seed(2022)
gaps <- function(samplesize){
  diff(sort(c(0, runif(samplesize), 1)))
  }   
sims <- replicate(100000, gaps(10))

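# empirical CDF of the gap between the 9th and 10th ordered values
plot.ecdf(sims[10,])
# theoretical Beta(1,10) CDF
curve(1-(1-x)^10, add=TRUE, col="red")
# exponential approximation with rate 10, matching the density plot below
curve(1-exp(-10*x), add=TRUE, col="blue")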

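[Plot: empirical CDF of the gap between the 9th and 10th ordered values (black), theoretical CDF (red), exponential approximation (blue)]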

plot(density(sims[10,]))
curve(10*(1-x)^9, add=TRUE, col="red")
curve(10*exp(-10*x), add=TRUE, col="blue")  

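[Plot: estimated density of the gap between the 9th and 10th ordered values (black), theoretical density (red), exponential approximation (blue)]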
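As a numerical complement to the plots, the simulated gap can be compared with the Beta$(1,10)$ values directly. This is a sketch rather than part of the original simulation: the variable name `gap10` is mine, and the moment formulas are the standard ones for a Beta distribution with $\alpha=1$, $\beta=10$.

```r
set.seed(2022)

# Same construction as above: 10 uniforms on [0,1] plus both endpoints give 11 gaps
gaps <- function(samplesize) diff(sort(c(0, runif(samplesize), 1)))
sims <- replicate(100000, gaps(10))
gap10 <- sims[10, ]   # gap between the 9th and 10th ordered values

# Sample mean and variance versus the Beta(1, 10) values
c(simulated = mean(gap10), theory = 1 / 11)
c(simulated = var(gap10),  theory = 10 / (11^2 * 12))

# Kolmogorov-Smirnov comparison against the Beta(1, 10) CDF
ks.test(gap10, "pbeta", shape1 = 1, shape2 = 10)
```

A large p-value from `ks.test` would be consistent with the Beta$(1,10)$ claim, though it does not prove it.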

Henry
  • I see. Would the distribution be different if we examined the difference between the 8th and 9th values, or for instance the 5th and the 6th? It shouldn't, right?

    Regarding the circle example, can you explain a bit more? We're taking the unit circle and then throwing 10 dots inside it?

    – skidjoe Aug 23 '22 at 21:36
  • All 11 gaps have the same distribution. You can check this using the same R code in the sims matrix – Henry Aug 23 '22 at 23:14
  • Take a circle of circumference $1$. Choose $11$ iid uniformly random points on the circumference: clearly the distributions of the gaps are all the same. Cut it at one of those points chosen uniformly and straighten it out: the remaining $10$ points correspond to your $10$ points sorted in order and so you can say the $11$ gaps each have the same distribution – Henry Aug 23 '22 at 23:17
  • Thank you for taking the time to answer my questions. One clarification: if I sort the uniform samples then I get the beta distribution as mentioned, however, if I don't sort then I get a normal distribution. Why is this the case? – skidjoe Aug 28 '22 at 11:03
  • @skidjoe If you do not sort them, then the difference between successive samples is just the difference between two independent samples, and you can get positive or negative values. You in fact get a triangular distribution on $[-1,1]$ with a peak at $0$, so density $1-|x|$ and cdf $\frac12(1+2x-x|x|)$ when $-1 \le x \le 1$. If you took the absolute value of the difference between successive samples then the density would be $2(1-x)=2-2x$ and cdf $1-(1-x)^2 =2x-x^2$ when $0 \le x \le 1$. Neither of these is normal. – Henry Aug 28 '22 at 18:39
  • I see... my final question: how do you interpret the result in the second graph (density)? That the difference between the xth data point and the x+1st data point is usually a little greater than 0? – skidjoe Aug 29 '22 at 09:38
  • @skidjoe If you sample $n$ values, then the expected value of the gap between successive ordered values is $\frac1{n+1}$: the gap is more likely to be less than that than it is to be more, though it could be anything from $0$ (the mode) up to $1$. – Henry Aug 29 '22 at 11:03
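The claims in these comments can be checked with a short simulation. This is a sketch under my own naming: `raw` and `ptri` are illustrative names, with `ptri` being the triangular cdf $\frac12(1+2x-x|x|)$ quoted above, and the "theory" probability follows from the gap cdf $1-(1-x)^n$ evaluated at $\frac1{n+1}$.

```r
set.seed(2022)
n <- 10

# Sorted case: 11 gaps per replication, one row per gap
gaps <- function(samplesize) diff(sort(c(0, runif(samplesize), 1)))
sims <- replicate(100000, gaps(n))

# All 11 gaps share one distribution: every row mean should be near 1/(n+1)
round(rowMeans(sims), 4)

# A gap falls below its mean 1/(n+1) more often than above it
c(simulated = mean(sims[10, ] < 1 / (n + 1)),
  theory    = 1 - (1 - 1 / (n + 1))^n)

# Unsorted case: 10th draw minus 9th draw, can be negative
raw <- replicate(100000, diff(runif(n))[9])

# Triangular CDF on [-1, 1] from the comment above
ptri <- function(x) 0.5 * (1 + 2 * x - x * abs(x))
ks.test(raw, ptri)
```

Plotting `hist(raw)` gives a symmetric, peaked shape that can look roughly bell-like at a glance, which may be why the unsorted differences were taken for a normal distribution.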