If $u,v\in[0,1)$ are independent and uniformly distributed, then $$(x,y):=(uw,vh)$$ is uniformly distributed on the rectangle $R:=[0,w)\times[0,h)$, where $w,h\in\mathbb N$ are given.

Now, think of $R$ as being subdivided into $w$ columns and $h$ rows, forming $wh$ cells. Select a cell $C$ and consider the shape $S$ formed by $C$ together with the cells above, below, left and right of $C$.

How can we map $u,v$ to $S$ such that the resulting vector is uniformly distributed on $S$?

0xbadf00d

1 Answer

There are two ways: one is the approach you are aiming at, and the other is very general and good to know.

  1. The cross naturally subdivides into five parts: left, right, top, bottom and middle. That is not convenient, so let's use four parts instead: take the entire vertical bar, split it into an upper and a lower half, and add the left part and the right part. Each of these parts is a rectangle, so you can obtain it from $u$ and $v$ as you described, by stretching to the desired width and height. However, we have to make sure that the probability masses match. So, for the cross, compute the total area and then the four individual areas. This gives four ratios $p_1,\dots,p_4\in[0,1]$ that add up to $1$, that is, a probability mass function. Notice that $p_1=p_2$ and $p_3=p_4$, since our splits were symmetric. Thus we can, for example, split $u\in[0,0.5)\cup[0.5,1)$ and $v\in[0,2p_1)\cup[2p_1,1)$. This splits the unit square into four rectangles, each with the same mass as its counterpart in the cross. The last step is to map each rectangle in the unit square to its target rectangle by shifting and stretching it, as you described.

     How do we implement the above? Let $C=R_1\cup R_2\cup R_3\cup R_4$ be the cross (called $S$ in the question), where $R_1$ is the upper half of the vertical bar, $R_2$ the lower half of the vertical bar, $R_3$ the left part and $R_4$ the right part. Let $x_1<x_2<x_3<x_4$ be the relevant $x$-coordinates, $y_1<y_3<y_4<y_5$ the relevant $y$-coordinates, and let $y_2=0.5y_1+0.5y_5$ be the split of the vertical bar. Then we have $R_1=[x_2,x_3]\times[y_2,y_5]$, $R_2=[x_2,x_3]\times[y_1,y_2)$, $R_3=[x_1,x_2)\times[y_3,y_4]$ and $R_4=(x_3,x_4]\times[y_3,y_4]$.

     Let $w_1=x_3-x_2$ be the width of $R_1$ and $R_2$, and $h_1=y_5-y_2=y_2-y_1$ their height. Let $w_2=x_2-x_1=x_4-x_3$ and $h_2=y_4-y_3$ be the width and height of $R_3$ and $R_4$. Thus, the total area is $A=2w_1h_1+2w_2h_2$, and the ratios are $p_1=p_2=w_1h_1/A$, $p_3=p_4=w_2h_2/A$.
     Now, we set \begin{align*} (x,y)=&\unicode{120793}\{u<0.5,v<2p_1\}\left(x_2+2w_1u,\;y_2+\frac{h_1}{2p_1}v\right)+\\ &\unicode{120793}\{u\ge 0.5,v<2p_1\}\left(x_2+2w_1(u-0.5),\;y_1+\frac{h_1}{2p_1}v\right)+\\ &\unicode{120793}\{u<0.5,v\ge 2p_1\}\left(x_1+2w_2u,\;y_3+\frac{h_2}{1-2p_1}(v-2p_1)\right)+\\ &\unicode{120793}\{u\ge 0.5,v\ge 2p_1\}\left(x_3+2w_2(u-0.5),\;y_3+\frac{h_2}{1-2p_1}(v-2p_1)\right). \end{align*} The offsets $u-0.5$ and $v-2p_1$ shift each source rectangle so that it starts at $0$ before it is stretched.

     The probability to end up in $R_1$ and $R_2$ is $0.5\cdot 2p_1=p_1$ each. The probability to end up in $R_3$ and $R_4$ is $0.5\cdot (1-2p_1)=0.5\cdot 2p_3=p_3$ each, so that's good.

     Now, conditional on being in $R_1$ (meaning given the event $u<0.5, v<2p_1$), $u$ and $v$ are still independent and uniform on their respective intervals. But then the resulting coordinates are also independent and uniform, because the first one only depends on $u$, the second one only on $v$, and both are affine transformations. For the first coordinate we notice that it ranges from $x_2$ to $x_2+2w_1\cdot 0.5=x_3$, as desired. The rest follows analogously. This shows that conditional on being in $R_i$ we are uniform on $R_i$, and that the probability of ending up in $R_i$ is correct.

     One last question remains: if $z$ is uniform on $C$, is $z$ also uniform on $R_i$, given that $z\in R_i$? Yes, it is, and we obtain that by integration, meaning $$\mathbb P(z\in\mathcal E|z\in R_1)=\frac{\mathbb P(z\in\mathcal E\cap R_1)}{\mathbb P(z\in R_1)}=\frac{\frac{1}{A}\int_{\mathcal E\cap R_1}\mathrm d z}{\frac{1}{A}w_1h_1} =\frac{1}{w_1h_1}\int_{\mathcal E\cap R_1}\mathrm d z.$$

  2. The idea, frequently used in computer science, is that you can obtain a random variable by inverting the cumulative distribution function. In our case, let $A=\int_S\mathrm dx$ be the area of $S$, and let $F_1:\mathbb R\rightarrow[0,1]$, $x\mapsto\frac{1}{A}\int_{S\cap((-\infty,x]\times\mathbb R)}\mathrm d x'$, be the cumulative distribution function of the marginal on the first coordinate (which is not uniform). For given $x\in\mathbb R$ let $S_x=\{y\in\mathbb R:(x,y)\in S\}$ be the section of $S$ at $x$, further let $A_x=\int_{S_x}\mathrm dy$ be the length of $S_x$, and let $F_{2,x}:\mathbb R\rightarrow[0,1]$, $y\mapsto\frac{1}{A_x}\int_{S_x\cap(-\infty,y]}\mathrm{d}y'$ for all $x\in\{x'\in\mathbb R:A_{x'}>0\}$. The second distribution is the conditional probability given that the first coordinate is $x$, in the kernel sense. Notice that $F_1$ is invertible on $[x_-,x_+]$, where $x_-=\max\{x:F_1(x)=0\}$ and $x_+=\min\{x:F_1(x)=1\}$. Let $X=F_1^{-1}(u)$, using $u,v$ i.i.d. uniform on $[0,1)$. Similarly, let $Y=F_{2,X}^{-1}(v)$, then $(X,Y)$ is uniform on $S$. Clearly, this is a fairly involved construction for this problem (and we haven't even yet shown that $(X,Y)$ is measurable), but when the target distribution is something that is not as accessible as the uniform distribution on $S$, then this method might prove helpful.
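To make item 1 concrete, here is a minimal Python sketch of the four-rectangle map. The function name `sample_cross` and its argument order are made up for illustration. It assumes $x_1<x_2<x_3<x_4$, $y_1<y_3<y_4<y_5$ and equally wide arms ($x_2-x_1=x_4-x_3$). Each source rectangle of the unit square is rescaled to $[0,1)^2$ first and then stretched onto its target.

```python
import random


def sample_cross(u, v, x1, x2, x3, x4, y1, y3, y4, y5):
    """Map (u, v) in [0,1)^2 to a point of the cross (hypothetical helper).

    Assumes x1 < x2 < x3 < x4, y1 < y3 < y4 < y5 and equally wide arms,
    i.e. x2 - x1 == x4 - x3.
    """
    y2 = 0.5 * (y1 + y5)                 # split of the vertical bar
    w1, h1 = x3 - x2, y5 - y2            # width and height of R1 and R2
    w2, h2 = x2 - x1, y4 - y3            # width and height of R3 and R4
    area = 2 * w1 * h1 + 2 * w2 * h2     # total area A
    q = 2 * w1 * h1 / area               # threshold 2 * p1 for v

    left = u < 0.5
    s = 2 * u if left else 2 * u - 1     # rescale u's half to [0, 1)
    if v < q:                            # vertical bar: R1 (upper) or R2 (lower)
        t = v / q                        # rescale v to [0, 1)
        return (x2 + w1 * s, (y2 if left else y1) + h1 * t)
    t = (v - q) / (1 - q)                # arms: R3 (left) or R4 (right)
    return ((x1 if left else x3) + w2 * s, y3 + h2 * t)


# Example: the cross around the middle cell of a 3 x 3 grid.
rng = random.Random(0)
points = [sample_cross(rng.random(), rng.random(), 0, 1, 2, 3, 0, 1, 2, 3)
          for _ in range(5)]
```

With unit cells the thresholds are $2p_1=0.6$ for $v$ and $0.5$ for $u$, so each half of the bar receives mass $0.3$ and each arm $0.2$.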

With some extra effort, you can even construct a mapping that is invariant to dimensions.
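For comparison, the inverse-CDF construction of item 2 can also be sketched in a few lines for the cross. This is only an illustration under simplifying assumptions: the shape is given as vertical strips $(x_{lo},x_{hi},y_{lo},y_{hi})$ sorted by $x$ with disjoint $x$-ranges, so that every section $S_x$ is a single interval. On such a shape $F_1$ is piecewise linear and $F_{2,x}$ is the CDF of a uniform distribution on an interval. The names `sample_by_inverse_cdf` and `CROSS` are hypothetical.

```python
import random


def sample_by_inverse_cdf(u, v, strips):
    """Return (X, Y) = (F_1^{-1}(u), F_{2,X}^{-1}(v)) for a union of strips.

    strips: list of (x_lo, x_hi, y_lo, y_hi), sorted by x, disjoint x-ranges
    (an assumption of this sketch, not a general requirement of the method).
    """
    areas = [(xh - xl) * (yh - yl) for xl, xh, yl, yh in strips]
    target = u * sum(areas)                # A * F_1(X): area to the left of X
    for (xl, xh, yl, yh), a in zip(strips, areas):
        if target < a:
            x = xl + target / (yh - yl)    # F_1 is linear inside a strip
            return x, yl + v * (yh - yl)   # the section S_x is [y_lo, y_hi)
        target -= a
    # Only reachable through rounding when u is extremely close to 1.
    xl, xh, yl, yh = strips[-1]
    return xh, yl + v * (yh - yl)


# The cross around the middle cell of a 3 x 3 grid: left arm, bar, right arm.
CROSS = [(0, 1, 1, 2), (1, 2, 0, 3), (2, 3, 1, 2)]
rng = random.Random(1)
point = sample_by_inverse_cdf(rng.random(), rng.random(), CROSS)
```

Unlike the four-rectangle map, this version needs no symmetry between the arms, since the strip areas take care of the weights.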

EDIT: I just saw that exactly this problem is also currently open, which can be found here. There, we show that $u$ alone suffices to create arbitrarily many i.i.d. uniform variables in $[0,1]$.

Matija
  • Thank you for your answer! Two questions: (a) How does (1.) solve my problem? What you wrote about the four parts is clear to me, but how does it help for the cross? (b) Okay, that's clear to me as well, but how would I compute a uniformly distributed sample on $S$ in practice from that? – 0xbadf00d Oct 24 '22 at 18:49
  • BTW, what do you think about the following approach: Assuming that $w$ is another uniformly distributed random variable on $[0,1)$, independent of $u,v$, couldn't we use $w$ to choose one of the five rectangles uniformly and then apply the mapping described in the question to transform $(u,v)$ into a uniformly distributed sample on this rectangle? The result should be a uniformly distributed sample on $S$. Assuming I'm not wrong, are there any issues with this approach in practice (besides the fact that we need another random variable $w$)? – 0xbadf00d Oct 24 '22 at 19:32
  • I added more detail to the first part. With respect to the second part: This is used in software development, for all advanced programming languages as mentioned here, like Python, C++ and Java. – Matija Oct 24 '22 at 20:52
  • As to your suggestion, this works just fine. I would say that's the cleanest approach, since you don't have to dissect (my answer shows how messy it gets :-). That's a huge plus, in particular if someone else has to read and understand the transform. A downside is that random number generation is expensive. So, if you want to sample from your distribution a trillion times, you should maybe make do with two (and more importantly check your RNG specs). I implemented 1RSB population dynamics a while ago, in this context performance was king (since I don't own a super-computer). – Matija Oct 24 '22 at 20:59
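The three-variable approach discussed in the comments can be sketched as follows. This is a minimal illustration, assuming unit cells and an interior cell $C$ with column index `i` and row index `j`, so that all four neighbours exist. The third uniform variable is called `r` here to avoid a clash with the width $w$, and the function name `sample_plus` is made up.

```python
import random

# Offsets of the five cells of S relative to C: centre, left, right, below, above.
OFFSETS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def sample_plus(u, v, r, i, j):
    """Pick one of the five unit cells with r, then place (u, v) inside it."""
    k = min(int(5 * r), 4)       # uniform index in {0, ..., 4}; min is defensive
    di, dj = OFFSETS[k]
    return (i + di + u, j + dj + v)


rng = random.Random(0)
point = sample_plus(rng.random(), rng.random(), rng.random(), 1, 1)
```

Since all five cells have the same area, choosing the cell uniformly gives each the correct mass of $1/5$, at the cost of one extra random number per sample.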