Calculating the Probabilities of a 100-Sided "Weighted" Dice

Question

I have always been interested in learning how I can make a custom Probability Distribution that corresponds to some particular situation (e.g. constraints).

For example - suppose I have a single dice that has 100 sides and the following conditions:

Condition 1: This dice has the highest probability of landing on 50
Condition 2: Sides closer to 50 have higher probabilities than sides farther away from 50 (i.e. the probabilities are strictly decreasing as you move away from 50, e.g. Probability of landing on side 50 > 49 > 48 > ... AND Probability of landing on side 50 > 51 > 52 > ...)
Condition 3: Sides at equal distance to 50 have the same probabilities (e.g. Probability of landing on side 49 = Probability of landing on side 51, Probability of landing on side 48 = Probability of landing on side 52, etc.)
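
The three conditions can be checked mechanically for any candidate probability vector. Below is a minimal sketch in R; `check_die` is a hypothetical helper name (not from any package), and it assumes `p[i]` holds the probability of side `i`. Note that side 100 has no partner at the same distance from 50 (that would be side 0), so Condition 3 is only checked for sides 1 to 99.

```r
# Hypothetical helper: checks Conditions 1-3 for a vector p of length 100,
# where p[i] is the probability of landing on side i.
check_die <- function(p, centre = 50, tol = 1e-12) {
  stopifnot(length(p) == 100, all(p >= 0))
  left  <- p[centre:1]      # p[50], p[49], ..., p[1]
  right <- p[centre:100]    # p[50], p[51], ..., p[100]
  # only distances 0..49 have a side on both the left and the right
  n_pairs <- min(length(left), length(right))
  idx <- seq_len(n_pairs)
  c(
    sums_to_one = abs(sum(p) - 1) < tol,
    cond1_peak  = which.max(p) == centre && sum(p == max(p)) == 1,
    cond2_decr  = all(diff(left) < 0) && all(diff(right) < 0),
    cond3_symm  = all(abs(left[idx] - right[idx]) < tol)
  )
}

# A fair die is symmetric but fails Conditions 1 and 2.
check_die(rep(1 / 100, 100))
```

Any candidate distribution for Situations 1 to 3 can be passed through this function before being used.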

My Question: I want to make a separate dice that corresponds to each one of these situations below:

Situation 1: I want to create a dice that satisfies Condition 1, Condition 2 and Condition 3. How can I define a Probability Distribution Function for this situation?
Situation 2: I want to create a dice that satisfies Condition 1, Condition 2 and Condition 3 AND the probability of landing on side 50 is given by $p_{50} = 0.5$. How can I define a Probability Distribution Function for this situation?
Situation 3: I want to create a dice that satisfies Condition 1, Condition 2 and Condition 3 AND the probability of landing on side 50 is given by $p_{50} = 0.5$ AND the probability of landing on side 49 = side 51 = 0.3. How can I define a Probability Distribution Function for this situation?

I am not sure how to solve these kinds of questions analytically. Ideally, I would be interested in defining an exact theoretical probability distribution corresponding to each situation (e.g. a multinomial distribution with certain properties).

What I tried so far: For the time being, I tried to solve this question by simulation (e.g. Situation 1). Using the R programming language, I simulated numbers from a Normal Distribution (centered around 50), truncated the results (i.e. only allowed numbers between 0 and 100), and calculated the probabilities of landing between any given ranges:

# define mean, standard deviation of a normal distribution with a large number of simulations
mean <- 50
sd <- 15
n <- 100000

# simulate from this normal distribution
set.seed(123)
numbers <- rnorm(n, mean, sd)

# truncate the distribution (i.e. only keep numbers between 0 and 100)
numbers <- ifelse(numbers < 0, 0, ifelse(numbers > 100, 100, numbers))

# define the intervals
min_interval <- seq(0, 99, by = 1)
max_interval <- seq(1, 100, by = 1)
count <- vector("numeric", length(min_interval))
percentage <- vector("numeric", length(min_interval))

# calculate the count and percentage of numbers in each interval
for (i in seq_along(min_interval)) {
    count[i] <- sum(numbers >= min_interval[i] & numbers < max_interval[i])
    percentage[i] <- count[i] / length(numbers) * 100
}

# store results
df <- data.frame(min_interval = min_interval,
                 max_interval = max_interval,
                 count = count,
                 percentage = percentage)

# sort results
df <- df[order(-df$percentage), ]

As we can see, the results of this simulation approximately correspond to Situation 1 (Condition 2 and Condition 3 are not fully met):

# plot results
plot(density(numbers))

We can see that numbers around 50 have higher probabilities (i.e. percentage/100) compared to numbers further away from 50 (even though Condition 2 and Condition 3 are not fully met):

# view results
head(df)
   min_interval max_interval count percentage
51           50           51  2714      2.714
49           48           49  2632      2.632
50           49           50  2628      2.628
53           52           53  2626      2.626
48           47           48  2615      2.615
54           53           54  2611      2.611
tail(df)
    min_interval max_interval count percentage
95            94           95    22      0.022
3              2            3    19      0.019
100           99          100    16      0.016
4              3            4    14      0.014
98            97           98    12      0.012
99            98           99    10      0.010

I also included an optional visualization:

library(ggplot2)
ggplot(df, aes(x=min_interval, y=percentage/100)) +
    geom_bar(stat="identity", fill="steelblue") +
    labs(x="Min Interval", y="Probability", title="Approximation of Discrete Probability Distribution for Situation 1") +
    theme_minimal()
ggplot(df, aes(x=min_interval, y=percentage)) +
    geom_bar(stat="identity", fill="steelblue", color="steelblue", width=1) +
    labs(x="Min Interval", y="Percentage", title="Approximation of Discrete Probability Distribution for Situation 1") +
    theme_minimal()

But is there a way to mathematically (i.e. analytically) calculate these probabilities for Situation 1, Situation 2 and Situation 3? Can a system of equations be created corresponding to Situation 1, Situation 2 and Situation 3 alongside a set of constraints - such that these probabilities can be calculated analytically? Perhaps this can be done with a Multinomial Distribution? Maybe an Exponential Decay Function that can be used such that it passes through all points?

Thanks!

Notes:
My own previous attempt at (incorrectly) approaching a similar question: "How to Define a Bell Curve"
Is this current question even possible? Will it require complex non-linear optimization algorithms?

You talk about "distance" and "sides" as if the die is a geometric object, but in your analysis, you simply assume that these properties reduce down to the absolute value of the difference of the face value from $50$, rather than specifying the geometry and labeling of an actual die. Then you impose a criterion that makes no sense: condition 3 would imply that the sum of the probabilities on three faces exceeds $1$. The probability of obtaining 49, 50, or 51 would be $0.5 + 0.3 + 0.3 > 1.$ — heropup, Nov 11 '23 at 17:31
@ heropup: thank you for your reply! My analysis (i.e. the R computer code) was a very crude way of solving this problem that is inexact. I was just looking for something to get started with. — stats_noob, Nov 11 '23 at 17:53
Just to clarify, I meant that: There are 3 general conditions (Condition 1, Condition 2, Condition 3). Then, there are 3 separate situations (Situation 1, Situation 2, Situation 3). I want to create a new dice for each of these 3 situations, but each dice obeys the 3 general conditions. — stats_noob, Nov 11 '23 at 17:57
Is this an attempt to rewrite https://math.stackexchange.com/questions/4804681/how-to-define-a-bell-curve ? or https://math.stackexchange.com/questions/4805161/is-there-a-discrete-version-of-the-normal-distribution ? — Gerry Myerson, Nov 12 '23 at 02:53
@Gerry Myerson: thank you for your reply! The question you linked was my initial attempt at solving this question. This new question is a later attempt after doing more work. I am actually planning on deleting the older question you linked. — stats_noob, Nov 12 '23 at 02:55
It might be better to keep all the related questions up and each linked to the others, so people could see where you're coming from and what has already been accepted or rejected before they reinvent the wheel. — Gerry Myerson, Nov 12 '23 at 03:00
It seems to me that your "truncation" of the distribution gathered all outcomes less than $0$ into the "$0$" bucket while discarding all outcomes greater than $100$. So if there was anything to truncate, this "truncation" will unbalance the distribution slightly. — David K, Nov 12 '23 at 23:57
It also seems to me that you have a bucket for results $49\leq x<50$ and a bucket for results $50\leq x<51$ which have equal probability according to a normal distribution with mean $50$, and these buckets each have greater probability than any other bucket. So you have a symmetric discrete distribution, but it isn't centered at the discrete value $50$, it's centered between the two equally likely outcomes $49$ and $50$ with all other probabilities symmetric around that center ($p_{48}=p_{51}$, for example). This is not easy to see from the simulation because the simulation is, ahem, random. — David K, Nov 13 '23 at 00:01
@ David K: thank you so much for your comments- much appreciated. — stats_noob, Nov 13 '23 at 04:21

score 3 · Answer 1 · answered Nov 14 '23 at 22:23

The probabilities have to add up to $1$. Subtract any fixed probabilities, e.g. $p_{50}=0.5$, which leaves $0.5$ to share among the other sides.

Then put $p_{1}=k, p_{2}=2k$, etc., up to $p_{49}=49k$.

Then $2(k+2k+3k+\dots+49k)=0.5$, i.e. $2\cdot\frac{1}{2}\cdot 49\cdot 50\cdot k=2450k=0.5$.

So $k=\frac{1}{4900}$, and make the 51, 52, etc. probabilities symmetrical with the 49, 48, etc.

(the sum of numbers 1 to $n$ is $\frac{1}{2}n(n+1)$)

You can adjust this method depending on what central values are being fixed.
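
As a rough illustration, the construction above can be sketched in R (the language used in the question). The variable names are illustrative only. The answer does not say what happens to side 100, which has no mirror image on the left; this sketch assumes it simply continues the linear pattern and so receives probability $0$.

```r
# Sketch of the linear-weight construction for Situation 2 (p_50 fixed at 0.5).
p50 <- 0.5
d <- abs(1:100 - 50)       # distance of each side from 50
w <- 50 - d                # weights 1, 2, ..., 49, 50, 49, ..., 1, 0
w[50] <- 0                 # side 50 is fixed separately, so it gets no weight here
k <- (1 - p50) / sum(w)    # sum(w) = 2 * (1 + ... + 49) = 2450, so k = 1/4900
p <- k * w
p[50] <- p50

k                          # 1/4900
sum(p)                     # should be 1 up to rounding
p[c(1, 49, 50, 51, 99, 100)]
```

If a strictly positive probability is wanted on side 100, the weights would need to be shifted (for example `w <- 51 - d`) and `k` recomputed from the new `sum(w)`.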

answered Nov 14 '23 at 22:23

John Hunter

@ John Hunter: thank you so much for your answer! Can your answer be used to create a multinomial distribution? Thanks! – stats_noob Nov 22 '23 at 02:35
@stats noob I don't think you can have p(50) = 0.5 and still have a binomial shaped distribution, but you could use the Binomial B(99,0.5) or B(100,0.5) probabilities, they give the right shape and have 100 different probs, that peak at 50. It's the same as the probabilities of getting k heads when flipping a coin 100 times, 50 heads is most likely, just that the peak probability won't be as high as 0.5, it's just too high – John Hunter Nov 23 '23 at 08:44

Starlight · Answer 2 · 2023-11-15T06:56:03.397

Situation 3 does not obey the laws of probability since $$p(50)+p(49)+p(51)=0.5+0.3+0.3=1.1$$

When creating a probability distribution, ensure normalization: the sum of probabilities in the sample space must be 1.

Situation 2 is asymmetrical and modelling analytically becomes more difficult: $$1<2<\dots<49: 49 \text{ outcomes}$$ $$51<52<\dots<100: 50 \text{ outcomes}$$

With 101 outcomes, the modelling is easier:

Binomial Distribution with number of trials $=n=100$ (which gives $101$ outcomes, relabelled $1,\dots,101$) and probability of success $=p=\frac{1}{2}$ gives a symmetric distribution with: $$p(1)<p(2)<\dots<p(50)<p(51),\quad p(51)>p(52)>\dots>p(101)$$
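
A minimal sketch of this binomial shape in R, assuming the 101 outcomes are the success counts $0,\dots,100$ of a binomial with 100 trials, shifted by one so that they are labelled $1,\dots,101$ (this labelling is an assumption of the sketch):

```r
# Binomial-shaped probabilities on outcomes 1..101, peaking at 51.
outcomes <- 1:101
p <- dbinom(outcomes - 1, size = 100, prob = 0.5)

sum(p)          # should be 1 up to rounding
which.max(p)    # index of the most likely outcome
max(p)          # peak probability, about 0.0796
```

The peak probability here is far below $0.5$, so this shape fits Situation 1 but cannot be combined with the extra requirement of Situation 2.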

Geometric Parameter: Generalizing $p(51)=p, p(50)=p(52)=kp, p(49)=p(53)=k^2p$ gives: $$p(51-n)=p(51+n)=k^np$$

Summing all the probabilities to 1 gives: $$p+k(2p)+k^2(2p)+\dots+k^{50}(2p)=1$$

In your case the peak probability is $p=0.5\Rightarrow2p=1$ $$0.5+k+k^2+\dots+k^{50}=1$$ $$k+k^2+\dots+k^{50}=0.5$$

This is now a geometric series with first term $k$, common ratio $k$, and 50 terms. Use the formula for the sum of a geometric series to find the value of $k$ and you are done.
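
The equation $k+k^2+\dots+k^{50}=0.5$ can also be solved numerically. The sketch below uses base R's `uniroot`; the function name `f` and the 101-outcome labelling with the peak at 51 are assumptions carried over from this answer's setup, not part of the original question.

```r
# Solve k + k^2 + ... + k^50 = 0.5 for k in (0, 1).
f <- function(k) sum(k^(1:50)) - 0.5
sol <- uniroot(f, interval = c(0, 1), tol = 1e-12)
k <- sol$root
k                        # very close to 1/3, since k / (1 - k) = 0.5 gives k = 1/3

# Build the distribution on outcomes 1..101 with peak p(51) = 0.5.
n <- abs(1:101 - 51)     # distance from the central outcome
p <- 0.5 * k^n
sum(p)                   # should be 1 up to the solver tolerance
```

Because $k^{50}$ is tiny, the finite sum is almost identical to the infinite geometric series $k/(1-k)$, which is why $k$ comes out so close to $\frac{1}{3}$.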

@ Starlight: thank you so much for your answer! a few points I wanted to write — stats_noob, Nov 22 '23 at 02:32
