I'm using a random number generator to produce a huge string of random hexadecimal characters which I then cache and pull from to generate base-10 integers within a requested range. The original (flawed) steps looked like:
1. Request a random base-10 integer within certain bounds
   e.g.: `getRandomInt(200, 250)`, which has a range of 250 - 200 = 50
2. Determine the minimum number of hex characters required to satisfy that range
   e.g.: for a range of 50, we need 2 hex chars (which cover 256 values, 0-255)
3. Pull that many hex characters from the hexadecimal cache
   e.g.: "3A"
4. Convert those hexadecimal characters to a base-10 integer
   e.g.: 3A₁₆ = 58₁₀
5. Use a modulus function to ensure the resulting integer is within the desired range
   e.g.: 58 % 50 = 8
6. Add this to the lower bound for the final result
   e.g.: 200 + 8 = 208
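To make the pipeline concrete, here's a minimal TypeScript sketch of the six steps. The `hexCache`/`pullHexChars` pair is a hypothetical stand-in for my cache (refilled here with `Math.random()` just to keep the sketch self-contained), and I'm treating the upper bound as exclusive so the range is simply `max - min`:

```typescript
// Hypothetical stand-ins for the hex cache described above. In the real
// implementation the cache is filled by the RNG; Math.random() is used
// here only to make the sketch runnable on its own.
let hexCache = "";
let cachePos = 0;

function pullHexChars(n: number): string {
  while (cachePos + n > hexCache.length) {
    // Refill one random hex character at a time (stand-in for the real generator).
    hexCache += Math.floor(Math.random() * 16).toString(16).toUpperCase();
  }
  const chars = hexCache.slice(cachePos, cachePos + n);
  cachePos += n;
  return chars;
}

// The original (flawed) 6-step pipeline, assuming an exclusive upper bound
// so that getRandomInt(200, 250) yields values in [200, 249].
function getRandomInt(min: number, max: number): number {
  const range = max - min;              // step 1: e.g. 250 - 200 = 50
  // Step 2: minimum number of hex chars whose value space (16^n) covers the range.
  let numChars = 1;
  while (16 ** numChars < range) numChars++;
  const hex = pullHexChars(numChars);   // step 3: e.g. "3A"
  const value = parseInt(hex, 16);      // step 4: 0x3A = 58
  return min + (value % range);         // steps 5+6: 200 + (58 % 50) = 208
}
```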
I recently realized that this biases the results towards lower numbers whenever the range doesn't divide evenly into the number of raw hex values (a power of 16). e.g.: if you request a number in the range [0,11], then 0₁₆ becomes 0₁₀, A₁₆ becomes 10₁₀, and B₁₆ becomes 11₁₀, but C₁₆ (12₁₀) wraps around to 0₁₀ again, giving you a 2/16 chance of generating a 0 but only a 1/16 chance of generating a 10.
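You can see the skew directly by pushing every possible single hex character (0x0 through 0xF) through the modulus and counting the outcomes:

```typescript
// Tally value % 12 for every possible single hex character (0x0..0xF).
const counts = new Array(12).fill(0);
for (let v = 0; v < 16; v++) {
  counts[v % 12]++;
}
console.log(counts); // [2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1]
// Outputs 0-3 each occur with probability 2/16; outputs 4-11 with only 1/16.
```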
Potential Solution?
After chatting with ChatGPT and facing how little math I math, I've modified Step #3 above to pull 1 extra hexadecimal character. So if you request a random base-10 integer in [200,250] (a range of 50, which is satisfied by 2 hex chars), you'll now pull 3 hex characters instead of 2.
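In code, the only change from the earlier sketch is the `+ 1` in step 3 (the name `getRandomIntPadded` is just for illustration; it reuses the hypothetical `pullHexChars` from above):

```typescript
// Modified pipeline: widen the raw value space from 16^n to 16^(n+1)
// by pulling one extra hex character before taking the modulus.
function getRandomIntPadded(min: number, max: number): number {
  const range = max - min;
  let numChars = 1;
  while (16 ** numChars < range) numChars++;
  const hex = pullHexChars(numChars + 1); // e.g. 3 chars instead of 2 for a range of 50
  const value = parseInt(hex, 16);
  return min + (value % range);
}
```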
This seems to solve the issue and produce evenly distributed results for all ranges I've tested (see the exhaustive check sketched after the list below), but I can't say for certain if/why it works. I can kinda make sense of it by analogy to a random floating-point number: my intuition tells me that using hex characters to generate a random number such as 0.123456780, multiplying it by 10, and then removing the decimal portion to produce a random integer in the range [0,9] would be free of any bias, and I assume the same principle is at play here. The principle being something along the lines of: we can add some amount of excess to the end of our randomly generated number and trim it off to remove biases. But I don't know:
- if the implemented solution is actually removing biases
- if any of the above conjecture is true
- if it is true, whether it applies to the above 6 steps
- if it is true and applicable, how to determine the amount of excess which should be added to / trimmed from the end to ensure no biases are produced for a given range
- if there is a better solution completely different from my approach
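For reference, this is the exhaustive check mentioned above. It enumerates every possible hex string of a given length, tallies `value % range`, and reports the gap between the most and least likely outputs (0 would mean perfectly uniform):

```typescript
// Measure the worst-case probability gap for a given range and hex-char
// count by enumerating all 16^chars raw values (exact, no sampling noise).
function measureBias(range: number, chars: number): number {
  const n = 16 ** chars;
  const counts = new Array(range).fill(0);
  for (let v = 0; v < n; v++) {
    counts[v % range]++;
  }
  return (Math.max(...counts) - Math.min(...counts)) / n;
}

console.log(measureBias(50, 2)); // 2 chars: 1/256  ≈ 0.0039
console.log(measureBias(50, 3)); // 3 chars: 1/4096 ≈ 0.00024
```

If I'm reading the output right, the gap is 1/16^chars whenever the range doesn't divide 16^chars evenly, so each extra character seems to shrink the remaining bias by a factor of 16 rather than eliminate it outright.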