I have a means of generating pairs of semiprimes of a given bit length. The process entails
Step 1. Generate a semiprime P1 of bitlength N, with factors close in length (close as defined in the answer to the following [alpha=2]: relative size of most factors of semiprimes, close?)
Step 2. Leaving intact the leading and trailing 1s, take the encapsulated bitlength of N-2 from P1 [The most significant and least significant bits are retained to assure that P2 in step 4 is the appropriate length and odd]
Step 3. Apply a hash (MD5) to the N-2 bitlength number and truncate the digest in binary format to an N-2 bitlength
Step 4. Produce P2, equivalent in length to P1, by concatenating 1+(truncated hash)+1
Step 5. Test P2 to establish whether it is a semiprime with factors close in length
Step 6. If not, discard P1 & P2 and start over
In reviewing the post I referenced previously, relative size of most factors of semiprimes, close? , I am attempting to calculate the number of paired semiprimes (P1 & P2) that exist up to a given bitlength.
From the estimate of $\pi_{1/2;2}(x)$ as $x/log(x)$
I calculated the probability that any pair of numbers up to a given length, $x$, related via the MD5 hash process above, satisfying the criteria that they are both close semiprimes at:
$1/log(x)^2$
To arrive at this I divided $x/log(x)$ by $x$ for the probability of a single semiprime and squared this to get the joint probability for both
I then multiplied this by $x$ to re-establish the population for a given bitlength,
I don't think I've got it right, however. When I generate the estimate $\pi_{1/2;2;2}(x)$ as $x/log(x)^2$, I get something which doesn't seem to jibe.
The number in parentheses below is calculated as
$$ \frac{\pi_{1/2;2;2}(x) * log(x)^2}{x} $$
and it diverges from what I would expect. Mathematics is not my day job. I wonder where I've gone wrong and what an accurate estimate should look like.
List of bitstring primes up to 2**24 generated
time elapsed: 7.843750 seconds
LIMIT(Bin) Primes Semi Close semi Close semi pair
2 2 0 0 0 (0.0000)
3 4 2 2 0 (0.0000)
4 6 6 4 1 (0.4805)
5 11 10 6 2 (0.7507)
6 18 22 9 3 (0.8108)
7 31 42 17 5 (0.9196)
8 54 82 28 10 (1.2011)
time elapsed: 59.734375 seconds
9 97 157 47 17 (1.2922)
10 172 304 89 31 (1.4545)
11 309 589 171 68 (1.9303)
12 564 1124 311 113 (1.9087)
13 1028 2186 584 197 (1.9526)
14 1900 4192 1086 325 (1.8680)
15 3512 8110 2093 586 (1.9332)
16 6542 15658 4023 1062 (1.9931)
time elapsed: 364.593750 seconds
17 12251 30253 7617 1953 (2.0689)
18 23000 58546 14597 3712 (2.2043)
19 43390 113307 27817 6765 (2.2380)
20 82025 219759 53301 12672 (2.3225)
21 155611 426180 101532 23589 (2.3832)
22 295947 827702 195376 43998 (2.4393)
23 564163 1608668 374928 82250 (2.4920)
24 1077871 3129211 721504 153318 (2.5290)
time elapsed: 423.453125 seconds
Update (6/27/2019):
In reviewing my code for the output for the last column, I erred in at least two ways. I was comparing the factor bit length rather than the factor itself in determining 'closeness'. A second, smaller issue was that I was looking, post-hash, at only distinct semiprimes.
Once I corrected for these two issues, the estimate I generated previously looks inline with my results. This may be mere coincidence as I'm not convinced my logic is sound. If anyone would care to comment, feel free. I'm including my code (python) for anyone seeking to verify my results.
LIMIT: 16777216
List of bitstring primes up to 2**24 generated
time elapsed: 7.812500 seconds
LIMIT(Bin) Primes Semi Close semi Close semi pair
2 2 0 0 0 (0.0000)
3 4 2 2 0 (0.0000)
4 6 6 4 1 (0.4805)
5 11 10 6 1 (0.3754)
6 18 22 9 1 (0.2703)
7 31 42 17 1 (0.1839)
8 54 82 28 7 (0.8408)
time elapsed: 62.281250 seconds
9 97 157 47 10 (0.7601)
10 172 304 89 14 (0.6569)
11 309 589 171 27 (0.7664)
12 564 1124 311 44 (0.7432)
13 1028 2186 584 82 (0.8128)
14 1900 4192 1086 137 (0.7874)
15 3512 8110 2093 258 (0.8511)
16 6542 15658 4023 459 (0.8614)
time elapsed: 377.125000 seconds
17 12251 30253 7617 783 (0.8295)
18 23000 58546 14597 1454 (0.8634)
19 43390 113307 27817 2655 (0.8783)
20 82025 219759 53301 4940 (0.9054)
21 155611 426180 101532 9061 (0.9155)
22 295947 827702 195376 16908 (0.9374)
23 564163 1608668 374928 31200 (0.9453)
24 1077871 3129211 721504 57728 (0.9522)
time elapsed: 435.359375 seconds
primes_bit_close_factors.py
from time import process_time
from math import sqrt
from math import floor
from math import ceil
from numpy import log2
from numpy import log
from hashlib import md5
from miller_rabin import factors
time1 = process_time()
# initialize list of bitstring segments as all '1's [TRUE]
def init_prime_bits():
for n in range(LIMIT // NUM_PER_SEGMENT):
prime_bits.append(2 ** BIT_PER_SEGMENT - 1)
# populate list of prime bitstrings up to [LIMIT]
def pop_prime_bits():
print(f"LIMIT: {LIMIT}")
for x in range(3, SUB_LIMIT, 2):
segmnt = x // NUM_PER_SEGMENT
locatn = x % NUM_PER_SEGMENT // 2
if not (prime_bits[segmnt] >> locatn & 1):
continue
for y in range(x * x, LIMIT, x * 2):
segmnt = y // NUM_PER_SEGMENT
locatn = y % NUM_PER_SEGMENT // 2
prime_bits[segmnt] &= ~(1 << locatn)
print(f"List of bitstring primes up to 2**{POWER_OF_TWO} generated")
print(f"time elapsed: {process_time()-time1:.6f} seconds")
# initialize list of prime counts to '0' for each power of two being calculated
# 5 registers included: (1) num of primes, (2) num of semiprimes, (3) subset of [2] that are close,
# (4) subset of [3] that are distinct, (5) subset of [4] that are of equal binary length
# (6) subset of [5] whose bookended hash is also semiprime
def init_prime_cnts():
for p in range(POWER_OF_TWO):
prime_cnts.append([0, 0, 0, 0, 0, 0, 0])
def output_header():
digits_hdr = "LIMIT(Bin)"
primes_hdr = "Primes"
sprimes_hdr = "Semi"
csprimes_hdr = "Close semi"
cdsprimes_hdr = "Close dist"
cedsprimes_hdr = "Equal len"
hashsprimes_hdr = "Hash semi"
hasheqsprimes_hdr = "Hash eq semi"
hashcsprimes_hdr = "Close semi pair"
print(
f"{digits_hdr:^10}{primes_hdr:^15}{sprimes_hdr:^15}{csprimes_hdr:^15}{hashcsprimes_hdr:^15}"
)
def output_body(idx):
digits = idx + 2
# count of primes <= 2**digits
primes = prime_cnts[idx][0]
# how close is (primes) to estimate
prime_est = (primes * log(2 ** digits)) / 2 ** digits
# count of semiprimes <= 2**digits
sprimes = prime_cnts[idx][1]
# how close is (sprimes) to estimate
sprime_est = (sprimes * log(2 ** digits)) / (2 ** digits * log(log(2 ** digits)))
# count of close semiprimes
close_sprimes = prime_cnts[idx][2]
# how close is (close_sprimes) to estimate -- #1
csprimes_est1 = (close_sprimes * log(2 ** digits)) / 2 ** digits
# how close is (close_sprimes) to estimate -- #2
csprimes_est2 = 0 if sprimes == 0 else (100 * close_sprimes) / sprimes
# count of close distinct semiprimes
dist_sprimes = prime_cnts[idx][3]
# count of close equal length distinct semiprimes
equalen_sprimes = prime_cnts[idx][4]
# count of hash semiprimes
hash_sprimes = prime_cnts[idx][5]
# count of hash close length semiprimes
hash_close_sprimes = prime_cnts[idx][6]
# how close is (hash_close_sprimes) to estimate
hcsprimes = (hash_close_sprimes * log(2 ** digits) * log(2 ** digits)) / (2 ** digits)
print(
# f"{digits:>10}{primes:>10} ({prime_est:.4f}){sprimes:>10} ({sprime_est:.4f}){close_sprimes:>10} ({csprimes_est1:.4f}; {csprimes_est2:.4f})"
f"{digits:>10}{primes:>15}{sprimes:>15}{close_sprimes:>15}{hash_close_sprimes:>15} ({hcsprimes:.4f})"
)
# update counters when a power of two is exceeded
prime_cnts[idx + 1][0] += prime_cnts[idx][0]
prime_cnts[idx + 1][1] += prime_cnts[idx][1]
prime_cnts[idx + 1][2] += prime_cnts[idx][2]
prime_cnts[idx + 1][3] += prime_cnts[idx][3]
prime_cnts[idx + 1][4] += prime_cnts[idx][4]
prime_cnts[idx + 1][5] += prime_cnts[idx][5]
prime_cnts[idx + 1][6] += prime_cnts[idx][6]
if digits % 8 == 0:
print(f"time elapsed: {process_time()-time1:.6f} seconds")
def outer_loop():
for n in range(0, LIMIT, 2):
segmnt = n // NUM_PER_SEGMENT
locatn = n % NUM_PER_SEGMENT // 2
if n % NUM_PER_SEGMENT == 0:
outer_loop_primes = f"{prime_bits[segmnt]:0{BIT_PER_SEGMENT}b}"[::-1]
if int(outer_loop_primes[locatn]):
outer_loop_num = n + 1
# this code implements a trick which labels the first bit in the bitstring, '1',
# as a prime and treats it for the purposes of the loops as '2'
if outer_loop_num == 1:
outer_loop_num = 2
outer_loop_idx = int(log2(outer_loop_num)) - 1
prime_cnts[outer_loop_idx][0] += 1
inner_loop(n, outer_loop_num, outer_loop_primes)
# print results when the power of two advances -- starting with 2 bits
if n != 0 and ceil(log2(n + 2)) == floor(log2(n + 2)):
output_body(int(log2(n + 2)) - 2)
def inner_loop(idx, o_loop_num, inner_loop_primes):
for p in range(idx, LIMIT, 2):
segmnt_innr = p // NUM_PER_SEGMENT
locatn_innr = p % NUM_PER_SEGMENT // 2
if p % NUM_PER_SEGMENT == 0:
inner_loop_primes = f"{prime_bits[segmnt_innr]:0{BIT_PER_SEGMENT}b}"[::-1]
if int(inner_loop_primes[locatn_innr]):
inner_loop_num = p + 1
# same trick as above, applied to the inner loop
if inner_loop_num == 1:
inner_loop_num = 2
inner_loop_prd = o_loop_num * inner_loop_num
if inner_loop_prd > LIMIT:
break
inner_loop_idx = int(log2(inner_loop_prd)) - 1
prime_cnts[inner_loop_idx][1] += 1
if inner_loop_num <= o_loop_num ** 2:
prime_cnts[inner_loop_idx][2] += 1
hash = md5(
bin(inner_loop_prd)[3:-1].encode()
).hexdigest()
hash_result = f"1{f'{int(hash,16):0128b}'[:inner_loop_idx+2]}1"
hash_sprime_factors = sorted(factors(int(hash_result, 2)))
if (len(hash_sprime_factors) == 3 and hash_sprime_factors[1] == sqrt(hash_sprime_factors[2])) or \
(len(hash_sprime_factors) == 4 and hash_sprime_factors[2] <= hash_sprime_factors[1]**2):
## print(
## f"{hash_sprime_factors[1]}:{len(bin(hash_sprime_factors[1]))-2}:{hash_sprime_factors[2]}:{len(bin(hash_sprime_factors[2]))-2}"
## )
prime_cnts[inner_loop_idx][6] += 1
def main():
init_prime_bits()
pop_prime_bits()
output_header()
init_prime_cnts()
outer_loop()
print(f"time elapsed: {process_time()-time1:.6f} seconds")
# LIMIT is restricted to a power of 2 whose exponent is divisible by 4 (e.g. 2**12 is acceptable, 2**14 is not)
POWER_OF_TWO = 24
LIMIT = 2 ** POWER_OF_TWO
SUB_LIMIT = int(LIMIT ** 0.5)
# bitstring segment length within the list is a power of 2
BIT_PER_SEGMENT = 2 ** 7
# numbers per bitstring segment are doubled as we only store odd numbers
NUM_PER_SEGMENT = (BIT_PER_SEGMENT) * 2
# list of bitstring segments to maintain prime flags
prime_bits = []
# list of counts of primes by powers of two
prime_cnts = []
if __name__ == "__main__":
main()
miller_rabin.py
from functools import reduce
def _try_composite(a, d, n, s):
if pow(a, d, n) == 1:
return False
for i in range(s):
if pow(a, 2**i * d, n) == n-1:
return False
return True # n is definitely composite
def is_prime(n, _precision_for_huge_n=16):
if n in _known_primes:
return True
if any((n % p) == 0 for p in _known_primes) or n in (0, 1):
return False
d, s = n - 1, 0
while not d % 2:
d, s = d >> 1, s + 1
# Returns exact according to http://primes.utm.edu/prove/prove2_3.html
if n < 1373653:
return not any(_try_composite(a, d, n, s) for a in (2, 3))
if n < 25326001:
return not any(_try_composite(a, d, n, s) for a in (2, 3, 5))
if n < 118670087467:
if n == 3215031751:
return False
return not any(_try_composite(a, d, n, s) for a in (2, 3, 5, 7))
if n < 2152302898747:
return not any(_try_composite(a, d, n, s) for a in (2, 3, 5, 7, 11))
if n < 3474749660383:
return not any(_try_composite(a, d, n, s) for a in (2, 3, 5, 7, 11, 13))
if n < 341550071728321:
return not any(_try_composite(a, d, n, s) for a in (2, 3, 5, 7, 11, 13, 17))
# otherwise
return not any(_try_composite(a, d, n, s)
for a in _known_primes[:_precision_for_huge_n])
def factors(n):
return set(reduce(list.__add__,
([i, n//i] for i in range(1, int(pow(n,0.5)) + 1) if not n % i)))
_known_primes = [2, 3]
_known_primes += [x for x in range(5, 1000, 2) if is_prime(x)]