2

I'm trying to wrap my head around going from a seed to a SigningKey, and also obtaining a PrivateKey (encryption key). I'm using NaCl / libsodium.

I created the code below and the results are interesting. Turns out pk1.private_key and pk2.private_key match roughly 3% of the time. However public key matches 100%, all are generated starting with the same seed. What is going on here?

  • I obtain pk1 by using PrivateKey.from_seed(seed)

  • I obtain pk2 by using SigningKey(seed).to_curve25519_private_key()

Examples of mismatch (they're close but aren't equal):

# 1st byte mismatch by 0x01
seed:  f8d9e54a23971beebf2552c1a50ade6150cd051321398394f515e8d4b1ba0404
priv1: c1fd4612ee8ef24d295210a277e196e6bb4a9ae6b93f98d93f197860fe5dc048
priv2: c0fd4612ee8ef24d295210a277e196e6bb4a9ae6b93f98d93f197860fe5dc048

1st byte mismatch by 0x03

seed: d612a66f92ee2f42ab1f7ea9a712a47c815843d21fc988b1d202459f235b6410 priv1: f33d5e80bb556333e2961c9868b1dc7e548836ee56808689ca022f1a19fe86bb priv2: f03d5e80bb556333e2961c9868b1dc7e548836ee56808689ca022f1a19fe867b

1st byte mismatch by 0x01, last byte mismatch by 0x40

seed: 10b7e1c66cf08005a22289158a088e028160f892dc6c20d43025be4690aaed85 priv1: 194898f65d117579d50e80a9b7e07bd048bfd1300d55561dac9dfaed4ef02109 priv2: 184898f65d117579d50e80a9b7e07bd048bfd1300d55561dac9dfaed4ef02149

from nacl.signing import SigningKey
from nacl.public import PrivateKey, PublicKey, Box, SealedBox
from nacl.bindings import crypto_sign_SEEDBYTES
from nacl.utils import StringFixer, random

def run(debug=False):
    seed = random(crypto_sign_SEEDBYTES)
    pk1 = PrivateKey.from_seed(seed)
    pk2 = SigningKey(seed).to_curve25519_private_key()
    if debug:
        print('seed:  ', seed.hex()) 
        print('priv1: ', pk1._private_key.hex())
        print('priv2: ', pk2._private_key.hex())
        print('pub1:  ', bytes(pk1.public_key).hex())
        print('pub2:  ', bytes(pk2.public_key).hex())
    return seed, pk1, pk2

runs = 10000
private_key_match = 0
public_key_match = 0
both_match = 0

for i in range(runs):
    if i % 500 == 0:
        print(i, 'of', runs)
    seed, pk1, pk2 = run()
    x = pk1._private_key == pk2._private_key
    y = bytes(pk1.public_key) == bytes(pk2.public_key)
    if x:
        private_key_match += 1
    if y:
        public_key_match += 1
    if x and y:
        both_match += 1

print('private key match:', private_key_match)
print('public key match: ', public_key_match)
print('both match:       ', both_match)
DannyNiu
  • 10,640
  • 2
  • 27
  • 64
mkl
  • 107
  • 4

2 Answers2

4

'curve25519' (as Bernstein originally called it, now renamed X25519 or XDH25519 for clarity because the same curve in different form is used for Ed25519[ph]) requires the privatekey (multiplier) to be a multiple of 8 to avoid small-subgroup attacks, and conventionally is represented in little-endian so the low 3 bits of the first byte are forced to 0.

X25519 also requires the high bit be 0 (because its underlying field is only about $2^{255}$) and Bernstein specifies the next-to-high bit be 1 (to block an 'optimization' that would allow timing attacks); these affect the last byte in little-endian form, which you apparently didn't notice in your examples also change bb - 7b and 09 - 49.

These two (or three) clampings taken together will change a randomly-chosen value all but (100/32)% of the time, which is close enough to 3% for government work.

Ed25519 actually has the same need for its private multiplier to be a multiple of 8, but it doesn't use the privatekey directly for the multiplier; instead it runs the privatekey through a hash to expand and use for both the multiplier and a secret value that blinds the message, and internally clamps only the multiplier part, without modifying what the user sees as the private key.

Dupe Why are the lower 3 bits of curve25519/ed25519 secret keys cleared during creation?
and Curve25519 key structure
and see/compare https://datatracker.ietf.org/doc/html/rfc7748#page-8
vs https://datatracker.ietf.org/doc/html/rfc8032#section-5.1.5 .

dave_thompson_085
  • 6,523
  • 1
  • 22
  • 25
4

Mathematically, a Curve25519 private key is an element of $\mathbb{F}_{2^{255}-19}$, i.e. an integer modulo $2^{255}-19$. This can naturally be represented as an integer between $0$ and $2^{255}-19-1$, which itself can be represented using 255 bits. Since practical computers work with 8-bit bytes, the smallest practical representation is as a 32-byte string.

There is a standard format for representing Curve25519 private keys, which is described in RFC 7748 §5. This format encodes the integer between $0$ and $2^{255}-19-1$ as a 32-byte (256-bit) string in little-endian order.

A 32-byte string can represent numbers in the range $[0,2^{256}-1]$, which is larger than $[0,2^{255}-19-1]$. Also, for various reasons, not all numbers in the range $[0,2^{255}-19-1]$ are good (see Curve25519 key structure and “May the Fourth Be With You: A Microarchitectural Side Channel Attack on Several Real-World Applications of Curve25519” by Genkin et al). The constraints are:

  • The number must be between $2^{254}$ and $2^{255}-19$, so the most significant two bits of the 256-bit number must be 0 and 1.
  • The number must be a multiple of 8, so the least significant three bits must be 0.

RFC 7748 §5 specifies that when generating a private key, one must take a random (or pseudorandom, as the case may be) 32-byte string and force the five bits mentioned above to their mandatory value. An implementation of Curve25519 that takes a private key as input in the form of a 32-byte string must apply this masking before it starts doing computations on the number that the key represents.

nacl.public.PrivateKey.from_seed returns the raw 32-byte string. nacl.signing.SigningKey(seed) performs the masking, so to_curve25519_private_key exports the value in its canonical, masked form.

The masking concerns 5 bits, so given a random seed, there is a $1/2^5 = 1/32 \approx 3\%$ chance that it already has the correct value for those 5 bits.