In the CRYSTALS-Kyber implementation of the centered binomial distribution, the authors load 24 bits of the buffer into a 32-bit integer and then AND the result with 0x00249249. The complete sequence of operations is:

  1. t = load24_littleendian(buf+3*i)
  2. d = t & 0x00249249
  3. d += (t>>1) & 0x00249249
  4. d += (t>>2) & 0x00249249

Can someone please explain what is actually happening here? The documentation does not describe this procedure.

1 Answer

If passed uniformly random 24-bit values, this code generates 8 independent, bit-packed $\mathcal B(3,0.5)$ samples. The samples are encoded in the 24 least significant bits of $d$, with each sample represented by one of the tri-bits 000, 001, 010 or 011 according to whether its value is 0, 1, 2 or 3.

To see this, note that 0x249249 written as 24 bits is 001001001001001001001001, so ANDing with it keeps every third bit and clears the rest. If the bits of t can be treated as i.i.d. Bernoulli variables with $p=1/2$ (i.e. as uniform independent bits), the sum of any three bit positions of t will give a $\mathcal B(3,0.5)$ sample. The code divides t into three-bit chunks and counts the number of set bits in each chunk, handling all eight chunks in parallel. It initialises each chunk of d to the value of that chunk's least significant bit, adds one if the middle bit of the chunk is set, and finally adds one if the most significant bit of the chunk is set.
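The bit-parallel counting described above can be checked against a chunk-by-chunk count. The following is a minimal Python sketch rather than the Kyber reference code: the function names `packed_bit_counts` and `naive_bit_counts` are hypothetical and chosen only for illustration, and `t` is assumed to already hold the 24 loaded bits.

```python
from collections import Counter

MASK = 0x00249249  # 001001...001 in binary: the lowest bit of each 3-bit chunk


def packed_bit_counts(t: int) -> int:
    """Bit-parallel count of the set bits in each of the eight 3-bit chunks of t."""
    d = t & MASK          # least significant bit of every chunk
    d += (t >> 1) & MASK  # add the middle bit of every chunk
    d += (t >> 2) & MASK  # add the most significant bit of every chunk
    return d


def naive_bit_counts(t: int) -> int:
    """Reference version: count each chunk separately, then repack the counts."""
    d = 0
    for k in range(8):
        chunk = (t >> (3 * k)) & 0b111
        d |= bin(chunk).count("1") << (3 * k)
    return d


# Spot-check that both versions agree on a spread of 24-bit inputs.
for t in range(0, 1 << 24, 9973):
    assert packed_bit_counts(t) == naive_bit_counts(t)

# Over the 8 equally likely values of a single chunk, the counts 0, 1, 2, 3
# occur 1, 3, 3, 1 times, i.e. the B(3, 0.5) probabilities multiplied by 8.
histogram = Counter(packed_bit_counts(chunk) for chunk in range(8))
print(sorted(histogram.items()))  # [(0, 1), (1, 3), (2, 3), (3, 1)]
```

The histogram only covers the lowest chunk, but the same argument applies to each of the eight chunks because the mask treats them identically.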

An example might help here. Suppose we have t=011100101011000110111110 as input. To form binomial samples by hand we would divide this into chunks of three: 011 100 101 011 000 110 111 110, then count the bits in each chunk: 2 1 2 2 0 2 3 2, then pack these answers as 010 001 010 010 000 010 011 010. The code achieves the same output by first keeping only every third bit of t (the positions that are 0 mod 3) to initialise d=001000001001000000001000. It then adds the value of the bits in the positions that are 1 mod 3 by shifting t right by 1 and masking, which gives 001000000001000001001001, so that after adding (with carries) we have d=010000001010000001010001. The last summand is the value of the bits in the positions that are 2 mod 3, which we create by shifting t right by 2 and masking: 000001001000000001001001. After adding (with carries) we have d=010001010010000010011010 as required. Note that the carries from each chunk never overflow into the next chunk up, because each chunk's count is at most 3, which fits within its three bits.
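The worked example can be traced step by step. This is a small Python sketch under the assumption that t already holds the 24 input bits; the helper `bits` and the variable names `low`, `mid` and `high` are hypothetical and exist only to make the intermediate values readable.

```python
MASK = 0x00249249  # keeps the lowest bit of each 3-bit chunk


def bits(x: int) -> str:
    """Render x as 24 bits, grouped into eight 3-bit chunks."""
    s = format(x, "024b")
    return " ".join(s[i:i + 3] for i in range(0, 24, 3))


t = 0b011100101011000110111110

low = t & MASK          # bits in positions 0 mod 3
mid = (t >> 1) & MASK   # bits in positions 1 mod 3, moved down one place
high = (t >> 2) & MASK  # bits in positions 2 mod 3, moved down two places
d = low + mid + high

print("t    =", bits(t))
print("low  =", bits(low))
print("mid  =", bits(mid))
print("high =", bits(high))
print("d    =", bits(d))

# Unpack the eight samples, most significant chunk first.
samples = [(d >> (3 * k)) & 0b111 for k in reversed(range(8))]
print(samples)  # [2, 1, 2, 2, 0, 2, 3, 2]
```

Grouping the printed bits into chunks of three makes it easy to see that each chunk of d is simply the sum of the corresponding chunks of the three masked values.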

Daniel S