I'm having a hard time implementing the S-boxes by Osvik found in this paper: *Speeding up Serpent*. The paper lists all the S-boxes at the end, so I just implemented them as given. Here's my implementation of $S_0$ as an example:
```cpp
// Osvik's S0: registers r0..r3 map to X[0]..X[3], and X4 plays the temporary r4.
UInt32Vector Serpent::S0(const UInt32Vector &Y)
{
    UInt32Vector X = Y;
    X[3] ^= X[0]; uint32_t X4 = X[1];
    X[1] &= X[3]; X4 ^= X[2];
    X[1] ^= X[0]; X[0] |= X[3];
    X[0] ^= X4;   X4 ^= X[3];
    X[3] ^= X[2]; X[2] |= X[1];
    X[2] ^= X4;   X4 = ~X4;
    X4 |= X[1];   X[1] ^= X[3];
    X[1] ^= X4;   X[3] |= X[0];
    X[1] ^= X[3]; X4 ^= X[3];
    // Outputs in the order given by the paper: r1, r4, r2, r0.
    return {X[1], X4, X[2], X[0]};
}
```
As you can see, this is exactly the $S_0$ from the paper. I should add that I checked my S-boxes three times to make sure they are the same as the ones in the paper. Now, here's how I use $S_i$ for $i = 0, \ldots, 7$:
```cpp
subkeys.push_back(S[k & 7]({W[j], W[j+1], W[j+2], W[j+3]}));
```
where $k$ starts at 3 and decreases by 1 for each subkey, as described in the algorithm specification.
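The S-box selection can be checked on its own. The sketch below is a minimal, hypothetical harness (`sboxIndexForSubkey` and `sboxOrderViaMask` are made-up names), assuming the Serpent specification's rule that subkey $i$, for $i = 0, \ldots, 32$, is produced with $S_{(3-i) \bmod 8}$. It compares that rule with the `k & 7` expression above when $k$ starts at 3 and decreases by 1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: S-box index for subkey i (0..32) under the assumed
// rule S_{(3 - i) mod 8}. The "+ 8" keeps the result non-negative, because
// C++ '%' returns a negative remainder for a negative left operand.
int sboxIndexForSubkey(int i) {
    assert(i >= 0 && i <= 32);
    return ((3 - i) % 8 + 8) % 8;
}

// The question's form: k starts at 3, decreases by 1 per subkey, and
// 'k & 7' selects the S-box. For negative k this relies on two's complement
// representation, which C++20 guarantees and mainstream compilers already use.
std::vector<int> sboxOrderViaMask() {
    std::vector<int> order;
    for (int k = 3; k >= 3 - 32; --k)  // 33 subkeys: k = 3, 2, ..., -29
        order.push_back(k & 7);
    return order;
}
```

`sboxOrderViaMask()` yields 3, 2, 1, 0, 7, 6, 5, 4 four times, then a final 3 for subkey 32, and each entry equals `sboxIndexForSubkey(i)`. The explicit `% 8 + 8` form is shown because it does not depend on the bit pattern of negative integers.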
What confuses me is Osvik's own implementation of the S-boxes, which seems to differ from the versions described in his paper. Moreover, in his key schedule, $i$ starts at 3 and increases instead of decreasing.
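Two instruction sequences can look quite different and still compute the same 4-bit S-box, for example when intermediate values are kept in other registers. One way to check a bitsliced S-box without any file from the submission package is an exhaustive comparison against its 4-bit lookup table from the Serpent specification. The sketch below assumes that bit position $n$ of the four words carries input nibble $n$, with `X[0]` as the least significant bit. `Words`, `kS0Table` and `matchesTable` are hypothetical names, and `S0` repeats the question's instruction sequence with `std::array` in place of `UInt32Vector`. Since each 32-bit word holds many independent S-box inputs, one call covers all 16 possible values.

```cpp
#include <array>
#include <cassert>  // for assert-based checks that call matchesTable
#include <cstdint>

using Words = std::array<std::uint32_t, 4>;  // hypothetical stand-in for UInt32Vector

// Serpent S0 as a 4-bit lookup table (input nibble -> output nibble).
constexpr std::array<std::uint8_t, 16> kS0Table = {3, 8, 15, 1, 10, 6, 5, 11,
                                                   14, 13, 4, 2, 7, 0, 9, 12};

// Same instruction sequence as the question's Serpent::S0.
Words S0(const Words &Y) {
    Words X = Y;
    X[3] ^= X[0]; std::uint32_t X4 = X[1];
    X[1] &= X[3]; X4 ^= X[2];
    X[1] ^= X[0]; X[0] |= X[3];
    X[0] ^= X4;   X4 ^= X[3];
    X[3] ^= X[2]; X[2] |= X[1];
    X[2] ^= X4;   X4 = ~X4;
    X4 |= X[1];   X[1] ^= X[3];
    X[1] ^= X4;   X[3] |= X[0];
    X[1] ^= X[3]; X4 ^= X[3];
    return {X[1], X4, X[2], X[0]};
}

// Feeds nibble n into bit position n (word b holds bit b of the nibble),
// runs the bitsliced S-box once, and compares the 16 output nibbles with the table.
template <typename SBox>
bool matchesTable(SBox sbox, const std::array<std::uint8_t, 16> &table) {
    Words in{};
    for (std::uint32_t n = 0; n < 16; ++n)
        for (int b = 0; b < 4; ++b)
            in[b] |= ((n >> b) & 1u) << n;
    const Words out = sbox(in);
    for (std::uint32_t n = 0; n < 16; ++n) {
        std::uint32_t y = 0;
        for (int b = 0; b < 4; ++b)
            y |= ((out[b] >> n) & 1u) << b;
        if (y != table[n]) return false;
    }
    return true;
}
```

With this convention, `matchesTable(S0, kS0Table)` returns true for the listing in the question, while swapping two of its output words makes it return false. The same harness can compare any other listing, such as a variant taken from an implementation, against the corresponding table.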
So, here are my questions:

1. Where can I find test vectors for the S-boxes? I found one for the key schedule in Floppy 4 (`ecb_iv.txt`) of the full submission package, but nothing for the S-boxes.
2. Why does his S-box implementation differ from the one in his paper?
3. Are my $S_0$ implementation and the way I use it correct, or did I miss something important?
Thanks a lot for your help.