Questions tagged [bitslicing]

For questions about bitsliced (SIMD-within-a-register / SWAR) implementation of ciphers and other cryptographic primitives.

Bitslicing, a form of "SIMD within a register" (SWAR), is a technique for computing multiple evaluations of a function (such as a block cipher or a hash) in parallel on a single CPU core using only bitwise logical operations (AND, OR, XOR, NOT).

In a bitsliced implementation, a 32-bit number (say) is not stored in a single variable. Instead, one variable holds the lowest bit of each of $n$ numbers (where $n$ is the number of bits the CPU can store in a register), another variable holds their second-lowest bits, and so on, for a total of 32 variables.
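As a rough illustration of this layout, the following Python sketch transposes a list of ordinary integers into bit-planes and back. The function names `to_bitsliced` and `from_bitsliced` are made up for this example, and Python integers stand in for CPU registers; a real implementation would use fixed-width machine words and a faster transposition.

```python
def to_bitsliced(values, width):
    """Return `width` bit-planes: planes[i] collects bit i of every value,
    with value number j stored at bit position j of each plane."""
    planes = [0] * width
    for lane, v in enumerate(values):
        for bit in range(width):
            planes[bit] |= ((v >> bit) & 1) << lane
    return planes


def from_bitsliced(planes, count):
    """Inverse of to_bitsliced: rebuild `count` ordinary integers."""
    values = [0] * count
    for bit, plane in enumerate(planes):
        for lane in range(count):
            values[lane] |= ((plane >> lane) & 1) << bit
    return values


# Four 3-bit numbers become three variables of four bits each.
values = [0b101, 0b011, 0b110, 0b000]
planes = to_bitsliced(values, 3)
restored = from_bitsliced(planes, len(values))
```

Here `planes[0]` holds the lowest bits of all four numbers, `planes[1]` their second-lowest bits, and `planes[2]` their highest bits.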

While calculating everything bit by bit is generally slower than calculating things the ordinary way, an $n$-bit processor can run up to $n$ bitsliced instances of the function (e.g. to encrypt up to $n$ blocks of data) in parallel. Thus, as long as one pass of the bitsliced implementation takes less than $n$ times as long as a single conventional evaluation, it yields a net gain in throughput.
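To make the trade-off concrete, here is a minimal sketch of a bitsliced ripple-carry adder in Python. It assumes 8-bit numbers and 64 parallel lanes (standing in for a 64-bit register); the constants `WIDTH` and `LANES` and all function names are chosen for this example only.

```python
WIDTH = 8    # bits per number (assumed for this sketch)
LANES = 64   # parallel instances; stands in for the register width n


def to_bitsliced(values):
    """planes[i] holds bit i of every value; value j sits at bit position j."""
    planes = [0] * WIDTH
    for lane, v in enumerate(values):
        for bit in range(WIDTH):
            planes[bit] |= ((v >> bit) & 1) << lane
    return planes


def from_bitsliced(planes):
    """Rebuild LANES ordinary integers from WIDTH bit-planes."""
    values = [0] * LANES
    for bit, plane in enumerate(planes):
        for lane in range(LANES):
            values[lane] |= ((plane >> lane) & 1) << bit
    return values


def bitsliced_add(a, b):
    """Add LANES pairs of numbers mod 2**WIDTH using only AND, OR and XOR."""
    carry = 0
    out = []
    for x, y in zip(a, b):            # lowest bit-plane first
        t = x ^ y
        out.append(t ^ carry)         # sum bit for all lanes at once
        carry = (x & y) | (carry & t) # carry bit for all lanes at once
    return out                        # final carry is dropped


xs = [(37 * j + 11) % 256 for j in range(LANES)]
ys = [(91 * j + 200) % 256 for j in range(LANES)]
sums = from_bitsliced(bitsliced_add(to_bitsliced(xs), to_bitsliced(ys)))
```

Each loop iteration costs five bitwise operations, so this adder spends 40 operations on 64 additions, not counting the conversion to and from bitsliced form.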

The main advantages of bitslicing are:

  • Potentially higher throughput than conventional software implementations, especially for functions designed for hardware implementation or for CPUs with a narrow register width.

  • Strong resistance to timing attacks (and potentially some resistance to other side channel attacks), because data-dependent branches and table lookups are necessarily eliminated.

However, not all functions are well suited to bitslicing. While most basic arithmetic and logical operations bitslice well, functions that manipulate large amounts of mutable state or which require data-dependent table lookups can be difficult or impossible to bitslice.
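The usual workaround for a small lookup table is to rewrite it as a Boolean circuit. The sketch below uses a hypothetical 3-bit S-box (`TABLE` is invented for this example and not taken from any particular cipher) and evaluates it on eight inputs at once; finding a compact circuit for a larger table is the hard part and is not shown here.

```python
LANES = 8                      # number of parallel inputs in this sketch
MASK = (1 << LANES) - 1        # keeps results within LANES bits

# Hypothetical 3-bit S-box, chosen only for illustration.
TABLE = [0, 3, 6, 1, 5, 4, 2, 7]


def sbox_lookup(v):
    """Conventional version: a data-dependent table index."""
    return TABLE[v]


def sbox_bitsliced(x0, x1, x2):
    """The same mapping as a Boolean circuit. Each argument is one
    bit-plane: bit j of x0 is the lowest bit of input number j, etc."""
    y0 = x0 ^ (~x1 & x2)
    y1 = x1 ^ (~x2 & x0)
    y2 = x2 ^ (~x0 & x1)
    return y0 & MASK, y1 & MASK, y2 & MASK


# Lane j carries the input value j, so all eight inputs run in one pass.
x0, x1, x2 = 0b10101010, 0b11001100, 0b11110000
y0, y1, y2 = sbox_bitsliced(x0, x1, x2)
outputs = [
    ((y0 >> j) & 1) | (((y1 >> j) & 1) << 1) | (((y2 >> j) & 1) << 2)
    for j in range(LANES)
]
```

The circuit touches no memory location that depends on the input, which is why the bitsliced form avoids cache-timing leaks from table lookups.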

Also, bitslicing necessarily trades off latency for throughput: while the overall time to compute $n$ parallel evaluations of a function using a bitsliced implementation may be lower than with a conventional implementation, the minimum time needed to get the result of a single evaluation is almost always higher. Thus, bitslicing is poorly suited for sequential uses like CBC encryption, where the input to one evaluation of the function depends on the output of the previous one, or for latency-critical applications where getting one result fast is important.

12 questions
28 votes, 3 answers

How is bitslicing faster?

I have read a paper on Bit Slicing and Lightweight crypto but cannot understand how bitslicing makes an encryption scheme faster. Can someone please explain with an example exactly how bit slicing makes the code faster (even a single xor example will…
8 votes, 2 answers

How does Käsper and Schwabe's bitsliced AES MixColumns work?

The only way I can see to do the matrix multiplication in the MixColumns operation of AES is by shifting the bits in the multiplied number and then reducing with the polynomial if needed. This can be done in constant time for a bitsliced…
5 votes, 1 answer

Ideal way to profile/performance-measure cipher implementations

I have implemented bit-sliced versions of two ciphers (Gibbon-120 and Hanuman-120, both part of the Primates family of ciphers: http://primates.ae/wp-content/uploads/primatesv1.02.pdf). I now want to measure the increase in effectiveness that my…
oPolo
5 votes, 2 answers

Converting a 5-bit s-box to its bit-sliced format

I'm currently trying to convert a 5-bit sbox (the one from this cipher: http://primates.ae/wp-content/uploads/primatesv1.02.pdf) to its bit-sliced format (i.e. to a boolean network). Most papers only address techniques for converting up to 4-bit…
oPolo
4 votes, 0 answers

How do byte-sliced AES implementations work?

While I understand the principle of bit-slicing, several papers mention byte-sliced AES implementations (see e.g. Homomorphic Evaluation of the AES Circuit and Fast Implementations of AES on Various Platforms). However, I don't clearly understand…
Raoul722
4 votes, 1 answer

How can bit slicing be constant time when MixColumns is in the cipher?

I'm reading about bit-slicing techniques, and one thing about them caught my attention. The strength of bit-sliced implementations is (apart from being fast) that they run in constant time. This rules out side-channel attacks based on…
4 votes, 1 answer

Serpent block cipher: S0 to S7 functions unclear

I am presently implementing the Serpent block cipher in C++ following the specifications. It's important to mention that I'm implementing the cipher in bitslice mode. You'll need the full submission package of Serpent which contains the…
Gabriel L.
3 votes, 0 answers

Can GCM auth be bitsliced?

Is it possible to use bitslicing for GCM? This seems like a way to avoid cache-timing attacks without PCLMUL instructions while keeping good performance.
Demi
2 votes, 1 answer

DES SBOX Output with Bitslice

I do not understand how to compute the output bits of a 6-to-4 S-box with the bitslice technique in DES. Matthew Kwan gave a brief recap of Biham's original paper in his paper "Reducing the Gate Count of Bitslice DES". He wrote: Basically, for each…
2 votes, 0 answers

Bit slice format of 4x4 s-boxes?

I have studied the bit-sliced implementation of the SERPENT block cipher and I need to use this technique to implement some 4x4 s-boxes. I couldn't find any procedure for obtaining the bit-sliced format in any paper or document. I really need this method.…
Vahid
0 votes, 1 answer

Getting bit length of exponentiation result before computation

I'm working in a context where it is cheaper to calculate pow(a,b) via modular exponentiation (modexp(a,b,m)) than to do exponentiation directly. For this I need the minimum modulus value such that the function returns only the exponentiation part…
riordant
0 votes, 0 answers

Security of bitsliced AES with a transposition step omitted

Suppose you take the aes_sub_bytes function intended to operate on 4 AES blocks which were bitsliced to 8 u64's, but you omit the bitslicing step beforehand. As a result you apply the sbox to the incorrect bits. But nevertheless the sbox is applied…
Lorkin