
I have been reviewing the literature and am trying to determine the best method to compare ciphers in hardware. The two methods that seem to be the de facto standard in the literature are throughput at a 100 kHz clock and area in GE (gate equivalents). I take issue with both. For the clock method, the initial state is not given, i.e. I do not know whether throughput is measured from an already loaded state. For the second method, GE is area normalized to the area of a NAND gate, so if your reference NAND is huge, your implementation looks relatively better.

Is there a standardized method to compare ciphers on a hardware level that is not this seemingly de facto method?

Furthermore, is there a standard method to compare Feistel ciphers against a substitution-permutation network?

If someone could point me to a document that outlines a method to compare ciphers from the implementation standpoint, it would be helpful.

Mike Edward Moras
b degnan

1 Answer


Because there are so many ways to implement a hardware design, there is no specific comparison protocol that I am aware of. The main comparison points are power, area, and throughput, and comparisons are generally done on the same process, for example 32 nm GF SOI. With a given process, area can be measured in GE or mm$^2$.
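As a rough illustration of how the two area units relate, here is a minimal sketch in Python. It assumes the usual convention that one GE is the area of the 2-input NAND cell in the standard-cell library being used. The function name and all numeric values are hypothetical placeholders, not figures from any real library.

```python
def gate_equivalents(area_um2, nand2_area_um2):
    """Return area in GE: layout area divided by the reference NAND2 cell area."""
    if nand2_area_um2 <= 0:
        raise ValueError("reference NAND2 area must be positive")
    return area_um2 / nand2_area_um2


# Hypothetical design area; 1 mm^2 would be 1_000_000 um^2.
design_area_um2 = 10_000.0

# Two hypothetical reference cells: the same design reports fewer GE
# when the reference NAND2 is larger, which is why the library and
# process should be stated alongside any GE figure.
for nand2_area_um2 in (1.0, 1.25):
    ge = gate_equivalents(design_area_um2, nand2_area_um2)
    print(f"NAND2 = {nand2_area_um2} um^2 -> {ge:.0f} GE")
```

The same physical area yields a different GE count depending on the reference cell, so GE figures are only comparable when the reference cell and process are the same.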

The algorithm itself will usually be the limiting factor for clock speed, but this may not matter if the throughput is still good. Unrolling loops uses more area but can increase throughput dramatically. Constants like those in SHA-2 use a large amount of area, whereas those in SHA-3 use very little.

An implementation can be tuned to be better at a specific trait, such as power usage or speed, or tuned to have a good combination of two of them at the expense of another. Throughput/area or throughput/watt then become the usual target for a given implementation.

    Area          Power         Throughput
A   1 mm^2        10 mW         1.00 Mb/s
B   1 mm^2        30 mW         2.45 Mb/s

C   3 mm^2        10 mW         2.45 Mb/s
D   3 mm^2        30 mW         6.00 Mb/s
E   3 mm^2        90 mW         14.7 Mb/s

F   6 mm^2        90 mW         44.1 Mb/s

Each implementation has an advantage and a cost. D has a 100% increase in throughput/area and throughput/watt over A, but at triple the area. B and C have the same throughput, but B uses less area and C uses less power. For a given area, tripling the power budget gives a 2.45X increase in throughput. Because the 3 mm$^2$ implementation can radiate more heat thanks to its larger surface area, it can be pushed beyond the power limits of the 1 mm$^2$ implementation. Despite using 1/3 the area and 1/9 the power, A is only 14.7 times slower than E. F is 3X faster than E at the same power, with only double the area.
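The ratios quoted above can be recomputed directly from the table. The following is a minimal sketch in Python; the names `IMPLS` and `metrics` are illustrative only, and the figures are copied from the table.

```python
# (area in mm^2, power in mW, throughput in Mb/s), copied from the table.
IMPLS = {
    "A": (1, 10, 1.00),
    "B": (1, 30, 2.45),
    "C": (3, 10, 2.45),
    "D": (3, 30, 6.00),
    "E": (3, 90, 14.7),
    "F": (6, 90, 44.1),
}


def metrics(name):
    """Return (throughput per mm^2, throughput per mW) for one implementation."""
    area, power, tput = IMPLS[name]
    return tput / area, tput / power


for name in IMPLS:
    per_area, per_mw = metrics(name)
    print(f"{name}: {per_area:6.3f} Mb/s per mm^2, {per_mw:.4f} Mb/s per mW")

# Relative gain of D over A on both efficiency metrics.
d_area, d_mw = metrics("D")
a_area, a_mw = metrics("A")
print(f"D vs A: {d_area / a_area:.2f}x per area, {d_mw / a_mw:.2f}x per mW")
```

Which column to rank by depends on the constraint that matters in the target product, which is why no single number settles the comparison.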

So how do you compare them? Which one is the best? The high-area implementation obviously has the best throughput and the best throughput/watt, but you may not have the area or the power to spare. Would one consider A more efficient than E, or the other way around? E has 63% better throughput/watt than A, but if A could be run at the same wattage (about 6.0 Mb/s, following the 2.45X-per-tripling trend), it would have 22% better throughput/area.
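The 63% and 22% figures can be checked with a short calculation. The 22% figure appears to rest on extrapolating A from 10 mW to 90 mW using the 2.45X gain per tripling of power seen in the table. That extrapolation is an assumption here, and it is hypothetical in practice, since the 1 mm$^2$ part may not be able to dissipate that much heat. The function name below is illustrative.

```python
import math

# Throughput gain per tripling of power, read off the table (A->B, C->D, D->E).
TPUT_GAIN_PER_TRIPLING = 2.45


def extrapolate_throughput(tput_mbps, power_mw, target_power_mw):
    """Scale throughput assuming every 3x in power gives 2.45x in throughput."""
    triplings = math.log(target_power_mw / power_mw, 3)
    return tput_mbps * TPUT_GAIN_PER_TRIPLING ** triplings


# Throughput/watt at the tabulated operating points.
a_per_mw = 1.00 / 10
e_per_mw = 14.7 / 90
print(f"E vs A, throughput/watt: {100 * (e_per_mw / a_per_mw - 1):.0f}% better")

# Hypothetical: A pushed to E's 90 mW budget, still on 1 mm^2.
a_at_90 = extrapolate_throughput(1.00, 10, 90)
a_per_area = a_at_90 / 1
e_per_area = 14.7 / 3
print(f"A at 90 mW (extrapolated): {a_at_90:.2f} Mb/s")
print(f"A vs E, throughput/area: {100 * (a_per_area / e_per_area - 1):.0f}% better")
```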

In the end, a good comparison can only be made by keeping the process constant, then deciding which implementation is better or more efficient based on the requirements of the final product.

Richie Frame