Difference between Hardware implemented algorithm and software implemented one?

Question

In articles about cryptography I see the terms "hardware-implemented" and "software-implemented". What is the difference between them?

In other words, even when I write a program that performs a crypto algorithm on a computer, it ultimately runs on the CPU. So why can't I call it hardware-implemented?

What is the difference between a simple processor and a crypto processor (or coprocessor)?

score 12 · Accepted Answer · edited Aug 12 '16 at 09:59

There are typically four different settings where you want to run your crypto.

The Central Processing Unit (CPU). This may be a classic desktop or laptop CPU or the one in your embedded device. It usually has rather few cores (< 20), but each of them is very fast and can execute arbitrary instructions (in mostly arbitrary order) from its instruction set. Crypto algorithms that run on CPUs are said to be software-implemented because the algorithms (i.e. the instructions) are merely information given to the CPU for execution. CPUs are best at running complex, sequential algorithms.
The Graphics Processing Unit (GPU). This may be your desktop's or laptop's graphics processor, or it may be a supercomputer's computation accelerator. It is characterized by a high number of cores, a low speed per core and a limited instruction set. GPUs are made to run simple algorithms in a massively parallel fashion. Much like CPUs, they accept their algorithms as pieces of information, so it's the software that implements the algorithm.
The Field Programmable Gate Array (FPGA). The FPGA is typically found in small embedded devices and runs specialized algorithms. Its hardware can be reconfigured after shipping, at the expense of lower operating speeds than ASICs. With FPGAs you change the hardware layout of your integrated circuit to run your algorithm. Hence algorithms run by FPGAs are said to be hardware-implemented, because in its current configuration the hardware can run only this exact algorithm, nothing else.
The Application Specific Integrated Circuit (ASIC). This is an integrated circuit that is manufactured to run exactly one algorithm, nothing else. ASICs usually run this algorithm at high speed and are used when speed matters. One example application is Hardware Security Modules (HSMs), which commonly use ASICs to accelerate cryptographic operations (like AES encryption). Crypto processors are commonly simple processors with additional crypto-specific ASICs.

As for the security of each platform, the tendency is that security increases as you go down this list. CPUs are usually shared by many different processes (including the OS), which allows somewhat "easy" side-channel attacks. GPUs are usually not used for crypto (besides hash cracking). FPGAs should provide more security than CPUs if done right, because there is less "noise" from other operations on the chip. ASICs have the same benefit, but to a greater degree.

The speed increases similarly as you go down the list. CPUs must be capable of doing many different things and so can't be optimized too much in one direction. The same goes for GPUs, although they have much more computation power if needed. FPGAs are faster because you can apply more optimizations, and ASICs are king because you can do whatever you want with them and optimize the heck out of them.

The price follows a similar order. CPUs are easy to obtain and cheap to program, and you can get your program running quickly. GPUs are also quite easy to obtain, a bit more expensive to program effectively, and can be made to run your code fairly quickly. FPGAs are more expensive by themselves and require you to design your algorithm in a hardware description language for your FPGA, which takes a lot of time and expertise and thus money. ASICs need to be planned before they are built and have long design cycles, but once you've got the design, you can manufacture them "easily" and for a "low price".

The last two points can be observed in practice in the development of Bitcoin mining. It started with CPUs on PCs, moved on to GPUs, then to FPGAs, and is now dominated by ASICs. Note that Bitcoin mining is equivalent to "doing many double-SHA-256 hashes".
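To make "doing many double-SHA-256 hashes" concrete, here is a minimal software-implemented sketch using Python's standard `hashlib`. The `toy_mine` function and its `prefix`/`difficulty_bits` parameters are hypothetical names for illustration; this is not the real Bitcoin block header format or target comparison, only the same kind of brute-force loop that CPUs, GPUs, FPGAs and ASICs run at increasing speeds.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Return SHA-256(SHA-256(data))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def toy_mine(prefix: bytes, difficulty_bits: int, max_tries: int = 1_000_000):
    """Try nonces until the double hash starts with `difficulty_bits` zero bits.

    Toy model only: real mining hashes an 80-byte block header and compares
    the result against a target value.
    """
    for nonce in range(max_tries):
        digest = double_sha256(prefix + nonce.to_bytes(4, "little"))
        # Treat the digest as a 256-bit integer and check its top bits.
        if int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0:
            return nonce, digest
    return None  # no suitable nonce found within max_tries


if __name__ == "__main__":
    found = toy_mine(b"example block data", difficulty_bits=12)
    if found is not None:
        nonce, digest = found
        print(nonce, digest.hex())
```

Every iteration is independent of the others, which is why this workload moved so readily from a few fast CPU cores to many simple parallel units.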

score 3 · Answer 2 · answered Aug 16 '16 at 00:19

“Hardware crypto” and related terms such as “hardware-implemented crypto” are not precise technical terms. There are two common meanings.

One meaning is cryptography that leverages special-purpose CPU instructions, as opposed to using general-purpose instructions such as additions, multiplications, bitwise operations and so on. For example, high-end x86 CPUs have some specialized instructions called AES-NI which perform one round of the AES algorithm. ARMv8 CPUs have instructions that perform steps of an AES, SHA-1 or SHA-256 round. Some processors have dedicated instructions (or more commonly coprocessors) to work with bigints, which helps with asymmetric cryptography. These special-purpose instructions have two advantages over using general-purpose instructions: they're faster (and that also means less power consumption than the software equivalent), and they're typically more resistant to side channel attacks. Hardware cryptography is not automatically more resistant to side channel attacks, but it tends to be easier to protect, e.g. by ensuring during the chip design that the specialized instructions execute in constant time and power. Besides high-end general-purpose processors, such specialized instructions are often found on processors designed for crypto-heavy applications, such as smartcards.
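As an illustration of what "general-purpose instructions" means here, below is a sketch of one small building block of AES, the multiplication in GF(2^8) used by the MixColumns step, written with nothing but shifts, XORs and a branch. The function names `xtime` and `gf_mul` are my own choice. A pure-software AES performs many operations like this per round, whereas a dedicated AES round instruction does the whole round at once.

```python
def xtime(a: int) -> int:
    """Multiply a by x (i.e. by 2) in the AES field GF(2^8)."""
    a <<= 1
    if a & 0x100:
        # Reduce modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
        a ^= 0x11B
    return a


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add (add is XOR)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


if __name__ == "__main__":
    # Worked example from the AES specification: {57} * {83} = {c1}.
    print(hex(gf_mul(0x57, 0x83)))
```

Note that the branches above depend on the data being processed. That is exactly the kind of behaviour that can leak through timing in a software implementation and that a hardware instruction can be designed to avoid.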

Another meaning of “hardware crypto” is cryptography performed by a dedicated processor using a key that doesn't leave this processor. The reason to do that is to reduce the impact of a security breach. Even if the normal processor is compromised, the attacker won't have the key. The attacker may still be able to make requests to the crypto processor, but the crypto processor can limit the impact of the compromise, for example because it can apply rate limits, may have logs that the attacker can't tamper with, etc. Because the crypto processor has a smaller attack surface, there's less of a chance that it gets compromised than the normal processor. Such a crypto processor may not be as fast as a general-purpose processor, even if it has dedicated instructions for cryptography, because such processors are typically designed to be low-power, tamper-resistant to some extent, and not too expensive (e.g. a $2 smartcard or a $10 TPM may be used as a crypto processor in association with a $100 general-purpose CPU).
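The interface such a crypto processor offers can be modelled roughly as follows. This is only a toy sketch in software: the class `ToyCryptoProcessor` and its methods are hypothetical, and a Python object provides no real isolation (a real device enforces the boundary physically). It does show the idea, though: callers can request operations with the key, subject to rate limits and logging, but there is no operation that returns the key.

```python
import hashlib
import hmac
import os
import time


class ToyCryptoProcessor:
    """Toy model of a device that uses a key but never hands it out."""

    def __init__(self, max_requests: int, per_seconds: float):
        self.__key = os.urandom(32)  # generated "inside" the device
        self._max_requests = max_requests
        self._per_seconds = per_seconds
        self._request_times = []
        self._log = []

    def mac(self, message: bytes) -> bytes:
        """Return HMAC-SHA-256 of message, enforcing the rate limit."""
        now = time.monotonic()
        # Keep only the requests that fall inside the current window.
        self._request_times = [
            t for t in self._request_times if now - t < self._per_seconds
        ]
        if len(self._request_times) >= self._max_requests:
            self._log.append((now, "rejected"))
            raise RuntimeError("rate limit exceeded")
        self._request_times.append(now)
        self._log.append((now, "mac"))
        return hmac.new(self.__key, message, hashlib.sha256).digest()

    def audit_log(self):
        """Return a read-only copy of the request log."""
        return tuple(self._log)


if __name__ == "__main__":
    device = ToyCryptoProcessor(max_requests=2, per_seconds=60.0)
    print(device.mac(b"hello").hex())
```

An attacker who takes over the host can still call `mac`, but only as often as the device allows, and every call leaves a trace in the device's log.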
