How would a timing attack occur on one piece of code but not on another (because of good coding practice)? Could anyone give an example? I am having trouble figuring out how timing attacks would occur based on the way the code is written.
2 Answers
TL;DR at the bottom.
The general ideas of timing attacks are the following:
- Secret data has influence on timing of software
- Attacker measures timing
- Attacker computes influence$^{-1}$ to obtain secret data
A basic weakness: if(secret)
The basic shape of code that is vulnerable to a timing attack looks like this:
if (secret)
{
    do_A();
}
else
{
    do_B();
}
The idea is that the time to compute A differs from the time to compute B. Knowing this difference, you can recover the secret.
Let us have a look at the following code, which is closer to real-world code. This is the core operation in RSA decryption, $a^d \bmod n$ with secret key $d$:
#include <stdint.h>

typedef unsigned long long uint64;
typedef uint32_t uint32;

/* This really wants to be done with long integers */
uint32 modexp(uint32 a, uint32 mod, const unsigned char exp[4]) {
    int i, j;
    uint32 r = 1;
    for (i = 3; i >= 0; i--) {
        for (j = 7; j >= 0; j--) {
            r = ((uint64)r * r) % mod;
            if ((exp[i] >> j) & 1)
                r = ((uint64)a * r) % mod;
        }
    }
    return r;
}
In this code you can see that there is an if statement which depends on the secret bit (exp[i] >> j) & 1. If the secret bit is 1 you execute r = ((uint64)a*r) % mod; if it is 0 you execute nothing. Thus the execution time differs depending on the secret.
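To make the leak concrete, here is a minimal sketch (my own illustration, not part of the original answer) that times the modexp above with a low-weight and a high-weight exponent; the base, modulus and iteration count are arbitrary. On most machines the second loop takes measurably longer, because the extra multiply runs once per set bit.

#include <stdio.h>
#include <time.h>
/* assumes the typedefs and modexp() above are in the same file */

static long elapsed_ns(struct timespec t0, struct timespec t1) {
    return (t1.tv_sec - t0.tv_sec) * 1000000000L + (t1.tv_nsec - t0.tv_nsec);
}

int main(void) {
    const unsigned char exp_light[4] = {0x00, 0x00, 0x00, 0x01}; /* 1 bit set   */
    const unsigned char exp_heavy[4] = {0xff, 0xff, 0xff, 0xff}; /* 32 bits set */
    struct timespec t0, t1;
    volatile uint32 sink = 0;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < 1000000; i++)
        sink ^= modexp(1234567u, 2147483647u, exp_light);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("low-weight exponent:  %ld ns\n", elapsed_ns(t0, t1));

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < 1000000; i++)
        sink ^= modexp(1234567u, 2147483647u, exp_heavy);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("high-weight exponent: %ld ns\n", elapsed_ns(t0, t1));

    return 0;
}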
Let us improve this code a bit:
typedef unsigned long long uint64;
typedef uint32_t uint32;

/* This really wants to be done with long integers */
uint32 modexp(uint32 a, uint32 mod, const unsigned char exp[4]) {
    int i, j;
    uint32 r = 1, t;
    for (i = 3; i >= 0; i--) {
        for (j = 7; j >= 0; j--) {
            r = ((uint64)r * r) % mod;
            if ((exp[i] >> j) & 1)
                r = ((uint64)a * r) % mod;
            else
                t = ((uint64)a * r) % mod;
        }
    }
    return r;
}
In this code you might think that, because we do the same work in both branches of the if statement that depends on the secret bit (exp[i] >> j) & 1:
- 1 assignment.
- 1 multiplication
- 1 modulo operation
the time to execute either branch will be the same, right?
Well... no. Because t is a dead variable, the compiler will certainly notice and optimize it away: "This variable is useless, get rid of it!" The compiled code will therefore be the same as before, with the same timing vulnerability.
But even if we force the compiler not to optimize the code, it is still not constant time:
- modern CPUs have branch prediction.
- and instruction cache.
Therefore we need to get rid of this branching weakness.
Removing the branches
How do we remove the branching of something like this:
if (s)
    r = do_A();
else
    r = do_B();
We replace it by something like this:
r = s * do_A() + (1 - s) * do_B()
Because we want fast code, we can expand s into an all-ones/all-zeros mask and use XOR instead of addition and AND instead of multiplication.
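As a quick illustration (the helper name ct_select is mine, not from the answer), the mask version of the select above can be written like this:

#include <stdint.h>

/* Branchless select: returns a if s == 1, b if s == 0.
   s must be exactly 0 or 1. */
static uint32_t ct_select(uint32_t a, uint32_t b, uint32_t s)
{
    uint32_t mask = (uint32_t)(0 - s);   /* 0x00000000 or 0xffffffff */
    return b ^ ((a ^ b) & mask);
}

The same mask trick appears in the cmov helper used below.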
So moving back to our Square-and-multiply:
uint32 modexp(uint32 a, uint32 mod, const unsigned char exp[4]) {
    int i, j;
    uint32 r = 1, t;
    for (i = 3; i >= 0; i--) {
        for (j = 7; j >= 0; j--) {
            r = ((uint64)r * r) % mod;
            t = ((uint64)a * r) % mod;
            cmov(&r, &t, (exp[i] >> j) & 1);
        }
    }
    return r;
}
where cmov (for conditional move) is either an assembler instruction of the same name (which does not use any prediction) or is defined as follows:
/* decision bit b has to be either 0 or 1 */
void cmov(uint32 *r, const uint32 *a, uint32 b)
{
    uint32 t;
    b = -b;               /* Now b is either 0 or 0xffffffff */
    t = (*r ^ *a) & b;
    *r ^= t;
}
Another weakness: table[secret]
The idea is the following: the cache of your processor contains your table, and because everything in the cache is accessible within the same time frame, a lookup like table[secret] seems safe.
address | content
------------------
0x0001 | 8
0x0002 | 7
0x0003 | 6
0x0004 | 5
0x0005 | 4
0x0006 | 3
0x0007 | 2
0x0008 | 1
That is true... if only this code is running. However, you always have other code running at the same time on your CPU, so some parts of your table will have been evicted from the cache:
address | content
------------------
0x0001 | 8
0x0002 | XXX
0x0003 | XXX
0x0004 | 5
0x0005 | XXX
0x0006 | 3
0x0007 | 2
0x0008 | 1
Assume, moreover, that the attacker has some control over which parts of the cache get evicted. If secret is 0x0001, you get an immediate response from the cache. However, if the secret is 0x0002, that entry is no longer cached, so the CPU has to reload the value from memory. This takes longer, so we have a timing difference... and therefore a possible timing attack.
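In code, the weak pattern is simply a secret-dependent array index. A minimal sketch (names and table are mine, for illustration only):

/* All zeros here only to keep the sketch self-contained; in real code
   this would be a public table such as the AES S-box. */
static const unsigned char sbox[256] = {0};

/* The memory address that is read depends on secret data, so whether
   the access hits or misses in the cache leaks information about it. */
unsigned char leaky_lookup(unsigned char secret_byte)
{
    return sbox[secret_byte];
}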
The countermeasures are quite complex and I won't elaborate on them.
References
A big part of this answer is based on these slides: Timing Attacks and Countermeasures by Peter Schwabe at the Crypto Summer School 2016 in Croatia.
Some interesting readings:
Osvik, Shamir, Tromer, 2006: Cache Attacks and Countermeasures: the Case of AES. http://eprint.iacr.org/2005/271/
AlFardan, Paterson, 2013: Lucky Thirteen: Breaking the TLS and DTLS Record Protocols. http://www.isg.rhul.ac.uk/tls/Lucky13.html
Yarom, Falkner, 2014: FLUSH + RELOAD: a High Resolution, Low Noise, L3 Cache Side-Channel Attack. http://eprint.iacr.org/2013/448/
Benger, van de Pol, Smart, Yarom, 2014: “Ooh Aah... Just a Little Bit”: A small amount of side channel can go a long way. http://eprint.iacr.org/2014/161/
Bernstein, 2005: Cache-timing attacks on AES. http://cr.yp.to/papers.html#cachetiming
Brickell, 2011: Technologies to Improve Platform Security. http://www.chesworkshop.org/ches2011/presentations/Invited%201/CHES2011_Invited_1.pdf
Bernstein, Schwabe, 2013: A word of warning. https://cryptojedi.org/peter/data/chesrump-20130822.pdf https://cryptojedi.org/peter/data/cacheline.tar.bz2
Yarom, Genkin, Heninger, 2016: CacheBleed: A Timing Attack on OpenSSL Constant Time RSA https://ssrg.nicta.com.au/projects/TS/cachebleed/
Hamburg, 2009: Accelerating AES with Vector Permute Instructions. http://mikehamburg.com/papers/vector_aes/vector_aes.pdf
Biham, 1997: “A Fast New DES Implementation in Software.” http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi?1997/CS/CS0891
TL;DR
If you read this and skipped the whole answer, then it means that you definitely don't want to write crypto code.
If you did read the whole answer, then you probably understood why it is a bad idea to write crypto code by yourself given how many ways you can screw up.
The following example is difficult to exploit in real life, and impossible over a network, but it is simple enough to understand and extrapolate from.
Consider a piece of code on a server that checks a MAC for correctness.
int compare_mac(unsigned char *mac1, unsigned char *mac2, size_t n)
{
    for (; n--; mac1++, mac2++) {
        if (*mac1 != *mac2) {
            return *mac1 - *mac2;
        }
    }
    return 0;
}
If the MACs do not match, the function exits early. In other words, two MACs differing at the first byte will return almost immediately, while MACs differing only at one of the last bytes will need more time to compare.
Now suppose a malicious user tries to send a forged message without knowing the secret key. Suppose he can also time the server very accurately, enough to distinguish a couple of instructions. He can simply send random MACs, time the responses, and figure out each byte of the MAC with at most 256 guesses per byte: he guesses the first byte first, then the second, and so on.
In other words, he needs at most $256n$ calls to the server to forge a MAC.
If the above code did not have an early return and were constant time for all inputs of length $n$, the attacker would need $256^n$ calls to the server to forge the MAC.
Obviously, this is a huge difference: for a 16-byte MAC, that is $256 \cdot 16 = 4096$ calls instead of $256^{16} = 2^{128} \approx 3.4 \times 10^{38}$.
In practice, you probably won't be able to distinguish between a few instructions, especially over a network. However, other timing leaks may be much larger, and you can get very far with the application of some basic statistical analysis.
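The usual fix, sketched below as the generic constant-time comparison idiom (not necessarily what any particular library does), is to examine every byte regardless of where the first mismatch occurs and only look at the accumulated result at the end:

#include <stddef.h>

/* Constant-time comparison: the loop always runs over all n bytes,
   so the running time does not depend on where the MACs differ.
   Returns 0 if the MACs are equal, non-zero otherwise. */
int compare_mac_ct(const unsigned char *mac1, const unsigned char *mac2, size_t n)
{
    unsigned char diff = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        diff |= mac1[i] ^ mac2[i];
    }
    return diff;
}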