Timing-Safety in JVM-Languages

Question

How is it possible to write timing-safe code in JVM-languages (Java, Scala, Clojure...)? Is it possible to make libraries like BouncyCastle safe against timing-attacks?

I know that even in C it is very hard to get these things right, and in C you can at least look at the resulting assembly code and predict how your target CPU will execute it.

In JVM-languages, you don’t know how the JVM will translate your code to machine-code or which runtime-optimizations will be performed.

Edit: I’m not really worried about local timing-attacks but about remotely-exploitable ones.

fgrieu · Accepted Answer · 2017-07-06T05:55:08.177

The theory is: don't try to write timing-safe code in JVM-languages or other essentially-interpreted-but-perhaps-sometimes-compiled languages; rather:

Use timing-safe libraries called from the comfort of the JVM-language. Typical example: in a JavaCard that passed Common Criteria evaluation to EAL5+ with AVA_VAN.5 augmentation, it is a safe bet that AES (when available) is timing-safe.
Avoid needing timing-safety in the first place when possible. Typical example: rather than attempting to compare an alleged 16-byte value to a reference 16-byte value in constant time, draw a fresh AES key, encipher the alleged and reference values with this AES key (with a constant-time AES implementation), then compare the results using any method at hand.
As a variant of the above, use countermeasures that make timing attacks less efficient, like blinding.
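A minimal Java sketch of the "encipher, then compare" idea from the second item, using only standard JCE classes. The class and method names are made up for illustration, and the sketch assumes that the installed provider's AES is itself constant-time on the target (e.g. because hardware AES instructions are used), which this code does not and cannot check:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.GeneralSecurityException;
import java.util.Arrays;

public final class BlindedCompare {
    // Compares two 16-byte values by first enciphering both under a fresh random AES key.
    // Any timing leak in the final comparison concerns the ciphertexts, which are
    // unpredictable to an attacker who does not know the per-call key.
    public static boolean blindedEquals16(byte[] alleged, byte[] reference)
            throws GeneralSecurityException {
        if (alleged.length != 16 || reference.length != 16)
            throw new IllegalArgumentException("both values must be 16 bytes");
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);                          // fresh 128-bit key for every call
        SecretKey key = kg.generateKey();
        // One AES block per value: ECB with no padding is adequate for a single block.
        Cipher aes = Cipher.getInstance("AES/ECB/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, key);
        byte[] x = aes.doFinal(alleged);
        byte[] y = aes.doFinal(reference);     // the cipher is reusable after doFinal
        return Arrays.equals(x, y);            // an ordinary early-exit comparison
    }

    public static void main(String[] args) throws GeneralSecurityException {
        byte[] ref = new byte[16];
        byte[] diff = ref.clone();
        diff[15] ^= 1;
        System.out.println(blindedEquals16(ref.clone(), ref)); // true
        System.out.println(blindedEquals16(diff, ref));        // false
    }
}
```

Because the key is fresh for each call, the position at which Arrays.equals stops says nothing useful about where the alleged and reference values differ.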

The above tends to work also against side-channel attacks beyond timing, e.g. power analysis.
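For the blinding item, a classic instance is RSA base blinding. The sketch below (hypothetical names, toy textbook key with p = 61, q = 53) does not assume BigInteger.modPow is constant-time, which it is not designed to be; instead it makes modPow operate on c * r^e mod n for a fresh random r, so its timing variations cannot be correlated with the attacker-chosen input c:

```java
import java.math.BigInteger;
import java.security.SecureRandom;

public final class RsaBlinding {
    private static final SecureRandom RNG = new SecureRandom();

    // Returns c^d mod n, computed on a blinded input.
    public static BigInteger blindedDecrypt(BigInteger c, BigInteger d, BigInteger e, BigInteger n) {
        BigInteger r;
        do { // random r in [1, n-1] that is invertible mod n
            r = new BigInteger(n.bitLength(), RNG);
        } while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        BigInteger blinded = c.multiply(r.modPow(e, n)).mod(n); // c * r^e mod n
        BigInteger mr = blinded.modPow(d, n);                   // (c * r^e)^d = m * r mod n
        return mr.multiply(r.modInverse(n)).mod(n);             // multiply by r^-1 to unblind
    }

    public static void main(String[] args) {
        // Textbook toy key: far too small for real use.
        BigInteger n = BigInteger.valueOf(3233);
        BigInteger e = BigInteger.valueOf(17);
        BigInteger d = BigInteger.valueOf(2753);
        BigInteger c = BigInteger.valueOf(65).modPow(e, n);
        System.out.println(blindedDecrypt(c, d, e, n)); // 65
    }
}
```

Exponent blinding (working with d + k*phi(n) for a random k) is a complementary countermeasure not shown here.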

When market pressure dictates otherwise, the options are to make as few relatively safe bets as possible, and to check them carefully when one has the luxury of knowing the actual targets. For example, when writing

```
// compare two byte arrays so that timing does not leak where a difference lies
public static boolean myEqualByteArrays(byte[] a, byte[] b) {
    int j = a.length;
    if (j != b.length)
        return false; // arrays are not the same length
    int r = 0; // will stay zero until a difference is found
    while (--j >= 0)
        r |= a[j] ^ b[j];
    return r == 0; // true if and only if the arrays match
}
```

it is only assumed that

binary operations ^ and | are constant-time (& and unary ~ probably are, too, though we don't need that in this example);
loops won't be exited early, even when deep analysis of what follows shows that they could be.

So far, if I have met anything breaking these assumptions, I did not realize it!

Examples of things that can't safely be assumed to be constant-time:

Any use of if or switch.
short-circuit operators && and ||
Logical not !
Array access (e.g. due to cache).
Integer multiplication (including by a constant): some CPUs have a multiplication time that depends on one argument; examples include two 32-bit CPUs whose data-dependent multiplication timing is documented (kudos to the manufacturer for documenting it).
Division, of course (including by a constant).
Operations on data types larger than those supported by the combination of compiler, runtime, and hardware: conceivably, some internal carry might cause a data-dependent timing variation even in + or -, e.g. for a 64-bit type internally implemented with 32-bit variables. This is a bit less to be feared for the addition of two variables than for increment, which could be implemented, or automagically over-optimized at runtime, as: increment the low half, and if the result is zero, increment the high half.
Shift: timing often depends on the shift count.
Right-shift by a constant: its timing can depend on the sign or high-order bit of the shifted argument (that is less likely with unsigned types; and in Java, it is less likely with the unsigned >>> operator than with the signed shift operator >>).
The selection operator ?: ; however, in Java, restricting to built-in signed 2's-complement 32-bit type int, a substitute for d = (a==0) ? b : c; might be
```
// compute the OR of all bits in  a  into the low-order bit of  d
d = (a >>> 1) | a;
d = (d >>> 2) | d;
d = (d >>> 4) | d;
d = (d >>> 8) | d;
d = (d >>>16) | d;
d = -(d&1); // duplicate the low-order bit of  d  over the whole  d
d = ((b^c)&d)^b; // select  b  iff a==0,  c  otherwise
```
and that technique could be extended to, say, constant-time point multiplication on an elliptic curve over some binary field, if efficiency is truly secondary.
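As one illustration of that extension, a ladder-style scalar multiplication typically swaps two working values depending on a secret scalar bit. A masked conditional swap in the same spirit might look like this; the name cswap and the int[]-limb representation are assumptions for the sketch, not part of any particular library:

```java
public final class CtSwap {
    // Swaps the contents of x and y if bit == 1, leaves them unchanged if bit == 0.
    // Every element is loaded, XORed and stored in both cases; the only branch and
    // the loop bound depend on the (public) array lengths, never on bit.
    public static void cswap(int bit, int[] x, int[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("length mismatch");
        int mask = -(bit & 1);            // 0, or -1 (all ones)
        for (int i = 0; i < x.length; i++) {
            int t = mask & (x[i] ^ y[i]); // 0 when not swapping, x[i]^y[i] when swapping
            x[i] ^= t;
            y[i] ^= t;
        }
    }
}
```

With p = {1, 2} and q = {3, 4}, cswap(1, p, q) leaves p = {3, 4} and q = {1, 2}, while cswap(0, p, q) leaves both unchanged; the same operations happen either way.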

score 18 · Answer 2 · answered Jul 05 '17 at 18:08

Writing constant-time cryptographic code is certainly possible in Java or similar languages (e.g. C#). However you have to do it properly.

"Constant-time" here means that the observable time-related behaviour does not depend upon secret data. It does not mean that execution time is always the same, but only that the variations are not correlated with the secret data elements.

Some sort-of-constant-time implementation strategies, often called "microarchitecture defences", tend not to work well with Java. These strategies are mostly about hitting all relevant cache lines. The Java JIT compiler and the GC make it very difficult, if not impossible, to make sure that any piece of data is at a definite location relative to cache lines. Microarchitecture defences are already quite fragile when implemented in low-level languages like C (they tend to lose their properties when the hardware vendor changes something in its implementation); in Java they are mostly unusable.

On the other hand, "true" constant-time is totally possible, in about the same way as it is done in C. By "true" constant-time, I mean that:

There is no conditional jump, whose condition depends on secret data.
There is no memory access whose address depends on secret data.
Opcodes whose execution time depends on secret data are not used.
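The second rule forbids indexing a table with a secret. A common workaround, sketched here in Java under the assumption that the JIT does not turn the masking back into a branch or a direct load (nothing in the language guarantees this), is to read every entry and keep the wanted one with a mask:

```java
public final class CtLookup {
    // Returns table[secretIndex] while reading every entry, so the sequence of memory
    // accesses does not depend on secretIndex. Assumes 0 <= secretIndex < table.length;
    // validating that with a branch would itself leak, so callers must guarantee it.
    public static int lookup(int[] table, int secretIndex) {
        int result = 0;
        for (int i = 0; i < table.length; i++) {
            int q = i ^ secretIndex;          // 0 exactly when i == secretIndex
            int mask = ((q | -q) >>> 31) - 1; // -1 if q == 0, 0 otherwise
            result |= table[i] & mask;
        }
        return result;
    }
}
```

The cost is linear in the table size, which is why this is mainly practical for small tables such as S-boxes or precomputed window values.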

You may want to have a look at how such things are done in BearSSL. BearSSL is written (mostly) in C, but everything explained in its documentation on constant-time code works similarly in Java.

My C code uses uint32_t to hold boolean flags, with the convention that "true" is 1 and "false" is 0. It is important here not to use boolean values, to dissuade the compiler (C or Java JIT) from using conditional jumps. In Java, you would then use int, again with 1 and 0. Alternatively, you might want to use -1 for "true", because it is an all-one pattern that is convenient for bit masks; one good thing about Java is that it guarantees modular arithmetic, and its >> operator performs sign extension.
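To make the flag convention concrete, here are Java counterparts of the kind of helpers BearSSL uses for this purpose (its internal EQ, GT and MUX functions operate on uint32_t). This is an unofficial translation for illustration, with inputs treated as unsigned 32-bit values:

```java
public final class CtFlags {
    // Flags are ints: 1 for true, 0 for false.

    // 1 if x == y, 0 otherwise.
    public static int eq(int x, int y) {
        int q = x ^ y;
        return ((q | -q) >>> 31) ^ 1;   // (q | -q) has its sign bit set iff q != 0
    }

    // 1 if x > y as unsigned 32-bit values, 0 otherwise.
    public static int gt(int x, int y) {
        int z = y - x;                  // wraps modulo 2^32, as Java guarantees
        return (z ^ ((x ^ y) & (x ^ z))) >>> 31;
    }

    // Returns x if ctl == 1, y if ctl == 0.
    public static int mux(int ctl, int x, int y) {
        return y ^ (-ctl & (x ^ y));    // -ctl turns the 1/0 flag into an all-one/all-zero mask
    }
}
```

For example, mux(gt(a, b), a, b) yields the unsigned maximum of a and b without a conditional jump.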

Paul Uszak · Answer 3 · 2017-07-05T22:08:53.670

Not in the least. Forget it.

This is written from my experience, which is with Java, but all JVM languages will have similar insurmountable problems. There are issues with compile-time and run-time optimisations that make the byte code almost impossible to predict. And the optimisations change, majorly and subtly, with each major/minor release. You'd have to lock your code to a particular minor version of Java, and if it changed, all your timing calculations might be thrown out.

Unfortunately, optimisation management is the easy part. The real problem that makes this impossible is the combination of dynamic threading and garbage collection. C doesn't have these natively, so you can manage them yourself; Java does them autonomously.

There are at least four different garbage collectors that have features suited to different applications. All of them kick off unpredictably. That's the result of hardware abstraction which is the primary purpose of JVM based languages. This is why Java applications can suddenly freeze on you as the JVM chucks out unused objects.

Threading is also dynamically allocated. That means that you can't deterministically predict which thread is currently executing within a CPU core. And there may be six threads running in parallel when you simply fire up a JVM to execute a "-version" command. Indeed, this very unpredictability is exploited in the seeding of Java's SecureRandom cryptographic RNG. It's true to say that it's chaotic inside a JVM.

And remember what a JVM sits on: a fully multi-threaded operating system with both hard and soft interrupts. These further confound the number of threads actually being executed at any one time. The consequence of all this chaotic behaviour is that you cannot predict with any level of certainty how long a for/next loop in Bouncy Castle will actually take to complete.

The good test is a jet engine. When Rolls-Royce starts shipping turbofans with ECUs running Java, then you'll know it's safe to write time-critical code in a JVM-based language. You'll be waiting a while, as there's probably no market demand for something to compete with C at the hardware level.

There are projects like Javolution and JRockit Real Time, but these really are projects with real-time ambitions; they're mostly for smoothing out application execution for a human user. You wouldn't put JRockit on a small Cortex processor to run Bouncy Castle. And they will still succumb to the fact that you'd also need an underlying real-time OS.

Post OP edit:

The edit has made my answer even more pertinent. The attacker will now be faced with analysing server responses entirely overwhelmed by tiny (or not so tiny) but entirely non-deterministic timing fluctuations. There will not only be the uncertain JVM and OS timings, but also all the network timing/latency issues on top. To see this effect, try:

ping baidu.com

and watch the latency. That's in China.

Meir Maor · Answer 4 · 2017-07-05T18:33:15.587

If all you are worried about is making execution time independent of secrets, I think you can do so reasonably well in the JVM too.

You can add a built-in delay with a timer to get constant-time execution. You may need a non-open JVM if you want hard real-time constraints; for example, you will want a pauseless GC. Though you may settle for G1, which under many workloads will not pause measurably. This will allow you to set a reasonable upper bound on execution time. With a standard JVM you will probably not want to promise an upper bound with a 100% guarantee, but this may be good enough if the attacker can't trigger these extreme scenarios. As added protection, if the result is not ready by the allotted upper bound, you can multiply your allotted time. So even if the attacker can trigger extreme load situations, the measured encryption time will be severely rounded, and he will only be measuring the load he generated.
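A minimal sketch of the "pad to a deadline, and round up to a multiple if it is exceeded" idea, assuming a synchronous API and wall-clock padding via System.nanoTime. The names runPadded and budgetNanos are hypothetical, and a real deployment would choose the budget from measurements on the target machine:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.function.Supplier;

public final class PaddedExecution {
    // Runs op, then waits until the elapsed time reaches a whole multiple of budgetNanos:
    // one budget normally, 2x, 3x, ... if op overran. An observer of the response time
    // then learns only the coarse multiple, not the exact duration of op.
    public static <T> T runPadded(Supplier<T> op, long budgetNanos) {
        if (budgetNanos <= 0) throw new IllegalArgumentException("budget must be positive");
        long start = System.nanoTime();
        T result = op.get();
        long elapsed = System.nanoTime() - start;
        long slots = Math.max(1, (elapsed + budgetNanos - 1) / budgetNanos); // round up
        long deadline = start + slots * budgetNanos;
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0) {
            LockSupport.parkNanos(remaining); // may return early; loop until the deadline
        }
        return result;
    }

    public static void main(String[] args) {
        long budget = TimeUnit.MILLISECONDS.toNanos(20);
        long t0 = System.nanoTime();
        int x = runPadded(() -> 6 * 7, budget);
        long took = System.nanoTime() - t0;
        System.out.println(x + " after " + took + " ns");
    }
}
```

Padding only hides how long op took from observers of response time; CPU usage during the wait still differs, and GC pauses or scheduling delays can still stretch the wait beyond the deadline.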

The chaotic nature of these complex systems may add many potential vulnerabilities, but it also makes exploiting them very difficult.

Regardless, on the JVM too we can analyze performance to a great extent, and, though it is harder, we can look at byte code and JIT output and, of course, empirically measure various micro-steps. We need to remember that even with assembly there may be processor-level effects at work (out-of-order execution, branch prediction, hyper-threading, memory caches, and more) which affect performance. We don't normally take them all into account, but we can test with them, at least under reasonable conditions.
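As an example of such empirical measurement, this crude harness compares the median time of an early-exit comparison (Arrays.equals) when the inputs differ at the first byte versus the last byte. It is only a sketch: JIT warm-up, dead-code elimination and timer resolution all distort results like these, and a dedicated tool such as OpenJDK's JMH is the usual way to do this properly. No particular numbers are claimed here:

```java
import java.util.Arrays;

public final class TimingProbe {
    private static volatile boolean sink; // volatile writes help keep the JIT from dropping the calls

    // Median wall-clock time, in nanoseconds, of one Arrays.equals(a, b) call over reps samples.
    public static long medianNanos(byte[] a, byte[] b, int reps) {
        if (reps <= 0) throw new IllegalArgumentException("reps must be positive");
        long[] samples = new long[reps];
        for (int i = 0; i < reps; i++) {
            long t0 = System.nanoTime();
            sink = Arrays.equals(a, b);
            samples[i] = System.nanoTime() - t0;
        }
        Arrays.sort(samples);
        return samples[reps / 2];
    }

    public static void main(String[] args) {
        byte[] ref = new byte[4096];
        byte[] diffFirst = ref.clone();
        diffFirst[0] = 1;
        byte[] diffLast = ref.clone();
        diffLast[ref.length - 1] = 1;
        for (int warm = 0; warm < 5; warm++) { // crude JIT warm-up
            medianNanos(ref, diffFirst, 10_000);
            medianNanos(ref, diffLast, 10_000);
        }
        System.out.println("difference at first byte: " + medianNanos(ref, diffFirst, 100_000) + " ns");
        System.out.println("difference at last byte:  " + medianNanos(ref, diffLast, 100_000) + " ns");
    }
}
```

Running the same harness against a constant-time comparison is one way to check, on a given JVM and CPU, whether the JIT has preserved the intended behaviour.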

A real-time JVM will allow you to make good hard commitments on runtime and could be useful.

Obviously we keep finding new side channels, and timing attacks are just one that is relatively easy to exploit. With an attacker on the same physical machine there are many more avenues, such as the L3 cache.

Nat · Answer 5 · 2017-07-05T17:26:37.147

Yes, definitely. Just use callbacks with a set delay.

For example, say that your encryption transform might take anywhere from 1 ns to 500 ns; then just wait until you are authorized to run the transform, call it, and have it store the transformed result in some reference. After 1000 ns, have the callback fire, using the result from that predetermined reference.

Since you'll need to slow everything down to the slowest to enable constant-time evaluation, this'll slow stuff down. But on the bright side, the CPU can do other work before the callback fires, so it doesn't eat all of the CPU time.

Because CPU time is still affected by the transforms, an attacker who can flood your system and watch the resulting CPU usage might still exploit it, even if that's more annoying. So if that's a concern, you'll probably want to add rate-limiting to all crypto evaluations in your program, or do busy work on the CPU before the callback to ensure that CPU usage doesn't differ.

Since this is an asynchronous programming scheme, you'll want to make sure that you're using memory barriers and such, as appropriate, or an attacker can exploit the programming errors introduced by improper multi-threaded coding.

Note that some of the "predetermined" values should probably be checked during operation. For example, if a transform somehow takes longer than expected, then the wait time for all future evaluations should be adjusted upward accordingly, and this should probably trigger a key change, just in case an attacker happened to glean some knowledge of the current key from that long evaluation.
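A minimal sketch of this callback scheme with java.util.concurrent, assuming op is a pure function; the class and method names are hypothetical. CompletableFuture takes care of safely publishing the result across threads, which covers the memory-barrier concern, and an overrun doubles the delay for subsequent calls:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public final class DelayedResult {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final AtomicLong delayNanos; // atomic, so adjustments are visible to all threads

    public DelayedResult(long initialDelayNanos) {
        this.delayNanos = new AtomicLong(initialDelayNanos);
    }

    // Starts op immediately; the returned future is completed no earlier than the
    // current delay after submission, however quickly op itself finished.
    public <T> CompletableFuture<T> submit(Supplier<T> op) {
        long delay = delayNanos.get();
        CompletableFuture<T> computed = CompletableFuture.supplyAsync(op);
        CompletableFuture<T> released = new CompletableFuture<>();
        scheduler.schedule(() -> {
            if (!computed.isDone()) {
                // Overrun: raise the delay for future calls. This response is released
                // late, so its timing may leak; this is where rekeying could be triggered.
                delayNanos.accumulateAndGet(delay * 2, Math::max);
            }
            computed.whenComplete((value, error) -> {
                if (error != null) released.completeExceptionally(error);
                else released.complete(value);
            });
        }, delay, TimeUnit.NANOSECONDS);
        return released;
    }

    public void shutdown() {
        scheduler.shutdown();
    }

    public static void main(String[] args) throws Exception {
        DelayedResult dr = new DelayedResult(TimeUnit.MILLISECONDS.toNanos(10));
        System.out.println(dr.submit(() -> 6 * 7).get()); // 42
        dr.shutdown();
    }
}
```

Whether to rekey on an overrun, as suggested above, is left to the caller in this sketch.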
