Are garbage-collected programming languages inherently unsafe for use in cryptography?

Question

In JP Aumasson's cryptocoding guidelines, he states that memory containing secret data should be cleared before it goes out of scope. This is to prevent vulnerabilities where, for example, an attacker could access a core dump file or use other memory exploits to recover sensitive data from uncleared memory. This is, of course, a problem for high-level programming languages that rely on garbage collection (such as Python or Java), since they make it very hard or even impossible to clear sensitive data from memory. Does this mean that those languages should not be used for cryptographic applications?

score 18 · Answer 1 · answered Nov 30 '24 at 03:15

What you're talking about isn't an issue of garbage collection, but of "zeroizing" sensitive memory after use. See, for example, the Rust zeroize crate.

In an ideal world, one would do something like

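// The vector must be mutable so its contents can be overwritten later.
let mut vec_of_secrets: Vec<u8> = Vec::new();
//
// Do secret stuff here
//
// Overwrite every element in place through a mutable iterator.
for byte in vec_of_secrets.iter_mut() {
    *byte = 0;
}
// de-allocate vec_of_secrets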

The issue is that an optimizing compiler can tell that vec_of_secrets is never read again after the zeros are written, so the assignments (which should "delete the secret values from memory") amount to wasted clock cycles. The compiler may then elide them as dead stores and leave the secret vector in memory. This isn't good.

As an easy way to see that this isn't about garbage collection: Rust is not a garbage-collected language, and it still suffers from this issue. Instead, it is about fighting an optimizing compiler when using secrets stored on the heap.

That being said, if such a programming language has a method to "zeroize" these secrets on the heap, there should be no issue. I don't personally know how easy this is to do in the example languages you mention.

score 12 · Accepted Answer · answered Nov 30 '24 at 09:23

You can almost always zero the memory yourself after use. This may run into problems with optimizing compilers, as Mark says. Garbage collection will not release the memory before the last reference goes out of scope, which only happens after your zeroing operation.

There is a risk that garbage collection would relocate your object in RAM without clearing the old location (compaction). This is relatively low risk if your secret operation is not running for a long time. Some garbage collectors may have a way to disable compaction or do not compact at all. This increases heap fragmentation, but not more than in non-GC languages.

Garbage collection is not the only mechanism that can cause secret data to leak. System swap is especially risky, as many high-level programming languages do not have a way to lock memory against swapping, and the data could persist in the swap file for a long time. System-level configuration can encrypt the swap or force it to be zeroed.

Overall, zeroing memory after use is a defence-in-depth strategy. The attacker shouldn't be able to read your memory in the first place, but if they do, zeroing reduces the risk they'll find anything important. The risk will always exist, as the data will be in memory during the processing.

So as a summary:

Zeroing is useful in GC languages, though slightly less effective than in non-GC languages.
Compiler optimizations can accidentally remove zeroing instructions.
Zeroing is about reducing the risk of finding secret data in RAM, but the risk can never be fully eliminated.

score 7 · Answer 3 · answered Nov 30 '24 at 18:37

A language using GC is a red herring. What is desirable is control of critical memory but not necessarily allocation/deallocation. This is not at odds with GC.

Since you mention Python, there are two important features of GC languages to point out with practical examples.

GC does not fundamentally conflict with prompt memory deallocation. The reference implementation CPython uses reference counting as the default memory management strategy and GC only as a fallback for cyclic data. This means that primitive data - such as strings and bytes suitable to store sensitive data - is deallocated immediately after dropping all references.
Allocation and deallocation are not the only means of handling memory. GC is only about the ownership of memory, not the content of it. This means that GC managed objects - such as mutable bytearray or others via memoryview - can allow explicitly changing, overwriting and destroying sensitive data in memory without violating invariants of the GC.
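As a minimal sketch of the second point, assuming CPython and a secret that was held in a bytearray from the start: the buffer can be overwritten in place, and a memoryview onto the same object then sees only zeros, which shows that no new object was created. The helper name wipe is made up for this illustration.

```python
def wipe(buf: bytearray) -> None:
    """Overwrite every byte of buf in place with zeros (hypothetical helper)."""
    for i in range(len(buf)):
        buf[i] = 0


secret = bytearray(b"correct horse battery staple")
view = memoryview(secret)      # second handle onto the same underlying buffer
identity_before = id(secret)

wipe(secret)

# Same object, same length, but the content is gone.
wiped_in_place = id(secret) == identity_before
view_sees_zeros = bytes(view) == bytes(len(secret))
view.release()
```

This clears only this one buffer. Any copy made earlier, for example by bytes(secret) or secret.decode(), is a separate object that the loop does not touch.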

At the end of the day, no matter what kind of language is used it is important to be aware of how sensitive data is handled by the language in practice. Security considerations go well beyond the simple, default memory management semantics of mainstream languages and need explicit handling in either memory model.

score 4 · Answer 4 · answered Dec 01 '24 at 17:34

It's not exactly the GC, but the string. In languages where strings are immutable, you can't overwrite them either. So writing password="" or password="notapassword" in Python, Java, and many others leaves the previous content in its allocated memory. That memory will eventually be collected by the GC, but even then the actual bytes remain intact (the GC does not zero out memory it reclaims).
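A small Python illustration of this point: rebinding the name does not modify the original string object, and str offers no way to mutate it in place. The variable names are made up for the example.

```python
password = "hunter2"
alias = password            # a second name for the same str object

password = ""               # rebinds the name; the old object is not modified
still_there = alias == "hunter2"

# str offers no in-place mutation at all.
try:
    alias[0] = "x"
    mutation_allowed = True
except TypeError:
    mutation_allowed = False
```

Here alias keeps the old object alive deliberately, to make it observable. Without it the object would become garbage, but nothing in the language would overwrite its bytes first.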

Sometimes APIs try to work around this. For example, the built-in Console class in Java gives you an array of characters from readPassword(), so you can overwrite the contents after use:

public char[] readPassword()

Many password hashing functions do accept character arrays.

GUI toolkits in Java vary: AWT's generic text input supports an echo character but gives you a String; Swing can give you both a String and a character array (though it is an open question what it does or stores internally); and JavaFX is again a text input with minimal changes.

It also matters what exactly you're worried about. By the time high-level language code gets a password, it has passed through quite a few places, starting with the device drivers, which may have internal buffers and may not know you're reading a password at the moment, especially since one Alt-Tab ago the keypresses went to a totally different application.

score 2 · Answer 5 · answered Dec 02 '24 at 13:01

Overwriting memory is hard in any language.

The memory contents are typically not part of the contract of language constructs and functions. E.g. printf specifies it shows something on the console, but not whether that is also present in memory or not. The memory contents are not considered externally observable behavior, and programming languages often do not provide control over memory contents.

"This is, of course, a problem for high-level programming languages that rely on garbage collection (such as Python or Java) since they make it very hard or even impossible to clear sensitive data from memory."

Back in 2016 I looked into whether it is possible to clear secrets from memory in Python, and came to the conclusion that this is almost impossible.
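One plausible contributor to that difficulty (an assumption here, not something spelled out above) is that ordinary operations create immutable copies that a later wipe never reaches. A minimal sketch:

```python
secret = bytearray(b"s3cr3t")

as_bytes = bytes(secret)            # immutable copy of the buffer
as_text = secret.decode("ascii")    # another immutable copy, as str

for i in range(len(secret)):        # wipe the mutable original
    secret[i] = 0

original_wiped = secret == bytearray(6)
copies_survive = as_bytes == b"s3cr3t" and as_text == "s3cr3t"
```

Every library call that accepts or returns str or bytes is a potential source of such copies, and the program has no handle for overwriting them.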

In Java, a common pattern is to store secrets in a char array instead of in a String. This makes it possible to overwrite the char array, and thus clear the secret from memory.

There is an assumption in there: that the char array in Java is mapped one-to-one to memory. I am not sure whether this is specified somewhere or is merely an implementation detail. Perhaps a compliant JVM could exist that keeps the secret in memory after the char array is overwritten.
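The overwrite-after-use pattern described for Java's char[] can be sketched in Python as well, with bytearray playing the role of the character array. The helper secret_buffer is hypothetical, and the same caveat applies: nothing in the language specification promises that the bytearray is the only place the bytes ever lived.

```python
from contextlib import contextmanager


@contextmanager
def secret_buffer(data: bytes):
    """Yield a mutable copy of data and zero it on exit (hypothetical helper).

    Note: the immutable `data` argument itself is NOT wiped.
    """
    buf = bytearray(data)
    try:
        yield buf
    finally:
        for i in range(len(buf)):
            buf[i] = 0


with secret_buffer(b"hunter2") as pw:
    length_inside = len(pw)     # use the secret only inside the block
    kept = pw                   # keep a reference to inspect afterwards

zeroed_after = kept == bytearray(length_inside)
```

The try/finally ensures the wipe also runs when the body raises, which is the main advantage over calling a wipe function by hand.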
