Can a C/C++ compiler legally cache a variable in a register across a pthread library call?

Question

Suppose that we have the following bit of code:

#include <pthread.h>
#include <stdbool.h>   // bool/true/false when compiled as C; harmless in C++
#include <stdio.h>
#include <stdlib.h>

void guarantee(bool cond, const char *msg) {
    if (!cond) {
        fprintf(stderr, "%s", msg);
        exit(1);
    }
}

bool do_shutdown = false;   // Not volatile!
pthread_cond_t shutdown_cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t shutdown_cond_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called in Thread 1. Intended behavior is to block until
trigger_shutdown() is called. */
void wait_for_shutdown_signal() {

    int res;

    res = pthread_mutex_lock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not lock shutdown cond mutex");

    while (!do_shutdown) {   // while loop guards against spurious wakeups
        res = pthread_cond_wait(&shutdown_cond, &shutdown_cond_mutex);
        guarantee(res == 0, "Could not wait for shutdown cond");
    }

    res = pthread_mutex_unlock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not unlock shutdown cond mutex");
}

/* Called in Thread 2. */
void trigger_shutdown() {

    int res;

    res = pthread_mutex_lock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not lock shutdown cond mutex");

    do_shutdown = true;

    res = pthread_cond_signal(&shutdown_cond);
    guarantee(res == 0, "Could not signal shutdown cond");

    res = pthread_mutex_unlock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not unlock shutdown cond mutex");
}

Can a standards-compliant C/C++ compiler ever cache the value of do_shutdown in a register across the call to pthread_cond_wait()? If not, which standards/clauses guarantee this?

The compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown. This seems rather improbable, but I know of no standard that prevents it.

In practice, do any C/C++ compilers cache the value of do_shutdown in a register across the call to pthread_cond_wait()?

Which function calls is the compiler guaranteed not to cache the value of do_shutdown across? It's clear that if the function is declared externally and the compiler cannot access its definition, it must make no assumptions about its behavior so it cannot prove that it does not access do_shutdown. If the compiler can inline the function and prove it does not access do_shutdown, then can it cache do_shutdown even in a multithreaded setting? What about a non-inlined function in the same compilation unit?

Yes, but it can do so if and only if there's provably no legitimate way the library function could change the value of the variable (for example if it's an automatic variable and its address is never taken). — R.. GitHub STOP HELPING ICE, Dec 18 '10 at 03:59
@R: correct... and in that case, it's actually safe to do so, since no other thread could possibly be using that variable either. — Ben Voigt, Jan 11 '11 at 06:18

Steve Jessop · Accepted Answer · 2010-12-18T13:42:17.677

6

Of course the current C and C++ standards say nothing on the subject.

As far as I know, Posix still avoids formally defining a concurrency model (I may be out of date, though, in which case apply my answer only to earlier Posix versions). Therefore what it does say has to be read with a little sympathy - it does not precisely lay out the requirements in this area, but implementers are expected to "know what it means" and do something that makes threads usable.

When the standard says that mutexes "synchronize memory access", implementations must assume that this means changes made under the lock in one thread will be visible under the lock in other threads. In other words, it's necessary (although not sufficient) that synchronization operations include memory barriers of one kind or another, and necessary behaviour of a memory barrier is that it must assume globals can change.
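
As a rough illustration of what "visible under the lock" is usually taken to mean, here is a minimal sketch of the question's handshake reduced to one waiter thread. The names `waiter` and `run_shutdown_demo` are hypothetical and not from the question, and the sketch assumes an implementation where the pthread calls do synchronize memory in the sense described above.

```cpp
#include <pthread.h>

// Shared state, as in the question: a plain (non-volatile) flag
// that is only ever read or written while holding the mutex.
static bool do_shutdown = false;
static pthread_mutex_t shutdown_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t shutdown_cond = PTHREAD_COND_INITIALIZER;

// Hypothetical waiter thread: blocks until the flag is set, then
// records the value it observed while still holding the lock.
static void *waiter(void *observed) {
    pthread_mutex_lock(&shutdown_mutex);
    while (!do_shutdown) {   // loop guards against spurious wakeups
        pthread_cond_wait(&shutdown_cond, &shutdown_mutex);
    }
    *static_cast<bool *>(observed) = do_shutdown;
    pthread_mutex_unlock(&shutdown_mutex);
    return nullptr;
}

// Hypothetical driver: starts the waiter, sets the flag under the
// lock, signals, and reports what the waiter saw.
bool run_shutdown_demo() {
    bool observed = false;
    pthread_t thread;
    if (pthread_create(&thread, nullptr, waiter, &observed) != 0) {
        return false;
    }
    pthread_mutex_lock(&shutdown_mutex);
    do_shutdown = true;
    pthread_cond_signal(&shutdown_cond);
    pthread_mutex_unlock(&shutdown_mutex);
    pthread_join(thread, nullptr);
    return observed;
}
```

If the compiler kept `do_shutdown` in a register across `pthread_cond_wait()`, the waiter could loop forever whenever it starts waiting before the flag is set. That is exactly the behaviour a usable pthreads implementation is expected to rule out.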

Threads Cannot be Implemented as a Library covers some specific issues that are required for pthreads to actually be usable, but are not explicitly stated in the Posix standard at the time of writing (2004). It becomes quite important whether your compiler-writer, or whoever defined the memory model for your implementation, agrees with Boehm about what "usable" means, in terms of allowing the programmer to "reason convincingly about program correctness".

Note that Posix doesn't guarantee a coherent memory cache, so if your implementation perversely wants to cache do_shutdown in a register in your code, then even if you marked it volatile, it might perversely choose not to dirty your CPU's local cache between the synchronizing operation and reading do_shutdown. So if the writer thread is running on a different CPU with its own cache, you might not see the change even then.

That's (one reason) why threads cannot be implemented merely as a library. This optimization of fetching a volatile global only from local CPU cache would be valid in a single-threaded C implementation[*], but breaks multi-threaded code. Hence, the compiler needs to "know about" threads, and how they affect other language features (for an example outside pthreads: on Windows, where cache is always coherent, Microsoft spells out the additional semantics that it grants volatile in multi-threaded code). Basically, you have to assume that if your implementation has gone to the trouble of providing the pthreads functions, then it will go to the trouble of defining a workable memory model in which locks actually synchronize memory access.

If the compiler can inline the function and prove it does not access do_shutdown, then can it cache do_shutdown even in a multithreaded setting? What about a non-inlined function in the same compilation unit?

Yes to all of this - if the object is non-volatile, and the compiler can prove that this thread doesn't modify it (either through its name or through an aliased pointer), and if no memory barriers occur, then it can reuse previous values. There can and will be other implementation-specific conditions that sometimes stop it, of course.
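
To make the distinction concrete, here is a single-threaded sketch. All names other than `do_shutdown` are hypothetical. It contrasts a call whose body the compiler can see and prove harmless with a call through a function pointer that it cannot, in general, see through. Whether a given compiler actually merges the two reads in the first case is an implementation detail; the sketch only shows where it would be allowed to.

```cpp
bool do_shutdown = false;   // non-volatile global, as in the question

// Visible definition that touches no globals: the compiler can prove
// that calling it leaves do_shutdown unchanged.
static int pure_helper(int x) { return 2 * x + 1; }

static void do_nothing() {}
void set_shutdown_flag() { do_shutdown = true; }

// Non-const pointer with external linkage: without whole-program
// knowledge the compiler cannot tell which function it points to.
void (*opaque_hook)() = do_nothing;

// The second read of do_shutdown may legitimately reuse the value of
// the first (for example from a register), since nothing in between
// can change it from this thread's point of view.
int call_visible_helper(int x) {
    bool before = do_shutdown;
    int y = pure_helper(x);
    bool after = do_shutdown;
    return before == after ? y : -1;
}

// Here the callee might write do_shutdown, so the second read has to
// behave as if the variable were loaded from memory again.
bool call_through_hook() {
    bool before = do_shutdown;
    opaque_hook();
    bool after = do_shutdown;
    return before != after;
}
```

With `opaque_hook` left at `do_nothing`, `call_through_hook()` reports no change; after `opaque_hook = set_shutdown_flag;` it reports one. A call such as `pthread_cond_wait()` is normally in the second category, which is why the reload happens in practice.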

[*] provided that the implementation knows the global is not located at some "special" hardware address which requires that reads always go through cache to main memory in order to see the results of whatever hardware op affects that address. But to put a global at any such location, or to make its location special with DMA or whatever, requires implementation-specific magic. Absent any such magic the implementation in principle can sometimes know this.

edited Dec 18 '10 at 13:42

answered Dec 18 '10 at 03:47

Steve Jessop


I'm 99% sure that somewhere POSIX defines mutex locking and unlocking as full memory barriers, but I can't find it right now. – R.. GitHub STOP HELPING ICE Dec 18 '10 at 03:57
pthreads does enforce a memory barrier across `pthread_cond_wait()`: http://stackoverflow.com/questions/3208060/does-guarding-a-variable-with-a-pthread-mutex-guarantee-its-also-not-cached/3208140#3208140 – Michael Burr Dec 18 '10 at 04:05
@Michael, @R..: right, it defines what functions "synchronize memory", but it doesn't define what "synchronize memory" actually means. There's more on this in Boehm's paper, anything else I say will basically be my (probably imperfect) reading of his research. There's not a whole lot of controversy what it means, it means what "memory barriers" actually do on known hardware, plus some complex compiler behaviour to ensure that barriers aren't subverted by certain kinds of re-ordering. It's just not Posix which says this, it's the understandable desire of compiler-writers to supply useful tools. – Steve Jessop Dec 18 '10 at 04:07
So the "if the standard doesn't forbid this then it's permitted" approach of the question basically wins, if an implementer chooses to take that line. You could produce some pile of rubbish, with sometimes-non-viable data synchronization, and plausibly claim that it complies to the letter of the standard. According to Boehm. – Steve Jessop Dec 18 '10 at 04:17
I'll have to have a close read of that paper. On initial glance, it sounds like it's saying that pthreads can only guarantee thread safety with the cooperation of the compiler. – Michael Burr Dec 18 '10 at 04:31
@Michael: that's certainly true, and furthermore it's saying that the pthreads spec doesn't adequately define the cooperation needed from the compiler. – Steve Jessop Dec 18 '10 at 04:38
Just to be clear, the OP asks `... cache a variable in a register across a pthread library call?`. In this particular case, the `register` qualification in the question makes the use of memory barriers moot. No amount of memory barriers will change the contents of a register. – johne Jan 11 '11 at 06:11
@johne: I think the questioner means, "use a register to cache a variable", not "cache a variable which is already in a register". I understand the question to be whether the compiler has to re-load from memory after the call, or whether it can carry on using the value it read from memory earlier. If the variable itself were marked `register` that would indeed change the game, since then its address can't be taken, meaning that it can't be aliased, can't be changed by other code, and the whole problem goes away. But in the question `do_shutdown` is not marked `register`, it's a global. – Steve Jessop Jan 11 '11 at 13:34
@SteveJessop "_If the variable itself were marked register that would indeed change the game, since then its address can't be taken_" It could be taken in C++. Anyway, `register` changes strictly nothing, as the compiler knows very well if the address of a variable is taken or not. Even if it is global. Actually, its address is not taken. Actually, this is entirely irrelevant. – curiousguy Oct 02 '11 at 03:52
@R "_I'm 99% sure that somewhere POSIX defines mutex locking and unlocking as full memory barriers_" POSIX does not define `pthread_` functions as CPU memory barriers. POSIX defines the effect in terms of reads and writes of the program, not in terms of the CPU. This sub-discussion is mixing up levels. – curiousguy Oct 02 '11 at 03:55
@SteveJessop "_it's saying that the pthreads spec doesn't adequately define the cooperation needed from the compiler._" Then it is saying something trivial: POSIX threads is a standard API for programs, not an ABI for compilers. It is not expected to explain how an implementation (compiler + POSIX) should cooperate. – curiousguy Oct 02 '11 at 04:00
@SteveJessop "_You could produce some pile of rubbish, with sometimes-non-viable data synchronization, and plausibly claim that it complies to the letter of the standard._" It is completely obvious that a compiler that uses a fixed memory address in the function calling convention would be conforming and would not allow much multi-threading. – curiousguy Oct 02 '11 at 04:11
"_if the object is non-volatile, and the compiler can prove that this thread doesn't modify it_" The compiler obviously cannot prove that. – curiousguy Oct 28 '11 at 21:37
@curiousguy: I think you should re-read the part of the question that I was responding to here. That part of the question is, "if the compiler can inline a function call, and prove it does not access `do_shutdown` then can `do_shutdown` be cached. What about a non-inlined function in the same TU"? I have responded "yes to both". You have (perhaps accidentally) claimed that a compiler cannot ever prove whether a particular function in the same TU modifies a particular global. That's false. For some functions, it can prove exactly that, for the most obvious example suppose the function is empty. – Steve Jessop Oct 29 '11 at 08:14
I claim that the compiler cannot possibly prove that for any Pthread function, or for any function that does a syscall, or calls or contains asm. Unless you put annotation to allow the compiler to do just that (see GCC asm syntax). – curiousguy Oct 29 '11 at 20:52
@curiousguy: OK, but that's not what I was talking about in the part of my answer you quoted. There are plenty of functions for which the compiler can prove it, those are the ones the questioner is interested in. There are also plenty of functions for which the compiler cannot prove it -- most notably ones for which it isn't true, and the function in fact *does* modify the variable. But also as you say, various conditions can prevent the compiler knowing what a function does. – Steve Jessop Oct 31 '11 at 11:06
The original question is "_The compiler could hypothetically know that `pthread_cond_wait()` does not modify do_shutdown._" So I understood that you referred to functions calling directly or indirectly `pthread_cond_wait`. – curiousguy Oct 31 '11 at 17:39

score 2 · Answer 2 · edited May 23 '17 at 10:33

Since do_shutdown has external linkage there's no way the compiler could know what happens to it across the call (unless it had full visibility to the functions being called). So it would have to reload the value (volatile or not - threading has no bearing on this) after the call.

As far as I know there's nothing directly said about this in the standard, except that the (single-threaded) abstract machine the standard uses to define the behavior of expressions indicates that the variable needs to be read when it's accessed in an expression. The standard permits that reading of the variable to be optimized away only if the behavior can be proven to be "as if" it were reloaded. And that can happen only if the compiler can know that the value was not modified by the function call.

Also note that the pthread library does make certain guarantees about memory barriers for various functions, including pthread_cond_wait(): Does guarding a variable with a pthread mutex guarantee it's also not cached?

Now, if do_shutdown were static (no external linkage) and you had several threads that used that static variable defined in the same module (i.e., the address of the static variable was never taken to be passed to another module), that might be a different story. For example, say that you have a single function that uses such a variable, and you start several thread instances running that function. In that case, a standards-conforming compiler implementation might cache the value across function calls, since it could assume that nothing else could modify the value (the standard's abstract machine model doesn't include threading).

So in that case, you would have to use mechanisms to ensure that the value was reloaded across the call. Note that because of hardware intricacies, the volatile keyword might not be adequate to ensure correct memory access ordering; you should rely on APIs provided by pthreads or the OS to ensure that. (As a side note, recent versions of Microsoft's compilers do document that volatile enforces full memory barriers, but I've read opinions that indicate this isn't required by the standard.)

Since the question is "any standards-compliant compiler", perhaps that includes one in which the whole standard library, and all dependencies, are statically linked, so with link-time / whole-program optimizations, the call is fully visible? Perhaps not a realistic scenario. — Steve Jessop, Dec 18 '10 at 03:49
That's what common sense says, unfortunately it's unclear whether it's somehow enforced/guaranteed by the standards. — Ivan Tarasov, Dec 18 '10 at 03:50
If the compiler has whole program knowledge it could cache the value across the function call if it determined that nothing could modify the value. But the pthreads library is designed to prevent that caching across a call that could modify such a value (the library might need to perform something non-standard/implementation specific to do this, but that's the library's problem to solve). — Michael Burr, Dec 18 '10 at 03:53
"_That might be a different story._" Please show the hypothetical code for this different story. "_a standards conforming compiler implementation might cache the value across function call_" No. — curiousguy, Oct 02 '11 at 03:41
"_volatile enforce full memory barriers, but I've read opinions that indicate this isn't required by the standard_" It is not just an "opinion". In general `volatile` has nothing to do with MT. Pthreads **say nothing about `volatile`**. — curiousguy, Oct 02 '11 at 04:07

score 2 · Answer 3 · answered Oct 02 '11 at 03:48

The hand-waving answers are all wrong. Sorry to be harsh.

There is no way

The compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown.

If you believe differently, please show proof: a complete C++ program such that a compiler not designed for MT could deduce that pthread_cond_wait does not modify do_shutdown.

It's absurd, a compiler cannot possibly understand what pthread_ functions do, unless it has built-in knowledge of POSIX threads.

score 0 · Answer 4 · answered Jan 11 '11 at 05:57

From my own work, I can say that yes, the compiler can cache values across pthread_mutex_lock/pthread_mutex_unlock. I spent most of a weekend tracking down a bug in a bit of code that was caused by a set of pointer assignments being cached and unavailable to the threads that needed them. As a quick test, I wrapped the assignments in a mutex lock/unlock, and the threads still did not have access to the proper pointer values. Moving the pointer assignments and associated mutex locking to a separate function did fix the problem.
