Doc. No.: WG21/N3209
Date: 2010-11-11
Reply to: Hans-J. Boehm, Pablo Halpern
Phone: +1-650-857-3406

N3209: Progress guarantees for C++0x (US 3 and US 186)(revised)

National body comment US 3 points out that the FCD makes no meaningful progress guarantees for multithreaded C++ programs. US 186 points out that the current try_lock specification allows the function to always spuriously fail, again making it impossible for programmers to provide useful progress guarantees.

The complaints are clearly correct; we say nothing about progress guarantees. Many real applications will need to make some assumptions beyond the standard. The question is whether we want to address this directly, or view it in the same way that we currently view performance guarantees: as a quality-of-implementation issue. The correct answer is not clear. Here we argue for a fairly conservative solution that adds at most "normative encouragement" to the specification. The arguments in favor of such a minimalist solution are essentially:

  1. It's late in the game.
  2. We are not aware of any approach to specifying progress guarantees that is painlessly applicable across the variety of platforms that are likely to want to support C++0x. Any attempt to do something more substantial is likely to be controversial.
  3. It appears likely that providing a meaningful guarantee would necessitate other changes in the library specification.

Our proposal implements the "normative encouragement" consensus that appeared in Rapperswil, though one of us is mildly concerned that even this may be stronger than appropriate.

Here we outline the difficulties in doing anything more substantial.

Difficulties in specifying progress guarantees

Here we list our concerns with adding a hard progress guarantee specification. Essentially all of these were previously discussed on the reflector. They also point to some of the difficulties with providing very precise "normative encouragement".
  1. Java previously considered this issue, and made only very minimal guarantees. We believe Java guarantees that if we write (warning: pseudocode!), with x initially false and atomic (i.e., Java volatile),
    Thread 1:
        x = true;
        print "hello";

    Thread 2:
        while (!x) {}

    and "hello" is printed, then thread 2 terminates, i.e. the update to x can't be in flight forever. We believe it makes no guarantee that "hello" is ever printed, or the loop terminates if it's not. This was intentional. Everyone agreed that a general purpose JVM should provide stronger guarantees. But it was argued that it was fine for something like the Oracle embedded JVM that processes stored procedures not to use fully preemptive scheduling, and thus this should not be required in all contexts. We believe similar considerations apply for C++ use in embedded systems. Some of us believe that the resulting formulation at is still a more complicated specification than we would like in the C++ standard.
  2. The Java guarantee requires that atomic updates become visible to other threads in a finite amount of time. This would at least be far easier to implement if hardware provided a similar guarantee, i.e. that values written to a processor store buffer eventually become visible to other threads (even if the processor then goes into a tight infinite loop, touching little memory). To our current knowledge, not all major processor vendors have committed to this, making it unclear what an implementation needs to look like in order to absolutely guarantee it.
  3. US 3 really wants a stronger guarantee than (our interpretation of) Java, namely that the above loop actually terminates no matter what, i.e. that thread 1 is scheduled and the new value of x eventually becomes visible to thread 2. This is not going to be enforceable on systems with strict priorities, in which there might be a single processor and thread 2 might always have higher priority than thread 1. Since C++0x does not provide priorities, this is not a show-stopper. But it does mean that progress guarantees would have to be weakened if we added priorities in the future. And it means that we would be providing a guarantee that cannot be assumed by library writers on most current systems. For example, a library intended to be callable from an arbitrary pthreads thread should not spin-wait on a helper thread with normal priority, since the helper thread may never be scheduled; yet we would be guaranteeing that this works.
  4. We suspect that an MxN thread implementation along the lines that Solaris traditionally provided wouldn't satisfy the requested guarantee in all contexts either. But we're not sure about this.
  5. If we want to guarantee progress for all unblocked threads, what does this mean for a thread that occasionally acquires a lock, perhaps as an implementation detail in some library routine? Is it required to eventually acquire the lock if the lock is available infinitely often? This would effectively require some level of fairness for locks, which may be at odds with performance. Yet a progress guarantee without such a fairness guarantee seems nearly vacuous, unless we restrict which library routines may acquire locks.

    Lock implementations sometimes use a briefly held spin lock to protect the internal data structures of the lock. Any real notion of fairness seems unattainable if an unlucky thread can repeatedly fail to acquire this spin lock, as appears likely in such a setting.
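
To make the discussion concrete, here is a minimal C++0x sketch of the example from item 1 (our rendering, not normative text). Absent any progress guarantee, nothing in the FCD promises that the waiting thread's loop terminates, even after "hello" is printed:

    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<bool> x(false);

    void writer() {
        x.store(true);        // the FCD does not require this store to
                              // become visible in a finite amount of time
        std::printf("hello\n");
    }

    void waiter() {
        // Without the proposed encouragement, nothing guarantees the
        // store to x ever becomes visible here, so this loop need not
        // terminate.
        while (!x.load()) { }
    }

    int main() {
        std::thread t1(writer);
        std::thread t2(waiter);
        t1.join();
        t2.join();
    }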

Thus imposing an absolute requirement here is tricky, and may be inappropriate for some settings. The "normative encouragement" option suggested by US 16 as a possible alternative seems much more palatable.

Try_lock spurious failures

If we choose not to provide hard progress guarantees for C++ threads in general, there are fewer reasons to say anything about try_lock() spurious failures.

It can still be argued that some legitimate algorithms could rely on try_lock() failure to obtain useful information. One example, due to the authors of US 186: a graph traversal algorithm might legitimately conclude that if a lock on a node is held by another traversal thread, the node is already being visited, and hence no further work needs to be done.

However, it appears to us that such algorithms can almost always be more simply rewritten using the atomics library. A graph traversal algorithm typically already needs a flag to indicate whether a node has been visited. If that flag is made atomic, rather than protected by a lock, it can also easily be used to detect an in-progress traversal and ensure that exactly one thread processes each node, if that is desired.
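
As a hypothetical sketch (the names and node layout are ours, not from US 186): give each node an atomic visited flag and claim it with exchange(), so exactly one thread processes each node and try_lock() is not needed at all:

    #include <atomic>
    #include <cstddef>
    #include <vector>

    struct Node {
        std::atomic<bool> visited;      // set to false when the graph is built
        std::vector<Node*> neighbors;
        // ... node payload ...
    };

    void traverse(Node* n) {
        // exchange() returns the previous value, so exactly one thread
        // observes false and processes the node; every other thread
        // sees true and returns immediately.
        if (n->visited.exchange(true))
            return;
        // ... process *n ...
        for (std::size_t i = 0; i != n->neighbors.size(); ++i)
            traverse(n->neighbors[i]);
    }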

On the other hand, there are strong reasons to require that programs be written to tolerate spurious try_lock() failures:

  1. As pointed out in Boehm, Adve, "Foundations of the C++ Concurrency Memory Model", PLDI 08, enforcing sequential consistency for data-race-free programs without spurious try_lock() failures requires significantly stronger memory ordering for lock() operations on try_lock()-compatible mutex types. On some architectures that significantly increases the cost of uncontended mutex acquisitions. This cost appears to greatly outweigh any benefit from prohibiting spurious try_lock() failures.
  2. It allows a user-written try_lock() to fail if, for example, the implementation fails to acquire a low-level lock used to protect the mutex data structure. It also allows such an operation to be written directly in terms of compare_exchange_weak (see the sketch after this list).
  3. It ensures that client code remains correct when, for example, a debugging thread is introduced that occasionally acquires locks in order to be able to read consistent values from a data structure being checked or examined. Any code that obtains information from try_lock() failure would break with the introduction of another thread that purely locks and reads the data structure.
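
As an illustration of point 2, here is a hypothetical user-written spinlock (our sketch, not a standard type) whose try_lock() is a single compare_exchange_weak and may therefore fail spuriously even when the lock is free:

    #include <atomic>

    class spin_mutex {
        std::atomic<bool> locked;
    public:
        spin_mutex() : locked(false) { }

        // May fail spuriously, because compare_exchange_weak may.
        // Under the wording proposed below this is permitted, provided
        // the failures are not consistent.
        bool try_lock() {
            bool expected = false;
            return locked.compare_exchange_weak(expected, true);
        }

        void lock() {
            bool expected = false;
            while (!locked.compare_exchange_weak(expected, true))
                expected = false;   // a failed exchange updates expected
        }

        void unlock() { locked.store(false); }
    };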

(All of these points were brought out in the reflector discussion, which also included Herb Sutter, Bronek Kozicki, and Lawrence Crowl.)

On the other hand, we would like to discourage implementations from generating actual spurious failures of try_lock() when they could be avoided, since those introduce performance problems. And we want to make it clear that a try_lock() implementation that always fails is not useful. Thus we propose either normative encouragement, or a note, to explicitly discourage such implementations.

Proposed resolution:

We propose wording that provides normative encouragement to address both issues. We are not sure (with some mild disagreement among the authors) whether this requirement can even be stated precisely enough for normative encouragement. An alternative is to move all or some of the normative text into notes. We propose to add after 1.10p1 [intro.multithread]:
Implementations should ensure that all unblocked threads eventually make forward progress. [Note: Standard library functions may silently block on I/O or locks. Factors in the execution environment, including externally-imposed thread priorities, may prevent an implementation from making certain guarantees of forward progress. -- end note]
We also propose to add at the end of 1.10 [intro.multithread]:
An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.
Recall that it is currently unclear whether existing hardware easily supports this as a hard guarantee. But a lot of existing software would not work well if the guarantee were frequently violated. (Note that the corresponding property for ordinary objects is not observable without introducing a data race.)

And finally, add at the end of 30.4.1p14 [thread.mutex.requirements]:

An implementation should ensure that try_lock() does not consistently return false in the absence of contending mutex acquisitions.

(Does "should" make sense in a requirements table like this? I think it is OK to apply to user-defined implementations, as well as the built-in mutex types. It will in fact apply to only the built-in mutex types if the lockable requirements paper goes in, to both if it doesn't. This is is too vague and untestable to be a "shall" requirement.)

For consistency, add to the compare_exchange_weak specification, 29.6p23 [atomics.types.operations], just before the note:

Implementations should ensure that weak compare-and-exchange operations do not consistently return false unless either the atomic object has a value different from expected or there are concurrent modifications to the atomic object.
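
For context, the canonical retry loop (a generic sketch, not text from the working paper) already tolerates spurious failures; a spurious failure merely costs one extra iteration, which is why the weak form is usable at all:

    #include <atomic>

    void atomic_increment(std::atomic<int>& counter) {
        int old = counter.load();
        // On failure, compare_exchange_weak reloads old with the
        // current value; a spurious failure just causes one more
        // iteration of the loop.
        while (!counter.compare_exchange_weak(old, old + 1)) {
            // retry
        }
    }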

By stating these explicitly, we preserve our options, but potentially avoid some misunderstandings.