N3152: Progress guarantees for C++0x (US 3 and US 186)
National body comment US 3 points out that the FCD makes no
meaningful progress guarantees for multithreaded C++ programs.
US 186 points out that the current try_lock specification allows
the function to always spuriously fail, again making it impossible
for programmers to provide useful progress guarantees.
The complaints are clearly correct; we say nothing about progress guarantees.
Many real applications will need to make some assumptions beyond the standard.
The question is whether we want to address this directly, or view it in
the same way that we currently view performance guarantees: as a quality
of implementation issue.  The correct answer is not clear.
Here we argue for a fairly conservative solution that
adds at most "normative encouragement" to the specification.
The arguments in favor of such a minimalist solution
are essentially:
	-  It's late in the game.
	
-  We are not aware of any approach to specifying progress
	guarantees that is painlessly applicable across the variety
	of platforms that are likely to want to support C++0x.
	Any attempt to do something more substantial is likely to be
	controversial.
	
-  It appears likely that providing a meaningful guarantee
	would necessitate other changes in the library specification.
Our proposal implements the "normative encouragement" consensus that
appeared in Rapperswil, though one of us is mildly concerned that
even this may be stronger than appropriate.
Here we outline the difficulties in doing anything more substantial.
Difficulties in specifying progress guarantees
Here we list our concerns with adding a hard progress
guarantee specification.  Essentially all of these were previously
discussed on the reflector.  They also point to some of the difficulties
with providing very precise "normative encouragement".
	-  Java previously considered this issue, and made only very minimal
	guarantees.  I believe Java guarantees that if I write
	(warning pseudocode!) with x initially false
	and atomic (i.e. Java volatile)
        
                | Thread 1 | Thread 2 |  
                | x = true; print "hello";
 ...
 | while (!x) {} 
 
 |  
 and "hello" is printed, then thread 2 terminates, i.e. the update
to x can't be in flight forever.  We believe it makes no guarantee
that "hello" is ever printed, or the loop terminates if it's not.
This was intentional.  Everyone agreed that a general purpose JVM
should provide stronger guarantees.  But it was argued that it was
fine for something like the Oracle embedded JVM that processes stored procedures
not to use fully preemptive scheduling, and thus this should not be
required in all contexts.  We believe similar considerations apply for
C++ use in embedded systems.
Some of us believe that the resulting formulation at
	http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.4.9
is still a more complicated specification than we would like in the C++
standard.
-  The Java guarantee requires that atomic updates become visible to other
threads in finite amount of time.  This would at least be far easier to
implement if hardware provided a similar guarantee, i.e. that values
written to a processor store buffer eventually become visible to other
threads (even if the processor then goes into a tight infinite loop,
touching little memory).  To our current knowledge, not all major processor
vendors have committed to this, making it unclear what an
implementation needs to look like in order to absolutely guarantee this.
-  US 3 really wants a stronger guarantee than (my interpretation of) Java,
namely that the above loop above actually terminate no matter what,
i.e. that thread 1 is scheduled and the new value of x eventually
becomes visible to thread 2.  This is not going to be enforceable on
systems with strict priorities in which there might be a single processor,
and thread 2 might always have higher priority than thread 1.  Since
C++0x does not provide priorities, this is not a show-stopper.  But
it does mean that progress guarantees would have to be weakened if we
added priorities in the future.  And it means that we would be
providing a guarantee that cannot be assumed by library writers on most
current systems.  For example, a library intended to be callable from
an arbitrary pthreads thread should not spin-wait on a helper thread
with normal priority, since the helper thread may never be scheduled.
We would be guaranteeing that this works.
-  We suspect that an MxN thread implementation along the lines that
Solaris traditionally provided wouldn't satisfy the requested guarantee
either in all contexts.  But we're not sure about this.
-  If we want to guarantee progress for all unblocked threads,
what does this mean for a thread that occasionally acquires a lock,
perhaps as an implementation detail in some library routine?
Is it required to eventually acquire the lock if it is available infinitely
often?  This would effectively require some level of fairness for locks,
which may be at odds with performance.  Any progress guarantee without
such a guarantee seems nearly vacuous, unless we restrict which library
routines may acquire locks.
Lock implementations sometimes use a briefly held spin lock to
protect the internal data structures of the lock.  Any real notion of fairness
seems unattainable if an unlucky thread can repeatedly fail to acquire this
spin lock, as appears likely in such a setting.
 
Thus imposing an absolute requirement here is tricky, and may be
inappropriate for some settings.
The normative encouragement option suggested by US16 as a possible
alternative seems much more palatable.
Try_lock spurious failures
If we choose not to provide hard progress guarantees for the C++ threads
in general, there are fewer reasons to say anything about
try_lock() spurious failures.
It can still be argued that some legitimate algorithms could rely on the
fact that try_lock() failed to obtain useful failures.
One example, due to the authors of US 186, is that a graph traversal algorithm
might legitimately conclude that if a lock on a node is held by another
traversal thread, this node is already being visited by another thread,
and hence no further work needs to be done.
However, it appears to us that such algorithms can almost always
be more simply rewritten using the atomics library.  A graph traversal
algorithm typically already needs a flag to indicate whether a node
has been visited.  If that flag is made atomic, rather than protected
by a lock, it can also easily be used to detect an in-progress traversal
and ensure that exactly one thread processes each node, if that is desired.
On the other hand, there are strong reasons to require that programs
be written to tolerate spurious try_lock() failures:
	-  As pointed out in
        
        Boehm, Adve, "Foundations of the C++ Concurrency Memory Model",
        PLDI 08, enforcing sequential consistency
	for data-race-free programs without spurious try_lock()
	failures requires significantly stronger memory ordering
	for lock() operations on try_lock()-compatible
	mutex types.  On some architectures that significantly
	increases the cost of uncontended mutex acquisitions.  This
	cost appears to greatly outweigh any benefit from prohibiting
	spurious try_lock() failures.
	
-  It allows a user-written try_lock() to fail if,
	for example, the implementation fails to acquire a low-level
	lock used to protect the mutex data structure.  Or it allows
	such an operation to be written directly in terms of
	compare_exchange_weak.
	
-  It ensures that client code remains correct when, for
	example, a debugging thread is introduced that occasionally acquires
	locks in order to be able to read consistent values from a data
	structure being checked or examined.  Any code that obtains
	information from try_lock() failure would break with
	the introduction of another thread that purely locks and reads the data
	structure.
(All of these were brought out in the reflector discussion that also
included Herb Sutter, Bronek Kozicki, Lawrence Crowl.)
On the other hand, we would like to discourage implementations from
generating actual spurious failures of try_lock() when they
could be avoided, since those introduce performance problems.
And we want to make it clear that a try_lock() implementation
that always fails is not useful.  Thus we propose either normative
encouragement, or a note, to explicitly discourage such implementations.
Proposed resolution:
We propose wording that provides normative encouragement to address
both issues.  We are not sure (with some mild disagreement
among the authors) whether this requirement can even be stated
precisely enough for normative encouragement.  An alternative
is to move all or some of the normative text into notes.
We propose to add after 1.10p1 [intro.multithread]:
         Implementations should ensure that all unblocked threads
         eventually make forward progress.  [Note: Standard library
         functions may silently block on I/O or locks.  Factors in the
         execution environment, including externally-imposed thread
         priorities, may prevent an implementation from making certain
         guarantees of forward progress. -- end note]
We also propose to add at the end of 1.10 [intro.multithread]:
	 An implementation should ensure that the last value (in
	 modification order) assigned by an atomic or synchronization
	 operation will become visible to all other threads in
         a finite period of time.
Recall that it is currently unclear whether existing hardware easily
supports this as a hard guarantee.  But a lot of existing software would not
work well if the guarantee were frequently violated.
(Note that the corresponding property for ordinary objects is not
observable without introducing a data race.)
And finally add at the end of 30.4.1p14 [thread.mutex.requirements]
	An implementation should ensure that try_lock() does not consistently
	return false in the absence of contending locks.
(Does "should" make sense in a requirements table like this?  I think we
do want this to apply to user-defined implementations, as well as the
built-in mutex types.  But this is too vague and untestable to be
a "shall" requirement.)
By stating these explicitly, we preserve our options, but potentially
avoid some misunderstandings.