Doc. No.: WG21/N3069
J16/10-0059
Revision of: WG21/N3040 = J16/10-0030
Date: 2010-03-11
Reply to: Hans-J. Boehm
Phone: +1-650-857-3406
Email: Hans.Boehm@hp.com

N3069: Various threads issues in the library (LWG 1151)

LWG 1151 (US 63) points out that we had made only an incomplete attempt at dealing with threading issues throughout the library. This is an attempt to improve matters by addressing as many of the remaining issues as we are currently aware of. Based on experience with other languages, it seems likely that more issues will arise in this area, but it appears to be getting harder to find the holes. This relies on discussions with many people, including Lawrence Crowl, Howard Hinnant, Peter Dimov, Nick Maclaren, and Paul McKenney. The last section, including the solution, was mostly copied from LWG issue 1329, which was authored by Jeffrey Yasskin.

Communication through I/O

27.4p4 [iostreams.objects] currently specifies:

Concurrent access to a synchronized (27.5.2.4) standard iostream object’s formatted and unformatted input (27.7.1.1) and output (27.7.2.1) functions or a standard C stream by multiple threads shall not result in a data race (1.10). [ Note: users must still synchronize concurrent use of these objects and streams by multiple threads if they wish to avoid interleaved characters. -end note ]

This raises the question of whether I/O can be used to synchronize threads. If thread 1 sets X to 1 and then writes a value to a stream f, and thread 2 reads that value (or one that depends on it) from f, is thread 2 then guaranteed to see the write to X (or a later one)?
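
To make the question concrete, here is a minimal sketch of the scenario (the file name, the polling loop, and the use of two separate C streams bound to the same file are illustrative assumptions only, not part of any proposed wording). Absent a guarantee like the one proposed below, the final read of X would itself be a data race:

    #include <cstdio>
    #include <thread>

    int X = 0;

    void writer() {
        X = 1;                                         // ordinary, non-atomic write
        if (std::FILE* f = std::fopen("flag.tmp", "w")) {  // hypothetical shared file
            std::fputc('1', f);                        // write a value to the stream
            std::fclose(f);
        }
    }

    void reader() {
        for (;;) {                                     // poll until the value appears
            std::FILE* f = std::fopen("flag.tmp", "r");
            if (f) {
                int c = std::fgetc(f);
                std::fclose(f);
                if (c == '1') break;                   // observed the written value
            }
        }
        // The question: is X now guaranteed to be 1 (or a later value)?
        // Under the proposed wording, the write of the value happens before
        // the read that observed it, so the answer is yes; without such a
        // guarantee this read of X would be a data race.
        int observed = X;
        (void)observed;
    }

    int main() {
        std::thread t1(writer), t2(reader);
        t1.join();
        t2.join();
        std::remove("flag.tmp");
    }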

The above paragraph appears to apply only to the standard iostreams, making the answer to this question a bit unclear, since concurrent access to other streams already produces a data race, which avoids the question. Based on discussions in the concurrency group, there still appear to be fairly esoteric ways to use I/O for thread communication, either by rebinding the standard streams to files, or through the C library. Hence we probably do need to address the issue, in spite of statements in N3040 to the contrary.

We should be consistent with the "sequential consistency for data-race-free programs" rule. Anything that does not introduce a race should ensure the appropriate happens-before relationship unless weaker memory ordering is explicitly specified. Thus, in the example above, thread 2 should be guaranteed to see the write to X.

Proposed resolution:

Insert a second paragraph into [iostreams.threadsafety], after 27.2.3p1:

If one thread makes a library call a that writes a value to a stream, and as a result another thread reads this value from the stream through a library call b such that this does not result in a data race, then a happens before b.

Additional non-modifying non-const member functions

A number of standard container member functions are treated as const for the purposes of data-race detection, in spite of the fact that they are not const member functions. The at() function was inadvertently omitted from this list.

During LWG discussions, Alisdair pointed out that "unordered associative containers" are not "associative containers", and should be listed separately.

Proposed Resolution:

Edit 23.2.2p1 [container.requirements.dataraces] as follows:

For purposes of avoiding data races (17.6.4.8), implementations shall consider the following functions to be const: begin, end, rbegin, rend, front, back, data, find, lower_bound, upper_bound, equal_range, at and, except in associative or unordered associative containers, operator[].
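
As an informal illustration of the effect of adding at() to this list (the container contents here are arbitrary), the following concurrent calls would not constitute a data race:

    #include <thread>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3, 4};

        // With at() treated as const for data-race purposes, these two
        // concurrent calls on the same container do not race with each other.
        std::thread t1([&v] { int a = v.at(0); (void)a; });
        std::thread t2([&v] { int b = v.at(3); (void)b; });
        t1.join();
        t2.join();
    }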

More precise definition of "atomic"

The current specification is not very clear about what it means for an "atomic read-modify-write" operation to be "atomic". Clarification:

Proposed Resolution:

Before 29.3p11[atomics.order] add a paragraph:

Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.
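
An informal sketch of the guarantee being pinned down: because each fetch_add must read the value immediately preceding its own write in the modification order, concurrent increments cannot be lost.

    #include <atomic>
    #include <cassert>
    #include <thread>
    #include <vector>

    int main() {
        std::atomic<int> counter{0};

        std::vector<std::thread> threads;
        for (int i = 0; i != 4; ++i)
            threads.emplace_back([&counter] {
                for (int j = 0; j != 1000; ++j)
                    counter.fetch_add(1, std::memory_order_relaxed);
            });
        for (auto& t : threads) t.join();

        // Each read-modify-write read the last value written before its own
        // write in the modification order, so no increment was dropped.
        assert(counter.load() == 4000);
    }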

Iterator issues

The current draft is unclear about when iterator operations may conflict with accesses to other iterators or to the underlying container.

Proposed resolution:

Add the following paragraph after 17.6.4.8p5 [res.on.data.races]:

Operations on iterators obtained by calling a standard library container or string member function may access the underlying container, but shall not modify it. [Note: In particular, container operations that invalidate iterators conflict with operations on iterators associated with that container. -- end note.]
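
An informal sketch of the intent (the container contents are arbitrary):

    #include <thread>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3};
        auto it1 = v.begin();
        auto it2 = v.begin() + 2;

        // Under the proposed wording, dereferencing these iterators may read
        // the underlying container but not modify it, so the two concurrent
        // reads below do not race with each other.
        std::thread t1([&it1] { int a = *it1; (void)a; });
        std::thread t2([&it2] { int b = *it2; (void)b; });
        t1.join();
        t2.join();

        // By contrast, an operation that invalidates iterators, such as
        // v.push_back(4) forcing a reallocation, conflicts with concurrent
        // operations on it1 and it2 and must be ordered with respect to them.
        v.push_back(4);
    }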

There was a question about whether the race behavior of algorithms such as equal() is sufficiently described. equal() appears to be adequately described, since it uses InputIterators, implying that, according to 17.6.4.8p5 [res.on.data.races], it is not allowed to update anything through the iterators. Unfortunately, many of the algorithms use ForwardIterators only for input, which means they are underspecified: an implementation could currently read and write back the objects referenced through the iterators, which is clearly unacceptable.

Proposed resolution:

Add after 25.1p3 [algorithms.general]:

For purposes of determining the existence of data races, algorithms shall not modify objects referenced through an iterator argument unless the specification requires such modification.

Howard Hinnant points out that this is not as clear as it should be without more context. I am also not sure that it is sufficient to deal with something like nth_element(), whose specification already seems less clear than I would like about what is being modified. But it was decided that this is a clear improvement over what we have.
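
As an informal illustration of the added sentence, min_element and max_element take ForwardIterators that are used only as a source of values, so under the proposed wording the following concurrent calls may not race:

    #include <algorithm>
    #include <forward_list>
    #include <thread>

    int main() {
        std::forward_list<int> fl{3, 1, 4, 1, 5};

        // Both algorithms take ForwardIterators used only for input. Under
        // the proposed wording, the implementation may not modify the
        // referenced objects, so these concurrent calls do not race.
        int lo = 0, hi = 0;
        std::thread t1([&] { lo = *std::min_element(fl.begin(), fl.end()); });
        std::thread t2([&] { hi = *std::max_element(fl.begin(), fl.end()); });
        t1.join();
        t2.join();
        (void)lo; (void)hi;
    }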

Constructors and destructors of synchronization objects can race

This arose during discussion of LWG 1218. LWG 1221 is also related.

I think we all agree that constructors and destructors of all library objects, including synchronization objects like mutexes, are not protected against data races. It is the programmer's responsibility to ensure that construction happens before any other use, and any other use happens before destruction.

The one exception here seems to be the somewhat more liberal condition variable rules, which allow a wait to complete after the condition variable has been destroyed.

There seems to be some agreement that the standard mostly says all of this already. In particular, the object lifetime rules in 3.8 seem to essentially state this, assuming 3.8 is interpreted as referring to happens before (1.10) ordering. Unfortunately, that's not as clear as it should be.

I believe the CWG needs to add something like the following at the beginning of 3.8 [basic.life]. This will become a CWG issue. We are NOT requesting that this be added as part of this proposal. However, we will assume an interpretation along these lines, and believe that the general intent here is uncontroversial. Possible addition to 3.8:

All statements about the ordering of evaluations in this section, using words like "before", "after", and "during", refer to the happens before order defined in 1.10[intro.multithread]. [Note: We ignore situations in which evaluations are unordered by happens before, since these require a data race (1.10)[intro.multithread], which already results in undefined behavior --end note]

We do propose to make the following related clarification to the library description:

Proposed resolution:

Add the following as a second paragraph to 17.6.3.10 [res.on.objects]:

[Note: In particular, the program is required to ensure that completion of the constructor for any standard-library-defined class happens before any other member function invocation on that class object, and unless otherwise specified, to ensure that completion of any member function invocation, other than destruction, on a class object happens before destruction of that object. This applies even to objects, such as mutexes, intended for thread synchronization. -- end note.]
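
An informal sketch of the note's consequence for a synchronization object (the empty critical sections are placeholders):

    #include <mutex>
    #include <thread>

    int main() {
        std::mutex m;   // construction completes before the threads below
                        // start, because thread creation synchronizes with
                        // the start of the new thread

        std::thread t1([&m] { std::lock_guard<std::mutex> g(m); /* ... */ });
        std::thread t2([&m] { std::lock_guard<std::mutex> g(m); /* ... */ });

        t1.join();
        t2.join();
        // Both join() calls happen before m is destroyed at the end of
        // main, so every use of m happens before its destruction, as the
        // note requires.
    }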

I believe condition variables are the only intended exception to this. However, the current precondition, as modified by LWG 1221, does not make this clear. It states "There shall be no thread blocked on *this", which is entirely redundant with the default lifetime rules, since those rules would already require that any wait calls happen before destruction.

Proposed resolution:

Add after 30.5.1p4 [thread.condition.condvar], whether or not updated by LWG 1221, and after 30.5.2p3:

This relaxes the usual rules, which would have required all wait calls to happen before destruction. Only the notifications to unblock the wait must happen before destruction.
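
An informal sketch of what the relaxation permits (the waiting flag and the polling loop are illustrative devices used here only to ensure the waiter has blocked; note that only the condition variable is destroyed early, while the mutex outlives the waiter):

    #include <condition_variable>
    #include <mutex>
    #include <thread>

    std::mutex m;
    std::condition_variable* cv = new std::condition_variable;
    bool ready = false;
    bool waiting = false;

    void waiter_thread() {
        std::unique_lock<std::mutex> lk(m);
        waiting = true;                      // becomes visible to main only
                                             // once wait() releases the lock
        cv->wait(lk, [] { return ready; });
        // This call may still be completing when main destroys *cv below;
        // the relaxed rule permits that, because the notification that
        // unblocks it happens before the destruction.
    }

    int main() {
        std::thread w(waiter_thread);

        // Poll until the waiter has entered wait() and released the mutex.
        for (;;) {
            std::lock_guard<std::mutex> lk(m);
            if (waiting) { ready = true; break; }
        }

        cv->notify_one();   // the notification happens before the destruction
        delete cv;          // permitted: the wait need not have completed,
                            // only the notification must precede destruction
        w.join();
    }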

Bitset, vector<bool> (LWG 1329)

The common implementation of vector<bool> is as an unsynchronized bitfield. The addition of 23.2.2 [container.requirements.dataraces]/2 would require either a change in representation or a change in access synchronization, both of which are undesirable with respect to compatibility and performance.

bitset has a conceptually similar issue. Unfortunately, it appears easier to resolve it somewhat differently.

Proposed resolution:

Modify 23.2.2 [container.requirements.dataraces]:

Edit paragraph 2 as follows:

2 Notwithstanding (17.6.4.8), implementations are required to avoid data races when the contents of the contained object in different elements in the same sequence, excepting vector<bool>, are modified concurrently.

Edit paragraph 3 as follows:

3 [Note: For a vector<int> x with a size greater than one, x[1] = 5 and *x.begin() = 10 can be executed concurrently without a data race, but x[0] = 5 and *x.begin() = 10 executed concurrently may result in a data race. As an exception to the general rule, for a vector<bool> y, y[0] = true may race with y[1] = true. -- end note]

Add at the end of the returns clause for bitset operator[] (20.5.2p57 [bitset.members]):

For the purpose of determining the presence of a data race (1.10) any access or update through the resulting reference potentially accesses or modifies, respectively, the entire underlying bitset.
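
An informal sketch of the combined effect of these edits (the mutex is merely one way of supplying the external synchronization that vector<bool> and bitset then require; sizes and values are arbitrary):

    #include <bitset>
    #include <mutex>
    #include <thread>
    #include <vector>

    int main() {
        // Distinct elements of an ordinary vector may be modified
        // concurrently without a data race.
        std::vector<int> vi(2);
        std::thread t1([&vi] { vi[0] = 1; });
        std::thread t2([&vi] { vi[1] = 2; });
        t1.join(); t2.join();

        // vector<bool> is exempted: these two writes may touch the same
        // word of the packed representation, so they need external
        // synchronization to avoid a race.
        std::mutex m;
        std::vector<bool> vb(2);
        std::thread t3([&] { std::lock_guard<std::mutex> g(m); vb[0] = true; });
        std::thread t4([&] { std::lock_guard<std::mutex> g(m); vb[1] = true; });
        t3.join(); t4.join();

        // Likewise, access through bitset::operator[] references is treated
        // as touching the entire bitset, so updates to different bits also
        // need external synchronization.
        std::bitset<8> bits;
        std::thread t5([&] { std::lock_guard<std::mutex> g(m); bits[0] = true; });
        std::thread t6([&] { std::lock_guard<std::mutex> g(m); bits[1] = true; });
        t5.join(); t6.join();
    }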