C++ Threads

Lawrence Crowl, 2005-10-03, N1875=05-0135

Introduction

The challenge in defining a C++ threading model is that we need to serve the needs of relatively casual thread programmers and deliver good performance on modern systems. However, we should not attempt to duplicate the work of significant industry standards, such as OpenMP. Furthermore, a standard C++ threading model should be reasonably implementable on existing operating systems.

This paper defers the discussion of a C++ memory model, i.e. memory synchronization, to other papers, most recently N1876=05136 "Memory model for multithreaded C++: August 2005 status update".

This paper defers the discussion of thread-local strorage to paper N1874=05-134 "Thread-Local Storage".

Approach

Proposals for threads in C++ generally adopt one of two approaches, a language extension to classes or a library. While generally workable, these approaches suffer from disadvantages. Extensions to classes tend to tie threading, a control abstraction, to the primary mechanism for data abstraction. This approach prematurely binds the choice of parallelism and synchronization into the API, and the result is often oversynchronized programs. A library approach often requires a fair amount of packaging at the library interface. Furthermore, a library interface implies either a compiler that recognizes the library or a compiler that avoids certain optimizations. Finally, both of these approaches suffer from an inapplicability to C. The problems that threads address are equally prevelant in C programs, and so a C++ solution that can be relatively easily adopted into C would serve the C/C++ community well.

This proposal adopts a syntax-based implementation of threads and synchronization. There are new operators and new control statements to address the new control domain. This proposal requires significant compiler implementation effort. On the other hand, that effort is lower than the effort in recognizing a library-based implementation. In keeping with a desire to for feasible C adoption, the syntax is C-like, just as are C++ control constructs. Likewise, the proposal adopts C-like function names where necessary.

Introducing new keywords is problematic. This proposal follows the _Upper convention for new keywords, recognizing that many persons find this convention ugly. It is however, the least likely to introduce conflicts with existing programs and the most likely to ensure debate on the choice of keywords.

Any C++ thread design is also contrained by the core thread utilities provided by the various operating systems. The two primary constraints are Posix threads and Windows threads.

Fork and Join

The _Fork (prefix) operator creates threads to execute a function call. The thread operator's value is a join object. The _Join (prefix) operator applied to a join object, yields the return value of function. The variables of join objects are declared, by example, with the _Join operator.

int function( int argument ) {
    int _Join joiner = _Fork work1( argument );
    // concurrent execution of work1 has begun
    int value = work2( argument );
    return value + _Join joiner;
}

The join object must be live until the thread has been joined. The return value within joiner is live from the function return until the join.

There is no select-like facility for joining the first ready joiner of a set of joiners. The expectation is that programmers will use other thread synchronization facilities for this purpose.

Timeouts

Several operations require a notion of timeout. The type struct _Timeout provides a means to specify a timeout. There are two interfaces to create timeout values, one suitable for C,

struct _Timeout;
void _Timeout_set( struct _Time * result, int32_t seconds, int32_t nanoseconds );
and the other suitable to C++.
struct _Timeout {
    _Timeout( int32_t seconds, int32_t nanoseconds );
    // private implementation
};

Managing Threads

Managing threads requires a handle on threads. A pointer to the join object serves this purpose. A pointer to a join object of arbitrary type may be cast to a pointer to a join object of void type and back to a pointer of the original type. When joining through such a pointer to a void joiner, the return value is lost. Pointers to join objects may be compared for equality or inequality regardless of the result type of joiner. So that a thread may manage itself, a thread may obtain a pointer to its own join object.

void _Join * _Thread_self()

A thread may indicate a good point to defer processor resources to other threads with the yield function.

void _Thread_yield()

Likewise, a thread may induce a minimum delay with the sleep function.

void _Thread_sleep( struct _Timeout )

It is an open issue whether threads should be able to induce yields and sleeps in other threads.

Mutual Exclusion and Conditions

Mutual exclusion is provided via the monitor paradigm. This paradigm is realized with a mutex variable, a lock statement, a condition variable, a wait statement, and a notify statement.

There are three types of mutex variables, _Mutex_simple, _Mutex_tested, and _Mutex_timed.

The lock statement provides an else clause in the event that the mutex is already held (_Mutex_tested or _Mutex_timed) or cannot be acquired within the specified timeout (_Mutex_timed only).

_Lock ( mutex-lvalue [ ; timeout-value ] )
statement
[ else statement ]

Within the active statement of a lock statment, programs may wait on conditions. These conditions a represented as variables of type _Condition. The corresponding statements are wait and notify.

The wait statement specifies a condition variable, a boolean expression, and an optional timeout.

_Wait ( condition-lvalue ; boolean-expr [ ; timeout-value ] )
statement
[ else statement ]

If the boolean expression of the wait statement evaluates to false, the thread will wait on a notify statement against the same condition variable, and then repeat until the boolean expression evaluates to true. When the expression evaluates to true, the primary statement executes. If the timeout is present and the thread is not notified within the specified time, the thread will reacquire the lock and execute the else statement.

The notify statement specifies a condition variable and an optional count.

_Notify ( condition-lvalue [ ; count ] );

If the count is not present, all threads waiting on the condition will be notified. If the count is present, at most that number of waiting threads will be notified.

For example, an int buffer might be implemented as

class buffer {
    int head;
    int tail;
    int store[10];
    _Mutex_simple mutex;
    _Condition not_full;
    _Condition not_empty;
public:
    buffer() : head( 0 ) , tail( 0 ) { }
    void insert( int arg ) {
        _Wait( not_full; (head+1)%10 != tail ) {
            store[head] = arg;
            head = (head+1)%10;
            _Notify( not_empty );
        }
    }
    int remove() {
        int result;
        _Wait( not_empty; head != tail ) {
            result = store[tail];
            tail = (tail+1)%10;
            _Notify( not_full );
        }
    }
};

Synchronized Initialization

In C++, variables with program extent, e.g. static local variables, may be initialized dynamically, and hence are at risk of unsynchronized initialization. A new storage class, _Synchronized, declares that the variable initialization is to be synchronized. Only one thread will initialize the variable, and all threads will wait for initialization to complete. The _Synchronized keyword is only required on variable definition.


struct type {
     static int shared;
};
_Synchronized static int type::shared = initial_value();

int function( int arg ) {
    _Synchronized static int holder = arg;
    return holder + arg;
}

Due to the potential for deadlock, programmers should minimize reliance on other synchronized variables in the initialization of a synchronized varible.

Atomic Operations

Atomic operations are among the most difficult to standardize because of the wide range of atomic primitives provided by the various machines. Atomic operations prefered on one machine may be inefficient or unavailable on another machine. (Though compare-and-swap is becoming nearly universal on modern general-purpose machines.) However, all machines provide at least one atomic synchronization primitive that may be used to emulate others. The requirement, then, is a mechanism for atomicity that does require any specific atomic operation, and that enables efficient exploitation of many operations.

This requirement is satisfied by the _Atomic statement, which accepts a variable and a statement. The overall model of the _Atomic statement is that of a compiler-generated compare-and-swap loop. However, The intent is that compilers recognize simple statements corresponding to the processor's atomic operations and use those atomic operations. On the other hand, when a compare-and-swap implementation is not feasible, this intent is that the compiler use tested busy locking. As a consequence, the _Atomic statement behavior is undefined in the presence of asynchronous signals.

Because atomic operations may fail, the usual approach is to retry those operations. However, unbounded retry may lead to live-lock. To prevent this problem, the _Atomic statements accepts a count of the number of times an atomic operation may attempted before failing over to the else clause.

_Atomic ( lvalue [ ; attempt-count ] ) statement [ else statement ]

The variable specified in the _Atomic statement may be read at most once and written at most once, in that order, and must be referenced through the same lvalue as specified. Reads of other non-local variables have acquire semantics. Writes to other non-local variables have release semantics.

The classic test-and-set is equivalent to

extern bool bit;
bool old;
_Atomic( bit ) { old = bit; bit = true; }

A simple but more realistic examples is inserting a new list head element.

extern element * head;
element * item;
_Atomic( head ) { item->next = head; head = item; }

A more comprehensive example is decrementing a reference count. When decrementing a reference count, the program needs to know if the count has reached zero to deallocate the item. However, reading the reference count after changing it is not thread-safe. The solution, is to store the value in a function-local variable within the _Atomic statement.

element * item;
int current;
_Atomic( item->refs ) { item->refs = current = item->refs + 1; }

Errors and Exceptions

Exceptions thrown out of the function called at a _Fork operator are propogated through the _Join operator.

It is an open issue on whether or not threads should be able to induce exceptions in another thread. It is clear, however, that such exceptions can be induced only synchronously and at predictable points in the program.

It is an open issue on how to handle errors with a C interface.

Standard Library

The standard library may need significant extension in the definition of concurrent data structures.