C++ Data-Dependency Ordering: Function Annotation

ISO/IEC JTC1 SC22 WG21 N2361 = 07-0221 - 2007-08-02

Paul E. McKenney, paulmck@linux.vnet.ibm.com

Introduction

This document presents an interface and minimal implementation for preservation of data dependency ordering to expedite access to dynamic linked data structures that are read frequently and seldom modified. This document extends the proposal in N2359 and N2360 to permit dependency chains to cross compilation-unit boundaries, by providing annotations for function arguments and return values. The rationale for this proposal may be found in N2359.

This proposal does not affect library functions. Changes to library functions (for example, annotating the Vector templates) were considered, but rejected because current uses of data dependency ordering are restricted to basic operations such as indirection, field selection, array access, and casting. Longer term experience might indicate that a future proposal affecting library classes is warranted, however, there is insufficient motivation for such a proposal at this time.

This proposal is expected to have minimal affect to strongly ordered machines (e.g., x86) and on weakly ordered machines that do not support data dependency ordering (e.g., Alpha). It has no effect on implementations that refrain from breaking dependency chains. The major burden of this proposal would fall on weakly ordered machines that order data-dependent operations, such as ARM, Itanium, and PowerPC. Even for these architectures, a fully conforming compiler could use the same approach as weakly ordered machines that do not support data dependency ordering, albeit at a performance penalty. Alternatively, such a compiler could simply refrain from carrying out optimizations that break dependency chains.

This proposal enforces only data dependencies, not control dependencies. If experience indicates that control dependencies also need to be enforced, a separate proposal will be put forward for them.

This proposal is based on N2153 by Silvera, Wong, McKenney, and Blainey, on N2176 by Hans Boehm, on N2195 by Peter Dimov, on N2260 by Paul E. McKenney, on discussions on the cpp-threads list, and on discussions in the concurrency workgroup at the 2007 Oxford and Toronto meetings.

Current Approach

This proposal is in two parts:

Preserving dependency ordering within the confines of a single function body, as described in N2359 and N2360.
Preserving dependency ordering across multiple functions, possibly in multiple compilation units. This is addressed in the remainder of this document.

Multiple Functions

As noted in the previous section, a dependency chain terminates when a value is passed into or returned from an unannotated function. If a given function's return value is annotated, then dependency chains survive being returned from that function. If a particular argument of a given function is annotated, then dependency chains survive being passed in via that argument. If the function has a prototype, then the annotation must be applied to the prototype as well as to the function definition itself.

For example, the following propagates dependencies through argument y to the return value:

[[dependence_propagate]] int *f([[dependence_propagate]] int *y)
{
        return y;
}

The following example propagates dependency chains in, but not out:

int f([[dependence_propagate]] int *y)
{
        return *y;
}

The following propagates dependency chains out, but not in:

[[dependence_propagate]] struct foo *f(int i)
{
        return &foo_head[i];
}

Finally, the following does not propagate dependency chains at all:

template<T> T *kill_dependency_chain(T *y)
{
        return y;
}

Minimal Implementation

Minimal implementations of [[dependency_propagate]] simply ignore this attribute, assuming the minimal implementation of dependency ordering described in N2359.

Behavior on Dependency Examples

In N2176, Hans Boehm lists a number of example optimizations that can break dependency chains. Most of these are addressed in N2359, the remainder are covered below.

Example 5

N2176 example code:

r1 = x.load(memory_order_relaxed);
if (r1)
	f(&y);
else
	g(&y);

In this case, there is no data dependency leading into f() and g(), so this code-dependency example is out of scope. Modifying the example by replacing &y with r1 to create a data dependency leading into the two functions:

r1 = x.load(memory_order_relaxed);
if (r1)
	f(r1);
else
	g(r1);

Recoding this based on this proposal and on N2359:

void f([[dependence_propagate]] atomic<struct foo *> p);
void g([[dependence_propagate]] atomic<struct foo *> p);

r1 = x.load(memory_order_dependency);
if (r1)
	f(r1);
else
	g(r1);

Assuming that x is an atomic, the x.load(memory_order_dependency) will form the head of a dependency chain. The [[dependence_propagate]] annotations will cause the dependency chain to propagate into f() and g().

Alternatives Considered

Require the compiler globally analyze the program to infer which dependency chains must be preserved. This conflicts with the common practice of compiling C++ programs on a module-by-module basis, so this alternative was rejected.
Prohibit dependency chains from crossing function boundaries. There are a large number of examples of RCU dependency chains crossing function boundaries in the Linux kernel, so this alternative was rejected.
Prohibit dependency chains from crossing compilation-unit boundaries. There are a large number of examples of RCU dependency chains crossing compilation-unit boundaries in the Linux kernel. In some cases, the overall overhead of the instructions making up the dependency chain overwhelms that of a memory fence, however, in a number of cases, the actual number of instructions is small, so that the overhead of a memory fence would still be prohibitive. Therefore, this alternative was rejected.