Converting Memory Fences to N2324 Form

ISO/IEC JTC1 SC22 WG21 N2362 = 07-0222 - 2007-08-04

Paul E. McKenney, paulmck@linux.vnet.ibm.com
Lawrence Crowl, Lawrence@Crowl.org

Introduction

Existing parallel code using memory fences typically uses the address-free varieties provided by most hardware, for example:

When converting such programs to use N2324's address-based memory fences, developers must supply the corresponding variable. This document lists a number of methods that might be used to accomplish this, and is particularly concerned with the memory_order_seq_cst and the memory_order_acq_rel variants. A brief description of each method follows:

  1. Developers could carefully select separate variables for each related use of fences within the system being ported, carefully validating that the additional potential for compiler and hardware misordering did not render the program incorrect.
  2. Developers could randomly select variables that were conveniently in scope, potentially using a different variable for each fence in the program.
  3. Developers could create a new global variable, and assign that global variable to all N2324-based fences. Developers would most likely use macros or inline functions to map their existing API into that provided by N2324.
  4. The C++ standard could specify the name of the global variable to be used in such cases, and developers would be advised to use that variable. N2324 might in addition be extended to take a default argument, which would map to the standard name.
  5. The relevant ABI standard could specify the name for a given platform. N2324 might again be extended to supply this ABI-specific name as the default for fence operations.

Each of these approaches is expanded on in the following sections.

1. Select Separate Fence Variables

This approach offers the greatest potential performance for platforms that can exploit the additional opportunities for reordering. However, it also requires the greatest effort on the part of the programmers and incurs the greatest risk, although this risk is incurred only on systems that use special facilities or optimizations that take advantage of the greater freedom to reorder or to reduce communications.

In contrast, platforms that choose to implement the N2324 fence operations as address-free fence instructions (as listed above) would be guaranteed to run the program with the old semantics.

Furthermore, given that all existing hardware would likely use address-free fences, any validation that the developers might do would be theoretical. Although it is hoped that program analysis tools will eventually be capable of analyzing fence usage, there is currently no way to test the variable choices, which in turn means that any design errors or even typographical errors would persist. Such a program would therefore -look- like it was written for a machine with address-sensitive fence instructions when it does not in fact run correctly on such hardware.

This situation forces the conclusion that (a) programmers are extremely unlikely to choose this option and (b) if they do choose it, they will almost certainly get it wrong.

2. Select Random Fence Variables

In this scenario, the developers randomly choose any convenient atomic that is in scope for each separate fence primitive. This requires very little effort on the part of the developers, and is guaranteed to preserve program behavior on existing machines with address-free fence instructions. The program would very likely fail on machines with address-sensitive fence instructions, though casual inspection of the program would have a fair chance of fooling the inspector into believing that the variables had been properly selected.

A moment reflecting on human nature and on experience with real people on real projects should be sufficient to force the conclusion that this option is depressingly likely to be chosen.

3. Create New Global Variable

Here, a single new global atomic variable is chosen to be used in conjunction with N2324 fence operations. In standalone roll-your-own software projects, this option is reasonably likely to be chosen, and it has the virtue of preserving program semantics on hardware that has address-sensitive fence instructions. Unfortunately, such a choice of global variable may result in extremely low levels of performance and scalability.

Furthermore, if the program is built using multiple third-party modules and libraries that are independently converted, it is unlikely that all the parties would choose the same global variable, thus raising the possibility of bugs appearing on hardware with address-sensitive fence instructions.

Even worse, a casual investigation of the code might erroneously conclude that the program had been optimized to run on hardware featuring address-sensitive fences. A better approach would make it quite clear that no such optimization had been undertaken.

Of course, such code would continue to run correctly on conventional machines with address-free fence instructions, increasing the likelihood that any such errors would go undetected until much later when the software was actually run on a machine with address-sensitive fence instructions.
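The single-variable mapping described in this section might be sketched as follows. This is an illustration under stated assumptions, not the N2324 interface: the legacy primitive name smp_mb(), the variable legacy_fence_var, and the helper fence_on() are all hypothetical, and the address-based fence is emulated by a read-modify-write that adds zero to the named variable.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical project-wide variable that every converted fence names.
std::atomic<int> legacy_fence_var{0};

// Assumed stand-in for an address-based fence: a read-modify-write
// that adds zero, so the variable's value never changes.
inline void fence_on(std::atomic<int>& var, std::memory_order order) {
    var.fetch_add(0, order);
}

// Hypothetical legacy API, kept so that existing call sites need no edits.
inline void smp_mb() {
    fence_on(legacy_fence_var, std::memory_order_seq_cst);
}

// Message passing through the legacy API: the writer publishes data and
// then sets a flag; the reader waits for the flag and then reads the data.
inline int message_passing_demo() {
    std::atomic<int> data{0};
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread writer([&] {
        data.store(42, std::memory_order_relaxed);
        smp_mb();
        ready.store(true, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        smp_mb();
        seen = data.load(std::memory_order_relaxed);
    });

    writer.join();
    reader.join();

    // The emulated fence never modifies the variable it names.
    assert(legacy_fence_var.load() == 0);
    return seen;
}
```

Because both threads name the same variable, the two read-modify-write operations are ordered with respect to each other, which is what carries the ordering between the data and flag accesses in this sketch. If the writer and the reader each picked a different variable, as might happen when independently converted modules are combined, that link would be lost.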

4. C++ Standard Specifies Global Variable

With this option, the C++ standard specifies the name of the global variable to be used for N2324 fences, and developers would be advised to use that variable. N2324 might in addition be extended to take a default argument, which would map to the standard name. This latter approach possesses the virtue that developers would be very strongly incented to let the compiler reliably choose the correct name. However, the C language does not permit default arguments, so the C-language API would need to either require the variable be specified or require an additional API member.

This approach would permit programs, even those produced by multiple parties working in isolation, to produce correct results when run on machines with address-sensitive fence instructions. In addition, it would be obvious that the program had not been specifically optimized for hardware featuring address-sensitive fences, as such optimization would almost invariably avoid the standard name.

However, the C++ standard is arguably a strange place to put such a variable name, particularly when no platform that we are currently aware of needs it. On platforms with efficient address-insensitive fence instructions, placing the name in the standard would consume an identifier to no purpose.
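The default-argument extension discussed in this section might look roughly like the sketch below. Everything in it is an assumption made for illustration: global_fence_var stands in for whatever name the standard would specify, fence_on() and its signature are hypothetical, the fence is emulated by a read-modify-write that adds zero, and fence_on_global() is one possible shape for the additional API member that a C-language interface would need.

```cpp
#include <atomic>

// Hypothetical stand-in for the variable whose name the standard would fix.
std::atomic<int> global_fence_var{0};

// Assumed address-based fence, emulated by a read-modify-write that adds
// zero. Omitting the second argument selects the standard-named variable.
inline void fence_on(std::memory_order order,
                     std::atomic<int>& var = global_fence_var) {
    var.fetch_add(0, order);
}

// C has no default arguments, so a C-compatible API would need an
// additional member such as this hypothetical one.
inline void fence_on_global(std::memory_order order) {
    fence_on(order, global_fence_var);
}

// Exercises the three call forms and returns the sum of both variables,
// which stays zero because the emulated fence never changes a value.
inline int default_argument_demo() {
    std::atomic<int> tuned_var{0};
    fence_on(std::memory_order_seq_cst);             // naive port: default variable
    fence_on(std::memory_order_acq_rel, tuned_var);  // tuned code: explicit variable
    fence_on_global(std::memory_order_seq_cst);      // C-style entry point
    return global_fence_var.load() + tuned_var.load();
}
```

In this arrangement a naive port never spells out a variable at all, so any explicit argument at a call site signals that someone deliberately chose it.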

5. ABI Standard Specifies Global Variable

A more logical place to put the name of the global variable would be in the relevant ABI standard. This works especially well for the majority of the platforms with address-free fence instructions, as such platforms need not specify a variable name at all, given that they don't need one. In this case, N2324 might be extended to offer a default value for the address to be associated with the fence operation, permitting the common address-free-fence platforms to simply sidestep the whole issue. In addition, the name of the global variable in the ABI standard could potentially be an illegal C++ identifier, which would avoid consuming a legal C++ name.

In other words, only the ABI standards for prospective machines with address-sensitive fence instructions would need to take this requirement into account. Platforms such as Itanium that have both address-free and address-sensitive variants of fence instructions could choose to modify their ABI standards or not, as performance, convenience, or other considerations dictated.

This approach preserves the correctness advantages of option #4. It likewise retains the performance shortfall of naively ported code, but only when such code is run on machines having only address-sensitive fence instructions for which simulating address-free fence instructions is expensive. However, the C language does not permit default arguments, so the C-language API would need to either require the variable be specified or require an additional API member.

Conclusion

At the current moment, option #4 seems to be the most straightforward in terms of standards effort (when the C-language standard is taken into account) and also in terms of correct operation of programs with pre-existing fence operations.

A candidate variable definition is as follows:

const atomic_bool atomic_global_fence_compatibility = { false };