C++ Distributed Counters

ISO/IEC JTC1 SC22 WG21 N3355 = 12-0045 - 2012-01-14

Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org

Introduction
Solution
    Counter Operations
    Simple Counters
    Counter Buffers
    Duplex Counters
    Weak Counters
Synopsis

Introduction

Long-running multithreaded server programs may need to maintain many counters to aid in program diagnosis. As these counters are much more commonly incremented than read, we desire an implementation of counters that minimizes the cost of incrementing the counter, accepting increased costs to obtain the count.

Solution

We propose a set of coordinated counter types that collectively provide for distributed counting. The essential component of the design is a trade-off between the cost of the counter and the "lag" between an increment and its appearance in a "read" value. That is, the read of a counter value may not reflect the most recent increments.

These counters are parameterized by the base integer type that maintains the count. Avoid situations that overflow the integer. (See exchange below.)

Finally, in keeping with the desire for minimum counter cost, these counters provide no memory ordering; their only effect is to maintain the count. (See 1.10 [intro.multithread] of the standard.)

Counter Operations

Several operations are shared among most counters. They are:

constructor( integer );

The parameter is the initial counter value.

constructor();

Equivalent to an initial value of zero.

void inc( integer );

Add a value to the counter. There is no default value.

void dec( integer );

Subtract a value from the counter. There is no default value.

integer load();

Returns the value of the counter.

integer exchange( integer );

Replaces the existing count by the count in the parameter and returns the previous count, which enables safe concurrent count extraction. This operation has particular utility in long-running programs in occasionally draining the counter to prevent integer overflow.

There are no copy or assignment operations.

Simple Counters

The base of the design is an atomic_counter, which provides atomicity, but without reducing increment cost over the plain atomic_int.

A serial_counter is available to ease conversion from concurrent to serial code. It provides the same interface, but without the cost of atomicity.

Counter Buffers

The cost of incrementing a counter is reduced by reducing the contention on the counter. We achieve this by placing a counter buffer in front of a counter. Counts are transfered from the buffer to the counter with a push operation on the buffer. The destructor does an implicit push.

void push();

Transfers the count in the buffer to the primary counter.

Counter buffers require a reference to an atomic_counter as their constructor parameter, i.e. the primary counter. The lifetime of the primary counter must cover the lifetime of the buffer. Any increments in the buffer since the last push call are not reflected in the queries on the prime.

There are two forms of counter buffers, a serial_counter_buffer for use when all increments on the buffer come from the same thread and an atomic_counter_buffer for use when increments on the buffer may come from the different threads. The latter is useful for processor-local buffers, which can substantially reduce inter-processor communication.

An example use of a counter buffer is as follows.

atomic_counter<int> lower_case_count( 0 );

void count_lower_case( std::vector< string > arg )
    serial_counter_buffer<int> cntbuf( lower_case_count );
    for ( arg::iterator i = arg.begin(); i != arg.end(); i++ )
        if ( is_lower_case( *i ) )
            cntbuf.inc(1);
}

You can also place a serial_counter_buffer in front of an atomic_counter_buffer to further reduce costs.

atomic_counter<int> global_counter( 0 );
atomic_counter_buffer<int> processor_local( global_counter );
serial_counter_buffer<int> function_local( processor_local );

Duplex Counters

The push model of counters sometimes yields an unacceptable lag in the observed value of the count. To avoid this lag, the duplex_counter and its duplex_counter_buffer provide a pull model of counters.

duplex_counter<int> lower_case_count( 3 );

void count_lower_case( std::vector< string > arg )
    duplex_counter_buffer<int> cntbuf( lower_case_count );
    for ( arg::iterator i = arg.begin(); i != arg.end(); i++ )
        if ( is_lower_case( *i ) )
            cntbuf.inc(1);
}

Another thread may call lower_case_count.load() and get the current count. That operation will poll each buffer for its count and return the sum. Naturally, any incs done to a buffer after it is polled will be missed, but no counts will be lost.

The exchange operation works by atomically transfering buffer counts to the main counter. That is, every count will be extracted by one and only one exchange operation.

Weak Counters

Duplex counters can be expensive because the counter exchange operation and the buffer inc/dec operations require write concurrency to the same object. To reduce that concurrency cost, the weak_counter and its weak_counter_buffer do not provide the exchange operation. This difference means that the polling is a read-only operation and requires less expensive synchronization. Use this counter when you do not intend to exchange values.

Synopsis

The synopsis is as follows.

Note the presence of a counter_bumper class. This class could be eliminated by parameterizing the buffers on the type of the primary counter rather than on the underlying representation.

template< typename Integral > class serial_counter
{
public:
    constexpr serial_counter();
    constexpr serial_counter( Integral in );
    serial_counter( const serial_counter& ) = delete;
    serial_counter& operator=( const serial_counter& ) = delete;
    void inc( Integral by );
    void dec( Integral by );
    Integral load();
    Integral exchange( Integral to );
};

template< typename Integral > class counter_bumper
{
public:
    constexpr counter_bumper( Integral in );
    counter_bumper( const counter_bumper& ) = delete;
    counter_bumper& operator=( const counter_bumper& ) = delete;
    void inc( Integral by );
    void dec( Integral by );
};

template< typename Integral > class atomic_counter
: public counter_bumper< Integral >
{
public:
    constexpr atomic_counter();
    constexpr atomic_counter( Integral in );
    atomic_counter( const atomic_counter& ) = delete;
    atomic_counter& operator=( const atomic_counter& ) = delete;
    Integral load();
    Integral exchange( Integral to );
};

template< typename Integral > class serial_counter_buffer
{
public:
    serial_counter_buffer( atomic_counter< Integral >& p );
    serial_counter_buffer() = delete;
    serial_counter_buffer( const serial_counter_buffer& ) = delete;
    serial_counter_buffer& operator=( const serial_counter_buffer& )
        = delete;
    void inc( Integral by );
    void dec( Integral by );
    void push();
    ~serial_counter_buffer();
};

template< typename Integral > class atomic_counter_buffer
: public counter_bumper< Integral >
{
public:
    atomic_counter_buffer( counter_bumper< Integral >& p );
    atomic_counter_buffer() = delete;
    atomic_counter_buffer( const atomic_counter_buffer& ) = delete;
    atomic_counter_buffer& operator=( const atomic_counter_buffer& )
        = delete;
    void push();
    ~atomic_counter_buffer();
};

template< typename Integral > class weak_counter
{
public:
    weak_counter();
    weak_counter( Integral in );
    void inc( Integral by );
    void dec( Integral by );
    Integral load();
    ~weak_counter();
};

template< typename Integral >
class weak_counter_buffer
{
public:
    weak_counter_buffer( weak_counter< Integral >& p );
    weak_counter_buffer() = delete;
    weak_counter_buffer( const weak_counter_buffer& ) = delete;
    weak_counter_buffer& operator=( const weak_counter_buffer& ) = delete;
    void inc( Integral by );
    void dec( Integral by );
    ~weak_counter_buffer();
};

template< typename Integral > class duplex_counter
: public counter_bumper< Integral >
{
public:
    duplex_counter();
    duplex_counter( Integral in );
    Integral load();
    Integral exchange( Integral to );
    ~duplex_counter();
};

template< typename Integral >
class duplex_counter_buffer
: public counter_bumper< Integral >
{
public:
    duplex_counter_buffer( duplex_counter< Integral >& p );
    duplex_counter_buffer() = delete;
    duplex_counter_buffer( const duplex_counter_buffer& ) = delete;
    duplex_counter_buffer& operator=( const duplex_counter_buffer& )
        = delete;
    ~duplex_counter_buffer();
};