Jens Maurer <Jens.Maurer@gmx.net>
2004-02-14
Document N1609 = 04-0049

$Id: pre-sydney-paper.html,v 1.4 2004/02/14 00:12:15 jmaurer Exp $

More on Issues with Random Number Generators in the Library TR Proposal

Issue 6 of N1535: Should Random Number Initializers Take Iterators by Reference or by Value?

(See also the thread started with Pete Becker's message c++std-lib-10785.)

The concern is that section 5.1.1, table 5.2 (Pseudo-random number engine requirements), specifies that engines can be constructed and seeded using an iterator range [it1, it2) of unsigned integral values. It appears surprising to the user that, unlike any other function or class in the standard library, the iterators are not both passed by value, but it1 is passed by reference, and is modified during the construction or seeding.

Any solution has to satisfy the following requirements (see also section III.E of N1452):

There are two solutions to remedy this problem. The first keeps the iterator range, but passes the iterators by value. In order to avoid initializing the (potentially large) state of the engine twice, this requires that engines have a "construct, but don't seed" special constructor overload, similar to the "nothrow_t" overload of operator new.

The second avoids iterator ranges altogether, and instead passes a generator (i.e. zero-argument function object) by reference. There is a chance of ambiguous single-argument constructor and seeding overloads here, because most engines can be initialized with an "int" seed as well, but library implementors already have acquired a means to deal with the situation for the similar case of two-argument std::vector constructors. The advantage of passing a generator by reference is that the user can easily decide the behaviour on "end-of-sequence" to be "throw an exception" or "warp around", thereby reducing the complexity of the requirements specifications of the random number library. As outlined in N1452, section III.E, it is also easier to make a generator out of an iterator compared to the reverse, because a generator has less requirements.

After this discussion, I propose to move to a generator-based approach. (Note: The pre-TR random number library in Boost used this approach, but failed to fix the "overload" problem, so the iterator range approach was tried in the TR proposal - and failed due to reasonable design concerns.)

Proposed resolution

In section 5.1.1 [tr.rand.req], replace in the paragraph before table 5.2

... it1 is an lvalue and it2 is a (possibly const) value of an input iterator type It having an unsigned integral value type, ...
by
... g is an lvalue of a zero-argument function object returning values of unsigned integral type, ...
In the same section, replace in table 5.2 the table row for X(it1, it2) by
expression: X(g)

return type: (none)

pre/post-condition: creates an engine with the initial internal state given by the results of successive invocations of g. Throws what and when g throws.

complexity: O(size of state)

In the same section, replace in table 5.2 the table row for u.seed(it1, it2) by
expression: u.seed(g)

return type: void

pre/post-conditions: sets the internal state of u so that u == X(g). If an invocation of g throws, that exception is rethrown, and further use of u (except destruction) is undefined until a seed member function has been executed without throwing an exception.

complexity: same as X(g)

After table 5.2, add a new paragraph following the one starting "Additional Requirements":
For every pseudo-random number engine defined in this clause: [Note: The casts make g an rvalue, unsuitable for binding to a reference.]

[Note to editor: The following changes intend to morph "dereferencing *first" to "invoking g" only. However, complete text has been given.]

In section 5.1.4.1 [tr.rand.eng.lcong], replace the constructor template<class In> linear_congruential(In& first, In last) by

template<class Gen> linear_congruental(Gen& g)
Effects: If c mod m = 0 and g() mod m = 0, sets the state x(i) of the engine to 1 mod m, else sets the state x(i) of the engine to g() mod m.

Complexity: Exactly one invocation of g.

Furthermore, adjust the class synopsis accordingly.

In section 5.1.4.2 [tr.rand.eng.mers], replace the description of the constructor template<class In> mersenne_twister(In& first, In last) by

template<class Gen> mersenne_twister(Gen& g)
Effects: Given the values z[0] ... z[n-1] obtained by successive invocations of g, sets x(-n) ... x(-1) to z[0] mod 2w ... z[n-1] mod 2w.
Complexity: Exactly n invocations of g.
Furthermore, remove the description of the seed(first, last) function, it is subsumed by table 5.2 and the description of the constructor, and adjust the class synopsis accordingly.

In section 5.1.4.3 [tr.rand.eng.sub], replace the description of the constructor template<class In> subtract_with_carry(In& first, In last) by

template<class Gen> subtract_with_carry(Gen& g)
Effects: With n=(w+31)/32 (rounded downward) and given the values z[0] ... z[n*r-1] obtained by successive invocations of g, sets x(-r) ... x(-1) to (z0 * 232 + ... + zn-1 * 232*(n-1)) mod m ... (z(r-1)*n * 232 + ... + zr-1 * 232*(n-1)) mod m. If x(-1) == 0, sets carry(-1) = 1, else sets carry(-1) = 0.
Complexity: Exactly r*n invocations of g.
Furthermore, remove the description of the seed(first, last) function, it is subsumed by table 5.2 and the description of the constructor, and adjust the class synopsis accordingly.

In section 5.1.4.4 [tr.rand.eng.sub1], replace the description of the constructor template<class In> subtract_with_carry_01(In& first, In last) by

template<class Gen> subtract_with_carry_01(Gen& g)
Effects: With n=(w+31)/32 (rounded downward) and given the values z0 ... zn*r-1 obtained by successive invocations of g, sets x(-r) ... x(-1) to (z0 * 232 + ... + zn-1 * 232*(n-1)) * 2-w mod 1 ... (z(r-1)*n * 232 + ... + zr-1 * 232*(n-1)) * 2-w mod 1. If x(-1) == 0, sets carry(-1) = 2-w, else sets carry(-1) = 0.
Complexity: Exactly r*n invocations of g.
Furthermore, remove the description of the seed(first, last) function, it is subsumed by table 5.2 and the description of the constructor, and adjust the class synopsis accordingly.

In section 5.1.4.5 [tr.rand.eng.disc], replace the description of the constructor template<class In> discard_block(In& first, In last) by

template<class Gen> discard_block(Gen& g)
Effects: Constructs a discard_block engine. To construct the subobject b, invokes the b(g) constructor. Sets n = 0.
Furthermore, remove the description of the seed(first, last) function, it is subsumed by table 5.2 and the description of the constructor, and adjust the class synopsis accordingly.

In section 5.1.4.6 [tr.rand.eng.xor], replace the description of the constructor template<class In> xor_combine(In& first, In last) by

template<class Gen> xor_combine(Gen& g)
Effects: Constructs a xor_combine engine. To construct the subobject b1, invokes the b1(g) constructor. Then, to construct the subobject b2, invokes the b2(g) constructor.
Furthermore, remove the description of the seed(first, last) function, it is subsumed by table 5.2 and the description of the constructor, and adjust the class synopsis accordingly.

Issue 7 of N1535: Global comparison operators overspecified

The concern is that section 5.1.1, table 5.2 (Pseudo-random number engine requirements) specifies that each of the following must be valid expressions:
x == y
x != y
Redundantly, each of the engines is specified to have global operators for each of these operations. This restricts the implementation choices of the library implementor, for no good reason.

The following wording change removes the overspecification.

Proposed resolution

In section 5.1.1 [tr.rand.req], table 5.2, replace in the pre-/post-condition column for x == y

== is an equivalence relation. The current state x(i) of x is equal to the current state y(j) of y.
by
== is an equivalence relation. Given the current state x(i) of x and the current state y(j) of y, returns true if x(i+k) is equal to y(j+k) for all integer k >= 0, false otherwise.
In section 5.1.4.1 [tr.rand.eng.lcong], remove the prototypes for operator== and operator!= from the synopsis.

In section 5.1.4.2 [tr.rand.eng.mers], remove the prototypes for operator== and operator!= from the synopsis.

In section 5.1.4.3 [tr.rand.eng.sub], remove the prototypes for operator== and operator!= from the synopsis.

In section 5.1.4.4 [tr.rand.eng.sub1], remove the prototypes for operator== and operator!= from the synopsis.

In section 5.1.4.5 [tr.rand.eng.disc], remove the prototypes for operator== and operator!= from the synopsis.

In section 5.1.4.6 [tr.rand.eng.xor], remove the prototypes for operator== and operator!= from the synopsis.

Issue 23 of N1535: Streaming underspecified

Of concern are the following operators:
os << x
is >> x
The concern here is twofold: First, similar to issue 7 of N1535 (see above), the streaming operators are described in section 5.1.1, table 5.2 (Pseudo-random number engine requirements), and each engine has a specific declaration, which restricts implementation choices. Second, after discussion on the core reflector, the consensus was that it is desirable to have a portable external representation of the engine state. This can be done if a sequence of unsigned longs is written/read, where the representation of each unsigned long requires at most 32 bits.

Mixing streaming with seeding does not appear advisable, because seeding may have employ one-way hash functions that interfere with the goal of streaming, namely to completely restore the state.

Proposed resolution

After table 5.2, add a new paragraph following the one starting "Additional Requirements":

If a textual representation was written by os << x and that representation was read by is >> v, then x == v, provided that no intervening invocations of x or v have occurred.
In section 5.1.4.1 [tr.rand.eng.lcong], remove the prototypes for operator<< and operator>> from the synopsis. Also, remove the description of operator<<. Add after "The size of the state x(i) is 1.":
The textual representation is the value of x(i).

In section 5.1.4.2 [tr.rand.eng.mers], remove the prototypes for operator<< and operator>> from the synopsis. Also, remove the description of operator<<. Add after "The size of the state x(i) is n.":

The textual representation is the values of x(i-n), ..., x(i-1), in that order.

In section 5.1.4.3 [tr.rand.eng.sub], remove the prototypes for operator<< and operator>> from the synopsis. Also, remove the description of operator<<. Add after "The size of the state is r.":

The textual representation is the values of x(i-r), ..., x(i-1), carry(i-1), in that order.

In section 5.1.4.4 [tr.rand.eng.sub1], remove the prototypes for operator<< and operator>> from the synopsis. Also, remove the description of operator<<. Add after "The size of the state is r.":

With n = (w+31)/32 (rounded downward) and integer numbers z[k,j] such that x(i-k)*2w = z[k,0] + z[k,1] * 232 + z[k,n-1] * 232(n-1), the textual representation is the values of z[r,0], ... z[r,n-1], ... z[1,0], ... z[1,n-1], carry(i-1)*2w, in that order. [Note: The algorithm ensures that only integer numbers representable in 32 bits are written.]

In section 5.1.4.5 [tr.rand.eng.disc], remove the prototypes for operator<< and operator>> from the synopsis. Also, remove the description of operator<<. Add after "The size of the state is the size of b plus 1.":

The textual representation is the textual representation of b followed by the value of n.

In section 5.1.4.6 [tr.rand.eng.xor], remove the prototypes for operator<< and operator>> from the synopsis. Also, remove the description of operator<<. Add after "The size of the state is the size of the state of b1 plus the size of the state of b2.":

The textual representation is the textual representation of b1 followed by the textual representation of b2.

New Issue J17

Section 5.1.1 [tr.rand.req], table 5.2, specifies that the streaming operators take std::ostream and std::istream. This does not take wide streams nor the template nature of streams into account.

Suggested resolution

Replace std::ostream and std::istream by appropriate templates or template instantiations, or decide that writing/reading state to/from narrow streams is sufficient.

Acknowledgements

Thanks to Pete Becker for preparing N1535 and raising some of the issues on the library reflector.