Doc. no. | N2081 |
---|---|

Date | 2016-09-15 |

Reply to | Clark Nelson |

The working draft that CPLEX handed over to WG14 (N2017) describes parallelism using only traditional multi-threading. At the time, CPLEX discussed producing a new document describing parallelism using vector/SIMD technology, starting with array sections, described in this document.

However, more recent discussions in CPLEX have favored less strictness about implementation technology. In particular, it is felt that an implementation should be able to use multiple threads, and not just SIMD instructions, to implement array section operations.

WG14 needs to discuss whether different parallelization technologies should be treated differently at the languge level, and how that would affect document publication strategy.

This document provides a specification for the array section portion of the Intel® Cilk™ Plus language extension. Array sections are intended to allow users to directly express high level parallel vector array operations in their programs. This assists the compiler in performing vectorization and auto-parallelization. From the users' point of view, they will see more predictable vectorization, improved performance and better hardware resource utilization. Array sections are an extension of the standard C/C++ languages, including features that are designed for easy expression of array operations and simplified parallel function invocation.

The section expression selects multiple array elements for a data-parallel operation.

The syntax of a section expression is as follows:

- postfix-expression:
`section-expression`

- section-expression:
`postfix-expression``[`

`section-triplet``]`

- section-triplet:
`expression``:`

`expression``:`

`expression``expression``:`

`expression``:`

Each of the expressions in a section triplet
shall have integer type.
The postfix expression in a section expression shall have type
“pointer to complete object `type`”;
the type of the section expression
is “`type`”
(i.e. the same type
as the corresponding “simple” subscript expression;
there is no section type).

The sequence of expressions delimited by the brackets
in a section expression is termed a section triplet
(even when there are fewer than three expressions).
The expressions in a triplet are interpreted, respectively as:
`begin`, `length`, and `stride`.
Each section triplet represents a sequence of subscript values,
starting at `begin`, with `length` elements,
where each element increases by `stride`:
`begin, begin + stride, begin + stride * 2, …,
begin + stride * (length - 1)`.

NOTE: See the rationale for the interpretation of the limit of a section.

When no stride expression is present,
the value of `stride` is 1.
When the triplet contains no expressions
(i.e. consists entirely of a single colon),
the value of `begin` is 0,
and the value of `length`
is the number of elements in the array being subscripted.
If this shorthand is used,
the type of the array being subscripted
shall have a declared size
(which can be non-constant in the case of a VLA).

EXAMPLE:
`A[0:3:2]`

refers to elements 0, 2, and 4
of the array `A`

.
`A[:]`

refers to the entire array `A`

,
assuming `A`

is an array with known upper bound.

The expressions in a triplet
are converted to `ptrdiff_t`

.

If `stride` is negative,
then `begin` identifies the uppermost index.
If `length` is less than or equal to zero,
the sequence of subscript values is empty.

Every expression has a rank, determined as follows.

- The rank of an expression with no nested sub-expression is zero. (This rule applies to identifiers and constants.)
- Unless otherwise specified, the rank of an expression with one sub-expression operand is the rank of its operand. (This rule applies to parenthesized expressions, most postfix expressions, most unary expressions, and cast expressions.)
- Unless otherwise specified,
in an expression with more than one sub-expression operand,
the rank is the common rank of its operands.
The common rank of two expressions is
- if the rank of either expression is zero, the common rank is the rank of the other expression;
- otherwise, if the expressions have the same rank, that is the common rank;
- otherwise, the program is ill-formed.

- The rank of a section expression
(
`postfix-expression``[`

`section-triplet``]`

) is one greater than the rank of its postfix expression operand. The rank of each expression in a section triplet shall be zero. - The rank of a simple subscript expression
(
`postfix-expression``[`

`expression``]`

) is the sum of the ranks of its operand expressions. The rank of the subscript operand shall not be greater than one. - The rank of an argument expression list (in a function-call expression) is the common rank of the argument expressions if there are more than one, or the rank of the argument expression if there is exactly one, or zero if there are no argument expressions.
- The rank of a non-member function-call expression is the rank of its argument expression list. The rank of the postfix expression identifying the function to call shall be zero.
- The rank of a member function call expression is determined as if the object expression appeared as an additional expression in the argument list.
- The rank of a comma expression is the rank of its second operand.
- The rank of a
`lambda-expression`is zero.

In an assignment expression, if the right operand has nonzero rank, the left operand shall have the same rank as the right operand.

EXAMPLES:

Expression | Rank |
---|---|

`A[3:4][0:10]` |
2 |

`A[3][0:10]` |
1 |

`A[3:4][0]` |
1 |

`A[:][:]` |
2 |

`A[3][0]` |
0 |

An array section is an lvalue postfix expression with rank greater than zero.

EXAMPLE: `A[0:3][0:4]`

refers to 12 elements
in the two-dimensional array A,
starting at row 0, column 0, and ending at row 2, column 3.

EXAMPLES:

int *p; int A[n][m]; p[:] = ... // not valid. p has no declared size. A[:][:] = ... // The entire 2D array A. p[1:5] = ... // p[1],p[2], ... p[5].

Every section triplet of an array section has a relative rank, defined as its ordinal number among the other triplets, from left to right, starting at 0.

EXAMPLES:

Expression | Relative rank of `0:10` |
---|---|

`A[:][0:10][:]` |
1 |

`A[0:10][:][:]` |
0 |

`A[:][:][0:10]` |
2 |

`A[i[:]][j[0:10]][k[:]]` |
1 |

The shape of an array section is defined as a vector:
`(length _{0},length_{1},...,length_{n-1})`,
where

A full expression shall have rank zero,
unless it appears in an expression statement
or as the controlling expression
of an `if`

statement.

In general, a statement containing a section expression is executed once for each element of the section; some operations of these executions may be performed in parallel. However, in such a statement, a sub-expression with rank zero is evaluated only once.

EXAMPLE:

a[:] = f1(b[:]) + f2(c) + d;

`f1`

is called for each element of `b`

.
`f2(c)`

is evaluated once;
that value is used to compute the new value
of each element of `a`

.
If `f2`

changes the value of `d`

,
the value used for `d`

in this example
is unspecified.
If `f1`

changes the value of `d`

,
the behavior is likely to be undefined.

When two array sections are required to have common rank, if they do not have the same shape, the behavior is undefined.

When two expressions are required to have common rank, and one of them has zero rank, the expression with zero rank is evaluated only once; the resulting value is used as many times as necessary to fully execute the containing statement.

An array section assignment is a parallel operation that modifies every element of the array section on the left-hand side.

When the left operand of assignment has non-zero rank, the assignment of each element is unsequenced with respect to the assignment of every other element, and the value computation for each element is unsequenced with respect to the value computation for every other element.

NOTE: see the rationale for the handling of assignment to an overlapping section.

EXAMPLES: If any section on the right-hand side of an assignment overlaps with the left-hand side, and the overlap is not complete, the behavior is undefined.

a[0:10] = a[10:10]; // no overlap; well-defined a[0:10:2] = a[1:10:2]; // no overlap; well-defined a[0:10] = a[0:10] + 1; // complete overlap; well-defined a[0:10] = a[1:10]; // incomplete overlap; undefined

EXAMPLES:

// Copy elements 10->19 in A to elements 0->9 in B. B[0:10] = A[10:10];

// Undefined behavior. Triplets 0:10 and 0:100 are not the same size. B[0:10] = A[0:100];

// Transpose row 0, columns 0-9 of A, into column 0, rows 0-9 of B. B[0:10][0] = A[0][0:10];

// Copy the specified array section in the 2nd and 3rd dimensions of A into // the 1st and 4th dimensions of B. B[0:10][0][0][0:5] = A[3][0:10][0:5][5]

// Undefined behavior. The corresponding triplets with the same relative rank // (0:9, 0:5) and (0:5, 0:9) do not have the same number of elements. B[0:9][0:5] = A[0:5][0:9];

// OK since the triplets on both sides have the same number of elements. // The 5 elements in A are scattered to the even-index elements in B // (0,2,4...8). The values of the odd-index elements (1,3,5...7) are not // changed. B[0:5:2] = A[0:5];

When applied to an array section or sections, the C/C++ arithmetic operators are applied to each element of the array section(s). Note that multiplication is performed element-wise and does not correspond to traditional vector/matrix multiplication.

EXAMPLES:

// Set all elements of A to 1.0. A[:] = 1.0;

// Set every element of 3x3 shape in A to the value of B[0]. A[0:3][0:3] = B[0];

// Error: the number of dimensions (rank) must match, // or be equal to 0. A[0:2][0:2] = B[0:2];

// Element-wise addition of all elements in A and B, resulting in C. C[:] = A[:] + B[:];

// Element-wise multiplication of all elements in A and B, // resulting in C. C[:] = A[:] * B[:];

// Add elements 10->19 from A with elements 0->9 from B and place in // elements 20->29 in C. C[20:10] = A[10:10] + B[0:10];

// Element-wise addition of the first 10 elements in column 2 of A and // column 3 of B, resulting in column 0 of C. C[0:10][0] = A[0:10][2] + B[0:10][3];

// Matrix addition of the 2x2 matrices in A and B starting at A[3][3] // and B[5][5], placed in C starting at C[0][0]. C[0:2][0:2] = A[3:2][3:2] + B[5:2][5:2];

// Add the array section along the 1st and 2nd dimensions of B to the // elements in the array section along the 2nd and 3rd dimensions of A, // placing them in an array section in C. C[0:9][0][0:9] = A[0][0:9][0:9] + B[0:9][0:9][4];

// Element-wise addition of first 10 elements in A and B, // resulting in A. A[0:10] = A[0:10] + B[0:10];

// Element-wise negation of first 10 elements in A, resulting in A. A[0:10] = -A[0:10];

// Multiply every element in A by 2. A[:] *= 2;

// Add one to each element in A. A[:]++;

// Element-wise equality test of B and C, resulting in an array of // Boolean values, which are placed in A. A[:] = B[:] == C[:];

If a function is called with an array section argument, the function is “mapped”, or called with successive elements in the array.

NOTE: See the rationale for the meaning of an array section as an argument to a function.

EXAMPLE:

typefn(typearg1,type2arg2);typein[N], out[N];type2in2[N]; out[x:y:z] = fn(in[x:y:z], in2[x:y:z]);

The function `fn`

is mapped over sections
of arrays `in`

and `in2`

.
The function `fn`

will be called
with arguments `(in[x], in2[x])`

,
with arguments `(in[x+z], in2[x+z])`

, etc.
The results of the function calls are collected
into a section of array `out`

.

The executions of function calls mapped from a given statement are unsequenced with respect to one another.

The reduction operations accumulate a result over all the values in an array section.

A reduction operation is expressed using the following syntax:

typedef-name`(`

expression`)`

The typedef name shall refer to a reduction type. The argument of a reduction operation shall have rank greater than zero. The type of the argument shall be assignable to the proxied type of the reduction type. The type of the reduction operation is the proxied type of the reduction type. The rank of a reduction operation is zero.

The reduction is computed as if by the use
of a hypothetical reduction object of that type,
initialized by its initializer aspect,
and combined with each result value of the argument expression.
The value of the reduction operation
is the same as would be produced in the root view
of such a reduction object.
~~The finalizer of the reduction type is not used
by a reduction operation.~~

If the argument expression of a reduction operation is an lvalue, it is unspecified whether any new object is created corresponding to a non-root view of the hypothetical reduction object. If the implementation creates such an object, it is initialized and finalized according to the aspects of the reduction type.

We would like to have a way to express a reduction that computes the index of the minimum or maximum value in an array, but it's not currently clear how we would do that.

Invocations of the function
designated by the ~~operation argument~~
combiner of the reduction type
are unsequenced with respect to one another.
~~It is unspecified how the elements of the section,
the initial/result object,
and any introduced temporary objects,
are paired by calls to the operation function,
except that if the rank of the section is one
and the operation function is associative,
then the result is the same as for left-to-right reduction,
where the initial value is taken as leftmost,
and the element with index ~~`begin + stride * (length - 1)`
is taken as rightmost.

EXAMPLE:

void fn(type*in1,typeconst *in2);typein[N], out; typedef _Reduction { _Type:type, _Combiner: fn, _Initializer: initial } fn_reduction; out = fn_reduction(in[x:y:z]);

The reduction will be computed analogously to:

tmp= initial;for each element X ofin[x:y:z] fn(&tmp, &X);

The result of the reduction will be the final value of `tmp`.

EXAMPLE: the two reduction operations given here compute the same result:

void add(double *out, double *in) { *out += *in; } typedef _Reduction { _Type: double, _Combiner: += } double_add_reduction; typedef _Reduction { _Type: double, _Combiner: add } double_add_fn_reduction; out = double_add_fn_reduction(in[x:y:z]); // accumulate using add() out = double_add_reduction(in[x:y:z]); // accumulate using built-in +=

NOTE: The compiler may produce more optimized code when a reduction with a built-in combiner operations is used.

In writing code that uses array sections, it is sometimes useful to explicitly reference the indices of the individual elements in a section. For example, the user may wish to fill an array with a function of the element index, rather than a single value.

Conceptually, an array section operation
can be thought of as expanding into a loop
with an implicit index variable
for each relative rank of the section.
For each relative rank,
the value of the implicit index variable ranges between zero
and one less than the length of the triplet
with that relative rank.
The `__sec_implicit_index`

operation
returns the value of the implicit index variable
for a specified relative rank.
It behaves as a function with the following declaration:

intptr_t __sec_implicit_index(intrelative_rank);

The argument shall be an integer constant expression, whose value shall be greater than or equal to zero, and less than the rank of the containing full expression. For purposes of rank checking, the rank of an implicit index operation is zero, although it is reevaluated for each element, like an expression of rank one.

EXAMPLES:

int A[10], B[10][10]; // A[0] = 0, A[1] = 1, A[2] = 2,... A[:] = __sec_implicit_index(0);

// B[i][j] = i+j; B[:][:] = __sec_implicit_index(0) + __sec_implicit_index(1);

// The length of each dimension is 2. The value of __sec_implicit_index // is either 0 or 1, regardless of the subscripts of the affected elements. A[i:2:s][j:2:t] = __sec_implicit_index(0) ^ __sec_implicit_index(1);

`if`

statementsIf the rank of the controlling expression
of an `if`

clause is nonzero,
the rank of every full expression in every substatement
of that `if`

statement
(including substatements of those substatements, recursively)
shall equal the rank of the controlling expression
of the `if`

clause.
If the shape of a full expression of a substatement
does not match the shape of the controlling expression,
the behavior is undefined.

When the controlling expression of an `if`

statement
has nonzero rank,
the entire `if`

statement,
including its dependent statements,
is executed for each element
of the shape of the controlling expression.

Array sections enables a “vector kernel” style of programming, where vector code is encapsulated within a function, with a parameterized vector length. In this case, concurrency happens within a single function invocation, unlike when “mapping” function calls over an array section, where concurrency happens between function invocations.

The following example
illustrates how to combine array section vectorization
inside a function body
with threading for parallel function calls.
The vector length `m` is 256 in this example.

void saxpy_vec(int m, float a, float x[restrict m], float y[m]) { y[:] += a * x[:]; } int main(void) { float a[2048], b[2048]; _Cilk_for (int i = 0; i < 2048; i += 256) saxpy_vec(256, 2.0, (a + i), (b + i)); }

By writing the function explicitly with array arguments, the programmer can write code with easily customizable vector lengths and runtime model choices.

Please note that functions cannot return array section values. It may be helpful to return a pointer to the array as in standard C/C++ and take sections of the return value in the callee.

The concepts and terminology for array sections in C/C++ are taken directly from Fortran. However, there are a few significant differences.

In Fortran, the limit of an array section is identified by the index of its last element (inclusively). This makes sense in Fortran, where an array declaration also specifies the inclusive uppermost index. Therefore, in an array section expression that explicitly references the whole array, the same value would be used in the limit expression as in the array declaration.

In C, however, an array declaration works differently; what is specified is the number of elements, or alternatively the exclusive uppermost index (given that array indexing starts at zero). So for consistency with declaration syntax, one of those interpretations should be used for the limit expression of an array section. The choice of which was nearly arbitrary, but as the number of elements of an array or array section relates directly to its size and shape, it was considered to be an even more essential characteristic than its index limit.

In Fortran, the right-hand side of an assignment statement is always fully evaluated before the result is assigned. In terms of understandability, the advantage of this rule is obvious. However, when a large array section is assigned, and the value to be assigned references the same array, but with different indices, this rule can impose significant overhead. In the most general case, it would be necessary to allocate temporary space to hold the entire array value to be assigned, and the total amount of space would have to be computed and allocated at run-time. There are many methods by which this overhead can be reduced or eliminated, but the compiler still has to do analysis to determine whether such a method can be used, increasing the compile-time overhead.

Balancing these trade-offs has been tricky enough for implementers and users of Fortran. But the character of C is such that even the possibility of overheads like this would be considered surprising by many users, perhaps most. A C program is generally allowed to give a quick answer, even if it may be surprising. It seemed more in the spirit of C to allow implementations to give a quick, surprising answer for overlapping array sections as well.

In Fortran, the abstraction of an array section can extend into a subroutine. Specifically, a subroutine can take an arbitrary array section as an argument, and access that array section using its own index space, independently of the index space of the complete original array. This effectively requires the use of dope vectors by Fortran implementations.

In C and C++, which don't even have parameters of array type (they are immediately rewritten as pointers), the effort to support usages like this would be enormous, in terms of concepts, specification, compilation, and run-time. There is not yet sufficient experience to demonstrate that all that effort would be worthwhile.

So while an array section can be used as an argument in a function call, such a call is interpreted consistently with other operations; specifically, the call is applied to each element of the section, as opposed to extending the section abstraction into the called function.