Named Lambdas and Local Functions

Document Number: N2511(08-0021)
2008-02-04
Alisdair Meredith <alisdair.meredith@codegear.com>

Background and Rationale

Local functions are a feature of many languages, notably Pascal and Ada, yet they are missing from C++. They provide a convenient way of encapsulating small chunks of functionality close to their point of use. Because their scope is limited to the surrounding function, there is much less pressure on developers to come up with a globally distinct and meaningful name. Maintenance developers (and source code analysis tools) can better understand where the function may be called, and the risk of 'accidental re-use' is greatly reduced.

A traditional barrier to acceptance has been the expectation that a local function has access to the enclosing function's local variables. Providing a similar facility in C++ has been deemed complicated, costly, and not necessarily desirable.

Examples and use-cases

The example running through this paper is taken from a simple text report generator. It counts the lines it writes and uses that count to insert separator lines into the output.

void data::report( ostream & output ) {
  void WriteLine( ostream & os, int & lines, const string & name, int value ) {
    os << name << " : " << value << endl;
    if( ++lines % 5 == 0 ) {
      os << "+--------------+" << endl;
    }
  }
  
  int i = 0;
  WriteLine( output, i, "Ground Speed", ground );
  WriteLine( output, i, "Air Speed", air );
  WriteLine( output, i, "Acceleration", longacc );
  ...
  WriteLine( output, i, "Max load", limit );
  WriteLine( output, i, "Lines Written", i );
}

Here WriteLine is a simple function with no access to the state of the surrounding function, so it could be expressed as a local functor in C++ today. However, if WriteLine had access to the enclosing function's scope, our example would be simpler:

void data::report( ostream & os ) {
  void WriteLine( const string & name, int value ) {
    os << name << " : " << value << endl;
    if( ++lines % 5 == 0 ) {
      os << "+--------------+" << endl;
    }
  }

  int lines = 0;
  WriteLine( "Ground Speed", ground );
  WriteLine( "Air Speed", air );
  WriteLine( "Acceleration", longacc );
  ...
  WriteLine( "Max load", limit );
  WriteLine( "Lines Written", lines );
}
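Neither version above is valid C++, since a function may not be defined inside another function. The first version, which uses no state from report, can already be approximated by making WriteLine a static member function of a local class; this compiles even as C++03 because local classes may have static member functions (though not static data members). A minimal sketch, in which the free function report and its parameters are hypothetical stand-ins for data::report and its members, and the separator is assumed to be meant for every fifth line:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for data::report: field values are passed in
// because the paper's data class is not shown.
int report(std::ostream & output, int ground, int air) {
    // Local class used only as a scope for a "local function".
    // It cannot see report's locals, so the counter is passed explicitly.
    struct Local {
        static void WriteLine(std::ostream & os, int & lines,
                              const std::string & name, int value) {
            os << name << " : " << value << '\n';
            if (++lines % 5 == 0) {  // assumed intent: separator every 5 lines
                os << "+--------------+" << '\n';
            }
        }
    };

    int i = 0;
    Local::WriteLine(output, i, "Ground Speed", ground);
    Local::WriteLine(output, i, "Air Speed", air);
    return i;
}
```

The qualified call Local::WriteLine is the price of this idiom, and it offers no help with the second version, which needs the enclosing function's variables.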

Interaction with C++0x

A number of features proposed for C++0x come strikingly close to offering the desired functionality. In some cases this relies on using a known variant of the proposed syntax.

Linkage for Local Classes

Anthony Williams' proposal to allow local classes to be used as template arguments (N2402) resolves a major functional problem with using local classes as functors to substitute for local functions. Note that such functors have no access to the enclosing scope, although references to local variables can always be stored via the functor constructor.

int data::report( ostream & output ) {
  struct WriteLineImpl { // Not usable with function binders etc. in C++03
    ostream & os;
    int & lines;
    
    WriteLineImpl( ostream & out, int & i )
      : os(out)
      , lines( i )
    {
    }
    
    void operator()( const string & name, int value ) {
      os << name << " : " << value << endl;
      if( ++lines % 5 == 0 ) {
        os << "+--------------+" << endl;
      }
    }
  };
  
  int linecount = 0;
  WriteLineImpl WriteLine( output, linecount );
  
  WriteLine( "Ground Speed", ground );
  WriteLine( "Air Speed", air );
  WriteLine( "Acceleration", longacc );
  ...
  WriteLine( "Max load", limit );
  WriteLine( "Lines Written", linecount );
  return linecount;
}

While this example would compile and run under C++03, the syntax is far from appealing, which is a major reason this idiom is not in more widespread use.
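C++11 as finally standardized does allow local classes as template arguments, so a local functor like WriteLineImpl can be handed to a standard algorithm. A sketch assuming C++11; the report signature and its vector of (name, value) pairs are hypothetical, and the separator is assumed to be meant for every fifth line:

```cpp
#include <algorithm>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for data::report: the fields to print are passed
// as (name, value) pairs because the paper's data class is not shown.
int report(std::ostream & output,
           const std::vector<std::pair<std::string, int> > & fields) {
    struct WriteLineImpl {
        std::ostream & os;
        int & lines;  // refers to report's local counter

        WriteLineImpl(std::ostream & out, int & i) : os(out), lines(i) {}

        void operator()(const std::pair<std::string, int> & field) {
            os << field.first << " : " << field.second << '\n';
            if (++lines % 5 == 0) {  // assumed intent: separator every 5 lines
                os << "+--------------+" << '\n';
            }
        }
    };

    int linecount = 0;
    // A local class as a template argument of std::for_each is ill-formed
    // in C++03 but accepted in C++11. for_each copies the functor, which is
    // harmless here because every copy refers to the same counter.
    std::for_each(fields.begin(), fields.end(), WriteLineImpl(output, linecount));
    return linecount;
}
```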

Lambda Functions

Lambda functions (N2487), sometimes called anonymous functions, have many of the properties we desire. Note that there are three variations on the proposed lambda syntax, each of which controls how the state of the enclosing function is passed into the lambda expression.

We note that a named variable of a lambda type, should such a thing be permitted, is very close to our ideal for a local function.
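For reference, C++11 as eventually standardized does permit this, spelling the by-reference introducer [&] rather than <&>. A minimal sketch of the report example as a named lambda; the free function report and its parameters are hypothetical stand-ins for data::report, and the separator is assumed to be meant for every fifth line:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for data::report; field values are parameters
// because the paper's data class is not shown.
int report(std::ostream & os, int ground, int air) {
    int lines = 0;  // declared before the lambda so that it can be captured

    // A named variable of lambda type: [&] captures os and lines by reference,
    // so WriteLine reads and updates report's own state.
    auto WriteLine = [&](const std::string & name, int value) {
        os << name << " : " << value << '\n';
        if (++lines % 5 == 0) {  // assumed intent: separator every 5 lines
            os << "+--------------+" << '\n';
        }
    };

    WriteLine("Ground Speed", ground);
    WriteLine("Air Speed", air);
    return lines;
}
```

Apart from the `auto WriteLine = [&]` prefix, the body matches the 'ideal' version in the motivation section.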

New function declaration syntax

N2445 describes a new function declaration syntax, overloading the auto keyword to indicate that the result type appears after the parameter list.
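As an illustration of the form N2445 describes (which is also the trailing-return-type form C++11 eventually standardized), here is a declaration and matching definition; the function describe is a hypothetical example, not part of any proposal:

```cpp
#include <string>

// Declaration: 'auto' stands in front, the result type follows '->'.
auto describe(const std::string & name, int value) -> std::string;

// The definition uses the same form.
auto describe(const std::string & name, int value) -> std::string {
    return name + " : " + std::to_string(value);
}
```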

Some concern has been expressed about the many overloaded meanings the auto keyword has been acquiring in C++0x. One suggested alternative for function declarations is to use empty angle brackets in place of auto for this syntax, matching the form of lambda that does not specify a default convention for passing local state.

The main problem with using empty angle brackets is that the syntax is ambiguous with a function template specialization.

The proposal

Note: We do not propose local functions or named lambdas for C++0x as there is not enough time to explore all the implications. In particular, the idea of scope needs to be carefully considered.

Note2: The proposal is written in terms of the suggested syntax from the BSI Lambda Position paper, mainly to avoid the problems with template specialization mentioned earlier. For reference, here is the syntax comparison:

    N2487    N2510
     <>       <.>
     <&>      <&>
     <=>      <+>

Proposal for C++0x

However, we do propose that the designs of the lambda feature and the new function declaration syntax be harmonized, so that they are forward compatible with such an extension in the future. Specifically, a new style function declaration should be of the form:

  <.> identifier( param-list ) -> result_type;

Note that the <&> and <+> forms of lambda are not to be supported in this manner. Nor is there any support for capturing variables from the surrounding environment.

If the syntax for lambda is changed then the new style function declaration should follow that change.

In particular, the default form of lambda expression cannot use empty angle brackets for its syntax, and one of the alternate forms suggested by Clark Nelson in N2487 should be chosen.

Example Reworked

Here is how our example would look using a named lambda syntax:

<.> data::report( ostream & os ) -> int {
  int lines = 0;  // Local state captured by the lambda must already be in scope
  
  <&> WriteLine( const string & name, int value ) -> void {
    os << name << " : " << value << endl;
    if( ++lines % 5 == 0 ) {
      os << "+--------------+" << endl;
    }
  }

  WriteLine( "Ground Speed", ground );
  ...
  WriteLine( "Lines Written", lines );
  
  return lines;
}

Note the choice of the by-reference lambda. As this local expression will not outlive the enclosing function, there is no danger of a dangling reference. It is therefore safe to use this form, and variables from the enclosing function can be used without explicitly calling them out in an override list.

It remains an open design decision whether the other forms of lambda expression should be similarly supported.

Proposal beyond C++0x

The above example is sufficient to demonstrate support for the feature. However, to feel truly integrated, it should be possible to write local functions using the 'classic' declaration syntax. The proposal is that this would be interpreted as if declaring a named, by-reference lambda exactly as above. The example would then read:

int data::report( ostream & os ) {
  int lines = 0;  // Local state captured by the lambda must already be in scope

  void WriteLine( const string & name, int value ) {
    os << name << " : " << value << endl;
    if( ++lines % 5 == 0 ) {
      os << "+--------------+" << endl;
    }
  }

  WriteLine( "Ground Speed", ground );
  ...
  WriteLine( "Lines Written", lines );

  return lines;
}

The main difference from our earlier 'ideal syntax' in the motivation section is that lines must now be declared before the definition of WriteLine so that it is in scope.

An Approximation in C++0x

While the finer details of the lambda syntax are being nailed down in Core, it is possible that the following syntax will be supported, and deliver a strong approximation of the desired feature:

auto data::report( ostream & os ) -> int {
  int lines = 0;  // Local state captured by the lambda must already be in scope

  auto WriteLine = <&> ( const string & name, int value ) -> void {
    ...
  };

  WriteLine( "Lines Written", lines );
  return lines;
}

The main difference from our ideal of identical syntax is the = <&> between the function name and the parameter list. Also note that in this case the similarity comes from re-using auto for the new function declaration syntax.

However, it is not yet clear that lambda expressions can be assigned to auto variables. If this turns out not to be the case, the other obvious workaround is to use a local std::function variable:

int data::report( ostream & os ) {
  int lines = 0;  // Local state captured by the lambda must already be in scope

  std::function<void( const string &, int )> WriteLine
    = <&> ( const string & name, int value ) -> void {
        ...
      };

  WriteLine( "Lines Written", lines );
  return lines;
}

While this syntax is expected to work in C++0x, calling through the std::function wrapper makes it harder for the compiler to optimize aggressively, for example by inlining the lambda body. Neither form comes close to resembling the 'classic' syntax.
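The std::function workaround can be sketched in the C++11 syntax as finally standardized ([&] in place of <&>). The free function report and its parameter are hypothetical stand-ins for data::report, and the separator is assumed to be meant for every fifth line:

```cpp
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for data::report.
int report(std::ostream & os, int ground) {
    int lines = 0;

    // The lambda is stored in a type-erased std::function, so each call
    // goes through an indirection the optimizer may be unable to see through.
    std::function<void(const std::string &, int)> WriteLine =
        [&](const std::string & name, int value) {
            os << name << " : " << value << '\n';
            if (++lines % 5 == 0) {  // assumed intent: separator every 5 lines
                os << "+--------------+" << '\n';
            }
        };

    WriteLine("Ground Speed", ground);
    WriteLine("Lines Written", lines);  // value is copied before the increment
    return lines;
}
```

Note that "Lines Written" reports 1 here: the argument is copied when the call is made, before the body increments lines.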

Design Concerns

The following are some design concerns that any fully specified proposal will have to deal with.

Which Lambda-producer should be used?

The assumption above is that the default lambda producer <.> should be used to declare functions at global / namespace / class scope, and the capture-by-reference form <&> should be used for local functions. The syntaxes are pleasingly similar, yet retain the subtle distinction that a local function is slightly more than a global function with a smaller potential scope.

However, an equally compelling argument could be made to consistently use the same form, so that all function declarations look the same, whether global, local or member functions. In this case we strongly advocate the by-reference form <&>.

Confusing functions and lambdas

If the new function syntax and the lambda syntax are identical, how does the compiler know when it has a lambda or a function?

The simple answer is scope. Function definitions are only allowed at an 'outermost' scope, such as global, namespace or class scope, while lambdas can only be declared within a function. As a 'local function' is really a 'named lambda', there is no confusion here either.

Visibility of the enclosing function

Typical local function definitions in other languages give the local function visibility of entities in the enclosing function scope. We note that it might be possible to treat lambda capture variables as having 'function scope', in a manner similar to labels. With a bit of code path analysis it should be possible to make any call to a local function ill-formed if its capture variables have not been declared at the point of call. This would allow for convenient code organisation, where local function definitions all float to the top of the enclosing function definition.

While such a specification would be useful, it is not clear whether it would be more or less confusing than requiring all captured variables to be in scope at the point of declaration of the local function.

This design point is left open for now, with a preference for the simpler implementation.

Forward declaring local functions

Should it be possible to forward-declare a local function? It is not entirely clear that this would make sense for a named lambda, and there is the tricky question of whether variables of the enclosing function must be visible at the point of declaration, or the point of definition.

Even though a forward declaration using the 'classic' function syntax seems clear and intuitive, it should be ill-formed if the corresponding named lambda declaration is ill-formed.

Supporting all three lambda syntaxes

While capture-by-reference is the default behaviour in other languages with this feature (and hence in our experience), are there use cases for capture-by-value? Certainly we can imagine use cases when trying to achieve pure functions for fine-grained parallelism, and we expect further motivation to arise with use.
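The practical difference between the two capture modes can be shown in C++11 syntax ([=] and [&] standing in for the by-value and by-reference producers). In this sketch, with the hypothetical function capture_demo, the by-value lambda copies the enclosing state when it is created, so later changes in the enclosing function cannot affect it; that isolation is what makes it attractive for pure functions:

```cpp
#include <utility>

// Returns (by-value result, by-reference result) so the difference is visible.
std::pair<int, int> capture_demo() {
    int scale = 2;
    auto by_value = [=](int x) { return x * scale; };  // copies scale (2) now
    auto by_ref   = [&](int x) { return x * scale; };  // refers to scale itself
    scale = 10;  // only the by-reference lambda observes this change
    return std::make_pair(by_value(3), by_ref(3));
}
```

Here capture_demo() yields the pair (6, 30).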

Acknowledgements

Daveed Vandevoorde originally noted the concerns with overloading auto, and was the first to suggest the lambda marker as the alternative. The similarity between named lambdas and local functions was observed during a BSI panel meeting, although the originator is lost to the heat of the moment.