Proposal for C2y
WG14 N3195

Title:               Named loops
Author, affiliation: Alex Celeste, Perforce
Date:                2023-11-30
Proposal category:   New feature
Target audience:     Compiler implementers, users

Abstract

The loop exit keywords break and continue in C only work with the most immediate enclosing loop or switch structure. This proposal introduces a named loop syntax consistent with other languages which allows breaking out of control structures outside of the immediately enclosing one, by passing the target loop of the jump by name.


Named loops

Reply-to:     Alex Celeste (aceleste@perforce.com)
Document No:  N3195
Revises:      (n/a)
Date:         2023-11-30

Summary of Changes

N3195

Introduction

A facility for breaking out of an outer loop from within a nested one is a frequently requested feature for C. The feature is found in other languages, and it is a handy way to make clear where a loop-exiting jump is intended to land.

Much of the rationale for the feature itself was already discussed by Steenberg in N2859: the ability to break out of more than one nested control structure while sticking to descriptive, structured jump statements rather than falling back to goto. The need for this is widely accepted, since an example with two nested loops is easy to construct. However, it is not easy to express in a way that pleases the C user (either reader or writer), and most of the options significantly reduce readability compared to simply writing:

for (int i = 0; i < IK; ++ i) {
  for (int j = 0; j < JK; ++ j) {
    
    if (cond)
      just_give_up; // <- this is difficult to make elegant
    
  }
}

Steenberg proposed break break, which the Committee rejected on several grounds (it has no precedent; it is context-dependent; most reviewers found it subjectively ugly; it is harder to read; etc.). Earlier proposals also suggested passing a constant numeric operand to break or continue, which suffers from essentially the same problems: it has no precedent, and it is completely context-dependent. This means it cannot reliably be combined with macros, because intervening lines of code can arbitrarily change the jump target.

The problem with a depth operand

break break; is essentially the same proposal as break N;, but encoded in the repetition of the break keyword rather than as an expression operand. The core problem of the two proposals is therefore identical:

for (int i = 0; i < IK; ++ i) {
  for (int j = 0; j < JK; ++ j) {
    
    if (cond)
      break 2; // clear enough for now
      
    // (several pages of in and out)
    if (cond)
      break 3; // ...how much do you trust the indent level?
    // (several pages of in and out)
    
  }
}

The first problem is that break N does not name its target, so clarity is lost as soon as the loop to be exited is more than a couple of lines away. The compiler has no way to bind the break to the intended loop or switch. It therefore cannot warn if the levels start to mismatch, and a mismatch may go unnoticed for some time on a less common path (which is likely for error handling). A human reader will have a hard time working out the target if a lot of scrolling up and down is needed, and the code relies on clear formatting to be readable at all.

More insidiously, suppose we have a macro that abstracts block control. Claire's Device is used here for illustration; the same applies to, e.g., nogc or with blocks, or any number of other inventive schemes:

#if CONFIG_1
#define NEVER if (0)
#else
#define NEVER while (0)
#endif

for (int i = 0; i < IK; ++ i) {
  for (int j = 0; j < JK; ++ j) {
    
    // NEVER is equally correct either way...
    NEVER funclet: {
      some_work ();
      break 2;  // ...but this isn't!
    }
      
    if (cond)
      goto funclet;
    else
      goto junklet; // ...
    
  }
}

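That break 2 is correct under exactly one of the two definitions of NEVER. With if (0), the funclet is not a loop, so the count of 2 reaches the outer for. With while (0), the funclet is itself a loop and absorbs one of the two levels.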
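The dependency can be checked in today's C. Because break 2 cannot be compiled, the sketch below uses a plain break instead. The hypothetical macros NEVER_IF and NEVER_WHILE stand for the two configurations of NEVER, and the same break exits a different statement under each:

```c
#include <assert.h>

/* Hypothetical stand-ins for the two configurations of NEVER. */
#define NEVER_IF    if (0)
#define NEVER_WHILE while (0)

/* Returns how many inner-loop iterations start when the funclet,
   entered by goto on every iteration, executes a plain break. */
static int inner_iterations_if (int jk) {
  int count = 0;
  for (int j = 0; j < jk; ++ j) {
    ++ count;
    goto funclet;
    NEVER_IF funclet: {
      break;   /* if (0) is not a loop: this break exits the for loop */
    }
  }
  return count;
}

static int inner_iterations_while (int jk) {
  int count = 0;
  for (int j = 0; j < jk; ++ j) {
    ++ count;
    goto funclet;
    NEVER_WHILE funclet: {
      break;   /* while (0) is a loop: only it is exited, the for loop goes on */
    }
  }
  return count;
}
```

Under NEVER_IF, the first iteration ends the inner loop. Under NEVER_WHILE, the break only leaves the while (0), so every iteration runs. A depth count written against one configuration is therefore off by one under the other.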

More commonly, control-structure macros build on something like a BEFORE_AFTER block, which can be implemented with for loops:

// one?
#define M_BEFORE_AFTER(BEFORE, AFTER)  \
  for(int _b = (BEFORE, 1); _b; (AFTER, _b = 0))
  
// or two? (if BEFORE needs to admit a decl)
#define M_BEFORE_AFTER(BEFORE, AFTER)  \
  for (_Bool M_ONCEFOR_enter = 1; M_ONCEFOR_enter ; AFTER )  \
    for (BEFORE ; M_ONCEFOR_enter ; M_ONCEFOR_enter = 0)
      /* { body } */

The number of loops hidden here should not matter to the user, because they are not serving as loops. They are the assembly-language opcodes of control-structure metaprogramming, and nothing about their effect is something the user interacts with as a loop. Ideally, the user should not have to think about jumping out of this block at all; it is probably being used as the basis for, say, an autorelease pool or a monadic optional handler. The user certainly does not want to know how many "opcodes" it uses to build its braced structure, or which ones.
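As a small illustration, consider the first M_BEFORE_AFTER variant together with a hypothetical counter pool_depth that stands in for whatever resource the block manages. A plain break written inside the block binds to the macro's hidden for loop rather than to the user's loop:

```c
#include <assert.h>

/* The first M_BEFORE_AFTER variant from the text. */
#define M_BEFORE_AFTER(BEFORE, AFTER) \
  for (int _b = (BEFORE, 1); _b; (AFTER, _b = 0))

/* Hypothetical stand-in for the managed resource,
   e.g. the nesting depth of an autorelease pool. */
static int pool_depth = 0;

/* The writer intends break to stop the loop over i once i == stop_at.
   Returns the number of outer iterations that actually ran. */
static int outer_iterations (int ik, int stop_at) {
  int iterations = 0;
  for (int i = 0; i < ik; ++ i) {
    ++ iterations;
    M_BEFORE_AFTER (++ pool_depth, -- pool_depth) {
      if (i == stop_at)
        break;  /* exits only the hidden for loop (and skips AFTER) */
    }
  }
  return iterations;
}
```

All five outer iterations run in outer_iterations (5, 2), even though the writer meant to stop at the third. With this particular variant, the break also skips the AFTER expression, so pool_depth is left at 1.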

Even in "normal" code that does not use such macros, an explicit or implicit depth argument still harms readability because of switch. A switch binds break but not continue, which creates an inconsistency even when every level of the control structure is a builtin keyword. This is potentially confusing, and it is an existing flaw in the language, in which every break or continue has an implicit depth of 1.

Therefore break 2 or break break breaks when combined with this technique.
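The switch asymmetry described above can be seen in a short sketch in current C (the function name is illustrative). Inside a switch nested in a loop, break targets the switch while continue targets the loop, so the two implicit depths of 1 refer to different statements:

```c
#include <assert.h>

/* Counts the loop iterations that reach the end of the loop body. */
static int reach_end (int ik) {
  int reached = 0;
  for (int i = 0; i < ik; ++ i) {
    switch (i % 3) {
      case 1:
        continue;   /* switch does not bind continue: goes to the next i */
      default:
        break;      /* binds to the switch: falls out to ++ reached */
    }
    ++ reached;     /* skipped only when i % 3 == 1 */
  }
  return reached;
}
```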

Named operand

In contrast, if loops can have names:

outer_ij:
for (int i = 0; i < IK; ++ i) {
  for (int j = 0; j < JK; ++ j) {
    
    if (cond)
      break outer_ij; // this is fine
    
    NEVER funclet: {
      some_work ();
      break outer_ij; // this is also fine
    }
      
    autorelease_pool {
      break outer_ij; // this is fine too
    }
    
  }
}

All jumps clearly mark which loop they want to terminate (or potentially to continue, which has almost exactly the same concerns).

The existing asymmetry between loops and switch is also improved:

selector:
switch (n) {
  
  for (int i = 0; i < IK; ++ i) {
    break selector; // break the switch from a loop!
  }
  
}

loop:
for (int j = 0; j < JK; ++ j) {
  switch (n) {
   
    break loop; // break the loop from a switch!
    continue loop; // this was valid anyway, 
                   // but now it's symmetrical
  } 
}

Alternatives

There are three obvious alternatives using existing control structures: repetition, return, and goto.

Repetition

This example was given by Steenberg:

for (int i = 0; i < n; ++ i) {
  int j; // declared outside the inner loop so the test below can see it
  for (j = 0; j < n; ++ j) {
    if (something (i, j))
      break;
  }
  
  if (j < n)
    break;
}
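For reference, a self-contained version of the repetition pattern is sketched below. The predicate something is a hypothetical stand-in, and j is declared outside the inner loop so that the outer loop can re-test it:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate standing in for something (i, j). */
static bool something (int i, int j) {
  return i * j == 6;
}

/* Repetition pattern: returns i * n + j for the first pair (in row-major
   order) for which something (i, j) holds, or -1 if there is none. */
static int find_first (int n) {
  int i, j = n;
  for (i = 0; i < n; ++ i) {
    for (j = 0; j < n; ++ j) {
      if (something (i, j))
        break;          /* leaves the inner loop only... */
    }
    if (j < n)
      break;            /* ...so the outer loop must re-test and leave too */
  }
  return i < n ? i * n + j : -1;
}
```

Here find_first (5) yields 13, encoding the pair (2, 3). The second test of j restates, after the fact, a decision that was already made at the inner break.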

The compiler can handle this well enough, but it makes the human reader do a lot of work to understand the intent: jumping out of one loop and then testing again is not what the code is meant to express. It simply adds cognitive and maintenance load because the user wanted to avoid the next option:

goto

for (int i = 0; i < n; ++ i) {
  for (int j = 0; j < n; ++ j) {
    if (something (i, j))
      goto end;
  }
}
end:

This code is clear enough, but goto is socially problematic, and the complaint is partially valid. Although this particular use of goto is structured, nothing forces the user to place the label at the end of the loop in this way. goto is an unstructured jump whose good use relies entirely on user discipline. Naming the loop target, by contrast, binds the jump tightly to the control structures themselves and directly encourages clearer usage.

This pattern probably also has a high "surprise factor" when used to emulate continue.
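For instance, emulating continue on the outer loop requires a label at the end of the outer loop body, which reads quite differently from a continue. The following is a sketch for a hypothetical task, counting the rows of a matrix that contain no zero:

```c
#include <assert.h>

enum { ROWS = 3, COLS = 4 };

/* Counts the rows of m that contain no zero. On finding a zero, the
   goto skips the rest of the row, emulating a continue of the outer loop. */
static int rows_without_zero (int const m[ROWS][COLS]) {
  int count = 0;
  for (int i = 0; i < ROWS; ++ i) {
    for (int j = 0; j < COLS; ++ j) {
      if (m[i][j] == 0)
        goto next_row;   /* jumps to the end of the outer loop body */
    }
    ++ count;            /* reached only if the row had no zero */
  next_row: ;            /* before C23, a label must precede a statement */
  }
  return count;
}

static int const sample[ROWS][COLS] = {
  { 1, 2, 3, 4 },
  { 5, 0, 7, 8 },
  { 9, 9, 9, 9 },
};
```

The label sits at the bottom of the body, far from the loop header it logically belongs to, and nothing ties it to that loop.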

return

There isn't much to say here except that return can obviously be used as a jump across any number of loops and other structures (though it cannot really imitate continue easily), which is a common enough pattern in C++:

auto const loop = [&] {
  for (int i = 0; i < n; ++ i) {
    for (int j = 0; j < n; ++ j) {
      
      if (cond)
        return;
        
    }
  }
};

loop ();

This is just about usable in a language with lambdas, but does not provide the full functionality. Rewriting C this way with fully-separated function bodies is probably the least readable option much of the time.
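C has no lambdas, so the closest C equivalent moves the loops into a separate function. The sketch below shows this; has_pair_sum and its data are hypothetical, and any state the loops need must be passed in explicitly:

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if two distinct elements of a sum to target. The nested
   loops live in their own function only so that return can leave both. */
static bool has_pair_sum (int const *a, int n, int target) {
  for (int i = 0; i < n; ++ i) {
    for (int j = i + 1; j < n; ++ j) {
      if (a[i] + a[j] == target)
        return true;   /* leaves both loops (and the function) at once */
    }
  }
  return false;
}

static int const values[] = { 3, 8, 1, 5 };
```

Calling has_pair_sum (values, 4, 9) returns true (8 + 1). This approach cannot continue the outer loop, and it separates the loop body from the code that uses its result.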

Prior Art

Named loops also have the distinct advantage of substantial prior art across multiple other programming languages. C is not bound by any other language, but having a control feature behave exactly as precedent set by the wider community is extremely good for readability and lowers the surprise factor. The idiom has been proven to work well in practice, and there is no good reason for C to diverge from a model that the rest of the programming-language community seems to have found clearest.

For instance, in Rust:

#![allow(unreachable_code, unused_labels)]

fn main() {
    'outer: loop {
        println!("Entered the outer loop");

        'inner: loop {
            println!("Entered the inner loop");

            // This would break only the inner loop
            //break;

            // This breaks the outer loop
            break 'outer;
        }

        println!("This point will never be reached");
    }

    println!("Exited the outer loop");
}

In JavaScript (see also ECMA-262 14.8, 14.9, 14.13):

let str = '';

loop1: for (let i = 0; i < 5; i++) {
  if (i === 1) {
    continue loop1;
  }
  str = str + i;
}

In Java (see also JLS 14.7, 14.15, 14.16):

search:
for (i = 0; i < arrayOfInts.length; i++) {
    for (j = 0; j < arrayOfInts[i].length; j++) {
        if (arrayOfInts[i][j] == searchfor) {
            foundIt = true;
            break search;
        }
    }
}

The proposed syntax matches all three of these languages exactly (modulo Rust's slightly different syntax for the label name itself). This is therefore almost certainly the least confusing and most user-friendly option to imitate.

Proposed wording

The proposed changes are based on the latest public draft of C23, which is N3096. Bolded text is new text when inlined into an existing sentence.

Labels

Add two new paragraphs to 6.8.1 "Labeled statements" in Semantics, after paragraph 4:

If statement is an iteration or selection statement, that statement is named by the label label. If statement is a labeled-statement, the statement named by this label is the same statement named within the nested labeled-statement, if any. footnote

footnote) A statement may therefore be named by more than one label.

EXAMPLE A label names a statement only if that statement is an iteration or selection statement appearing as the label's immediate syntactic operand:

loop:
for (int i = 0; i < IK; ++ i) { // this for is named by loop:
  ...
}

braces:
{        // no statement is named by braces:
  for (int i = 0; i < IK; ++ i) {
    ...
  }
}

(We do not list the kinds of iteration or selection statements here. If for some reason the list would change, this should change implicitly and automatically.)

Jumps

Modify 6.8.6 "Jump statements", Syntax, paragraph 1:

jump-statement:
  goto identifier ;
  continue identifier_opt ;
  break identifier_opt ;
  return expression_opt ;

continue

Add a new paragraph to 6.8.6.2 "The continue statement" in Constraints, after paragraph 1:

A continue statement with an identifier operand shall appear within an iteration statement named by the label with the corresponding identifier.

Modify the first sentence of paragraph 2, removing the reference to "innermost":

A continue statement causes a jump to the loop-continuation portion of an enclosing iteration statement; that is, to the end of the loop body.

Add a new paragraph after paragraph 2:

If the continue statement has an identifier operand, the jump is to the loop-continuation of the iteration statement named by the label with the corresponding identifier. Otherwise, the jump is to the loop-continuation of the innermost enclosing iteration statement.

Add an example:

EXAMPLE In the following code, continue only jumps to the loop-continuation of the inner nested loop, whereas continue outer jumps to the loop-continuation of the outermost loop:

outer:
for (int i = 0; i < IK; ++ i) {
  for (int j = 0; j < JK; ++ j) {
    
    continue;       // jumps to CONT1
    
    continue outer; // jumps to CONT2
    
    // CONT1
  }
  // CONT2
}

break

Add a new paragraph to 6.8.6.3 "The break statement" in Constraints, after paragraph 1:

A break statement with an identifier operand shall appear within a statement named by the label with the corresponding identifier.

Modify paragraph 2, removing the reference to "innermost" and actually explaining how execution leaves the statement (!):

A break statement terminates execution of a switch or iteration statement**,** as if by jumping to a label immediately following it in the surrounding scope using goto.

Add a following paragraph:

Therefore the following statements are equivalent:

while (/* ... */) {        while (/* ... */) {
  break;                     goto there;
}                          }
// jumps here              there:

Add a following paragraph:

If the break statement has an identifier operand, the jump exits the switch or iteration statement named by the label with the corresponding identifier. Otherwise, the jump exits the innermost enclosing switch or iteration statement.

Add an example:

EXAMPLE In the following code, break only exits the switch, whereas break outer exits the enclosing loop as well:

outer:
for (int i = 0; i < IK; ++ i) {
  switch (i) {
    case 1:
      break;       // jumps to CONT1
    case 2:
      break outer; // jumps to CONT2
  }
  // CONT1
}
// CONT2
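In current C, the example above can only be written with a goto for the break outer case. The following sketch (the function name and counter are illustrative) counts the iterations that reach CONT1, and so shows where each jump lands:

```c
#include <assert.h>

/* The break outer example in today's C: returns the number of
   iterations that reached CONT1 before the loop was left. */
static int completed_iterations (int ik) {
  int completed = 0;
  for (int i = 0; i < ik; ++ i) {
    switch (i) {
      case 1:
        break;       /* leaves only the switch: jumps to CONT1 */
      case 2:
        goto cont2;  /* what break outer would do: jumps to CONT2 */
    }
    /* CONT1 */
    ++ completed;
  }
cont2:               /* CONT2 */
  return completed;
}
```

Iterations 0 and 1 both reach CONT1, while iteration 2 leaves the loop entirely, so completed_iterations (5) returns 2.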

Questions for WG14

Does WG14 want to add named loops to C using the proposed syntax and wording?

References

C23 public draft (N3096)
N2859
Named loops in Java
Named loops in JavaScript
Named loops in Rust
Claire's Device
M_BEFORE_AFTER in CRFI-4
Java language specification
ECMAScript 2023