Document number: N4223

Ville Voutilainen
Mikael Kilpeläinen
Jonathan Wakely
2014-10-10

Response To: Let return {expr} Be Explicit

Abstract

We think N4074 needs to be shelved; it's doubtful whether it's an issue common and major enough to warrant a core language addition; it's more doubtful whether it's major enough to warrant a core language change. We do not know how much code such a change will break, and there are potential teachability problems with the proposal. There are seemingly palatable existing alternative solutions, and it's likely that there are more palatable alternative core language additions that do not change or confuse existing semantics.

Contents

Introduction

This paper is a revision of N4094 with kind permission from Howard Hinnant. It is in response to:

N4074: Let return {expr} Be Explicit

N4074 proposes (with the best of intentions) to ignore the explicit qualifier on conversions when the conversion would take place on a return statement and is surrounded with curly braces. For example:

struct Type2
{
    explicit Type2(int) {}
};

Type2
makeType2(int i)
{
    return {i}; // error now, ok with adoption of N4074.
}

Note that this code today results in a compile-time error, not because of the braces, but because of the explicit qualifier on the constructor taking an int. Remove that explicit qualifier as in the example below, and the code compiles today:

struct Type1
{
    explicit Type1(int) {}
};

Type1
makeType1(int i)
{
    return {i}; // ok since C++11, implicit conversion from int to Type1
}

The motivation for this change comes from the original of N4074, N4029:

I believe this is inconsistent because the named return type is just as "explicit" a type as a named local automatic variable type. Requiring the user to express the type is by definition redundant — there is no other type it could be.

This is falls into the (arguably most) hated category of C++ compiler diagnostics: "I know exactly what you meant. My error message even tells you exactly what you must type. But I will make you type it."

We can relate. Unnecessary typing is frustrating. However we believe it is possible that ignoring the explicit qualifier on type conversions, even if only restricted to return statements enclosed with curly braces, could turn compile-time errors into run-time errors. Possibly very subtle run-time errors (the worst kind).

How likely is it that such run-time errors might be introduced? <shrug> Such conjecture is always highly speculative. But in this instance, we can offer a realistic example where exactly this would happen. And it happens very close to our home: by ignoring the explicit qualifier of the constructor of a std-defined class type.

The Counter-Example

Bear with us while we set up the likelihood of this counter-example actually happening in the wild...

Today in <string> we have the following functions which turn a std::string into an arithmetic type:

int                stoi  (const string& str, size_t* idx = 0, int base = 10);
long               stol  (const string& str, size_t* idx = 0, int base = 10);
unsigned long      stoul (const string& str, size_t* idx = 0, int base = 10);
long long          stoll (const string& str, size_t* idx = 0, int base = 10);
unsigned long long stoull(const string& str, size_t* idx = 0, int base = 10);

float       stof (const string& str, size_t* idx = 0);
double      stod (const string& str, size_t* idx = 0);
long double stold(const string& str, size_t* idx = 0);

It seems quite reasonable that a programmer might say to himself:

I would like to write a function that turns a std::string into a std::chrono::duration.

Which duration? Well, let's keep it simple and say: any one of the six "predefined" durations: hours, minutes, seconds, milliseconds, microseconds, or nanoseconds can be parsed and returned as nanoseconds.

What would be the syntax for doing this? Well, it would be quite reasonable to follow the same syntax we already have for duration literals:

2h     // 2 hours
2min   // 2 minutes
2s     // 2 seconds
2ms    // 2 milliseconds
2us    // 2 microseconds
2ns    // 2 nanoseconds

And here is quite reasonable code for doing this:

std::chrono::nanoseconds
stons(std::string const& str)
{
    using namespace std::chrono;
    auto errno_save = errno;
    errno = 0;
    char* endptr;
    auto count = std::strtoll(str.data(), &endptr, 10);
    std::swap(errno, errno_save);
    if (errno_save != 0)
        throw std::runtime_error("error parsing duration");
    if (str.data() < endptr && endptr < str.data() + str.size())
    {
        switch (*endptr)
        {
        case 'h':
            return hours{count};
        case 'm':
            if (endptr + 1 < str.data() + str.size())
            {
                switch (endptr[1])
                {
                case 'i':
                    if (endptr + 2 < str.data() + str.size() && endptr[2] == 'n')
                        return {count};
                    break;
                case 's':
                    return milliseconds{count};
                }
            }
            break;
        case 's':
            return seconds{count};
        case 'u':
            if (endptr + 1 < str.data() + str.size() && endptr[1] == 's')
                return microseconds{count};
            break;
        case 'n':
            if (endptr + 1 < str.data() + str.size() && endptr[1] == 's')
                return nanoseconds{count};
            break;
        }
    }
    throw std::runtime_error("error parsing duration");
}

Here is a simplistic "are you breathing" test for this code:

int
main()
{
    using namespace std;
    using namespace std::chrono;
    cout << stons("2h").count() << "ns\n";
    cout << stons("2min").count() << "ns\n";
    cout << stons("2s").count() << "ns\n";
    cout << stons("2ms").count() << "ns\n";
    cout << stons("2us").count() << "ns\n";
    cout << stons("2ns").count() << "ns\n";
}

Ok, so what is our point?

There is a careless type-o / bug in stons. Do you see it yet? stons is not trivial. But neither is it horribly complicated. Nor is it contrived. Nor is the bug in it contrived. It is a type of bug that occurs commonly.

Today the bug in stons results in a compile-time error. This compile-time error was intended by the designers of the <chrono> library.

If we accept N4074, the bug in stons becomes a run-time error. Have you spotted it yet? Here is a clue. The output of the test (with N4074 implemented) is:

7200000000000ns
2ns
2000000000ns
2000000ns
2000ns
2ns

The error is this line that parses "min" after the count:

                        return {count};

which should read:

                        return minutes{count};

Once the error is corrected, the correct output of the test is:

7200000000000ns
120000000000ns
2000000000ns
2000000ns
2000ns
2ns

Does anyone actually use curly braces on return?

The expression return {x}; has only been legal since C++11. There have only been 3 (official) years for it to be adopted by programmers. Is there any evidence anyone is actually using this syntax?

Yes.

These were just the ones we were able to find. Searching for "return[\s]*{" is exceedingly challenging. But just the above results clearly demonstrate a use that is significantly greater than "vanishingly rare."

The tuple Counter-Example

What if the return isn't a scalar? For example is this just as unsafe?

return {5, 23};

If the return type is tuple<seconds, nanoseconds> (which we still do not want to construct implicitly from int), then this would definitely not be safe!

We never want to implicitly construct a duration from an int. Here is how we say that in C++:

template <class Rep2> constexpr explicit duration(const Rep2& r);

That is, we put explicit on the constructor (or on a conversion operator). To change the language to ignore explicit in some places is a dangerous direction.

All that being said, if the constructions are implicit then they are safe. For example:

tuple<seconds, nanoseconds>
test1()
{
    return {3, 4};  // unsafe, 3 does not implicitly convert to seconds
                    //         4 does not implicitly convert to nanoseconds
}

tuple<seconds, nanoseconds>
test2()
{
    return {3h, 4ms};  // safe, 3h does implicitly convert to seconds
                       //       4ms does implicitly convert to nanoseconds
}

tuple<int, float, string>
test3()
{
    return {3, 4.5, "text"};  // safe, 3 does implicitly convert to int
                              //       4.5 does implicitly convert to float
                              //       "text" does implicitly convert to string
}

None of the three examples compile today according to C++14 because the std::tuple constructor invoked is marked explicit. However test2 and test3 are both perfectly safe. It is unambiguous what is intended in these latter two cases. And they would indeed compile if the tuple constructor invoked was not marked explicit.

N3739 is a proposal which makes test2 and test3 legal, while keeping test1 a compile-time error. We fully support this proposal. Indeed Howard implemented this proposal in libc++ years ago, for precisely the same reasons that N4074 is now proposed: To ease the syntax burden on the programmer. But unlike N4074, N3739 never disregards the explicit keyword that the programmer has qualified one of his constructors with.

Another Example

Perhaps there is some defective quality about chrono::duration which is causing the problem. Is chrono::duration the only problematic example?

Consider a class OperatesOnData.

class OperatesOnData
{
public:
    OperatesOnData(VerifiedData);  // knows that the Data has been verified
    // Use this constructor ONLY with verified data!
    explicit OperatesOnData(Data); // trusts that the Data has been verified
};

Data that comes from untrusted sources must first be verified to make sure it is in the proper form, all invariants are met, etc. This unverified Data is verified with the type VerifiedData. This VerifiedData can then be safely sent to OperatesOnData with no further verification.

However verifying the data is expensive. Sometimes Data comes from a trusted source. So the author of OperatesOnData has set up an explicit constructor that takes Data. Clients know that OperatesOnData can't verify Data all the time because of performance concerns. So when they know that their Data need not be verified, they can use an explicit conversion to OperatesOnData to communicate that fact.

Is this the best design in the world? That's beside the point. This is a reasonable design one might find coded in C++.

Now we need to get the Data and start working on it:

VerifiedData getData();

OperatesOnData
startOperation()
{
    return {getData()};
}

So far so good. This is valid and working C++14 code.

Three years go by, and the programmers responsible for this code change. C++17 decides that making return {x}; bind to explicit conversions is a good idea.

Now, because of some changes on the other side of a million line program, it is now too expensive for getData() to do the verification as it has acquired some new clients that must do verification anyway. So getData() changes to:

Data getData();

Bam. Unverified data gets silently sent to OperatesOnData. The original code was designed precisely to catch this very error at compile time.

Data getData();

OperatesOnData
startOperation()
{
    return {getData()};
}

test.cpp:23:12: error: chosen constructor is explicit in copy-initialization
    return {getData()};
           ^~~~~~~~~~~
test.cpp:14:14: note: constructor declared here
    explicit OperatesOnData(Data); // trusts that the Data has been verified
             ^
1 error generated.

But because the committee has redefined what return {getData()} means between C++11 and C++17 (a period of 6 years), a serious run time error (perhaps a security breach) has been created.

This example highlights the fact that the return type of an expression is not always clear at the point of the return statement, nor in direct control of the function author. When what is being returned is the result of some other function, declared and defined far away, there is potential for error. If explicit conversions are implicitly allowed between these two independently written pieces of code, the potential for error increases greatly:

shared_ptr<BigGraphNode>
makeBigGraphNode()
{
    return {allocateBigGraphNode()};
}

What type does allocateBigGraphNode() return? Today, if this compiles, we are assured that it is some type that is implicitly convertible to shared_ptr<BigGraphNode>. Implicit conversions are conversions that the author of shared_ptr deemed safe enough that the conversion need not be explicitly called out at the point of use.

Explicit conversions on the other hand are not as safe as implicit conversions. And for whatever reasons, the author of shared_ptr deemed the conversions of some types to shared_ptr sufficiently dangerous that they should be called out explicitly when used.

We ask again: What type does allocateBigGraphNode() return?

The answer today is that we don't know without further investigation, but it is some type that implicitly converts to shared_ptr<BigGraphNode>. And we think that is a much better answer than "some type that will convert to shared_ptr<BigGraphNode> explicitly or implicitly." This latter set includes BigGraphNode*. If the above code is written today, and the meaning of that return statement changes tomorrow. And the next day the return type of allocateBigGraphNode() changes, there is a very real potential for turning compile-time errors into run-time errors.

The above potential for error is exactly the same as it is for this code:

void acceptBigGraphNode(shared_ptr<BigGraphNode>);
// ...
acceptBigGraphNode(allocateBigGraphNode());

If performing explicit conversions is dangerous in one of the above two examples, then it is dangerous in both. The presence of curly braces in the first example makes zero difference, as illustrated by the stons example. Indeed, the presence of curly braces is today legal in the second example:

acceptBigGraphNode({allocateBigGraphNode()});

but only if the conversion is implicit.

More about smart pointers

In N4029, Sutter responds to the concerns about implicit ownership transfer by saying "it's okay", and continues

I believe return p; is reasonable "what else could I possibly have meant" code, with the most developer-infuriating kind of diagnostic (being able to spell out exactly what the developer has to redundantly write to make the code compile, yet forcing the developer to type it out), and I think that it's unreasonable that I be forced to repeat std::unique_ptr<T>{} here.

It's not at all clear that it's "obvious" what the programmer meant. It's NOT about knowing the return type. It's about knowing the source type and its semantics. As it happens, with a raw pointer, we do not know its ownership semantics, because it doesn't have any. We do not know where it came from - it may be a result of a malloc inside a legacy API. It may be a result of a strdup (same difference, really). It might not be a pointer to a resource dynamically-allocated with a default_delete-compatible allocator.

The implementation diagnostic is not saying "write the following boiler-plate". It's saying "stop and think". It's specifically saying "you're about to cross the border between no ownership semantics and strict ownership semantics, please make sure you're doing what you think you're doing".

It also seems questionable that we would break the design expectations of smart pointers well older than a decade just to make interfacing with legacy raw pointers easier. Modern C++ code uses make_unique and make_shared, not plain new. And such modern C++ code uses smart pointers as return types for factory functions that create the resources, not raw pointers. Even worse, this proposal means that the most convenient (since shortest) way to create a unique_ptr<T> from arguments of T is not

return make_unique<T>(args);

but rather

return {new T{args}};

The Finnish experts pull no punches when they say such a thing is the wrong evolution path to take. They have voiced vocal astonishment to the proposed core language change just to support use cases that should be or at the very least should become rare and legacy, especially when that change breaks design expectations of tons of libraries.

Choosing braces to mean "non-explicit" is illogical

Returning a braced-list has a meaning today. Consider:

char f() {int x = 0; return {x};} // error: narrowing conversion of 'x' from 'int' to 'char' inside { }

People learn that in order to avoid potentially-corrupting narrowing conversions between built-in types, they can use braces. It seems very illogical that for library types with explicit conversions, braces suddenly bypass similar checking for potentially-corrupting cases!

Breaking existing code

The proposal does break existing code:


#include <iostream>

struct A {
        explicit operator int() const {
                 std::cout << "meow" << std::endl; return 0;
        }
        operator short() const {
                 std::cout << "wuff" << std::endl; return 0;
        }
};

int f() 
{
        A a;
        return {a};
}

int main()
{
	f();
}

This code currently prints "wuff". With the proposal, it would print "meow". Is that a significant problem? We don't know.

Are there still cases where the return type must be repeated anyway?

Consider the following code:


struct B {};
     
struct A {
	operator B() const {return B{};}
};
     
B f() 
{
	A a;
	return {a}; // clang and gcc reject
	//return B{a}; // clang and gcc still reject
	//return B(a); // clang and gcc accept
	//return a; // clang and gcc accept
}
     
int main()
{
	f();
}

It would seem that a return of a braced-init-list would not work for such code. Does the proposal cover such cases, too?

Alternative solutions

There are a couple of ways to alleviate the burden of having to repeat a return type that have already been suggested:

Summary

For us, these examples stress the point that when a type author writes "explicit", we should never ignore it. Not on a return statement. Not anywhere. The type author knows best. He had the option to not make his conversion explicit, and he felt strongly enough about the decision to opt-in to an explicit conversion.

We recall that when <chrono> was going through the standardization process, the entire committee felt very strongly that int should never implicitly convert to a duration.

If we want more conversions to be implicit, then we should not mark those conversions with the explicit qualifier. This is a far superior solution compared to marking a conversion explicit, and then treating the conversion as implicit in a few places. And this is the direction of N3739. Instead of changing the rules for every type, N3739 removes the explicit qualifier from specific types where it is undesirable to have it, while keeping the explicit qualifier in places that respect the type author's wishes.

All of the chrono, tuple and smart pointer examples are practical cases of a general conceptual case: a conversion from a type with no semantics to a type with strict semantics. Bypassing the explicitness of such conversions is exceedingly dangerous, especially when it happens after-the-fact that the designers of those library types chose to mark such conversions explicit. The proposed change is moving the earth under the library designers' feet.