P2733R3
Fix handling of empty specifiers in std::format

Published Proposal,

Authors:
Audience:
LEWG
Project:
ISO/IEC JTC1/SC22/WG21 14882: Programming Language — C++

1. Introduction

This paper fixes a number of issues in range and tuple formatting related to the handling of empty specifiers for element types, and clarifies that an empty format specifier and an absent one are handled equivalently. Originally it also amended the proposed resolution of [LWG3776] to allow omitting calls to formatter::parse for empty specifiers, per LEWG feedback, but this part was removed in response to new LEWG feedback.

2. Changes from R2

3. Changes from R1

4. Changes from R0

5. Proposal

[LWG3776] "Avoid parsing format-spec if it is not present or empty" proposed omitting the call to formatter::parse for empty format specifiers (format-spec in [format.string.general] of [N4917]).

Consider the following example:

struct S {};

template <>
struct std::formatter<S> {
  constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
  auto format(S, format_context& ctx) const { return ctx.out(); }
};

int main() {
  auto s1 = std::format("{}", S());  // (1) no format-spec
  auto s2 = std::format("{:}", S()); // (2) empty format-spec
}

In (1) format-spec is not present and in (2) it is present but empty. There is nothing to parse in either case, so requiring implementations to call formatter::parse doesn’t make a lot of sense. It only adds unnecessary overhead to the common case, which is what [LWG3776] proposed to eliminate. Implementation experience in {fmt} showed that requiring the call to parse has a negative impact on the formatting of ranges, where we had to call this function unnecessarily from multiple places. The same issue may exist in other contexts such as format string compilation. In the tuple case there aren’t even nested format specifiers to call the underlying parse on.

Additionally, [LWG3776] made a drive-by fix, clarifying that the two cases are equivalent, which was not obvious from the existing wording. This is arguably even more important than omitting parse, particularly because formatting of ranges ([P2286]) doesn’t allow distinguishing between the two forms for nested specifiers, e.g.

auto s = std::format("{::}", std::vector<S>(2));
//                       ^ empty format-spec for S

Having the two cases equivalent is also more intuitive and consistent with all existing standard formatters.
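To make the nested case concrete, here is a minimal sketch of how the text of a range format specifier divides into a range-level part and the element's format-spec. The names range_spec_parts and split_range_spec are hypothetical helpers, not library API. The sketch assumes the spec is well-formed and relies on the fact that a range fill character cannot be ':'.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper type, for illustration only.
struct range_spec_parts {
  std::string_view range_part;  // fill/align, width, n, range-type
  bool has_underlying;          // true if a range-underlying-spec (":...") is present
  std::string_view underlying;  // the element's format-spec; may be empty
};

// Splits the text of a range-format-spec at its first ':'.
constexpr range_spec_parts split_range_spec(std::string_view spec) {
  const auto colon = spec.find(':');
  if (colon == std::string_view::npos) return {spec, false, {}};
  return {spec.substr(0, colon), true, spec.substr(colon + 1)};
}

// Usage sketch: the argument is what follows the first ':' of a replacement field.
inline void split_range_spec_examples() {
  // "{}" and "{:}" both leave an empty range-format-spec.
  assert(!split_range_spec("").has_underlying);
  // "{::}" leaves ":", so the element gets a present-but-empty format-spec.
  assert(split_range_spec(":").has_underlying);
  assert(split_range_spec(":").underlying.empty());
  // "{:n:?}" leaves "n:?", so the element's format-spec is "?".
  assert(split_range_spec("n:?").underlying == "?");
}
```

In every case the element's formatter receives only a (possibly empty) format-spec, so it has no way to tell a "{}"-like form from a "{:}"-like form.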

Library Evolution Working Group (LEWG) reviewed [LWG3776] in Kona and approved it with the amendment that implementations are allowed but not required to omit the call to formatter::parse for empty format-spec.

Barry Revzin pointed out an existing limitation of the formatting ranges design that requires calling set_debug_format from the parse function. However, as discovered by Mark de Wever while implementing ranges formatting in libc++, the formatter specialization for tuples already omits the call to parse for the underlying type so we need to fix this anyway. The following example illustrates the fix:

auto s = fmt::format("{}", std::make_tuple(std::make_tuple('a')));
Before          After
s == ((a))      s == (('a'))

Alternative resolutions for the nested range/tuple formatting bug are:

The table below compares alternative solutions with the earlier version (R1) of the current proposal denoted as S0:

char                  {}      a           a           a           a
char                  {:?}    'a'         'a'         'a'         'a'
vector<char>          {}      ['a']       ['a']       ['a']       ['a']
vector<char>          {::}    [a]         [a]         ['a']       [a]
vector<char>          {::c}   [a]         [a]         [a]         [a]
vector<char>          {::?}   ['a']       ['a']       ['a']       ['a']
map<char, char>       {}      {a: a}      {'a': 'a'}  {'a': 'a'}  {'a': 'a'}
set<char>             {}      {'a'}       {'a'}       {'a'}       {'a'}
set<char>             {::}    {a}         {a}         {'a'}       {a}
set<char>             {::c}   {a}         {a}         {a}         {a}
set<char>             {::?}   {'a'}       {'a'}       {'a'}       {'a'}
tuple<char>           {}      ('a')       ('a')       ('a')       ('a')
vector<vector<char>>  {}      [[a]]       [['a']]     [['a']]     [['a']]
vector<vector<char>>  {::}    [['a']]     [['a']]     [['a']]     [['a']]
vector<vector<char>>  {:::}   [[a]]       [[a]]       [['a']]     [[a]]
vector<vector<char>>  {:::c}  [[a]]       [[a]]       [[a]]       [[a]]
vector<vector<char>>  {:::?}  [['a']]     [['a']]     [['a']]     [['a']]
vector<tuple<char>>   {}      [(a)]       [('a')]     [('a')]     [('a')]
tuple<tuple<char>>    {}      ((a))       (('a'))     (('a'))     (('a'))
tuple<vector<char>>   {}      ([a])       (['a'])     (['a'])     (['a'])
tuple<vector<char>> {} ([a]) (['a']) (['a']) (['a'])

S1 and S2 are inconsistent with the resolution of [LWG3776] earlier approved by LEWG and were not originally proposed. However, now that LEWG has reversed its two earlier decisions and LWG has found issues with the implementability of S0, only S1 and S2 remain viable; the other options are included for reference only. We propose making the range and tuple formatters provide set_debug_format (option S2) because they have a debug representation. This option is compatible both with always calling parse and with future optimizations that may omit redundant calls to parse (not proposed in this paper).

6. LEWG Poll Results

POLL: Relax the requirements table 74 and 75 to make the optimization allowed by the issue resolution of LWG3776 a QoI issue with additional changes to the handle class removed

SF F N A SA
1 9 2 1 1

Outcome: consensus in favour

POLL: Adopt the amended proposed resolution of LWG3776 "Avoid parsing format-spec if it is not present or empty". Return the issue to LWG for C++23 (to be confirmed by electronic polling)

SF F N A SA
2 6 1 2 1

Outcome: weak consensus in favour

7. Wording

This wording is relative to [N4917].

Modify [format.string.general] as indicated:

-1- ...

format-specifier:
  : format-spec

format-spec:
  as specified by the formatter specialization for the argument type; cannot start with }

Modify 22.14.6.1 [formatter.requirements] as indicated:

-3- Given character type charT, output iterator type Out, and formatting argument type T, in Table 74 and Table 75:

...

pc.begin() points to the beginning of the format-spec (22.14.2 [format.string]) of the replacement field being formatted in the format string. If format-spec is not present or empty then either pc.begin() == pc.end() or *pc.begin() == '}'.
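As an illustration of what this guarantee means for formatter authors, the following sketch shows a parse step written against it. The type hex_flag_parser and the helper parses_as_hex are hypothetical and use raw character pointers in place of a real parse context; the single 'x' flag is an assumption made only to have something to parse.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for a formatter's parse step. It accepts an optional
// 'x' flag and returns a pointer to the first unparsed character.
struct hex_flag_parser {
  bool hex = false;

  constexpr const char* parse(const char* begin, const char* end) {
    // format-spec not present or empty: nothing to parse, keep the defaults.
    if (begin == end || *begin == '}') return begin;
    if (*begin == 'x') {
      hex = true;
      ++begin;
    }
    return begin;
  }
};

// Runs the parser over the remainder of a format string, starting where
// pc.begin() would point, and reports whether the 'x' flag was seen.
inline bool parses_as_hex(std::string_view rest) {
  hex_flag_parser p;
  p.parse(rest.data(), rest.data() + rest.size());
  return p.hex;
}

inline void hex_flag_parser_examples() {
  assert(!parses_as_hex(""));   // begin == end
  assert(!parses_as_hex("}"));  // *begin == '}'
  assert(parses_as_hex("x}"));  // non-empty format-spec
}
```

Because both empty conditions take the same early return, a formatter written this way behaves identically whether or not an implementation chooses to call parse for an empty format-spec.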

In [format.range.formatter]:

namespace std {
  template<class T, class charT = char>
    requires same_as<remove_cvref_t<T>, T> && formattable<T, charT>
  class range_formatter {
    ...
    constexpr const formatter<T, charT>& underlying() const { return underlying_; }

    constexpr void set_debug_format();

    template<class ParseContext>
      constexpr typename ParseContext::iterator
        parse(ParseContext& ctx);
  };
}

...

constexpr void set_brackets(basic_string_view<charT> opening, basic_string_view<charT> closing);

Effects: Equivalent to:

opening-bracket_ = opening;
closing-bracket_ = closing;

constexpr void set_debug_format();

Effects: Calls underlying_.set_debug_format() if it is a valid expression.
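One possible way to implement "if it is a valid expression" is sketched below with a toy model rather than the real range_formatter. All type names here (has_set_debug_format, char_like_formatter, int_like_formatter, toy_range_formatter) are hypothetical. The sketch uses the C++17 detection idiom so that it builds on older toolchains; a C++20 requires-expression would express the same check more directly.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Detects whether `f.set_debug_format()` is a valid expression for an lvalue f.
template <class F, class = void>
struct has_set_debug_format : std::false_type {};
template <class F>
struct has_set_debug_format<
    F, std::void_t<decltype(std::declval<F&>().set_debug_format())>>
    : std::true_type {};

// Toy element formatters (illustration only).
struct char_like_formatter {
  bool debug = false;
  constexpr void set_debug_format() { debug = true; }
};
struct int_like_formatter {};  // no debug representation

// Toy model of a range formatter; only the forwarding logic is shown.
template <class Underlying>
struct toy_range_formatter {
  Underlying underlying_;

  constexpr void set_debug_format() {
    // Forward only when the underlying formatter supports it; otherwise no-op.
    if constexpr (has_set_debug_format<Underlying>::value)
      underlying_.set_debug_format();
  }
};

inline void toy_range_formatter_examples() {
  toy_range_formatter<char_like_formatter> chars;
  chars.set_debug_format();
  assert(chars.underlying_.debug);

  // Compiles and does nothing for an element type without a debug format.
  toy_range_formatter<int_like_formatter> ints;
  ints.set_debug_format();

  // Nested ranges propagate the request down to the innermost formatter.
  toy_range_formatter<toy_range_formatter<char_like_formatter>> nested;
  nested.set_debug_format();
  assert(nested.underlying_.underlying_.debug);
}
```

Since the toy range formatter always provides set_debug_format itself, an enclosing range or tuple formatter can call it unconditionally, which is what lets the debug format reach nested elements.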

In [format.range.fmtstr]:

namespace std {
  template<range_format K, ranges::input_range R, class charT>
    requires (K == range_format::string || K == range_format::debug_string)
  struct range-default-formatter<K, R, charT> {

    ...

  public:
    constexpr void set_debug_format();
    
    template<class ParseContext>
      constexpr typename ParseContext::iterator
        parse(ParseContext& ctx);

    ...
  };
}

constexpr void set_debug_format();

Effects: Calls underlying_.set_debug_format() if it is a valid expression and K == range_format::debug_string.

template<class ParseContext>
  constexpr typename ParseContext::iterator
    parse(ParseContext& ctx);

In [format.tuple]:

-1- For each of pair and tuple, the library provides the following formatter specialization where pair-or-tuple is the name of the template:

namespace std {
  template<class charT, formattable<charT>... Ts>
  struct formatter<pair-or-tuple<Ts...>, charT> {

  ...

  constexpr void set_brackets(basic_string_view<charT> opening,
                              basic_string_view<charT> closing);

  constexpr void set_debug_format();
                              
  template<class ParseContext>
    constexpr typename ParseContext::iterator
      parse(ParseContext& ctx);
  };
}

...

constexpr void set_brackets(basic_string_view<charT> opening, basic_string_view<charT> closing);

-6- Effects: Equivalent to:

opening-bracket_ = opening;
closing-bracket_ = closing;

constexpr void set_debug_format();

Effects: For each element e in underlying_, calls e.set_debug_format() if it is a valid expression.
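The per-element effect can be sketched with a toy tuple formatter that stores one element formatter per tuple element. Everything below (toy_char_formatter, toy_tuple_formatter, call_set_debug_format_if_valid) is a hypothetical model that builds plain strings; it is not the library's formatter and it ignores parsing, separators other than ", ", and output iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Detects whether `f.set_debug_format()` is a valid expression (C++17 idiom).
template <class F, class = void>
struct has_set_debug_format : std::false_type {};
template <class F>
struct has_set_debug_format<
    F, std::void_t<decltype(std::declval<F&>().set_debug_format())>>
    : std::true_type {};

template <class F>
void call_set_debug_format_if_valid(F& f) {
  if constexpr (has_set_debug_format<F>::value) f.set_debug_format();
}

// Toy char formatter: quotes the character in debug mode.
struct toy_char_formatter {
  bool debug = false;
  void set_debug_format() { debug = true; }
  std::string format(char c) const {
    return debug ? std::string("'") + c + "'" : std::string(1, c);
  }
};

// Toy tuple formatter holding one element formatter per tuple element.
template <class... Fs>
struct toy_tuple_formatter {
  std::tuple<Fs...> underlying_;

  void set_debug_format() {
    // Visit every element formatter; those without the member are skipped.
    std::apply([](auto&... e) { (call_set_debug_format_if_valid(e), ...); },
               underlying_);
  }

  template <class... Ts>
  std::string format(const std::tuple<Ts...>& value) const {
    return format_impl(value, std::index_sequence_for<Ts...>{});
  }

 private:
  template <class Tuple, std::size_t... Is>
  std::string format_impl(const Tuple& value, std::index_sequence<Is...>) const {
    std::string out = "(";
    ((out += (Is == 0 ? "" : ", "),
      out += std::get<Is>(underlying_).format(std::get<Is>(value))),
     ...);
    return out + ")";
  }
};

inline void toy_tuple_formatter_examples() {
  toy_tuple_formatter<toy_tuple_formatter<toy_char_formatter>> f;
  const auto value = std::make_tuple(std::make_tuple('a'));
  assert(f.format(value) == "((a))");    // debug format not requested
  f.set_debug_format();                  // reaches the nested char formatter
  assert(f.format(value) == "(('a'))");
}
```

In this model the nested tuple formatter exposes set_debug_format, so the outer formatter's request reaches the innermost char formatter, which mirrors the intent of the wording above.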

8. Acknowledgements

Thanks to Barry Revzin and Mark de Wever for pointing out issues with debug formatting of ranges and tuples.

References

Informative References

[LWG3776]
Mark de Wever. Avoid parsing format-spec if it is not present or empty. URL: https://cplusplus.github.io/LWG/issue3776
[N4917]
Thomas Köppe; et al. Working Draft, Standard for Programming Language C++. URL: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/n4917.pdf
[P2286]
Barry Revzin. Formatting Ranges. URL: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2286r8.html