This page is a snapshot from the LWG issues list; see the Library Active Issues List for more information and the meaning of New status.

3606. Missing regex_traits::locale_type requirements

Section: 32.2 [re.req] Status: New Submitter: Jonathan Wakely Opened: 2021-09-28 Last modified: 2021-10-14

Priority: 3

Discussion:

Why is locale_type part of the regular expression traits requirements in 32.2 [re.req]? When would locale_type not be std::locale? What are the requirements on the type? Does it have to provide exactly the same interface as std::locale, or just some unspecified interface that a custom regex traits type needs from it? Why is none of this specified?

Currently the only requirement on locale_type in the standard is that it is copy constructible. Clearly it also needs to be default constructible; otherwise you couldn't construct a basic_regex at all, because none of its constructors accepts a locale, so they all have to default-construct one (see also LWG 2431).
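A minimal C++ sketch of the constructibility point. It checks that std::regex_traits<char>::locale_type is std::locale, and expresses the copy- and default-constructibility requirements as a trait named plausible_regex_locale_type, which is hypothetical and not a standard name. The helper regex_uses_default_locale (also hypothetical) shows that a basic_regex built from a pattern alone still ends up holding a locale; comparing it with std::locale() assumes the usual implementation behaviour, where a default-constructed traits object holds a copy of the global locale.

```cpp
#include <cassert>
#include <locale>
#include <regex>
#include <type_traits>

// The standard's own traits class uses std::locale as its locale_type.
using std_locale_type = std::regex_traits<char>::locale_type;
static_assert(std::is_same<std_locale_type, std::locale>::value,
              "regex_traits<char>::locale_type is std::locale");

// Hypothetical trait (not in the standard): the existing copy-constructible
// requirement plus the default-constructible one the issue says is implied.
template <class L>
constexpr bool plausible_regex_locale_type =
    std::is_copy_constructible<L>::value &&
    std::is_default_constructible<L>::value;

static_assert(plausible_regex_locale_type<std_locale_type>,
              "std::locale meets both requirements");

// Hypothetical helper: no basic_regex constructor accepts a locale, so the
// locale inside the traits object must have been default-constructed.
bool regex_uses_default_locale() {
  std::regex r("[a-z]");
  // Assumes the default-constructed locale is a copy of the global locale,
  // like std::locale(); copies of the same locale compare equal.
  return r.getloc() == std::locale();
}
```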

The other requirements on locale_type are a mystery. Why do we allow custom locale types, but not say anything about what they should do? Can we just require locale_type to be std::locale? Is anybody really going to use boost::locale with std::basic_regex, when they could just use boost::basic_regex instead?

Why does the regular expression traits requirements table say that imbue and getloc refer to the locale used, "if any"? How could there not already be one?
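A short sketch of why "if any" is hard to reconcile with basic_regex as it exists. The helper names are hypothetical; only standard calls are used. A default-constructed std::regex, which has no pattern at all, already returns a locale from getloc(), and imbue() hands back the locale that was in use before the call. std::locale::classic() is used so nothing depends on named locales being installed.

```cpp
#include <cassert>
#include <locale>
#include <regex>

// Hypothetical helper: even an empty basic_regex already holds a locale,
// so getloc() always has something to return.
std::locale locale_of_empty_regex() {
  std::regex r;
  return r.getloc();
}

// Hypothetical helper: imbue() returns the previously used locale, so there
// is always an earlier locale to hand back.
bool imbue_returns_previous_locale(const std::locale& loc) {
  std::regex r;
  std::locale before = r.getloc();
  std::locale previous = r.imbue(loc);
  return previous == before && r.getloc() == loc;
}
```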

Why is imbuing a locale into a basic_regex a separate operation from compiling the regular expression pattern? Is the following supposed to change the compiled regex?

std::regex r("[a-z]");
r.imbue(std::locale("en_GB.UTF-8"));

Hasn't the regex constructor already used the locale to compile the "[a-z]" pattern, making it too late to change the locale? If so, do we need to do the following to compile the regex with a specific locale?

std::regex r;
r.imbue(std::locale("en_GB.UTF-8"));
r.assign("[a-z]");

Why require two-stage initialization like this? Is it just so that basic_regex appears consistent with the imbue/getloc API of std::ios_base? That works for ios_base because the new locale takes effect as soon as it is imbued, but for basic_regex the pattern has already been compiled using the old locale, and imbuing a new one can't change that. Is basic_regex supposed to store the pattern and recompile it after imbue, or is this simply an inappropriate API for basic_regex?
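The sketch below contrasts the two orderings discussed above, assuming the current wording of [re.regex.locale], which says that after a call to imbue the basic_regex object does not match any character sequence. The function names and the recompiling_regex class are hypothetical; the class illustrates just one possible answer to the last question, a wrapper that keeps the pattern and flags so that imbue() can recompile. std::locale::classic() stands in for a named locale such as en_GB.UTF-8, which may not be installed.

```cpp
#include <cassert>
#include <locale>
#include <regex>
#include <string>
#include <utility>

// Imbue after construction: per [re.regex.locale] the compiled pattern is
// discarded, so on conforming implementations this regex matches nothing.
bool matches_after_late_imbue(const std::string& s, const std::locale& loc) {
  std::regex r("[a-z]");
  r.imbue(loc);
  return std::regex_match(s, r);
}

// Two-stage initialization: imbue first, then compile with assign(), so
// the pattern is compiled using the imbued locale.
bool matches_after_two_stage(const std::string& s, const std::locale& loc) {
  std::regex r;
  r.imbue(loc);
  r.assign("[a-z]");
  return std::regex_match(s, r);
}

// Hypothetical wrapper (not part of the standard): stores the pattern and
// flags so that imbue() can recompile with the new locale.
class recompiling_regex {
 public:
  explicit recompiling_regex(std::string pattern,
                             std::regex::flag_type flags = std::regex::ECMAScript)
      : pattern_(std::move(pattern)), flags_(flags), re_(pattern_, flags_) {}

  std::locale imbue(const std::locale& loc) {
    std::locale previous = re_.imbue(loc);  // discards the old compiled form
    re_.assign(pattern_, flags_);           // recompile with the new locale
    return previous;
  }

  const std::regex& get() const { return re_; }

 private:
  std::string pattern_;          // declared before re_, so initialized first
  std::regex::flag_type flags_;
  std::regex re_;
};
```

The wrapper trades an extra stored copy of the pattern for a regex that stays usable after imbue; whether basic_regex itself should behave this way is the open question.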

[2021-10-14; Reflector poll]

Set priority to 3 after reflector poll.

Proposed resolution: