Enhancement of regex

Document Number	P1844R1
Date	2019-11-22
Audience	LEWGI
Intended Ship Vehicle	C++23
Reply-To	Nozomu Katō
Revises	P1844R0

VI. Technical Specifications

1. <regex>

The following changes are proposed:

30.1 General [re.general]

2 The following subclauses describe a basic regular expression class template and its traits that can handle char-like (21.1) template arguments, ~~two~~five specializations of this class template that handle sequences of char ~~and~~, wchar_t, char8_t, char16_t, and char32_t, a class template that holds the result of a regular expression match, a series of algorithms that allow a character sequence to be operated upon by a regular expression, three specializations of this series that handle sequences of char8_t, char16_t, and char32_t, and two iterator types for enumerating regular expression matches, as summarized in Table 122.

30.4 Header <regex> synopsis [re.syn]

// 30.8, class template basic_regex

template<class charT, class traits = regex_traits<charT>> class basic_regex;

using regex = basic_regex<char>;
using wregex = basic_regex<wchar_t>;
using u8regex = basic_regex<char8_t>;
using u16regex = basic_regex<char16_t>;
using u32regex = basic_regex<char32_t>;

// 30.9, class template sub_match

template<class BidirectionalIterator>
  class sub_match;

using csub_match = sub_match<const char*>;
using wcsub_match = sub_match<const wchar_t*>;
using u8csub_match = sub_match<const char8_t*>;
using u16csub_match = sub_match<const char16_t*>;
using u32csub_match = sub_match<const char32_t*>;
using ssub_match = sub_match<string::const_iterator>;
using wssub_match = sub_match<wstring::const_iterator>;
using u8ssub_match = sub_match<u8string::const_iterator>;
using u16ssub_match = sub_match<u16string::const_iterator>;
using u32ssub_match = sub_match<u32string::const_iterator>;

// 30.10, class template match_results

template<class BidirectionalIterator,
         class Allocator = allocator<sub_match<BidirectionalIterator>>>
  class match_results;

using cmatch = match_results<const char*>;
using wcmatch = match_results<const wchar_t*>;
using u8cmatch = match_results<const char8_t*>;
using u16cmatch = match_results<const char16_t*>;
using u32cmatch = match_results<const char32_t*>;
using smatch = match_results<string::const_iterator>;
using wsmatch = match_results<wstring::const_iterator>;
using u8smatch = match_results<u8string::const_iterator>;
using u16smatch = match_results<u16string::const_iterator>;
using u32smatch = match_results<u32string::const_iterator>;

// 30.11.3, function template regex_search

template<class BidirectionalIterator, class Allocator, class charT, class traits>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<charT, traits>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);
template<class BidirectionalIterator, class charT, class traits>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    const basic_regex<charT, traits>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);
template<class charT, class Allocator, class traits>
  bool regex_search(const charT* str,
                    match_results<const charT*, Allocator>& m,
                    const basic_regex<charT, traits>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);
template<class charT, class traits>
  bool regex_search(const charT* str,
                    const basic_regex<charT, traits>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);
template<class ST, class SA, class charT, class traits>
  bool regex_search(const basic_string<charT, ST, SA>& s,
                    const basic_regex<charT, traits>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);
template<class ST, class SA, class Allocator, class charT, class traits>
  bool regex_search(const basic_string<charT, ST, SA>& s,
                    match_results<typename basic_string<charT, ST, SA>::const_iterator,
                                  Allocator>& m,
                    const basic_regex<charT, traits>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);
template<class ST, class SA, class Allocator, class charT, class traits>
  bool regex_search(const basic_string<charT, ST, SA>&&,
                    match_results<typename basic_string<charT, ST, SA>::const_iterator,
                                  Allocator>&,
                    const basic_regex<charT, traits>&,
                    regex_constants::match_flag_type
                      = regex_constants::match_default) = delete;
template<class BidirectionalIterator, class Allocator, class charT, class traits>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    BidirectionalIterator lookbehindlimit,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<charT, traits>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);
template<class BidirectionalIterator, class charT, class traits>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    BidirectionalIterator lookbehindlimit,
                    const basic_regex<charT, traits>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);

template<class BidirectionalIterator, class Allocator>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    BidirectionalIterator lookbehindlimit,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<char8_t>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);
template<class BidirectionalIterator, class Allocator>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<char8_t>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);
template<class BidirectionalIterator, class Allocator>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    BidirectionalIterator lookbehindlimit,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<char16_t>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);
template<class BidirectionalIterator, class Allocator>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<char16_t>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);
template<class BidirectionalIterator, class Allocator>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    BidirectionalIterator lookbehindlimit,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<char32_t>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);
template<class BidirectionalIterator, class Allocator>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<char32_t>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);

// 30.12.1, class template regex_iterator

template<class BidirectionalIterator,
         class charT = typename iterator_traits<BidirectionalIterator>::value_type,
         class traits = regex_traits<charT>>
  class regex_iterator;

using cregex_iterator = regex_iterator<const char*>;
using wcregex_iterator = regex_iterator<const wchar_t*>;
using u8cregex_iterator = regex_iterator<const char8_t*>;
using u16cregex_iterator = regex_iterator<const char16_t*>;
using u32cregex_iterator = regex_iterator<const char32_t*>;
using sregex_iterator = regex_iterator<string::const_iterator>;
using wsregex_iterator = regex_iterator<wstring::const_iterator>;
using u8sregex_iterator = regex_iterator<u8string::const_iterator>;
using u16sregex_iterator = regex_iterator<u16string::const_iterator>;
using u32sregex_iterator = regex_iterator<u32string::const_iterator>;

// 30.12.2, class template regex_token_iterator

template<class BidirectionalIterator,
         class charT = typename iterator_traits<BidirectionalIterator>::value_type,
         class traits = regex_traits<charT>>
  class regex_token_iterator;

using cregex_token_iterator = regex_token_iterator<const char*>;
using wcregex_token_iterator = regex_token_iterator<const wchar_t*>;
using u8cregex_token_iterator = regex_token_iterator<const char8_t*>;
using u16cregex_token_iterator = regex_token_iterator<const char16_t*>;
using u32cregex_token_iterator = regex_token_iterator<const char32_t*>;
using sregex_token_iterator = regex_token_iterator<string::const_iterator>;
using wsregex_token_iterator = regex_token_iterator<wstring::const_iterator>;
using u8sregex_token_iterator = regex_token_iterator<u8string::const_iterator>;
using u16sregex_token_iterator = regex_token_iterator<u16string::const_iterator>;
using u32sregex_token_iterator = regex_token_iterator<u32string::const_iterator>;

namespace pmr {
  template<class BidirectionalIterator>
    using match_results =
      std::match_results<BidirectionalIterator,
                         polymorphic_allocator<sub_match<BidirectionalIterator>>>;

  using cmatch = match_results<const char*>;
  using wcmatch = match_results<const wchar_t*>;
  using u8cmatch = match_results<const char8_t*>;
  using u16cmatch = match_results<const char16_t*>;
  using u32cmatch = match_results<const char32_t*>;
  using smatch = match_results<string::const_iterator>;
  using wsmatch = match_results<wstring::const_iterator>;
  using u8smatch = match_results<u8string::const_iterator>;
  using u16smatch = match_results<u16string::const_iterator>;
  using u32smatch = match_results<u32string::const_iterator>;
}
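
The added aliases follow the naming pattern of the existing ones: a `c` prefix for pointer-based types and an `s` prefix for types based on string iterators. Because `u8regex`, `u8smatch`, and the related names are only proposed here, the following sketch uses the `char`-based aliases that current libraries already ship. It assumes that the `u8`, `u16`, and `u32` forms would be substituted one for one. The helper names `count_matches` and `first_capture` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts the matches of `pattern` in `text` with the existing char-based aliases.
// Assumption: under the proposal, u8regex / u8sregex_iterator / u8string would
// take the place of regex / sregex_iterator / string without other changes.
inline std::size_t count_matches(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);                        // basic_regex<char>
    std::sregex_iterator first(text.begin(), text.end(), re);
    std::sregex_iterator last;                           // end-of-sequence iterator
    return static_cast<std::size_t>(std::distance(first, last));
}

// Returns the first sub-match of the first match, or an empty string if none.
inline std::string first_capture(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;                                       // match_results<string::const_iterator>
    if (std::regex_search(text, m, re) && m.size() > 1)
        return m[1].str();
    return std::string();
}
```

For example, `count_matches("a1 b22 c333", "[0-9]+")` visits three matches, one per run of digits.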

30.5.1 Bitmask type syntax_option_type [re.synopt]

namespace std::regex_constants {
  using syntax_option_type = T1;
  inline constexpr syntax_option_type icase = unspecified ;
  inline constexpr syntax_option_type nosubs = unspecified ;
  inline constexpr syntax_option_type optimize = unspecified ;
  inline constexpr syntax_option_type collate = unspecified ;
  inline constexpr syntax_option_type ECMAScript = unspecified ;
  inline constexpr syntax_option_type basic = unspecified ;
  inline constexpr syntax_option_type extended = unspecified ;
  inline constexpr syntax_option_type awk = unspecified ;
  inline constexpr syntax_option_type grep = unspecified ;
  inline constexpr syntax_option_type egrep = unspecified ;
  inline constexpr syntax_option_type multiline = unspecified ;
  inline constexpr syntax_option_type ECMAScript2019 = unspecified ;
  inline constexpr syntax_option_type dotall = unspecified ;
}

1 The type syntax_option_type is an implementation-defined bitmask type (16.4.2.2.4). Setting its elements has the effects listed in Table 124. A valid value of type syntax_option_type shall have at most one of the grammar elements ECMAScript, basic, extended, awk, grep, egrep, ECMAScript2019, set. If no grammar element is set, the default grammar is ECMAScript2019 when a value of type syntax_option_type is passed to an instance of one of the specializations basic_regex<char8_t>, basic_regex<char16_t>, and basic_regex<char32_t>; otherwise ECMAScript.
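
The defaulting rule in the paragraph above, together with the "interpreted as if no grammar element is set" wording of Table 124, can be modelled as a small function. This is only a sketch: the enumeration values, the namespace `sketch`, and the function `effective_grammar` are hypothetical and do not correspond to any implementation's bitmask layout.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical model of the grammar elements of syntax_option_type.
// The real type is an implementation-defined bitmask; these bit values
// are placeholders chosen only for this illustration.
enum grammar : unsigned {
    none           = 0,
    ECMAScript     = 1u << 0,
    basic          = 1u << 1,
    extended       = 1u << 2,
    awk            = 1u << 3,
    grep           = 1u << 4,
    egrep          = 1u << 5,
    ECMAScript2019 = 1u << 6
};

// Returns the grammar that takes effect for the flags `f`.
// `unicode_specialization` is true for basic_regex<char8_t>,
// basic_regex<char16_t>, and basic_regex<char32_t>.
inline grammar effective_grammar(unsigned f, bool unicode_specialization) {
    if (unicode_specialization) {
        // Every legacy grammar flag is treated as "no grammar element set",
        // so the default for these specializations always applies.
        return ECMAScript2019;
    }
    // For other specializations ECMAScript2019 is treated as "not set".
    const unsigned legacy = f & (ECMAScript | basic | extended | awk | grep | egrep);
    return legacy == none ? ECMAScript : static_cast<grammar>(legacy);
}

} // namespace sketch
```

Under this model, `extended` passed to a `char8_t` regex and `ECMAScript2019` passed to a `char` regex both fall back to the respective default grammar.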

...

Table 124: `syntax_option_type` effects [tab:re.synopt]
Element	Effect(s) if set
`icase`	Specifies that matching of regular expressions against a character container sequence shall be performed without regard to case.
`nosubs`	Specifies that no sub-expressions shall be considered to be marked, so that when a regular expression is matched against a character container sequence, no sub-expression matches shall be stored in the supplied `match_results` object.
`optimize`	Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output.
`collate`	Specifies that character ranges of the form "[a-b]" shall be locale sensitive. This flag has no effect when the `ECMAScript2019` engine is selected.
`ECMAScript`	Specifies that the grammar recognized by the regular expression engine shall be that used by ECMAScript in ECMA-262 third edition, as modified in 30.13. See also: ECMA-262 third edition 15.10. If this flag is passed to an instance of `basic_regex<char8_t>`, `basic_regex<char16_t>`, or `basic_regex<char32_t>`, it shall be interpreted as if no grammar element is set.
`basic`	Specifies that the grammar recognized by the regular expression engine shall be that used by basic regular expressions in POSIX. See also: POSIX, Base Definitions and Headers, Section 9.3. If this flag is passed to an instance of `basic_regex<char8_t>`, `basic_regex<char16_t>`, or `basic_regex<char32_t>`, it shall be interpreted as if no grammar element is set.
`extended`	Specifies that the grammar recognized by the regular expression engine shall be that used by extended regular expressions in POSIX. See also: POSIX, Base Definitions and Headers, Section 9.4. If this flag is passed to an instance of `basic_regex<char8_t>`, `basic_regex<char16_t>`, or `basic_regex<char32_t>`, it shall be interpreted as if no grammar element is set.
`awk`	Specifies that the grammar recognized by the regular expression engine shall be that used by the utility awk in POSIX. If this flag is passed to an instance of `basic_regex<char8_t>`, `basic_regex<char16_t>`, or `basic_regex<char32_t>`, it shall be interpreted as if no grammar element is set.
`grep`	Specifies that the grammar recognized by the regular expression engine shall be that used by the utility grep in POSIX. If this flag is passed to an instance of `basic_regex<char8_t>`, `basic_regex<char16_t>`, or `basic_regex<char32_t>`, it shall be interpreted as if no grammar element is set.
`egrep`	Specifies that the grammar recognized by the regular expression engine shall be that used by the utility grep when given the -E option in POSIX. If this flag is passed to an instance of `basic_regex<char8_t>`, `basic_regex<char16_t>`, or `basic_regex<char32_t>`, it shall be interpreted as if no grammar element is set.
`multiline`	Specifies that `^` shall match the beginning of a line and `$` shall match the end of a line, if the `ECMAScript` or `ECMAScript2019` engine is selected.
`ECMAScript2019`	Specifies that the grammar recognized by the regular expression engine and the behavior of an algorithm that uses an instance of `basic_regex` constructed with this flag shall be those used and performed by ECMAScript in ECMA-262 2019 or later with the `u` flag being set, as modified in 30.14. See also: ECMA-262 2019 21.2. If this flag is passed to an instance of `basic_regex` other than `basic_regex<char8_t>`, `basic_regex<char16_t>`, and `basic_regex<char32_t>`, it shall be interpreted as if no grammar element is set.
`dotall`	Specifies that `.` shall match any code point including new-line characters, if the `ECMAScript2019` engine is selected.

30.5.2 Bitmask type `match_flag_type` [re.matchflag]

Table 125: `regex_constants::match_flag_type` effects when obtaining a match against a character container sequence `[first, last)`. [tab:re.matchflag]
Element	Effect(s) if set
`...`	...
`format_default`	When a regular expression match is to be replaced by a new string, the new string shall be constructed using the rules used by the ECMAScript replace function in ECMA-262 third edition, part 15.5.4.11 String.prototype.replace. In addition, during search and replace operations all non-overlapping occurrences of the regular expression shall be located and replaced, and sections of the input that did not match the expression shall be copied unchanged to the output string.
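
The `format_default` behavior described above can be observed with the `char`-based library that exists today. The sketch below is illustrative only; the helper name `reorder_dates` and the date pattern are assumptions made for the example. It relies on `std::regex_replace` using `format_default` when no format flag is given.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Rewrites every date of the form YYYY-MM-DD in `text` as DD/MM/YYYY.
// No format flag is passed, so format_default applies: $1..$3 follow the
// ECMAScript replace rules, every non-overlapping match is replaced, and
// the text between matches is copied unchanged.
inline std::string reorder_dates(const std::string& text) {
    static const std::regex date(R"((\d{4})-(\d{2})-(\d{2}))");
    return std::regex_replace(text, date, "$3/$2/$1");
}
```

Calling `reorder_dates("from 2019-06-17 to 2019-11-22")` replaces both dates and leaves the words "from" and "to" in place.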

30.8 Class template basic_regex [re.regex]

30.8.7 basic_regex specializations [re.regex.special]

1 The header <regex> defines three specializations of the class template basic_regex: basic_regex<char8_t>, basic_regex<char16_t>, and basic_regex<char32_t>.

2 [Note: These specializations are not necessarily required to be implemented separately; typical implementations will use an internal iterator class template that has specializations for char8_t, char16_t, and char32_t to translate an input sequence of UTF-8, UTF-16, and UTF-32 respectively to a sequence of Unicode code points, and construct a finite state machine by parsing that translated sequence in a base class shared by these three specializations. —end note]
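
As an illustration of the translation step mentioned in the note, the following sketch decodes a UTF-8 code unit sequence into code points. It is not taken from any implementation. The function `decode_utf8` is hypothetical, it uses `unsigned char` instead of `char8_t` so that it also builds before C++20, and throwing `std::invalid_argument` on malformed input is a choice made for the sketch rather than behavior specified by this paper.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Decodes the UTF-8 code units [p, p + len) into Unicode code points.
// Hypothetical helper: it checks lead and continuation bytes but does not
// reject overlong encodings or encoded surrogates.
inline std::vector<char32_t> decode_utf8(const unsigned char* p, std::size_t len) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < len) {
        const unsigned char lead = p[i];
        char32_t cp = 0;
        std::size_t extra = 0;          // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (len - i - 1 < extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | static_cast<char32_t>(c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

A shared base class could then parse the resulting code point sequence regardless of which of the three encodings supplied it.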

3 These specializations shall not use regex_traits to construct an internal finite state machine. [Note: Particularly, case folding, translating a character prior to comparison without regard to case, shall be performed as defined in ECMA-262 2019 or later, and shall not be performed as defined in traits::translate_nocase(c). —end note]

30.8.7.1 class basic_regex<char8_t> specializations [re.regex.special.char8_t]

namespace std {
  template<>
    class basic_regex<char8_t> {
    public:

          // types

      using value_type = char8_t;
      using traits_type = void;
      using string_type = basic_string<char8_t>;
      using flag_type = regex_constants::syntax_option_type;
      using locale_type = locale;

          // 30.5.1, constants

      static constexpr flag_type icase = regex_constants::icase;
      static constexpr flag_type nosubs = regex_constants::nosubs;
      static constexpr flag_type optimize = regex_constants::optimize;
      static constexpr flag_type multiline = regex_constants::multiline;
      static constexpr flag_type ECMAScript2019 = regex_constants::ECMAScript2019;
      static constexpr flag_type dotall = regex_constants::dotall;

          // 30.8.7.1.1, construct/copy/destroy

      basic_regex();
      explicit basic_regex(const char8_t* p, flag_type f = regex_constants::ECMAScript2019);
      basic_regex(const char8_t* p, size_t len, flag_type f = regex_constants::ECMAScript2019);
      basic_regex(const basic_regex&);
      basic_regex(basic_regex&&) noexcept;
      template<class ST, class SA>
        explicit basic_regex(const basic_string<char8_t, ST, SA>& p,
                             flag_type f = regex_constants::ECMAScript2019);
      template<class ForwardIterator>
        basic_regex(ForwardIterator first, ForwardIterator last,
                    flag_type f = regex_constants::ECMAScript2019);
      basic_regex(initializer_list<char8_t>, flag_type = regex_constants::ECMAScript2019);

      ~basic_regex();

      basic_regex& operator=(const basic_regex&);
      basic_regex& operator=(basic_regex&&) noexcept;
      basic_regex& operator=(const char8_t* ptr);
      basic_regex& operator=(initializer_list<char8_t> il);
      template<class ST, class SA>
        basic_regex& operator=(const basic_string<char8_t, ST, SA>& p);

          // 30.8.7.1.2, assign

      basic_regex& assign(const basic_regex& that);
      basic_regex& assign(basic_regex&& that) noexcept;
      basic_regex& assign(const char8_t* ptr, flag_type f = regex_constants::ECMAScript2019);
      basic_regex& assign(const char8_t* p, size_t len, flag_type f);
      template<class string_traits, class A>
        basic_regex& assign(const basic_string<char8_t, string_traits, A>& s,
                            flag_type f = regex_constants::ECMAScript2019);
      template<class InputIterator>
        basic_regex& assign(InputIterator first, InputIterator last,
                            flag_type f = regex_constants::ECMAScript2019);
      basic_regex& assign(initializer_list<char8_t>,
                          flag_type = regex_constants::ECMAScript2019);

          // 30.8.7.1.3, const operations

      unsigned mark_count() const;
      unsigned gname_to_gnumber(const char8_t* p) const;
      unsigned gname_to_gnumber(const char8_t* p, size_t len) const;
      template<class string_traits, class A>
        unsigned gname_to_gnumber(const basic_string<char8_t, string_traits, A>& s) const;
      template<class InputIterator>
        unsigned gname_to_gnumber(InputIterator first, InputIterator last) const;
      flag_type flags() const;

          // 30.8.7.1.4, locale

      locale_type imbue(locale_type loc);
      locale_type getloc() const;

          // 30.8.7.1.5, swap

      void swap(basic_regex&);
    };
}

30.8.7.1.1 Constructors [re.regex.special.char8_t.construct]

basic_regex();

1 Effects: Constructs an object of class basic_regex that does not match any character sequence.

explicit basic_regex(const char8_t* p, flag_type f = regex_constants::ECMAScript2019);

2 Requires: p shall not be a null pointer.
3 Throws: regex_error if p is not a valid regular expression.
4 Effects: Constructs an object of class basic_regex; the object’s internal finite state machine is constructed from the regular expression contained in the array of char8_t of length char_traits<char8_t>::length(p) whose first element is designated by p and whose elements represent a UTF-8 sequence, and interpreted according to the flags f.
5 Ensures: flags() returns f. mark_count() returns the number of marked sub-expressions within the expression.

basic_regex(const char8_t* p, size_t len, flag_type f = regex_constants::ECMAScript2019);

6 Requires: p shall not be a null pointer.
7 Throws: regex_error if p is not a valid regular expression.
8 Effects: Constructs an object of class basic_regex; the object’s internal finite state machine is constructed from the regular expression contained in the sequence of UTF-8 code units [p, p+len), and interpreted according to the flags specified in f.
9 Ensures: flags() returns f. mark_count() returns the number of marked sub-expressions within the expression.

basic_regex(const basic_regex& e);

10 Effects: Constructs an object of class basic_regex as a copy of the object e.
11 Ensures: flags() and mark_count() return e.flags() and e.mark_count(), respectively.

basic_regex(basic_regex&& e) noexcept;

12 Effects: Move constructs an object of class basic_regex from e.
13 Ensures: flags() and mark_count() return the values that e.flags() and e.mark_count(), respectively, had before construction. e is in a valid state with unspecified value.

template<class ST, class SA>
  explicit basic_regex(const basic_string<char8_t, ST, SA>& s,
                       flag_type f = regex_constants::ECMAScript2019);

14 Throws: regex_error if s is not a valid regular expression.
15 Effects: Constructs an object of class basic_regex; the object’s internal finite state machine is constructed from the regular expression contained in the string s whose elements represent a UTF-8 sequence, and interpreted according to the flags specified in f.
16 Ensures: flags() returns f. mark_count() returns the number of marked sub-expressions within the expression.

template<class ForwardIterator>
  basic_regex(ForwardIterator first, ForwardIterator last,
              flag_type f = regex_constants::ECMAScript2019);

17 Throws: regex_error if the sequence [first, last) is not a valid regular expression.
18 Effects: Constructs an object of class basic_regex; the object’s internal finite state machine is constructed from the regular expression contained in the sequence of UTF-8 code units [first, last), and interpreted according to the flags specified in f.
19 Ensures: flags() returns f. mark_count() returns the number of marked sub-expressions within the expression.

basic_regex(initializer_list<char8_t> il, flag_type f = regex_constants::ECMAScript2019);

20 Effects: Same as basic_regex(il.begin(), il.end(), f).

30.8.7.1.2 Assignment [re.regex.special.char8_t.assign]

basic_regex& operator=(const basic_regex& e);

1 Effects: Copies e into *this and returns *this.
2 Ensures: flags() and mark_count() return e.flags() and e.mark_count(), respectively.

basic_regex& operator=(basic_regex&& e) noexcept;

3 Effects: Move assigns from e into *this and returns *this.
4 Ensures: flags() and mark_count() return the values that e.flags() and e.mark_count(), respectively, had before assignment. e is in a valid state with unspecified value.

basic_regex& operator=(const char8_t* ptr);

5 Requires: ptr shall not be a null pointer.
6 Effects: Returns assign(ptr).

basic_regex& operator=(initializer_list<char8_t> il);

7 Effects: Returns assign(il.begin(), il.end()).

template<class ST, class SA>
  basic_regex& operator=(const basic_string<char8_t, ST, SA>& p);

8 Effects: Returns assign(p).

basic_regex& assign(const basic_regex& that);

9 Effects: Equivalent to: return *this = that;

basic_regex& assign(basic_regex&& that) noexcept;

10 Effects: Equivalent to: return *this = std::move(that);

basic_regex& assign(const char8_t* ptr, flag_type f = regex_constants::ECMAScript2019);

11 Returns: assign(string_type(ptr), f).

basic_regex& assign(const char8_t* ptr, size_t len, flag_type f = regex_constants::ECMAScript2019);

12 Returns: assign(string_type(ptr, len), f).

template<class string_traits, class A>
  basic_regex& assign(const basic_string<char8_t, string_traits, A>& s,
                      flag_type f = regex_constants::ECMAScript2019);

13 Throws: regex_error if s is not a valid regular expression.
14 Returns: *this.
15 Effects: Assigns the regular expression contained in the string s whose elements represent a UTF-8 sequence, interpreted according to the flags specified in f. If an exception is thrown, *this is unchanged.
16 Ensures: If no exception is thrown, flags() returns f and mark_count() returns the number of marked sub-expressions within the expression.

template<class InputIterator>
  basic_regex& assign(InputIterator first, InputIterator last,
                      flag_type f = regex_constants::ECMAScript2019);

17 Requires: InputIterator shall meet the Cpp17InputIterator requirements (23.3.5.2).
18 Returns: assign(string_type(first, last), f).

basic_regex& assign(initializer_list<char8_t> il,
                    flag_type f = regex_constants::ECMAScript2019);

19 Effects: Same as assign(il.begin(), il.end(), f).
20 Returns: *this.
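
The guarantee in the `assign` specification above ("If an exception is thrown, *this is unchanged") matches the wording for the primary template, so it can be sketched with `basic_regex<char>`. The helper `try_assign` is hypothetical and the use of `std::regex` in place of `basic_regex<char8_t>` is an assumption of the example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Tries to assign `pattern` to `re`. Returns false if the pattern is
// rejected; in that case `re` keeps the expression it held before the call.
// Stand-in: std::regex replaces the proposed basic_regex<char8_t>.
inline bool try_assign(std::regex& re, const std::string& pattern) {
    try {
        re.assign(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;   // per the Effects clause, re is unchanged here
    }
}
```

A caller can therefore keep using the previous expression after a failed assignment, for instance when patterns come from user input.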

30.8.7.1.3 Constant operations [re.regex.special.char8_t.operations]

unsigned mark_count() const;

1 Effects: Returns the number of marked sub-expressions within the regular expression.

unsigned gname_to_gnumber(const char8_t* p) const;

2 Returns: gname_to_gnumber(string_type(p)).

unsigned gname_to_gnumber(const char8_t* p, size_t len) const;

3 Returns: gname_to_gnumber(string_type(p, len)).

template<class string_traits, class A>
  unsigned gname_to_gnumber(const basic_string<char8_t, string_traits, A>& s) const;

4 Throws: error_backref if s is an empty string or if no marked sub-expression within the regular expression has a group name identical to the UTF-8 string s.
5 Effects: Returns the group number of the marked sub-expression within the regular expression whose group name is identical to the UTF-8 string s.

template<class InputIterator>
  unsigned gname_to_gnumber(InputIterator first, InputIterator last) const;

6 Requires: InputIterator shall meet the Cpp17InputIterator requirements (23.3.5.2).
7 Returns: gname_to_gnumber(string_type(first, last)).

flag_type flags() const;

8 Effects: Returns a copy of the regular expression syntax flags that were passed to the object’s constructor or to the last call to assign.
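
The lookup performed by `gname_to_gnumber` can be pictured as a search in a table of group names. The class below is a hypothetical stand-in written only for this illustration. It is not part of the proposed interface, it stores names as `std::string` rather than `char8_t` strings, and it reports failure by throwing `std::regex_error` constructed with `error_backref`, which is one reading of the Throws paragraph above.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Hypothetical model of the name table that a regex object with named
// groups would maintain; only the lookup rule is illustrated.
class group_name_table {
public:
    // Records that the group called `name` has the group number `number`.
    void add(const std::string& name, unsigned number) { names_[name] = number; }

    // Returns the group number for `s`. Throws regex_error(error_backref)
    // if `s` is empty or no group with that name exists.
    unsigned gname_to_gnumber(const std::string& s) const {
        const auto it = s.empty() ? names_.end() : names_.find(s);
        if (it == names_.end())
            throw std::regex_error(std::regex_constants::error_backref);
        return it->second;
    }

private:
    std::map<std::string, unsigned> names_;
};
```

For a pattern such as `(?<year>\d+)-(?<month>\d+)`, the table would map `year` to 1 and `month` to 2.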

30.8.7.1.4 Locale [re.regex.special.char8_t.locale]

locale_type imbue(locale_type loc);

1 Returns: locale_type().

locale_type getloc() const;

2 Returns: locale_type().

30.8.7.1.5 Swap [re.regex.special.char8_t.swap]

void swap(basic_regex& e);

1 Effects: Swaps the contents of the two regular expressions.
2 Ensures: *this contains the regular expression that was in e, e contains the regular expression that was in *this.
3 Complexity: Constant time.

30.8.7.2 class basic_regex<char16_t> specializations [re.regex.special.char16_t]

namespace std {
  template<>
    class basic_regex<char16_t> {
    public:

          // types

      using value_type = char16_t;
      using traits_type = void;
      using string_type = basic_string<char16_t>;
      using flag_type = regex_constants::syntax_option_type;
      using locale_type = locale;

          // 30.5.1, constants

      static constexpr flag_type icase = regex_constants::icase;
      static constexpr flag_type nosubs = regex_constants::nosubs;
      static constexpr flag_type optimize = regex_constants::optimize;
      static constexpr flag_type multiline = regex_constants::multiline;
      static constexpr flag_type ECMAScript2019 = regex_constants::ECMAScript2019;
      static constexpr flag_type dotall = regex_constants::dotall;

          // construct/copy/destroy

      basic_regex();
      explicit basic_regex(const char16_t* p, flag_type f = regex_constants::ECMAScript2019);
      basic_regex(const char16_t* p, size_t len, flag_type f = regex_constants::ECMAScript2019);
      basic_regex(const basic_regex&);
      basic_regex(basic_regex&&) noexcept;
      template<class ST, class SA>
        explicit basic_regex(const basic_string<char16_t, ST, SA>& p,
                             flag_type f = regex_constants::ECMAScript2019);
      template<class ForwardIterator>
        basic_regex(ForwardIterator first, ForwardIterator last,
                    flag_type f = regex_constants::ECMAScript2019);
      basic_regex(initializer_list<char16_t>, flag_type = regex_constants::ECMAScript2019);

      ~basic_regex();

      basic_regex& operator=(const basic_regex&);
      basic_regex& operator=(basic_regex&&) noexcept;
      basic_regex& operator=(const char16_t* ptr);
      basic_regex& operator=(initializer_list<char16_t> il);
      template<class ST, class SA>
        basic_regex& operator=(const basic_string<char16_t, ST, SA>& p);

          // assign

      basic_regex& assign(const basic_regex& that);
      basic_regex& assign(basic_regex&& that) noexcept;
      basic_regex& assign(const char16_t* ptr, flag_type f = regex_constants::ECMAScript2019);
      basic_regex& assign(const char16_t* p, size_t len, flag_type f);
      template<class string_traits, class A>
        basic_regex& assign(const basic_string<char16_t, string_traits, A>& s,
                            flag_type f = regex_constants::ECMAScript2019);
      template<class InputIterator>
        basic_regex& assign(InputIterator first, InputIterator last,
                            flag_type f = regex_constants::ECMAScript2019);
      basic_regex& assign(initializer_list<char16_t>,
                          flag_type = regex_constants::ECMAScript2019);

          // const operations

      unsigned mark_count() const;
      unsigned gname_to_gnumber(const char16_t* p) const;
      unsigned gname_to_gnumber(const char16_t* p, size_t len) const;
      template<class string_traits, class A>
        unsigned gname_to_gnumber(const basic_string<char16_t, string_traits, A>& s) const;
      template<class InputIterator>
        unsigned gname_to_gnumber(InputIterator first, InputIterator last) const;
      flag_type flags() const;

          // locale

      locale_type imbue(locale_type loc);
      locale_type getloc() const;

          // swap

      void swap(basic_regex&);
    };

1 Same as the specification of the basic_regex<char8_t> specialization, except that the words char8_t and UTF-8 appearing in that text are replaced with char16_t and UTF-16, respectively.

If the wording "Same as the specification of ..." is not appropriate, the subclause above will be written out in full, following the form of [re.regex.special.char8_t].

30.8.7.3 class basic_regex<char32_t> specializations [re.regex.special.char32_t]

namespace std {
  template<>
    class basic_regex<char32_t> {
    public:

          // types

      using value_type = char32_t;
      using traits_type = void;
      using string_type = basic_string<char32_t>;
      using flag_type = regex_constants::syntax_option_type;
      using locale_type = locale;

          // 30.5.1, constants

      static constexpr flag_type icase = regex_constants::icase;
      static constexpr flag_type nosubs = regex_constants::nosubs;
      static constexpr flag_type optimize = regex_constants::optimize;
      static constexpr flag_type multiline = regex_constants::multiline;
      static constexpr flag_type ECMAScript2019 = regex_constants::ECMAScript2019;
      static constexpr flag_type dotall = regex_constants::dotall;

          // construct/copy/destroy

      basic_regex();
      explicit basic_regex(const char32_t* p, flag_type f = regex_constants::ECMAScript2019);
      basic_regex(const char32_t* p, size_t len, flag_type f = regex_constants::ECMAScript2019);
      basic_regex(const basic_regex&);
      basic_regex(basic_regex&&) noexcept;
      template<class ST, class SA>
        explicit basic_regex(const basic_string<char32_t, ST, SA>& p,
                             flag_type f = regex_constants::ECMAScript2019);
      template<class ForwardIterator>
        basic_regex(ForwardIterator first, ForwardIterator last,
                    flag_type f = regex_constants::ECMAScript2019);
      basic_regex(initializer_list<char32_t>, flag_type = regex_constants::ECMAScript2019);

      ~basic_regex();

      basic_regex& operator=(const basic_regex&);
      basic_regex& operator=(basic_regex&&) noexcept;
      basic_regex& operator=(const char32_t* ptr);
      basic_regex& operator=(initializer_list<char32_t> il);
      template<class ST, class SA>
        basic_regex& operator=(const basic_string<char32_t, ST, SA>& p);

          // assign

      basic_regex& assign(const basic_regex& that);
      basic_regex& assign(basic_regex&& that) noexcept;
      basic_regex& assign(const char32_t* ptr, flag_type f = regex_constants::ECMAScript2019);
      basic_regex& assign(const char32_t* p, size_t len, flag_type f);
      template<class string_traits, class A>
        basic_regex& assign(const basic_string<char32_t, string_traits, A>& s,
                            flag_type f = regex_constants::ECMAScript2019);
      template<class InputIterator>
        basic_regex& assign(InputIterator first, InputIterator last,
                            flag_type f = regex_constants::ECMAScript2019);
      basic_regex& assign(initializer_list<char32_t>,
                          flag_type = regex_constants::ECMAScript2019);

          // const operations

      unsigned mark_count() const;
      unsigned gname_to_gnumber(const char32_t* p) const;
      unsigned gname_to_gnumber(const char32_t* p, size_t len) const;
      template<class string_traits, class A>
        unsigned gname_to_gnumber(const basic_string<char32_t, string_traits, A>& s) const;
      template<class InputIterator>
        unsigned gname_to_gnumber(InputIterator first, InputIterator last) const;
      flag_type flags() const;

          // locale

      locale_type imbue(locale_type loc);
      locale_type getloc() const;

          // swap

      void swap(basic_regex&);
    };

1 Same as the specification of the basic_regex<char8_t> specialization, except that the words char8_t and UTF-8 appearing in that text are replaced with char32_t and UTF-32, respectively.

If the wording "Same as the specification of ..." is not appropriate, the subclause above will be written out in full, following the form of [re.regex.special.char8_t].

30.11.3 regex_search [re.alg.search]

Variants that take three bidirectional iterators are also added to the non-specialized regex_search, both for use by regex_iterator and for consistency with the specializations.

template<class BidirectionalIterator, class Allocator, class charT, class traits>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    BidirectionalIterator lookbehindlimit,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<charT, traits>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);

9 Returns: regex_search(first, last, m, e, flags).

template<class BidirectionalIterator, class charT, class traits>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    BidirectionalIterator lookbehindlimit,
                    const basic_regex<charT, traits>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);

10 Returns: regex_search(first, last, e, flags).

30.11.3.1 regex_search specializations [re.alg.search.special]

1 The header <regex> defines three specializations of the function template regex_search, each of which takes as one of its parameters an instance of basic_regex<char8_t>, basic_regex<char16_t>, or basic_regex<char32_t>, respectively.

2 [Note: These specializations need not be implemented separately; a typical implementation will use an internal iterator class template, specialized for char8_t, char16_t, and char32_t, that translates an input sequence of UTF-8, UTF-16, or UTF-32, respectively, into a sequence of Unicode code points, and will compare the translated sequence against the passed finite state machine in a base function shared by the three specializations. —end note]
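
The translation step mentioned in the note can be sketched as follows. This is only an illustration under stated assumptions: decode_utf8 is a hypothetical helper, not part of the proposal; it takes char rather than char8_t so that it compiles as C++17; and replacing malformed input with U+FFFD is an assumed policy that the proposal does not specify. Overlong forms and encoded surrogates are not rejected.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the proposal): translates a UTF-8 byte
// sequence into Unicode code points. Malformed bytes become U+FFFD, which is
// an assumed error policy. Overlong forms and surrogates are not rejected.
std::u32string decode_utf8(std::string_view in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and its payload bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else { out.push_back(U'\uFFFD'); ++i; continue; }  // invalid lead byte

        // Each continuation byte (10xxxxxx) contributes six more bits.
        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = static_cast<char32_t>((cp << 6) | (b & 0x3F));
        }
        if (!ok) { out.push_back(U'\uFFFD'); ++i; continue; }  // truncated or broken
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

A matcher written against the resulting code point sequence does not need to know which encoding the input used, which is the sharing the note describes.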

template<class BidirectionalIterator, class Allocator>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    BidirectionalIterator lookbehindlimit,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<char8_t>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);

3 Requires: Type BidirectionalIterator shall meet the Cpp17BidirectionalIterator requirements (23.3.5.5).
4 Effects: Determines whether there is some sub-sequence within the UTF-8 sequence [first, last) that matches the regular expression e. The iterator lookbehindlimit marks the earliest position to which the UTF-8 sequence may be read backwards from first. If first != lookbehindlimit then ^ shall match lookbehindlimit instead of first. The parameter flags is used to control how the expression is matched against the UTF-8 sequence. Returns true if such a sequence exists, false otherwise.
5 Ensures: m.ready() == true in all cases. If the function returns false, then the effect on parameter m is unspecified except that m.size() returns 0 and m.empty() returns true. Otherwise the effects on parameter m are given in Table 130.

template<class BidirectionalIterator, class Allocator>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<char8_t>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);

6 Returns: regex_search(first, last, first, m, e, flags).
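
For comparison, the existing std::regex interface has no lookbehindlimit parameter; the only way to tell the engine that characters precede first is the match_prev_avail flag, which makes --first a valid position to inspect. The sketch below uses only that existing mechanism and does not implement the proposed overload. finds_word_ab is a hypothetical helper named for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: searches text[offset, end) for "ab" as a whole word,
// using the existing std::regex API only.
// Precondition: if prev_avail is true, offset must be greater than 0.
bool finds_word_ab(const std::string& text, std::size_t offset, bool prev_avail) {
    static const std::regex re(R"(\bab\b)");
    std::regex_constants::match_flag_type flags = std::regex_constants::match_default;
    // With match_prev_avail the engine may look at the character just before
    // the search start when it evaluates \b there.
    if (prev_avail) flags |= std::regex_constants::match_prev_avail;
    return std::regex_search(text.begin() + static_cast<std::ptrdiff_t>(offset),
                             text.end(), re, flags);
}
```

Searching "xab" from offset 1 finds a match when the engine is not told about the preceding x, and no match when match_prev_avail is set, because \b then sees two adjacent word characters. An explicit lookbehindlimit would state how far back such inspection may go instead of implying a single preceding character.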

template<class BidirectionalIterator, class Allocator>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    BidirectionalIterator lookbehindlimit,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<char16_t>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);

7 Requires: Type BidirectionalIterator shall meet the Cpp17BidirectionalIterator requirements (23.3.5.5).
8 Effects: Determines whether there is some sub-sequence within the UTF-16 sequence [first, last) that matches the regular expression e. The iterator lookbehindlimit marks the earliest position to which the UTF-16 sequence may be read backwards from first. If first != lookbehindlimit then ^ shall match lookbehindlimit instead of first. The parameter flags is used to control how the expression is matched against the UTF-16 sequence. Returns true if such a sequence exists, false otherwise.
9 Ensures: m.ready() == true in all cases. If the function returns false, then the effect on parameter m is unspecified except that m.size() returns 0 and m.empty() returns true. Otherwise the effects on parameter m are given in Table 130.

template<class BidirectionalIterator, class Allocator>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<char16_t>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);

10 Returns: regex_search(first, last, first, m, e, flags).
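
For UTF-16 input, the translation to code points amounts to combining surrogate pairs. A minimal sketch follows, assuming a hypothetical helper decode_utf16 and assuming that a lone surrogate is passed through as a code point of the same value; the proposal text shown here does not state a policy for lone surrogates.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the proposal): translates UTF-16 code
// units into Unicode code points by combining surrogate pairs.
// Assumption: a lone surrogate is passed through unchanged.
std::u32string decode_utf16(std::u16string_view in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t u = in[i];
        const bool is_high = u >= 0xD800 && u <= 0xDBFF;
        if (is_high && i + 1 < in.size()) {
            const char32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                // High surrogate carries the top 10 bits, low the bottom 10.
                out.push_back(static_cast<char32_t>(
                    0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)));
                ++i;  // the low surrogate has been consumed as well
                continue;
            }
        }
        out.push_back(u);  // BMP code unit, or an unpaired surrogate
    }
    return out;
}
```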

template<class BidirectionalIterator, class Allocator>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    BidirectionalIterator lookbehindlimit,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<char32_t>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);

11 Requires: Type BidirectionalIterator shall meet the Cpp17BidirectionalIterator requirements (23.3.5.5).
12 Effects: Determines whether there is some sub-sequence within the UTF-32 sequence [first, last) that matches the regular expression e. The iterator lookbehindlimit marks the earliest position to which the UTF-32 sequence may be read backwards from first. If first != lookbehindlimit then ^ shall match lookbehindlimit instead of first. The parameter flags is used to control how the expression is matched against the UTF-32 sequence. Returns true if such a sequence exists, false otherwise.
13 Ensures: m.ready() == true in all cases. If the function returns false, then the effect on parameter m is unspecified except that m.size() returns 0 and m.empty() returns true. Otherwise the effects on parameter m are given in Table 130.

template<class BidirectionalIterator, class Allocator>
  bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                    match_results<BidirectionalIterator, Allocator>& m,
                    const basic_regex<char32_t>& e,
                    regex_constants::match_flag_type flags = regex_constants::match_default);

14 Returns: regex_search(first, last, first, m, e, flags).
30.12 Regular expression iterators [re.iter]

30.12.1.1 Constructors [re.regiter.cnstr]

2 Effects: Initializes begin and end to a and b, respectively, sets pregex to addressof(re), sets flags to m, then calls regex_search(begin, end, begin, match, *pregex, flags). If this call returns false the constructor sets *this to the end-of-sequence iterator.

30.12.1.4 Increment [re.regiter.incr]

3 Otherwise, if the iterator holds a zero-length match, the operator calls:

regex_search(start, end, begin, match, *pregex,
             flags | regex_constants::match_not_null | regex_constants::match_continuous)

If the call returns true the operator returns *this. Otherwise the operator increments start and continues as if the most recent match was not a zero-length match.

4 If the most recent match was not a zero-length match, the operator sets flags to flags | regex_constants::match_prev_avail and calls regex_search(start, end, begin, match, *pregex, flags). If the call returns false the iterator sets *this to the end-of-sequence iterator. The iterator then returns *this.
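
The increment rules above, including the special handling of zero-length matches, can be observed with the existing std::sregex_iterator, which follows the same steps apart from the added begin argument. The sketch below uses only the current standard library; all_matches is a hypothetical helper.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match that a regex_iterator yields.
// The iterator's operator++ performs the zero-length-match handling, so the
// loop terminates even when the pattern can match the empty string.
std::vector<std::string> all_matches(const std::string& text, const std::regex& re) {
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

With the pattern a* and the text "baaac", the iteration should yield an empty match at b, then "aaa", then empty matches at c and at the end of the text. After each empty match the iterator first retries at the same position with match_not_null | match_continuous and, failing that, advances by one element.
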
30.13 Modified ECMAScript regular expression grammar [re.grammar]

1 The regular expression grammar recognized by basic_regex objects constructed with the ECMAScript flag is that specified by ECMA-262 third edition, except as specified below.

14 The behavior of the internal finite state machine representation when used to match a sequence of characters is as described in ECMA-262 third edition. The behavior is modified according to any match_flag_type flags (30.5.2) specified when using the regular expression object in one of the regular expression algorithms (30.11). The behavior is also localized by interaction with the traits class template parameter as follows:

See also: ECMA-262 third edition 15.10
30.14 Modified ECMAScript2019 regular expression grammar [re.grammar2019]

1 The following production within the ECMAScript2019 grammar is clarified as follows:

CharacterEscape::HexEscapeSequence

Return the numeric value of the code unit in UTF-16 that is the SV of HexEscapeSequence.
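
As an illustration of this clarification, a \xHH escape denotes the UTF-16 code unit whose value is 16 times the first hexadecimal digit plus the second, so \xE9 denotes the code unit 0x00E9 rather than a raw byte of some other encoding. The sketch below is an assumption-laden example, not proposed wording; hex_digit_value and parse_hex_escape are hypothetical names.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper: value of one hexadecimal digit, or -1 if c is not one.
constexpr int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: parses exactly "\xHH" and returns the UTF-16 code unit
// it denotes, or nullopt if the text is not a well-formed hex escape.
std::optional<char16_t> parse_hex_escape(std::string_view s) {
    if (s.size() != 4 || s[0] != '\\' || s[1] != 'x') return std::nullopt;
    const int hi = hex_digit_value(s[2]);
    const int lo = hex_digit_value(s[3]);
    if (hi < 0 || lo < 0) return std::nullopt;
    return static_cast<char16_t>(16 * hi + lo);
}
```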

Others

The undated version of the ECMAScript Specification is added to references.

2 Normative references [intro.refs]
- (1.1) — Ecma International, ECMAScript Language Specification, Standard Ecma-262.
- (1.~~1~~2) — Ecma International, ECMAScript Language Specification, Standard Ecma-262, third edition, 1999.
- (1.~~2~~3) — INTERNET ENGINEERING TASK FORCE (IETF). RFC 6557: Procedures for Maintaining the Time Zone Database [online]. Edited by E. Lear, P. Eggert. February 2012 [viewed 2018-03-26]. Available at https://www.ietf.org/rfc/rfc6557.txt
- (1.~~3~~4) — ISO/IEC 2382 (all parts), Information technology — Vocabulary
- (1.~~4~~5) — ISO 8601:2004, Data elements and interchange formats — Information interchange — Representation of dates and times
- (1.~~5~~6) — ISO/IEC 9899:2011, Programming languages — C
- (1.~~6~~7) — ISO/IEC 9945:2003, Information Technology — Portable Operating System Interface (POSIX)
- (1.~~7~~8) — ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS)
- (1.~~8~~9) — ISO/IEC 10646-1:1993, Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane
- (1.~~9~~10) — ISO/IEC/IEEE 60559:2011, Information technology — Microprocessor Systems — Floating-Point arithmetic
- (1.~~10~~11) — ISO 80000-2:2009, Quantities and units — Part 2: Mathematical signs and symbols to be used in the natural sciences and technology

