Last update: 1997-05-20
9945-2-2
Class: No change
_____________________________________________________________________________
Topic: Regular expressions
Relevant Sections: B.5.2
Defect Report:
-----------------------
In Section B.5.2 - Description {of C Binding for Regular
Expression Matching}, the standard states that the re_nsub
member of the regex_t structure represents the number of
parenthesized subexpressions found in pattern. [Draft 12 of
ISO/IEC 9945-2:1993 (July 1992), p. 766, lines 329-331]
The standard then states that the pmatch argument
shall point to an array with at least nmatch
elements, and regexec() shall fill in the elements
of that array with offsets of the substrings of
string that correspond to the parenthesized
subexpressions of pattern: pmatch[i].rm_so shall
be the byte offset of the beginning and
pmatch[i].rm_eo shall be one greater than the byte
offset of the end of substring i. (Subexpression
i begins at the ith matched open parenthesis,
counting from 1.) Offsets in pmatch[0] shall
identify the substring that corresponds to the
entire regular expression.
[Ibid., p. 766-767, lines 339-346]
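The offset rules quoted above can be illustrated with a short sketch. It assumes a POSIX system providing `<regex.h>` and compiles the pattern as an extended RE (`REG_EXTENDED`); the helper name `match_offsets` is hypothetical and not part of the standard. It shows that `pmatch[0]` covers the whole match, that `pmatch[i]` covers the substring for the ith parenthesized subexpression, and that `rm_eo` is one past the last byte, so a substring's length is `rm_eo - rm_so`.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile pattern as an ERE, run regexec() on s,
 * and store up to nmatch offsets in pm (which must have at least
 * nmatch elements). Returns 0 on a match, nonzero otherwise. */
int match_offsets(const char *pattern, const char *s,
                  regmatch_t pm[], size_t nmatch)
{
    regex_t re;
    int rc;

    assert(nmatch == 0 || pm != NULL);   /* caller supplies the array */
    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;                       /* compilation failed */

    /* pm[0] receives the whole match; pm[i] (i >= 1) receives the
     * substring for the ith parenthesized subexpression. Each rm_eo
     * is one greater than the offset of the last matched byte. */
    rc = regexec(&re, s, nmatch, pm, 0);
    regfree(&re);
    return rc;                           /* 0 or REG_NOMATCH */
}
```

For the pattern `(ab)(c)` applied to `xxabcyy` with three elements, `pm[0]` spans bytes 2 to 5 ("abc"), `pm[1]` spans 2 to 4 ("ab"), and `pm[2]` spans 4 to 5 ("c").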
Thus, if pmatch[] contains nmatch elements, it can hold the
substrings for at most nmatch-1 parenthesized subexpressions,
since pmatch[0] is reserved for the substring matched by the
entire regular expression.
The standard also states that ``if there are more than
nmatch subexpressions in pattern (pattern itself counts as a
subexpression), then regexec() [...] shall record only the
first nmatch substrings.'' [Ibid., p. 767, lines 347-350]
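The truncation rule can be sketched the same way, again assuming a POSIX `<regex.h>` with `REG_EXTENDED`; the function `run_truncated` and the sentinel value -7 are hypothetical choices for illustration only. The pattern `(ab)(c)` has two parenthesized subexpressions, so three elements are needed to record everything; passing nmatch = 2 leaves room for only nmatch-1 = 1 parenthesized subexpression, and the third element is never written.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical demo: run "(ab)(c)" on s with a caller-chosen nmatch.
 * pm has three elements: the whole match plus two parenthesized
 * subexpressions. */
int run_truncated(const char *s, regmatch_t pm[3], size_t nmatch)
{
    regex_t re;
    int rc;

    assert(nmatch <= 3);                 /* pm has exactly 3 elements */

    /* Mark every slot so entries regexec() leaves alone stay visible. */
    for (size_t i = 0; i < 3; i++)
        pm[i].rm_so = pm[i].rm_eo = -7;

    if (regcomp(&re, "(ab)(c)", REG_EXTENDED) != 0)
        return -1;                       /* compilation failed */

    /* regexec() may fill in only pm[0] .. pm[nmatch - 1]. */
    rc = regexec(&re, s, nmatch, pm, 0);
    regfree(&re);
    return rc;                           /* 0 or REG_NOMATCH */
}
```

With `run_truncated("xxabcyy", pm, 2)`, the whole match ("abc") and subexpression 1 ("ab") are recorded, while `pm[2]` keeps its sentinel values: subexpression 2 ("c") is matched but not recorded.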
Lines 347-350 appear to contradict lines 339-346; the latter
talks about parenthesized subexpressions, while the former
mentions plain subexpressions. Is the intent of the
standard to allow the re_nsub member to include the
subexpression representing the entire regular expression in
the count (since it is considered a subexpression on page
767, lines 347-350), or does it only count explicitly
parenthesized subexpressions? We believe that answering this
question is the easiest way to resolve the ambiguity.
WG15 response for 9945-2:1993
-----------------------------------
The subexpression representing the entire RE is to be included in the
count represented in the re_nsub member. No change in wording is
necessary.
Rationale for Interpretation:
-----------------------------
The passage quoted in the request from Section B.5.2 (lines 327-338
in the published Standard) contains the phrase "(pattern itself counts
as a subexpression)", which the committee considers key to resolving
this apparent conflict.
_____________________________________________________________________________