WG15 Defect Report Ref: 9945-2-02
Topic: Regular expressions

This is an approved interpretation of 9945-2:1993.


Last update: 1997-05-20


	Class: No change


	Topic:			Regular expressions
	Relevant Sections:	B.5.2

Defect Report:

          In Section B.5.2 - Description {of C Binding for Regular
          Expression Matching}, the standard states that the re_nsub
          member of the regex_t structure represents the number of
          parenthesized subexpressions found in pattern. [Draft 12 of
          ISO/IEC 9945-2:1993 (July 1992), p. 766, lines 329-331]

          The standard then states that the pmatch argument

               shall point to an array with at least nmatch
               elements, and regexec() shall fill in the elements
               of that array with offsets of the substrings of
               string that correspond to the parenthesized
               subexpressions of pattern: pmatch[i].rm_so shall
               be the byte offset of the beginning and
               pmatch[i].rm_eo shall be one greater than the byte
               offset of the end of substring i. (Subexpression
               i begins at the ith matched open parenthesis,
               counting from 1.) Offsets in pmatch[0] shall
               identify the substring that corresponds to the
               entire regular expression.

          [Ibid., pp. 766-767, lines 339-346]

          Thus, if pmatch[] contains nmatch elements, it can hold
          offsets for only nmatch-1 parenthesized subexpressions,
          since pmatch[0] represents the entire regular expression.

          The standard also states that ``if there are more than
          nmatch subexpressions in pattern (pattern itself counts as a
          subexpression), then regexec() [...] shall record only the
          first nmatch substrings.'' [Ibid., p. 767, lines 347-350]

          Lines 347-350 appear to contradict lines 339-346: lines
          339-346 talk about parenthesized subexpressions, while lines
          347-350 mention plain subexpressions. Is the intent of the
          standard to allow the re_nsub member to include the
          subexpression representing the entire regular expression in
          the count (since it is considered a subexpression on page
          767, lines 347-350), or does it only count explicitly
          parenthesized subexpressions? We believe this is the
          easiest way to rectify the ambiguity.
WG15 response for 9945-2:1993 
The subexpression representing the entire RE is to be included in the
count represented in the re_nsub member. No change in wording is
required.

Rationale for Interpretation:
The section quoted in the request, from Section B.5.2 (but lines 327-338
in the Standard), contains the phrase "(pattern itself counts as a
subexpression)", which the committee considers key to interpreting this
apparent conflict.
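
Illustration of the pmatch layout discussed in the request:
The following is a minimal sketch, not text from the standard, of how the
pmatch array quoted in the request is filled in: pmatch[0] receives the
offsets for the entire regular expression and pmatch[1] onward receive the
parenthesized subexpressions, so an array of nmatch elements leaves room for
at most nmatch-1 parenthesized subexpressions. The helper name match_offsets
is hypothetical, and the choice of REG_EXTENDED is an assumption made for
readability. The sketch deliberately does not depend on the value of
re_nsub; the caller chooses nmatch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not defined by the standard): compile `pattern`
 * as an extended regular expression, match it against `string`, and let
 * regexec() record at most `nmatch` entries in `pmatch`.
 *
 * Returns 0 on a match, REG_NOMATCH if there is no match, and -1 if the
 * pattern does not compile.
 */
int match_offsets(const char *pattern, const char *string,
                  size_t nmatch, regmatch_t pmatch[])
{
    regex_t re;
    int rc;

    assert(pattern != NULL && string != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /*
     * pmatch[0] describes the whole match; pmatch[i] for i >= 1
     * describes the ith parenthesized subexpression. Only the first
     * nmatch entries are recorded.
     */
    rc = regexec(&re, string, nmatch, pmatch, 0);

    regfree(&re);
    return rc;
}
```

For the pattern "(b+)(c+)" and the string "abbccd", a three-element array
receives offsets 1 and 5 in pmatch[0] (the whole match "bbcc"), 1 and 3 in
pmatch[1], and 3 and 5 in pmatch[2]. Calling the same helper with nmatch set
to 2 records only pmatch[0] and pmatch[1], which is the "first nmatch
substrings" behaviour quoted in the request.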