WG15 Defect Report Ref: 9945-2-85
Topic: ERE's


This is an approved interpretation of 9945-2:1993.


Last update: 1997-05-20


								9945-2-85

 _____________________________________________________________________________

	Topic:			ERE's
	Relevant Sections:	2.8.4.1.2


Defect Report:
-----------------------
From: jeffhe@mks.com (Jeff Hendrikse)
Date: Tue, 15 Nov 1994 14:53:57 -0500

Sections:
	2.8.4.1.2, ERE Special Characters, lines 3069-3072 and
	B.5.3, Returns, line 424.

Problem:
	Section 2.8.4.1.2 states, with respect to the repeat
	characters *, +, ?, and {, that
		"Any of the following uses produces undefined results:  
		- If these characters appear first in an ERE, or immediately
		  following a vertical line, circumflex, or left parenthesis."

	This implies that, for instance, the RE "*foo" has undefined results.

	In section B.5.3, discussing the return codes from the regexec
	and regcomp C API's, Table B-10 includes the error:
		"REG_BADRPT	?, *, or + not preceded by a valid RE"

	This text seems to overlap and contradict the previous text.  If the
	repeater is at the beginning of a RE, then it is not preceded by 
	a valid regular expression, which then results in the error.
	This section implies that the same RE, "*foo", would result in the
	error REG_BADRPT, since the empty string preceding the repeat
	character is not a valid RE.

	We would like to see clarification of these two points.  

Recommendation:
	It is requested that the implementation be allowed undefined results
	if the repeat character appears first in the regular expression.
	Historically, this condition would either be treated as an 
	error, or the repeat character would not be treated specially,
	as is the case with BRE's.

	If the repeat character appears after a regular expression which
	is not a valid expression, this condition should trigger the error.

	So, the expression "*foo" will produce undefined results, while the
	expression "f+*oo" would cause a REG_BADRPT (or REG_BADPAT) error
	condition.


WG15 response for 9945-2:1993
-----------------------------------

The standard does not require the implementation to detect any particular
error, nor to return an error in any particular situation.  It requires
only that each listed error be returned solely when the indicated error
is detected by the implementation.

So, regcomp() may return REG_BADRPT if given the pattern "*foo", since
the '*' certainly isn't preceded by a valid ERE specified by the standard.
It may also do just about anything else, since the interpretation of this 
ERE is undefined.
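
As an illustration only, the following minimal sketch probes what a given
implementation does with a pattern.  The helper name probe_ere() is
hypothetical and not part of the standard; it assumes a hosted C
environment providing <regex.h>, and passes REG_EXTENDED so that the
pattern is interpreted as an ERE.  The outcome for "*foo" is whatever the
implementation chooses, so no particular result should be relied upon.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the standard): compile `pattern` as an
 * ERE and store a human-readable status in `msg`.  Returns the value
 * returned by regcomp().
 */
static int probe_ere(const char *pattern, char *msg, size_t msglen)
{
    regex_t preg;
    int rc;

    assert(pattern != NULL && msg != NULL && msglen > 0);

    rc = regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc == 0) {
        /* The implementation accepted the pattern. */
        snprintf(msg, msglen, "accepted");
        regfree(&preg);
    } else {
        /* Any nonzero code is a failure; do not assume which one. */
        regerror(rc, &preg, msg, msglen);
    }
    return rc;
}
```

A caller might run probe_ere("*foo", buf, sizeof buf) and
probe_ere("f+*oo", buf, sizeof buf) and print buf after each call.  The
reported status may differ between C libraries, which is consistent with
the interpretation above.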

The interpretation request is based on the conclusion that

  regcomp (&preg, "*foo", REG_EXTENDED);

could reasonably dump core, because the interpretation of "*foo" is 
undefined.  

Calling regcomp() with a pattern such as "*foo" produces undefined
results.  A conforming application shall not expect the return code
REG_BADRPT from regcomp() if it uses an ERE with a repeat character
appearing first or immediately following any of the characters
mentioned in section 2.8.4.1.2.

The standard clearly states behavior for regular expressions and 
conforming implementations must conform to this.
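
Since an application cannot rely on REG_BADRPT here, one option is to
reject such patterns before they reach regcomp().  The sketch below is an
assumption-laden illustration, not text from the standard: the names
skip_bracket() and ere_repeat_is_undefined() are hypothetical, and the
check covers only the rule quoted in this report (a repeat character
appearing first, or immediately after a vertical line, circumflex, or
left parenthesis).  Backslash-escaped characters and bracket expressions
are skipped so that their contents are not mistaken for operators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: `p` points at '['.  Returns a pointer to the
 * closing ']' of the bracket expression, or to the terminating NUL if
 * the bracket expression is unterminated.
 */
static const char *skip_bracket(const char *p)
{
    p++;                          /* skip '[' */
    if (*p == '^')
        p++;                      /* non-matching list marker */
    if (*p == ']')
        p++;                      /* a leading ']' is an ordinary character */
    while (*p != '\0' && *p != ']') {
        if (p[0] == '[' && (p[1] == ':' || p[1] == '.' || p[1] == '=')) {
            char kind = p[1];     /* inside [: :], [. .] or [= =] */
            p += 2;
            while (*p != '\0' && !(p[0] == kind && p[1] == ']'))
                p++;
            if (*p != '\0')
                p += 2;           /* skip the closing ":]", ".]" or "=]" */
        } else {
            p++;
        }
    }
    return p;
}

/*
 * Hypothetical pre-check: returns true if `ere` contains *, +, ? or {
 * at the start of the pattern or immediately after '|', '^' or '('.
 */
static bool ere_repeat_is_undefined(const char *ere)
{
    bool at_start = true;         /* true at start and after '|', '^', '(' */
    const char *p;

    assert(ere != NULL);

    for (p = ere; *p != '\0'; p++) {
        switch (*p) {
        case '*': case '+': case '?': case '{':
            if (at_start)
                return true;
            at_start = false;
            break;
        case '|': case '^': case '(':
            at_start = true;
            break;
        case '\\':
            if (p[1] != '\0')
                p++;              /* skip the escaped character */
            at_start = false;
            break;
        case '[':
            p = skip_bracket(p);
            if (*p == '\0')
                return false;     /* unterminated; leave it to regcomp() */
            at_start = false;
            break;
        default:
            at_start = false;
            break;
        }
    }
    return false;
}
```

With this sketch, ere_repeat_is_undefined("*foo") is true, while
ere_repeat_is_undefined("f+*oo") is false because the '*' there follows
a '+', which is not one of the contexts listed in the quoted rule.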

Rationale
-------------
None.


Forwarded to Interpretations group: 16 Nov 94
Response received: Feb 10 1995
Proposed Resoln forwarded: 13th Feb 1995
Finalised: March 28th 1995
 _____________________________________________________________________________