WG15 Defect Report Ref: 9945-2-91
Topic: awk - gsub/sub


This is an approved interpretation of 9945-2:1993.


Last update: 1997-05-20


								9945-2-91

 _____________________________________________________________________________

	Topic:			awk - gsub/sub
	Relevant Sections:	4.1.7.6.2.2


Defect Report:
-----------------------

	From: mark@mks.com (Mark Funkenhauser)
	Date: Tue, 13 Dec 1994 14:12:26 -0500 (EST)

I would like to request an official, binding interpretation from the
WG15 concerning the following point in ISO/IEC 9945-2:1993 (POSIX.2).
 
 
In section 4.1.7.6.2.2, page 178, lines 647-653,
the text for gsub() and sub() concerning the use of backslash states:
 
        For each occurrence of backslash (\) encountered when scanning the
        string _repl_ from beginning to end, the next character shall be
        taken literally and lose its special meaning (e.g.  \& shall be
        interpreted as a literal ampersand character).  Except for & and \,
        it is unspecified what the special meaning of any such character
        is.
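
The scanning rule quoted above can be sketched as a small model. This is
an illustration in Python, not awk itself, and the function name
expand_repl_posix and its parameters are hypothetical. It assumes that an
unescaped & stands for the matched text, that no character other than &
and \ has a special meaning (the standard leaves this unspecified), and
that a trailing backslash stands for itself.

```python
def expand_repl_posix(repl: str, matched: str) -> str:
    """Model of the quoted rule: a backslash makes the next character literal."""
    out = []
    i = 0
    while i < len(repl):
        ch = repl[i]
        if ch == "\\":
            if i + 1 < len(repl):
                # The character after a backslash is taken literally.
                out.append(repl[i + 1])
                i += 2
            else:
                # Assumption: a trailing backslash stands for itself.
                out.append("\\")
                i += 1
        elif ch == "&":
            # An unescaped ampersand is replaced by the matched text.
            out.append(matched)
            i += 1
        else:
            # Assumption: every other character has no special meaning.
            out.append(ch)
            i += 1
    return "".join(out)
```

Under this model, the two-character sequence \t in _repl_ yields the
letter t rather than a tab, because the backslash only strips any special
meaning from the character that follows it.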
 
This text implies that the only portable way to write the string _repl_ is to
put a backslash in front of every literal character, since there is no way
to tell what characters may be special for any particular implementation of
awk.

This wording also does not seem to allow historical behaviour. Historically,
awk treated the backslash character as an escape character and allowed
sequences such as "\b" or "\t" to indicate a backspace and a tab,
respectively.
The historical behaviour can be described as:
        In _repl_, all characters are treated as literals, except for 
        ampersand (&) and backslash (\).  An ampersand (&) appearing
        in the string _repl_ shall be replaced by the string from _in_
        that matches the ERE.  Backslashes (\) in _repl_ introduce an
        escape sequence.  The sequence \& is replaced by a literal
        ampersand and \\ is replaced by a literal backslash.  The
        behaviour of a backslash followed by any other character is
        unspecified.
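
The historical description above can be sketched the same way. This is an
illustrative Python model, not awk, and the names ESCAPES and
expand_repl_historical are hypothetical. The escape table holds only the
two examples named in this report (\b and \t). For a backslash followed
by any other character, which the description leaves unspecified, the
model assumes that both characters are kept unchanged.

```python
# Assumed escape table: only the two examples named in the report.
ESCAPES = {"b": "\b", "t": "\t"}


def expand_repl_historical(repl: str, matched: str) -> str:
    """Model of the historical rule: backslash introduces an escape sequence."""
    out = []
    i = 0
    while i < len(repl):
        ch = repl[i]
        if ch == "&":
            # An unescaped ampersand is replaced by the matched text.
            out.append(matched)
            i += 1
        elif ch == "\\" and i + 1 < len(repl):
            nxt = repl[i + 1]
            if nxt in ("&", "\\"):
                # \& and \\ give a literal ampersand and backslash.
                out.append(nxt)
            elif nxt in ESCAPES:
                out.append(ESCAPES[nxt])
            else:
                # Unspecified case; assumption: keep both characters.
                out.append("\\" + nxt)
            i += 2
        else:
            # All other characters (and a trailing backslash) are literal.
            out.append(ch)
            i += 1
    return "".join(out)
```

Under this model, plain characters in _repl_ need no escaping, and \t
yields a tab character rather than the letter t.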

Can you please provide rationale as to why this non-historical behaviour
was documented in 9945-2:1993, or provide an interpretation that allows
a conforming implementation to provide the historical behaviour?
If not, then this matter should be forwarded to the sponsors.
 
 
Thank you for your attention to this matter.
 




WG15 response for 9945-2:1993
-----------------------------------

The standard states behavior for the backslash (\) character, and conforming
implementations must conform to this.  However, concerns have been raised
about this behavior, and they are being referred to the sponsor.

Rationale
-------------
Forwarded to Interpretations group: 14 Dec 94
Response received: Feb 10 1995
Proposed resolution forwarded: 13th Feb 1995
Finalised: March 28th 1995
 _____________________________________________________________________________