Last update: 1997-05-20
9945-2-24
Class: Defect situation
The standard states what it states, and conforming implementations
must conform to this. However, concerns have been raised about this
which are being referred to the Sponsors of the standard for consideration as
a future amendment.
_____________________________________________________________________________
Topic: tr
Relevant Sections: 4.64.5.1
Defect Report:
-----------------------
Component: tr - Sect 4.64.5.1
Submitted by: Alex White
Ref. No.: tr.1
Proposed Resolution:
The interpretation request correctly describes what is in
the standard but this was not what was intended. The
working group will draft and propose a change to .2b to
describe what was originally intended.
_____________________________________________________________________________
In Section 4.64.5.1 - Standard Input {of tr}, the standard
states that the standard input to tr ``can be any file
type.'' [Draft 12 of ISO/IEC 9945-2:1993 (July 1992), p.
483, line 10456]
However, in Section 4.64.5.3 - Environment Variables {of
tr}, the standard states that the LC_COLLATE variable
``shall determine the behaviour of range expressions and
equivalence classes.'' [Ibid., p. 483, lines 10499-10500]
and in Section 4.64.7 - Extended Description {of tr}, the
standard states that the \octal construct
    [...] can be used to represent characters with
    specific coded values. An octal sequence shall
    consist of a backslash followed by the longest
    sequence of one-, two-, or three-octal-digit
    characters (01234567). The sequence shall cause
    the character whose encoding is represented by the
    one-, two-, or three-digit octal integer to be
    placed into the array.
[Ibid., p. 484, lines 10525-10530]
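The quoted rule for \octal sequences can be sketched as follows. This
is a minimal illustration only, not text from the standard: the names
OCTAL_ESCAPE and octal_values are hypothetical, and the sketch ignores
other escapes (such as an escaped backslash) that a real tr operand
parser would have to handle.

```python
import re

# A backslash followed by the longest run of one to three octal digits.
# The {1,3} quantifier is greedy, so it always takes the longest match.
OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")

def octal_values(operand):
    """Return the coded value of each \\octal sequence found in operand.

    Hypothetical helper for illustration; it does not process any other
    escape sequences or range syntax.
    """
    return [int(digits, 8) for digits in OCTAL_ESCAPE.findall(operand)]

print(octal_values(r"\200-\377"))  # [128, 255]
print(octal_values(r"\0-\177"))    # [0, 127]
```

Under this rule a non-octal character ends the sequence, so only the
digits 0 through 7 contribute to the coded value.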
These two statements cause tr to be unusable on any files of
type other than text. Historically, tr has been used to
manipulate files containing binary data. For example, the
perfectly valid and useful construct
    tr -d '\200-\2ff'
to delete all characters with the top bit set, or even
    tr '\200-\2ff' '\0-\1ff'
to strip the top bit (both of which are useful operations on
binary files), no longer work.
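The historical, byte-ordered reading of these two constructs can be
sketched as below. This is an illustration under stated assumptions,
not an implementation of tr: the endpoints are taken to be octal \377
and \177 (the operands as written above use digits that are not valid
octal), and the names HIGH, LOW, delete_high_bytes and strip_top_bit
are hypothetical.

```python
# Assumed byte-valued ranges: \200-\377 and \0-\177, compared by
# coded value rather than by collating order.
HIGH = bytes(range(0o200, 0o400))  # bytes 0x80 through 0xFF
LOW = bytes(range(0o000, 0o200))   # bytes 0x00 through 0x7F

def delete_high_bytes(data):
    # Byte-ordered analogue of deleting the range \200-\377.
    return data.translate(None, HIGH)

def strip_top_bit(data):
    # Byte-ordered analogue of mapping \200-\377 onto \0-\177:
    # each byte with the top bit set maps to the same byte without it.
    return data.translate(bytes.maketrans(HIGH, LOW))

sample = b"A\x80B\xc1\xff"
print(delete_high_bytes(sample))  # b'AB'
print(strip_top_bit(sample))      # b'A\x00BA\x7f'
```

Both operations work on arbitrary binary input because no byte is ever
interpreted as a character of the current locale.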
For example, in the PC character set, \200 is a C-cedilla,
and \2ff is not defined as a glyph. Therefore, according to
Section 4.64.5.3, the most likely interpretation is that
characters which collate from C-cedilla (probably the letter
D) through the end will all match here. This is clearly
wrong, not historical practice, and of no use whatsoever.
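The difference between the two readings can be made concrete with a toy
model. Everything here is an assumption made for illustration: the PC
character set is taken to be code page 437, and toy_collation_key is a
made-up ordering that sorts an accented letter next to its base letter.
It is not the collating sequence of any real locale.

```python
import unicodedata

def toy_collation_key(byte):
    # Hypothetical collation: order by case-folded base letter first,
    # then by the character itself, so C-cedilla sorts just after C.
    ch = bytes([byte]).decode("cp437")
    base = unicodedata.normalize("NFD", ch)[0]
    return (base.casefold(), ch)

START = 0o200  # C-cedilla in code page 437

# Range from \200 "through the end", under each reading.
byte_range = set(range(START, 256))
collation_range = {
    b for b in range(256)
    if toy_collation_key(b) >= toy_collation_key(START)
}

print(ord("D") in byte_range)       # False
print(ord("D") in collation_range)  # True
```

In this model the collation-ordered range sweeps in ordinary ASCII
letters from D onward, which is the behaviour the submitter objects to.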
May we interpret the standard as permitting octal escape
sequences as endpoints of a range to not use the collating
order, but rather byte ordering?
WG15 response for 9945-2:1993
-----------------------------------
The standard is clear in its requirement that octal sequences used as
endpoints in a range be treated as collating elements. The
implementation must follow this requirement. Concern over the wording of
this area of this standard has been forwarded to the sponsors.
Rationale for Interpretation:
-----------------------------
None.
_____________________________________________________________________________