Last update: 1997-05-20
9945-2-25
Class: Defect situation
The standard states what it states, and conforming implementations
must conform to it. However, concerns have been raised about this
requirement, and they are being referred to the sponsors of the standard
for consideration in a future amendment.
Topic: tr
Relevant Sections: 4.64.3, 4.64.7
Defect Report:
-----------------------
In Section 4.64.3 - Options {of tr}, the standard states
that the -c option means ``complement the set of characters
specified by string1. See 4.64.7.'' [Draft 12 of ISO/IEC
9945-2:1993 (July 1992), p. 482, line 10472], and in Section
4.64.7 - Extended Description {of tr}, the standard states
that [:class:] ``[r]epresents the range of collating
elements between the range endpoints, inclusive, as defined
by the current setting of the LC_COLLATE locale category.''
[Ibid., p. 484, lines 10544-10546]
In Section 4.64.7 - Extended Description {of tr}, the
standard states that
    [i]f the -c option is specified, the complement of
    the characters specified by string1 -- the set of all
    characters in the current character set, as
    defined by the setting of LC_CTYPE, except for
    those actually specified in the string1 operand --
    shall be placed in the array in ascending
    collation sequence, as defined by the current
    setting of LC_COLLATE.
[Ibid., p. 485, lines 10590-10594]
However, if the character set is ISO 646, for example, then
the command
    tr -c '[:print:]' '?'
which should translate all unprintable characters to
question-mark characters, will pass bytes with the high bit
set through unchanged. This is clearly wrong, not
historical practice, and violates the principle of least
astonishment.
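The behaviour described above can be illustrated with a small model. This
is a sketch only, not an implementation of tr: it assumes that ISO 646 is
modelled as the 7-bit code points 0 through 127, that [:print:] covers
0x20 through 0x7E, and that every member of the complement maps to the
single replacement character, as the report describes. The names CHARSET,
PRINT and tr_complement_charset are hypothetical.

```python
# Sketch under stated assumptions: ISO 646 modelled as code points 0..127,
# [:print:] approximated as 0x20..0x7E. Not an implementation of tr.
CHARSET = set(range(128))        # characters defined by LC_CTYPE (assumed)
PRINT = set(range(0x20, 0x7F))   # stand-in for the [:print:] class


def tr_complement_charset(data: bytes, replacement: bytes = b"?") -> bytes:
    """Replace only the characters in CHARSET that are not in PRINT."""
    complement = CHARSET - PRINT
    rep = replacement[0]
    # Bytes outside CHARSET are not in the complement, so they pass through.
    return bytes(rep if b in complement else b for b in data)


sample = b"ok\x01\x80\xff\n"
result = tr_complement_charset(sample)
print(result)  # b'ok?\x80\xff?'
```

In this model the control characters 0x01 and newline are replaced, while
0x80 and 0xFF are left unchanged because they are not members of the
character set from which the complement is formed.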
The tr utility is a binary file manipulator. May we
interpret the wording of lines 10590-10594 as
    If the -c option is specified, the complement of
    the characters specified by string1 -- the set of all
    possible machine byte patterns, except for those
    actually specified in the string1 operand -- shall be
    placed in the array in ascending collation
    sequence, as defined by the current setting of
    LC_COLLATE.
to fix this problem?
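The requested reading can be modelled as follows. This is a sketch under
stated assumptions, not an implementation of tr: it assumes 8-bit bytes,
so that "all possible machine byte patterns" means the values 0 through
255, and it approximates [:print:] as 0x20 through 0x7E. The names
ALL_BYTES, PRINT and tr_complement_bytes are hypothetical.

```python
# Sketch under stated assumptions: 8-bit bytes, [:print:] approximated as
# 0x20..0x7E. Models the reading requested in the report, not tr itself.
ALL_BYTES = set(range(256))      # every possible byte pattern (assumed 8-bit)
PRINT = set(range(0x20, 0x7F))   # stand-in for the [:print:] class


def tr_complement_bytes(data: bytes, replacement: bytes = b"?") -> bytes:
    """Replace every byte value that is not in PRINT."""
    complement = ALL_BYTES - PRINT
    # Build a 256-entry translation table indexed by byte value.
    table = bytes(replacement[0] if b in complement else b for b in range(256))
    return data.translate(table)


sample = b"ok\x01\x80\xff\n"
proposed = tr_complement_bytes(sample)
print(proposed)  # b'ok????'
```

Under this reading the bytes 0x80 and 0xFF are replaced along with the
control characters, which is the result the report expects.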
WG15 response for 9945-2:1993
-----------------------------------
The standard is clear in its requirement that the -c option in this case
will place the complement of the characters specified in string1 in the
array.
It is also clear that the definition of the complement is "the
set of all current characters in the current character set, except for
those actually specified in string1". This precludes an interpretation
allowing the set of all possible machine byte patterns to be added to
the complement set.
The implementation must follow these requirements. Concern over the
wording of this area of this standard has been forwarded to the
sponsors.
Rationale for Interpretation:
-----------------------------
None.
_____________________________________________________________________________