From fxrojas@nlsarch.austin.ibm.com Sun Feb 2 18:30:12 1992
Received: from nlsarch.austin.ibm.com by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
        id AA26753; Sun, 2 Feb 92 18:30:12 +0100
Received: by nlsarch.austin.ibm.com (AIX 9138320c 3.2/UCB 5.64/4.03)
        id AA15145; Sun, 2 Feb 1992 10:40:27 -0600
Message-Id: <9202021640.AA15145@nlsarch.austin.ibm.com>
To: erik@sran8.sra.co.jp
Cc: fxrojas@nlsarch.austin.ibm.com, ISO10646@jhuvm.bitnet, Unicode@sun.com,
        i18n@dkuug.dk, ietf-822@dimacs.rutgers.edu
Date: Sun, 02 Feb 92 10:40:27 -0600
From: Frank Rojasoel)
To: ISO10646@jhuvm.bitnet, Unicode@sun.c
X-Charset: ASCII
X-Char-Esc: 29

------- Blind-Carbon-Copy

Erik, while I personally prefer this approach for today's interchange, I do not support it as an internal processing encoding. It is only valid as a mail interchange mechanism. Also, as a goal, I think we should try to standardize 8-bit mail rather than adjusting the standard to your conclusion that 7 bit is here forever.

Frank

---------------------------------- more comments -----------------------------

Multilingual Character Encoding for Internet Messages

Erik M. van der Poel, SRA
January 31, 1992

* Abstract

This document describes a multilingual character encoding for use in Internet messages. This encoding is designed to be highly compatible with existing electronic mail and network news handling software.

Erik, I have not gone through this in detail, but I wanted to let you know that the latest release of AIX (3.2) will provide this capability via the converters. Effectively, we provide a conversion from any code set supported in a locale to a canonical format which we named "fold7". As such, converting "ISO8859-1" to "fold7" turns the Latin-1 characters into a 7-bit encoding using ISO 2022 escape sequences. For both SJIS and eucJP, we follow the JUNET convention. This is exactly what I've been proposing within the OSF SIG, for OSF to provide in its mail service.
That is, modify either sendmail or the mailers themselves to use iconv for this type of conversion. This would solve the problem of mail interoperability between systems that use different code sets internally. We don't do mnemonics as described in your paper.

*** Full ISO 2022

ISO 2022 has mechanisms for encoding text in either 7 or 8 bits. Taking the 7-bit subset, then, may seem to be a feasible approach, but ISO 2022 has very many different ways of encoding the same information. This is rather complex and therefore likely to be implemented wrongly. So full ISO 2022 is rejected.

Not when you combine it with the Compound Text rules, which restrict ISO 2022 to a subset. The problem with CT is that it is restricted to graphic characters only; control characters cannot be encoded using CT. As such, "fold7" uses the rules of CT but is not limited to graphic characters.

*** Compound Text

Compound Text [CTEXT] is an MIT X Consortium standard intended to be used in inter-application communications. It uses a subset of ISO 2022, and is therefore relatively simple, but it sets the 8th bit.

Not necessarily. It allows either 7 or 8 bits. I think you are thinking of implementations of CT that only use 8 bits; the spec itself does allow 7 bits.

* Conformance

Implementations that are claimed to conform to this standard need not be able to display all of the character sets specified above, but they must be able to parse this multilingual encoding to the extent of being able to discriminate between character sets that the implementation can and cannot display. That is, all displayable portions must be displayed. Non-displayable portions should be "shown" to the user in some fashion, unspecified by this standard. (One possibility is to simply say e.g. "Undisplayable Greek appeared here".)

The display of characters should not be addressed in this standard. It should address only the interchange of characters.

* Appendix - Processing Code

Strictly speaking, this appendix is not a part of this standard.
As a mail interchange proposal, this should definitely be removed.

Frank Rojas
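The JUNET convention mentioned above for SJIS and eucJP can be illustrated with a minimal sketch. The Python below is not AIX's "fold7" converter, and its function name euc_jp_to_junet is hypothetical. It assumes the common JUNET form (later documented as ISO-2022-JP): JIS X 0208 is designated with ESC $ B, ASCII is restored with ESC ( B before any ASCII byte and at the end of the text, and the 8th bit of each two-byte EUC-JP character is simply cleared. Half-width katakana (SS2) and JIS X 0212 (SS3) have no equivalent in that form, so the sketch rejects them rather than guessing a mapping.

```python
# Minimal sketch: EUC-JP bytes -> 7-bit JUNET-style ISO 2022 bytes.
# Assumptions (not from the message above): JIS X 0208 is designated with
# ESC $ B, ASCII with ESC ( B, and SS2/SS3 sequences are rejected.

ESC_JIS0208 = b"\x1b$B"  # designate JIS X 0208-1983 to G0
ESC_ASCII = b"\x1b(B"    # designate ASCII to G0


def euc_jp_to_junet(data: bytes) -> bytes:
    """Hypothetical helper: convert EUC-JP text to 7-bit JUNET form."""
    out = bytearray()
    in_kanji = False  # True while JIS X 0208 is the designated G0 set
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # ASCII byte (including CR and LF): switch back to ASCII first,
            # so every line ends in ASCII.
            if in_kanji:
                out += ESC_ASCII
                in_kanji = False
            out.append(b)
            i += 1
        elif 0xA1 <= b <= 0xFE:
            # JIS X 0208 character: two bytes, each in 0xA1..0xFE.
            if i + 1 >= len(data) or not 0xA1 <= data[i + 1] <= 0xFE:
                raise ValueError(f"truncated or invalid EUC-JP pair at offset {i}")
            if not in_kanji:
                out += ESC_JIS0208
                in_kanji = True
            # Clearing the 8th bit maps 0xA1..0xFE onto 0x21..0x7E.
            out.append(b & 0x7F)
            out.append(data[i + 1] & 0x7F)
            i += 2
        else:
            # 0x8E (SS2, half-width kana), 0x8F (SS3, JIS X 0212), or junk.
            raise ValueError(f"byte 0x{b:02X} at offset {i} has no JUNET form here")
    if in_kanji:
        out += ESC_ASCII  # the text must end in ASCII
    return bytes(out)
```

For example, the EUC-JP bytes C6 FC CB DC B8 EC (three kanji) come out as ESC $ B, then the ASCII bytes F | K \ 8 l, then ESC ( B. Every output byte is below 0x80, which is the property a 7-bit mail transport needs; a mailer following the iconv approach described above would apply a conversion like this to outgoing text.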