WG15 Defect Report Ref: 9945-2-41
Topic: I18N issues - locales


This is an approved interpretation of 9945-2:1993.


Last update: 1997-05-20



 _____________________________________________________________________________


	Topic:			I18N issues - locales
	Relevant Sections:	2.5.2.1, 2.8.3.2
	Classification:  	Q1-6: Unaddressed Issues.
				Q7: Ambiguous Issue.
				Q8: No Change.
			


Defect Report:
-----------------------
(from Andrew Hume and Doug McIlroy)

Issue B

[1]
     The specification of locales and the interface to them
discriminates against non-vendor-supplied software.  In particular,
it is impossible to write a portable implementation of regcomp()
and regexec(), as there is no standardised interface to the vital
knowledge presumably set up by a call to setlocale().  This
knowledge is detailed below; in brief, the first seems an oversight
and the others are necessary to use the locale information.
	  ________________________________________

 [2] How can membership in class :blank: be determined portably?
     [2.5.2.1, 2.8.3.2(6)]

Proposed Solution:

     Provide a ctype function isblank().

Rationale:

     It is inconsistent that this be the only LC_CTYPE category
without a C binding.  Note that this extension introduces a
difference between the C and POSIX locales.
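
     For illustration only, a binding of the proposed kind might be
used as sketched below.  The sketch assumes a C99 or later library,
where isblank() was later added to <ctype.h>; 9945-2:1993 itself
provides no such function.  The helper name count_leading_blanks is
hypothetical and is not part of the defect report.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the leading characters of s that are
 * members of class blank in the current LC_CTYPE locale.
 * Assumption: the C library provides isblank() (C99 or later). */
size_t count_leading_blanks(const char *s)
{
    size_t n = 0;

    /* Cast to unsigned char: passing a negative char value to a
     * <ctype.h> function is undefined behavior. */
    while (s[n] != '\0' && isblank((unsigned char)s[n]))
        n++;
    return n;
}
```

     In the C locale only <space> and <tab> are expected to count as
blank; other locales may define additional members.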
	  ________________________________________

 [3] How can the meaning of an arbitrary equivalence class be
     discovered portably?

Proposed Solution:

     Provide a function that, given any name for an equivalence
class, returns a list of names of collating symbols in the class.
The order of the list shall be the same regardless of what name is
given.

Rationale:

     This is needed if an application, such as a searching or
sorting tool, requires this locale-specific information.  In
particular, regcomp() and sort need it.
	  ________________________________________

 [4] How can the meaning/value of an arbitrary collating symbol be
     determined portably?

Proposed Solution:

     Provide a function that, given a collating symbol, returns the
representation and length of the symbol.

Rationale:

     This is needed if an application, such as a searching or
sorting tool, requires this locale-specific information.
	  ________________________________________

 [5] How can the collating elements in a string be found and
     compared portably?

Proposed Solution:

     Provide a function that returns the length and the weight
vector for the collating element at the beginning of the string.

Rationale:

     This is needed if an application, such as a searching or
sorting tool, requires this locale-specific information.
	  ________________________________________

 [6] How can regcomp() expand a range expression into a list of
     collating elements portably?

Proposed Solution:

     Provide a successor function that, given the name of a
collating element, returns the name of the collating element with
the next larger weight vector.  For this purpose two elements with
the same weight vector compare in the order of their equivalence
listing.

Rationale:

     This is needed if an application, such as a searching or
sorting tool, requires this locale-specific information.  It may
further be useful to have a way to inquire whether a locale contains
any multicharacter collating elements.
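
     To make the gap concrete: lacking a successor function, an
application can at best probe an existing regcomp() from the
outside.  The sketch below is an illustration under stated
assumptions and not a proposed interface.  It handles single-byte
collating elements only, and the helper name enumerate_bracket is
hypothetical.  It reports members in byte-value order, not collation
order, and cannot detect multicharacter collating elements, which is
exactly the information the questions above say is unavailable.

```c
#include <limits.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: stores in out[] (at most cap entries) every
 * nonzero byte value that the bracket expression matches in the
 * current locale. Returns the number stored, or -1 if regcomp()
 * fails. Members are found in byte-value order, not collation order. */
int enumerate_bracket(const char *bracket, unsigned char *out, size_t cap)
{
    regex_t re;
    size_t n = 0;
    int c;

    if (regcomp(&re, bracket, REG_NOSUB) != 0)
        return -1;

    for (c = 1; c <= UCHAR_MAX; c++) {
        /* One-character probe string for this byte value. */
        char probe[2];
        probe[0] = (char)c;
        probe[1] = '\0';
        if (regexec(&re, probe, 0, NULL, 0) == 0 && n < cap)
            out[n++] = (unsigned char)c;
    }
    regfree(&re);
    return (int)n;
}
```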
	  ________________________________________

 [7] Lines 2918-20 say that an equivalence class expression that
     names a collating element not in an equivalence class shall be
     treated as a collating symbol.  Does this statement affect the
     meaning of ``collating symbol'' in line 3306?  Does it eliminate
     such equivalence class expressions from consideration in lines
     2943-5?

Proposed Solution:

     Change 2918-2920 to say ``the expression shall be understood
as an equivalence class that contains only the one collating
element.''

     We would actually prefer the admittance of singleton
equivalence classes in the definitions of 2.5.2.2.

Rationale:

     This question affects the meaning of range expressions.  Lines
2918-20 could be construed as forcing [[=CE1=]-[=CE2=]] to mean
[[.CE1.]-[.CE2.]] in some cases, although the former expression
looks syntactically incorrect.  The preferred solution agrees with
customary mathematical usage, and clarifies the behavior of the
equivalence-class function proposed in [3] above.
	  ________________________________________

 [8] What if collation changes between regcomp() and regexec()?

Proposed Solution:

     The result is undefined.

Rationale:

     For the common case of locales in which all collating elements
are single characters, regcomp() should be allowed to compile
character classes.  At the same time, regexec() should be allowed to
handle multicharacter collating symbols.  The proposed resolution
assures that both desiderata are met.
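
     A practical consequence is that an application should keep the
locale fixed from compilation through execution of a pattern.  The
following is a minimal sketch of that discipline; the helper name
match_under_one_locale is hypothetical, and the choice of LC_ALL and
of extended regular expressions is an assumption made for the
example.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: selects a locale, then compiles and executes
 * the pattern with no intervening locale change.
 * Returns 1 on match, 0 on no match, -1 on any error.
 * Note: setlocale() changes process-wide state. */
int match_under_one_locale(const char *locale, const char *pattern,
                           const char *text)
{
    regex_t re;
    int rc;

    if (setlocale(LC_ALL, locale) == NULL)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* No setlocale() call between regcomp() and regexec(): changing
     * collation here would make the result undefined. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```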



WG15 response for 9945-2:1993 
-----------------------------------
Q1   The standard does not speak to this issue and no conformance
distinction can be made between alternative implementations based on
this.

The standard does not require that an implementation conforming
to the standard be portable.  Therefore, there is no requirement
that the functionality be specified by the standard.

Concerns are being forwarded to the sponsor.

Q2, Q3, Q4, Q5, Q6

The standard does not speak to these issues and no conformance
distinction can be made between alternative implementations based on
this.

Concerns are being forwarded to the sponsor.

Q7 The standard is unclear on this issue, and as such no conformance
distinction can be made between alternative implementations based
on this. This is being referred to the sponsor.


Q8 The standard states the required behavior and conforming
implementations shall conform to this.  According to P.2 pg 729
lines 367-368, the standard specifies that the result is undefined.
     

Rationale for Interpretation:
-----------------------------
None.

 _____________________________________________________________________________