Doc. no. N2683=08-0193
Date: 2008-06-27
Project: Programming Language C++
Reply to: Martin Sebor

Issue 454: Problems and Solutions

Problems with proposed resolution (as of N2612)

  1. Limited Applicability: Only two operating systems are known to provide APIs for the manipulation of wide character file names: Symbian OS (see wfopen() in the Symbian Developer Library) and Microsoft Windows (see _wfopen() in the Microsoft Visual C++ Run-Time Library Reference). Most other popular operating systems, including UNIX flavors such as AIX, FreeBSD, HP-UX, IRIX, Linux, Mac OS X, Solaris, and Tru64 UNIX, provide only narrow (8-bit clean) character interfaces. Some operating systems (such as HP OpenVMS) preclude the use of internationalized file names due to the restrictions they place on the characters used in such names.
  2. Inadequate Interface: It is insufficient to provide an interface to open or create files with wide character names without also providing interfaces to rename or remove files with such names. This is especially problematic when the translation from the internal wchar_t representation of the file name to the external representation is unspecified because it leaves programs with no means to portably rename or remove files created using the first interface. Another common operation involving file names that would suffer from the lack of interoperability due to the unspecified nature of the conversion is the traversal over the list of file names in a given directory (the POSIX functions opendir() and readdir()).
  3. Insufficient Control: The historical practice common on UNIX platforms is to allow the use of arbitrary encodings for file names, leaving it up to each user to decide on the most suitable encoding. While the advent of Unicode is slowly displacing this practice in favor of the universal use of UTF-8, it is still in widespread use. The proposed resolution gives programs no means to select the external encoding of file names.
  4. Underspecification: The behavior of C++ file streams is specified in terms of underlying FILEs, that is, "as if" basic_filebuf::open() called the C function fopen(). Since there is no corresponding wchar_t overload of fopen() (or wfopen()), introducing a wchar_t overload of basic_filebuf::open() opens a hole in the specification leaving the behavior of basic_filebuf objects opened using this overload substantially unspecified.

Solutions

The simplest solution for dealing with problem (1) above in C++ is to do nothing. Users on the two platforms that provide proprietary wchar_t file system APIs (such as wfopen()) can use those APIs. On all other platforms, users can use the existing standard interfaces (fopen() or filebuf::open()).
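As a rough illustration of this approach, a program can hide the platform difference behind a small wrapper. The sketch below is only one way such a wrapper might look, not part of any proposal: the function name open_by_wide_name is hypothetical, the Windows branch relies on the Microsoft-specific _wfopen(), and the fallback branch assumes that converting the name with wcstombs() in the current C locale is acceptable.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper (not part of the standard or of this proposal).
std::FILE* open_by_wide_name(const std::wstring& name, const char* mode)
{
#ifdef _WIN32
    // Microsoft-specific: pass the wide name straight to the C runtime.
    const std::wstring wmode(mode, mode + std::strlen(mode));
    return _wfopen(name.c_str(), wmode.c_str());
#else
    // Elsewhere: convert the name to a multibyte string in the current
    // C locale (an assumption of this sketch) and call standard fopen().
    std::vector<char> buf(name.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(buf.data(), name.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1) || n == buf.size())
        return nullptr;   // name not representable or not terminated
    return std::fopen(buf.data(), mode);
#endif
}
```

Note that the fallback branch makes the encoding of the resulting name depend on the C locale in effect at the time of the call, which is exactly the kind of implicit behavior problem (3) is concerned with.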

To adequately address problem (2) above, the C++ standard would need to add overloads for all functions that currently take a narrow file name. These include the following functions: fopen(), freopen(), remove(), rename(), tmpnam(), and possibly also system(). However, since these functions are specified by the C standard, and since previous enhancements in the area where the C and C++ standards overlap have proved to be exceedingly problematic in practice, the question of adding these extensions would be best left to the C committee. Another solution to problem (2) is for C++ to provide its own comprehensive interface to the file system, independent of C, such as that proposed in N1975, Filesystem Library Proposal for TR2 (Revision 3).

To address problem (3), it is necessary to provide a means for programs to select the desired external encoding of file names. The solution is to add a new bit to the ios_base::openmode bitmask, let's call it ios_base::cvtname, that, when set in the mode argument in a call to basic_filebuf::open(), has the effect of causing the function to convert the file name argument using the codecvt facet installed in the locale imbued in the basic_filebuf object. Thus, a basic_filebuf object opened by calling the wchar_t overload of open() with the ios_base::cvtname bit set in the mode argument will have predictable and portable behavior, namely that of opening a file whose name is the result of converting the const wchar_t* argument to an NTBS using the codecvt facet obtained as if by calling use_facet<codecvt<char_type> >(this->getloc()). With the ios_base::cvtname bit clear, the behavior of the function will be implementation-defined to allow implementers to provide reasonable default behavior appropriate for each platform.
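The conversion step described above can be sketched with the existing codecvt interface. This is a minimal illustration under stated assumptions rather than proposed wording: the helper name convert_file_name is hypothetical, the facet is assumed to be std::codecvt<wchar_t, char, std::mbstate_t>, and the sketch ignores the call to unshift() that a state-dependent encoding would require.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a wide file name to a narrow one using the
// codecvt facet of the given locale. Throws if the conversion does not
// complete successfully.
std::string convert_file_name(const std::wstring& ws, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_type;
    const codecvt_type& cvt = std::use_facet<codecvt_type>(loc);

    if (ws.empty())
        return std::string();

    // Reserve the largest number of bytes the facet may produce.
    const int max_len = cvt.max_length() > 0 ? cvt.max_length() : 1;
    std::string s(ws.size() * max_len, '\0');

    std::mbstate_t state = std::mbstate_t();
    const wchar_t* from_next = 0;
    char* to_next = 0;
    const std::codecvt_base::result res =
        cvt.out(state, ws.data(), ws.data() + ws.size(), from_next,
                &s[0], &s[0] + s.size(), to_next);

    if (res != std::codecvt_base::ok)
        throw std::runtime_error("file name conversion failed");

    s.resize(to_next - &s[0]);   // keep only the bytes actually written
    return s;
}
```

A program that wants UTF-8 file names would imbue a locale carrying a UTF-8 codecvt facet; with the classic "C" locale only names from the basic character set can be expected to convert.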

The solution to problem (3) also largely resolves problem (4). While the default behavior of programs that do not set the ios_base::cvtname bit isn't specified by the standard (it is implementation-defined) or guaranteed to be portable from one implementation of C++ to another, the behavior of programs that do set the bit is precisely specified by C++ and guaranteed to have predictable results. That is not to say that the behavior of such programs is guaranteed to be the same across different operating systems, only that it will be the same on the same operating system regardless of the implementation of C++.

Proposal

We urge the library working group to defer this issue until such time that problem (2) has been addressed in the C standard, or until C++ has provided a comprehensive interface to the file system such as that proposed in the Filesystem Library Proposal.

Should the library working group choose to adopt the wchar_t overloads for file stream classes despite the problems noted above, we propose the following resolution.

Add a declaration of a new ios_base::openmode bit, ios_base::cvtname, to the definition of class ios_base in :

typedef T3 openmode;
static const openmode app;
static const openmode ate;
static const openmode binary;
static const openmode in;
static const openmode out;
static const openmode trunc;
static const openmode cvtname;
        

Add a row (the last row below) to Table 117: openmode effects in :

Table 117: openmode effects

Element    Effect(s) if set
app        seek to end before each write
ate        open and seek to end immediately after opening
binary     perform input and output in binary mode (as opposed to text mode)
in         open for input
out        open for output
trunc      truncate an existing stream when opening
cvtname    convert a file name according to the rules of codecvt before opening

Add the following member function declarations to the definition of class template basic_filebuf in :

//  Members
bool is_open() const;
basic_filebuf<charT,traits>* open (const char *s,
                                   ios_base::openmode mode);
basic_filebuf<charT,traits>* open (const wchar_t *ws,
                                   ios_base::openmode mode);
basic_filebuf<charT,traits>* open (const string &s,
                                   ios_base::openmode mode);
basic_filebuf<charT,traits>* open (const wstring &ws,
                                   ios_base::openmode mode);
        

Change , p2 as follows:


basic_filebuf<charT,traits>*
open (const char* s, ios_base::openmode mode);
        

Effects: If is_open() != false, returns a null pointer. Otherwise, if (mode & ios_base::cvtname) != 0, the function then opens a file, if possible, whose name is the NTBS s ("as if" by calling std::fopen(s, modstr)). The NTBS modstr is determined from mode & ~(ios_base::ate | ios_base::cvtname) as indicated in Table 120. If (mode & ios_base::cvtname) == 0, the behavior of the function is implementation-defined.

[Note: the phrase "initializes the filebuf as required" above has been stricken since the object is already initialized and cannot be re-initialized in an ordinary member function.]
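The derivation of modstr from the mode argument can be sketched as follows. Because ios_base has no cvtname bit today, the sketch uses a stand-in bitmask in a hypothetical namespace; the bit values and the function name are assumptions made for illustration. Only the flag combinations listed in the C++03 table are handled, and later revisions of the standard recognize additional ones.

```cpp
#include <string>

namespace sketch {

// Stand-in for ios_base::openmode; cvtname is the bit proposed in this paper.
typedef unsigned openmode;
const openmode app     = 1u << 0;
const openmode ate     = 1u << 1;
const openmode binary  = 1u << 2;
const openmode in      = 1u << 3;
const openmode out     = 1u << 4;
const openmode trunc   = 1u << 5;
const openmode cvtname = 1u << 6;

// Returns the fopen() mode string for mode, or an empty string if the
// combination of flags is not one of the listed ones.
std::string modstr(openmode mode)
{
    mode &= ~(ate | cvtname);            // neither bit affects modstr
    const bool bin = (mode & binary) != 0;

    std::string s;
    switch (mode & ~binary) {
    case out:
    case out | trunc:      s = "w";  break;
    case out | app:        s = "a";  break;
    case in:               s = "r";  break;
    case in | out:         s = "r+"; break;
    case in | out | trunc: s = "w+"; break;
    default:               return std::string();   // open() would fail
    }
    if (bin)
        s += 'b';
    return s;
}

} // namespace sketch
```

Masking out cvtname alongside ate keeps the new bit from turning an otherwise valid combination into one that the table does not list.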

Add a new paragraph immediately after , p2, above Table 123, with the following text:

      
basic_filebuf<charT,traits>*
open (const wchar_t* ws, ios_base::openmode mode);
        

Effects: If is_open() != false, returns a null pointer. Otherwise, if (mode & ios_base::cvtname) != 0, converts the WCBS ws to an MBCS s "as if" by a call to a_codecvt.out(). If the conversion succeeds, the function returns open(s, mode & ~ios_base::cvtname). Otherwise, if the conversion fails, the function fails. If (mode & ios_base::cvtname) == 0, the behavior of the function is implementation-defined.

[Note: the symbol a_codecvt referenced above is an exposition-only member of class basic_filebuf defined in , p5.]
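For the case where ios_base::cvtname is set, the Effects clause above can be approximated with today's library as a free function. This is a sketch under stated assumptions, not an implementation of the proposed member: open_wide is a hypothetical name, std::wfilebuf is used so that the facet found through getloc() is codecvt<wchar_t, char, mbstate_t>, and state-dependent encodings (which would need unshift()) are not handled.

```cpp
#include <cstddef>
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical stand-in for the proposed wchar_t overload of open(),
// modelling only the behavior specified when cvtname is set.
std::wfilebuf* open_wide(std::wfilebuf& fb, const wchar_t* ws,
                         std::ios_base::openmode mode)
{
    if (fb.is_open())
        return 0;                        // already open: null pointer

    typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_type;
    const codecvt_type& cvt = std::use_facet<codecvt_type>(fb.getloc());

    const std::size_t len = std::wcslen(ws);
    const int max_len = cvt.max_length() > 0 ? cvt.max_length() : 1;
    // One extra zero byte guarantees that the result is null-terminated.
    std::string s(len * max_len + 1, '\0');

    std::mbstate_t state = std::mbstate_t();
    const wchar_t* from_next = 0;
    char* to_next = 0;
    const std::codecvt_base::result res =
        cvt.out(state, ws, ws + len, from_next,
                &s[0], &s[0] + s.size() - 1, to_next);

    if (res != std::codecvt_base::ok)
        return 0;                        // conversion failed: open fails

    return fb.open(s.c_str(), mode);     // delegate to the narrow overload
}
```

The caller controls the external encoding of the name by imbuing the desired locale in the stream buffer before calling the function.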

Add another new paragraph below the one above:

Note: In both overloads of open(), when (mode & ios_base::cvtname) == 0, implementations are expected to convert the file name argument to the "natural" representation for the system. For a system that "naturally" represents a filename as a WCBS, the NTBS s in the first signature is expected to be mapped to a WCBS; if so, it follows the same mapping rules as the first argument to open().

Also add the corresponding overloads to class templates basic_ifstream in , , , and basic_ofstream in , as follows.

Add the following member function declarations to the definition of class template basic_ifstream in :

// Constructors
basic_ifstream ();
explicit basic_ifstream (const char* s,
                         ios_base::openmode mode = ios_base::in);
explicit basic_ifstream (const wchar_t* s,
                         ios_base::openmode mode = ios_base::in);
explicit basic_ifstream (const string &s,
                         ios_base::openmode mode = ios_base::in);
explicit basic_ifstream (const wstring &s,
                         ios_base::openmode mode = ios_base::in);

// Members
basic_filebuf<charT, traits>* rdbuf() const;

bool is_open() const;
void open (const char *s, ios_base::openmode mode = ios_base::in);
void open (const wchar_t *s, ios_base::openmode mode = ios_base::in);
void open (const string &s, ios_base::openmode mode = ios_base::in);
void open (const wstring &s, ios_base::openmode mode = ios_base::in);
        

Add the following signature below the declaration (but above the Effects clause) of the constructor in , p2:

explicit basic_ifstream (const wchar_t* s,
                         ios_base::openmode mode = ios_base::in);
        

Add the following signature below the declaration (but above the Effects clause) of the constructor in , p3:

explicit basic_ifstream (const wstring &s,
                         ios_base::openmode mode = ios_base::in);
        

Add the following signature below the first declaration (but above the Effects clause) of open() in , p3:

void open (const wchar_t *s,
           ios_base::openmode mode = ios_base::in);
        

Add the following signature immediately below the second declaration (but above the Effects clause) of open() in , p4:

void open (const wstring &s,
           ios_base::openmode mode = ios_base::in);
        

[Note: the effects as well as the names of the formal function parameters of the added wchar_t overloads are exactly the same as those of the existing char functions and so don't need to be spelled out separately from the existing Effects clauses.]

Add the following member function declarations to the definition of class template basic_ofstream in :

// Constructors
basic_ofstream ();
explicit basic_ofstream (const char* s,
                         ios_base::openmode mode = ios_base::out);
explicit basic_ofstream (const wchar_t* s,
                         ios_base::openmode mode = ios_base::out);
explicit basic_ofstream (const string &s,
                         ios_base::openmode mode = ios_base::out);
explicit basic_ofstream (const wstring &s,
                         ios_base::openmode mode = ios_base::out);

// Members
basic_filebuf<charT, traits>* rdbuf() const;

bool is_open() const;
void open (const char *s, ios_base::openmode mode = ios_base::out);
void open (const wchar_t *s, ios_base::openmode mode = ios_base::out);
void open (const string &s, ios_base::openmode mode = ios_base::out);
void open (const wstring &s, ios_base::openmode mode = ios_base::out);
        

Add the following signature below the declaration (but above the Effects clause) of the constructor in , p2:

explicit basic_ofstream (const wchar_t* s,
                         ios_base::openmode mode = ios_base::out);
        

Add the following signature below the declaration (but above the Effects clause) of the constructor in , p3:

explicit basic_ofstream (const wstring &s,
                         ios_base::openmode mode = ios_base::out);
        

Add the following signature below the first declaration (but above the Effects clause) of open() in , p3:

void open (const wchar_t *s,
           ios_base::openmode mode = ios_base::out);
        

Add the following signature immediately below the second declaration (but above the Effects clause) of open() in , p4:

void open (const wstring &s,
           ios_base::openmode mode = ios_base::out);
        

[Note: the effects as well as the names of the formal function parameters of the added wchar_t overloads are exactly the same as those of the existing char functions and so don't need to be spelled out separately from the existing Effects clauses.]

Add the following member function declarations to the definition of class template basic_fstream in :

// Constructors
basic_fstream ();
explicit basic_fstream (const char* s,
                        ios_base::openmode mode = ios_base::in | ios_base::out);
explicit basic_fstream (const wchar_t* s,
                        ios_base::openmode mode = ios_base::in | ios_base::out);
explicit basic_fstream (const string &s,
                        ios_base::openmode mode = ios_base::in | ios_base::out);
explicit basic_fstream (const wstring &s,
                        ios_base::openmode mode = ios_base::in | ios_base::out);

// Members
basic_filebuf<charT, traits>* rdbuf() const;

bool is_open() const;
void open (const char *s, ios_base::openmode mode = ios_base::in | ios_base::out);
void open (const wchar_t *s, ios_base::openmode mode = ios_base::in | ios_base::out);
void open (const string &s, ios_base::openmode mode = ios_base::in | ios_base::out);
void open (const wstring &s, ios_base::openmode mode = ios_base::in | ios_base::out);
        

Add the following signature below the declaration (but above the Effects clause) of the constructor in , p2:

explicit basic_fstream (const wchar_t* s,
                        ios_base::openmode mode = ios_base::in | ios_base::out);
        

Add the following signature below the declaration (but above the Effects clause) of the constructor in , p3:

explicit basic_fstream (const wstring &s,
                        ios_base::openmode mode = ios_base::in | ios_base::out);
        

Add the following signature below the first declaration (but above the Effects clause) of open() in , p3:

void open (const wchar_t *s,
           ios_base::openmode mode = ios_base::in | ios_base::out);
        

Add the following signature immediately below the second declaration (but above the Effects clause) of open() in , p4:

void open (const wstring &s,
           ios_base::openmode mode = ios_base::in | ios_base::out);
        

[Note: the effects as well as the names of the formal function parameters of the added wchar_t overloads are exactly the same as those of the existing char functions and so don't need to be spelled out separately from the existing Effects clauses.]