Doc. no.   J16/04-0016=WG21/N1576
Date:        6 February 2004
Project:     Programming Language C++
Reply to:   Beman Dawes <bdawes@acm.org>

Filesystem library query

Introduction

This paper is a query to determine interest by the Library Working Group in a future proposal for a C++ filesystem component based on the Boost Filesystem Library. Such a component would be suitable for a future standard or a future TR. This paper is not itself such a proposal.

The Boost Filesystem Library (www.boost.org/libs/filesystem) provides portable facilities to query and manipulate paths, files, and directories. The library is widely used. It would be a pure addition to the C++ standard, leaving in place existing standard library functionality where there is overlap.

The motivation for the library is the desire to perform portable, safe, script-like filesystem operations from within C++ programs. Because the C++ Standard Library contains no facilities for such filesystem tasks as directory iteration or directory creation, programmers must rely on operating-system-specific C-style interfaces, which makes it difficult to write portable programs.

The intent is not to compete with Python, Perl, or shell scripting languages, but rather to provide filesystem operations where C++ is already the language of choice. The design encourages, but does not require, safe and portable filesystem usage.

Sample program using the Boost library

#include "boost/filesystem/operations.hpp"
#include <iostream>

namespace fs = boost::filesystem;
using std::cout;

int main( int argc, char* argv[] )
{
  fs::path p( argc <= 1 ? "." : argv[1] );

  if ( !fs::exists( p ) ) // does not exist
    cout << "Not found: " << p.string() << '\n'; // argv[1] is null when argc <= 1

  else if ( fs::is_directory( p ) ) // is a directory
  {
    for ( fs::directory_iterator dir_itr( p );
          dir_itr != fs::directory_iterator(); ++dir_itr )
    {
      // display only the rightmost name in the path
      cout << dir_itr->leaf() << '\n';
    }
  }

  else // is a file
    cout << "Found: " << p.string() << '\n'; // argv[1] is null when argc <= 1
  return 0;
}

Users say they prefer the Filesystem library's interface to native operating system or POSIX APIs, even in code without portability requirements.

Important Design Decisions

Portable functionality and behavior

The library provides only functionality and behavior which can be supported uniformly on many different operating systems. As a practical matter, this means functionality and behavior which can be specified to work uniformly on POSIX and Windows. Since modern versions of legacy operating systems such as OS/390 and System/z provide POSIX support, the library can be implemented on these systems. One example of behavior which is not supported because of portability concerns is manipulation of file and directory attributes. The emphasis on portable behavior drove many design choices.

Portable Paths

Consider this code:

if ( !exists( "foobar/cheese" ) )
  cout << "Something is rotten in foobar\n";

The exists() function returns true if the indicated file or directory is present in the external file system. The signature is:

bool exists( const path & );

The "foobar/cheese" argument is written according to a portable generic path grammar and is converted to an object of class path, which the implementation translates into the operating system's native format for use in operating system calls. For example, if the operating system uses colons as path element separators, the path above would be passed to the operating system as "foobar:cheese". Class path offers considerable functionality for manipulating filesystem paths, and for ensuring that names in paths meet application-specific requirements. Non-portable (native) path grammar is also supported.
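The translation from generic to native format can be illustrated with a minimal sketch. The to_native function below is a hypothetical helper written for this illustration, not part of the library's interface, and it handles only the element separator; a real implementation would also have to deal with root names and other native conventions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Boost function): rewrite a generic path, whose
// elements are separated by '/', using a given native element separator.
std::string to_native(const std::string& generic, char native_separator)
{
  std::string native(generic);
  for (std::string::size_type i = 0; i < native.size(); ++i)
  {
    if (native[i] == '/')
      native[i] = native_separator;
  }
  return native;
}

// Usage: the colon-separated system mentioned in the text.
void to_native_usage()
{
  assert(to_native("foobar/cheese", ':') == "foobar:cheese");
}
```

The user writes only the generic form; the choice of separator stays inside the implementation.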

Use-driven design

Because of the desire to support simple "script-like" usage, use cases often drove design choices. For example, class path has conversion constructors from const char * and const std::string &, allowing users to write if (exists( "foo")) rather than if (exists(path("foo"))).
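The effect of the conversion constructors can be sketched with a toy class. The class below is only a stand-in for class path, and length_of is a hypothetical function that plays the role of exists() so that the example needs no real filesystem.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for class path; only the conversion constructors are modeled.
class path
{
public:
  path(const char* s) : text_(s) {}         // deliberately not explicit
  path(const std::string& s) : text_(s) {}  // deliberately not explicit
  const std::string& text() const { return text_; }
private:
  std::string text_;
};

// Hypothetical function taking const path&, standing in for exists().
std::string::size_type length_of(const path& p)
{
  return p.text().size();
}

// Usage: both calls compile without writing path(...) at the call site.
void conversion_usage()
{
  assert(length_of("foo") == 3);
  assert(length_of(std::string("foobar")) == 6);
}
```

Because neither constructor is explicit, string literals and std::string objects convert implicitly wherever a const path & is expected.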

Errors reported via exceptions

Like all I/O, filesystem operations often encounter runtime errors, both expected and unexpected. The library reports runtime errors via C++ exceptions.

Throws heavy-weight exceptions

Filesystem operations often encounter errors such as "File not found" which must be reported to human users. To ensure that the exceptions thrown for such errors contain sufficient information for users to resolve the error, and to eliminate the need for programs to include numerous try/catch blocks, the library throws relatively heavy-weight exceptions. There is a single filesystem_error type, with two error codes, two paths, and two messages. While the details could certainly change a great deal, the overall needs for avoiding try/catch blocks after every operation and for allowing detailed user customization based on error details have to be dealt with one way or another.
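The benefit of carrying the details inside the exception can be sketched as follows. The fs_error type below is a hypothetical, simplified stand-in for filesystem_error (one error code rather than two, no separate messages), and the two "step" functions only simulate filesystem operations.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, simplified exception type: it records the failing operation,
// up to two paths, and one error code. The real type carries more detail.
class fs_error : public std::runtime_error
{
public:
  fs_error(const std::string& who, const std::string& path1,
           const std::string& path2, int code)
    : std::runtime_error(who + " failed"),
      who_(who), path1_(path1), path2_(path2), code_(code) {}

  const std::string& who() const { return who_; }
  const std::string& path1() const { return path1_; }
  const std::string& path2() const { return path2_; }
  int code() const { return code_; }

private:
  std::string who_, path1_, path2_;
  int code_;
};

// Simulated operations: no real filesystem access takes place here.
void create_directory_step(const std::string&) {}  // always succeeds
void copy_file_step(const std::string& from, const std::string& to)
{
  throw fs_error("copy_file", from, to, 2);  // pretend the source is missing
}

// Several operations in a row, with a single handler for all of them.
std::string run_script()
{
  try
  {
    create_directory_step("backup");
    copy_file_step("a.txt", "backup/a.txt");
    return "ok";
  }
  catch (const fs_error& e)
  {
    return e.who() + " failed: " + e.path1() + " -> " + e.path2();
  }
}

// Usage: the single handler can still say exactly what went wrong.
void run_script_usage()
{
  assert(run_script() == "copy_file failed: a.txt -> backup/a.txt");
}
```

The handler needs no knowledge of which call failed; the exception object supplies the operation name and both paths.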

Automatic testing for relative portability of names in paths

Because there is no such thing as absolute portability for names of files and directories, the design uses a relative portability approach which allows the user to specify which name portability rules are desired. Default, global user-specified, and per-constructor user-specified portability checking allow an application to perform as much or as little portability checking as desired. Experience with automatic checking is that it often identifies programmer oversights before they become serious problems.
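A minimal sketch of user-selectable name checking follows. All names here (name_check, posix_portable_name, any_name, check_path) are hypothetical and chosen for this illustration; the strict rule shown uses the POSIX portable filename character set, and only the default and per-call levels of checking are modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical shape of a name-checking function.
typedef bool (*name_check)(const std::string& name);

// Strict rule: only the POSIX portable filename character set
// (letters, digits, '.', '_', '-'), and no leading '-'.
bool posix_portable_name(const std::string& name)
{
  static const std::string allowed =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._-";
  return !name.empty()
      && name[0] != '-'
      && name.find_first_not_of(allowed) == std::string::npos;
}

// Permissive rule: any non-empty name is accepted.
bool any_name(const std::string& name) { return !name.empty(); }

// Apply a checker to each '/'-separated element of a generic path.
// The default argument plays the role of the default portability rule.
bool check_path(const std::string& generic,
                name_check checker = posix_portable_name)
{
  std::string::size_type start = 0;
  while (start <= generic.size())
  {
    std::string::size_type end = generic.find('/', start);
    if (end == std::string::npos)
      end = generic.size();
    if (!checker(generic.substr(start, end - start)))
      return false;
    start = end + 1;
  }
  return true;
}

// Usage: the same path passes or fails depending on the rule requested.
void check_path_usage()
{
  assert(check_path("foobar/cheese"));
  assert(!check_path("my docs/cheese"));
  assert(check_path("my docs/cheese", any_name));
}
```

An application that cares about portability keeps the strict default; one that must accept arbitrary native names passes a more permissive checker.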

Sub-namespace "filesystem"

The Filesystem library includes several components which are essentially new versions of components already in the current C++ Standard Library. Specifically: remove, rename, basic_filebuf, filebuf, wfilebuf, basic_ifstream, ifstream, wifstream, basic_ofstream, ofstream, wofstream, basic_fstream, fstream, and wfstream. The primary difference for the iostream (clause 27) classes is that seven constructors and open functions now take arguments of const path &. Specifications and implementation simply reference the equivalent components in clause 27 of the current standard. remove and rename differ in the type of their arguments, their return types, and how they handle errors. Note that there is no intent to deprecate any components in the current standard; these are in use in millions of lines of existing code and must be preserved.

The versioning problem this creates is not unique to the Filesystem library; it is simply the first place where the C++ committee must face the problem.

Two choices were considered: to give the components completely different names, or to place them in a sub-namespace. My thinking for the Boost library was that new names would be a serious confusion, and so the new components were placed in sub-namespace filesystem. For the standard library, a filesystem component should use the same versioning approach used by other standard library components.
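The sub-namespace approach can be sketched as below. The enclosing namespace demo, the toy path class, and the exact behavior of remove (no return value, throwing std::runtime_error on any failure) are assumptions made for this illustration and are not the library's specification.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

namespace demo  // hypothetical enclosing namespace
{
  namespace filesystem
  {
    // Toy stand-in for class path.
    class path
    {
    public:
      path(const char* s) : text_(s) {}
      path(const std::string& s) : text_(s) {}
      const std::string& text() const { return text_; }
    private:
      std::string text_;
    };

    // Same name as std::remove, but it takes a path and reports failure
    // by throwing rather than by returning a nonzero value.
    void remove(const path& p)
    {
      if (std::remove(p.text().c_str()) != 0)
        throw std::runtime_error("remove failed: " + p.text());
    }
  }
}

// Usage: the existing function and the new one coexist under one name.
bool remove_reports_by_exception(const char* missing_file)
{
  assert(std::remove(missing_file) != 0);  // existing interface: status code
  try
  {
    demo::filesystem::remove(missing_file);  // new interface: exception
  }
  catch (const std::runtime_error&)
  {
    return true;
  }
  return false;
}
```

Existing code that calls std::remove is unaffected, while new code selects the path-based version through the sub-namespace.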

Path equality versus equivalence

Path equality is defined essentially as string equality, and path equivalence as a determination (implemented using native filesystem API operations) of whether two paths actually point to the same file or directory. Path equivalence isn't crucial to the library, and wasn't provided in early versions. It is only mentioned here because LWG members have indicated interest in path equality and equivalence issues.
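The distinction can be sketched for a POSIX system as follows. The function names are hypothetical, and comparing the device and inode numbers reported by stat() is an assumption about one possible implementation on one platform; other systems would need different native calls.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>  // POSIX only

// Equality: plain string comparison, with no filesystem access.
bool equal_paths(const std::string& a, const std::string& b)
{
  return a == b;
}

// Equivalence (POSIX sketch): both paths resolve to the same device and
// inode. Returns false if either path cannot be examined.
bool equivalent_paths(const std::string& a, const std::string& b)
{
  struct stat sa, sb;
  if (::stat(a.c_str(), &sa) != 0 || ::stat(b.c_str(), &sb) != 0)
    return false;
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

// Usage: two spellings of the current directory.
void equivalence_usage()
{
  assert(!equal_paths(".", "./."));      // different strings
  assert(equivalent_paths(".", "./."));  // same directory
}
```

Two paths can therefore be unequal yet equivalent, which is why the two notions are kept separate.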

Remaining work: Internationalization

The Boost Filesystem library is not currently internationalized; that work is underway. The approach being prototyped uses a basic_path template, with path and wpath typedefs, similar to strings and iostreams. Paths will need the ability to imbue a locale, to handle the conversion between internal and external representations.
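The shape of such a template can be sketched as below. This is a hypothetical outline only: the single template parameter, the member names, and the absence of any locale handling are assumptions made for this illustration, and the prototype's actual interface may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: one template parameterized on the string type,
// in the manner of basic_string and the iostream class templates.
template <class String>
class basic_path
{
public:
  typedef String string_type;

  basic_path(const String& s) : text_(s) {}
  basic_path(const typename String::value_type* s) : text_(s) {}

  const String& text() const { return text_; }

private:
  String text_;  // internal representation; locale conversion not modeled
};

typedef basic_path<std::string>  path;
typedef basic_path<std::wstring> wpath;

// Usage: narrow and wide paths share one implementation.
void basic_path_usage()
{
  assert(path("foo").text() == "foo");
  assert(wpath(L"foo").text() == L"foo");
}
```

Existing code that names path continues to compile, while wide-character users gain wpath from the same template.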


© Copyright Beman Dawes, 2004

Revised February 09, 2004