Boost Filesystem Library

This Document
    Introduction
    Two-minute tutorial
    Examples
    Definitions
    Common Specifications
    Race-condition danger
    Implementation
    Acknowledgements
Other Documents
    Library Design
    FAQ
    Portability Guide
    path.hpp documentation
    operations.hpp documentation
    fstream.hpp documentation
    exception.hpp documentation
    convenience.hpp documentation
    Do-list

Introduction

The Boost Filesystem Library provides portable facilities to query and manipulate paths, files, and directories.

The motivation for the library is the need to be able to perform portable script-like operations from within C++ programs. The intent is not to compete with Python, Perl, or shell languages, but rather to provide portable filesystem operations when C++ is already the language of choice. The design encourages, but does not require, safe and portable filesystem usage.

The Filesystem Library supplies several headers, all in directory boost/filesystem: path.hpp, operations.hpp, fstream.hpp, exception.hpp, and convenience.hpp.

The object-library source files (convenience.cpp, exception.cpp, operations_posix_windows.cpp, and path_posix_windows.cpp) are supplied in directory libs/filesystem/src. These source files implement the library for POSIX-compatible or Windows-compatible operating systems; no implementation is supplied for other operating systems. Note that many operating systems not normally thought of as POSIX or Windows systems, such as mainframe legacy operating systems or embedded operating systems, support POSIX-compatible file systems that will work with the Filesystem Library.

The object-library can be built with a Jamfile supplied in directory libs/filesystem/build.

Two-minute tutorial

First some preliminaries:

#include "boost/filesystem/operations.hpp" // includes boost/filesystem/path.hpp
#include "boost/filesystem/fstream.hpp"    // ditto
#include <iostream>                        // for std::cout
namespace fs = boost::filesystem;

A class path object can be created:

fs::path my_path( "some_dir/file.txt" );

The string passed to the path constructor is in a portable generic path format. Access functions make my_path contents available in an operating system dependent format, such as "some_dir:file.txt", "[some_dir]file.txt", "some_dir/file.txt", or whatever is appropriate for the operating system.

Class path has conversion constructors from const char* and const std::string&, so even though the Filesystem Library functions in the following code snippet take const path& arguments, the user can simply pass C-style strings:

fs::remove_all( "foobar" );
fs::create_directory( "foobar" );
fs::ofstream file( "foobar/cheeze" );
file << "tastes good!\n";
file.close();
if ( !fs::exists( "foobar/cheeze" ) )
  std::cout << "Something is rotten in foobar\n";

Additional class path constructors accept an operating system dependent format, which is useful for user-provided input:

int main( int argc, char * argv[] ) {
  if ( argc < 2 ) return 1;                 // no path argument was supplied
  fs::path arg_path( argv[1], fs::native ); // native means use O/S path format

To make class path objects easy to use in expressions, operator/ appends paths:

fs::ifstream file1( arg_path / "foo/bar" );
fs::ifstream file2( arg_path / "foo" / "bar" );

Note that the expressions arg_path / "foo/bar" and arg_path / "foo" / "bar" yield identical results.
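
As a quick check of that equivalence, the sketch below builds the same path both ways and returns the two results for comparison. It is only an illustration under stated assumptions: it uses std::filesystem from C++17, the standardized descendant of this library, instead of the Boost headers shown above, and the helper name append_both_ways is made up for the example.

```cpp
#include <filesystem>   // assumption: C++17 std::filesystem stands in for boost/filesystem
#include <utility>

namespace fs = std::filesystem;

// Hypothetical helper: appends "foo/bar" to base in one step and in two steps.
std::pair<fs::path, fs::path> append_both_ways( const fs::path & base )
{
  fs::path one_step  = base / "foo/bar";       // a single operand with two names
  fs::path two_steps = base / "foo" / "bar";   // two operands, one name each
  return std::make_pair( one_step, two_steps );
}
```

Comparing the two members of the returned pair with operator== should report that they are equal, since both paths consist of the same sequence of names.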

Class directory_iterator is an important component of the library. It provides input iterators over the contents of a directory, with the value type being class path.

The following function, given a directory path and a file name, recursively searches the directory and its sub-directories for the file name. It returns a bool and, if the search succeeds, stores the path of the file that was found. The code below is extracted from a real program, slightly modified for clarity:

bool find_file( const fs::path & dir_path,     // in this directory,
                const std::string & file_name, // search for this name,
                fs::path & path_found )        // placing path here if found
{
  if ( !fs::exists( dir_path ) ) return false;
  fs::directory_iterator end_itr; // default construction yields past-the-end
  for ( fs::directory_iterator itr( dir_path );
        itr != end_itr;
        ++itr )
  {
    if ( fs::is_directory( *itr ) )
    {
      if ( find_file( *itr, file_name, path_found ) ) return true;
    }
    else if ( itr->leaf() == file_name ) // see below
    {
      path_found = *itr;
      return true;
    }
  }
  return false;
}

The expression itr->leaf() == file_name, in the line commented // see below, calls the leaf() function on the path object returned by the iterator. leaf() returns a string which is a copy of the last (closest to the leaf, farthest from the root) file or directory name in the path object.

Notice that find_file() does not do explicit error checking, such as verifying that the dir_path argument really represents a directory. All Filesystem Library functions throw filesystem_error exceptions if they do not complete successfully, so there is enough implicit error checking that this application doesn't need to include additional error checking code.
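
A caller that prefers to handle such failures locally can catch the exception. The following is a minimal sketch of that pattern, not code from the library: it assumes C++17 and uses std::filesystem (the standardized descendant of this library) in place of the Boost headers, and the helper name count_entries and its -1 error value are invented for illustration.

```cpp
#include <filesystem>   // assumption: C++17 std::filesystem, not the Boost headers
#include <iostream>

namespace fs = std::filesystem;

// Hypothetical helper: counts the entries directly inside dir_path.
// Returns -1 if the library reports an error, for example because
// dir_path does not exist or is not a directory.
long count_entries( const fs::path & dir_path )
{
  try
  {
    long count = 0;
    // The iterator constructor throws filesystem_error if dir_path cannot be opened.
    for ( fs::directory_iterator itr( dir_path ), end_itr; itr != end_itr; ++itr )
      ++count;
    return count;
  }
  catch ( const fs::filesystem_error & ex )
  {
    std::cerr << "filesystem error: " << ex.what() << '\n';
    return -1;
  }
}
```

No explicit exists() or is_directory() test is needed here; the exception carries the failure, which is the implicit error checking described above.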

The tutorial is now over; you should now be ready to write simple, script-like programs using the Filesystem Library!

Examples

simple_ls.cpp

The example program simple_ls.cpp is given a path as a command line argument. Since the command line argument may be a relative path, the complete path is determined so that messages displayed can be more precise.

The program checks whether the path exists; if not, a message is printed.

If the path identifies a directory, the directory is iterated through, printing the names of the entries found and indicating which of them are directories. Counts of directories and files are updated during the iteration and printed after it completes.

If the path is for a file, a message indicating that is printed.

Try compiling and executing simple_ls.cpp to see how it works on your system. Try various path arguments to see what happens.
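
For readers without the example source at hand, the steps described above can be sketched roughly as follows. This is not the actual simple_ls.cpp: it assumes C++17 and std::filesystem (the standardized descendant of this library) instead of the Boost headers, builds the listing as a string instead of printing it, and the function name list_path and the message wording are invented for illustration.

```cpp
#include <filesystem>   // assumption: C++17 std::filesystem, not the Boost headers
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical function following the steps described above.
// Returns the text that a simple_ls-like program might print for p.
std::string list_path( const fs::path & p )
{
  std::ostringstream out;
  const fs::path full_path = fs::absolute( p );  // resolve a relative argument
  unsigned long dir_count = 0, file_count = 0;

  if ( !fs::exists( full_path ) )
  {
    out << "Not found: " << full_path.string() << '\n';
  }
  else if ( fs::is_directory( full_path ) )
  {
    out << "In directory: " << full_path.string() << '\n';
    for ( const fs::directory_entry & entry : fs::directory_iterator( full_path ) )
    {
      if ( entry.is_directory() )
      {
        ++dir_count;
        out << entry.path().filename().string() << " [directory]\n";
      }
      else
      {
        ++file_count;
        out << entry.path().filename().string() << '\n';
      }
    }
    out << file_count << " files, " << dir_count << " directories\n";
  }
  else
  {
    out << "Found: " << full_path.string() << '\n';  // p names a file
  }
  return out.str();
}
```

A main() could pass argv[1] to list_path() and write the returned string to std::cout.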

Other examples

The programs used to generate the Boost regression test status tables use the Filesystem Library extensively. See:

Test programs are sometimes useful in understanding a library, as they illustrate what the developer expected to work and not work. See:

Definitions

directory - A container provided by the operating system, containing the names of files, other directories, or both. Directories are identified by directory path.

directory tree - A directory and file hierarchy viewed as an acyclic graph.

path - A possibly empty sequence of names. Each element in the sequence, except the last, names a directory which contains the next element. The last element may name either a directory or a file. The first element is closest to the root of the directory tree; the last element is farthest from the root.

It is traditional to represent a path as a string, where each element in the path is represented by a name, and some operating system defined syntax distinguishes between the name elements. Other representations of a path are possible, such as each name being an element in a std::vector<std::string>.
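
To make the alternative representation concrete, the sketch below converts a path into a std::vector<std::string> holding one name per element. It is an illustration only, assuming C++17 and std::filesystem (the standardized descendant of this library) in place of the Boost headers; the helper name path_elements is made up.

```cpp
#include <filesystem>   // assumption: C++17 std::filesystem, not the Boost headers
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: copies each element of p into a vector,
// starting with the element closest to the root.
std::vector<std::string> path_elements( const fs::path & p )
{
  std::vector<std::string> names;
  for ( const fs::path & element : p )   // a path iterates over its elements
    names.push_back( element.string() );
  return names;
}
```

For the path "some_dir/sub/file.txt" the vector holds "some_dir", "sub", and "file.txt", in that order; an empty path yields an empty vector.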

file path - A path whose last element is a file.

directory path - A path whose last element is a directory.

name - A file or directory name, without any directory path information to indicate the file or directory's actual location within a directory tree. For some operating systems, files and directories may have more than one valid name, such as a short-form name and a long-form name.

root - The initial node in the acyclic graph which represents the directory tree for a filesystem.

multi-root operating system - An operating system which has multiple roots. Some operating systems have different directory trees for each different disk, drive, device, volume, share, or other entity managed by the system, with each having its own root-name.

link - A name in a directory can be viewed as a pointer to the underlying directory or file content. Modern operating systems permit multiple directory elements to point to the same underlying directory or file content. Such a pointer is often called a link. Not all operating systems support the concept of links. Links may be reference counted or non-reference counted. Non-reference counted links are sometimes called symbolic links or shortcuts.

Common Specifications

Unless otherwise specified, all Filesystem Library member and non-member functions have the following common specifications:

May throw exceptions - Filesystem Library functions may throw filesystem_error exceptions if they cannot successfully complete their operational specifications. Function implementations may use C++ Standard Library functions, which may throw std::bad_alloc. These exceptions may be thrown even though the error condition leading to the exception is not explicitly specified in the function's "Throws" paragraph.

Exceptions thrown via boost::throw_exception() - All exceptions thrown by the Filesystem Library are implemented by calling boost::throw_exception(). Thus exact behavior may differ depending on whether BOOST_NO_EXCEPTIONS is defined at the time the filesystem source files are compiled.

Links follow operating system rules - Links are transparent in that Filesystem Library functions simply follow operating system rules. That implies that some functions may throw filesystem_error exceptions if a link is cyclic or has other problems.

Typical operating system rules call for deep operations on all links, except that destructive operations on non-reference counted links are either shallow or, in the case of trying to remove a non-reference counted link to a directory, fail altogether.

Rationale: Follows existing practice (POSIX, Windows, etc.).

No atomic-operation or rollback guarantee - Filesystem Library functions which throw exceptions may leave the external file system in an altered state. It is suggested that implementations provide stronger guarantees when possible.

Rationale: Implementors shouldn't be required to provide guarantees which are impossible to meet on some operating systems. Implementors should be given normative encouragement to provide those guarantees when possible.

Graceful degradation - Filesystem Library functions which cannot be fully supported on a particular operating system will be partially supported if possible. Implementations must document such partial support. Functions which are requested to provide some operation which they cannot support should report an error at compile time or throw an exception at runtime.

Rationale: Implementations on less-powerful operating systems should provide useful functionality if possible, but should not be required to simulate features not present in the underlying operating system.

Race-condition danger

The state of files and directories is often globally shared, and thus may be changed unexpectedly by other threads, processes, or even other computers having network access to the filesystem. As an example of the difficulties this can cause, note that the following asserts may fail:

assert( exists( "foo" ) == exists( "foo" ) );  // (1)

remove_all( "foo" );
assert( !exists( "foo" ) );  // (2)

assert( is_directory( "foo" ) == is_directory( "foo" ) ); // (3)

(1) will fail if a non-existent "foo" comes into existence, or an existent "foo" is removed, between the first and second call to exists(). This could happen if, during the execution of the example code, another thread, process, or computer is also performing operations in the same directory.

(2) will fail if between the call to remove_all() and the call to exists() a new file or directory named "foo" is created by another thread, process, or computer.

(3) will fail if another thread, process, or computer removes an existing file "foo" and then creates a directory named "foo", between the example code's two calls to is_directory().

A program which needs to be robust when operating on potentially-shared file or directory resources should be prepared for filesystem_error exceptions to be thrown from any filesystem function except those explicitly specified as not throwing exceptions.
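
One way to follow that advice is to attempt the operation directly and treat the exception as the "not available" case, instead of testing with exists() first and acting afterwards. The sketch below shows this pattern under stated assumptions: it uses C++17 and std::filesystem (the standardized descendant of this library) instead of the Boost headers, and the helper name try_file_size is invented for illustration.

```cpp
#include <cstdint>
#include <filesystem>   // assumption: C++17 std::filesystem, not the Boost headers
#include <optional>

namespace fs = std::filesystem;

// Hypothetical helper: returns the size of p, or an empty optional if the
// library reports an error (for example, p was removed by another process
// just before the call).
std::optional<std::uintmax_t> try_file_size( const fs::path & p )
{
  try
  {
    return fs::file_size( p );   // no separate exists() check that could go stale
  }
  catch ( const fs::filesystem_error & )
  {
    return std::nullopt;
  }
}
```

This does not remove the race itself; the file can still change after the size is read. It only ensures that a concurrent removal surfaces as an ordinary result and not as an uncaught exception.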

Implementation

The current implementation (September, 2002) supports operating systems that have either the POSIX or Windows APIs available.

The following tests are provided:

As of December, 2002, these tests succeed for the following compilers on Windows:

As of December, 2002, limited use has been successful on Linux using GCC and on IBM/AIX using VisualAge C++.

Acknowledgements

The Filesystem Library was designed and implemented by Beman Dawes. The directory_iterator and filesystem_error classes were based on prior work from Dietmar Kühl, as modified by Jan Langer. Thomas Witt was a particular help in later stages of development.

Key design requirements and design realities were developed during extensive discussions on the Boost mailing list, followed by comments on the initial implementation. Numerous helpful comments were then received during the Formal Review.

Participants included Aaron Brashears, Alan Bellingham, Aleksey Gurtovoy, Alex Rosenberg, Alisdair Meredith, Andy Glew, Anthony Williams, Baptiste Lepilleur, Beman Dawes, Bill Kempf, Bill Seymour, Carl Daniel, Chris Little, Chuck Allison, Craig Henderson, Dan Nuffer, Dan'l Miller, Daniel Frey, Darin Adler, David Abrahams, David Held, Davlet Panech, Dietmar Kuehl, Douglas Gregor, Dylan Nicholson, Ed Brey, Eric Jensen, Eric Woodruff, Fedder Skovgaard, Gary Powell, Gennaro Prota, Geoff Leyland, George Heintzelman, Giovanni Bajo, Glen Knowles, Hillel Sims, Howard Hinnant, Jaap Suter, James Dennett, Jan Langer, Jani Kajala, Jason Stewart, Jeff Garland, Jens Maurer, Jesse Jones, Jim Hyslop, Joel de Guzman, Joel Young, John Levon, John Maddock, John Williston, Jonathan Caves, Jonathan Biggar, Jurko, Justus Schwartz, Keith Burton, Ken Hagen, Kostya Altukhov, Mark Rodgers, Martin Schuerch, Matt Austern, Matthias Troyer, Mattias Flodin, Michiel Salters, Mickael Pointier, Misha Bergal, Neal Becker, Noel Yap, Parksie, Patrick Hartling, Pavel Vozenilek, Pete Becker, Peter Dimov, Rainer Deyke, Rene Rivera, Rob Lievaart, Rob Stewart, Ron Garcia, Ross Smith, Sashan, Steve Robbins, Thomas Witt, Tom Harris, Toon Knapen, Victor Wagner, Vincent Finn, Vladimir Prus, and Yitzhak Sapir.

A lengthy discussion on the C++ committee's library reflector illuminated the "illusion of portability" problem, particularly in postings by P.J. Plauger and Pete Becker.


© Copyright Beman Dawes, 2002

Revised 16 March, 2003