Dynamic Libraries in C++

Notes from the Technical Session in Santa Cruz, Oct. 21, 2002

*Document number:*	N1418 = 02-0076
*Date:*	November 11, 2002
*Project:*	Programming Language C++
*Reference:*	ISO/IEC IS 14882:1998(E)
*Reply to:*	Pete Becker
	Dinkumware, Ltd.
	petebecker@acm.org

Overview · Linking and Libraries · Usage Models · Semantic Issues

Overview

Many operating systems today support applications consisting of an executable file and one or more dynamic libraries¹. Compilers for such operating systems typically provide language extensions that support fine-grained control over the process of creating such applications. The C and C++ language standards, however, say nothing about dynamic libraries, so it is difficult to write portable applications that use them. This paper provides background material needed to better understand some of the problems posed by applications that use dynamic libraries.

Perhaps the most important impediment to discussion of dynamic libraries is differing notions of what the term "dynamic library" means. Systems programmers know the details of how dynamic libraries are loaded and how names defined in dynamic libraries are resolved; application programmers know what they want to do with dynamic libraries. Designing a model of dynamic libraries that is suitable for standardization requires exposure to both domains. The systems programming aspects of dynamic libraries are discussed in Linking and Libraries, and the application programming aspects are discussed in Usage Models. Finally, there are several decisions that must be made concerning what the language should say about dynamic libraries. These are discussed in Semantic Issues.

Linking and Libraries

static linking

Formally, a program in C or C++ consists of one or more translation units which are compiled separately. The resulting object files are then linked together to produce an executable file:

/home/pete$ cc -c test.cpp
/home/pete$ cc -c helper.cpp
/home/pete$ cc test.o helper.o

In practice the link step is often handled by a script or by the compiler itself, so an application can be compiled and linked with a single command:

/home/pete$ cc test.cpp helper.cpp

Compilers also support the use of libraries. A library is nothing more than a set of object files grouped together into one file. A library is linked to an application by putting its name on the command line:

C:\work> cc test.obj helper.obj mylib.lib

Despite the apparent similarities, though, there is usually an important difference between linking an object file into an application and linking a library into an application. With most implementations the linker puts all of the functions and data objects from each object file into the application. When linking a library into an application, however, only the parts of the library that are needed by the application are linked in. The linker scans the library for object files that define names needed by other parts of the application, and links those object files into the application. The rest of the library isn't used. For example:

#include <stdlib.h>
#include <stdio.h>

int main()
{
    puts("Hello, world");
    exit(0);
}

When the compiler compiles this translation unit it produces an object file that defines the symbol main² and has internal notes that tell the linker that the object file needs definitions for the symbols puts and exit. When the linker links this object file to produce an application it looks through the standard library³ for an object file that defines one or both of those names and links it into the application, adding any names that that object file needs definitions for into the list of names that it is searching for. The link step is complete only when the linker has resolved all of the symbols needed by the object files that constitute the application and all of the symbols needed by the object files that it linked in from libraries.
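The search described above can be sketched as a small worklist algorithm. This is only an illustrative model, not how any particular linker is implemented: the ObjectFile structure, the function link_with_library, and the member names used in the usage note are all hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an object file: the symbols it defines and the
// symbols it needs some other object file to define.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> needs;
};

// Links every file in 'objects' in full, then pulls in only those members
// of 'library' that define a symbol still being searched for. Returns the
// names of the members pulled in; symbols that nothing defines are left
// in 'unresolved'.
std::vector<std::string> link_with_library(
    const std::vector<ObjectFile>& objects,
    const std::vector<ObjectFile>& library,
    std::set<std::string>& unresolved)
{
    std::set<std::string> defined;
    std::vector<std::string> pending;   // symbols still to be resolved
    for (const ObjectFile& object : objects) {
        defined.insert(object.defines.begin(), object.defines.end());
        pending.insert(pending.end(), object.needs.begin(), object.needs.end());
    }

    std::vector<std::string> pulled;
    unresolved.clear();
    while (!pending.empty()) {
        const std::string symbol = pending.back();
        pending.pop_back();
        if (defined.count(symbol) != 0)
            continue;                   // already resolved
        bool found = false;
        for (const ObjectFile& member : library) {
            if (member.defines.count(symbol) == 0)
                continue;
            // Link the whole member and start searching for what it needs.
            defined.insert(member.defines.begin(), member.defines.end());
            pending.insert(pending.end(), member.needs.begin(), member.needs.end());
            pulled.push_back(member.name);
            found = true;
            break;
        }
        if (!found)
            unresolved.insert(symbol);
    }
    return pulled;
}
```

Given an object file that needs puts and exit, and a library whose members define puts (which in turn needs a hypothetical write), exit, write, and qsort, the sketch pulls in the first three members and leaves the qsort member out of the application.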

application loading

To run an application the operating system calls the program loader and gives it the name of the executable file. The program loader finds memory space for the application and copies the application's executable code into the memory space. Then it does any adjustments to the executable code that are needed to make it ready to run. For example, memory addresses in the executable code might need to be adjusted to reflect the actual location in memory where the program has been loaded. Once these loader fixups have been made the loader turns execution over to the application.
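As a rough illustration of a loader fixup, the following sketch models an executable image as an array of words plus a list of the positions that hold addresses. The ExecutableImage layout and the function name are assumptions made for this example; real executable formats record relocations in far more detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory model of an executable file.
struct ExecutableImage {
    std::vector<std::uint32_t> words;   // code and data, linked as if loaded at address 0
    std::vector<std::size_t> fixups;    // indices of the words that hold addresses
};

// Adjusts every recorded address to reflect where the image was loaded.
void apply_loader_fixups(ExecutableImage& image, std::uint32_t load_address) {
    for (std::size_t index : image.fixups) {
        assert(index < image.words.size());
        image.words[index] += load_address;
    }
}
```

Words that are not listed as fixups, such as ordinary constants, are left untouched.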

linking to dynamic libraries

When an application uses dynamic libraries the picture changes. The code contained in the dynamic library is not linked into the application; in fact, the code often doesn't even have to be present on the system when the application is linked. The linker just makes notes in the executable file about symbols that it thinks will be resolved by dynamic libraries⁴.

dynamic loading and loader fixups

To run an application that uses dynamic libraries the program loader has a great deal more work to do: in addition to loading the executable file into memory it has to find all of the dynamic libraries that the executable file depends on, including those needed by other dynamic libraries that it has loaded. After a dynamic library has been loaded, each function call from the executable file or from another dynamic library into that dynamic library has to be fixed up. These fixups can't be done any sooner, because it is only at load time that the locations in memory of the dynamic libraries that constitute the application are known. Thus, some of the work that the linker does when linking to a static library is deferred until load time when linking to dynamic libraries.
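One way to picture these load-time fixups is as filling in a table of imported function addresses once every library is in memory. The sketch below is a simplified model under that assumption; LoadedLibrary, ImportSlot, and bind_imports are hypothetical names, and the rule that the first library exporting a symbol wins is a choice made for the example, not a statement about any particular system.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Function = int (*)(int);

// Hypothetical record of a dynamic library that is already in memory.
struct LoadedLibrary {
    std::string name;
    std::map<std::string, Function> exports;   // symbol name -> address
};

// One note left by the linker: "this symbol comes from a dynamic library".
struct ImportSlot {
    std::string symbol;
    Function target;    // null until the loader binds it
};

// Binds each import to the first loaded library that exports its symbol.
// Returns the number of imports that could not be bound.
std::size_t bind_imports(std::vector<ImportSlot>& imports,
                         const std::vector<LoadedLibrary>& libraries)
{
    std::size_t unbound = 0;
    for (ImportSlot& slot : imports) {
        slot.target = nullptr;
        for (const LoadedLibrary& library : libraries) {
            auto found = library.exports.find(slot.symbol);
            if (found != library.exports.end()) {
                slot.target = found->second;
                break;
            }
        }
        if (slot.target == nullptr)
            ++unbound;
    }
    return unbound;
}

// A call into a dynamic library goes through the bound slot.
int call_import(const ImportSlot& slot, int argument) {
    assert(slot.target != nullptr);
    return slot.target(argument);
}
```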

manual loading

It is also possible to manually load a dynamic library at runtime. This is done by passing the name of the dynamic library to a system function that loads the library and returns a handle that the application can use to refer to the code in the library. After successfully loading the library the application can get the addresses of symbols defined in the dynamic library by calling another system function and passing it the handle for the library and the name of a symbol.
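On POSIX systems the two system functions are dlopen and dlsym; on Windows the corresponding functions are LoadLibrary and GetProcAddress. The following is a minimal sketch of the POSIX form. The wrapper names load_library, find_symbol, and unload_library are hypothetical, and on some systems the program must also be linked with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>   // POSIX: dlopen, dlsym, dlclose, dlerror
#include <string>

// Loads the dynamic library at 'path'. Returns a handle, or nullptr with
// a description of the failure in 'error'.
void* load_library(const char* path, std::string& error) {
    void* handle = dlopen(path, RTLD_NOW);
    if (handle == nullptr) {
        const char* message = dlerror();
        error = message ? message : "unknown error";
    }
    return handle;
}

// Looks up 'name' in a loaded library. Returns nullptr if it is not found.
void* find_symbol(void* handle, const char* name) {
    assert(handle != nullptr);
    dlerror();                              // clear any stale error state
    void* address = dlsym(handle, name);
    return dlerror() == nullptr ? address : nullptr;
}

// Releases a handle obtained from load_library.
void unload_library(void* handle) {
    if (handle != nullptr)
        dlclose(handle);
}
```

The name passed to find_symbol is the name as the linker sees it, so a C++ function is normally looked up by its mangled name unless it was declared extern "C". The returned address must then be converted to the appropriate function or object pointer type before use.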

Usage Models

For the designer of an application there are three usage models for dynamic libraries, reflecting the three forms of linking and loading discussed in the previous section.

monolithic applications

A monolithic application is an application that doesn't explicitly use any dynamic libraries⁵. All of the application's code must be present when the application is built, and all of the code is statically linked into the application. This is the traditional C and C++ program model. It imposes the tightest coupling among an application's components, which reduces flexibility and increases robustness.

closed applications

A closed application uses dynamic libraries but doesn't manually load any dynamic libraries at runtime. The application designer determines what the application will be able to do and distributes the application's code among the executable file and the application's dynamic libraries. This allows for the possibility of upgrading the application by distributing an updated executable file or updated dynamic libraries while leaving the unaffected code in place. Such an application is less tightly coupled than a monolithic application, and requires more care to ensure that new components work correctly with older versions.

plug-ins

An application that supports plug-ins manually loads dynamic libraries to supplement its capabilities. Plug-ins often come from the application implementor, but they can also come from third-party developers. To support the latter, the application implementor documents the interface that a plug-in must support, and it provides, documents, and maintains services needed by plug-ins. Unlike a closed application, an application that supports plug-ins permits extensions that were not designed into the application. In this sense such an application is less tightly coupled than a closed application; however, this flexibility comes at a price: new versions of the application must continue to provide the old version's support services so that existing plug-ins will continue to work.
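A documented plug-in interface might look something like the sketch below. Everything here is hypothetical: the Plugin interface, the version-number convention, the Host class, and the sample Doubler plug-in are invented for illustration. In a real application the factory function would be found by manually loading the plug-in's dynamic library rather than being linked in directly.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Version of the plug-in interface that this (hypothetical) host implements.
const int kHostInterfaceVersion = 2;

// The documented interface that every plug-in must support.
struct Plugin {
    virtual ~Plugin() {}
    virtual int interface_version() const = 0;   // version it was built against
    virtual std::string name() const = 0;
    virtual int transform(int value) const = 0;
};

// Entry point that a plug-in library would export.
typedef Plugin* (*PluginFactory)();

class Host {
public:
    // Accepts plug-ins built against this or any older interface version,
    // so that existing plug-ins keep working with a newer host.
    bool add_plugin(PluginFactory factory) {
        assert(factory != nullptr);
        std::unique_ptr<Plugin> plugin(factory());
        if (!plugin || plugin->interface_version() > kHostInterfaceVersion)
            return false;
        plugins_.push_back(std::move(plugin));
        return true;
    }

    // Runs 'value' through every registered plug-in in turn.
    int apply_all(int value) const {
        for (const auto& plugin : plugins_)
            value = plugin->transform(value);
        return value;
    }

private:
    std::vector<std::unique_ptr<Plugin>> plugins_;
};

// A sample plug-in written against version 1 of the interface.
struct Doubler : Plugin {
    int interface_version() const override { return 1; }
    std::string name() const override { return "doubler"; }
    int transform(int value) const override { return 2 * value; }
};

Plugin* make_doubler() { return new Doubler; }
```

Note that this sketch has the host delete an object that the plug-in allocated, which is only safe when both sides share the same memory allocator; a more careful design would have the plug-in export a matching destroy function.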

Semantic Issues

exporting and importing

Under Windows, when a dynamic library is built, symbols that are intended to be used in code that uses the dynamic library must be marked as exported. Further, in code that uses symbols from a dynamic library, each such symbol must be marked as imported. This marking is done by adding implementation-specific keywords to the declarations of these symbols. Each such symbol is modified by a macro that expands to the appropriate keyword for an exported symbol when the dynamic library is being built and to the appropriate keyword for an imported symbol when the dynamic library is being used:

#ifndef MY_HEADER
#define MY_HEADER
#if BUILD_MY_LIBRARY
 #define MY_LIBRARY_DECL __declspec(dllexport)
#else
 #define MY_LIBRARY_DECL __declspec(dllimport)
#endif

MY_LIBRARY_DECL void f(int);
#endif /* MY_HEADER */

Under Unix, the default is that when a dynamic library is built all symbols with external linkage are made available to code that uses the dynamic library. Nothing has to be done to the source code to make symbols available from a dynamic library or to use symbols defined in a dynamic library.

Both of these approaches have problems. The Windows approach requires careful maintenance of the macros that describe the dynamic library. Moving code from one dynamic library to another requires changing the controlling macros so that the symbols will be marked as exported from the new library (e.g., the macro BUILD_MY_LIBRARY in the example above would have to be changed to a name that was defined when building the new library). The Unix approach, simply put, does too much. It exposes internal details that the designers of a dynamic library would prefer to keep private. Unix compilers address this problem from outside the language through a text file that tells the linker which names to make available from a dynamic library. This is obviously awkward, and some compilers are moving toward a keyword-based approach.
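As one example of the keyword-based direction, GCC and compatible compilers accept a visibility attribute that can be combined with the Windows keywords behind a single macro. The following sketch assumes that the Unix build hides symbols by default (for GCC, the -fvisibility=hidden option); the function helper is hypothetical, and BUILD_MY_LIBRARY is defined inline here only so that the example stands alone.

```cpp
// Normally defined by the library's build, not in the source.
#define BUILD_MY_LIBRARY 1

#if defined(_WIN32)
  #if BUILD_MY_LIBRARY
    #define MY_LIBRARY_DECL __declspec(dllexport)
  #else
    #define MY_LIBRARY_DECL __declspec(dllimport)
  #endif
#elif defined(__GNUC__)
  // With -fvisibility=hidden, only symbols marked this way are exported.
  #define MY_LIBRARY_DECL __attribute__((visibility("default")))
#else
  #define MY_LIBRARY_DECL
#endif

// Header: f is part of the library's interface, helper is not.
MY_LIBRARY_DECL int f(int);
int helper(int);

// Library source: helper stays internal when symbols are hidden by default.
int helper(int value) { return value + 1; }
int f(int value) { return 2 * helper(value); }
```

This removes the need for a separate text file on Unix, but it keeps the macro maintenance burden described above for Windows.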

Overall, it looks like some form of language support is needed to provide fine-grained control over which symbols are made available by dynamic libraries. There doesn't appear to be any technical barrier to simply marking a symbol as exported (with whatever syntax is deemed appropriate); with that information the compiler can generate whatever information is needed when it sees the definition of that symbol and when it sees a use of that symbol⁶.

language support

The syntax for declaring exported symbols ought to be simple. One possibility that has been discussed on the mail reflector is extending the syntax for a linkage-specification, so that a symbol that is defined in a dynamic library could be marked with something like extern "library"⁷. The following is not intended to be a proposal, merely a survey of the issues presented.

Ordinary functions and data objects can be marked in the same way as they can be labeled extern "C":

extern "library" {
int i;      // i is defined in a dynamic library
void f();   // f is defined in a dynamic library
}

extern "library" double d;
            // d is defined in a dynamic library

The symbols defined by a class consist of its member functions and its static data members. Putting the implementation of a class into a dynamic library requires being able to make all of those symbols available:

extern "library" {
    class C {
    public:
        void f();       // C::f is defined in a dynamic library
        static int i;   // C::i is defined in a dynamic library
    };
}

Templates are patterns for creating functions and classes. They are not, in themselves, code or data. Thus, they do not need any special handling for dynamic libraries. Rather, it is template instances that must be labeled when their code and data are in a dynamic library:

template <class T> struct C {
    void set(const T& tt) { t = tt; }
    T get() { return t; }
private:
    T t;
};

extern "library" {
template struct C<int>; // C<int>::set and C<int>::get
                    // are defined in a dynamic library
}
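Without any library-specific syntax, the closest standard mechanism for placing a template instance's code in one particular place is explicit instantiation. The sketch below uses only standard C++; note that the extern template form was standardized in C++11, after this paper was written, and was available at the time only as a vendor extension. The file names in the comments are hypothetical.

```cpp
// c.h (hypothetical): the template, plus a declaration telling client code
// that C<int> is instantiated somewhere else.
template <class T> struct C {
    void set(const T& tt) { t = tt; }
    T get() { return t; }
private:
    T t;
};

extern template struct C<int>;   // explicit instantiation declaration

// c.cpp (hypothetical), built into the library: the explicit instantiation
// definition, which generates C<int>::set and C<int>::get here.
template struct C<int>;
```

This controls where the instance's code is generated, but it says nothing about whether that code is exported from a dynamic library, which is the gap that a notation such as extern "library" would have to fill.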

The compiler also generates data that is used by the implementation, such as the data that supports runtime type information. For applications that support plug-ins it may be important to control the availability of such data, since writers of plug-ins may rely on the availability of type information for some of the application's types. This poses a problem, since the name of the data structure that holds type information⁸ is not usually known to the user. Some other syntax would be needed to support control of this data.

semantic complications

There are several semantic issues that the standard would have to address, mostly turning on the applicability of the one-definition rule when dynamic libraries are used. What should an implementation be required to do if two dynamic libraries export the same symbol? What should an implementation be required to do if two dynamic libraries define the same symbol as a symbol with external linkage but do not export it? What should an implementation be required to do if two dynamic libraries define the same type? (For example, when code in a dynamic library throws an exception, should code which called that code from another dynamic library be able to catch that exception?)
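The exception question can be made concrete with a sketch like the following. Within a single program, as written here, the handler always matches. The open question is what should happen when the throwing function and the catching function are built into different dynamic libraries, each containing its own copy of the type information for the exception type. The names LibraryError, parse_digit, and parse_digit_or are hypothetical.

```cpp
#include <stdexcept>
#include <string>

// Would be declared in a header shared by both dynamic libraries.
struct LibraryError : std::runtime_error {
    explicit LibraryError(const std::string& what) : std::runtime_error(what) {}
};

// Would live in one dynamic library.
int parse_digit(char c) {
    if (c < '0' || c > '9')
        throw LibraryError("not a digit");
    return c - '0';
}

// Would live in another dynamic library, or in the executable file.
int parse_digit_or(char c, int fallback) {
    try {
        return parse_digit(c);
    } catch (const LibraryError&) {
        // Reached only if the implementation treats the thrown type and
        // the handler's type as the same type.
        return fallback;
    }
}
```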

1. In Windows they're known as DLLs; in Unix they're shared libraries. Throughout this paper they are referred to as "dynamic libraries", in the hope that the name suggests both that the two models are similar and that they are different.

2. This discussion ignores name mangling.

3. Although it generally doesn't appear on the command line, the standard library is usually no different from a user-defined library except that the compiler knows its name and passes that name to the linker even if it isn't mentioned on the command line.

4. This is deliberately vague, because the details vary fairly widely from system to system.

5. An application that doesn't explicitly use any dynamic libraries will often use dynamic libraries anyway -- the standard library for C and C++ is often packaged as a dynamic library. However, this is usually not something that the application designer need be concerned with; the implementor will make it work.

6. For Windows programmers this is a simplification; for Unix programmers it is a complication.

7. The use of "library" here is intended only as an aid to exposition, not as a recommendation.

8. If there is, in fact, such a name at all. Some implementations store this information in the vtable.