Doc No: N3415 = 12-0105
Date: 2012-09-13
Reply to:  Bill Seymour <stdbill.h@pobox.com>


A Database Access Library

Bill Seymour
2012-09-13


Abstract

One feature that some programming languages have, usually as part of an associated library, is some mechanism for accessing SQL databases. This paper begins to explore the design of such a library for C++.

As this is probably not something that we would want to require of every standard library implementation, this library is proposed for a Technical Report (the former Type III TR) rather than a Technical Specification (née Type II TR).

One possible design might be something that looks like the java.sql.* stuff; but it seems like we can do better than that in C++. In particular, we can probably have result sets that look like collections and cursors that look like iterators.

In general, constructors and destructors are not shown below unless there’s something interesting about them.

The author will ask “What’s your interest?” in Portland; and he’ll have a working model ready for Bristol if he’s encouraged to build one.


Connecting to a database

class connection
{
public:
    connection(const std::string& protocol,
               const std::string& database,
               const std::string& user_id,
               const std::string& password);

    connection();

    bool has_scrolling_cursors() const noexcept;

    void commit();
    void rollback();

    bool auto_commit() const noexcept;
    bool auto_commit(bool) noexcept;
};

At a minimum, we’ll want to be able to specify which database we want to connect to and supply a user ID and password.

The constructor’s protocol argument specifies the mechanism for communicating with the database; and acceptable strings should include at least "SQL/CLI" which specifies the mechanism described in ISO/IEC 9075-3. (Most folks call this “ODBC”, but since we’re an ISO working group, we refer to the ISO standard.) Additional implementation-defined protocol strings, e.g., "Oracle Call Interface", may be supported.

What goes in the database argument is implementation-defined and probably depends on the protocol. For example, if the protocol is "Oracle Call Interface", database might be a string that gets looked up in a file called tnsnames.ora.

The default constructor will create a connection object from some sort of connection pool. This needs to be explored further.

In general, we won’t know what features we have available until after we’ve connected to a particular database. The has_scrolling_cursors() member function is shown as an example of one thing we might need to discover at run time. There could be others.

Transaction control typically happens at the connection level, not the statement level; so commit() and rollback() are members of this class.

Some databases have an “auto-commit” feature that treats every statement as a separate transaction. At a minimum, we need a way to turn that off.

Note that, unlike a java.sql.Connection, this class is not a statement factory. Rather, a connection object will be passed by reference to a statement constructor. (The idea is that we don’t want several different factory methods for creating different kinds of statements since that’s not extensible.)


Executing SQL statements


A base class for statements:

class statement
{
public:
    typedef /* some unsigned integer type */ position_type;
    typedef /* some integer type, possibly bool */ indicator;

    explicit statement(connection&);
    statement(connection&, const std::string& sql);

    virtual ~statement() noexcept;

    virtual void set_sql(const std::string& sql);

    void prepare();
    void prepare(const std::string& sql)
    {
        set_sql(sql);
        prepare();
    }

    template<class T> void set(position_type pos, const T& val);
    template<class T> void set_null(position_type pos);
    template<class T> void set_null(position_type pos, const T&);

    template<class T> void bind(position_type pos, T* val, indicator* ind = 0);
    void bind(position_type pos, char* val, std::size_t siz, indicator* ind = 0);

    void execute();
    void execute(const std::string& sql)
    {
        set_sql(sql);
        execute();
    }
};
This class provides all the functionality required to execute SQL statements that don’t return results; and statements that do return results can be derived from this. (Alternatively, this could be an abstract base class and we could derive from it a class called, say, ddl_statement, that provides no additional functionality; but that seems needlessly fussy.)

We use “dynamic SQL”; that is, the SQL statement that we intend to execute is just a string that gets interpreted by the database engine at run time. That’s fundamentally how ODBC and similar connection mechanisms work; and we can’t change that even though we might want to.

The SQL can be specified eagerly at construction time (statement(connection,string)), lazily at execution time (execute(string)), or at any time in between (set_sql(string)). Furthermore, the SQL can change between executions.

The SQL statement can have placeholders for data that change between executions; and such statements typically need to be “prepared”. Exactly what that means depends on the database engine.

After such statements have been prepared, but before they’re executed, the placeholders need to be replaced with the actual data; and that’s what set(position_type, const T&) does. This template is the equivalent of the Java PreparedStatement zoo of setString(), setInt(), etc. The first argument is an ordinal that specifies which placeholder we’re setting. (The usual practice, even in Java, is for these ordinals to be 1-based, not 0-based. Is that what we want to do?)

We sometimes need to set placeholders to a null value; and that’s what the set_null template does. The second argument in the two-argument version isn’t used for anything except inferring the template argument.

Both placeholders and returned values can be hooked to program variables. This is variously called “binding” or “describing” depending on whether you’re talking about input or output data. Both are just called “binding” in this paper.

For example, let’s say I have a placeholder of some integer type. Rather than calling set(1, my_int_value) before each execution of the SQL statement, between preparing the statement and the first execution I could say,

    int val;
    // ...
    my_statement.bind(1, &val);
and then just assign a value to val before each execution.

This works for data returned by a query as well:  the bound variable will have the new value after each fetch of a query’s cursor. (More about queries and cursors later.)

There’s a complication, however, if the value can be null. Each bound program variable optionally has associated with it an “indicator variable”, an integer that serves as a boolean indicating whether the bound variable contains real data or rather should be considered null.

We’ll also have a non-template bind() that binds to an array of char of specified size. This could be useful for LOBs and BLOBs.

Note that this paper doesn’t address an interesting problem:  inferring SQL types given C++ types. That might not even be possible in general and might require enums or other integer constants that can be used to specify SQL types. (There’s probably a reason why other database access libraries have them.)


Derived statement types:

Statements that return results can be derived from the statement class. They differ in the kind of results that they return.

Do we want to do this with template specialization rather than subclassing? (The template argument could be the result type with void being a reasonable choice.)

If we do use template specialization and have something like:

    typedef basic_statement<void> ddl_statement;
should we move the execute() functions to the derived classes? They could return the results (like in Java) rather than having separate results() functions as shown below.

On the other hand, having separate results() functions would allow deferring construction of a result, which might be expensive, to when the user actually asks for it. Note that the user might not ask for it at all, and so lazy evaluation seems appropriate.


Data manipulation statements:

class dml_statement : public statement
{
public:
    explicit dml_statement(connection&);
    dml_statement(connection&, const std::string& sql);

    typedef std::size_t result_type;
    result_type results() const;
};
These are INSERT, UPDATE and DELETE statements. They return the number of rows affected. If you don’t care how many rows are affected, you can just use an object of the statement type.


Procedure and function calls:

class row;

class call_statement : public statement
{
public:
    explicit call_statement(connection&);
    call_statement(connection&, const std::string& sql);

    void set_sql(const std::string&);

    typedef row result_type;
    result_type results();
};
Some databases allow storing procedures and functions, typically written in some vendor-lock-in language, for execution by the database itself; and such routines can return zero or more data elements. (A row is a possibly empty collection of data elements. More about that shortly.)

An overload of statement::set_sql() is shown because the actual syntax for calling stored routines might not be the same for all database engines, and so some translation might be needed.

If the routine doesn’t return any values, you might be able to just use a statement. It depends on whether we need to do any translation of the string that contains the call.

If the database doesn’t support such stored routines, the constructor should probably throw an exception.


Queries:

template<class Cursor> class table;

template<class Cursor>
class query : public statement
{
public:
    explicit query(connection&);
    query(connection&, const std::string& sql);

    void prefetch_count(std::size_t) noexcept;

    typedef table<Cursor> result_type;
    result_type results();
};
These are SELECT statements. They return whole tables. (A table is a possibly empty collection of rows. More about that shortly.)

Some databases allow specifying the maximum number of rows that will be returned in a single burst of communication from the database box. The prefetch_count(size_t) function sets that number. If the database doesn’t support setting a prefetch count, this function quietly does nothing. (There’s no reason to fail since the statement will still work just fine. It might not happen as quickly as the user would like, but there’s nothing that can be done about that anyway.)


Getting query results


One datum:

class column
{
public:
    // copyable, moveable, swappable

    bool is_null() const noexcept;

    template<class T> void get(T&) const;
    template<class T> T get() const;
};

bool operator==(const column&, const column&);
bool operator!=(const column&, const column&);
bool operator< (const column&, const column&);
bool operator> (const column&, const column&);
bool operator<=(const column&, const column&);
bool operator>=(const column&, const column&);
This is a kind of “any” type. Unfortunately, we really don’t know the actual type until run time.

They need to be copyable and moveable so that they can be stored in standard-library containers.

The get templates are the equivalent of a Java ResultSet’s getString(), getInt(), etc. By making these members of the column, instead of members of some collection of columns, we eliminate the argument that specifies which column we’re talking about.

We can have just these two member templates instead of a whole zoo of getFoo() functions like in Java. The idea is that T could be any fundamental type, std::string, and maybe std::crono::duration and std::crono::time_point or similar types that might be better for holding civil times (and that understand time zones).

(We should probably also support LOBs and BLOBs somehow, but this author has no experience with them and doesn’t understand the issues.)

Note that, in the database world, “null” is a possible value for anything; and any operation involving a null value yields a null. This is a problem with comparisons and is usually resolved by saying that any comparison involving a null is false. Do we want to mimic that behavior? It can sometimes have surprising results. (For example, == and != can both be false.)


A collection of columns:

A row looks like a const vector<column> that has the additional behavior of allowing retrieval of a particular column by name as well as by index. (Note the operator[](string) and at(string) members).

At a minimum, it needs to be moveable since call_statement::results() returns one of these by value.

class row
{
public:
    typedef implementation-detail cols; // the underlying container type

    typedef cols::size_type size_type;
    typedef cols::value_type value_type;
    typedef cols::const_reference reference;
    typedef cols::const_reference const_reference;

    // copyable, moveable, swappable

    size_type size() const;
    bool empty() const;

    reference operator[](size_type index) const;
    reference at(size_type index) const;

    reference operator[](const std::string& name) const;
    reference at(const std::string& name) const;

  //
  // And just in case anybody actually cares:
  //

    typedef cols::difference_type difference_type;
    typedef cols::const_pointer pointer;
    typedef cols::const_pointer const_pointer;
    typedef cols::const_iterator iterator;
    typedef cols::const_iterator const_iterator;
    typedef cols::const_reverse_iterator reverse_iterator;
    typedef cols::const_reverse_iterator const_reverse_iterator;

    reference front() const;
    reference back() const;

    iterator begin() const;
    iterator end() const;
    reverse_iterator rbegin() const;
    reverse_iterator rend() const;

    const_iterator cbegin() const;
    const_iterator cend() const;
    const_reverse_iterator crbegin() const;
    const_reverse_iterator crend() const;
};

bool operator==(const row&, const row&);
bool operator!=(const row&, const row&);
bool operator< (const row&, const row&);
bool operator> (const row&, const row&);
bool operator<=(const row&, const row&);
bool operator>=(const row&, const row&);


A collection of rows:

A table is the result set that a query generates. At a minimum, it needs to be moveable since query<>::results() returns one of these by value.

A table does not satisfy the requirements for a container. Indeed, it’s basically just an iterator factory.

template<class Cursor> class table
{
public:
    typedef const row value_type;
    typedef value_type* pointer;
    typedef value_type& reference;
    typedef Cursor iterator;
    typedef Cursor const_iterator;

    // copyable, moveable, swappable

    const_iterator begin() const;
    const_iterator end() const;

    const_iterator cbegin() const;
    const_iterator cend() const;
};
Should we make it reversible?
template<> class table<scrolling_cursor>
{
public:
    typedef const row value_type;
    typedef value_type* pointer;
    typedef value_type& reference;
    typedef scrolling_cursor iterator;
    typedef scrolling_cursor const_iterator;
    typedef std::reverse_iterator<iterator> reverse_iterator;
    typedef std::reverse_iterator<const_iterator> const_reverse_iterator;

    // copyable, moveable, swappable

    const_iterator begin() const;
    const_iterator end() const;

    const_iterator cbegin() const;
    const_iterator cend() const;

    const_reverse_iterator rbegin() const;
    const_reverse_iterator rend() const;

    const_reverse_iterator crbegin() const;
    const_reverse_iterator crend() const;
};


Cursors:

struct cursor_tag : public std::input_iterator_tag { };
struct scrolling_cursor_tag : public cursor_tag { };
Unfortunately, even a scrolling cursor, despite being able to move back and forth through the data, is still just an input iterator. The reason is that we’re dealing with read-only data. Furthermore, typically only one row is in memory at a time, and so every use potentially fetches a new row (very much like an istream_iterator).

But we can still have a scrolling_cursor_tag so that algorithms that want bidirectional or random-access iterators, but only need the moving-back-and-forth behavior, can be written to use scrolling_cursors as well.


A base class:

class cursor_base
{
public:
    typedef /*   signed integer type */ difference_type;
    typedef /* unsigned integer type */ position_type;
    typedef const row value_type;
    typedef value_type* pointer;
    typedef value_type& reference;

    // copyable, moveable, swappable

    bool fetch_next();

    reference operator*() const;
    pointer operator->() const;
};

bool operator==(const cursor_base&, const cursor_base&);
bool operator!=(const cursor_base&, const cursor_base&);
This is a base class into which we refactor all the stuff that doesn’t depend on whether it’s a scrolling cursor.

fetch_next() tries to get the next row from the database and returns whether it was successful. It performs no operation (except for returning false) when there is no next row.

For two cursors, lhs and rhs, lhs==rhs implies either that both are past the end, or that both are iterating over the same table and have fetched the same number of rows. As is the case for input iterators generally, == does not imply that *lhs and *rhs have the same identity.


The non-scrolling cursor:

class cursor : public cursor_base
{
public:
    typedef cursor_tag iterator_category;

    // copyable, moveable, swappable

    cursor& operator++()
    cursor  operator++(int);
};
Given cursor_base, all our non-scrolling cursor needs to add are iterator_category and the ++ operators.

Note that the copy returned by the postfix ++ might not be valid after the original is used in any way.


The scrolling cursor:

class scrolling_cursor : public cursor_base
{
public:
    typedef scrolling_cursor_tag iterator_category;

    // copyable, moveable, swappable

    bool fetch_prior();
    bool fetch_first();
    bool fetch_last();
    bool fetch_relative(difference_type);
    bool fetch_absolute(position_type);

    scrolling_cursor& operator++();
    scrolling_cursor  operator++(int);

    scrolling_cursor& operator--();
    scrolling_cursor  operator--(int);

    scrolling_cursor& operator+=(difference_type);
    scrolling_cursor& operator-=(difference_type);

    scrolling_cursor  operator+(difference_type) const;
    scrolling_cursor  operator-(difference_type) const;
    difference_type   operator-(const scrolling_cursor&) const;

    reference operator[](position_type); // does fetch_absolute
};

void advance(scrolling_cursor&, scrolling_cursor::difference_type);

scrolling_cursor::difference_type
  distance(const scrolling_cursor&, const scrolling_cursor&);

bool operator< (const scrolling_cursor&, const scrolling_cursor&);
bool operator> (const scrolling_cursor&, const scrolling_cursor&);
bool operator<=(const scrolling_cursor&, const scrolling_cursor&);
bool operator>=(const scrolling_cursor&, const scrolling_cursor&);
As stated before, despite the ability to move back and forth through the data, this is still just an input iterator: the data is read-only, the [] operator modifies the iterator itself by actually fetching a row, and using the cursor in any way will invalidate copies.

For two scrolling_cursors, lhs and rhs, where lhs most recently fetched the ith row and rhs most recently fetched the jth row, lhs < rhs implies that both are iterating over the same table and i < j (where “past the end” is greater than any actual row’s ordinal).


Handling errors


An exception:

class sql_error : public std::logic_error
{
    int  cd;
    char st[6];
public:
    sql_error(const std::string& msg, int code, const char* state = 0);
    sql_error(const char* msg, int code, const char* state = 0);

    int sqlcode() const noexcept { return cd; }
    const char* sqlstate() const noexcept { return st; }
    const char* errmsg() const noexcept { return logic_error::what(); }
};
The constructors’ third argument can be a null pointer, and defaults to a null pointer, for the benefit of sub-standard database engines that don’t support SQLSTATE.

Since SQLSTATE, when it’s supported at all, is always at most five characters, this can be implemented as a C-style '\0'-terminated string in a char[6] without fear of causing any additional problems (like where to put a std::string when the problem is that we’re running out of memory).


Low-level error messages:

extern std::ostream* error_stream;
extern std::ostream* warning_stream;

bool write_error_message(const char*) noexcept;
bool write_error_message(const std::string&) noexcept;
bool write_warning_message(const char*) noexcept;
bool write_warning_message(const std::string&) noexcept;
This is mainly an implementation detail in the author’s proof of concept (in development…probably ready for Bristol); but there doesn’t seem to be any compelling reason to keep it secret; and the user might find it useful to control where error and warning messages get written. Indeed, many users might want to turn off warnings entirely.

Both of the above pointers are initialized to &std::cerr by default. The user can choose to send error and/or warning messages to a file instead, or can turn such messages off entirely by setting either or both of the pointers to null.


Bill Seymour, <stdbill.h@pobox.com>