A URI Library for C++

Document Number: N3420=12-0110
Date: 2012-09-21
Authors: Glyn Matthews <glynos@acm.org>, Dean Michael Berris <dberris@google.com>

Motivation and Scope

Given the increased importance of being able to develop portable, scalable, network-aware applications, C++ developers are at a disadvantage in having no standard network implementation to use. One of the fundamental components of any network library is a URI, and this proposal is motivated by the desire to introduce a portable, efficient and internationalized implementation of a URI to C++ standard library users.

This proposal is based on original development done in the cpp-netlib project http://cpp-netlib.github.com/. This implementation is released under the Boost Software License and will track the proposed library as it evolves.

The scope of this proposal will include a single uri type, some specifications about how URIs are intended to be processed and extended, including some additional helper types and functions. It will include a type and functions to build a URI from its components. Finally, it will include types and functions for percent encoding, URI references and URI normalization.

This is a preliminary proposal. There are still many omissions and the most important open issues are listed in the final section.

Example Usage

std::network::uri uri("http://www.example.com/glynos/?key=value#frag");
assert(uri.is_valid());
assert(uri.is_absolute());
assert(!uri.is_opaque());
assert(uri.scheme().string() == "http");
assert(uri.host().string() == "www.example.com");
assert(uri.path().string() == "/glynos/");
assert(uri.query().string() == "?key=value");
assert(uri.fragment().string() == "frag");

The code excerpt above shows a simple example of how the proposed uri will work. The URI string is parsed during object construction and broken down into its component parts. HTTP URIs are absolute and hierarchical (i.e. not opaque).

std::network::uri uri(U"xmpp:example-node@example.com?message;subject=Hello%20World");
assert(uri.is_valid());
assert(uri.is_absolute());
assert(uri.is_opaque());
assert(uri.scheme().string() == "xmpp");
assert(uri.path().string() == "example-node@example.com");
assert(uri.query().string() == "?message;subject=Hello%20World");

The uri in this proposal accepts strings in different character encodings and converts between them. The example above constructs a uri object from a UTF-32 string literal (an array of char32_t) and accesses its parts as std::string objects in UTF-8 encoding.

Generic Syntax

The generic syntax of a URI is defined in RFC 3986.

All URIs are of the form:

scheme ":" "hierarchical part" [ "?" query ] [ "#" fragment ]

The scheme is used to identify the specification needed to parse the rest of the URI. A generic syntax parser can parse any URI into its main parts. The scheme can then be used to identify whether further scheme-specific parsing can be performed.
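To illustrate what a generic, scheme-independent parser does, RFC 3986, Appendix B gives a regular expression that splits any URI reference into its five main components. The sketch below applies it with std::regex; `uri_components` and `split_uri` are hypothetical names used only for this example. The expression only splits the string and performs no validation, so it is not a substitute for a conforming parser.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type: std::nullopt means the component is absent,
// which is different from being present but empty.
struct uri_components {
    std::optional<std::string> scheme;
    std::optional<std::string> authority;
    std::string path;  // always present, possibly empty
    std::optional<std::string> query;
    std::optional<std::string> fragment;
};

// Splits a URI reference with the regular expression from RFC 3986,
// Appendix B. This only splits the string; it does not validate it.
inline uri_components split_uri(const std::string &s) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    uri_components c;
    std::smatch m;
    if (!std::regex_match(s, m, re)) {
        return c;  // e.g. when a line terminator follows the '#'
    }
    auto part = [&m](int i) -> std::optional<std::string> {
        if (!m[i].matched) return std::nullopt;
        return m[i].str();
    };
    c.scheme = part(2);     // group 1 is "scheme:"
    c.authority = part(4);  // group 3 is "//authority"
    c.path = m[5].str();
    c.query = part(7);      // group 6 is "?query"
    c.fragment = part(9);   // group 8 is "#fragment"
    return c;
}
```

Unmatched groups are reported as std::nullopt, which keeps the RFC's distinction between an absent component and an empty one: "http://example.com?" has an empty query, whereas "http://example.com" has none.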

The hierarchical part refers to the part of the URI that holds identification information that is hierarchical in nature. This may contain an authority (always prefixed with a double slash "//") and/or a path. The path part is required, though it may be empty. The authority part holds an optional user info part, ending with an at sign "@", a host identifier, and an optional port number, preceded by a colon ":". The host may be an IP address or a registered name such as a domain name. IPv6 addresses are written as IP literals enclosed in square brackets; this format was first specified in RFC 2732 and is incorporated into RFC 3986 (section 3.2.2), which obsoletes RFC 2732.
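The authority can be split further in the same scheme-independent way. The following is a minimal sketch, assuming the authority text has already been extracted from the URI and without validating any of the parts; `authority_parts` and `split_authority` are hypothetical names. The bracket check is what keeps the colons inside an IPv6 literal from being mistaken for a port separator.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical result type for this sketch.
struct authority_parts {
    std::optional<std::string> user_info;  // absent unless an '@' is present
    std::string host;
    std::optional<std::string> port;       // present but empty for "host:"
};

// authority = [ userinfo "@" ] host [ ":" port ]   (RFC 3986, section 3.2)
inline authority_parts split_authority(const std::string &authority) {
    authority_parts a;
    std::string rest = authority;
    // Neither a valid user info nor a valid host contains an unencoded '@'.
    if (auto at = rest.find('@'); at != std::string::npos) {
        a.user_info = rest.substr(0, at);
        rest.erase(0, at + 1);
    }
    std::string::size_type host_end;
    if (!rest.empty() && rest[0] == '[') {
        // IP literal such as "[2001:db8::1]": the host includes the brackets,
        // and the colons inside it do not start a port.
        auto close = rest.find(']');
        host_end = (close == std::string::npos) ? rest.size() : close + 1;
    } else {
        // Registered names and IPv4 addresses never contain ':'.
        host_end = rest.find(':');
        if (host_end == std::string::npos) host_end = rest.size();
    }
    a.host = rest.substr(0, host_end);
    if (host_end < rest.size() && rest[host_end] == ':') {
        a.port = rest.substr(host_end + 1);
    }
    return a;
}
```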

The query is an optional part starting with a question mark "?" that contains information that is not hierarchical.

Finally, the fragment is an optional part, prefixed by a hash symbol "#", that is used to identify a secondary resource.

RFC 3987 specifies a new protocol element, the Internationalized Resource Identifier (IRI). The IRI complements a URI and extends it to allow Unicode characters.

This proposal will define a uri type that will attempt to encompass all three of these RFCs.

Percent Encoding

URI percent encoding is described in RFC 3986, section 2.1 and RFC 3986, section 2.4.

Percent encoding is the mechanism used to encode reserved characters in a URI. According to RFC 3986, section 2.2, the set of reserved characters is:

Set of reserved characters and percent-encoded strings
!   #   $   &   '   (   )   *   +   ,   /   :   ;   =   ?   @   [   ]
%21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D

Percent encoding is not limited to reserved characters. Any character data may be percent encoded:

Common characters and percent-encoded strings
newline space "   %   -   .   <   >   \   ^   _   `   {   |   }   ~
%0A     %20   %22 %25 %2D %2E %3C %3E %5C %5E %5F %60 %7B %7C %7D %7E

URI Normalization and Comparison

URI normalization is described in RFC 3986, Section 6 and in RFC 3987, Section 5. Normalization is the process by which a URI is transformed in order to determine if two URIs are equivalent.

Some types of normalization preserve semantics and others may not. Normalization may also depend on the scheme.

Converting the Scheme / Host to Lower Case

The scheme and host are case-insensitive. The proposed normalization solution will convert these to lowercase.

HTTP://Example.com/ --> http://example.com/

The user info, path, query and fragment are case-sensitive and so must not be converted.
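A minimal sketch of case normalization, assuming the scheme and host have already been separated from the rest of the URI. It lowercases ASCII letters only, independently of the current C locale; internationalized host names would need Unicode-aware handling. `to_lower_ascii` and `normalize_case` are hypothetical helpers, and the path is copied unchanged because it is case-sensitive.

```cpp
#include <cassert>
#include <string>

// Lower-cases ASCII letters only, independently of the current C locale.
inline std::string to_lower_ascii(std::string s) {
    for (char &c : s) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    }
    return s;
}

// Hypothetical helper: recomposes scheme "://" host path, normalizing the
// case of the scheme and host only. The path is copied unchanged.
inline std::string normalize_case(const std::string &scheme,
                                  const std::string &host,
                                  const std::string &path) {
    return to_lower_ascii(scheme) + "://" + to_lower_ascii(host) + path;
}
```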

Capitalizing Characters in Escape Sequences

The hexadecimal digits in a percent-encoded triplet are case-insensitive. The proposed normalization solution will convert these to uppercase, as recommended by RFC 3986, section 6.2.2.1.

http://example.com/%5b%5d --> http://example.com/%5B%5D

Decoding Unreserved Characters

Percent-encoded triplets that correspond to unreserved characters (ALPHA, DIGIT, "-", ".", "_" and "~"; RFC 3986, section 2.3) will be decoded.

http://example.com/%7Eglynos/ --> http://example.com/~glynos/
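The two percent-encoding normalizations above can be applied in a single pass over a URI string. The sketch below (the helper names are hypothetical) upper-cases the hexadecimal digits of every triplet and decodes the triplets that encode unreserved characters; triplets for any other character stay encoded, and malformed "%" sequences are copied through unchanged.

```cpp
#include <cassert>
#include <string>

// True for the RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~".
inline bool is_unreserved(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
}

// Value of a hexadecimal digit, or -1 if c is not one.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

inline std::string normalize_pct(const std::string &s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size()) {
            int hi = hex_value(s[i + 1]), lo = hex_value(s[i + 2]);
            if (hi >= 0 && lo >= 0) {
                char decoded = static_cast<char>(hi * 16 + lo);
                if (is_unreserved(decoded)) {
                    out += decoded;      // e.g. %7E -> ~
                } else {
                    out += '%';          // e.g. %5b -> %5B
                    out += digits[hi];
                    out += digits[lo];
                }
                i += 2;
                continue;
            }
        }
        out += s[i];
    }
    return out;
}
```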

Adding Trailing /

If a path refers to a directory, it should be indicated with a trailing slash.

http://example.com/glynos --> http://example.com/glynos/

But not if the path refers to a file.

http://example.com/glynos/page.html --> http://example.com/glynos/page.html
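Deciding whether a path names a directory generally requires knowledge of the scheme or the server, so an implementation can only rely on a heuristic. The sketch below assumes, purely for illustration, that a final segment containing a "." names a file; `add_trailing_slash` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Heuristic only: a last segment containing '.' is treated as a file,
// anything else as a directory. Empty paths are left alone.
inline std::string add_trailing_slash(const std::string &path) {
    if (path.empty() || path.back() == '/') return path;
    // If there is no '/', npos + 1 wraps to 0 and the whole path is the segment.
    const std::string last_segment = path.substr(path.find_last_of('/') + 1);
    if (last_segment.find('.') != std::string::npos) return path;  // looks like a file
    return path + '/';
}
```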

Removing dot-segments from the Path

The segments ".." and "." can be removed according to the algorithm specified in RFC 3986, Section 5.2.4.

http://example.com/glynos/./proposals/../ --> http://example.com/glynos/
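The algorithm in RFC 3986, section 5.2.4 consumes the path from an input buffer and builds an output buffer, applying one rule per iteration. A direct transcription into C++ might look like this; the function names are illustrative, and the letters in the comments refer to the rules in the RFC.

```cpp
#include <cassert>
#include <string>

// True if s begins with prefix.
inline bool has_prefix(const std::string &s, const std::string &prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Removes "." and ".." segments following RFC 3986, section 5.2.4.
inline std::string remove_dot_segments(std::string input) {
    std::string output;
    while (!input.empty()) {
        if (has_prefix(input, "../")) {                             // A
            input.erase(0, 3);
        } else if (has_prefix(input, "./")) {                       // A
            input.erase(0, 2);
        } else if (has_prefix(input, "/./")) {                      // B: "/./" -> "/"
            input.erase(0, 2);
        } else if (input == "/.") {                                 // B
            input = "/";
        } else if (has_prefix(input, "/../") || input == "/..") {   // C
            input = (input == "/..") ? std::string("/") : input.substr(3);
            // Drop the last output segment and its preceding '/', if any.
            auto slash = output.rfind('/');
            output.erase(slash == std::string::npos ? 0 : slash);
        } else if (input == "." || input == "..") {                 // D
            input.clear();
        } else {                                                    // E
            // Move the first segment (with its leading '/', if any) to output.
            auto next = input.find('/', 1);
            output += input.substr(0, next);
            input.erase(0, next);
        }
    }
    return output;
}
```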

Removing the default port

Some schemes have a default port (for HTTP it is 80). A port equal to the default can be removed, and so can an empty port.

http://example.com:80/ --> http://example.com/
http://example.com:/ --> http://example.com/
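Removing the default port requires a per-scheme table. In the sketch below the table contents and the function names are assumptions made for illustration; the scheme is assumed to be lowercase already, and the port is the text that followed the colon (empty for "http://example.com:/").

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, deliberately incomplete table of default ports.
inline const std::map<std::string, std::string> &default_ports() {
    static const std::map<std::string, std::string> ports{
        {"http", "80"}, {"https", "443"}, {"ftp", "21"}};
    return ports;
}

// Returns host[:port], omitting the port when it is empty or equal to the
// scheme's default port.
inline std::string normalized_authority(const std::string &scheme,
                                        const std::string &host,
                                        const std::string &port) {
    auto it = default_ports().find(scheme);
    bool is_default = it != default_ports().end() && it->second == port;
    if (port.empty() || is_default) return host;
    return host + ":" + port;
}
```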

The Comparison Ladder

The Comparison Ladder is described in RFC 3986, Section 6.2. It explains that comparing URIs using normalization can be implemented in different ways according to the complexity of the method and the number of false negatives which may arise.

String comparison: The simplest and fastest method is to compare the URI strings byte-for-byte.
Case normalization: The first step to reduce false negatives is to normalize the parts that are case-insensitive: the scheme, the host and the hexadecimal digits of any percent-encoded triplets.
Percent encoding normalization: Next, any percent-encoded triplets that correspond to unreserved characters can be decoded.
Path segment normalization: Any dot-segments can be removed from the path.
Scheme-based normalization: Trailing slashes can be added and default ports can be removed. Additionally for HTTP, key/value pairs in the query can appear in any order.
Protocol-based normalization: Finally, URI equivalence can be tested by testing the resources directly, e.g. using HTTP to see if one URI redirects to another.

The final two steps in the Comparison Ladder require more information than can be provided within the limits of the proposal in order to be implemented comprehensively, and will not form part of the proposal at this stage.

URI References

URI references are described in RFC 3986, section 4, RFC 3986, section 5 and RFC 3987, section 6.5. URI references are particularly useful when working on the server side when the base URI is always the same, and also when using URIs within the same document.

Two operations related to references are of use: acquiring the relative reference of a URI, and resolving a reference against a base URI.
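Resolving a reference against a base URI is specified in RFC 3986, section 5.2. The following sketch transcribes the strict form of that algorithm (section 5.2.2), together with path merging (5.2.3), dot-segment removal (5.2.4) and component recomposition (5.3). It operates on plain std::string values rather than on the proposed uri type, performs no validation, and all names in it are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Components of a URI reference; std::nullopt means "undefined" in RFC 3986.
struct ref_parts {
    std::optional<std::string> scheme, authority, query, fragment;
    std::string path;
};

// Splits a reference with the regular expression from RFC 3986, Appendix B.
inline ref_parts parse_ref(const std::string &s) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    ref_parts r;
    std::smatch m;
    if (!std::regex_match(s, m, re)) return r;
    auto opt = [&m](int i) -> std::optional<std::string> {
        if (!m[i].matched) return std::nullopt;
        return m[i].str();
    };
    r.scheme = opt(2);
    r.authority = opt(4);
    r.path = m[5].str();
    r.query = opt(7);
    r.fragment = opt(9);
    return r;
}

inline bool has_prefix(const std::string &s, const std::string &p) {
    return s.compare(0, p.size(), p) == 0;
}

// RFC 3986, section 5.2.4.
inline std::string remove_dot_segments(std::string in) {
    std::string out;
    while (!in.empty()) {
        if (has_prefix(in, "../")) in.erase(0, 3);
        else if (has_prefix(in, "./") || has_prefix(in, "/./")) in.erase(0, 2);
        else if (in == "/.") in = "/";
        else if (has_prefix(in, "/../") || in == "/..") {
            in = (in == "/..") ? std::string("/") : in.substr(3);
            auto slash = out.rfind('/');
            out.erase(slash == std::string::npos ? 0 : slash);
        } else if (in == "." || in == "..") in.clear();
        else {
            auto next = in.find('/', 1);
            out += in.substr(0, next);
            in.erase(0, next);
        }
    }
    return out;
}

// RFC 3986, section 5.2.3.
inline std::string merge_paths(const ref_parts &base, const std::string &ref_path) {
    if (base.authority && base.path.empty()) return "/" + ref_path;
    auto slash = base.path.rfind('/');
    if (slash == std::string::npos) return ref_path;
    return base.path.substr(0, slash + 1) + ref_path;
}

// Strict reference resolution (section 5.2.2) followed by recomposition (5.3).
inline std::string resolve(const std::string &base_str, const std::string &ref_str) {
    const ref_parts base = parse_ref(base_str), r = parse_ref(ref_str);
    ref_parts t;
    if (r.scheme) {
        t = r;
        t.path = remove_dot_segments(r.path);
    } else {
        if (r.authority) {
            t.authority = r.authority;
            t.path = remove_dot_segments(r.path);
            t.query = r.query;
        } else {
            if (r.path.empty()) {
                t.path = base.path;
                t.query = r.query ? r.query : base.query;
            } else {
                t.path = remove_dot_segments(
                    r.path[0] == '/' ? r.path : merge_paths(base, r.path));
                t.query = r.query;
            }
            t.authority = base.authority;
        }
        t.scheme = base.scheme;
    }
    t.fragment = r.fragment;

    std::string out;
    if (t.scheme) out += *t.scheme + ":";
    if (t.authority) out += "//" + *t.authority;
    out += t.path;
    if (t.query) out += "?" + *t.query;
    if (t.fragment) out += "#" + *t.fragment;
    return out;
}
```

For example, with the base "http://a/b/c/d;p?q" used throughout RFC 3986, section 5.4, resolve(base, "../g") yields "http://a/b/g" and resolve(base, "?y") yields "http://a/b/c/d;p?y".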

Header <network/uri> Synopsis

namespace std {
namespace network {
// class declarations
class uri;

// swap functions
void swap(uri &lhs, uri &rhs);

// hash functions
std::size_t hash_value(const uri &u);

// equality and comparison operators
bool operator == (const uri &lhs, const uri &rhs);
bool operator != (const uri &lhs, const uri &rhs);
bool operator <  (const uri &lhs, const uri &rhs);
bool operator <= (const uri &lhs, const uri &rhs);
bool operator >  (const uri &lhs, const uri &rhs);
bool operator >= (const uri &lhs, const uri &rhs);

// percent encoding and decoding
template <class String>
String pct_encode(const String &source);
template <class String>
String pct_decode(const String &source);

// stream operators
std::ostream &operator << (std::ostream &os, const uri &u);
std::wostream &operator << (std::wostream &os, const uri &u);
std::istream &operator >> (std::istream &os, uri &u);
std::wistream &operator >> (std::wistream &os, uri &u);

// transformers
uri normalize(const uri &u);
uri relativize(const uri &u1, const uri &u2);
uri resolve(const uri &u1, const uri &u2);
template <class String>
uri resolve(const uri &u1, const String &u2);
} // namespace network
} // namespace std

Declarations

The <network/uri> header contains a declaration for a single uri class in the std::network namespace.

At this stage, the network sub-namespace should be regarded as a placeholder for whatever namespace is specified for network components during the standardization process (should such a sub-namespace be specified).

Equality and Comparison Operators

namespace std {
namespace network {
bool operator == (const uri &lhs, const uri &rhs);
bool operator != (const uri &lhs, const uri &rhs);
bool operator <  (const uri &lhs, const uri &rhs);
bool operator <= (const uri &lhs, const uri &rhs);
bool operator >  (const uri &lhs, const uri &rhs);
bool operator >= (const uri &lhs, const uri &rhs);
} // namespace network
} // namespace std
Effects: This proposal specifies the usual overloads of the equality, inequality and relational operators. The equality and inequality operators test two uri objects according to the notion of equivalence (RFC 3986, section 6.1 and RFC 3986, section 6.2).

Percent Encoding and Decoding

namespace std {
namespace network {
template <class String>
String pct_encode(const String &source);
} // namespace network
} // namespace std
Effects: Encodes special characters in the source string and returns the encoded string (RFC 3986, section 2.1).
Returns: A percent encoded string.
std::string s = "string with spaces";
assert(std::network::pct_encode(s) == "string%20with%20spaces");
namespace std {
namespace network {
template <class String>
String pct_decode(const String &source);
} // namespace network
} // namespace std
Effects: Decodes special characters in the source string and returns the unencoded string (RFC 3986, section 2.1).
Returns: A percent decoded string.
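A minimal sketch of both functions for std::string, assuming that every byte outside the unreserved set (RFC 3986, section 2.3) is to be encoded. That choice suits individual component values such as a query parameter; applied to a whole URI it would also encode delimiters such as "/" and "?". The `_sketch` suffix marks these as illustrations, not the proposed templates.

```cpp
#include <cassert>
#include <string>

// Encodes every byte outside ALPHA / DIGIT / "-" / "." / "_" / "~" as %XX.
inline std::string pct_encode_sketch(const std::string &source) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : source) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                          c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += digits[c >> 4];
            out += digits[c & 0x0F];
        }
    }
    return out;
}

// Value of a hexadecimal digit, or -1 if c is not one.
inline int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decodes %XX triplets; malformed sequences are copied unchanged.
inline std::string pct_decode_sketch(const std::string &source) {
    std::string out;
    for (std::string::size_type i = 0; i < source.size(); ++i) {
        if (source[i] == '%' && i + 2 < source.size()) {
            int hi = hex_digit(source[i + 1]), lo = hex_digit(source[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += source[i];
    }
    return out;
}
```

Because these functions work byte by byte, UTF-8 text round-trips through them unchanged: encoding the UTF-8 bytes of "メ" gives "%E3%83%A1", the first triplets of the Wikipedia URI shown in Issue 1.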

Stream Operators

namespace std {
namespace network {
std::ostream &operator << (std::ostream &os, const uri &u);
std::wostream &operator << (std::wostream &os, const uri &u);
} // namespace network
} // namespace std

This proposal specifies output stream operators for character and wide character streams.

Preconditions: u.is_valid()
namespace std {
namespace network {
std::istream &operator >> (std::istream &is, uri &u);
std::wistream &operator >> (std::wistream &is, uri &u);
} // namespace network
} // namespace std

This proposal specifies input stream operators for character and wide character streams.

Transformers

This proposal specifies three transformer functions: normalize, relativize and resolve.

namespace std {
namespace network {
uri normalize(const uri &u);
} // namespace network
} // namespace std
Preconditions: u.is_valid()
Postconditions: std::network::normalize(u).is_valid() && std::network::normalize(u) == u
Effects: normalize takes as an argument a uri object and returns a valid, normalized uri object.
namespace std {
namespace network {
uri relativize(const uri &u1, const uri &u2);
} // namespace network
} // namespace std
Preconditions: u1.is_valid() && u2.is_valid()
Postconditions: relativize(u1, u2).is_valid() && !relativize(u1, u2).is_absolute()
Effects: Returns a relative URI.
std::network::uri base_uri("http://www.example.com/");
std::network::uri uri("http://www.example.com/glynos/?key=value#fragment");
std::network::uri rel_uri(std::network::relativize(base_uri, uri));
assert(rel_uri.string() == "glynos/?key=value#fragment");
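One possible relativization strategy is sketched below for plain strings: if both URIs share a scheme and authority, and the target path lies under the base's directory (its path up to the last "/"), the result is the rest of the target path plus the target's query and fragment. The function name and the fallback of returning the target unchanged are assumptions; the comparisons are exact, so the inputs should be normalized first.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

struct rel_parts {
    std::optional<std::string> scheme, authority, query, fragment;
    std::string path;
};

// Splits a reference with the regular expression from RFC 3986, Appendix B.
inline rel_parts parse_parts(const std::string &s) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    rel_parts r;
    std::smatch m;
    if (!std::regex_match(s, m, re)) return r;
    auto opt = [&m](int i) -> std::optional<std::string> {
        if (!m[i].matched) return std::nullopt;
        return m[i].str();
    };
    r.scheme = opt(2);
    r.authority = opt(4);
    r.path = m[5].str();
    r.query = opt(7);
    r.fragment = opt(9);
    return r;
}

// Hypothetical: expresses target relative to base, or returns it unchanged.
inline std::string relativize_sketch(const std::string &base_str,
                                     const std::string &target_str) {
    const rel_parts base = parse_parts(base_str), t = parse_parts(target_str);
    if (base.scheme != t.scheme || base.authority != t.authority) return target_str;
    // Base "directory": path up to and including the last '/'
    // (npos + 1 wraps to 0, giving "" when there is no '/').
    const std::string dir = base.path.substr(0, base.path.rfind('/') + 1);
    if (t.path.compare(0, dir.size(), dir) != 0) return target_str;
    std::string rel = t.path.substr(dir.size());
    // A first segment containing ':' needs a "./" prefix (RFC 3986, section 4.2);
    // so does an empty path, which would otherwise resolve to the base's path.
    if (rel.empty() || rel.substr(0, rel.find('/')).find(':') != std::string::npos)
        rel = "./" + rel;
    if (t.query) rel += "?" + *t.query;
    if (t.fragment) rel += "#" + *t.fragment;
    return rel;
}
```

With the base and target from the example above, this returns "glynos/?key=value#fragment", which resolves back to the target against the base. Returning the target unchanged when it cannot be relativized conflicts with the postcondition !relativize(u1, u2).is_absolute(), so a final specification would have to choose between that behaviour and reporting an error.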
namespace std {
namespace network {
uri resolve(const uri &u1, const uri &u2);
} // namespace network
} // namespace std
Preconditions: u1.is_valid() && u2.is_valid()
Postconditions: resolve(u1, u2).is_valid() && resolve(u1, u2).is_absolute()
Effects: resolve resolves the second uri object against the first, and returns a new uri.
namespace std {
namespace network {
template <class String>
uri resolve(const uri &u1, const String &u2);
} // namespace network
} // namespace std
Preconditions: u1.is_valid()
Postconditions: resolve(u1, u2).is_valid() && resolve(u1, u2).is_absolute()
Effects: resolve parses u2, resolves it against u1, and returns a new uri.

Class uri

Below is the proposed interface for the uri class:

namespace std {
namespace network {
class uri {

public:

    class builder;

    // typedefs
    typedef ... value_type;
    typedef basic_string<value_type> string_type;
    typedef string_type::const_iterator iterator;
    typedef string_type::const_iterator const_iterator;

    // range types
    class part_range;

    // constructors and destructor
    uri();
    template <typename InputIterator>
    uri(const InputIterator &first, const InputIterator &last);
    template <class Source>
    uri(const Source &source);
    uri(const uri &other);
    uri(uri &&other) noexcept;
    ~uri();

    // assignment
    uri &operator = (const uri &other);
    uri &operator = (uri &&other);
    template <typename InputIterator>
    void assign(const InputIterator &first, const InputIterator &last);
    template <class Source>
    void assign(const Source &source);

    // swap
    void swap(uri &other) noexcept;

    // iterators
    const_iterator begin() const;
    const_iterator end() const;

    // accessors
    part_range scheme() const;
    part_range user_info() const;
    part_range host() const;
    part_range port() const;
    part_range path() const;
    part_range authority() const;
    part_range query() const;
    part_range fragment() const;

    // query
    bool empty() const noexcept;
    bool is_valid() const;
    bool is_absolute() const;
    bool is_opaque() const;

    // string accessors
    string_type native() const noexcept;
    const value_type *c_str() const noexcept;
    string string() const;
    wstring wstring() const;
    u16string u16string() const;
    u32string u32string() const;

};
} // namespace network
} // namespace std

The uri class itself is little more than a lightweight wrapper around a string, a parser and the uri’s component parts. Parsing is performed upon construction and, if successful, the component parts are stored as iterator ranges that reference the original string. For example, consider the following URI:

http://www.example.com/path/?key=value#fragment
^   ^  ^              ^     ^         ^^       ^
a   b  c              d     e         fg       h

On parsing, the uri object will contain a set of range types corresponding to the ranges for scheme, user info, host, port, path, query and fragment. So the ranges corresponding to the example above will be:

URI part    Range     String
scheme      [a, b)    "http"
user_info   [c, c)    ""
host        [c, d)    "www.example.com"
port        [d, d)    ""
path        [d, e)    "/path/"
query       [e, f)    "?key=value"
fragment    [g, h)    "fragment"

uri Requirements

Template parameters named InputIterator are required to meet the requirements of a C++ standard library random access iterator. The iterator’s value type is required to be char, wchar_t, char16_t, or char32_t.

Template parameters named Source are required to be one of:

A container with a value type of char, wchar_t, char16_t, or char32_t.

An iterator for a null terminated byte-string. The value type is required to be char, wchar_t, char16_t, or char32_t.

A C-array. The value type is required to be char, wchar_t, char16_t, or char32_t.

This wording is identical to that found in the filesystem proposal (N3365).

typedefs

typedef ... value_type;
typedef basic_string<value_type> string_type;
typedef string_type::const_iterator iterator;
typedef string_type::const_iterator const_iterator;

The value_type is left unspecified in this proposal and is intended to be implementation-defined. For example, it may be char on POSIX systems and wchar_t on Windows. This follows the approach taken by the filesystem proposal.

Constructors and Destructors

uri();
Postconditions: empty() == true and is_valid() == false.
template <typename InputIterator>
uri(const InputIterator &first, const InputIterator &last);
Effects: The range is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and operating system.
Throws: std::bad_alloc
template <class Source>
uri(const Source &source);
Preconditions:
Effects: The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and operating system.
Throws: std::bad_alloc

Assignment

uri &operator = (const uri &other);
uri &operator = (uri &&other);
template <typename InputIterator>
void assign(const InputIterator &first, const InputIterator &last);
template <class Source>
void assign(const Source &source);
Effects: The source is assigned to the uri object and parsed. The encoding is assumed to depend on the underlying character type and operating system.
Throws: std::bad_alloc

Swap

void swap(uri &other) noexcept;
Effects: Swaps the contents of this object with the other.

Iterators

const_iterator begin() const;
Preconditions: is_valid() == true
Returns: An iterator to the first element in the underlying string container.
const_iterator end() const;
Preconditions: is_valid() == true
Returns: An iterator to the end of the underlying string container.

Accessors

part_range scheme() const;
Preconditions: is_valid() == true
Returns: A part_range object which spans the range of the scheme in the underlying URI.
part_range user_info() const;
Preconditions: is_valid() == true
Returns: A part_range object which spans the range of the user info in the underlying URI.
part_range host() const;
Preconditions: is_valid() == true
Returns: A part_range object which spans the range of the host in the underlying URI.
part_range port() const;
Preconditions: is_valid() == true
Returns: A part_range object which spans the range of the port in the underlying URI.
part_range path() const;
Preconditions: is_valid() == true
Returns: A part_range object which spans the range of the path in the underlying URI.
part_range authority() const;
Preconditions: is_valid() == true
Returns: A part_range object which spans the range of the authority in the underlying URI.
part_range query() const;
Preconditions: is_valid() == true
Returns: A part_range object which spans the range of the query in the underlying URI.
part_range fragment() const;
Preconditions: is_valid() == true
Returns: A part_range object which spans the range of the fragment in the underlying URI.

Query

bool empty() const noexcept;
Returns: true if the underlying string object is empty, false otherwise.
bool is_valid() const;
Returns: true if the string object has been parsed and is a valid URI and false otherwise.
bool is_absolute() const;
Preconditions: is_valid() == true
Returns: true if the URI is valid and if the scheme is not empty. Equivalent to !scheme().empty().
bool is_opaque() const;
Preconditions: is_valid() == true
Returns: true if the URI is absolute and its scheme-specific part does not start with a double-slash //.

String Accessors

string_type native() const noexcept;
Preconditions: is_valid() == true
Returns: The native string.
const value_type *c_str() const noexcept;
Preconditions: is_valid() == true
Returns: A raw pointer to the underlying character array.
string string() const;
Preconditions: is_valid() == true
wstring wstring() const;
Preconditions: is_valid() == true
u16string u16string() const;
Preconditions: is_valid() == true
u32string u32string() const;
Preconditions: is_valid() == true

part_range

The part_range is designed to give a reference to different URI parts in the original URI string. It is no more than a pair of iterators defining an immutable range, plus some accessors that copy the characters in the range to different string types:

namespace std {
namespace network {
class uri::part_range {
public:
    typedef uri::iterator iterator;
    typedef uri::const_iterator const_iterator;

    const_iterator begin() const;
    const_iterator end() const;

    // string accessors
    uri::string_type native() const;
    string string() const;
    wstring wstring() const;
    u16string u16string() const;
    u32string u32string() const;

};
} // namespace network
} // namespace std
const_iterator begin() const;
Returns: An iterator to the first element representing the URI part in the parsed URI string.
const_iterator end() const;
Returns: An iterator to the end of the sequence of characters of the URI part in the parsed URI string.
uri::string_type native() const;
Returns: A copy of the URI part in the native string format.
string string() const;
Returns: A copy of the URI part as a std::string. Any Unicode conversions are performed using UTF-8, where appropriate.
wstring wstring() const;
Returns: A copy of the URI part as a std::wstring. Any Unicode conversions are performed using UTF-16, where appropriate.
u16string u16string() const;
Returns: A copy of the URI part as a std::u16string. Any unicode conversions are performed using UTF-16.
u32string u32string() const;
Returns: A copy of the URI part as a std::u32string. Any unicode conversions are performed using UTF-32.

Class uri::builder

The proposed uri::builder class is provided in order to construct uri objects more safely and more productively.

namespace std {
namespace network {
class uri::builder {

public:

    builder(uri &uri);
    builder(const builder &) = delete;
    builder &operator = (const builder &) = delete;
    ~builder();

    template <class Source>
    builder &scheme(const Source &scheme);

    template <class Source>
    builder &user_info(const Source &user_info);

    template <class Source>
    builder &host(const Source &host);

    template <class Source>
    builder &port(const Source &port);

    template <class Source>
    builder &authority(const Source &authority);

    template <class Source>
    builder &authority(const Source &user_info, const Source &host, const Source &port);

    template <class Source>
    builder &path(const Source &path);

    template <class Source>
    builder &append_path(const Source &path);

    template <class Source>
    builder &query(const Source &query);

    template <class Key, class Param>
    builder &query(const Key &key, const Param &param);

    template <class Source>
    builder &fragment(const Source &fragment);

};
} // namespace network
} // namespace std

The builder methods are templates. This allows the implementation to provide specializations depending on the argument type in order to ensure that the resultant URI remains valid and consistent. This could mean performing encoding transformations or percent encoding on input strings where appropriate, and could allow, for example, the port to be provided as an integral type. More detailed examples are provided with the API description of each method below.

Example:
std::network::uri uri;
std::network::uri::builder builder(uri);
builder.scheme("http")
       .host("example.com")
       .path("/glynos/")
       .query("key", "value");
assert(uri.string() == "http://example.com/glynos/?key=value");
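The following is a simplified, hypothetical stand-in for uri::builder that only accumulates components and recomposes them using the rules of RFC 3986, section 5.3: "//" before the authority, "?" before the query and "#" before the fragment. It performs no validation or percent encoding, omits user info, and returns a string rather than updating a uri.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

class simple_builder {
public:
    simple_builder &scheme(std::string s) { scheme_ = std::move(s); return *this; }
    simple_builder &host(std::string h) { host_ = std::move(h); return *this; }
    simple_builder &port(std::string p) { port_ = std::move(p); return *this; }
    simple_builder &path(std::string p) { path_ = std::move(p); return *this; }
    simple_builder &fragment(std::string f) { fragment_ = std::move(f); return *this; }

    // Appends key=value to the query, separating pairs with '&'.
    simple_builder &query(const std::string &key, const std::string &value) {
        query_ = (query_ ? *query_ + "&" : std::string()) + key + "=" + value;
        return *this;
    }

    // Recomposes the components (RFC 3986, section 5.3).
    std::string string() const {
        std::string out;
        if (scheme_) out += *scheme_ + ":";
        if (host_) {
            out += "//" + *host_;
            if (port_) out += ":" + *port_;
        }
        out += path_;
        if (query_) out += "?" + *query_;
        if (fragment_) out += "#" + *fragment_;
        return out;
    }

private:
    std::optional<std::string> scheme_, host_, port_, query_, fragment_;
    std::string path_;
};
```

A conforming builder would also have to enforce constraints that this sketch ignores, for example that the path must be empty or begin with "/" when an authority is present (RFC 3986, section 3.3).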

Constructor

namespace std {
namespace network {
uri::builder::builder(uri &u);
} // namespace network
} // namespace std
Preconditions: u.is_valid()

Builder functions

namespace std {
namespace network {
template <class Source>
uri::builder &uri::builder::scheme(const Source &scheme);
} // namespace network
} // namespace std
Effects: Sets the URI scheme.
namespace std {
namespace network {
template <class Source>
uri::builder &uri::builder::user_info(const Source &user_info);
} // namespace network
} // namespace std
Effects: Sets the URI user_info.
namespace std {
namespace network {
template <class Source>
uri::builder &uri::builder::host(const Source &host);
} // namespace network
} // namespace std
Effects: Sets the URI host.
namespace std {
namespace network {
template <class Source>
uri::builder &uri::builder::port(const Source &port);
} // namespace network
} // namespace std
Effects: Sets the URI port.
namespace std {
namespace network {
template <class Source>
uri::builder &uri::builder::authority(const Source &authority);
} // namespace network
} // namespace std
Effects: Sets the URI authority.
namespace std {
namespace network {
template <class Source>
uri::builder &uri::builder::authority(const Source &user_info, const Source &host, const Source &port);
} // namespace network
} // namespace std
Effects: Sets the URI user info, host and port.
namespace std {
namespace network {
template <class Source>
uri::builder &uri::builder::path(const Source &path);
} // namespace network
} // namespace std
Effects: Sets the URI path.
namespace std {
namespace network {
template <class Source>
uri::builder &uri::builder::append_path(const Source &path);
} // namespace network
} // namespace std
Effects: Appends an element to the uri object’s path.
namespace std {
namespace network {
template <class Source>
uri::builder &uri::builder::query(const Source &query);
} // namespace network
} // namespace std
Effects: Sets the URI query.
namespace std {
namespace network {
template <class Key, class Param>
uri::builder &uri::builder::query(const Key &key, const Param &param);
} // namespace network
} // namespace std
Effects: Adds a key / value pair to the uri object’s query.
namespace std {
namespace network {
template <class Source>
uri::builder &uri::builder::fragment(const Source &fragment);
} // namespace network
} // namespace std
Effects: Sets the URI fragment.

Issues

The following is a list of issues that have not yet been addressed as part of this proposal.

Issue 1 - Text Encoding and Interoperability

The most important open issue for this proposal is how to deal with text encoding in a portable way. The main difficulty is that different platforms, libraries and applications use different encodings, Unicode or otherwise. Interoperability is therefore extremely hard.

With the advent of char16_t and char32_t, and of std::u16string and std::u32string, this problem has been acknowledged and encoding can be done more explicitly, but issues with interoperability have not been addressed.

This proposal currently partially resolves interoperability issues by using templates for functions and member functions that take string arguments, for example the uri constructor:

namespace std {
namespace network {
template <class Source>
uri::uri(const Source &source);
} // namespace network
} // namespace std

Internally, the uri constructor can handle strings in different encodings by using template specialization to perform the correct transformation depending on the source type.

This can help simplify the library interface but is not completely satisfactory. For example, template specialization is limited to only those types known to the standard - std::basic_string and its variants, plus character arrays. A proposal exists to add a string_ref type to the standard (N3334), which could provide better flexibility and performance. This would replace part_range in the current proposal.

This approach is limited when returning strings from functions. e.g.:

namespace std {
namespace network {
template <class Result>
Result scheme(const uri &uri_);
} // namespace network
} // namespace std

...

std::network::uri uri("http://example.com/");
auto scheme = std::network::scheme<std::u32string>(uri);

The above excerpt cannot work with character arrays, character pointers or string_ref, since memory allocation is required.

Secondly, RFC 3987 is not well supported, and Unicode text is often converted using percent encoding:

std::network::uri uri(
    "http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8");

The pct_encode and pct_decode functions should take this into account.

Thirdly, the iterators returned by begin and end are not portable with the proposal in its current form. This proposal allows different internal character types, but the iterator types are completely unaware of the encoding. So the following examples are not portable:

// Example 1: copying the whole URI
std::network::uri uri(
    "http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8");
std::u32string u32;
std::copy(std::begin(uri), std::end(uri), std::back_inserter(u32));

// Example 2: copying only the path
std::network::uri uri(
    "http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8");
auto path_range = uri.path();
std::u32string path;
std::copy(std::begin(path_range), std::end(path_range), std::back_inserter(path));

This proposal as it stands needs better library or language support in order to offer better portability. One promising solution is given in another standard proposal, N3336.

Issue 2 - The Parser Implementation

At this stage it is important to note that there are many compatibility issues with URIs. If this proposal were to specify full standards compliance, it might not work in practice. For example, the uri parser implementation must correctly deal with escape control characters. Furthermore, no RFC fully specifies how to parse Windows file-system paths (e.g. file:///C:\path\to\a\file.txt). Finally, it is the belief of the authors of this proposal that its success depends strongly on the correct, portable behavior of the parser implementation. While we do not wish to risk over-specifying the parser, leaving it as “implementation defined” will not be sufficient. There will be more issues that may need to be specified in further detail in future revisions of this proposal.

This proposal includes internationalized URIs (RFC 3987). Since this RFC is not widely supported, and as many applications deal with Unicode characters through percent encoding, the parser implementation could be simplified if appropriate by removing RFC 3987 from this proposal.

Issue 3 - Error and Exception Handling

The proposal in its current form does not propose a detailed error handling mechanism, beyond providing a uri::is_valid() accessor to determine if the URI is valid according to the parser.

There are several potential sources of errors:

Parse errors: These occur during object construction or assignment. If parsing fails, uri::is_valid() returns false and the preconditions of the uri accessors are not met.
Memory allocation failures: The underlying string can throw std::bad_alloc from its constructor. This exception can therefore be thrown from the uri constructors and assignment functions, as well as from other functions such as normalize, relativize and resolve.
Builder errors: These come in two forms. Firstly, the design of uri::builder makes it possible for the URI to be in an intermediate state, and uri.is_valid() will return false for uri objects in such a state:
std::network::uri uri;
std::network::uri::builder builder(uri);
builder.scheme("http");
assert(!uri.is_valid());
builder.authority("example.com")
       .path("/glynos/");
assert(uri.is_valid());

Secondly, passing illegal arguments to the builder has the same effect: the resultant uri object is simply invalid again. This proposal leaves a lot of scope for improvement in better error reporting when building uri objects.

It may be appropriate to develop a uri_parse_error exception type that can be thrown when a parse error occurs.

Issue 4 - Equivalence

As explained in the previous section on normalization, there is more than one way to test if two URIs are equivalent depending on different accuracy, performance and complexity trade-offs.

Providing a single operator == is not sufficient to allow developers to choose different ways of testing URI equivalence, although a good default comparison will be sufficient for the vast majority of cases. Future revisions of this proposal could provide functionality to allow a library to use different parts of the Comparison Ladder.

Issue 5 - Extending the URI

The interface can be extended to allow more flexibility for accessing parts of the URI. For example, an accessor could be provided which converts all query elements into a map:

namespace std {
namespace network {
template <class QueryMap>
QueryMap query(const uri &u);
} // namespace network
} // namespace std

...

typedef std::map<std::string, std::string> QueryMap;
QueryMap query = std::network::query<QueryMap>(uri);
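A minimal sketch of such a query map, assuming "&"-separated key=value pairs (a convention used by HTML forms, not something RFC 3986 requires). No percent decoding is performed, a repeated key keeps its first value, and a leading "?" is skipped so that the function also accepts the text returned by query().string() in this proposal. `query_map` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>

inline std::map<std::string, std::string> query_map(const std::string &query) {
    std::map<std::string, std::string> result;
    // Skip a leading '?', as included by the proposed uri::query().
    std::string::size_type start = (!query.empty() && query[0] == '?') ? 1 : 0;
    while (start <= query.size()) {
        auto end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        const std::string pair = query.substr(start, end - start);
        if (!pair.empty()) {
            auto eq = pair.find('=');
            if (eq == std::string::npos) {
                result.emplace(pair, "");  // key without a value
            } else {
                result.emplace(pair.substr(0, eq), pair.substr(eq + 1));
            }
        }
        start = end + 1;
    }
    return result;
}
```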

Additionally, this proposal can be extended to include factory functions for common operations, such as constructing a URI from a filesystem path:

namespace std {
namespace network {
uri uri::from_path(const filesystem::path &path);
} // namespace network
} // namespace std

...

std::filesystem::path path("/usr/bin/c++");
auto uri_path = std::network::uri::from_path(path);
assert(uri_path.string() == "file:///usr/bin/c++");

Furthermore, there may be new proposals for types that represent IP addresses. If such types can be accepted into the standard, it would be possible to accommodate them in future revisions of this proposal:

namespace std {
namespace network {
uri::builder &uri::builder::host(const std::network::address_ipv4 &host);
uri::builder &uri::builder::host(const std::network::address_ipv6 &host);
} // namespace network
} // namespace std

Issue 6 - Allocator Support

This proposal does not specify any kind of allocator support.

Issue 7 - Percent Encoding and Iterators

It may be possible to gain extra flexibility and performance with iterators that are aware of percent encoding and decoding. The following untested examples give an illustration:

// assume we're on Windows, using wchar_t
std::network::uri uri(L"http://example.com/path%20with%20spaces/");
// the iterators in the type returned by pct_decoded_path still refer to the native type
auto pct_decoded_path_range = std::network::pct_decoded_path(uri);
assert(std::equal(std::begin(pct_decoded_path_range), std::end(pct_decoded_path_range),
                  L"/path with spaces/"));

This may even be improved by providing an overload for std::network::pct_decode:

std::network::uri uri(L"http://example.com/path%20with%20spaces/");
auto pct_decoded_path_range = std::network::pct_decode(uri.path());
assert(std::equal(std::begin(pct_decoded_path_range), std::end(pct_decoded_path_range),
                  L"/path with spaces/"));

For better expressiveness this can be improved further by replacing std::equal with a range-based equivalent available in Boost:

std::network::uri uri(L"http://example.com/path%20with%20spaces/");
assert(boost::equal(std::network::pct_decode(uri.path()), L"/path with spaces/"));

Acknowledgements

C++ Network Library users and mailing list
Kyle Kloepper
Niklas Gustafsson
Beman Dawes and Filesystem proposal / Text encoding
Wikipedia