Working Draft Technical Specification - URI

Document Number: N3720
Date: 2013-08-30
Authors: Glyn Matthews <glyn.matthews@gmail.com>, Dean Michael Berris <dberris@google.com>

Introduction

Note

Notes highlighted in yellow are comments on the working draft and are not intended for the actual TS.

Note

Note to committee

There is a concern about the URI type given that there will be several implementations, each differing in subtle ways. Clearly, this can hurt portability and, at its worst, will make the std::experimental::uri class useless in practice in cross-platform code bases. This would especially be true of scheme-based normalization and comparison and percent-decoding. Therefore the authors of this working draft would like guidance on how to be clear and precise on implementation details and guarantees, while avoiding “weaselly” implementation-defined or unspecified behaviour as far as possible.

1 Scope [uri.scope]

The scope of this Technical Specification will include a single std::experimental::uri type and specifications about how URIs are intended to be processed and extended, including some additional helper types and functions. It will include a std::experimental::uri_builder type to build a URI from its components. Finally, it will include types and functions for percent encoding, URI references, reference resolution and URI normalization and comparison.

2 Conformance [uri.conformance]

2.1 Generic syntax [uri.conformance.generic-syntax]

The generic syntax of a URI is defined in IETF RFC 3986, section 3.

All URIs are of the form:

scheme ":" "hierarchical part" [ "?" query ] [ "#" fragment ]

The scheme is used to identify the specification needed to parse the rest of the URI. A generic syntax parser can parse any URI into its main parts. The scheme can then be used to identify whether further scheme-specific parsing can be performed.

The hierarchical part refers to the part of the URI that holds identification information that is hierarchical in nature. This may contain an authority (always prefixed with a double slash (“//”)) and/or a path. The path part is required, though it may be empty. The authority part holds an optional user info part, ending with an at sign (“@”); a host identifier; and an optional port number, preceded by a colon (“:”). The host may be an IP address or domain name. The normative reference for IPv6 addresses is IETF RFC 2732.

The query is an optional part following a question mark (“?”) that contains information that is not hierarchical.

Finally, the fragment is an optional part, prefixed by a hash symbol (“#”) that is used to identify secondary sources.

IETF RFC 3987 specifies a new protocol element, the Internationalized Resource Identifier (IRI). The IRI complements a URI, and extends it to allow Unicode characters. The syntax of an IRI is specified in IETF RFC 3987, section 2.

IETF RFC 6874 specifies scoped IDs in IPv6 addresses. The syntax is specified in IETF RFC 6874, section 2.
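As a non-normative illustration of the generic decomposition described in this subsection, the following sketch splits a URI reference into scheme, authority, path, query and fragment using the regular expression given in IETF RFC 3986, Appendix B. The names uri_parts and split_uri are hypothetical and are not part of this proposal; the sketch performs no validation and is not intended to suggest how std::experimental::uri must be implemented.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type for this sketch; not part of the proposal.
struct uri_parts {
    std::optional<std::string> scheme;
    std::optional<std::string> authority;
    std::string path;  // always present, though it may be empty
    std::optional<std::string> query;
    std::optional<std::string> fragment;
};

// Splits a URI reference into its generic parts using the regular
// expression from IETF RFC 3986, Appendix B. No validation is performed.
inline uri_parts split_uri(const std::string& input) {
    static const std::regex pattern(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    uri_parts parts;
    std::smatch m;
    if (std::regex_match(input, m, pattern)) {
        if (m[1].matched) parts.scheme = m[2].str();
        if (m[3].matched) parts.authority = m[4].str();
        parts.path = m[5].str();
        if (m[6].matched) parts.query = m[7].str();
        if (m[8].matched) parts.fragment = m[9].str();
    }
    return parts;
}
```

An absent part is reported as an empty optional, which keeps it distinct from a part that is present but empty (for example, the query in “http://example.com/?”).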

2.2 URI Normalization and Comparison [uri.conformance.uri-normalization-and-comparison]

The rules for URI normalization are specified in IETF RFC 3986, section 6 and IETF RFC 3987, section 5.

2.3 URI References [uri.conformance.uri-references]

The rule for transforming references is given in IETF RFC 3986, section 5.2.2.

2.4 Removing Dot Segments [uri.conformance.removing-dot-segments]

The rule for removing dot segments is given in IETF RFC 3986, section 5.2.4.
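The following non-normative sketch shows one possible rendering of the steps in IETF RFC 3986, section 5.2.4. The function name remove_dot_segments is an assumption made for illustration; the letters in the comments refer to the steps of the RFC algorithm.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: removes "." and ".." segments from a path following
// the steps of IETF RFC 3986, section 5.2.4.
inline std::string remove_dot_segments(std::string_view path) {
    auto starts_with = [](std::string_view s, std::string_view prefix) {
        return s.substr(0, prefix.size()) == prefix;
    };
    std::string in(path);
    std::string out;
    // Removes the last output segment and its preceding "/", if any.
    auto drop_last_segment = [&out] {
        const auto slash = out.rfind('/');
        out.erase(slash == std::string::npos ? 0 : slash);
    };
    while (!in.empty()) {
        if (starts_with(in, "../")) {            // step A
            in.erase(0, 3);
        } else if (starts_with(in, "./")) {      // step A
            in.erase(0, 2);
        } else if (starts_with(in, "/./")) {     // step B: "/./x" becomes "/x"
            in.erase(0, 2);
        } else if (in == "/.") {                 // step B
            in = "/";
        } else if (starts_with(in, "/../")) {    // step C: "/../x" becomes "/x"
            in.erase(0, 3);
            drop_last_segment();
        } else if (in == "/..") {                // step C
            in = "/";
            drop_last_segment();
        } else if (in == "." || in == "..") {    // step D
            in.clear();
        } else {                                 // step E: move one segment
            const auto next = in.find('/', 1);
            out.append(in, 0, next);
            in.erase(0, next);
        }
    }
    return out;
}
```

The two inputs used as examples in the RFC, “/a/b/c/./../../g” and “mid/content=5/../6”, yield “/a/g” and “mid/6” respectively with this sketch.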

2.5 URI Recomposition [uri.conformance.uri-recomposition]

The rule for recomposing a URI from its parts is given in IETF RFC 3986, section 5.3.
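Recomposition can be sketched as follows. This is a non-normative illustration under the assumption that absent parts are represented by empty std::optional values; the function name recompose is hypothetical.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: recomposes a URI reference from its parts in the
// order given by IETF RFC 3986, section 5.3. A part that is absent
// (empty optional) contributes neither its delimiter nor its value.
inline std::string recompose(const std::optional<std::string>& scheme,
                             const std::optional<std::string>& authority,
                             const std::string& path,
                             const std::optional<std::string>& query,
                             const std::optional<std::string>& fragment) {
    std::string result;
    if (scheme) {
        result += *scheme;
        result += ':';
    }
    if (authority) {
        result += "//";
        result += *authority;
    }
    result += path;
    if (query) {
        result += '?';
        result += *query;
    }
    if (fragment) {
        result += '#';
        result += *fragment;
    }
    return result;
}
```

A present but empty query still produces its “?” delimiter, which is why the sketch distinguishes an empty optional from an empty string.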

3 Terms and Definitions [uri.definitions]

3.1 URI [uri.definition.uri]

A Uniform Resource Identifier is a sequence of characters from a limited set with a specific syntax used to identify a name or resource. URIs can be classified as URLs or URNs. The URI syntax is defined in IETF RFC 3986.

3.2 URL [uri.definition.url]

A Uniform Resource Locator (URL) is a type of URI, complementary to a URN, that is used to locate a resource over a network.

3.3 URN [uri.definition.urn]

A Uniform Resource Name (URN) is a type of URI, complementary to a URL, that is used to unambiguously identify resources.

3.4 IRI [uri.definition.iri]

An Internationalized Resource Identifier (IRI) is a complement to the URI that allows characters from the Universal Character Set (Unicode/ISO 10646). The IRI syntax is defined in IETF RFC 3987.

3.5 URI Part [uri.definition.uri-part]

A generic URI is decomposed into four principal parts: the scheme, the hierarchical part, an optional query and optional fragment. The hierarchical part can be further decomposed into four parts: the user info, host, port and path.

3.6 Scheme [uri.definition.scheme]

A scheme name is the top level of the URI naming structure. It indicates the specifications, syntax and semantics of the rest of the URI structure. It is always followed by a colon (“:”).

3.7 Query [uri.definition.query]

A query is a part, introduced by a question mark (“?”) and terminated by a hash (“#”) or by the end of the URI, that contains non-hierarchical information. It is commonly structured as a sequence of key-value pairs, each joined by an equals sign (“=”) and separated from one another by a semi-colon (“;”) or ampersand (“&”).

3.8 Fragment [uri.definition.fragment]

A fragment is indicated by a hash (“#”) and allows indirect identification of a secondary resource. For example, a fragment may refer to a section header in an HTML document with an id attribute of the same name.

3.9 Hierarchical Part [uri.definition.hierarchical-part]

The hierarchical part of a URI contains hierarchical information. If it starts with a double forward slash (“//”), it is followed by an authority and a path. The authority can be further broken down into a user-information part, a hostname and a port. The authority is followed by an optional path. If the hierarchical part does not begin with a double forward slash (“//”), then it must contain a path.

3.10 Authority [uri.definition.authority]

The hierarchical part contains an authority. The authority contains an optional user info followed by an at sign (“@”), a host and an optional port, preceded by a colon (“:”).
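The decomposition of an authority can be sketched as follows. This is a non-normative illustration; the names authority_parts and split_authority are hypothetical. The sketch assumes the input is already a syntactically valid authority, and it only looks for the port delimiter after a closing square bracket so that the colons inside an IPv6 literal are not mistaken for it.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type for this sketch; not part of the proposal.
struct authority_parts {
    std::optional<std::string> user_info;
    std::string host;
    std::optional<std::string> port;
};

// Splits an authority of the form [ user-info "@" ] host [ ":" port ].
inline authority_parts split_authority(std::string_view authority) {
    const auto npos = std::string_view::npos;
    authority_parts parts;

    // The user info, if present, ends at the at sign.
    const auto at = authority.find('@');
    if (at != npos) {
        parts.user_info = std::string(authority.substr(0, at));
        authority.remove_prefix(at + 1);
    }

    // Search for the port delimiter only after the "]" of an IP literal.
    const auto bracket = authority.rfind(']');
    const auto colon = authority.find(':', bracket == npos ? 0 : bracket);
    if (colon != npos) {
        parts.port = std::string(authority.substr(colon + 1));
        authority = authority.substr(0, colon);
    }

    parts.host = std::string(authority);
    return parts;
}
```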

3.11 User Info [uri.definition.user-info]

The user info is an optional part of the URI authority, terminated by at (“@”) and is followed by a host. It is used in the telnet scheme:

telnet://<user>:<password>@<host>:<port>/

3.12 Host [uri.definition.host]

The hostname contains a domain name or IP address.

3.13 Domain Name [uri.definition.domain-name]

A domain name is a human-readable string used to identify a host. Domain names are registered in the Domain Name System (DNS).

3.14 IP Address [uri.definition.ip-address]

The IP address can either be an IPv4 address (e.g. 127.0.0.1) or an IPv6 address (e.g. ::1). In a URI, an IPv6 address is enclosed in square brackets (“[]”).

3.15 Port [uri.definition.port]

The optional port is always preceded by a colon (“:”). If the port is not present, even if a colon is present, then the port is considered to have the value of the default port of the scheme.

3.16 Path [uri.definition.path]

The path is a part of the hierarchical data and is a sequence of segments, each separated by a forward slash (“/”). It is terminated by a question mark (“?”) followed by a query, by a hash (“#”) followed by a fragment, or by the end of the URI.

3.17 Dot Segments [uri.definition.dot-segments]

Dot segments are elements in a path containing either a dot (“.”) or a double dot (“..”), separated by a forward slash (“/”). Dot segments can be removed from a path as part of its normalization without changing the URI semantics.

3.18 Absolute URI [uri.definition.absolute-uri]

An absolute URI always specifies the scheme. URIs that don’t provide the scheme are called relative references.

3.19 Opaque URI [uri.definition.opaque-uri]

An opaque URI is an absolute URI that does not provide a double slash (“//”) after the scheme-delimiting colon (“:”). Opaque URIs have no authority and the part immediately following the colon (“:”) is the path. Some examples of opaque URIs are:

mailto:john.doe@example.com
news:comp.lang.c++

URIs that provide a double slash (“//”) following the scheme-delimiting colon (“:”) are known as hierarchical URIs. Some examples are:

http://www.example.com/
ftp://john.doe@ftp.example.com/
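The distinction between opaque and hierarchical URIs can be sketched with a simple textual check. This is a non-normative illustration; the function name looks_opaque is hypothetical, and the sketch does not validate the scheme beyond requiring that the first colon appears before any “/”, “?” or “#”.

```cpp
#include <string_view>

// Hypothetical helper: returns true if the text has a scheme and the
// scheme-delimiting colon is not followed by a double slash.
inline bool looks_opaque(std::string_view uri) {
    const auto npos = std::string_view::npos;
    const auto colon = uri.find(':');
    if (colon == npos || colon == 0) {
        return false;  // no scheme, so this is not an absolute URI
    }
    // A "/", "?" or "#" before the first colon means the colon belongs to
    // a later part of a relative reference rather than to a scheme.
    const auto delimiter = uri.find_first_of("/?#");
    if (delimiter < colon) {
        return false;
    }
    return uri.substr(colon + 1, 2) != "//";
}
```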

3.20 Normalization [uri.definition.normalization]

URI normalization is the process by which a URI is transformed in order to determine if two URIs are equivalent. There are different levels of comparison, which trade off the number of false negatives against complexity. The normalization and comparison procedures are defined in IETF RFC 3986, section 6.

3.21 Comparison Ladder [uri.definition.comparison-ladder]

The comparison ladder describes how URIs can be compared using normalization in different ways, trading off the complexity of the method and the number of false negatives. The comparison ladder is defined in IETF RFC 3986, section 6.2 and IETF RFC 3987, section 5.3.

3.22 Relative Reference [uri.definition.relative-reference]

Relative references are URIs that do not provide a scheme. Relative references are only usable when a base URI is known, against which the relative reference can be resolved. The relative reference is defined in IETF RFC 3986, section 4.2 and IETF RFC 3987, section 6.5.

3.23 Reference Resolution [uri.definition.reference-resolution]

Relative references can be resolved against a base URI, producing an absolute URI. Only the scheme is required to be present in the base URI. Reference resolution is defined in IETF RFC 3986, section 5.

3.24 Percent Encoding [uri.definition.percent-encoding]

Percent encoding is the mechanism used to encode reserved characters in a URI. See IETF RFC 3986, section 2.1.
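The mechanism can be sketched as follows. This non-normative illustration replaces every octet outside the unreserved set of IETF RFC 3986, section 2.3 with a percent-encoded triplet. That is a deliberately conservative simplification: the part-specific encode functions of this proposal would be expected to leave the delimiters permitted in their own part unencoded. The names is_unreserved and percent_encode are hypothetical.

```cpp
#include <string>
#include <string_view>

// Unreserved characters per IETF RFC 3986, section 2.3.
inline bool is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

// Hypothetical helper: encodes every octet that is not unreserved as
// "%" followed by two upper-case hexadecimal digits.
inline std::string percent_encode(std::string_view input) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : input) {
        if (is_unreserved(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```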

3.25 Case Normalization [uri.definition.case-normalization]

All characters in a URI scheme and host must be lower-case. All hexadecimal digits within a percent-encoded triplet must be upper-case. See IETF RFC 3986, section 6.2.2.1 and IETF RFC 3987, section 5.3.2.1.

3.26 Percent Encoding Normalization [uri.definition.percent-encoding-normalization]

URIs should be normalized by decoding any percent-encoded octet that corresponds to an unreserved character. See IETF RFC 3986, section 6.2.2.2 and IETF RFC 3987, section 5.3.2.3.
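The following non-normative sketch combines this rule with the upper-casing of hexadecimal digits in percent-encoded triplets. The function names are hypothetical. Triplets that are not well formed are copied unchanged, which is an assumption of the sketch rather than a requirement of this proposal.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Unreserved characters per IETF RFC 3986, section 2.3.
inline bool is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

// Returns the value of a hexadecimal digit, or -1 if c is not one.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decodes triplets that denote unreserved characters
// and rewrites all other triplets with upper-case hexadecimal digits.
inline std::string normalize_percent_encoding(std::string_view input) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        const bool has_triplet = input[i] == '%' && i + 2 < input.size();
        const int hi = has_triplet ? hex_value(input[i + 1]) : -1;
        const int lo = hi >= 0 ? hex_value(input[i + 2]) : -1;
        if (hi < 0 || lo < 0) {
            out += input[i];  // not a well-formed triplet: copy unchanged
            continue;
        }
        const unsigned char octet = static_cast<unsigned char>(hi * 16 + lo);
        if (is_unreserved(octet)) {
            out += static_cast<char>(octet);
        } else {
            out += '%';
            out += hex[hi];
            out += hex[lo];
        }
        i += 2;  // skip the two hexadecimal digits just consumed
    }
    return out;
}
```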

3.27 Path Segment Normalization [uri.definition.path-segment-normalization]

Dot segments [uri.definition.dot-segments] should be removed from URIs that are not relative references. See IETF RFC 3986, section 6.2.2.3 and IETF RFC 3987, section 5.3.2.4.

3.28 Character Normalization [uri.definition.character-normalization]

In Unicode, different sequences of characters could be defined as equivalent depending on how they are encoded. See IETF RFC 3987, section 5.3.2.2.

3.29 IPv6 Zone IDs [uri.definition.ipv6-zone-ids]

A zone index is used to identify to which scope a non-global address belongs in an IPv6 address. It is specified in IETF RFC 6874.

4 Requirements [uri.requirements]

Template parameters named InputIterator shall meet the C++ Standard library’s input iterator requirements ([input.iterators]) and shall have a value type that is one of the encoded character types.

The uri class must be able to parse according to the rules described in IETF RFC 3986, Section 3.

The uri class must be able to correctly parse IPv6 addresses, described in IETF RFC 2732.

The uri class must be able to parse internationalized URIs (IRIs) according to IETF RFC 3987, section 2.

The uri class must be able to parse zone IDs in IPv6 addresses according to IETF RFC 6874, section 2.

5 Header <experimental/uri> Synopsis [uri.header-synopsis]

#include <string>        // std::basic_string
#include <system_error>  // std::error_code
#include <iosfwd>        // std::basic_istream, std::basic_ostream
#include <iterator>      // std::iterator_traits
#include <memory>        // std::allocator
#include <optional>      // std::optional

namespace std {
namespace experimental {
// class declarations
class uri;
class uri_builder;
class uri_syntax_error;
class uri_builder_error;
class percent_decoding_error;

enum class uri_error {
 // uri syntax errors
 invalid_syntax,

 // builder errors
 invalid_uri,
 invalid_scheme,
 invalid_user_info,
 invalid_host,
 invalid_port,
 invalid_path,
 invalid_query,
 invalid_fragment,

 // decoding errors
 not_enough_input,
 non_hex_input,
 conversion_failed,
};

enum class uri_comparison_level {
 string_comparison,
 syntax_based,
};

// factory functions
template <class Source>
uri make_uri(const Source& source, std::error_code& e) noexcept;
template <class InputIterator>
uri make_uri(InputIterator first, InputIterator last, std::error_code& e) noexcept;
template <class Source, class Alloc>
uri make_uri(const Source& source, const Alloc& alloc, std::error_code& e) noexcept;
template <class InputIterator, class Alloc>
uri make_uri(InputIterator first, InputIterator last, const Alloc& alloc,
             std::error_code& e) noexcept;

// swap functions
void swap(uri& lhs, uri& rhs) noexcept;

// hash
size_t hash_value(const uri& u) noexcept;

// equality and comparison operators
bool operator== (const uri& lhs, const uri& rhs) noexcept;
bool operator!= (const uri& lhs, const uri& rhs) noexcept;
bool operator<  (const uri& lhs, const uri& rhs) noexcept;
bool operator>  (const uri& lhs, const uri& rhs) noexcept;
bool operator<= (const uri& lhs, const uri& rhs) noexcept;
bool operator>= (const uri& lhs, const uri& rhs) noexcept;

// stream operators
template <typename CharT, class CharTraits = std::char_traits<CharT>>
std::basic_ostream<CharT, CharTraits>&
operator<< (std::basic_ostream<CharT, CharTraits>& os, const uri& u);
template <typename CharT, class CharTraits = std::char_traits<CharT>>
std::basic_istream<CharT, CharTraits>&
operator>> (std::basic_istream<CharT, CharTraits>& is, uri& u);
} // namespace experimental
} // namespace std

5.1 Declarations [uri.header-synopsis.declarations]

The <experimental/uri> header contains declarations for a uri class, a uri_builder class and the exception classes uri_syntax_error, uri_builder_error and percent_decoding_error in the std::experimental namespace.

5.2 Factory functions [uri.header-synopsis.factory-functions]

// factory functions
template <class Source>
uri make_uri(const Source& source, std::error_code& e) noexcept;
Effects: Constructs an object of class uri. The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and implementation. On error, the error_code is set and make_uri returns an empty uri object.
template <class InputIterator>
uri make_uri(InputIterator first, InputIterator last, std::error_code& e) noexcept;
Effects: Constructs an object of class uri. The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and implementation. On error, the error_code is set and make_uri returns an empty uri object.
template <class Source, class Alloc>
uri make_uri(const Source& source, const Alloc& alloc, std::error_code& e) noexcept;
Effects: Constructs an object of class uri. The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and implementation. On error, the error_code is set and make_uri returns an empty uri object. All memory allocation shall be performed by alloc.
template <class InputIterator, class Alloc>
uri make_uri(InputIterator first, InputIterator last, const Alloc& alloc,
             std::error_code& e) noexcept;
Effects: Constructs an object of class uri. The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and implementation. On error, the error_code is set and make_uri returns an empty uri object. All memory allocation shall be performed by alloc.
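The factory functions above report failure through a std::error_code. As a non-normative illustration, the following sketch shows one way an enumeration such as uri_error could be connected to std::error_code through a custom error category. Everything in namespace sketch is hypothetical: the enumeration is abbreviated, the category name and messages are assumptions, and the enumerators are numbered from 1 (unlike the synopsis) because a std::error_code whose value is 0 converts to false.

```cpp
#include <string>
#include <system_error>

namespace sketch {

// Abbreviated, hypothetical copy of uri_error. Values start at 1 so that
// no enumerator maps to the "no error" value 0.
enum class uri_error { invalid_syntax = 1, not_enough_input, non_hex_input };

class uri_category_impl : public std::error_category {
public:
    const char* name() const noexcept override { return "uri"; }
    std::string message(int ev) const override {
        switch (static_cast<uri_error>(ev)) {
            case uri_error::invalid_syntax:   return "invalid URI syntax";
            case uri_error::not_enough_input: return "not enough input";
            case uri_error::non_hex_input:    return "non-hexadecimal input";
        }
        return "unknown URI error";
    }
};

inline const std::error_category& uri_category() {
    static const uri_category_impl instance{};
    return instance;
}

// Found by argument-dependent lookup when a uri_error is converted to a
// std::error_code.
inline std::error_code make_error_code(uri_error e) {
    return std::error_code(static_cast<int>(e), uri_category());
}

}  // namespace sketch

namespace std {
// Opts the enumeration in to implicit conversion to std::error_code.
template <>
struct is_error_code_enum<sketch::uri_error> : true_type {};
}  // namespace std
```

With this in place, a function taking std::error_code& ec can report a failure with an assignment such as ec = sketch::uri_error::invalid_syntax, and a caller can test the result with if (ec).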

5.3 Equality and Comparison Operators [uri.header-synopsis.equality-comparison]

bool operator== (const uri& lhs, const uri& rhs) noexcept;
bool operator!= (const uri& lhs, const uri& rhs) noexcept;
Effects: The equality and inequality operators use string_comparison.
Returns: lhs.compare(rhs, uri_comparison_level::string_comparison) == 0 and !(lhs == rhs), respectively.
bool operator<  (const uri& lhs, const uri& rhs) noexcept;
bool operator>  (const uri& lhs, const uri& rhs) noexcept;
bool operator<= (const uri& lhs, const uri& rhs) noexcept;
bool operator>= (const uri& lhs, const uri& rhs) noexcept;
Effects: The comparison operators use string_comparison.
Returns: lhs.compare(rhs, uri_comparison_level::string_comparison) < 0, (rhs < lhs), !(rhs < lhs) and !(lhs < rhs), respectively.
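The relationship between the six operators and compare can be sketched as follows. This is a non-normative illustration using a hypothetical stand-in type, sketch::uri_like, that holds only a string and supports only the string comparison level; it is not the uri class of this proposal.

```cpp
#include <string>

namespace sketch {

// Hypothetical stand-in for std::experimental::uri.
struct uri_like {
    std::string text;

    // String comparison level only: returns -1, 0 or 1.
    int compare(const uri_like& other) const noexcept {
        const int r = text.compare(other.text);
        return (r > 0) - (r < 0);
    }
};

// operator== and operator< are defined in terms of compare; the remaining
// operators are derived from those two.
inline bool operator==(const uri_like& lhs, const uri_like& rhs) noexcept {
    return lhs.compare(rhs) == 0;
}
inline bool operator!=(const uri_like& lhs, const uri_like& rhs) noexcept {
    return !(lhs == rhs);
}
inline bool operator<(const uri_like& lhs, const uri_like& rhs) noexcept {
    return lhs.compare(rhs) < 0;
}
inline bool operator>(const uri_like& lhs, const uri_like& rhs) noexcept {
    return rhs < lhs;
}
inline bool operator<=(const uri_like& lhs, const uri_like& rhs) noexcept {
    return !(rhs < lhs);
}
inline bool operator>=(const uri_like& lhs, const uri_like& rhs) noexcept {
    return !(lhs < rhs);
}

}  // namespace sketch
```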

5.4 Stream Operators [uri.header-synopsis.stream-operators]

template <typename CharT, class CharTraits = std::char_traits<CharT>>
std::basic_ostream<CharT, CharTraits>&
operator<< (std::basic_ostream<CharT, CharTraits>& os, const uri& u);
Effects: os << u.string<CharT, CharTraits>();
template <typename CharT, class CharTraits = std::char_traits<CharT>>
std::basic_istream<CharT, CharTraits>&
operator>> (std::basic_istream<CharT, CharTraits>& is, uri& u);
Effects: std::basic_string<CharT, CharTraits> tmp; is >> tmp; std::error_code ec; u = make_uri(tmp, ec); if (ec) is.setstate(std::ios_base::failbit);
Throws: std::bad_alloc

5.5 Swap [uri.header-synopsis.swap]

void swap(uri& lhs, uri& rhs) noexcept;
Effects: lhs.swap(rhs);

5.6 Hash [uri.header-synopsis.hash]

size_t hash_value(const uri& u) noexcept;
Returns: A hash value of uri u.

6 Class uri [class.uri]

namespace std {
namespace experimental {
class uri {

public:

    // typedefs
    typedef *unspecified* string_type;
    typedef *unspecified* iterator;
    typedef *unspecified* const_iterator;
    typedef std::iterator_traits<iterator>::value_type value_type;
    typedef basic_string_view<value_type> string_view;

    // constructors and destructor
    uri();
    template <class Source, class Alloc = std::allocator<value_type>>
    explicit uri(const Source& source, const Alloc& alloc = Alloc());
    template <typename InputIterator, class Alloc = std::allocator<value_type>>
    uri(InputIterator first, InputIterator last, const Alloc& alloc = Alloc());
    uri(const uri& other);
    uri(uri&& other) noexcept;
    ~uri() noexcept;

    // assignment
    uri& operator= (const uri& other);
    uri& operator= (uri&& other) noexcept;

    // modifiers
    void swap(uri& other) noexcept;

    // iterators
    const_iterator begin() const;
    const_iterator end() const;
    const_iterator cbegin() const;
    const_iterator cend() const;

    // accessors
    std::optional<string_view> scheme() const noexcept;
    std::optional<string_view> user_info() const noexcept;
    std::optional<string_view> host() const noexcept;
    std::optional<string_view> port() const noexcept;
    template <typename IntT>
    std::optional<IntT> port(typename std::is_integral<IntT>::type* = 0) const noexcept;
    std::optional<string_view> path() const noexcept;
    std::optional<string_view> authority() const noexcept;
    std::optional<string_view> query() const noexcept;
    std::optional<string_view> fragment() const noexcept;

    // string accessors
    template <typename CharT,
              class CharTraits = std::char_traits<CharT>,
              class Alloc = std::allocator<CharT>>
    std::basic_string<CharT, CharTraits, Alloc> string(const Alloc& alloc = Alloc()) const;
    std::string string() const;
    std::wstring wstring() const;
    std::string u8string() const;
    std::u16string u16string() const;
    std::u32string u32string() const;

    // query
    bool empty() const noexcept;
    bool is_absolute() const noexcept;
    bool is_opaque() const noexcept;

    // transformers
    uri normalize(uri_comparison_level level) const;
    template <class Alloc>
    uri normalize(uri_comparison_level level, const Alloc& alloc) const;
    uri normalize(uri_comparison_level level, std::error_code& ec) const noexcept;
    template <class Alloc>
    uri normalize(uri_comparison_level level, const Alloc& alloc, std::error_code& ec) const noexcept;

    uri make_reference(const uri& base) const;
    template <class Alloc>
    uri make_reference(const uri& base, const Alloc& alloc) const;
    uri make_reference(const uri& base, std::error_code& ec) const noexcept;
    template <class Alloc>
    uri make_reference(const uri& base, const Alloc& alloc, std::error_code& ec) const noexcept;

    uri resolve(const uri& other) const;
    template <class Alloc>
    uri resolve(const uri& other, const Alloc& alloc) const;
    uri resolve(const uri& other, std::error_code& ec) const noexcept;
    template <class Alloc>
    uri resolve(const uri& other, const Alloc& alloc, std::error_code& ec) const noexcept;

    // comparison
    int compare(const uri& other, uri_comparison_level level) const noexcept;

    // percent encoding and decoding
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator encode_user_info(InputIterator first, InputIterator last,
                                           OutputIterator out);
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator encode_host(InputIterator first, InputIterator last,
                                      OutputIterator out);
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator encode_port(InputIterator first, InputIterator last,
                                      OutputIterator out);
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator encode_path(InputIterator first, InputIterator last,
                                      OutputIterator out);
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator encode_query(InputIterator first, InputIterator last,
                                       OutputIterator out);
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator encode_fragment(InputIterator first, InputIterator last,
                                          OutputIterator out);
    template <typename InputIterator, typename OutputIterator>
    static OutputIterator decode(InputIterator first, InputIterator last,
                                 OutputIterator out);

};
} // namespace experimental
} // namespace std

6.1 uri Requirements [class.uri.reqs]

string_type is unspecified and is not required to be a contiguous memory block. As a consequence, iterator and const_iterator are also unspecified. Should an implementor decide to use a contiguous string (e.g. std::string), iterator and const_iterator can be string_type::const_iterator. Each URI part is required to be a contiguous memory block.

Function template parameters named Source shall be one of:

  • basic_string<CharT, CharTraits, Allocator>. The type CharT shall be an encoded character type. A function argument const Source& source shall have an effective range [cbegin(source), cend(source)).
  • A type meeting the input iterator requirements that iterates over a NTCTS [defns.ntcts]. The value type shall be an encoded character type. A function argument const Source& source shall have an effective range [source, end) where end is the first iterator value with an element value equal to iterator_traits<Source>::value_type().
  • A character array that after array-to-pointer decay results in a pointer to a NTCTS. The value type shall be an encoded character type. A function argument const Source& source shall have an effective range [source, end) where end is the first iterator value with an element value equal to iterator_traits<decay<Source>::type>::value_type().

Arguments of type Source shall not be null pointers.

Note

This is similar wording to the filesystem path requirements in N3693 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3693.html#path-Requirements).

URI References returned by std::experimental::uri::make_reference must be transformed by using the algorithm in IETF RFC 3986, section 5.2.2.

Removing dot segments (“.”, “..”) from a path must conform to IETF RFC 3986, section 5.2.4.

6.2 typedefs [class.uri.typedefs]

typedef *unspecified* string_type;
typedef *unspecified* iterator;
typedef *unspecified* const_iterator;
typedef std::iterator_traits<iterator>::value_type value_type;
typedef basic_string_view<value_type> string_view;

The string_type, iterator and const_iterator types are left unspecified.

6.3 uri members [class.uri.members]

6.3.1 uri constructors [class.uri.members.constructors]

uri();
Effects: Constructs an object of class uri.
Postconditions: empty()
uri(const uri& other);
Effects: Constructs a uri object with the underlying string and parts copied.
Throws: std::bad_alloc
uri(uri&& other) noexcept;
Effects: Constructs a uri object with the underlying string and parts moved.
template <class Source, class Alloc = std::allocator<value_type>>
uri(const Source& source, const Alloc& alloc = Alloc());
Effects: The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and implementation. All memory allocation shall be performed by alloc.
Postconditions: !empty() && is_absolute()
Throws: uri_syntax_error if source is not a valid URI string, std::bad_alloc
template <typename InputIterator, class Alloc = std::allocator<value_type>>
uri(InputIterator first, InputIterator last, const Alloc& alloc = Alloc());
Effects: The source is copied into the uri object and parsed. The encoding is assumed to depend on the underlying character type and implementation. All memory allocation shall be performed by alloc.
Postconditions: !empty() && is_absolute()
Throws: uri_syntax_error if the string in the range [first, last) is not a valid URI string, std::bad_alloc

6.3.2 uri assignment [class.uri.members.assignment]

uri& operator= (const uri& other);
Effects: Assigns a uri object with the underlying string and parts copied.
Throws: std::bad_alloc
uri& operator= (uri&& other) noexcept;
Effects: Assigns a uri object with the underlying string and parts moved.

6.3.3 uri modifiers [uri.members.modifiers]

void swap(uri& other) noexcept;
Effects: Swaps the contents of this object with the other.

6.3.4 uri iterators [uri.members.iterators]

const_iterator begin() const;
Returns: A const_iterator to the first element in the underlying string container.
const_iterator end() const;
Returns: A const_iterator to the end of the underlying string container.
const_iterator cbegin() const;
Returns: A const_iterator to the first element in the underlying string container.
const_iterator cend() const;
Returns: A const_iterator to the end of the underlying string container.

6.3.5 uri accessors [uri.members.accessors]

std::optional<string_view> scheme() const noexcept;
Returns: A std::optional<string_view> object which spans the range of the scheme in the underlying URI. If the scheme is not specified, it returns nullopt.
std::optional<string_view> user_info() const noexcept;
Returns: A std::optional<string_view> object which spans the range of the user info in the underlying URI. If the user info is not specified, it returns nullopt.
std::optional<string_view> host() const noexcept;
Returns: A std::optional<string_view> object which spans the range of the host in the underlying URI. If the host is not specified, it returns nullopt.
std::optional<string_view> port() const noexcept;
Returns: A std::optional<string_view> object which spans the range of the port in the underlying URI. If the port is not specified, it returns nullopt.
template <typename IntT>
std::optional<IntT> port(typename std::is_integral<IntT>::type* = 0) const noexcept;
Returns: A std::optional<IntT> with the port value, if it is present. If the port is not specified, it returns nullopt.
Requires: is_integral<IntT>::value == true
std::optional<string_view> path() const noexcept;
Returns: A std::optional<string_view> object which spans the range of the path in the underlying URI. If the path is not specified, it returns nullopt.
std::optional<string_view> authority() const noexcept;
Returns: A std::optional<string_view> object which spans the range of the authority in the underlying URI. If the authority is not specified, it returns nullopt.
std::optional<string_view> query() const noexcept;
Returns: A std::optional<string_view> object which spans the range of the query in the underlying URI. If the query is not specified, it returns nullopt.
std::optional<string_view> fragment() const noexcept;
Returns: A std::optional<string_view> object which spans the range of the fragment in the underlying URI. If the fragment is not specified, it returns nullopt.
template <typename CharT,
          class CharTraits = std::char_traits<CharT>,
          class Alloc = std::allocator<CharT>>
std::basic_string<CharT, CharTraits, Alloc> string(const Alloc& alloc = Alloc()) const;
Returns: A string object containing a copy of the underlying URI string. All memory allocation shall be performed by alloc.
std::string string() const;
Returns: A string object containing a copy of the underlying URI string.
Throws: std::bad_alloc
std::wstring wstring() const;
Returns: A wstring object containing a copy of the underlying URI string.
Throws: std::bad_alloc
std::string u8string() const;
Returns: A UTF-8 encoded string object containing a copy of the underlying URI string.
Throws: std::bad_alloc
std::u16string u16string() const;
Returns: A u16string object containing a copy of the underlying URI string.
Throws: std::bad_alloc
std::u32string u32string() const;
Returns: A u32string object containing a copy of the underlying URI string.
Throws: std::bad_alloc
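The integral overload of port() described above converts the textual port to a number. A non-normative sketch of such a conversion, using std::from_chars, is given below. The free function port_as is hypothetical, as is the choice to return nullopt for a port that is empty, contains a non-digit character, or does not fit in IntT.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>
#include <type_traits>

// Hypothetical helper: converts a textual port to IntT. Returns nullopt
// if the text is empty, is not entirely decimal digits, or is out of
// range for IntT.
template <typename IntT>
std::optional<IntT> port_as(std::string_view port) noexcept {
    static_assert(std::is_integral<IntT>::value,
                  "IntT must be an integral type");
    // std::from_chars accepts a leading minus sign for signed types,
    // which is never valid in a port.
    if (port.empty() || port.front() == '-') {
        return std::nullopt;
    }
    IntT value{};
    const char* first = port.data();
    const char* last = first + port.size();
    const auto result = std::from_chars(first, last, value);
    if (result.ec != std::errc() || result.ptr != last) {
        return std::nullopt;
    }
    return value;
}
```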

6.3.6 uri query [uri.members.query]

bool empty() const noexcept;
Returns: true if the underlying string object is empty, false otherwise.
bool is_absolute() const noexcept;
Returns: true if the URI is absolute. Equivalent to static_cast<bool>(scheme()).
bool is_opaque() const noexcept;
Returns: true if the URI is absolute and its scheme is not hierarchical (i.e. the scheme-specific part does not start with a double-slash // and its authority is empty).

6.3.7 uri transformers [uri.members.transformers]

This proposal specifies three transformer functions: normalize, make_reference and resolve.

uri normalize(uri_comparison_level level) const;
Effects: Returns a copy of this uri object, normalized according to the comparison level given as an argument.
Postconditions: u.normalize(level).compare(u, level) == 0
Throws: std::bad_alloc
template <class Alloc>
uri normalize(uri_comparison_level level, const Alloc& alloc) const;
Effects: Returns a copy of this uri object, normalized according to the comparison level given as an argument. All memory allocation shall be performed by alloc.
Postconditions: u.normalize(level).compare(u, level) == 0
Throws: std::bad_alloc
uri normalize(uri_comparison_level level, std::error_code& ec) const noexcept;
Effects: Returns a copy of this uri object, normalized according to the comparison level given as an argument. ec is set on error.
Postconditions: ec || u.normalize(level).compare(u, level) == 0
template <class Alloc>
uri normalize(uri_comparison_level level, const Alloc& alloc, std::error_code& ec) const noexcept;
Effects: Returns a copy of this uri object, normalized according to the comparison level given as an argument. All memory allocation shall be performed by alloc. ec is set on error.
Postconditions: ec || u.normalize(level).compare(u, level) == 0
uri make_reference(const uri& base) const;
Effects: Returns a relative URI reference from the base given as an argument.
Postconditions: !u1.make_reference(u2).is_absolute()
Returns: A relative URI reference.
Throws: std::bad_alloc
template <class Alloc>
uri make_reference(const uri& base, const Alloc& alloc) const;
Effects: Returns a relative URI reference from the base given as an argument. All memory allocation shall be performed by alloc.
Postconditions: !u1.make_reference(u2).is_absolute()
Returns: A relative URI reference.
Throws: std::bad_alloc
uri make_reference(const uri& base, std::error_code& ec) const noexcept;
Effects: Returns a relative URI reference from the base given as an argument. ec is set on error.
Returns: A relative URI reference.
template <class Alloc>
uri make_reference(const uri& base, const Alloc& alloc, std::error_code& ec) const noexcept;
Effects: Returns a relative URI reference from the base given as an argument. All memory allocation shall be performed by alloc. ec is set on error.
Returns: A relative URI reference.
uri resolve(const uri& other) const;
Effects: Resolves other against *this, which serves as the base URI, and returns the resulting uri.
Postconditions: u1.resolve(u2).absolute()
Throws: std::bad_alloc
template <class Alloc>
uri resolve(const uri& other, const Alloc& alloc) const;
Effects: Resolves other against *this, which serves as the base URI, and returns the resulting uri. All memory allocation shall be performed by alloc.
Postconditions: u1.resolve(u2).absolute()
Throws: std::bad_alloc
uri resolve(const uri& other, std::error_code& ec) const noexcept;
Effects: Resolves other against *this, which serves as the base URI, and returns the resulting uri. ec is set on error.
template <class Alloc>
uri resolve(const uri& other, const Alloc& alloc, std::error_code& ec) const noexcept;
Effects: Resolves other against *this, which serves as the base URI, and returns the resulting uri. All memory allocation shall be performed by alloc. ec is set on error.
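Reference resolution as described in IETF RFC 3986, section 5.2 removes "." and ".." segments from the merged path (section 5.2.4). Assuming resolve follows that algorithm, which this draft does not spell out, the step could be sketched as below. The names remove_dot_segments_sketch, starts_with_sketch and remove_last_segment_sketch are hypothetical, and the sketch works on std::string rather than on uri.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers, not part of the proposed interface.
static bool starts_with_sketch(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Removes the last segment of `output` together with its preceding '/', if any.
static void remove_last_segment_sketch(std::string& output) {
    const std::size_t pos = output.rfind('/');
    if (pos == std::string::npos) output.clear();
    else output.erase(pos);
}

// Follows steps A to E of RFC 3986, section 5.2.4.
std::string remove_dot_segments_sketch(std::string input) {
    std::string output;
    while (!input.empty()) {
        if (starts_with_sketch(input, "../")) {            // step A
            input.erase(0, 3);
        } else if (starts_with_sketch(input, "./")) {      // step A
            input.erase(0, 2);
        } else if (starts_with_sketch(input, "/./")) {     // step B
            input.replace(0, 3, "/");
        } else if (input == "/.") {                        // step B
            input = "/";
        } else if (starts_with_sketch(input, "/../")) {    // step C
            input.replace(0, 4, "/");
            remove_last_segment_sketch(output);
        } else if (input == "/..") {                       // step C
            input = "/";
            remove_last_segment_sketch(output);
        } else if (input == "." || input == "..") {        // step D
            input.clear();
        } else {                                           // step E
            // Move the first segment (with its leading '/', if any) to the output.
            const std::size_t next = input.find('/', 1);
            output.append(input, 0, next);
            input.erase(0, next);
        }
    }
    return output;
}
```

The two paths used as examples in RFC 3986, section 5.2.4, "/a/b/c/./../../g" and "mid/content=5/../6", are convenient inputs for checking a sketch like this one.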

6.3.8 uri comparison [uri.members.comparison]

int compare(const uri& other, uri_comparison_level level) const noexcept;
Effects: Compares normalize(level) with other.normalize(level).
Returns: -1 if the normalized value of *this is lexicographically less than the normalized value of other at the given comparison level; 0 if they are considered equal; and 1 if the normalized value of *this is greater.
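The three-way result described above can be sketched on plain strings. The function compare_sketch is a hypothetical stand-in for compare; it assumes both arguments have already been normalized at the same comparison level and simply clamps the lexicographic result to -1, 0 or 1.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for uri::compare, operating on already-normalized strings.
int compare_sketch(const std::string& normalized_lhs, const std::string& normalized_rhs) {
    const int r = normalized_lhs.compare(normalized_rhs);  // lexicographic comparison
    return (r > 0) - (r < 0);                               // clamp to -1, 0 or 1
}
```

Clamping matters because std::string::compare only guarantees the sign of its result, not its magnitude.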

6.3.9 uri percent encoding [uri.members.percent]

template <typename InputIterator, typename OutputIterator>
static OutputIterator encode_user_info(InputIterator first, InputIterator last,
                                       OutputIterator out);
Effects: Encodes special characters for the user_info part (IETF RFC 3986, section 2.1).
Returns: An iterator one past the last character of the encoded user_info string written through out.
template <typename InputIterator, typename OutputIterator>
static OutputIterator encode_host(InputIterator first, InputIterator last,
                                  OutputIterator out);
Effects: Encodes special characters for the host part (IETF RFC 3986, section 2.1).
Returns: An iterator one past the last character of the encoded host string written through out.
template <typename InputIterator, typename OutputIterator>
static OutputIterator encode_port(InputIterator first, InputIterator last,
                                  OutputIterator out);
Effects: Encodes special characters for the port part (IETF RFC 3986, section 2.1).
Returns: An iterator one past the last character of the encoded port string written through out.
template <typename InputIterator, typename OutputIterator>
static OutputIterator encode_path(InputIterator first, InputIterator last,
                                  OutputIterator out);
Effects: Encodes special characters for the path part (IETF RFC 3986, section 2.1).
Returns: An iterator one past the last character of the encoded path string written through out.
template <typename InputIterator, typename OutputIterator>
static OutputIterator encode_query(InputIterator first, InputIterator last,
                                   OutputIterator out);
Effects: Encodes special characters for the query part (IETF RFC 3986, section 2.1).
Returns: An iterator one past the last character of the encoded query string written through out.
template <typename InputIterator, typename OutputIterator>
static OutputIterator encode_fragment(InputIterator first, InputIterator last,
                                      OutputIterator out);
Effects: Encodes special characters for the fragment part (IETF RFC 3986, section 2.1).
Returns: An iterator one past the last character of the encoded fragment string written through out.
template <typename InputIterator, typename OutputIterator>
static OutputIterator decode(InputIterator first, InputIterator last,
                             OutputIterator out);
Effects: Decodes percent-encoded characters in the source range and writes the decoded characters through out (IETF RFC 3986, section 2.1).
Returns: An iterator one past the last character of the decoded string written through out.
Throws: percent_decoding_error when the input is exhausted before a percent-encoded triplet is complete, when a character following "%" is not a hexadecimal digit, or when the decoding conversion fails.
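The iterator-based interface above can be sketched as follows. This is only an illustration under stated assumptions: encode_sketch, decode_sketch, hex_value_sketch and decoding_error_sketch are hypothetical names, the error type derives from std::runtime_error rather than std::system_error for brevity, and the encoder escapes every character outside the RFC 3986 unreserved set instead of applying the per-component rules that encode_user_info, encode_path and the other functions would need.

```cpp
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical error type standing in for the decoding exception.
struct decoding_error_sketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Returns the numeric value of a hexadecimal digit, or throws.
inline int hex_value_sketch(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw decoding_error_sketch("not a hexadecimal character");
}

// Percent-encodes everything outside the unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~").
template <typename InputIterator, typename OutputIterator>
OutputIterator encode_sketch(InputIterator first, InputIterator last, OutputIterator out) {
    static const char hex[] = "0123456789ABCDEF";
    for (; first != last; ++first) {
        const unsigned char c = static_cast<unsigned char>(*first);
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            *out++ = static_cast<char>(c);
        } else {
            *out++ = '%';
            *out++ = hex[c >> 4];
            *out++ = hex[c & 0x0F];
        }
    }
    return out;
}

// Replaces each "%XX" triplet with the byte it denotes; throws on truncated or invalid input.
template <typename InputIterator, typename OutputIterator>
OutputIterator decode_sketch(InputIterator first, InputIterator last, OutputIterator out) {
    while (first != last) {
        char c = *first++;
        if (c == '%') {
            if (first == last) throw decoding_error_sketch("input exhausted");
            const int hi = hex_value_sketch(*first++);
            if (first == last) throw decoding_error_sketch("input exhausted");
            const int lo = hex_value_sketch(*first++);
            c = static_cast<char>(hi * 16 + lo);
        }
        *out++ = c;
    }
    return out;
}
```

A caller would typically pass std::back_inserter of a string as the output iterator, so the returned iterator can be ignored or used to continue writing.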

7 Class uri_builder [class.uri_builder]

namespace std {
namespace experimental {
class uri_builder {

private:

    uri_builder(const uri_builder&) = delete;
    uri_builder& operator=(const uri_builder&) = delete;

public:

    // Constructors
    uri_builder();
    explicit uri_builder(const uri& base);
    template <class Source>
    explicit uri_builder(const Source& base);
    ~uri_builder();

    // Setters
    template <class Source>
    uri_builder& scheme(const Source& scheme);
    template <class Source>
    uri_builder& user_info(const Source& user_info);
    template <class Source>
    uri_builder& host(const Source& host);
    template <class Source>
    uri_builder& port(const Source& port);
    template <class Source>
    uri_builder& authority(const Source& authority);
    template <class UserInfoSource, class HostSource, class PortSource>
    uri_builder& authority(const UserInfoSource& user_info,
                           const HostSource& host, const PortSource& port);
    template <class Source>
    uri_builder& path(const Source& path);
    template <class Source>
    uri_builder& append_path(const Source& path);
    template <class Source>
    uri_builder& query(const Source& query);
    template <class Key, class Param>
    uri_builder& append_query(const Key& key, const Param& param);
    template <class Source>
    uri_builder& fragment(const Source& fragment);

    // Builder
    std::experimental::uri uri() const;

};
} // namespace experimental
} // namespace std

7.1 uri_builder requirements [class.uri_builder.requirements]

Function template parameters named Source shall be one of:

  • basic_string<CharT, CharTraits, Allocator>. The type CharT shall be an encoded character type. A function argument const Source& source shall have an effective range [cbegin(source), cend(source)).
  • A character array that after array-to-pointer decay results in a pointer to a NTCTS. The value type shall be an encoded character type. A function argument const Source& source shall have an effective range [source, end) where end is the first iterator value with an element value equal to iterator_traits<decay<Source>::type>::value_type().
  • A type that is convertible to std::experimental::uri::string_type by a means chosen by the implementation.

Arguments of type Source shall not be null pointers.

The URI must be built according to component recomposition rules in IETF RFC 3986, section 5.3.
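The component recomposition rules of IETF RFC 3986, section 5.3 can be sketched as below. The type parts_sketch and the function recompose_sketch are hypothetical and not part of this proposal. The boolean flags model the distinction the RFC draws between an undefined component and one that is defined but empty, and the path is always appended because it is always present, though possibly empty.

```cpp
#include <cassert>
#include <string>

// Hypothetical container for URI components, not part of the proposed interface.
struct parts_sketch {
    bool has_scheme = false;
    bool has_authority = false;
    bool has_query = false;
    bool has_fragment = false;
    std::string scheme, authority, path, query, fragment;
};

// Recomposes the components in the order given by RFC 3986, section 5.3.
std::string recompose_sketch(const parts_sketch& p) {
    std::string result;
    if (p.has_scheme)    { result += p.scheme; result += ':'; }
    if (p.has_authority) { result += "//"; result += p.authority; }
    result += p.path;
    if (p.has_query)     { result += '?'; result += p.query; }
    if (p.has_fragment)  { result += '#'; result += p.fragment; }
    return result;
}
```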

7.2 uri_builder constructors [class.uri_builder.constructors]

uri_builder();

Constructs a uri_builder object.

explicit uri_builder(const uri& base);

Constructs a uri_builder object from a base URI.

template <class Source>
explicit uri_builder(const Source& base);

Constructs a uri_builder object from a base URI.

7.3 uri_builder members [class.uri_builder.members]

template <class Source>
uri_builder& scheme(const Source& scheme);
Effects: Sets the URI scheme.
template <class Source>
uri_builder& user_info(const Source& user_info);
Effects: Sets the URI user_info.
template <class Source>
uri_builder& host(const Source& host);
Effects: Sets the URI host.
template <class Source>
uri_builder& port(const Source& port);
Effects: Sets the URI port.
template <class Source>
uri_builder& authority(const Source& authority);
Effects: Sets the URI authority.
template <class UserInfoSource, class HostSource, class PortSource>
uri_builder& authority(const UserInfoSource& user_info,
                       const HostSource& host, const PortSource& port);
Effects: Sets the URI user info, host and port.
template <class Source>
uri_builder& path(const Source& path);
Effects: Sets the URI path.
template <class Source>
uri_builder& append_path(const Source& path);
Effects: Appends an element to the uri object’s path.
template <class Source>
uri_builder& query(const Source& query);
Effects: Sets the URI query.
template <class Key, class Param>
uri_builder& append_query(const Key& key, const Param& param);
Effects: Appends a key-value pair to the uri object’s query.
template <class Source>
uri_builder& fragment(const Source& fragment);
Effects: Sets the URI fragment.
std::experimental::uri uri() const;
Effects: Builds a URI object from the provided parts. A URI built using this method should be normalized according to syntax-based normalization. This includes case normalization, percent-encoding normalization, character normalization and path segment normalization.
Throws: uri_builder_error if any of the parts are invalid and a valid uri cannot be formed.
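Because every setter returns uri_builder&, calls can be chained. The toy class below sketches that pattern for append_query only. The class name query_builder_sketch is hypothetical, and the use of "=" between key and value and "&" between pairs is an assumption, since this draft does not state which separators append_query uses. No percent-encoding is applied in the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature builder, not part of the proposed interface.
class query_builder_sketch {
public:
    // Appends "key=value", preceded by '&' when the query is not empty.
    // Returns *this so that calls can be chained.
    query_builder_sketch& append_query(const std::string& key, const std::string& value) {
        if (!query_.empty()) query_ += '&';
        query_ += key;
        query_ += '=';
        query_ += value;
        return *this;
    }

    const std::string& query() const { return query_; }

private:
    std::string query_;
};
```

With this shape, a call such as b.append_query("q", "uri").append_query("page", "2") builds the whole query in a single expression.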

8 Class uri_syntax_error [class.uri_syntax_error]

namespace std {
namespace experimental {
class uri_syntax_error : public std::system_error {
public:
    uri_syntax_error(const string& what_arg, error_code ec);
    virtual ~uri_syntax_error() noexcept;
    virtual const char *what() const noexcept;
};
} // namespace experimental
} // namespace std

8.1 uri_syntax_error members [class.uri_syntax_error.members]

8.1.1 uri_syntax_error constructors [class.uri_syntax_error.constructors]

uri_syntax_error(const string& what_arg, error_code ec);
Postconditions: strcmp(what(), what_arg.c_str()) == 0 && code() == ec

8.1.2 uri_syntax_error accessors [class.uri_syntax_error.accessors]

const char *what() const noexcept;
Returns: A string containing the message in the string passed as what_arg to the class constructor.
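One way an implementation might satisfy these requirements is sketched below. The class syntax_error_sketch is hypothetical. It stores its own copy of what_arg because std::system_error::what() is allowed to combine the message with a description of the error code, whereas the accessor specified here returns the message passed to the constructor.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <system_error>

// Hypothetical illustration of an exception type derived from std::system_error.
class syntax_error_sketch : public std::system_error {
public:
    syntax_error_sketch(const std::string& what_arg, std::error_code ec)
        : std::system_error(ec), what_arg_(what_arg) {}

    // Returns exactly the message given at construction.
    const char* what() const noexcept override { return what_arg_.c_str(); }

private:
    std::string what_arg_;
};
```

The same shape would apply to the other exception classes in this document, which differ only in name.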

9 Class uri_builder_error [class.uri_builder_error]

namespace std {
namespace experimental {
class uri_builder_error : public std::system_error {
public:
    uri_builder_error(const string& what_arg, error_code ec);
    virtual ~uri_builder_error() noexcept;
    virtual const char *what() const noexcept;
};
} // namespace experimental
} // namespace std

9.1 uri_builder_error members [class.uri_builder_error.members]

9.1.1 uri_builder_error constructors [class.uri_builder_error.constructors]

uri_builder_error(const string& what_arg, error_code ec);
Postconditions: strcmp(what(), what_arg.c_str()) == 0 && code() == ec

9.1.2 uri_builder_error accessors [class.uri_builder_error.accessors]

const char *what() const noexcept;
Returns: A string containing the message in the string passed as what_arg to the class constructor.

10 Class percent_decoding_error [class.percent_decoding_error]

namespace std {
namespace experimental {
class percent_decoding_error : public std::system_error {
public:
    percent_decoding_error(const string& what_arg, error_code ec);
    virtual ~percent_decoding_error() noexcept;
    virtual const char *what() const noexcept;
};
} // namespace experimental
} // namespace std

10.1 percent_decoding_error members [class.percent_decoding_error.members]

10.1.1 percent_decoding_error constructors [class.percent_decoding_error.constructors]

percent_decoding_error(const string& what_arg, error_code ec);
Postconditions: strcmp(what(), what_arg.c_str()) == 0 && code() == ec

10.1.2 percent_decoding_error accessors [class.percent_decoding_error.accessors]

const char *what() const noexcept;
Returns: A string containing the message in the string passed as what_arg to the class constructor.

Issues

Note

Issues

1. Percent decoding

An exception is thrown when encountering a percent-encoded sequence that causes data loss (any sequence that yields bytes above 0x80). Beyond that, there may be ambiguity when decoding reserved characters, for example, “http://www.example.com/?q=foo+bar%2B”. On decoding this URI, the distinction between “+” and “%2B” is lost.

2. Scheme-specific and protocol specific transformations

The highest level in the comparison ladder that this draft will implement is syntax_based normalization. Any scheme-specific transformations need to be specified very carefully if users are to be able to rely on the same behavior in different implementations. These are considered outside the scope of this draft.

3. WhatWG specification

The WhatWG specification attempts to deprecate the terms URI/IRI in favor of the term URL. This has not been implemented in this draft, for reasons of consistency with the majority of implementations of URIs in C++ and in other languages.

Acknowledgements

Note

C++ Network Library users and mailing list

Kyle Kloepper and Niklas Gustafsson for providing valuable feedback and encouragement, and for presenting different versions of this proposal at committee meetings.

Beman Dawes and his Filesystem proposal, which strongly influenced the class design.

Thiago Macieira Faure of Qt for important feedback on the draft proposal.

Wikipedia, for being there.