______________________________________________________________________

1 General [intro]

______________________________________________________________________

1.1 Scope [intro.scope]

1 This International Standard specifies requirements for processors of the C++ programming language. The first such requirement is that they implement the language, and so this Standard also defines C++. Other requirements and relaxations of the first requirement appear at various places within the Standard.

2 C++ is a general purpose programming language based on the C programming language as described in ISO/IEC 9899 (1.2). In addition to the facilities provided by C, C++ provides additional data types, classes, templates, exceptions, inline functions, operator overloading, function name overloading, references, free store management operators, function argument checking and type conversion, and additional library facilities. These extensions to C are summarized in C.1. The differences between C++ and ISO C1) are summarized in C.2. The extensions to C++ since 1985 are summarized in C.1.2.

1.2 Normative references [intro.refs]

1 The following standards contain provisions which, through reference in this text, constitute provisions of this International Standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this International Standard are encouraged to investigate the possibility of applying the most recent editions of the standards indicated below. Members of IEC and ISO maintain registers of currently valid International Standards.

- ANSI X3/TR-1-82:1982, American National Dictionary for Information Processing Systems.
- ISO/IEC 9899:1990, C Standard
- ISO/IEC xxxx:199x Amendment 1 to C Standard

+------- BEGIN BOX 1 -------+
This last title must be filled in when Amendment 1 is approved. The other titles have not been checked for accuracy.
+------- END BOX 1 -------+

1-2 General DRAFT: 27 May 1994 1.3 Definitions

1.3 Definitions [intro.defs]

1 For the purposes of this International Standard, the definitions given in ANSI X3/TR-1-82 and the following definitions apply.

- argument: An expression in the comma-separated list bounded by the parentheses in a function call expression, a sequence of preprocessing tokens in the comma-separated list bounded by the parentheses in a function-like macro invocation, the operand of throw, or an expression in the comma-separated list bounded by the angle brackets in a template instantiation. Also known as an actual argument or actual parameter.

- diagnostic message: A message belonging to an implementation-defined subset of the implementation's message output.

- dynamic type: The dynamic type of an expression is determined by its current value and may change during the execution of a program. If a pointer (8.3.1) whose static type is pointer to class B is pointing to an object of class D, derived from B (10), the dynamic type of the pointer is pointer to D. References (8.3.2) are treated similarly.

- implementation-defined behavior: Behavior, for a correct program construct and correct data, that depends on the implementation and that each implementation shall document. The range of possible behaviors is delineated by the standard.

- implementation limits: Restrictions imposed upon programs by the implementation.

- locale-specific behavior: Behavior that depends on local conventions of nationality, culture, and language that each implementation shall document.

- multibyte character: A sequence of one or more bytes representing a member of the extended character set of either the source or the execution environment. The extended character set is a superset of the basic character set.

- parameter: an object or reference declared as part of a function declaration or definition or in the catch clause of an exception handler that acquires a value on entry to the function or handler, an identifier from the comma-separated list bounded by the parentheses immediately following the macro name in a function-like macro definition, or a template-parameter. A function may be said to take arguments or to have parameters. Parameters are also known as formal arguments or formal parameters.

- signature: The signature of a function is the information about that function that participates in overload resolution (13.2): the types of its parameters and, if the function is a non-static member of a class, the CV-qualifiers (if any) on the function itself and whether the function is a direct member of its class or inherited from a base class.

- static type: The static type of an expression is the type (3.7) resulting from analysis of the program without consideration of execution semantics. It depends only on the form of the program and does not change.

- undefined behavior: Behavior, such as might arise upon use of an erroneous program construct or of erroneous data, for which the standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Note that many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed.

- unspecified behavior: Behavior, for a correct program construct and correct data, that depends on the implementation. The range of possible behaviors is delineated by the standard.
The implementation is not required to document which behavior occurs.

1.4 Syntax notation [syntax]

1 In the syntax notation used in this manual, syntactic categories are indicated by italic type, and literal words and characters in constant width type. Alternatives are listed on separate lines except in a few cases where a long set of alternatives is presented on one line, marked by the phrase one of. An optional terminal or nonterminal symbol is indicated by the subscript opt, so { expressionopt } indicates an optional expression enclosed in braces.

2 Names for syntactic categories have generally been chosen according to the following rules:

- X-name is a use of an identifier in a context that determines its meaning (e.g. class-name, typedef-name).

- X-id is an identifier with no context-dependent meaning (e.g. qualified-id).

- X-seq is one or more X's without intervening delimiters (e.g. declaration-seq is a sequence of declarations).

- X-list is one or more X's separated by intervening commas (e.g. expression-list is a sequence of expressions separated by commas).

__________________________
1) Function signatures do not include return type, because that does not participate in overload resolution.

1.5 The C++ memory model [intro.memory]

1 The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set and is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory accessible to a C++ program is one or more contiguous sequences of bytes. Each byte (except perhaps registers) has a unique address.

2 The constructs in a C++ program create, refer to, access, and manipulate objects in memory. Each object (except bit-fields) occupies one or more contiguous bytes.
Objects are created by definitions (3.1) and new-expressions (5.3.4). Each object has a type determined by the construct that creates it. The type in turn determines the number of bytes that the object occupies and the interpretation of their contents. Objects may contain other objects, called sub-objects (9.2, 10). An object that is not a sub-object of any other object is called a complete object. For every object x, there is some object called the complete object of x, determined as follows:

- If x is a complete object, then x is the complete object of x.

- Otherwise, the complete object of x is the complete object of the (unique) object that contains x.

3 C++ provides a variety of built-in types and several ways of composing new types from existing types.

4 Certain types have alignment restrictions. An object of one of those types may appear only at an address that is divisible by a particular integer.

1.6 Processor compliance [intro.compliance]

1 Every conforming C++ processor shall, within its resource limits, accept and correctly execute well-formed C++ programs, and shall issue at least one diagnostic error message when presented with any ill-formed program that contains a violation of any rule that is identified as diagnosable in this Standard or of any syntax rule, except as noted herein.

2 Well-formed C++ programs are those that are constructed according to the syntax rules, semantic rules identified as diagnosable, and the One Definition Rule (3.1). If a program is not well-formed but does not contain any diagnosable errors, this Standard places no requirement on processors with respect to that program.

1.7 Program execution [intro.execution]

1 The semantic descriptions in this Standard define a parameterized nondeterministic abstract machine. This Standard places no requirement on the structure of conforming processors.
In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming processors are required to emulate (only) the observable behavior of the abstract machine as explained below.

2 Certain aspects and operations of the abstract machine are described in this Standard as implementation defined (for example, sizeof(int)). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects, which documentation defines the instance of the abstract machine that corresponds to that implementation (referred to as the ``corresponding instance'' below).

3 Certain other aspects and operations of the abstract machine are described in this Standard as unspecified (for example, order of evaluation of arguments to a function). In each case the Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine. An instance of the abstract machine may thus have more than one possible execution sequence for a given program and a given input.

4 Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer).

5 A conforming processor executing a well-formed program shall produce the same observable behavior as one of the possible execution sequences of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution sequence contains an undefined operation, this Standard places no requirement on the processor executing that program with that input (not even with regard to operations previous to the first undefined operation).

6 The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions.2)

__________________________
2) An implementation can offer additional library I/O functions as an extension.
Implementations that do so should treat calls to those functions as ``observable behavior'' as well.

______________________________________________________________________

2 Lexical conventions [lex]

______________________________________________________________________

1 A C++ program need not all be translated at the same time. The text of the program is kept in units called source files in this standard. A source file together with all the headers (17.1.2) and source files included (16.2) via the preprocessing directive #include, less any source lines skipped by any of the conditional inclusion (16.1) preprocessing directives, is called a translation unit. Previously translated translation units may be preserved individually or in libraries. The separate translation units of a program communicate (3.4) by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units may be separately translated and then later linked to produce an executable program (3.4).

2.1 Phases of translation [lex.phases]

1 The precedence among the syntax rules of translation is specified by the following phases.3)

  1 Physical source file characters are mapped to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences (2.2) are replaced by corresponding single-character internal representations.

  2 Each instance of a new-line character and an immediately preceding backslash character is deleted, splicing physical source lines to form logical source lines. A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character.

  3 The source file is decomposed into preprocessing tokens (2.3) and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or comment.
Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined. The process of dividing a source file's characters into preprocessing tokens is context-dependent. For example, see the handling of < within a #include preprocessing directive.

  4 Preprocessing directives are executed and macro invocations are expanded. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively.

  5 Each source character set member and escape sequence in character constants and string literals is converted to a member of the execution character set.

  6 Adjacent character string literal tokens are concatenated and adjacent wide string literal tokens are concatenated.

  7 White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. (See 2.4). The resulting tokens are syntactically and semantically analyzed and translated. The result of this process starting from a single source file is called a translation unit.

  8 The translation units that will form a program are combined. All external object and function references are resolved.

+------- BEGIN BOX 2 -------+
What about shared libraries?
+------- END BOX 2 -------+

    Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.

__________________________
3) Implementations must behave as if these separate phases occur, although in practice different phases may be folded together.
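
The effect of several of these phases can be observed from within a program. The following is a minimal sketch, not part of the draft text; the names GREETING, comment_as_space, spliced_and_joined and check_phases are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstring>

// Phase 2: the backslash and the new-line that follows it are deleted,
// so this directive continues onto the next physical line.
#define GREETING "phase " \
                 "two"

// Phase 3: the comment between `int` and `value` is replaced by one
// space character, so the declaration reads `int value = 7;`.
int comment_as_space() {
    int/* comment */value = 7;
    return value;
}

// Phase 4 expands GREETING; phase 6 then concatenates the three
// adjacent string literals into a single literal.
const char* spliced_and_joined() {
    return GREETING " done";
}

// Hypothetical self-check tying the observations together.
void check_phases() {
    assert(comment_as_space() == 7);
    assert(std::strcmp(spliced_and_joined(), "phase two done") == 0);
}
```

Calling check_phases() from a test driver exercises phases 2, 3, 4 and 6; the remaining phases have no effect that is equally easy to observe from inside the program.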

2.2 Trigraph sequences [lex.trigraph]

1 Before any other processing takes place, each occurrence of one of the following sequences of three characters (trigraph sequences) is replaced by the single character indicated in Table 1.

Table 1-trigraph sequences
_________________________________________________________________________
| trigraph  replacement | trigraph  replacement | trigraph  replacement |
|_______________________|_______________________|_______________________|
| ??=       #           | ??(       [           | ??<       {           |
|_______________________|_______________________|_______________________|
| ??/       \           | ??)       ]           | ??>       }           |
|_______________________|_______________________|_______________________|
| ??'       ^           | ??!       |           | ??-       ~           |
|_______________________|_______________________|_______________________|

2 For example,

        ??=define arraycheck(a,b) a??(b??) ??!??! b??(a??)

becomes

        #define arraycheck(a,b) a[b] || b[a]

2.3 Preprocessing tokens [lex.pptoken]

        preprocessing-token:
                header-name
                identifier
                pp-number
                character-constant
                string-literal
                operator
                digraph
                punctuator
                each non-white-space character that cannot be one of the above

1 Each preprocessing token that is converted to a token (2.5) shall have the lexical form of a keyword, an identifier, a constant, a string literal, an operator, a digraph, or a punctuator.

2 A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6. The categories of preprocessing token are: header names, identifiers, preprocessing numbers, character constants, string literals, operators, punctuators, digraphs, and single non-white-space characters that do not lexically match the other preprocessing token categories. If a ' or a " character matches the last category, the behavior is undefined.
Preprocessing tokens can be separated by white space; this consists of comments (2.6), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both. As described in Clause 16, in certain circumstances during translation phase 4, white space (or the absence thereof) serves as more than preprocessing token separation. White space may appear within a preprocessing token only as part of a header name or between the quotation characters in a character constant or string literal.

3 If the input stream has been parsed into preprocessing tokens up to a given character, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token.

4 The program fragment 1Ex is parsed as a preprocessing number token (one that is not a valid floating or integer constant token), even though a parse as the pair of preprocessing tokens 1 and Ex might produce a valid expression (for example, if Ex were a macro defined as +1). Similarly, the program fragment 1E1 is parsed as a preprocessing number (one that is a valid floating constant token), whether or not E is a macro name.

5 The program fragment x+++++y is parsed as x ++ ++ + y, which violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a correct expression.

2.4 Digraph sequences [lex.digraph]

1 Alternate representations are provided for the operators and punctuators whose primary representations use the national characters. These include digraphs and additional reserved words.

        digraph:
                <%
                %>
                <:
                :>
                %:

2 In translation phase 3 (2.1) the digraphs are recognized as preprocessing tokens. Then in translation phase 7 the digraphs and the additional identifiers listed below are converted into tokens identical to those from the corresponding primary representations, as shown in Table 2.

Table 2-identifiers that are treated as operators
_________________________________________________________________
| alternate  primary | alternate  primary | alternate  primary |
|____________________|____________________|____________________|
| <%         {       | and        &&      | and_eq     &=      |
|____________________|____________________|____________________|
| %>         }       | bitor      |       | or_eq      |=      |
|____________________|____________________|____________________|
| <:         [       | or         ||      | xor_eq     ^=      |
|____________________|____________________|____________________|
| :>         ]       | xor        ^       | not        !       |
|____________________|____________________|____________________|
| %:         #       | compl      ~       | not_eq     !=      |
|____________________|____________________|____________________|
| bitand     &       |                    |                    |
|____________________|____________________|____________________|

2.5 Tokens [lex.token]

        token:
                identifier
                keyword
                literal
                operator
                punctuator

1 There are five kinds of tokens: identifiers, keywords, literals (which include strings and character and numeric constants), operators, and other separators. Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments (collectively, white space), as described below, are ignored except as they serve to separate tokens. Some white space is required to separate otherwise adjacent identifiers, keywords, and literals.

2 If the input stream has been parsed into tokens up to a given character, the next token is taken to be the longest string of characters that could possibly constitute a token.

2.6 Comments [lex.comment]

1 The characters /* start a comment, which terminates with the characters */. These comments do not nest. The characters // start a comment, which terminates with the next new-line character. If there is a form-feed or a vertical-tab character in such a comment, only white-space characters may appear between it and the new-line that terminates the comment; no diagnostic is required.
The comment characters //, /*, and */ have no special meaning within a // comment and are treated just like other characters. Similarly, the comment characters // and /* have no special meaning within a /* comment.

2.7 Identifiers [lex.name]

        identifier:
                nondigit
                identifier nondigit
                identifier digit

        nondigit: one of
                _ a b c d e f g h i j k l m
                  n o p q r s t u v w x y z
                  A B C D E F G H I J K L M
                  N O P Q R S T U V W X Y Z

        digit: one of
                0 1 2 3 4 5 6 7 8 9

1 An identifier is an arbitrarily long sequence of letters and digits. The first character must be a letter; the underscore _ counts as a letter. Upper- and lower-case letters are different. All characters are significant.

2.8 Keywords [lex.key]

1 The identifiers shown in Table 3 are reserved for use as keywords, and may not be used otherwise in phases 7 and 8:

Table 3-keywords
________________________________________________________________________
| asm          delete         if          reinterpret_cast   true      |
| auto         do             inline      return             try       |
| bool         double         int         short              typedef   |
| break        dynamic_cast   long        signed             typeid    |
| case         else           mutable     sizeof             union     |
| catch        enum           namespace   static             unsigned  |
| char         extern         new         static_cast        using     |
| class        false          operator    struct             virtual   |
| const        float          private     switch             void      |
| const_cast   for            protected   template           volatile  |
| continue     friend         public      this               wchar_t   |
| default      goto           register    throw              while     |
|______________________________________________________________________|

2 Furthermore, the alternate representations shown in Table 4 for certain operators and punctuators (2.4) are reserved and may not be used otherwise:

Table 4-alternate representations
________________________________________________
| bitand   and     bitor    or    xor      compl |
| and_eq   or_eq   xor_eq   not   not_eq         |
|________________________________________________|

3 In addition, identifiers containing a double underscore (__) are reserved for use by
C++ implementations and standard libraries and should be avoided by users; no diagnostic is required.

4 The ASCII representation of C++ programs uses as operators or for punctuation the characters shown in Table 5.

Table 5-operators and punctuation characters
_______________________________________________________
| !   %   ^   &   *   (   )   -   +   _   {   }   |   ~ |
| [   ]   \   ;   '   :   "   <   >   ?   ,   .   /     |
|_______________________________________________________|

Table 6 shows the character combinations that are used as operators.

Table 6-character combinations used as operators
______________________________________________________________
| ->   ++   --   .*   ->*   <<    >>    <=   >=   ==   !=   && |
| ||   *=   /=   %=   +=    -=    <<=   >>=  &=   ^=   |=   :: |
|______________________________________________________________|

Each is converted to a single token in translation phase 7 (2.1).

5 Table 7 shows character combinations that are used as alternative representations for certain operators and punctuators (2.4).

Table 7-digraphs
________________________
| <%   %>   <:   :>   %: |
|________________________|

Each of these is also recognized as a single token in translation phases 3 and 7.

6 Table 8 shows additional tokens that are used by the preprocessor.

Table 8-preprocessing tokens
__________________________
| #    ##    %:    %:%:    |
|__________________________|

7 Certain implementation-dependent properties, such as the type of a sizeof (5.3.3) and the ranges of fundamental types (3.7.1), are defined in the standard header files (16.2) These headers are part of the ISO C standard. In addition the headers define the types of the most basic library functions. The last two headers are part of the ISO C standard; is C++ specific.

2.9 Literals [lex.literal]

1 There are several kinds of literals (often referred to as constants).

        literal:
                integer-literal
                character-literal
                floating-literal
                string-literal
                boolean-literal

2.9.1 Integer literals [lex.icon]

        integer-literal:
                decimal-literal integer-suffixopt
                octal-literal integer-suffixopt
                hexadecimal-literal integer-suffixopt

        decimal-literal:
                nonzero-digit
                decimal-literal digit

        octal-literal:
                0
                octal-literal octal-digit

        hexadecimal-literal:
                0x hexadecimal-digit
                0X hexadecimal-digit
                hexadecimal-literal hexadecimal-digit

        nonzero-digit: one of
                1 2 3 4 5 6 7 8 9

        octal-digit: one of
                0 1 2 3 4 5 6 7

        hexadecimal-digit: one of
                0 1 2 3 4 5 6 7 8 9
                a b c d e f
                A B C D E F

        integer-suffix:
                unsigned-suffix long-suffixopt
                long-suffix unsigned-suffixopt

        unsigned-suffix: one of
                u U

        long-suffix: one of
                l L

1 An integer literal consisting of a sequence of digits is taken to be decimal (base ten) unless it begins with 0 (digit zero). A sequence of digits starting with 0 is taken to be an octal integer (base eight). The digits 8 and 9 are not octal digits. A sequence of digits preceded by 0x or 0X is taken to be a hexadecimal integer (base sixteen). The hexadecimal digits include a or A through f or F with decimal values ten through fifteen. For example, the number twelve can be written 12, 014, or 0XC.

2 The type of an integer literal depends on its form, value, and suffix. If it is decimal and has no suffix, it has the first of these types in which its value can be represented: int, long int, unsigned long int. If it is octal or hexadecimal and has no suffix, it has the first of these types in which its value can be represented: int, unsigned int, long int, unsigned long int. If it is suffixed by u or U, its type is the first of these types in which its value can be represented: unsigned int, unsigned long int.
If it is suffixed by l or L, its type is the first of these types in which its value can be represented: long int, unsigned long int. If it is suffixed by ul, lu, uL, Lu, Ul, lU, UL, or LU, its type is unsigned long int.

3 A program is ill-formed if it contains an integer literal that cannot be represented by any of the allowed types.

2.9.2 Character literals [lex.ccon]

        character-literal:
                'c-char-sequence'
                L'c-char-sequence'

        c-char-sequence:
                c-char
                c-char-sequence c-char

        c-char:
                any member of the source character set except the single-quote ', backslash \, or new-line character
                escape-sequence

        escape-sequence:
                simple-escape-sequence
                octal-escape-sequence
                hexadecimal-escape-sequence

        simple-escape-sequence: one of
                \' \" \? \\
                \a \b \f \n \r \t \v

        octal-escape-sequence:
                \ octal-digit
                \ octal-digit octal-digit
                \ octal-digit octal-digit octal-digit

        hexadecimal-escape-sequence:
                \x hexadecimal-digit
                hexadecimal-escape-sequence hexadecimal-digit

1 A character literal is one or more characters enclosed in single quotes, as in 'x', optionally preceded by the letter L, as in L'x'. Single character literals that do not begin with L have type char, with value equal to the numerical value of the character in the machine's character set. Multicharacter literals that do not begin with L have type int and implementation-defined value.

2 A character literal that begins with the letter L, such as L'ab', is a wide-character literal. Wide-character literals have type wchar_t. They are intended for character sets where a character does not fit into a single byte.

3 Certain nongraphic characters, the single quote ', the double quote ", ?, and the backslash \, may be represented according to Table 9.

Table 9-escape sequences
___________________________________
| new-line          NL (LF)   \n    |
| horizontal tab    HT        \t    |
| vertical tab      VT        \v    |
| backspace         BS        \b    |
| carriage return   CR        \r    |
| form feed         FF        \f    |
| alert             BEL       \a    |
| backslash         \         \\    |
| question mark     ?         \?    |
| single quote      '         \'    |
| double quote      "         \"    |
| octal number      ooo       \ooo  |
| hex number        hhh       \xhhh |
|___________________________________|

If the character following a backslash is not one of those specified, the behavior is undefined. An escape sequence specifies a single character.

4 The escape \ooo consists of the backslash followed by one, two, or three octal digits that are taken to specify the value of the desired character. The escape \xhhh consists of the backslash followed by x followed by a sequence of hexadecimal digits that are taken to specify the value of the desired character. There is no limit to the number of hexadecimal digits in the sequence. A sequence of octal or hexadecimal digits is terminated by the first character that is not an octal digit or a hexadecimal digit, respectively. The value of a character literal is implementation dependent if it exceeds that of the largest char.

2.9.3 Floating literals [lex.fcon]

        floating-constant:
                fractional-constant exponent-partopt floating-suffixopt
                digit-sequence exponent-part floating-suffixopt

        fractional-constant:
                digit-sequenceopt . digit-sequence
                digit-sequence .

        exponent-part:
                e signopt digit-sequence
                E signopt digit-sequence

        sign: one of
                + -

        digit-sequence:
                digit
                digit-sequence digit

        floating-suffix: one of
                f l F L

1 A floating literal consists of an integer part, a decimal point, a fraction part, an e or E, an optionally signed integer exponent, and an optional type suffix. The integer and fraction parts both consist of a sequence of decimal (base ten) digits.
Either the integer part or the fraction part (not both) may be missing; either the decimal point or the letter e (or E) and the exponent (not both) may be missing. The type of a floating literal is double unless explicitly specified by a suffix. The suffixes f and F specify float, the suffixes l and L specify long double.

2.9.4 String literals [lex.string]

        string-literal:
                "s-char-sequenceopt"
                L"s-char-sequenceopt"

        s-char-sequence:
                s-char
                s-char-sequence s-char

        s-char:
                any member of the source character set except the double-quote ", backslash \, or new-line character
                escape-sequence

1 A string literal is a sequence of characters (as defined in 2.9.2) surrounded by double quotes, optionally beginning with the letter L, as in "..." or L"...". A string literal that does not begin with L has type array of char and static storage duration (3.6), and is initialized with the given characters. Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation dependent. The effect of attempting to modify a string literal is undefined.

2 A string literal that begins with L, such as L"asdf", is a wide-character string. A wide-character string is of type array of wchar_t. Concatenation of ordinary and wide-character string literals is undefined.

+------- BEGIN BOX 3 -------+
Should this render the program ill-formed? Or is it deliberately undefined to encourage extensions?
+------- END BOX 3 -------+

3 Adjacent string literals are concatenated. Characters in concatenated strings are kept distinct. For example, "\xA" "B" contains the two characters '\xA' and 'B' after concatenation (and not the single hexadecimal character '\xAB').

4 After any necessary concatenation '\0' is appended so that programs that scan a string can find its end. The size of a string is the number of its characters including this terminator.
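
The concatenation and size rules can be checked with a small program fragment. This is an illustrative sketch only, not part of the draft text; the names joined, wide, joined_size, wide_size and check_string_literals are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Two adjacent literals. The escape \xA is converted to a single
// character before concatenation, so it cannot absorb the 'B'.
const char joined[] = "\xA" "B";

// A wide-character string: an array of wchar_t, also terminated.
const wchar_t wide[] = L"asdf";

// Size in elements, counting the appended terminator.
std::size_t joined_size() { return sizeof joined / sizeof joined[0]; }
std::size_t wide_size()   { return sizeof wide / sizeof wide[0]; }

// Hypothetical self-check of the rules stated in paragraphs 2 to 4.
void check_string_literals() {
    assert(joined_size() == 3);   // '\xA', 'B', '\0'
    assert(joined[0] == '\xA' && joined[1] == 'B' && joined[2] == '\0');
    assert(wide_size() == 5);     // four characters plus the terminator
}
```

Using arrays initialized from the literals, rather than pointers to them, lets sizeof report the size of the string, including the terminator.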
Within a string, the double quote character " must be preceded by a \.

2.9.5 Boolean literals [lex.bool]

          boolean-literal:
                  false
                  true

1 The Boolean literals are the keywords false and true. Such literals have type bool and the given values. They are not lvalues.

______________________________________________________________________

3 Basic concepts [basic]

______________________________________________________________________

1 This clause presents the basic concepts of the C++ language. It explains the difference between an object and a name and how they relate to the notion of an lvalue. It introduces the concepts of a declaration and a definition and presents C++'s notion of type, scope, linkage, and storage duration. The mechanisms for starting and terminating a program are discussed. Finally, this clause presents the fundamental types of the language and lists the ways of constructing compound types from these.

2 This clause does not cover concepts that affect only a single part of the language. Such concepts are discussed in the relevant clauses.

3 An entity is a value, object, subobject, base class subobject, array element, variable, function, set of functions, instance of a function, enumerator, type, class member, template, or namespace.

4 A name is a use of an identifier (2.7) that denotes an entity or label (6.6.4, 6.1).

5 Every name that denotes an entity is introduced by a declaration. Every name that denotes a label is introduced either by a goto statement (6.6.4) or a labeled-statement (6.1). Every name is introduced in some contiguous portion of program text called a declarative region (3.3), which is the largest part of the program in which that name can possibly be valid. In general, each particular name is valid only within some possibly discontiguous portion of program text called its scope (3.3).
To determine the scope of a declaration, it is sometimes convenient to refer to the potential scope of a declaration. The scope of a declaration is the same as its potential scope unless the potential scope contains another declaration of the same name. In that case, the potential scope of the declaration in the inner (contained) declarative region is excluded from the scope of the declaration in the outer (containing) declarative region.

6 For example, in

          int j = 24;
          main()
          {
                  int i = j, j;
                  j = 42;
          }

the identifier j is declared twice as a name (and used twice). The declarative region of the first j includes the entire example. The potential scope of the first j begins immediately after that j and extends to the end of the program, but its (actual) scope excludes the text between the , and the }. The declarative region of the second declaration of j (the j immediately before the semicolon) includes all the text between { and }, but its potential scope excludes the declaration of i. The scope of the second declaration of j is the same as its potential scope.

7 Some names denote types, classes, or templates. In general, it is necessary to determine whether or not a name denotes one of these entities before parsing the program that contains it. The process that determines this is called name lookup.

8 An identifier used in more than one translation unit may potentially refer to the same entity in these translation units depending on the linkage (3.4) specified in the translation units.

9 An object is a region of storage (3.8). In addition to giving it a name, declaring an object gives the object a storage duration (3.6), which determines the object's lifetime. Some objects are polymorphic; the implementation generates information carried in each such object that makes it possible to determine that object's type during program execution.
For other objects, the meaning of the values found therein is determined by the type of the expressions used to access them.

+------- BEGIN BOX 4 -------+
Most of this section needs more work.
+------- END BOX 4 -------+

3.1 Declarations and definitions [basic.def]

1 A declaration (7) introduces one or more names into a program and gives each name a meaning.

2 A declaration is a definition unless it declares a function without specifying the function's body (8.4), it contains the extern specifier (7.1.1) and neither an initializer nor a function-body, it declares a static data member in a class declaration (9.5), it is a class name declaration (9.1), or it is a typedef declaration (7.1.3), a using declaration (7.3.3), or a using directive (7.3.4).

3 The following, for example, are definitions:

          int a;                          // defines a
          extern const int c = 1;         // defines c
          int f(int x) { return x+a; }    // defines f
          struct S { int a; int b; };     // defines S
          struct X {                      // defines X
                  int x;                  // defines nonstatic data member x
                  static int y;           // declares static data member y
                  X(): x(0) { }           // defines a constructor of X
          };
          int X::y = 1;                   // defines X::y
          enum { up, down };              // defines up and down
          namespace N { int d; }          // defines N and N::d
          namespace N1 = N;               // defines N1
          X anX;                          // defines anX

whereas these are just declarations:

          extern int a;                   // declares a
          extern const int c;             // declares c
          int f(int);                     // declares f
          struct S;                       // declares S
          typedef int Int;                // declares Int
          extern X anotherX;              // declares anotherX
          using N::d;                     // declares N::d

4 In some circumstances, C++ implementations generate definitions automatically. These definitions include default constructors, copy constructors, assignment operators, and destructors.
For example, given

          struct C {
                  string s;    // string is the standard library class (17.5.1.1)
          };

          main()
          {
                  C a;
                  C b=a;
                  b=a;
          }

the implementation will generate functions to make the definition of C equivalent to

          struct C {
                  string s;
                  C(): s() { }
                  C(const C& x): s(x.s) { }
                  C& operator=(const C& x) { s = x.s; return *this; }
                  ~C() { }
          };

3.2 One definition rule [basic.def.odr]

+------- BEGIN BOX 5 -------+
This is still very much under review by the Committee.
+------- END BOX 5 -------+

1 No translation unit shall contain more than one definition of any variable, function, class type, enumeration type or template.

2 A function is used if it is called, its address is taken, or it is a virtual member function that is not pure (10.4). Every program shall contain at least one definition of every function that is used in that program. That definition may appear explicitly in the program, it may be found in the standard or a user-defined library, or (when appropriate) the implementation may generate it. If a non-virtual function is not defined, a diagnostic is required only if an attempt is actually made to call that function.

+------- BEGIN BOX 6 -------+
This says nothing about user-defined libraries. Probably it shouldn't, but perhaps it should be more explicit that it isn't discussing it.
+------- END BOX 6 -------+

3 Exactly one definition in a program is required for a non-local variable with static storage duration, unless it has a builtin type or is an aggregate and also is unused or used only as the operand of the sizeof operator.

+------- BEGIN BOX 7 -------+
This is still uncertain.
+------- END BOX 7 -------+

4 At least one definition of a class is required in a translation unit if the class is used other than in the formation of a pointer type.
+------- BEGIN BOX 8 -------+
This is not quite right, because it is possible to declare a function that has an undefined class type as its return type, or that has arguments of undefined class type.
+------- END BOX 8 -------+

+------- BEGIN BOX 9 -------+
There may be other situations that do not require a class to be defined: extern declarations (i.e. "extern X x;"), declaration of static members, others???
+------- END BOX 9 -------+

For example, the following complete translation unit is well-formed, even though it never defines X:

          struct X;        // declares X as a struct type
          struct X* x1;    // use X in pointer formation
          X* x2;           // use X in pointer formation

5 There may be more than one definition of a named enumeration type in a program provided that each definition appears in a different translation unit and the values of the enumerators are the same.

+------- BEGIN BOX 10 -------+
This will need to be revisited when the ODR is made more precise.
+------- END BOX 10 -------+

6 There may be more than one definition of a class type in a program provided that each definition appears in a different translation unit and the definitions describe the same type.

7 No diagnostic is required for a violation of the ODR.

+------- BEGIN BOX 11 -------+
This will need to be revisited when the ODR is made more precise.
+------- END BOX 11 -------+

3.3 Declarative regions and scopes [basic.scope]

1 The scope rules are summarized in 10.5.

3.3.1 Local scope [basic.scope.local]

1 A name declared in a block (6.3) is local to that block. Its scope begins at its point of declaration (3.3.10) and ends at the end of its declarative region.

2 Names of parameters of a function are local to the function and shall not be redeclared in the outermost block of that function.

3 The name in a catch exception-declaration is local to the handler and shall not be redeclared in the outermost block of the handler.
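The local-scope rules of 3.3.1 can be illustrated with a minimal sketch. It is written in present-day C++ rather than the dialect of this draft, and the helper names shadow_demo and handler_demo are hypothetical, introduced only for illustration.

```cpp
#include <stdexcept>

// Hypothetical helper: a name declared in a nested block may hide a
// parameter, but only inside that nested block.
int shadow_demo(int value)
{
    // int value = 0;        // ill-formed: redeclares a parameter in the
    //                       // outermost block of the function
    int result = value;      // refers to the parameter
    {
        int value = 100;     // OK: nested block, hides the parameter
        result += value;
    }                        // scope of the inner 'value' ends here
    return result + value;   // the parameter is visible again
}

// Hypothetical helper: the name in the exception-declaration is local
// to the handler.
int handler_demo()
{
    int flag = 0;
    try {
        throw std::runtime_error("demo");
    } catch (const std::runtime_error& e) {
        // int e = 0;        // ill-formed: redeclares the handler's name
        flag = (e.what()[0] == 'd') ? 1 : 0;
    }
    return flag;
}
```

The commented-out declarations mark the two redeclarations that paragraphs 2 and 3 prohibit; uncommenting either one should make a conforming compiler reject the program.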
4 Names in a declaration in the condition part of an if, while, for, do, or switch statement are local to the controlled statement and shall not be redeclared in the outermost block of that statement.

3.3.2 Function prototype scope [basic.scope.proto]

1 In a function declaration, names of parameters (if supplied) have function prototype scope, which terminates at the end of the function declarator.

3.3.3 Function scope

1 Labels (6.1) can be used anywhere in the function in which they are declared. Only labels have function scope.

3.3.4 File scope [basic.file.scope]

1 A name declared outside all named namespaces (_namespace_), blocks (6.3) and classes (9) has file scope. The potential scope of such a name begins at its point of declaration (3.3.10) and ends at the end of the translation unit that is its declarative region. Names declared with file scope are said to be global.

2 File scope can be treated as a special case of namespace scope (3.3.5) by viewing an entire translation unit as an unnamed namespace called the global namespace.

3.3.5 Namespace scope [basic.scope.namespace]

1 A name declared in a namespace (_namespace_) has namespace scope. Its potential scope includes its namespace from the name's point of declaration (3.3.10) onwards, as well as the potential scope of any using directive (7.3.4) that nominates its namespace. A namespace member can also be used after the :: scope resolution operator (5.1) applied to the name of its namespace.

2 A function may be defined only in namespace or class scope.

3.3.6 Class scope [basic.scope.class]

1 The name of a class member is local to its class and can be used only in a member of that class (9.4) or a class derived from that class, after the .
operator applied to an expression of the type of its class (5.2.4) or a class derived from (10) its class, after the -> operator applied to a pointer to an object of its class (5.2.4) or a class derived from (10) its class, after the :: scope resolution operator (5.1) applied to the name of its class or a class derived from its class, or after a using directive (7.3.4).

+------- BEGIN BOX 12 -------+
What does: "can be used only in a member of that class" mean? It should be phrased to include: body of member functions, ctor-init-list, static member initializers.
+------- END BOX 12 -------+

2 The scope of names introduced by friend declarations is described in 7.3.1.

3 A function may be defined only in namespace or class scope.

4 The scope rules for classes are summarized in 9.3.

3.3.7 Name hiding [basic.scope.hiding]

1 A name may be hidden by an explicit declaration of that same name in a nested declarative region or derived class.

2 A class name (9.1) may be hidden by the name of an object, function, or enumerator declared in the same scope. If a class and an object, function, or enumerator are declared in the same scope (in any order) with the same name the class name is hidden.

3 If a name is in scope and is not hidden it is said to be visible.

4 The region in which a name is visible is called the reach of the name.

+------- BEGIN BOX 13 -------+
The term 'reach' is defined here but never used. More work is needed with the "descriptive terminology".
+------- END BOX 13 -------+

3.3.8 Explicit qualification [basic.scope.exqual]

+------- BEGIN BOX 14 -------+
The information in this section is very similar to the one provided in 7.3.5. The information in these two sections (3.3.8 and 7.3.5) should be consolidated in one place.
+------- END BOX 14 -------+

1 A hidden name can still be used when it is qualified by its class or namespace name using the :: operator (5.1, 9.5, 10).
A hidden file scope name can still be used when it is qualified by the unary :: operator (5.1).

3.3.9 Elaborated type specifier [basic.scope.elab]

1 A class name or enumeration name can be hidden by the name of an object, function, or enumerator in local, class or namespace scope. A hidden class name can still be used when appropriately prefixed with class, struct, or union (7.1.5), or when followed by the :: operator. A hidden enumeration name can still be used when appropriately prefixed with enum (7.1.5). For example:

          class A {
          public:
                  static int n;
          };

          main()
          {
                  int A;

                  A::n = 42;          // OK
                  class A a;          // OK
                  A b;                // ill-formed: A does not name a type
          }

The scope of class names first introduced in elaborated-type-specifiers is described in (7.1.5.3).

3.3.10 Point of declaration [basic.scope.pdecl]

1 The point of declaration for a name is immediately after its complete declarator (8) and before its initializer (if any), except as noted below. For example,

          int x = 12;
          { int x = x; }

2 Here the second x is initialized with its own (unspecified) value.

3 For the point of declaration for an enumerator, see 7.2.

4 The point of declaration of a function with the extern or friend specifier is in the innermost enclosing namespace just after the outermost nested scope containing it which is contained in the namespace.

+------- BEGIN BOX 15 -------+
The terms "just after the outermost nested scope" imply name injection. We avoided introducing the concept of name injection in the working paper up until now. We should probably continue to do without.
+------- END BOX 15 -------+

5 The point of declaration of a class first declared in an elaborated-type-specifier is immediately after the identifier.

6 A nonlocal name remains visible up to the point of declaration of the local name that hides it. For example,

          const int i = 2;
          { int i[i]; }

declares a local array of two integers.
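The array example in paragraph 6 can be checked mechanically. The following is a minimal sketch in present-day C++; the namespace pdecl_demo and the helper local_array_length are hypothetical names used only for illustration.

```cpp
#include <cstddef>

namespace pdecl_demo {

const int i = 2;                 // nonlocal name

// Hypothetical helper: returns the number of elements of the local array.
std::size_t local_array_length()
{
    // The local 'i' is declared only after its complete declarator 'i[i]',
    // so the array bound still names the nonlocal constant.
    int i[i] = {};
    return sizeof(i) / sizeof(i[0]);
}

} // namespace pdecl_demo
```

The example from paragraph 1 (int x = x;) is deliberately not exercised here, because it reads the new object's own uninitialized value.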
7 The point of instantiation of a template is described in 14.3.

3.4 Program and linkage [basic.link]

1 A program consists of one or more translation units (2) linked together. The process of linking together translation units. A translation unit consists of a sequence of declarations.

          translation unit:
                  declaration-seqopt

2 A name is said to have linkage when it may denote the same object, function, type, template, or value as a name introduced by a declaration in another scope:

- When a name has external linkage, the entity it denotes may be referred to by names from scopes of other translation units or from other scopes of the same translation unit.

- When a name has internal linkage, the entity it denotes may be referred to by names from other scopes of the same translation unit.

- When a name has no linkage, the entity it denotes cannot be referred to by names from other scopes.

3 A name is said to be ``of namespace scope'' if its immediate scope is the file scope or the scope of a named or unnamed namespace.

+------- BEGIN BOX 16 -------+
The definition of ``of namespace scope'' should probably appear elsewhere.
+------- END BOX 16 -------+

A name of namespace scope has internal linkage if it is the name of

- a variable that is explicitly declared static or is explicitly declared const and not explicitly declared extern; or

- a function that is explicitly declared static or is explicitly declared inline and not explicitly declared extern.

In addition, the name of a data member of an anonymous union declared at namespace scope has internal linkage.
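A minimal sketch of the internal-linkage cases listed above follows. It uses only static and const, because the treatment of inline functions stated in this draft differs from that of later standards; all identifiers (call_count, step, next_value, shared_limit) are hypothetical.

```cpp
// Names of namespace scope with internal linkage: every translation unit
// that contains these declarations gets its own private entity.
static int call_count = 0;        // explicitly static: internal linkage
const int step = 5;               // const and not extern: internal linkage

static int next_value()           // explicitly static function
{
    call_count += 1;
    return call_count * step;
}

// By contrast, this const object is explicitly declared extern, so its
// name has external linkage and can be named from other translation units.
extern const int shared_limit = 100;
```

Because call_count and step are private to the translation unit, another source file could define unrelated entities with the same names without conflict, whereas a second definition of shared_limit elsewhere in the program would violate the one definition rule.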
4 A name of namespace scope has external linkage if it is the name of

- a variable, unless it has internal linkage; or

- a function, unless it has internal linkage; or

- a class that has any static data members (9.5), any member functions that are not defined within the class definition and are not explicitly declared inline (9.4.2), or any member types with external linkage; or

- a template (14).

Moreover, the name of a class (9) or enumeration (7.2) has external linkage if it is used to declare a function, variable, or type with external linkage, to declare a template, or to specify a template argument. Using a class object in a throw-expression does not affect the linkage of the class.

+------- BEGIN BOX 17 -------+
This was voted in San Diego but was probably a mistake. There can, after all, be no issue of C compatibility where exceptions are involved. Moreover, this treatment creates a bad pitfall:

          // file a.h
          struct A { };

          // file main.c
          #include "a.h"
          extern void f();
          main()
          {
                  try {
                          f();
                  } catch (A) {
                  }
          }

          // file f.c
          void f() { throw A(); }

5 It is reasonable to expect that the throw and the catch refer to the same type, but according to the San Diego resolutions they don't.
+------- END BOX 17 -------+

The names of class members and enumerators have external linkage if the class or enumeration to which they belong has external linkage.

6 The name of a function declared in a block scope or a variable declared extern in a block scope has linkage, either internal or external to match the linkage of prior declarations of the name in the same translation unit, but if there is no prior declaration it has external linkage.

7 Names not covered by these rules have no linkage.
Moreover, except as noted, a name declared in a local scope (3.3.1) has no linkage and shall not be used in a way that also requires it to have external linkage. For example:

          void f()
          {
                  struct A { int x; };    // no linkage
                  extern A a;             // ill-formed
          }

Here, there are conflicting constraints on A: its use as the type of an object with external linkage requires it to have external linkage, but because it is declared in a local scope, it has no linkage.

8 Two names are the same if

- they are identifiers composed of the same character sequence; or

- they are the names of overloaded operator functions formed with the same operator; or

- they are the names of user-defined conversion functions formed with the same type.

+------- BEGIN BOX 18 -------+
A definition of name-sameness should probably appear elsewhere, since it is also assumed in [basic.scope.hiding].
+------- END BOX 18 -------+

Two names that are the same and that are declared in different scopes shall denote the same object, function, type, enumerator, or template if

- both names have external linkage or else both names have internal linkage and are declared in the same translation unit; and

- both names refer to members of the same namespace or to members, not by inheritance, of the same class; and

- when both names denote functions or function templates, the function types are identical for purposes of overloading.

9 Inline class member functions must have exactly one definition in a program.

+------- BEGIN BOX 19 -------+
To be reworked when the ODR is clarified.
+------- END BOX 19 -------+

10 After all adjustments of types (during which typedefs (7.1.3) are replaced by their definitions), the types specified by all declarations of a particular external name must be identical, except that such types may differ by the presence or absence of a major array bound (8.3.4).
A violation of this rule does not require a diagnostic.

+------- BEGIN BOX 20 -------+
This needs to be specified more precisely to deal with function name overloading.
+------- END BOX 20 -------+

11 Linkage to non-C++ declarations can be achieved using a linkage-specification (7.5).

3.5 Start and termination [basic.start]

3.5.1 Main function [basic.start.main]

1 A program shall contain a function called main, which is the designated start of the program.

2 This function is not predefined by the compiler, it cannot be overloaded, and its type is implementation dependent. The two examples below are allowed on any implementation. It is recommended that any further (optional) parameters be added after argv. The function main() may be defined as

          int main() { /* ... */ }

or

          int main(int argc, char* argv[]) { /* ... */ }

In the latter form argc shall be the number of arguments passed to the program from an environment in which the program is run. If argc is nonzero these arguments shall be supplied as zero-terminated strings in argv[0] through argv[argc-1] and argv[0] shall be the name used to invoke the program or "". It is guaranteed that argv[argc]==0.

3 The function main() shall not be called from within a program. The linkage (3.4) of main() is implementation dependent. The address of main() shall not be taken and main() shall not be declared inline or static.

4 Calling the function

          void exit(int);

declared in <stdlib.h> (17.2.4.5) terminates the program without leaving the current block and hence without destroying any local variables (12.4). The argument value is returned to the program's environment as the value of the program.

5 A return statement in main() has the effect of leaving the main function (destroying any local variables) and calling exit() with the return value as the argument.
If control reaches the end of main without encountering a return statement, the effect is that of executing

          return 0;

3.5.2 Initialization of non-local objects [basic.start.init]

+------- BEGIN BOX 21 -------+
This is still under active discussion by the committee.
+------- END BOX 21 -------+

1 The initialization of nonlocal static objects (3.6) in a translation unit is done before the first use of any function or object defined in that translation unit. Such initializations (8.5, 9.5, 12.1, 12.6.1) may be done before the first statement of main() or deferred to any point in time before the first use of a function or object defined in that translation unit. The default initialization of all static objects to zero (8.5) is performed before any dynamic (that is, run-time) initialization. No further order is imposed on the initialization of objects from different translation units. The initialization of local static objects is described in 6.7.

2 If construction or destruction of a non-local static object ends in throwing an uncaught exception, the result is to call terminate() (_exccept.terminate_).

3.5.3 Termination [basic.start.term]

1 Destructors (12.4) for initialized static objects are called when returning from main() and when calling exit() (17.2.4.5). Destruction is done in reverse order of initialization. The function atexit() from <stdlib.h> can be used to specify that a function must be called at exit. If atexit() is to be called, objects initialized before an atexit() call may not be destroyed until after the function specified in the atexit() call has been called.

2 Where a C++ implementation coexists with a C implementation, any actions specified by the C implementation to take place after the atexit() functions have been called take place after all destructors have been called.
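The ordering described in 3.5.3 can be observed with a small sketch. It assumes a present-day implementation and the <cstdlib> spelling of the header; Tracer, at_exit_hook and register_hook are hypothetical names that do not appear in this draft.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical tracer type: reports when a static object is destroyed.
struct Tracer {
    const char* name;
    explicit Tracer(const char* n) : name(n) {}
    ~Tracer() { std::printf("destroying %s\n", name); }
};

static Tracer first("first");     // initialized before 'second' ...
static Tracer second("second");   // ... and therefore destroyed after it

static void at_exit_hook() { std::printf("at_exit_hook called\n"); }

// Hypothetical helper: registers the hook; std::atexit returns 0 on success.
bool register_hook() { return std::atexit(at_exit_hook) == 0; }
```

If register_hook() is called from main(), both static objects were initialized before the atexit() call, so on normal termination the hook is expected to run first, followed by the destructor of second and then that of first.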
3 Calling the function

          void abort();

declared in <stdlib.h> terminates the program without executing destructors for static objects and without calling the functions passed to atexit().

3.6 Storage duration [basic.stc]

1 The storage duration of an object determines its lifetime.

2 The storage class specifiers static, auto, and mutable are related to storage duration as described below.

3.6.1 Static storage duration [basic.stc.static]

1 All non-local variables have static storage duration; such variables are created and destroyed as described in 3.5 and _stmt.decl_.

2 Note that if an object of static storage duration has a constructor or a destructor with side effects, it shall not be eliminated even if it appears to be unused.

+------- BEGIN BOX 22 -------+
This awaits committee action on the ``as-if'' rule.
+------- END BOX 22 -------+

3 The keyword static may be used to declare a local variable with static storage duration; for a description of initialization and destruction of local variables, see 6.7.

4 The keyword static applied to a class variable in a class definition also determines that it has static storage duration.

3.6.2 Automatic storage duration [basic.stc.auto]

1 Local objects explicitly declared auto or register, or not explicitly declared static, have automatic storage duration and are associated with an invocation of a block (7.1.1).

2 Each object with automatic storage duration is initialized (8.5) each time the control flow reaches its definition and destroyed (12.4) whenever control passes from within the scope of the object to outside that scope (6.6).

3 A named automatic object with a constructor or destructor with side effects may not be destroyed before the end of its block, nor may it be eliminated even if it appears to be unused.
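The rule in 3.6.2 that an automatic object is initialized each time control reaches its definition, and destroyed whenever control leaves its scope, can be sketched as follows. Probe and run_block are hypothetical names introduced for this illustration.

```cpp
// Hypothetical probe type that counts constructions and destructions.
struct Probe {
    static int constructed;
    static int destroyed;
    Probe()  { ++constructed; }
    ~Probe() { ++destroyed; }
};
int Probe::constructed = 0;
int Probe::destroyed = 0;

// Hypothetical helper: enters the same block 'times' times.
void run_block(int times)
{
    for (int k = 0; k < times; ++k) {
        Probe p;             // initialized each time control reaches here
        if (k == 0)
            continue;        // leaving the scope early still destroys 'p'
    }                        // 'p' is destroyed as control leaves the block
}
```

After run_block(3), both counters hold 3: one construction and one destruction per iteration, including the iteration that leaves the block through continue.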
3.6.3 Dynamic storage duration [basic.stc.dynamic]

1 Objects can be created dynamically during program execution (1.7), using new-expressions (5.3.4), and destroyed using delete-expressions (5.3.5). A C++ implementation provides access to, and management of, dynamic storage via the global allocation functions operator new (17.3.3.4) and operator new[] (17.3.3.5), and the global deallocation functions operator delete (17.3.3.2) and operator delete[] (17.3.3.3).

2 These functions are always implicitly declared. The library provides default definitions for them (17.3.3). A C++ program may provide at most one definition of any of the functions ::operator new(size_t), ::operator new[](size_t), ::operator delete(void*), and/or ::operator delete[](void*). Any such function definitions replace the default versions. This replacement is global and takes effect upon program startup (3.5). Allocation and/or deallocation functions may also be declared and defined for any class (12.5).

3 Any allocation and/or deallocation functions defined in a C++ program shall conform to the semantics specified in this subclause.

3.6.3.1 Allocation functions [basic.stc.dynamic.allocation]

1 Allocation functions can be static class member functions or global functions. They may be overloaded, but the return type shall always be void* and the first parameter type shall always be size_t (5.3.3), an implementation-defined integral type defined in the standard header <stddef.h> (17.3).

2 The function shall return the address of a block of available storage at least as large as the requested size. The order, contiguity, and initial value of storage allocated by successive calls to an allocation function are unspecified.
The pointer returned is suitably aligned so that it may be assigned to a pointer of any type and then used to access such an object or an array of such objects in the storage allocated (until the storage is explicitly deallocated by a call to a corresponding deallocation function). Each such allocation shall yield a pointer to storage (1.5) disjoint from any other currently allocated storage. The pointer returned points to the start (lowest byte address) of the allocated storage. If the size of the space requested is zero, the value returned shall be nonzero and disjoint from any other currently allocated storage. The results of dereferencing a pointer returned in response to a request for zero size are undefined.4)

3 If an allocation function is unable to obtain an appropriate block of storage, it may invoke the currently installed new_handler5) and/or throw an exception (15) of class alloc (17.3.2.9) or a class derived from alloc.

4 If the allocation function returns the null pointer the result is implementation defined.

3.6.3.2 Deallocation functions [basic.stc.dynamic.deallocation]

1 Like allocation functions, deallocation functions may be static class member functions or global functions.

2 Each deallocation function shall return void and its first parameter shall be void*. For class member deallocation functions, a second parameter of type size_t may be added but deallocation functions may not be overloaded.

3 The value of the first parameter supplied to a deallocation function shall be zero, or refer to storage allocated by the corresponding allocation function. If the value of the first argument is null, the call to the deallocation function has no effect. If the value of the first argument refers to a pointer already deallocated, the effect is undefined.
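The signature requirements of 3.6.3.1 and 3.6.3.2 can be illustrated with a class that supplies its own allocation and deallocation functions. This is a minimal sketch in present-day C++, not the library interface of this draft; Pooled, live_blocks and zero_size_request_is_nonnull are hypothetical names.

```cpp
#include <cstddef>
#include <new>

// Hypothetical class with member allocation and deallocation functions.
struct Pooled {
    int payload;

    static int live_blocks;      // blocks currently allocated

    // Allocation function: returns void*, first parameter is size_t.
    static void* operator new(std::size_t size)
    {
        ++live_blocks;
        return ::operator new(size);   // defer to the global function
    }

    // Deallocation function: returns void, first parameter is void*;
    // the optional second parameter receives the size of the block.
    static void operator delete(void* p, std::size_t /*size*/)
    {
        if (p == 0) return;      // a null argument has no effect
        --live_blocks;
        ::operator delete(p);
    }
};
int Pooled::live_blocks = 0;

// Hypothetical helper: a zero-size request still yields a non-null pointer.
bool zero_size_request_is_nonnull()
{
    void* p = ::operator new(0);
    bool ok = (p != 0);
    ::operator delete(p);
    return ok;
}
```

A new-expression such as new Pooled() selects Pooled::operator new, and the matching delete-expression calls Pooled::operator delete, so live_blocks returns to zero once every object has been deleted.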
__________________________
4) The intent is to have operator new() implementable by calling malloc() or calloc(), so the rules are substantially the same. C++ differs from C in requiring a zero request to return a non-null pointer.
5) A program-supplied allocation function may obtain the address of the currently installed new_handler using the set_new_handler() function (17.3.3.1).

4 A deallocation function may free the storage referenced by the pointer given as its argument and renders the pointer invalid. The storage may be available for further allocation. An invalid pointer contains an unusable value: it cannot even be used in an expression.

5 If the argument is non-null, the value of a pointer that refers to deallocated space is indeterminate. The effect of dereferencing an indeterminate pointer value is undefined.6)

3.6.4 Duration of sub-objects [basic.stc.inherit]

1 The storage duration of class subobjects, base class subobjects and array elements is that of their complete object (1.5).

3.6.5 The mutable keyword [basic.stc.mutable]

1 The keyword mutable is grammatically a storage class specifier but is unrelated to the storage duration (lifetime) of the class member it describes. The mutable keyword is described in 3.8, 5.2.4, 7.1.1 and 7.1.5.1.

3.6.6 Reference duration [basic.stc.ref]

1 Except in the case of a reference declaration initialised by an rvalue (8.5.3), a reference may be used to name an existing object denoted by an lvalue.

2 The reference has static storage duration if it is declared non-locally, automatic storage duration if declared locally including as a function parameter, and inherited storage duration if declared in a class.

3 References may or may not require storage.
4 The duration of a reference is distinct from the duration of the object it refers to except in the case of a reference declaration initialized by an rvalue.

5 Access through a reference to an object which no longer exists or has not yet been constructed yields undefined behaviour.

+------- BEGIN BOX 23 -------+
Can references be declared auto or static? This section probably does not belong here.
+------- END BOX 23 -------+

__________________________
6) On some architectures, it causes a system-generated runtime fault.

3.7 Types [basic.types]

+------- BEGIN BOX 24 -------+
Section 9.2 describes the concept of layout-compatible types. Shouldn't this information be described here?
+------- END BOX 24 -------+

1 There are two kinds of types: fundamental types and compound types. Types may describe objects, references (8.3.2), or functions (8.3.5).

2 Arrays of unknown size and classes that have been declared but not defined are called incomplete types because the size and structure of an instance of the type is unknown. Also, the void type represents an empty set of values, so that no objects of type void ever exist; void is an incomplete type. The term incompletely-defined object type is a synonym for incomplete type; the term completely-defined object type is a synonym for complete type.

3 A class type (such as class X) may be incomplete at one point in a translation unit and complete later on; the type class X is the same type at both points. The declared type of an array may be incomplete at one point in a translation unit and complete later on; the array types at those two points (array of unknown bound of T and array of N T) are different types. However, the type of a pointer to array of unknown size cannot be completed.

4 Expressions that have incomplete type are prohibited in some contexts.
For example:

        class X;              // X is an incomplete type
        extern X* xp;         // xp is a pointer to an incomplete type
        extern int arr[];     // the type of arr is incomplete
        typedef int UNKA[];   // UNKA is an incomplete type
        UNKA* arrp;           // arrp is a pointer to an incomplete type
        UNKA** arrpp;

        void foo()
        {
            xp++;             // ill-formed: X is incomplete
            arrp++;           // ill-formed: incomplete type
            arrpp++;          // okay: sizeof UNKA* is known
        }

        struct X { int i; };  // now X is a complete type
        int arr[10];          // now the type of arr is complete

        X x;
        void bar()
        {
            xp = &x;          // okay; type is ``pointer to X''
            arrp = &arr;      // ill-formed: different types
            xp++;             // okay: X is complete
            arrp++;           // ill-formed: UNKA can't be completed
        }

3.7.1 Fundamental types [basic.fundamental]

1 There are several fundamental types. The standard header specifies the largest and smallest values of each for an implementation.

2 Objects declared as characters (char) are large enough to store any member of the implementation's basic character set. If a character from this set is stored in a character variable, its value is equivalent to the integer code of that character. Characters may be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types. A char, a signed char, and an unsigned char consume the same amount of space.

3 An enumeration comprises a set of named integer constant values. Each distinct enumeration constitutes a different enumerated type. Each constant has the type of its enumeration.

4 There are four signed integer types: signed char, short int, int, and long int. In this list, each type provides at least as much storage as those preceding it in the list, but the implementation may otherwise make any of them equal in storage size. Plain ints have the natural size suggested by the machine architecture; the other signed integer types are provided to meet special needs.
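Paragraphs 2 and 4 can be checked on a given implementation with a short sketch such as the following. The function names which_char, char_sizes_match, and signed_sizes_are_ordered are hypothetical helpers introduced only for this illustration. Overloading on all three character types compiles only because they are distinct types.

```cpp
#include <cassert>

// One overload per character type. The three declarations can coexist
// only because char, signed char, and unsigned char are distinct types.
int which_char(char)          { return 0; }
int which_char(signed char)   { return 1; }
int which_char(unsigned char) { return 2; }

// True if the three character types occupy the same amount of space.
bool char_sizes_match() {
    return sizeof(char) == sizeof(signed char)
        && sizeof(char) == sizeof(unsigned char);
}

// True if each signed integer type provides at least as much storage as
// the one before it in the list: signed char, short int, int, long int.
bool signed_sizes_are_ordered() {
    return sizeof(signed char) <= sizeof(short)
        && sizeof(short) <= sizeof(int)
        && sizeof(int) <= sizeof(long);
}
```

The size checks say nothing about the actual sizes, which remain up to the implementation; they only confirm the ordering.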
5 For each of the signed integer types, there exists a corresponding (but different) unsigned integer type: unsigned char, unsigned short int, unsigned int, and unsigned long int, each of which occupies the same amount of storage and has the same alignment requirements (1.5) as the corresponding signed integer type.7) An alignment requirement is an implementation-dependent restriction on the value of a pointer to an object of a given type (5.4, 1.5).

6 Unsigned integers, declared unsigned, obey the laws of arithmetic modulo $2^n$ where n is the number of bits in the representation of that particular size of integer. This implies that unsigned arithmetic does not overflow.

__________________________
7) See 7.1.5.2 regarding the correspondence between types and the sequences of type-specifiers that designate them.

7 Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales (17.5.9.1). Type wchar_t has the same size, signedness, and alignment requirements (1.5) as one of the other integral types, called its underlying type.

8 Values of type bool can be either true or false.8) There are no signed, unsigned, short, or long bool types or values. As described below, bool values behave as integral types. Thus, for example, they participate in integral promotions (4.1, 5.2.3). Although values of type bool generally behave as signed integers, for example by promoting (4.1) to int instead of unsigned int, a bool value can successfully be stored in a bit-field of any (nonzero) size.

9 There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double.
Each implementation defines the characteristics of the fundamental floating point types in the standard header .

10 Types bool, char, and the signed and unsigned integer types are collectively called integral types. A synonym for integral type is integer type. Enumerations (7.2) are not integral, but they can be promoted (4.1) to signed or unsigned int. Integral and floating types are collectively called arithmetic types.

11 The void type specifies an empty set of values. It is used as the return type for functions that do not return a value. No object of type void may be declared. Any expression may be explicitly converted to type void (5.4); the resulting expression may be used only as an expression statement (6.2), as the left operand of a comma expression (5.18), or as a second or third operand of ?: (5.16).

__________________________
8) Using a bool value in ways described by this International Standard as ``undefined,'' such as by examining the value of an uninitialized automatic variable, might cause it to behave as if it is neither true nor false.

3.7.2 Compound types [basic.compound]

1 There is a conceptually infinite number of compound types constructed from the fundamental types in the following ways:

- arrays of objects of a given type, 8.3.4;
- functions, which have parameters of given types and return objects of a given type, 8.3.5;
- pointers to objects or functions (including static members of classes) of a given type, 8.3.1;
- references to objects or functions of a given type, 8.3.2;
- constants, which are values of a given type, 7.1.5;
- classes containing a sequence of objects of various types (9), a set of functions for manipulating these objects (9.4), and a set of restrictions on the access to these objects and functions, 11;
- structures, which are classes without default access restrictions, 11;
- unions, which are classes capable of containing objects of different types at different times, 9.6;
- pointers to non-static9) class members, which identify members of a given type within objects of a given class, 8.3.3.

2 In general, these methods of constructing types can be applied recursively; restrictions are mentioned in 8.3.1, 8.3.4, 8.3.5, and 8.3.2.

3 Any type so far mentioned is an unqualified type. Each unqualified type has three corresponding qualified versions of its type:10) a const-qualified version, a volatile-qualified version, and a const-volatile-qualified version (see 7.1.5). The cv-qualified or unqualified versions of a type are distinct types that belong to the same category and have the same representation and alignment requirements.11) A compound type is not cv-qualified (3.7.3) by the cv-qualifiers (if any) of the type from which it is compounded. However, an array type is considered to be cv-qualified by the cv-qualifiers of its element type.

4 A pointer to objects of a type T is referred to as a pointer to T. For example, a pointer to an object of type int is referred to as pointer to int and a pointer to an object of class X is called a pointer to X. Pointers to incomplete types are allowed although there are restrictions on what can be done with them (3.7).

5 Objects of cv-qualified (3.7.3) or unqualified type void* (pointer to void) can be used to point to objects of unknown type. A void* must have enough bits to hold any object pointer.

6 Except for pointers to static members, text referring to pointers does not apply to pointers to members.
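The distinction drawn in paragraphs 5 and 6 can be sketched as follows. The class Counter and the three functions are hypothetical names used only for this illustration. The address of a static member is an ordinary pointer, the address of a non-static member is a pointer to member that must be combined with an object, and a void* can hold the address of any object.

```cpp
#include <cassert>

struct Counter {
    static int total;   // static member: an ordinary object
    int value;          // non-static member
};
int Counter::total = 0;

// The address of a static member is an ordinary pointer to int.
int write_through_static_member_pointer() {
    int* p = &Counter::total;
    *p = 7;
    return Counter::total;
}

// The address of a non-static member is a pointer to member; it
// designates a member and needs an object before it can be used.
int read_through_member_pointer() {
    int Counter::* pm = &Counter::value;
    Counter c;
    c.value = 42;
    return c.*pm;
}

// A void* can hold the address of an object of any type; converting it
// back to the original pointer type yields the original address.
bool round_trip_through_void_pointer() {
    double d = 1.5;
    void* pv = &d;
    return static_cast<double*>(pv) == &d;
}
```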
__________________________
9) Static class members are objects or functions, and pointers to them are ordinary pointers to objects or functions.
10) See 8.3.4 and 8.3.5 regarding cv-qualified array and function types.
11) The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.

3.7.3 CV-qualifiers [basic.type.qualifier]

+------- BEGIN BOX 25 -------+
This section covers the same information as section 7.1.5.1. This information should probably be consolidated in one place.
+------- END BOX 25 -------+

1 There are two cv-qualifiers, const and volatile. When applied to an object, const means the program may not change the object, and volatile has an implementation-defined meaning.12) An object may have both cv-qualifiers.

2 There is a (partial) ordering on cv-qualifiers, so that one object or pointer may be said to be more cv-qualified than another. Table 10 shows the relations that constitute this ordering.

        Table 10: relations on const and volatile

            no cv-qualifier  <  const
            no cv-qualifier  <  volatile
            no cv-qualifier  <  const volatile
            const            <  const volatile
            volatile         <  const volatile

3 A pointer or reference to cv-qualified type (sometimes called a cv-qualified pointer or reference) need not actually point to a cv-qualified object, but it is treated as if it does. For example, a pointer to const int may point to an unqualified int, but a well-formed program may not attempt to change the pointed-to object through that pointer even though it may change the same object through some other access path. CV-qualifiers are supported by the type system so that a cv-qualified object or cv-qualified access path to an object may not be subverted without casting (5.4).
For example:

        void f()
        {
            int i = 2;          // not cv-qualified
            const int ci = 3;   // cv-qualified (initialized as required)
            ci = 4;             // error: attempt to modify const
            const int* cip;     // pointer to const int
            cip = &i;           // okay: cv-qualified access path to unqualified
            *cip = 4;           // error: attempt to modify through ptr to const
            int* ip;
            ip = cip;           // error: attempt to convert const int* to int*
        }

__________________________
12) Roughly, volatile means the object may change of its own accord (that is, the processor may not assume that the object continues to hold a previously held value).

3.7.4 Type names [basic.type.name]

1 Fundamental and compound types can be given names by the typedef mechanism (7.1.3), and families of types and functions can be specified and named by the template mechanism (14).

3.8 Lvalues and rvalues [basic.lval]

1 Every expression is either an lvalue or rvalue.

2 An lvalue refers to an object or function. Some rvalue expressions, those of class or cv-qualified class type, also refer to objects.13)

3 Some builtin operators and function calls yield lvalues. For example, if E is an expression of pointer type, then *E is an lvalue expression referring to the object or function to which E points. As another example, the function int& f(); yields an lvalue, so the call f() is an lvalue expression.

4 Some builtin operators expect lvalue operands, for example the builtin assignment operators all expect their left hand operands to be lvalues. Other builtin operators yield rvalues, and some expect them. For example the unary and binary + operators expect rvalue arguments and yield rvalue results. The discussion of each builtin operator in 5 indicates whether it expects lvalue operands and whether it yields an lvalue.

5 Constructor invocations and calls to functions that do not return references are always rvalues.
User defined operators are functions, and whether such operators expect or yield lvalues is determined by their type.

__________________________
13) Expressions such as invocations of constructors and of functions that return a class type do in some sense refer to an object, and the implementation may invoke a member function upon such objects, but the expressions are not lvalues.

6 The discussion of reference initialization in 8.5.3 and of temporaries in 12.2 indicates the behavior of lvalues and rvalues in other significant contexts.

7 Rvalues may be qualified types; however, the unqualified type is used unless the rvalue is of class type and a member function is called on the rvalue.

8 Whenever an lvalue that refers to a non-array14) non-class object appears in a context where an lvalue is not expected, the value contained in the referenced object is used. When this occurs, the value has the unqualified type of the lvalue. For example:

        const int* cip;
        int i = *cip;   // "*cip" has type int

If this type is incomplete, the program is ill-formed. When an lvalue is used as the operand of sizeof the value contained in the referenced object is not accessed, since that operator does not evaluate its operand.

9 An lvalue or rvalue of class type can also be used to modify its referent under certain circumstances.

+------- BEGIN BOX 26 -------+
Provide example and cross-reference.
+------- END BOX 26 -------+

10 Functions cannot be modified, but pointers to functions may be modifiable.

11 A pointer to an incomplete type may be modifiable. At some point in the program when this pointer type is complete, the object at which the pointer points may also be modified.

12 Array objects cannot be modified, but their elements may be modifiable.
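Paragraphs 10 and 12 can be illustrated with a short sketch. The functions twice, thrice, retarget_and_call, and modify_element are hypothetical names chosen for this example. A pointer to function can be made to point at a different function, and an element of an array can be assigned to, while the commented-out lines show the corresponding operations on a function or a whole array that a compiler is expected to reject.

```cpp
#include <cassert>

int twice(int x)  { return 2 * x; }
int thrice(int x) { return 3 * x; }

// The pointer fp is modifiable, although the functions it points to are not.
int retarget_and_call() {
    int (*fp)(int) = twice;
    fp = thrice;            // okay: modifies the pointer, not a function
    // twice = thrice;      // ill-formed: a function cannot be modified
    return fp(5);
}

// An array object as a whole cannot be assigned to, but its elements can.
int modify_element() {
    int a[3] = { 1, 2, 3 };
    int b[3] = { 4, 5, 6 };
    a[1] = b[1] * 4;        // okay: a[1] is a modifiable lvalue
    // a = b;               // ill-formed: an array cannot be modified
    return a[0] + a[1] + a[2];
}
```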
13 The referent of a const-qualified expression shall not be modified (through that expression), except that if it is of class type and has a mutable component, that component may be modified.

14 If an expression can be used to modify its object, it is called modifiable. A program that attempts to modify an object through a nonmodifiable lvalue or rvalue expression is ill-formed.

__________________________
14) An lvalue that refers to an array object is usually converted to a (rvalue) pointer to the initial element of the array (4.6).

______________________________________________________________________

4 Standard conversions [conv]

______________________________________________________________________

1 Some operators may, depending on their operands, cause conversion of the value of an operand from one type to another. This section summarizes the conversions demanded by most ordinary operators and explains the result to be expected from such conversions; it will be supplemented as required by the discussion of each operator. These conversions are also used in initialization (8.5, 8.5.3, 12.8, 12.1). 12.3 and 13.2 describe user-defined conversions and their interaction with standard conversions. The result of a conversion is an lvalue only if the result is a reference (8.3.2).

4.1 Integral promotions [conv.prom]

1 A char, wchar_t, bool, short int, enumerator, object of enumeration type (7.2), or an int bit-field (9.7) (in both their signed and unsigned varieties) may be used wherever an integer rvalue may be used. In contexts where a constant integer is required, the bool, char, wchar_t, short int, object of enumeration type (7.2), or bit-field must be constant. (Enumerators are always constant.)

2 Except for enumerators, objects of enumeration type, and type wchar_t, if an int can represent all the values of the original type, the value is converted to int; otherwise it is converted to unsigned int.
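The rule in paragraph 2 can be observed with a small sketch that uses overload resolution to report the type of a promoted expression. The overload set kind and the helper functions are hypothetical names introduced for this illustration; the unary + operator is used here only as a convenient way to trigger the promotion. For unsigned short the result depends on whether int can represent all of its values, so the expected answer is computed from the implementation's limits rather than assumed.

```cpp
#include <cassert>
#include <climits>

// Overloads used to observe the type of a promoted expression.
int kind(int)           { return 0; }
int kind(unsigned int)  { return 1; }
int kind(long)          { return 2; }
int kind(unsigned long) { return 3; }

// Unary + applies the integral promotions to its operand.
int promoted_kind_of_short() { short s = 1;   return kind(+s); }
int promoted_kind_of_bool()  { bool b = true; return kind(+b); }

// unsigned short promotes to int only if int can represent all of its
// values; otherwise it promotes to unsigned int.
int promoted_kind_of_unsigned_short() {
    unsigned short us = 1;
    return kind(+us);
}
int expected_kind_of_unsigned_short() {
    return USHRT_MAX <= INT_MAX ? 0 : 1;
}
```

On implementations where short is narrower than int, both short and unsigned short report int.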
3 For enumerators, objects of enumeration type, and type wchar_t, if an int can represent all the values of the underlying type, the value is converted to an int; otherwise if an unsigned int can represent all the values, the value is converted to an unsigned int; otherwise, if a long can represent all the values, the value is converted to a long; otherwise it is converted to unsigned long.

4 A Boolean value may be converted to int, taking false to zero and true to one.

5 This process is called integral promotion.

4.2 Integral conversions [conv.integral]

1 An integer rvalue may be converted to any integral type. If the target type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo $2^n$ where n is the number of bits used to represent the unsigned type). In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern.

2 When an integer is converted to a signed type, the value is unchanged if it can be represented in the new type; otherwise the value is implementation dependent.

3 When an integer is converted to bool, see 4.9.

4.3 Float and double [conv.double]

1 Single-precision floating point arithmetic may be used for float expressions. When a less precise floating value is converted to an equally or more precise floating type, the value is unchanged. When a more precise floating value is converted to a less precise floating type and the value is within representable range, the result may be either the next higher or the next lower representable value. If the result is out of range, the behavior is undefined.

4.4 Floating and integral [conv.float]

1 Conversion of a floating value to an integral type truncates; that is, the fractional part is discarded. Such conversions are machine dependent; for example, the direction of truncation of negative numbers varies from machine to machine.
The result is undefined if the value cannot be represented in the integral type.

2 Conversions of integral values to floating type are as mathematically correct as the hardware allows. Loss of precision occurs if an integral value cannot be represented exactly as a value of the floating type.

4.5 Arithmetic conversions [conv.arith]

1 Many binary operators that expect operands of arithmetic type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions.

2
- If either operand is of type long double, the other is converted to long double.
- Otherwise, if either operand is double, the other is converted to double.
- Otherwise, if either operand is float, the other is converted to float.
- Otherwise, the integral promotions (4.1) are performed on both operands.
- Then, if either operand is unsigned long the other is converted to unsigned long.
- Otherwise, if one operand is a long int and the other unsigned int, then if a long int can represent all the values of an unsigned int, the unsigned int is converted to a long int; otherwise both operands are converted to unsigned long int.
- Otherwise, if either operand is long, the other is converted to long.
- Otherwise, if either operand is unsigned, the other is converted to unsigned.
- Otherwise, both operands are int.

4.6 Pointer conversions [conv.ptr]

1 The following conversions may be performed wherever pointers (8.3.1) are assigned, initialized, compared, or otherwise used:

- A constant expression (5.19) that evaluates to zero (the null pointer constant) when assigned to, compared with, alternated with (5.16), or used as an initializer of an operand of pointer type is converted to a pointer of that type. It is guaranteed that this value will produce a pointer distinguishable from a pointer to any object or function.
- A pointer to a cv-qualified or unqualified object type may be converted to a pointer to the same type with greater cv-qualifications (3.7.3). That is, for any unqualified type T, a T* may be converted to a const T*, a volatile T*, or a const volatile T*; a const T* may be converted to a const volatile T*; or a volatile T* may be converted to a const volatile T*.

- A pointer to any object type may be converted to a void* with the greater or equal cv-qualifications (3.7.3). That is, for any unqualified type T, a T* may be converted to a void*, a const void*, a volatile void*, or a const volatile void*; a const T* may be converted to a const void* or a const volatile void*; a volatile T* may be converted to a volatile void* or a const volatile void*; and a const volatile T* may be converted to a const volatile void*.

- Two pointer types T1 and T2 are similar if there exists a type T and integer n>0 such that:

        T1 is $T\ cv_{1,n} * \ldots\ cv_{1,1} * cv_{1,0}$

  and

        T2 is $T\ cv_{2,n} * \ldots\ cv_{2,1} * cv_{2,0}$

  where each $cv_{i,j}$ is const, volatile, const volatile, or nothing. An expression of type T1 may be converted to type T2 if and only if the following conditions are satisfied:

  - the pointer types are similar.
  - for every j>0, if const is in $cv_{1,j}$ then const is in $cv_{2,j}$, and similarly for volatile.
  - if the $cv_{1,j}$ and $cv_{2,j}$ are different, then const is in every $cv_{2,k}$ for 0