  ______________________________________________________________________

  2   Lexical conventions                                          [lex]

  ______________________________________________________________________

1 The text of the program is kept in units called source files  in  this
  International  Standard.   A source file together with all the headers
  (_lib.headers_) and source files included (_cpp.include_) via the pre-
  processing directive #include, less any source lines skipped by any of
  the conditional inclusion (_cpp.cond_)  preprocessing  directives,  is
  called  a  translation  unit.   [Note:  a C++  program need not all be
  translated at the same time.  ]

2 [Note: previously translated translation units and instantiation units
  can be preserved individually or in libraries. The separate
  translation units of a program communicate (_basic.link_) by (for
  example) calls to functions whose identifiers have external linkage,
  manipulation of objects whose identifiers have external linkage, or
  manipulation of data files. Translation units can be separately
  translated and then later linked to produce an executable program
  (_basic.link_).  ]

  2.1  Phases of translation                                [lex.phases]

1 The  precedence  among the syntax rules of translation is specified by
  the following phases.1)

    1 Physical  source file characters are mapped, in an implementation-
      defined manner, to the source character set (introducing  new-line
      characters  for  end-of-line  indicators)  if necessary.  Trigraph
      sequences (_lex.trigraph_) are replaced by  corresponding  single-
      character internal representations.  Any source file character not
      in the basic source character set (_lex.charset_) is  replaced  by
      the universal-character-name that designates that character.2)

    2 Each instance of a new-line character and an immediately preceding
      backslash  character is deleted, splicing physical source lines to
      form logical source lines.  If, as a result, a character  sequence
      that matches the syntax of a universal-character-name is produced,
      the behavior is undefined.  If a source file  that  is  not  empty
      does not end in a new-line character, or ends in a new-line
      character immediately preceded by a backslash character, the
      behavior is undefined.
  _________________________
  1) Implementations must behave as if these separate phases occur,
  although in practice different phases might be folded together.
  2)  The  process of handling extended characters is specified in terms
  of mapping to an encoding that uses only the  basic  source  character
  set, and, in the case of character literals and strings, further
  mapping to the execution character set.  In practical terms, however, any
  internal encoding may be used, so long as an actual extended character
  encountered in the input, and the same extended character expressed in
  the input as a universal-character-name (i.e. using the \uXXXX
  notation), are handled equivalently.

    3 The   source   file   is   decomposed  into  preprocessing  tokens
      (_lex.pptoken_) and sequences of white-space characters (including
      comments).  A source file shall not end in a partial preprocessing
      token or partial comment3).  Each comment is replaced by one space
      character.    New-line  characters  are  retained.   Whether  each
      nonempty sequence of white-space characters other than new-line is
      retained  or  replaced  by  one space character is implementation-
      defined.  The process of dividing a source file's characters  into
      preprocessing tokens is context-dependent.  [Example: see the han-
      dling of < within a #include preprocessing directive.  ]

    4 Preprocessing directives are executed and  macro  invocations  are
      expanded.   If  a  character sequence that matches the syntax of a
      universal-character-name  is  produced  by   token   concatenation
      (_cpp.concat_), the behavior is undefined.  A #include preprocess-
      ing directive causes the named header or source file  to  be  pro-
      cessed from phase 1 through phase 4, recursively.

    5 Each  source  character set member, escape sequence, or universal-
      character-name in character literals and string literals  is  con-
      verted to a member of the execution character set.

    6 Adjacent  ordinary  string literal tokens are concatenated.  Adja-
      cent wide string literal tokens are concatenated.

    7 White-space characters separating tokens are no longer
      significant.  Each preprocessing token is converted into a token
      (_lex.token_).  The resulting tokens are syntactically and
      semantically analyzed and translated.  [Note: Source files,
      translation units and translated translation units need not
      necessarily be stored as files, nor need there be any one-to-one
      correspondence between these entities and any external
      representation.  The description is conceptual only, and does not
      specify any particular implementation.  ]

    8 Translated translation units and instantiation units are  combined
      as  follows:  [Note:  some  or all of these may be supplied from a
      library.  ] Each translated translation unit is examined to
      produce a list of required instantiations.  [Note: this may include
      instantiations which have been explicitly requested
      (_temp.explicit_).  ] The definitions of the required templates
      are located. It is implementation-defined whether  the  source  of
      the  translation units containing these definitions is required to
      be available.  [Note: an implementation  could  encode  sufficient
      information  into  the translated translation unit so as to ensure
      the source is not required here.  ] All the required
      instantiations are performed to produce instantiation units.  [Note:
      these are similar to translated translation units, but contain no
      references to uninstantiated templates and no template
      definitions.  ] The program is ill-formed if any instantiation fails.
  _________________________
  3) A partial preprocessing token would arise from a source file ending
  in the first portion of a multi-character token that requires a
  terminating sequence of characters, such as a header-name that is missing
  the  closing " or >.  A partial comment would arise from a source file
  ending with an unclosed /* comment.

    9 All external object and function references are resolved.  Library
      components  are linked to satisfy external references to functions
      and objects not defined  in  the  current  translation.  All  such
      translator output is collected into a program image which contains
      information needed for execution in its execution environment.
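
  The following is a minimal sketch of two of the phases above, not a
  description of how any particular implementation works.  The function
  name splice_lines is hypothetical, and the sketch assumes that phase 1
  has already mapped every end-of-line indicator to '\n'.  It also uses
  static_assert, which is a later addition to the language than this
  text.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of any library): performs the phase 2
// splice on text that already uses '\n' as its end-of-line indicator.
std::string splice_lines(const std::string& source) {
    std::string out;
    out.reserve(source.size());
    for (std::string::size_type i = 0; i < source.size(); ++i) {
        // A backslash immediately followed by new-line: delete both.
        if (source[i] == '\\' && i + 1 < source.size() && source[i + 1] == '\n') {
            ++i;  // skip the new-line as well
            continue;
        }
        out += source[i];
    }
    return out;
}

// Phase 6, checked by the compiler itself: two adjacent ordinary string
// literals form one literal, so the array holds "abcd" plus the terminator.
static_assert(sizeof("ab" "cd") == 5, "adjacent literals are concatenated");
```

  The sketch does not diagnose the undefined cases named in phase 2, such
  as a non-empty file that lacks a final new-line.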

  2.2  Basic source character set                          [lex.charset]

1 The basic source character set consists of 96  characters:  the  space
  character,  the control characters representing horizontal tab, verti-
  cal tab, form feed, and new-line,  plus  the  following  91  graphical
  characters:
          a b c d e f g h i j k l m n o p q r s t u v w x y z
          A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
          0 1 2 3 4 5 6 7 8 9
          _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

2 The  universal-character-name  construct  provides a way to name other
  characters.
          hex-quad:
                  hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

          universal-character-name:
                  \u hex-quad
                  \U hex-quad hex-quad
  The character designated by the universal-character-name \UNNNNNNNN is
  that  character  whose  encoding  in  ISO/IEC 10646 is the hexadecimal
  value NNNNNNNN; the character designated by  the  universal-character-
  name  \uNNNN  is that character whose encoding in ISO/IEC 10646 is the
  hexadecimal value 0000NNNN.
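
  As an illustration of the grammar above, the following sketch computes
  the ISO/IEC 10646 value that a universal-character-name designates.
  The name ucn_value and the -1 failure result are assumptions made for
  this example only, and the digit arithmetic assumes that the letters a
  through f (and A through F) are contiguous in the host character set.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the ISO/IEC 10646 value named by a
// universal-character-name, or -1 if `text` is not exactly of the form
// \uNNNN or \UNNNNNNNN.
long long ucn_value(const std::string& text) {
    if (text.size() < 2 || text[0] != '\\') return -1;
    std::string::size_type digits;
    if (text[1] == 'u') digits = 4;        // \u hex-quad
    else if (text[1] == 'U') digits = 8;   // \U hex-quad hex-quad
    else return -1;
    if (text.size() != 2 + digits) return -1;

    long long value = 0;
    for (std::string::size_type i = 2; i < text.size(); ++i) {
        const char c = text[i];
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;                    // not a hexadecimal-digit
        value = value * 16 + d;
    }
    return value;
}
```

  Because \uNNNN designates the value 0000NNNN, the four-digit and
  eight-digit forms of the same character give the same result.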

  2.3  Trigraph sequences                                 [lex.trigraph]

1 Before any other processing takes place, each occurrence of one of the
  following  sequences  of  three  characters  ("trigraph sequences") is
  replaced by the single character indicated in Table 1.

                       Table 1--trigraph sequences

  +-----------------------+------------------------+------------------------+
  |trigraph   replacement | trigraph   replacement | trigraph   replacement |
  +-----------------------+------------------------+------------------------+
  |  ??=           #      |   ??(           [      |   ??<           {      |
  +-----------------------+------------------------+------------------------+
  |  ??/           \      |   ??)           ]      |   ??>           }      |
  +-----------------------+------------------------+------------------------+
  |  ??'           ^      |   ??!           |      |   ??-           ~      |
  +-----------------------+------------------------+------------------------+

2 [Example:
          ??=define arraycheck(a,b) a??(b??) ??!??! b??(a??)
  becomes
          #define arraycheck(a,b) a[b] || b[a]
   --end example]

3 [Note: no other trigraph sequence exists.  Each ?  that does not begin
  one of the trigraphs listed above is not changed.  ]
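
  The replacement described by Table 1 can be sketched as a single
  left-to-right pass.  The function names below are hypothetical, and the
  code is an illustration under that assumption rather than the method
  of any actual translator.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps the third character of a trigraph to its
// replacement from Table 1, or returns '\0' if there is no such trigraph.
char trigraph_replacement(char third) {
    switch (third) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Replaces trigraph sequences in a single left-to-right pass.  Replacement
// characters are appended to the output and never rescanned.
std::string replace_trigraphs(const std::string& in) {
    std::string out;
    std::string::size_type i = 0;
    while (i < in.size()) {
        if (i + 2 < in.size() && in[i] == '?' && in[i + 1] == '?') {
            const char r = trigraph_replacement(in[i + 2]);
            if (r != '\0') {
                out += r;
                i += 3;
                continue;
            }
        }
        out += in[i];  // includes any ? that does not begin a trigraph
        ++i;
    }
    return out;
}
```

  When calling replace_trigraphs from C++ source, it is safest to spell
  adjacent question marks in a string literal with the \? escape
  sequence, so that a translator that itself performs trigraph
  replacement leaves the literal unchanged.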

4 Trigraph replacement is done left to right, so that when two sequences
  that could represent trigraphs overlap, only the first sequence is
  replaced.   Characters that result from trigraph replacement are never
  part of a subsequent trigraph.  [Example: The sequence "???=" becomes
  "?#": the first ? does not begin a trigraph listed in Table 1 and is
  not changed, and the following ??= is replaced by #.  The sequence
  "?????????" contains no trigraph listed in Table 1 and is not changed.
  --end example]

  2.4  Preprocessing tokens                                [lex.pptoken]
          preprocessing-token:
                  header-name
                  identifier
                  pp-number
                  character-literal
                  string-literal
                  preprocessing-op-or-punc
                  each non-white-space character that cannot be one of the above

1 Each preprocessing token that is converted to  a  token  (_lex.token_)
  shall have the lexical form of a keyword, an identifier, a literal, an
  operator, or a punctuator.

2 A preprocessing token is the minimal lexical element of  the  language
  in  translation  phases  3 through 6.  The categories of preprocessing
  token are: header names, identifiers, preprocessing numbers, character
  literals,  string  literals, preprocessing-op-or-punc, and single non-
  white-space characters that do not lexically match the  other  prepro-
  cessing  token  categories.   If a ' or a " character matches the last
  category, the behavior is undefined.  Preprocessing tokens can be
  separated by white space; this consists of comments (_lex.comment_), or
  white-space characters (space, horizontal tab, new-line, vertical tab,
  and  form-feed),  or  both.   As described in Clause _cpp_, in certain
  circumstances during translation phase 4, white space (or the  absence
  thereof)  serves  as  more than preprocessing token separation.  White
  space can appear within a preprocessing token only as part of a header
  name  or  between  the  quotation characters in a character literal or
  string literal.

3 If the input stream has been parsed into preprocessing tokens up to  a
  given  character, the next preprocessing token is the longest sequence
  of characters that could constitute a  preprocessing  token,  even  if
  that would cause further lexical analysis to fail.

4 [Example: The program fragment 1Ex is parsed as a preprocessing number
  token (one that is not a valid floating  or  integer  literal  token),
  even though a parse as the pair of preprocessing tokens 1 and Ex might
  produce a valid expression (for example, if Ex were a macro defined as
  +1).  Similarly, the program fragment 1E1 is parsed as a preprocessing
  number (one that is a valid floating literal token), whether or not  E
  is a macro name.  ]

5 [Example:  The  program  fragment  x+++++y  is  parsed as x ++ ++ + y,
  which, if x and y are of built-in  types,  violates  a  constraint  on
  increment  operators,  even though the parse x ++ + ++ y might yield a
  correct expression.  ]
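
  The longest-match rule in paragraph 3 can be sketched with a toy lexer.
  The function name longest_match_tokens is hypothetical, and the lexer
  is deliberately incomplete: it knows only runs of letters, digits and
  underscores, plus the operators +, ++ and +=.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical toy lexer: recognizes only runs of letters, digits and
// underscores and the operators + ++ +=, always taking the longest
// operator that matches at the current position.
std::vector<std::string> longest_match_tokens(const std::string& text) {
    static const char* const operators[] = {"++", "+=", "+"};  // longest first
    std::vector<std::string> tokens;
    std::string::size_type i = 0;
    while (i < text.size()) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalnum(c) || c == '_') {
            std::string::size_type j = i;
            while (j < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[j])) || text[j] == '_'))
                ++j;
            tokens.push_back(text.substr(i, j - i));
            i = j;
            continue;
        }
        bool matched = false;
        for (const char* op : operators) {
            const std::string candidate(op);
            if (text.compare(i, candidate.size(), candidate) == 0) {
                tokens.push_back(candidate);
                i += candidate.size();
                matched = true;
                break;
            }
        }
        if (!matched) {
            tokens.push_back(std::string(1, text[i]));  // any other single character
            ++i;
        }
    }
    return tokens;
}
```

  Applied to x+++++y, this yields x, ++, ++, +, y: the lexer commits to
  ++ twice before it ever considers the parse x ++ + ++ y.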

  2.5  Alternative tokens                                  [lex.digraph]

1 Alternative token representations are provided for some operators  and
  punctuators4).

2 In all respects of the language, each alternative  token  behaves  the
  same,  respectively,  as its primary token, except for its spelling5).
  The set of alternative tokens is defined in Table 2.

  _________________________
  4)  These  include "digraphs" and additional reserved words.  The term
  "digraph" (token consisting of two characters) is  not  perfectly  de-
  scriptive,  since  one of the alternative preprocessing-tokens is %:%:
  and of course several primary tokens contain two characters.  Nonethe-
  less, those alternative tokens that aren't lexical keywords are collo-
  quially known as "digraphs".
  5)  Thus the "stringized" values (_cpp.stringize_) of [ and <: will be
  different, maintaining the source spelling, but the tokens can  other-
  wise be freely interchanged.

                       Table 2--alternative tokens

  +----------------------+-----------------------+-----------------------+
  |alternative   primary | alternative   primary | alternative   primary |
  +----------------------+-----------------------+-----------------------+
  |    <%           {    |     and         &&    |   and_eq        &=    |
  +----------------------+-----------------------+-----------------------+
  |    %>           }    |    bitor         |    |    or_eq        |=    |
  +----------------------+-----------------------+-----------------------+
  |    <:           [    |     or          ||    |   xor_eq        ^=    |
  +----------------------+-----------------------+-----------------------+
  |    :>           ]    |     xor          ^    |     not          !    |
  +----------------------+-----------------------+-----------------------+
  |    %:           #    |    compl         ~    |   not_eq        !=    |
  +----------------------+-----------------------+-----------------------+
  |   %:%:         ##    |   bitand         &    |                       |
  +----------------------+-----------------------+-----------------------+
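
  A short sketch of Table 2 in use follows.  It assumes a translator that
  accepts the alternative tokens without any extra header or option, and
  it relies on constexpr and static_assert, which are later additions to
  the language than this text.  The names table, in_range,
  element_or_zero and STRINGIZE are hypothetical.

```cpp
#define STRINGIZE(x) #x

// Digraphs stand in for braces and brackets; and, not, bitor, xor and
// not_eq stand in for operators.
constexpr int table<:3:> = <% 10, 20, 30 %>;

constexpr bool in_range(int i) <% return i >= 0 and not (i >= 3); %>

constexpr int element_or_zero(int i) <% return in_range(i) ? table<:i:> : 0; %>

static_assert(element_or_zero(2) == 30, "<: and :> index like [ and ]");
static_assert(element_or_zero(7) == 0, "and/not behave like && and !");
static_assert((6 bitor 1) == 7 and (6 xor 2) == 4 and 1 not_eq 2,
              "operator spellings");

// Only the spelling differs: stringizing keeps the source spelling, so
// "<:" occupies three chars (with the terminator) and "[" occupies two.
static_assert(sizeof(STRINGIZE(<:)) == 3, "spelled <:");
static_assert(sizeof(STRINGIZE([)) == 2, "spelled [");
```

  The last two assertions correspond to the remark about "stringized"
  values: the tokens are interchangeable everywhere except in their
  spelling.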

  2.6  Tokens                                                [lex.token]
          token:
                  identifier
                  keyword
                  literal
                  operator
                  punctuator

1 There are five kinds of  tokens:  identifiers,  keywords,  literals,6)
  operators,  and  other  separators.   Blanks,  horizontal and vertical
  tabs, newlines, formfeeds, and comments (collectively, "white space"),
  as  described  below,  are  ignored  except  as they serve to separate
  tokens.  Some white space is required to separate  otherwise  adjacent
  identifiers, keywords, and literals.

  2.7  Comments                                            [lex.comment]

1 The  characters  /* start a comment, which terminates with the charac-
  ters */.  These comments do not nest.  The characters // start a  com-
  ment, which terminates with the next new-line character. If there is a
  form-feed or a vertical-tab character in such a comment,  only  white-
  space  characters shall appear between it and the new-line that termi-
  nates the comment; no diagnostic  is  required.   [Note:  The  comment
  characters  //, /*, and */ have no special meaning within a // comment
  and are treated just like other characters.   Similarly,  the  comment
  characters // and /* have no special meaning within a /* comment.  ]
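
  The rules above can be checked with a small fragment.  The variable
  names are hypothetical, and constexpr and static_assert are used only
  to let the translator confirm the result; both postdate this text.

```cpp
// The /* on this line is ordinary comment text and opens nothing.
constexpr int after_line_comment = 1;

/* The // inside a block comment is ordinary text too, so the comment
   still ends at the first closing sequence: */ constexpr int after_block_comment = 2;

// A comment is replaced by one space, so it separates tokens without
// joining or splitting them.
constexpr int sum = after_line_comment/* acts as white space */+ after_block_comment;

static_assert(sum == 3, "comments behave as white space");
```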

  _________________________
  6) Literals include strings and character and numeric literals.

  2.8  Header names                                         [lex.header]
          header-name:
                  <h-char-sequence>
                  "q-char-sequence"
          h-char-sequence:
                  h-char
                  h-char-sequence h-char
          h-char:
                  any member of the source character set except
                          new-line and >
          q-char-sequence:
                  q-char
                  q-char-sequence q-char
          q-char:
                  any member of the source character set except
                          new-line and "

1 Header  name  preprocessing tokens shall only appear within a #include
  preprocessing directive (_cpp.include_).  The sequences in both  forms
  of  header-names  are  mapped  in  an implementation-defined manner to
  headers  or  to  external  source   file   names   as   specified   in
  _cpp.include_.

2 If  either  of  the  characters  '  or  \,  or either of the character
  sequences /* or // appears in a q-char-sequence or a  h-char-sequence,
  or  the  character  "  appears  in  a h-char-sequence, the behavior is
  undefined.7)

  2.9  Preprocessing numbers                              [lex.ppnumber]
          pp-number:
                  digit
                  . digit
                  pp-number digit
                  pp-number nondigit
                  pp-number e sign
                  pp-number E sign
                  pp-number .

1 Preprocessing  number  tokens  lexically  include all integral literal
  tokens (_lex.icon_) and all floating literal tokens (_lex.fcon_).

2 A preprocessing number does not have a type or a  value;  it  acquires
  both  after  a  successful conversion (as part of translation phase 7,
  _lex.phases_) to an integral  literal  token  or  a  floating  literal
  token.
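
  The pp-number grammar is broader than the grammars for integer and
  floating literals, as the following sketch of a recognizer shows.  The
  function name is_pp_number is hypothetical, and as a simplification
  the sketch does not accept a universal-character-name as a nondigit.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true if the whole of `s` matches the pp-number
// grammar.  Simplification: universal-character-names are not accepted
// as nondigits here.
bool is_pp_number(const std::string& s) {
    std::string::size_type i = 0;
    // Base cases: digit, or . digit
    if (i < s.size() && s[i] == '.') ++i;
    if (i >= s.size() || !std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    ++i;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        const char prev = s[i - 1];
        if (std::isalnum(c) || c == '_' || c == '.') {
            ++i;                       // digit, nondigit, or .
        } else if ((c == '+' || c == '-') && (prev == 'e' || prev == 'E')) {
            ++i;                       // a sign is allowed only right after e or E
        } else {
            return false;
        }
    }
    return true;
}
```

  Under this sketch 1Ex and 1.2.3 are both preprocessing numbers even
  though neither converts to a valid literal in phase 7, while 1+2 is
  not a single preprocessing number.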

  _________________________
  7) Thus, sequences of characters that resemble escape sequences  cause
  undefined behavior.

  2.10  Identifiers                                           [lex.name]
          identifier:
                  nondigit
                  identifier nondigit
                  identifier digit
          nondigit: one of
                  universal-character-name
                  _ a b c d e f g h i j k l m
                    n o p q r s t u v w x y z
                    A B C D E F G H I J K L M
                    N O P Q R S T U V W X Y Z
          digit: one of
                  0 1 2 3 4 5 6 7 8 9

1 An identifier is an arbitrarily long sequence of letters and digits.
  Each universal-character-name in an identifier shall designate a
  character whose encoding in ISO/IEC 10646 falls into one of the ranges
  specified in _extendid_.  Upper- and lower-case letters are
  different.  All characters are significant.8)

2 In addition, identifiers containing a double underscore (__) or begin-
  ning with an underscore and an upper-case letter are reserved for  use
  by  C++  implementations  and standard libraries and shall not be used
  otherwise; no diagnostic is required.
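
  Since no diagnostic is required, a check for the reserved forms has to
  come from a separate tool.  A minimal sketch of such a check follows;
  the function name is_reserved_identifier is hypothetical, and the
  function does not verify that its argument is a valid identifier.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true if `name` falls under the reservation rule in
// the paragraph above (it does not check that `name` is a valid identifier).
bool is_reserved_identifier(const std::string& name) {
    if (name.find("__") != std::string::npos) return true;    // double underscore anywhere
    return name.size() >= 2 && name[0] == '_' &&
           std::isupper(static_cast<unsigned char>(name[1]));  // _ then upper-case letter
}
```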

  2.11  Keywords                                               [lex.key]

1 The identifiers shown in Table 3 are  reserved  for  use  as  keywords
  (that is, they are unconditionally treated as keywords in phase 7):

  _________________________
  8)  On  systems in which linkers cannot accept extended characters, an
  encoding of the universal-character-name may be used in forming  valid
  external identifiers.  For example, some otherwise unused character or
  sequence of characters may be used to encode the \u  in  a  universal-
  character-name.  Extended characters may produce a long external iden-
  tifier, but C++ does not place  a  translation  limit  on  significant
  characters  for  external  identifiers.  In C++, upper- and lower-case
  letters are considered different for all identifiers, including exter-
  nal identifiers.

                            Table 3--keywords

  +--------------------------------------------------------------------------+
  |asm          do             inline             short         typeid       |
  |auto         double         int                signed        typename     |
  |bool         dynamic_cast   long               sizeof        union        |
  |break        else           mutable            static        unsigned     |
  |case         enum           namespace          static_cast   using        |
  |catch        explicit       new                struct        virtual      |
  |char         extern         operator           switch        void         |
  |class        false          private            template      volatile     |
  |const        float          protected          this          wchar_t      |
  |const_cast   for            public             throw         while        |
  |continue     friend         register           true                       |
  |default      goto           reinterpret_cast   try                        |
  |delete       if             return             typedef                    |
  +--------------------------------------------------------------------------+

2 Furthermore, the alternative representations shown in Table 4 for cer-
  tain operators and punctuators (_lex.digraph_) are reserved and  shall
  not be used otherwise:

                   Table 4--alternative representations

            +------------------------------------------------+
            |and      and_eq   bitand   bitor   compl    not |
            |not_eq   or       or_eq    xor     xor_eq       |
            +------------------------------------------------+

  2.12  Operators and punctuators                        [lex.operators]

1 The  lexical  representation of C++ programs includes a number of pre-
  processing tokens which are used in the syntax of the preprocessor  or
  are converted into tokens for operators and punctuators:
          preprocessing-op-or-punc: one of
          {       }       [       ]       #       ##      (       )
          <:      :>      <%      %>      %:      %:%:    ;       :       ...
          new     delete  ?       ::      .       .*
          +       -       *       /       %       ^       &       |       ~
          !       =       <       >       +=      -=      *=      /=      %=
          ^=      &=      |=      <<      >>      >>=     <<=     ==      !=
          <=      >=      &&      ||      ++      --      ,       ->*     ->
          and     and_eq  bitand  bitor   compl   not     not_eq  or      or_eq
          xor     xor_eq

  Each preprocessing-op-or-punc is converted to a single token in trans-
  lation phase 7 (_lex.phases_).

  2.13  Literals                                           [lex.literal]

1 There are several kinds of literals.9)
          literal:
                  integer-literal
                  character-literal
                  floating-literal
                  string-literal
                  boolean-literal

  2.13.1  Integer literals                                    [lex.icon]
          integer-literal:
                  decimal-literal integer-suffixopt
                  octal-literal integer-suffixopt
                  hexadecimal-literal integer-suffixopt
          decimal-literal:
                  nonzero-digit
                  decimal-literal digit
          octal-literal:
                  0
                  octal-literal octal-digit
          hexadecimal-literal:
                  0x hexadecimal-digit
                  0X hexadecimal-digit
                  hexadecimal-literal hexadecimal-digit
          nonzero-digit: one of
                  1  2  3  4  5  6  7  8  9
          octal-digit: one of
                  0  1  2  3  4  5  6  7
          hexadecimal-digit: one of
                  0  1  2  3  4  5  6  7  8  9
                  a  b  c  d  e  f
                  A  B  C  D  E  F
          integer-suffix:
                  unsigned-suffix long-suffixopt
                  long-suffix unsigned-suffixopt
          unsigned-suffix: one of
                  u  U
          long-suffix: one of
                  l  L

1 An integer literal is a sequence of digits that has no period or expo-
  nent part.  An integer literal may have a prefix  that  specifies  its
  base  and a suffix that specifies its type.  The lexically first digit
  of the sequence of digits is the most significant.  A decimal  integer
  literal  (base ten) begins with a digit other than 0 and consists of a
  sequence of decimal digits.  An octal  integer  literal  (base  eight)
  begins with the digit 0 and consists of a sequence of octal digits.10)
  A hexadecimal integer literal (base sixteen) begins with 0x or 0X and
  consists of a sequence of hexadecimal digits, which include the
  decimal digits and the letters a through f and A through F with decimal
  values ten through fifteen.  [Example: the number twelve can be
  written 12, 014, or 0XC.  ]
  _________________________
  9) The term "literal"  generally  designates,  in  this  International
  Standard, those tokens that are called "constants" in ISO C.
  10) The digits 8 and 9 are not octal digits.

2 The type of an integer literal depends on its form, value, and suffix.
  If it is decimal and has no suffix, it has the first of these types in
  which its value can be  represented:  int,  long  int,  unsigned  long
  int.11) If it is octal or hexadecimal and has no suffix,  it  has  the
  first  of  these  types  in  which  its value can be represented: int,
  unsigned int, long int, unsigned long int.  If it is suffixed by u  or
  U, its type is the first of these types in which its value can be rep-
  resented: unsigned int, unsigned long int.  If it is suffixed by l  or
  L, its type is the first of these types in which its value can be rep-
  resented: long int, unsigned long int.  If it is suffixed by  ul,  lu,
  uL, Lu, Ul, lU, UL, or LU, its type is unsigned long int.

3 A  program  is  ill-formed if one of its translation units contains an
  integer literal that cannot be  represented  by  any  of  the  allowed
  types.
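
  The type rules can be observed directly, with one caveat: the sketch
  below uses decltype, static_assert and <type_traits>, all of which are
  later additions to the language than this text.  It sticks to small
  values, for which the first candidate type is always chosen and later
  revisions of the candidate lists make no difference.

```cpp
#include <type_traits>

// Three spellings of twelve, as in the example: decimal, octal, hexadecimal.
static_assert(12 == 014 && 014 == 0XC, "same value in three bases");

// Small unsuffixed literals fit in int, the first type on every list.
static_assert(std::is_same<decltype(12), int>::value, "decimal, no suffix");
static_assert(std::is_same<decltype(0XC), int>::value, "hexadecimal, no suffix");

// A suffix removes the earlier candidates from the list.
static_assert(std::is_same<decltype(12u), unsigned int>::value, "u suffix");
static_assert(std::is_same<decltype(12L), long int>::value, "L suffix");
static_assert(std::is_same<decltype(12UL), unsigned long int>::value, "UL suffix");
static_assert(std::is_same<decltype(12lu), unsigned long int>::value, "lu suffix");
```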

  2.13.2  Character literals                                  [lex.ccon]
          character-literal:
                  'c-char-sequence'
                  L'c-char-sequence'
          c-char-sequence:
                  c-char
                  c-char-sequence c-char
          c-char:
                  any member of the source character set except
                          the single-quote ', backslash \, or new-line character
                  escape-sequence
                  universal-character-name
          escape-sequence:
                  simple-escape-sequence
                  octal-escape-sequence
                  hexadecimal-escape-sequence
          simple-escape-sequence: one of
                  \'  \"  \?  \\
                  \a  \b  \f  \n  \r  \t  \v
          octal-escape-sequence:
                  \ octal-digit
                  \ octal-digit octal-digit
                  \ octal-digit octal-digit octal-digit
          hexadecimal-escape-sequence:
                  \x hexadecimal-digit
                  hexadecimal-escape-sequence hexadecimal-digit
  _________________________
  11)  A  decimal integer literal with no suffix never has type unsigned
  int.  Otherwise, for example, on an implementation where unsigned  int
  values  have  16 bits and unsigned long values have strictly more than
  17 bits, we would have -30000<0, -50000>0 (because  50000  would  have
  type unsigned int), and -70000<0 (because 70000 would have type long).

1 A  character  literal  is  one  or  more characters enclosed in single
  quotes, as in 'x', optionally preceded by the letter L, as in L'x'.  A
  character  literal that does not begin with L is an ordinary character
  literal, also referred to as a narrow-character literal.  An  ordinary
  character  literal  that  contains a single c-char has type char, with
  value equal to the numerical value of the encoding of  the  c-char  in
  the  execution character set.  An ordinary character literal that con-
  tains more than one c-char is a multicharacter literal.  A  multichar-
  acter literal has type int and implementation-defined value.

2 A  character literal that begins with the letter L, such as L'x', is a
  wide-character literal.  A wide-character literal has type wchar_t.12)
  A wide-character literal containing a single c-char has value equal
  to the numerical value of the encoding of the c-char in the execution
  wide-character set.  The value of a wide-character literal containing
  multiple c-chars is implementation-defined.

3 Certain nongraphic characters, the single quote ', the double quote ",
  the question mark ?, and the backslash \, can be represented according
  to Table 5.

                        Table 5--escape sequences

                   +----------------------------------+
                   |new-line          NL (LF)   \n    |
                   |horizontal tab    HT        \t    |
                   |vertical tab      VT        \v    |
                   |backspace         BS        \b    |
                   |carriage return   CR        \r    |
                   |form feed         FF        \f    |
                   |alert             BEL       \a    |
                   |backslash         \         \\    |
                   |question mark     ?         \?    |
                   |single quote      '         \'    |
                   |double quote      "         \"    |
                   |octal number      ooo       \ooo  |
                   |hex number        hhh       \xhhh |
                   +----------------------------------+
  The double quote " and the question mark ? can be represented as
  themselves or by the escape sequences \" and \? respectively, but the
  single quote ' and the backslash \ shall be represented by the escape
  sequences \' and \\ respectively.  If the character following a
  backslash is not one of those specified, the behavior is undefined.  An
  escape sequence specifies a single character.

4 The escape \ooo consists of the backslash followed  by  one,  two,  or
  three  octal digits that are taken to specify the value of the desired
  character.  The escape \xhhh consists of the backslash followed  by  x
  followed  by  one or more hexadecimal digits that are taken to specify
  the value of the desired character.  There is no limit to  the  number
  of digits in a hexadecimal sequence.  A sequence of octal or
  hexadecimal digits is terminated by the first character that is not an
  octal digit or a hexadecimal digit, respectively.  The value of a
  character literal is implementation-defined if it falls outside of the
  implementation-defined range defined for char (for ordinary literals)
  or wchar_t (for wide literals).
  _________________________
  12) They are intended for character sets where a  character  does  not
  fit into a single byte.

5 A universal-character-name is translated to the encoding, in the  exe-
  cution  character  set,  of  the character named.  If there is no such
  encoding, the universal-character-name is translated to an implementa-
  tion-defined  encoding.   [Note:  in translation phase 1, a universal-
  character-name is introduced whenever an actual extended character  is
  encountered  in  the  source text.  Therefore, all extended characters
  are described in terms  of  universal-character-names.   However,  the
  actual  compiler  implementation may use its own native character set,
  so long as the same results are obtained.  ]
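
  A few of the rules in this subclause can be confirmed by the
  translator itself.  The sketch uses decltype and static_assert, which
  postdate this text, and the last three assertions assume an
  ASCII-compatible execution character set; on another character set
  those values would differ.

```cpp
#include <type_traits>

static_assert(std::is_same<decltype('x'), char>::value, "ordinary character literal");
static_assert(std::is_same<decltype(L'x'), wchar_t>::value, "wide-character literal");

// These escapes name the same character as the plain spelling.
static_assert('\?' == '?', "question mark");
static_assert('\"' == '"', "double quote");

// Assumption: an ASCII-compatible execution character set, where 'A' is 65.
static_assert('\101' == 'A', "octal escape, up to three digits");
static_assert('\x41' == 'A', "hexadecimal escape");
static_assert('\n' == '\12' && '\n' == '\xa', "new-line is 10 in ASCII");
```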

  2.13.3  Floating literals                                   [lex.fcon]
          floating-literal:
                  fractional-constant exponent-partopt floating-suffixopt
                  digit-sequence exponent-part floating-suffixopt
          fractional-constant:
                  digit-sequenceopt . digit-sequence
                  digit-sequence .
          exponent-part:
                  e signopt digit-sequence
                  E signopt digit-sequence
          sign: one of
                  +  -
          digit-sequence:
                  digit
                  digit-sequence digit
          floating-suffix: one of
                  f  l  F  L

1 A floating literal consists of an integer part,  a  decimal  point,  a
  fraction  part,  an e or E, an optionally signed integer exponent, and
  an optional type suffix.  The integer and fraction parts both  consist
  of  a  sequence of decimal (base ten) digits.  Either the integer part
  or the fraction part (not both) can be  omitted;  either  the  decimal
  point  or the letter e (or E) and the exponent (not both) can be omit-
  ted.  The integer part, the optional decimal point  and  the  optional
  fraction  part form the significant part of the floating literal.  The
  exponent, if present, indicates the power of 10 by which the  signifi-
  cant  part  is  to  be scaled.  If the scaled value is in the range of
  representable values for its type, the result is the scaled  value  if
  representable,  else the larger or smaller representable value nearest
  the scaled value, chosen in  an  implementation-defined  manner.   The
  type  of a floating literal is double unless explicitly specified by a
  suffix.  The suffixes f and F specify float,  the  suffixes  l  and  L
  specify  long  double.   If  the  scaled  value is not in the range of
  representable values for its type, the program is ill-formed.
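
  As an informal illustration (not part of the normative text), the
  sketch below uses overload resolution to report the type selected by
  each suffix and checks a few of the permitted spellings.  The names
  kind and floating_literal_checks are hypothetical.

```cpp
#include <cassert>

// Overloads that report which floating type an argument has.
inline char kind(float)       { return 'f'; }
inline char kind(double)      { return 'd'; }
inline char kind(long double) { return 'l'; }

inline bool floating_literal_checks() {
    return kind(1.5)  == 'd'   // no suffix: double
        && kind(1.5f) == 'f'   // f or F: float
        && kind(1.5F) == 'f'
        && kind(1.5l) == 'l'   // l or L: long double
        && kind(1.5L) == 'l'
        && .5   == 0.5         // integer part omitted
        && 2.   == 2.0         // fraction part omitted
        && 1e3  == 1000.0      // decimal point omitted, exponent present
        && 25e-2 == 0.25;      // signed exponent scales by a power of 10
}
```

  The values compared here are exactly representable in binary floating
  point, so the equality comparisons do not depend on how an
  implementation rounds.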

  2.13.4  String literals                                   [lex.string]
          string-literal:
                  "s-char-sequenceopt"
                  L"s-char-sequenceopt"
          s-char-sequence:
                  s-char
                  s-char-sequence s-char
          s-char:
                  any member of the source character set except
                          the double-quote ", backslash \, or new-line character
                  escape-sequence
                  universal-character-name

1 A  string  literal  is  a  sequence  of  characters  (as  defined   in
  _lex.ccon_) surrounded by double quotes, optionally beginning with the
  letter L, as in "..." or L"...".  A string literal that does not begin
  with  L  is  an  ordinary string literal, also referred to as a narrow
  string literal.  An ordinary string literal has type "array of n const
  char"  and  static storage duration (_basic.stc_), where n is the size
  of the string as defined below, and  is  initialized  with  the  given
  characters.   A string literal that begins with L, such as L"asdf", is
  a wide string literal.  A wide string literal has  type  "array  of  n
  const wchar_t" and has static storage duration, where n is the size of
  the string as defined below, and is initialized with the given charac-
  ters.

2 Whether  all  string  literals  are  distinct  (that is, are stored in
  nonoverlapping objects)  is  implementation-defined.   The  effect  of
  attempting to modify a string literal is undefined.

3 In translation phase 6 (_lex.phases_), adjacent narrow string literals
  are concatenated and adjacent wide string literals  are  concatenated.
  If  a narrow string literal token is adjacent to a wide string literal
  token, the behavior is undefined.  Characters in concatenated  strings
  are kept distinct.  [Example:
          "\xA" "B"
  contains the two characters '\xA' and 'B' after concatenation (and not
  the single hexadecimal character '\xAB').  ]
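
  The example above can be checked with a short sketch (informal, not
  part of the normative text).  The names concat_size and
  concatenation_checks are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements in a narrow string literal, including '\0'.
template <std::size_t N>
std::size_t concat_size(const char (&)[N]) { return N; }

inline bool concatenation_checks() {
    // Binding to an array of exactly 3 elements compiles only if the
    // concatenated literal holds '\xA', 'B' and the terminating '\0'.
    const char (&s)[3] = "\xA" "B";
    return concat_size("\xA" "B") == 3
        && s[0] == '\xA'
        && s[1] == 'B'
        && s[2] == '\0';
}
```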

4 After  any   necessary   concatenation,   in   translation   phase   7
  (_lex.phases_),  '\0' is appended to every string literal so that pro-
  grams that scan a string can find its end.

5 Escape sequences and universal-character-names in string literals have
  the  same  meaning  as in character literals (_lex.ccon_), except that
  the single quote ' is representable either by itself or by the  escape
  sequence  \',  and  the double quote " shall be preceded by a \.  In a
  narrow string literal, a universal-character-name may map to more than
  one char element due to multibyte encoding.  The size of a wide string
  literal is the total number of escape sequences,  universal-character-
  names,  and other characters, plus one for the terminating L'\0'.  The
  size of a  narrow  string  literal  is  the  total  number  of  escape
  sequences  and  other  characters, plus at least one for the multibyte
  encoding of each universal-character-name, plus one for the  terminat-
  ing '\0'.
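
  As an informal illustration (not part of the normative text), the
  sketch below counts the elements of several narrow and wide string
  literals that contain no universal-character-names.  The names
  count_elements and size_checks are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements in a string literal of either kind,
// including the terminating null character.
template <typename T, std::size_t N>
std::size_t count_elements(const T (&)[N]) { return N; }

inline bool size_checks() {
    return count_elements("asdf")  == 5   // 4 characters plus '\0'
        && count_elements(L"asdf") == 5   // 4 characters plus L'\0'
        && count_elements("")      == 1   // only the terminating '\0'
        && count_elements("a\tb")  == 4   // an escape sequence counts once
        && count_elements("\"'\'") == 4   // " needs a backslash; ' may have one
        && sizeof("asdf")  == 5 * sizeof(char)
        && sizeof(L"asdf") == 5 * sizeof(wchar_t);
}
```

  Literals containing universal-character-names are left out on purpose,
  since the number of char elements each one contributes to a narrow
  string literal depends on the multibyte encoding in use.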

  2.13.5  Boolean literals                                    [lex.bool]
          boolean-literal:
                  false
                  true

1 The  Boolean  literals are the keywords false and true.  Such literals
  have type bool.  They are not lvalues.