Doc. no.   WG21/N2105=06-0175
Date:        2006-10-23
Project:     Programming Language C++
Reply to:   Beman Dawes <bdawes@acm.org>

Proposed C++0x Keywords Considered

Introduction
Proposed Keywords
Do proposed names clearly denote semantics?
Are proposed names consistent with naming conventions?
How much do proposed names impact existing code?
Possible alternatives for problem keywords
Code-search web sites
Methodology
Acknowledgements
Existing keywords

Introduction

This paper looks at new keywords proposed for C++0x, identifies several that cause concerns, and proposes alternatives to eliminate or reduce those concerns.

Whenever the committee makes a decision that appears inconsistent or to fly in the face of reason, committee members (me included) get called stupid, arrogant, and unconcerned about ordinary users. That's OK as long as there is a compelling rationale for the decision, and alternatives have been considered and found wanting. The purpose of this paper is to discover if any proposed keywords are problematic, explicitly consider alternatives for problem keywords, and encourage development of rationale for the committee's final keyword choices.

Please don't shoot the messenger! I am in favor, often strongly, of all the proposals discussed in this paper. But someone has to point out the bad news about some of the proposed keywords.

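Possible concerns regarding proposed keywords include whether the proposed names clearly denote semantics, whether they are consistent with naming conventions, and how much they impact existing code.

Each of those concerns is discussed below.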

Proposed Keywords

Proposed New C++ Keywords

Keyword | Proposal
--- | ---
_Char16_t, _Char32_t | N1823 - New Character Types in C++
alignof, align_union | N1877 - Adding Alignment Support to the C++ Programming Language
concept, concept_map, where, axiom, late_check | N2081 - Concepts
constexpr | N1980 - Generalized Constant Expressions
decltype | N1705 - Decltype (and auto)
import | N2073 - Modules in C++
nullptr | N1601 - A name for the null pointer: nullptr
static_assert | N1720 - static_assert

Do proposed names clearly denote semantics?

This concern has not been raised for any of the proposed keywords. It will be of interest, however, when considering alternatives to proposed keywords.

Are proposed names consistent with naming conventions?

New keywords which do not follow current C++ Standard naming conventions are more difficult to learn and provide a lightning rod for criticism of C++. Indeed, there has already been criticism on comp.std.c++ and elsewhere of some of the new keywords for their inconsistency with current language and standard library naming conventions.

C++ language and standard library naming conventions for ordinary names are to use all-lowercase letters, begin with an alphabetic character, and separate multiple words with underscores. The underscore separator convention is sometimes not applied when the name will become part of an existing set of keywords or library names that do not follow this convention.
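Only part of this convention can be checked mechanically. The sketch below (the helper name follows_basic_convention is hypothetical) tests the lowercase and leading-letter rules; it flags _Char16_t and _Char32_t, but accepts constexpr, decltype, and nullptr, because a missing word separator cannot be detected without knowing where the word boundaries are.

```cpp
#include <cctype>
#include <string>

// Checks only the mechanically testable parts of the naming convention:
// the name starts with a lowercase letter and contains only lowercase
// letters, digits, and underscores. It cannot tell whether multiple words
// are separated by underscores ("constexpr" and "const_expr" both pass).
// Character classification assumes the default "C" locale.
bool follows_basic_convention(const std::string& name) {
    if (name.empty()) return false;
    if (!std::islower(static_cast<unsigned char>(name[0]))) return false;
    for (char c : name) {
        unsigned char u = static_cast<unsigned char>(c);
        if (!(std::islower(u) || std::isdigit(u) || u == '_')) return false;
    }
    return true;
}
```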

The following proposed keywords do not follow the above conventions, and thus clash with similar current keywords and library names:

Proposed keyword | Similar existing keywords and library names
--- | ---
_Char16_t, _Char32_t | wchar_t, size_t, and the many <cstdint> header types.
constexpr | const_cast, const_iterator, const_pointer, const_reference, const_mem_fun_t, and so on.
decltype | size_type, value_type, difference_type, argument_type, result_type, and so on.
nullptr | auto_ptr, shared_ptr, weak_ptr, and bad_weak_ptr.

How much do proposed names impact existing code?

Any new keyword has the potential to break existing code. In the past, it was not possible to quantify how much existing code would be broken, but with the advent of code-search web sites it is now possible to gain quantitative insights into the impact of a proposed new keyword. Although the searches that can be performed by these search engines are not yet sophisticated enough to make exact assertions about potential keywords, they do allow us to make hard-data predictions about the impact on existing code.

The impact of proposed keywords on 1,388,870 existing C++ source files is analyzed in the following table. See Methodology.

Keyword | Files found | Comments
--- | --- | ---
_Char16_t, _Char32_t, align_union, concept_map, late_check | 0 | <1 in 1,000,000 files
alignof | 0 | One use in a gcc compiler file was not counted, since it appeared compatible.
static_assert | 0 | A few uses of a static_assert macro were not counted, since they appeared compatible.
constexpr | 1 |
nullptr | 7 |
decltype | 14 |
axiom | 17 |
concept | 110 |
import | 190 | 1 in 7,300 files
where | 11,678 | 1 in 119 files; ~1% of all files. Projects include wxWidgets, Mozilla, KDE, etc.
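The "1 in N" and percentage figures follow directly from each file count and the 1,388,870-file total. A minimal sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cmath>

// Total number of files classified as C++ in the searched database.
constexpr double kTotalFiles = 1388870.0;

// "1 in N files": on average, one file in N uses the name.
// Returns 0 when no uses were found, since the ratio is then undefined.
long one_in_n(long files_using_name) {
    if (files_using_name <= 0) return 0;
    return std::lround(kTotalFiles / files_using_name);
}

// Share of all searched files that use the name, as a percentage.
double percent_of_files(long files_using_name) {
    return 100.0 * files_using_name / kTotalFiles;
}
```

For where, 1,388,870 / 11,678 rounds to 119, and the percentage is about 0.84, which the table rounds to ~1%. For import the exact ratio is about 7,310, which the table rounds to 7,300.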

The proposed where keyword stands out as a serious problem. It can't simply be changed globally in a source file, because its use in comments is ambiguous: is "where" the English word or the name of a source entity? The name where is used widely in third-party libraries, including proprietary libraries, so an organization can't upgrade to C++0x until all of the third-party libraries it depends on have been upgraded to C++0x. Because one of the several meanings of the word "where" is "location", applications such as Geographic Information Systems, Logistics Support, graphics, and some of the physical sciences make particularly heavy use of where in program code. Because where is a keyword in SQL, C++ code that composes SQL commands is particularly hard hit. Because industrial code is proprietary, it is not present in the code database that was searched, so the impact of adding where as a keyword may be even worse than indicated in the table above.
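The following fragment is hypothetical (every name in it is invented for illustration) but shows the kinds of uses described above. It compiles as ordinary C++ today; if where became a reserved keyword, the data member and the parameter named where would both become errors, while the English "where" in the comment on where_is would have to be left alone, which is why a blind global rename does not work.

```cpp
#include <string>

// A location-bearing record, as in GIS or logistics code.
struct Location {
    double latitude;
    double longitude;
};

struct Shipment {
    std::string id;
    Location where;  // data member named "where": breaks if it is a keyword
};

// Returns where the shipment currently is.
// (English "where" in a comment: must NOT be renamed.)
Location where_is(const Shipment& s) {
    return s.where;
}

// Composing an SQL command: "where" is the natural parameter name.
std::string select_from(const std::string& table, const std::string& where) {
    std::string sql = "SELECT * FROM " + table;
    if (!where.empty()) {
        sql += " WHERE " + where;
    }
    return sql;
}
```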

Possible alternatives for problem keywords

Proposed keyword | Alternatives | Comments on alternative
--- | --- | ---
_Char16_t | char16_t | N1823 proposes a typedef instead.
_Char32_t | char32_t | N1823 proposes a typedef instead.
constexpr | const_expr | Consistent with const_cast, const_iterator, etc.
decltype | decl_type | Consistent with size_type, value_type, result_type, etc. The case is weaker, however, because other type-related keywords (typedef, typeid, typename) do not have underscores.
nullptr | null_ptr | Consistent with auto_ptr, shared_ptr, etc.
import | ? | No useful alternative comes to mind.
concept | type_concept, typeconcept | Although the case is weak since the impact on existing code is fairly minor, these would reduce impact to essentially zero (0 files found) and avoid trampling on useful names.
axiom | type_axiom, typeaxiom | Same rationale as for concept.
where | requires | Cuts impact on existing code by a factor of 64 (11,678 files to 183 files). Arguably does a better job of denoting semantics.

Code-search web sites

Code-search web sites are starting to become available. These sites allow automated searches of publicly available source code. The files usually have some form of open-source license.

The ability to search vast amounts of source code allows quantitative analysis of the impact on existing code of a proposed C++ keyword.

Code-search sites include:

Google Code Search - www.google.com/codesearch - A search for "include" finds 809,000 C++ files and 4 million C files. Google is mistakenly classifying at least some C++ code as C code. For example, some Boost code is classified as C code. No apparent way to exclude comments from search.

codesearch.net - csourcesearch.net - 283 million lines of C/C++ code in 1.1 million files. Searches do not yield total hit counts, making the site less useful than it might otherwise be.

koders - www.koders.com - Claims 424 million lines of code, but doesn't appear to have as much C/C++ code as others. No apparent way to exclude comments from search.

krugle - www.krugle.com - Appears to have 1,388,870 files classified as C++ code. Allows comments to be excluded from search.

Methodology

The krugle database was studied because it distinguishes between source code and comments, gives files found counts, appears to have done a good job of identifying C++ code, and did not yield self-contradictory results for test queries.

For each keyword of interest, the automated search was run on the word, selecting C++ as the language and "Source code" as the search area.

For candidates reported in 100 or fewer files, each search hit was examined to determine the actual count. This determination included discarding uses in literals, as part of longer names, and anything else that could skew the results.
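The exclusion rules above (ignore comments and literals, and don't count where inside a longer name such as nowhere) can be approximated by a small tokenizer. The sketch below is an assumption about how such a check might be automated on local files; it is not how the counts in this paper were produced, which was by examining each search hit. The helper name uses_identifier is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace {
bool is_ident_start(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
}
bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}
}  // namespace

// Returns true if `name` occurs in `src` as a complete identifier token,
// ignoring comments, string literals and character literals, and not
// matching it as part of a longer name ("nowhere", "where_clause").
// Simplifications: raw string literals, digit separators and line
// continuations are not handled.
bool uses_identifier(const std::string& src, const std::string& name) {
    const std::size_t n = src.size();
    std::size_t i = 0;
    while (i < n) {
        char c = src[i];
        if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            // Line comment: skip to the end of the line.
            while (i < n && src[i] != '\n') ++i;
        } else if (c == '/' && i + 1 < n && src[i + 1] == '*') {
            // Block comment: skip past the closing */.
            std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
        } else if (c == '"' || c == '\'') {
            // String or character literal: skip to the matching quote,
            // stepping over backslash escapes.
            ++i;
            while (i < n && src[i] != c) {
                if (src[i] == '\\') ++i;  // skip the escaped character
                ++i;
            }
            ++i;  // step past the closing quote
        } else if (is_ident_start(c)) {
            // Read the whole identifier, then compare it exactly.
            std::size_t start = i;
            while (i < n && is_ident_char(src[i])) ++i;
            if (src.compare(start, i - start, name) == 0) return true;
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            // Numeric literal such as 1e5 or 0x1f: skip it whole so its
            // letters are not mistaken for identifiers.
            while (i < n && (is_ident_char(src[i]) || src[i] == '.')) ++i;
        } else {
            ++i;
        }
    }
    return false;
}
```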

For candidates reported in more than 100 files, the first five search hits on the first 20 pages (100 total hits) were examined to determine a percentage of correct hits. This determination included discarding uses in literals, as part of longer names, and anything else that could skew the results. The percentage in the sample was then applied to the total file count to obtain the file count reported in the table.
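The extrapolation step is proportional scaling: the fraction of genuine uses in the sample is assumed to hold for the whole result set. A minimal sketch with a hypothetical estimate_files helper; the numbers below are illustrative only, since the raw reported counts behind the table are not listed in this paper.

```cpp
#include <cmath>

// Estimates the number of files that genuinely use a name.
//   reported: file count reported by the search engine
//   correct:  how many sampled hits were genuine uses
//   sampled:  how many hits were examined (100 in the procedure above)
// Assumes the sample's hit rate applies to all reported files.
// With no sample (sampled <= 0) there is nothing to correct by,
// so the reported count is returned unchanged.
long estimate_files(long reported, long correct, long sampled) {
    if (sampled <= 0) return reported;
    double rate = static_cast<double>(correct) / sampled;
    return std::lround(reported * rate);
}
```

For example, if a search reported 1,000 files and 80 of the 100 sampled hits were genuine uses, the estimate would be 800 files.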

Acknowledgements

Gennaro Prota pointed out the nullptr naming inconsistency in a comp.std.c++ posting.

Existing keywords

Existing C++ Keywords
asm
auto
bool
break
case
catch
char
class
const
const_cast
continue
default
delete
do
double
dynamic_cast
else
enum
explicit
export
extern
false
float
for
friend
goto
if
inline
int
long
mutable
namespace
new
operator
private
protected
public
register
reinterpret_cast
return
short
signed
sizeof
static
static_cast
struct
switch
template
this
throw
true
try
typedef
typeid
typename
union
unsigned
using
virtual
void
volatile
wchar_t
while

Beman Dawes 2006

Revised 23 October 2006