==========================================================

Submitter: Kayvan Memarian and Peter Sewell

Submission Date: 2016-09-22

Document: WG14 N2090

Related: Section 4 of N2012, Questions 3/15, 4/15, and 5/15 of our survey, Section 2.1-2.9 (Q1-20) of our N2013, and DR260

1 Summary

This document is based on N2012 (Section 4), adding a concrete Technical Corrigendum proposal for discussion, revising the text, and adding concrete examples (from N2013).

1.1 Problem

C pointer values could traditionally be considered to be concrete numeric values (our survey, Questions 3, 4, and 5, indicates many still do). The DR260 Committee Response suggests otherwise, hinting at a notion of provenance associated to values that keeps track of their "origins":

"Implementations are permitted to track the origins of a bit-pattern and treat those representing an indeterminate value as distinct from those representing a determined value. They may also treat pointers based on different origins as distinct even though they are bitwise identical."

Current compilers appear to exploit this, using it to justify alias analysis based on provenance distinctions. However, the DR260 CR has not been incorporated in the standard text, and it also leaves many specific questions unclear: it's ambiguous whether some idioms are allowed or not, and unclear what alias analysis and optimisation is allowed to do.

In this note we go through a sequence of specific questions, supported by concrete examples, and make a candidate proposal for discussion.

1.2 Outline of proposal

The basic idea is to associate a provenance with every pointer value, essentially identifying the original allocation the pointer is derived from. This is for the "C abstract machine" as defined in the standard: compilers might rely on provenance in their alias analysis and optimisation, but one would not expect normal implementations to record or manipulate provenance at runtime (though dynamic or static analysis tools might). Accordingly, provenances do not have any representation.

Then there are many specific choices of how provenance is affected by arithmetic operations and suchlike. We first discuss the questions and then summarise our proposal.

Note that provenance-based alias analysis should not be confused with type-based alias analysis, and it is (contrary to the expectations of some expert C users and OS developers) still observable in current mainstream implementations even with -no-strict-aliasing, opening the door to subtle bugs. Compilers should probably provide an option to turn it off, and any proposal for "safe C" should mandate and describe that option.

2 Questions and examples

2.1 Basic provenance

2.1.1 Q1. Must the pointer used for a memory access have the right provenance, i.e. be derived from the pointer to the original allocation (with undefined behaviour otherwise)? (This lets compilers do provenance-based alias analysis)

Here DR260CR clearly says yes. Our experimental data shows cases where recent versions of GCC and ICC do assume non-aliasing of pointers with identical representation values but distinct provenance. This is incompatible with a concrete semantics of pointers (where they are fully characterised by their representation values). Tracking of provenance in the "abstract machine" is therefore clearly necessary to make these compilers sound with respect to the standard.

For example, consider the following pathological code (adapted from DR260).

Example: provenance_basic_global_yx.c
#include <stdio.h>
#include <string.h> 
int  y=2, x=1;
int main() {
  int *p = &x + 1;
  int *q = &y;
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11;  // does this have undefined behaviour?
    printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
  }
  return 0;
}

Depending on the implementation, x and y might happen to be allocated in adjacent memory, in which case &x+1 and &y will have bitwise-identical representation values, the memcmp will succeed, and p (derived from a pointer to x) will have the same representation value as a pointer to a different object, y, at the point of the update *p=11. This can occur in practice, e.g. with GCC -O2. Its output of

x=1 y=2 *p=11 *q=2

suggests that the compiler is reasoning that *p does not alias with y or *q, and hence that the initial value of y=2 can be propagated to the final printf.

This outcome would not be correct with respect to a naive concrete semantics, and so to make the compiler sound it is necessary for this program to be deemed to have undefined behaviour. Note that this example does not involve type-based alias analysis, and the outcome is not affected by GCC's -fno-strict-aliasing flag. One might ask whether the mere formation of the pointer x+1 is legal. This case is explicitly permitted by the ISO standard.

2.1.2 Q2. Can equality testing on pointers be affected by pointer provenance information?

Example: provenance_equality_global_yx.c
#include <stdio.h>
#include <string.h> 
int  y=2, x=1;
int main() {
  int *p = &x + 1;
  int *q = &y;
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  _Bool b = (p==q);
  // can this be false even with identical addresses?
  printf("(p==q) = %s\n", b?"true":"false");
  return 0;
}

This is also allowed according to DR260CR. We have observed GCC regarding two pointers with different provenance as nonequal (with ==) even though they have the same representation value. This happens in some circumstances but not others, so we suggest that whether pointer equality takes provenance into account or not should be made indeterminate in the standard (again to make the observed compiler behaviour sound with respect to the standard). Note that requiring equality to always take provenance into account would require implementations to track provenance at runtime.

The ISO C11 standard text is too strong here: 6.5.9p6 says "Two pointers compare equal if and only if both are [...] or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space", which requires such pointers to compare equal (reasonable pre-DR260CR, not not after it). We don't expect programmers to rely on that behaviour and GCC does not satisfy it, so, to be consistent with DR260CR and with the indeterminate behaviour we suggest, it should permit them to compare either equal or non-equal.

2.2 Pointer provenance via integer types

2.2.1 Q3 Can one make a usable pointer via casts to intptr_t and back?

ISO C11 optionally allows implementations to provide the type intptr_t (along with an unsigned variant) with guaranteed round-trip properties for pointer/integer casts. The following should be allowed, and that means that the C abstract machine should track provenance via such casts to and from integer values.

Example: provenance_roundtrip_via_intptr_t.c
#include <stdio.h>
#include <inttypes.h>
int  x=1;
int main() {
  int *p = &x;
  intptr_t i = (intptr_t)p;
  int *q = (int *)i;
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d  *q=%d\n",*p,*q);  
}

2.2.2 Q4 Can one make a usable pointer via casts to unsigned long and back?

It also seems to be common practice (e.g. in Linux) to extend these properties to unsigned long, as in the example below, when its implementation is large enough. We suggest that this be permitted iff that is the case, or that it be implementation-defined which integer types support this.

Example: provenance_roundtrip_via_unsigned_long.c
#include <stdio.h>
int  x=1;
int main() {
  int *p = &x;
  unsigned long i = (unsigned long)p;
  int *q = (int *)i;
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d  *q=%d\n",*p,*q);  
}

2.2.3 Q5 Must provenance information be tracked via casts to integer types and integer arithmetic?

Example: provenance_basic_using_intptr_t_global_yx.c
#include <stdio.h>
#include <string.h> 
#include <stdint.h>
#include <inttypes.h>
int  y = 2, x = 1;
int main() {
  intptr_t ux = (intptr_t)&x;
  intptr_t uy = (intptr_t)&y;
  intptr_t offset = 4;
  int *p = (int *)(ux + offset);
  int *q = &y;
  printf("Addresses: &x=%"PRIiPTR" p=%p &y=%"PRIiPTR\
         "\n",ux,(void*)p,uy);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // does this have undefined behaviour?
    printf("x=%d  y=%d  *p=%d  *q=%d\n",x,y,*p,*q); 
  }
}

Given the type intptr_t, this question asks whether one can return to a concrete view of pointers as simple numbers, by casting to intptr_t followed by integer arithmetic and casting back to a pointer type. Here again, we observe GCC behaving the same as with Q1, reasoning that pointers obtained in this way cannot alias even if they have the same numerical values. This observation is reinforced by the GCC documentation, which mentions an "original pointer" associated to integer values cast to pointer type, so the answer seems to be "yes". This leads to many more questions regarding the specifics of how provenance information affect the semantics of each integer operator. Some of these are discussed in the next subsection and the remainder are given a complete treatment in the summary of our proposal at the end.

### Q6 Can one use bit manipulation and integer casts to store information in unused bits of pointers?

Now we extend the example of Q3, that cast a pointer to intptr_t and back, to use logical operations on the integer value to store some tag bits. The following code exhibits a strong form of this, storing the address and tag bit combination as a pointer (which thereby creates a misaligned pointer value, though one not used for accesses); a weaker form would store the combined value only as an integer.

Example: provenance_tag_bits_via_uintptr_t_1.c
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
int x=1;
int main() {
  int *p = &x;
  // cast &x to an integer 
  uintptr_t i = (uintptr_t) p;
  // check the bottom two bits of an int* are not used
  assert(_Alignof(int) >= 4);
  assert((i & 3u) == 0u);
  // construct an integer like &x with low-order bit set
  i = i | 1u;  
  // cast back to a pointer
  int *q = (int *) i; // defined behaviour?
  // cast to integer and mask out the low-order two bits
  uintptr_t j = ((uintptr_t)q) & ~((uintptr_t)3u);  
  // cast back to a pointer
  int *r = (int *) j; 
  // are r and p now equivalent?  
  *r = 11;           //  defined behaviour? 
  _Bool b = (r==p); 
  printf("x=%i *r=%i (r==p)=%s\n",x,*r,b?"true":"false");  
}

The standard leaves conversions between integer and pointer types implementation-defined (6.3.2.3p{5,6}), but it is common practice to use unused pointer bits (either low-order bits from alignment requirements or high-order bits beyond the maximum address range). We suggest that the set of unused bits for pointer types of each alignment should be made implementation-defined, to make this practice legal.

Moreover, where the standard does give a guarantee, e.g. for round-trips through intptr_t (7.20.1.4p1), it says only that the result "will compare equal". In a provenance-aware semantics, that may not be enough to make the result usable to reference memory; the standard text should be strengthened here to guarantee that by giving the result a usable provenance.

2.2.4 Q7 Can equality testing on integers that are derived from pointer values be affected by their provenance?

Example: provenance_equality_uintptr_t_global_yx.c
#include <stdio.h>
#include <inttypes.h> 
int y=2, x=1;
int main() {
  uintptr_t p = (uintptr_t)(&x + 1);
  uintptr_t q = (uintptr_t)&y;
  printf("Addresses: p=%" PRIxPTR " q=%" PRIxPTR "\n",
         p,q);
  _Bool b = (p==q);
  // can this be false even with identical numeric addresses?
  printf("(p==q) = %s\n", b?"true":"false");
  return 0;
}

GCC did at one point print false for this, but it was regarded as a bug and fixed. We have observed in Clang for a similar example, but believe it is also a bug there. DR 260CR does not address the question. We believe that integer equality testing should not be affected by provenance, i.e. "no".

This example is inspired by one from Krebbers' PhD thesis.

2.3 Pointers involving multiple provenances

2.3.1 Q8 Should intra-object pointer subtraction give provenance-free integer results?

DR260CR does not address this, but it is uncontroversially "yes": an intra-object pointer subtraction, say between the addresses of two elements of an array, should give a provenance-free integer offset that can then be used for indexing into this or other arrays.

Example: provenance_multiple_2_global.c #include int y[2], x[2]; int main() { int *p = &(x[0]) + (&(y[1])-&(y[0])); *p = 11; // is this free of undefined behaviour? printf("x[1]=%d p=%d",x[1],p); return 0; }

We say a provenance-free integer offset because the fact that an offset is calculated from pointers to y should not, when added to a pointer to a distinct (maybe adjacent) object x, license its use to access y. The example below should not be allowed to access y[0], and we observe GCC optimising based on that assumption.

Example: provenance_multiple_4_global_yx.c
#include <stdio.h>
#include <string.h> 
int  y[2], x[2];
int main() {
  int *p = &x[1] + (&y[1]-&y[0]) + 0;
  int *q = &y[0];
  printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11;  // does this have undefined behaviour?
    printf("y[0]=%d *p=%d *q=%d\n",y[0],*p,*q);
  }
  return 0;
}

2.3.2 Q9 Can one make a usable offset between two separately allocated objects by inter-object subtraction (using either pointer or integer arithmetic), to make a usable pointer to the second by adding the offset to the first?

This is asking about pointers that have multiple provenances, which is not addressed in DR260CR or current GCC or Clang compiler documentation - they refer to "the origin" of a pointer as if there were necessarily only one.

The example below is a variant of the Q5 provenance_basic_using_intptr_t_global_yx.c in which the constant offset is replaced by a subtraction (here after casting from pointer to integer type).

Example: pointer_offset_from_subtraction_1_global.c
#include <stdio.h>
#include <string.h> 
#include <stdint.h>
#include <inttypes.h>
int  y = 2, x=1;
int main() {
  intptr_t ux = (intptr_t)&x;
  intptr_t uy = (intptr_t)&y;
  intptr_t offset = uy - ux;
  printf("Addresses: &x=%"PRIiPTR" &y=%"PRIiPTR\
         " offset=%"PRIiPTR" \n",ux,uy,offset);
  int *p = (int *)(ux + offset);
  int *q = &y;
  if (memcmp(&p, &q, sizeof(p)) == 0) {
    *p = 11; // is this free of undefined behaviour?
    printf("x=%d  y=%d  *p=%d  *q=%d\n",x,y,*p,*q); 
  }
}

Our experiments and our survey responses both suggest that compilers do not in general support this, and we imagine it is uncommon in practice. However, there do seem to be specific important use cases, including the Linux and FreeBSD per-CPU variable implementations - though it is unclear whether these are between multiple allocations in the C sense. They might be dealt with by an attribute such as the GCC may_alias - though the documentation for that refers only to type-based alias analysis, not to "provenance-based" alias analysis. This needs further discussion, but we tentatively suggest "no".

2.3.3 Q11 Is the XOR linked list idiom supported?

The classic XOR linked list algorithm (implementing a doubly linked list with only one pointer per node, by storing the XOR of two pointers) also makes essential use of multiple-provenance pointers. In this example we XOR the integer values from two pointers and XOR the result again with one of them.

Example: pointer_offset_xor_global.c
#include <stdio.h>
#include <inttypes.h>
int x=1;
int y=2;  
int main() {
  int *p = &x;
  int *q = &y;
  uintptr_t i = (uintptr_t) p;
  uintptr_t j = (uintptr_t) q;
  uintptr_t k = i ^ j;
  uintptr_t l = k ^ i;
  int *r = (int *)l;
  // are r and q now equivalent?  
  *r = 11;     // does this have defined behaviour?             
  _Bool b = (r==q); 
  printf("x=%i y=%i *r=%i (r==p)=%s\n",x,y,*r,
         b?"true":"false");  
}

It is unclear whether this algorithm is important in modern practice. One respondent remarks that the XOR list implementation interacts badly with modern pipelines and the space saving is not a big win; another doesn't know of modern usages, though suspects that it is probably still used in places. We don't know whether current compiler alias analysis permits it or not. Our suggested semantics would not allow it.

2.4 Pointer provenance via pointer representation copying

2.4.1 Q13 Can one make a usable copy of a pointer by copying its representation bytes using the library memcpy?

Example: pointer_copy_memcpy.c
#include <stdio.h>
#include <string.h>
int x=1;
int main() {
  int *p = &x;
  int *q;
  memcpy (&q, &p, sizeof p); 
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d  *q=%d\n",*p,*q);  
}

The ISO C11 text does not explicitly address this. In a pre-provenance semantics, before DR260, it did not need to, but now (as it surely should be allowed) the standard needs to guarantee that the result has the appropriate provenance to make it usable. One could allow it by special-casing memcpy() to preserve provenance, but the following questions suggest a less ad hoc approach.

2.4.2 Q14 Can one make a usable copy of a pointer by copying its representation bytes (unchanged) in user code?

Example: pointer_copy_user_dataflow_direct_bytewise.c
#include <stdio.h>
#include <string.h>
int  x=1;
void user_memcpy(unsigned char* dest, 
                 unsigned char *src, size_t n) {
  while (n > 0)  {      
    *dest = *src;
    src += 1;
    dest += 1;
    n -= 1;
  }
}
int main() {
  int *p = &x;
  int *q;
  user_memcpy((unsigned char*)&q, (unsigned char*)&p, 
              sizeof(p));
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d  *q=%d\n",*p,*q);
}

ISO C11 and DR260CR again do not mention this explicitly (though the 6.5p6 effective type text weakly implies it is allowed). We believe it is widely relied on.
(The only exceptions we are aware of are capability machines such as IBM system 38 and descendents, or CHERI. In CHERI you have to copy pointers at pointer types for it to work properly, but capability loads and stores can operate generically, because the capability registers have tag bits. There is also some new tagged memory support for Oracle Sparc, to find invalid pointers.)

Our proposed semantics makes it legal by regarding each representation byte (as an integer value) as having the provenance of the original pointer, and the result pointer, being composed of representation bytes with that provenance, as having the same. We could either insist:

The former is our preferred option. There may not be much reasonable code that would be sensitive to the distinctions between these - perhaps manipulations of pointers where one knows the high-order bytes are common, as in the survey response mentioning encoding 64-bit pointers in 48 bits. Our semantics will permit that.

2.4.3 Q15 Can one make a usable copy of a pointer by copying its representation bytes by user code that indirectly computes the identity function on those bytes?

For example, suppose one reads the bytes of a pointer representation pointing to some object, encrypts them, decrypts them, store them as the representation of another pointer value, and tries to access the object. The following code is a simplified version of this, just using a XOR twice; one should imagine a more complex transform, with the transform and its inverse separated in the code and in time so that the compiler cannot analyse them.

Example: pointer_copy_user_dataflow_indirect_bytewise.c
#include <stdio.h>
#include <string.h>
int  x=1;
void user_memcpy2(unsigned char* dest, 
                  unsigned char *src, size_t n) {
  while (n > 0)  {      
    *dest = ((*src) ^ 1) ^ 1;
    src += 1;
    dest += 1;
    n -= 1;
  }
}
int main() {
  int *p = &x;
  int *q;
  user_memcpy2((unsigned char*)&q, (unsigned char*)&p, 
              sizeof(p));
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d  *q=%d\n",*p,*q);
}

Whether this should be supported is unclear, and DR260 CR does not speak to it
(it calls out the library memcpy and memmove as special cases: "Note that using assignment or bitwise copying via memcpy or memmove of a determinate value makes the destination acquire the same determinate value."). Our proposal would allow it only in the case where bytes of different pointer values are not mixed during the computation.

2.4.4 Q16 Can one carry provenance through dataflow alone or also through control flow?

Our provenance examples so far have all only involved dataflow; we also have to ask if a usable pointer can be constructed via non-dataflow control-flow paths, e.g. if testing equality of an unprovenanced integer value against a valid pointer permits the integer to be used as if it had the same provenance as the pointer. We don't expect that this is relied on in practice, and our proposed semantics does not permit it - we track provenance only through dataflow. This needs to be discussed with respect to current compiler analysis behaviour.

For example, consider a version of the previous indirect memcpy example with a control-flow choice on the value of the bytes:

Example: pointer_copy_user_ctrlflow_bytewise_abbrev.c
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <limits.h>
int  x=1;
unsigned char control_flow_copy(unsigned char c) {
  assert(UCHAR_MAX==255);
  switch (c) {
  case 0: return(0);
  case 1: return(1);
  case 2: return(2);
  ...
  case 255: return(255);
  }
}
void user_memcpy2(unsigned char* dest, 
                  unsigned char *src, size_t n) {
  while (n > 0)  {      
    *dest = control_flow_copy(*src);
    src += 1;
    dest += 1;
    n -= 1;
  }
}
int main() {
  int *p = &x;
  int *q;
  user_memcpy2((unsigned char*)&q, (unsigned char*)&p, 
              sizeof(p));
  *q = 11; // is this free of undefined behaviour?
  printf("*p=%d  *q=%d\n",*p,*q);
}

Similarly, one can imagine copying a pointer via uintptr_t bit-by-bit via a control-flow choice for each bit.

Finally, contrasting with the first two examples above, that recover all the concrete value information of the original pointer, we can consider a variant of the Q5 provenance_basic_using_intptr_t_global_yx.c example in which there is a control-flow choice based on partial information of the intended target pointer (here just whether q is null) and the concrete value information is obtained otherwise:

Example: provenance_basic_mixed_global_offset+4.c
#include <stdio.h>
#include <string.h> 
#include <stdint.h>
#include <inttypes.h>
int  y = 2, x=1;
int main() {
  intptr_t ux = (intptr_t)&x;
  intptr_t uy = (intptr_t)&y;
  intptr_t offset = 4;
  printf("Addresses: &x=%"PRIiPTR" &y=%"PRIiPTR\
         "\n",ux,uy);
  int *q = &y;
  if (q != NULL) {
    int *p = (int *)(ux + offset);
    if (memcmp(&p, &q, sizeof(p)) == 0) {
      *p = 11; // is this free of undefined behaviour?
      printf("x=%d  y=%d  *p=%d  *q=%d\n",x,y,*p,*q); 
    }
  }
}

A semantics that tracks provenance only through dataflow dependency seems to be the simplest option and probably compatible with programming practice; we imagine that none of these idioms occur in normal practice. It would forbid the above examples, while permitting the dataflow bitwise copy example.
This is our preferred option.

Allowing provenance to be propagated via any control-flow dependency would allow all these examples, but it seems clear that the last example should be forbidden, in ISO or de facto semantics, and indeed GCC is again doing an optimisation that would not be sound if it were. In real code we imagine that many pointer accesses are in some way control-flow dependent on others, given the many null-pointer checks required in C, so tracking that would neither be feasible nor useful.

2.5 Pointer provenance and union type punning

2.5.1 Q17 Is type punning between integer and pointer values allowed?

The following example (analogous to the provenance_roundtrip_via_intptr_t.c of Q3) constructs a pointer by casting a pointer to uintptr_t, storing that in a member of a union of that type, and then reading from a member of the union of pointer type.

Example: provenance_union_punning_1_global.c
#include <stdio.h>
#include <string.h> 
#include <inttypes.h>
int x=1;
typedef union { uintptr_t ui; int *p; } un;
int main() {
  un u; 
  int *px = &x;
  uintptr_t i = (uintptr_t)px;
  u.ui = i;
  int *p = u.p;
  printf("Addresses: p=%p &x=%p\n",(void*)p,(void*)&x);
  *p = 11;  // is this free of undefined behaviour?
  printf("x=%d *p=%d\n",x,*p);
  return 0;
}

The ISO standard says "the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type", but says little about that reinterpretation, and DR260 CR does not speak to the provenance of the result.

For any particular implementation, pointers to normal object types might or might not have the same representation as uintptr_t values. That might well not hold for some implementations, but it does for our usual ones.
Our semantics has to have an implementation-defined map for the conversions between pointer representations and uintptr_t in any case, so we can say that this example is allowed, preserving the original provenance, iff that is the identity.

That choice relies on an assumption that compiler alias analysis and optimisation are not assuming that this example is undefined behaviour. At present we have no data either way about that.

2.6 Pointer provenance via IO

2.6.1 Q19 Can one make a usable pointer via IO?

We now consider the extreme example of pointer provenance flowing via IO, if one writes the address of an object to a file and reads it back in. We have three versions: one using fprintf/fscanf and the %p format, one using fwrite/fread on the pointer representation bytes, and one converting the pointer to and from uintptr\_t and using fprintf/fscanf on that value with the PRIuPTR/SCNuPTR formats.
The first gives a syntactic indication of a potentially escaping pointer value, while the others (after preprocessing) do not. Giving just the first in full:

Example: provenance_via_io_percentp_global.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
int x=1;
int main() {
  int *p = &x;
  FILE *f = fopen(
    "provenance_via_io_percentp_global.tmp","w+b");
  printf("Addresses: p=%p\n",(void*)p);
  // print pointer address to a file
  fprintf(f,"%p\n",(void*)p);
  rewind(f);
  void *rv;
  int n = fscanf(f,"%p\n",&rv);
  int *r = (int *)rv;
  if (n != 1) exit(EXIT_FAILURE);
  printf("Addresses: r=%p\n",(void*)r);
  // are r and p now equivalent?  
  *r=12; // is this free of undefined behaviour?                                                           
  _Bool b1 = (r==p); // do they compare equal?                      
  _Bool b2 = (0==memcmp(&r,&p,sizeof(r)));//same reps?
  printf("x=%i *r=%i b1=%s b2=%s\n",x,*r,
         b1?"true":"false",b2?"true":"false");
}

This is used in practice: in graphics code for marshalling/unmarshalling, using %p, in xlib, using SCNuPTR, and in debuggers.

In the ISO standard, the standard text for fprintf and scanf for %p say that this should work: "If the input item is a value converted earlier during the same program execution, the pointer that results shall compare equal to that value; otherwise the behavior of the %p conversion is undefined" (modulo the usual remarks about "compare equal"), and the text for uintptr_t and the presence of SCNuPTR in inttypes.h implies the same there.

To make the standard allow it in a provenance-aware C abstract machine, we suggest that either

After a successful usage of a pointer with wildcard provenance, one could conceivably side-effect the pointer value to collapse its provenance down to the one used. We have no idea whether compilers could or do depend on that; absent a compelling reason otherwise, we do not plan to build that in to our semantics.

2.6.2 Q20 Can one make a usable pointer from a concrete address (of device memory)?

C programs should normally not form pointers from particular concrete addresses. For example, the following should normally be considered to have undefined behaviour, as address 0xABC might not be mapped or, if it is, might alias with other data used by the runtime. By the ISO standard it does have undefined behaviour, consistent with an abstract view of pointers.

Example: pointer_from_concrete_address_1.c
int main() { 
  // on systems where 0xABC is not a legal non-stack/heap 
  // address, does this have undefined behaviour?
  *((int *)0xABC) = 123;
}

But in some circumstances, especially for embedded devices, it is idiomatic to use concrete addresses in C to access memory-mapped devices, e.g.

Example: pointer_from_concrete_address_2.c
#define PORTBASE 0x40000000
unsigned int volatile * const port = 
  (unsigned int *) PORTBASE;
int main() {
  unsigned int value = 0;
  // on systems where PORTBASE is a legal non-stack/heap 
  // address, does this have defined behaviour?
  *port = value; /* write to port */
  value = *port; /* read from port */
}

Our suggestion is to introduce an implementation-defined set of "device memory" addresses (which may depend on linking), and which is guaranteed to be disjoint from normal C-accessible stack, heap, and program memory, for which the creation of such pointers be allowed.

3 Proposed semantics, in detail

4 Proposed Technical Corrigendum

provenance
abstract information associated to each concrete value of pointer or integer type. It can either be the empty provenance, a single provenance for a given object or region, or the wildcard provenance.

When the lvalue conversion is performed on an lvalue with pointer type, if it has the empty provenance and the associated memory location is not within the implementation-defined device memory, the behavior is undefined. If it has a single provenance and the object designated by the lvalue is not the same as or within the object from the single provenance, the behavior is undefined.

NOTE: for the wildcard provenance there is no additional check.

The provenance of a value resulting from an lvalue conversion is has follows: if all the bytes of the designated object have the empty provenance, then so does the resulting value; if some bytes have a single provenance and all the other bytes have the empty provenance, then the resulting value has the same single provenance; if any two bytes have different single provenances, then the resulting value has the empty provenance; if any byte has the wildcard provenance, then so does the resulting value.

There is an implementation-defined set of integer values for which the resulting pointer is within a device region of storage.

A null pointer has the empty provenance

Some expression operators evaluate to an integer value when both operands have integer type. For these the provenance of the resulting integer is as follows: if the value of both operands have empty provenance, so does the result; if the value of one operand has a single provenance and the other has empty provenance, or both values have the same single provenance, the result has that single provenance; if the values of the two operands have different single provenances, the result has the empty provenance; if the value of one operand has the wildcard provenance, the result has the wildcard provenance.

When the operand designates an object, the result has the single provenance of the outermost object containing that object.

For all these operators, when the operand has integer type, the result has the empty provenance.

The result has the empty provenance.

When both operands have integer type, the resulting integer has provenance as described in X.

The result has the provenance of the first operand.

, and has the empty provenance.

The result has provenance as described in X.

The result has provenance as described in X.

The result has provenance as described in X.

, and has the empty provenance.

, and has the empty provenance.

The value may be copied into an object of type unsigned char [n] (e.g., by memcpy)

   to read:

The value may be copied into an object of type unsigned char [n] (e.g., by memcpy), if the value is an integer or a pointer (and therefore has a provenance), the elements of the resulting array all have that provenance.

If the input item is a value converted earlier during the same program execution, the pointer that results shall compare equal to that value

   to read:

If the input item is a value converted earlier during the same program execution, the pointer that results shall compare equal to the most recent such value and have the same provenance