A Precise and Abstract Memory Model for C using Symbolic Values

Real life C programs are often written using C dialects which, for the ISO C standard, have undefined behaviours.
In particular, according to the ISO C standard, reading an uninitialised variable is an undefined behaviour and low-level pointer operations are implementation defined.
We propose a formal semantics capable of capturing those behaviours for the C dialect of the CompCert compiler.
Our semantics builds upon a novel memory model leveraging a notion of symbolic values. Symbolic values are used to delay the evaluation of operations and are normalised to standard values at strategic points in the semantics.
We show that the most precise normalisation is computable and that a slightly relaxed normalisation can be efficiently implemented using SMT solvers.
The semantics is executable and our experiments show that the enhancements of our semantics are mandatory to give a meaning to the allocation functions of an implementation of the C standard library.

Page last updated on: 2014-06-02

Paper

This work has been submitted to the 12th Asian Symposium on Programming Languages and Systems (APLAS 2014).

Coq Development

Download the last stable version of the Coq development: [symbolic.tar.gz]
(tested with OCaml 4.00.1 and Coq 8.4pl2)

Requirements

To compile our interpreter, you need to have OCaml and Coq installed on your machine.
You also need to have Z3 installed, as it is used to normalise expressions in the semantics.

Installation instructions

Note: You need OCaml and Coq installed on your machine.
  1. Download and extract the archive (tar xvzf symbolic.tgz).
  2. Inside the symbolic directory, run make to compile the sources (this may take several minutes).

Usage instructions

A list of options is available by typing ./ccomp -help. However, there is a script that allows to easily run a C file in our semantics. This script is located at /tests/pdclib/run:

Example

If you don't want to download and install OCaml, Coq, and Z3, here is an example of running tests/pdclib/examples/low-level-pointer-arithmetic.c.

char hash(void* ptr){
  uintptr_t p = (uintptr_t) ptr;
  return p | (p >> 8) | (p >> 16) | (p >> 24);
}

int main(){
  int * p = (int *)malloc(sizeof(int));
  *p = 42;
  int * q = (int *) ((uintptr_t) p | (hash(p) & 0xF));
  int * r = (int *) (((uintptr_t) q >> 4) << 4);
  printf("p = %p, r = %p\n", p, r);
  printf("p = %d, r = %d\n", *p, *r);
  return *r;
}

This program takes advantage of the fact that malloc returns pointers aligned on a 16-byte boundary, i.e. the last four bits are null. This unused space is used to store a hash of the pointer.

Below are the runs of the program with each semantics

With symbolic values
$ ./run -f examples/low-level-pointer-arithmetic.c 
p = <237+16>, r = <237+16>
p = 42, r = 42
Program performed 206 simplifications.
With gcc
$ ./run -f examples/low-level-pointer-arithmetic.c -g
$ ./a.out 
p = 0xf7760010, r = 0xf7760010
p = 42, r = 42
With CompCert 2.2 without symbolic values
$ ./run -f examples/low-level-pointer-arithmetic.c -n
Stuck state: in function sys_alloc, expression
   =
    16U
      + ((16 - ((unsigned int) (void *) (0 + 8U) & 16 - (unsigned int) 1)
           & 16 - (unsigned int) 1)
           + (sizeof(struct malloc_segment) + sizeof(unsigned int)
                + (16 - (unsigned int) 1) & 
               ~(16 - (unsigned int) 1))
           + (sizeof(struct malloc_chunk) + (16 - (unsigned int) 1)
               & ~(16 - (unsigned int) 1)) + 
          16) + (mparams.granularity - (unsigned int) 1)
      & ~(mparams.granularity - (unsigned int) 1)
Stuck subexpression: 0 + 8U
ERROR: Undefined behavior