Below are listed some of the projects I have been working on (reverse-chronological order).
Certified compilation of multi-threaded Java
This work aims at providing a certified compiler of Java concurrent programs. Ultimately, an input bytecode program will compile down to an provably equivalent assembly program. This entails choosing a formal semantics for the input language. The Java Memory Model, as is, is not suitable, and this is twofold. First, its formal definition is known to be broken on several points. Second, the race-commiting semantics it defines is too complex to even think about formally proving a compiler with regard to it.
We define a new Java Memory Model that solves these issues: it is more restricted than the Java Memory Model, while still allowing the compiler to perform enough optimizations and taking into account the relaxations of the underlying hardware. We provide two equivalent view of this model. The axiomatic view is easier for the programmer to understand even in the presence of races. The operational view of the model is amenable for reasoning about program transformations and static analysis, and better matches the semantics of the compiler's lower level representations.
References:
- Plan B: A Buffered Memory Model for Java. D. Demange, V. Laporte, L. Zhao, D. Pichardie, S. Jagannathan, J. Vitek. Submitted. 2011.
Static Single Assignment form
Static Single Assignment (SSA) form is an intermediate representation of program in which program variables are assigned exactly once, thus making explicit in the program syntax the link between the variable definition program point and program points where it is used (def-use chains). Introduced in the late 80's by Cytron et al., SSA has been extensively used in many static analysis, because of the strong invariants that its structural constraints trigger.
Despite its large use in practice, and the vast body of academic work on SSA-based analysis, little work has been done on proving them sound with respect to a formal semantics of SSA. This work fills this gap by formalizing the intuitive semantics of SSA, and identifying the key lemma that allows SSA to be thought of as an equational form of programs. We illustrate how these invariants can be used to prove an SSA-based optimization: Global Value Numbering.
Additionally, this work proposes to use the a posteriori validation technique to prove correct a family of SSA generation algorithms. This allows abstracting all the heuristics and implementation tweaks used in these generation algorithms, and focusing on their common properties. The validator is proved to be sound, i.e. it only accepts SSA candidates that are semantically equivalent to the initial program. It is also proved to be complete: it will never reject correct SSA candidates generated from dominance-frontier based algorithms.
This work has been formalized within the CompCert C verified compiler, by adding an SSA-based middle-end. The middle-end performs (i) conversion from RTL to SSA form (ii) SSA-based optimizations, including Global Value Numbering and (iii) destruction of the SSA form back to RTL.
Website: http://www.irisa.fr/celtique/ext/compcertSSA/
References:
- [BDP12] A formally verified ssa-based middle-end - Static Single Assignment meets CompCert. G. Barthe, D. Demange, and D. Pichardie. In Proc. of 21th European Symposium on Programming (ESOP 2012), Lecture Notes in Computer Science. Springer-Verlag, 2012. [.pdf]
- [BDP11] A formally verified SSA-based middle-end compiler. G. Barthe, D. Demange and D. Pichardie. Research Report 634702, INRIA, 2011. [.pdf]
Java bytecode intermediate representation
This work deals with Static analysis of Java program at the bytecode level. This can be either (i) preferred to Java source program analysis (simpler syntax and semantics) (ii) or required in the case where the Java compiler cannot be trusted. One particular difficulty about static analysis of bytecode is that the Java Virtual Machine is stack-based: the bytecode manipulates an operand-stack (to compute expressions) that has to be properly handled by analyses, for precision concern. One solution is to abstract the operand stack with symbolic information, but this might deeply impact the complexity of the analysis. An alternative is to perform the analysis on an intermediate representation of the Bytecode, but the generation of this IR must in turn be proved semantics-preserving.
We propose here a provably correct stackless IR for Java Bytecode, in which side-effect free expression trees are recovered. It also fold the object constructor code sequence, that might be needed for some analyses. Finally, the IR provides dynamic explicit checks instructions, that prevents the exception throwing order to be altered. The transformation is proved to be correct with regards to an operational semantics of Java bytecode. This IR is now part of the larger project of Sawja (Static Analysis Workshop for Java), a framework for designing and implementing Java bytecode analysis -- and includes a full Java bytecode class file manipulation library. Sawja is written in Ocaml and developped in the Celtique research team. Sawja has been used to develop several static analysis (see URL below), and the use this IR turned out to be very helpful in simplifying their conception.
Website: http://sawja.inria.fr/
References:
- [DJP10] A provably correct stackless intermediate representation for Java bytecode. D. Demange, T. Jensen, and D. Pichardie. In Proc. of the 8th Asian Symposium on Programming Languages and Systems (APLAS 2010), volume 6461 of Lecture Notes in Computer Science, pages 97-113. Springer-Verlag, 2010. [.pdf]
- [HBBD10] Sawja: Static Analysis Workshop for Java. L. Hubert, N. Barré, F. Besson, D. Demange, T. Jensen, V. Monfort, D. Pichardie, and T. Turpin. In Proc. of the International Conference on Formal Verification of Object-Oriented Software (FoVeOOS 2010), volume 6528 of Lecture Notes in Computer Science, pages 82-106. Springer-Verlag, 2011. [.pdf]
- [DJP09] A Provably Stackless Intermediate Representation for Java Bytecode. D. Demange and T. Jensen and D. Pichardie. Research Report 7021, INRIA, 2009. Extended version of [DJP10][.pdf]
Non-interference
Recent work of Askarov et al (ESORICS’08) shows that the notion of non-interference used in most of the tools for analysing information flow security of program is not as secure as it was thought before: for programs which perform output, the amount of information leaked by a Denning style analysis is not bounded; the good news is that if secrets are chosen to be sufficiently large and sufficiently random then they cannot be effectively leaked at all. But secrets cannot always be made sufficiently large or sufficiently random. This work addresses this issue by developping a notion of secret-sensitive noninterference in which “small” secrets are handled more carefully than “big” ones. We illustrate the idea with a type system enforcing this policy, and which combines a liberal Denning-style analysis with a more restrictive system according to the nature of the secrets at hand.
References:
- [DS09a] All Secrets Great and Small. D. Demange and D. Sands. In Programming Languages and Systems. 18th European Symposium on Programming, ESOP 2009, number 5502 in LNCS. Springer Verlag, 2009. [.pdf]
- [DS09b] All Secrets Great and Small. D. Demange and D. Sands. Technical report, Chalmers University of Technology and Goteborgs University, March 2009. Extended version of [DS09a]. [.pdf]