Smart Building of Software Variants

Thesis start date (if known)
01/09/2021
Thesis topic description

Most of today's software is highly configurable so that it can meet users' constraints as well as their functional and performance requirements. For instance, the Linux kernel offers about 15,000 configuration options, and this is certainly a key to its success: configuration lets users embed Linux variants in a multitude of devices with different levels of performance, security, etc. Configurations, though beneficial and needed by many organisations, also create challenges for developers: how can we ensure that, throughout the continuous evolution of a project, all configurations, or at least a subset of them, build successfully?

The goal of this thesis is to develop what we call incremental build of configurations. Given a base configuration, we want to modify it (by resetting the values of some options) and then build it without starting from scratch. Similarly, we aim to build a given set of configurations without starting from scratch each time. The promise is to dramatically reduce the cost of building software, a pressing concern given the environmental and financial costs that companies and public organizations have to bear.
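The first step of rebuilding a modified configuration incrementally is knowing what was modified. The sketch below assumes Linux-style `.config` files (`CONFIG_FOO=y` lines for set options and `# CONFIG_FOO is not set` lines for disabled ones) and computes which options differ between a base configuration and a target one. The helpers `parse_config` and `changed_options` are hypothetical names for illustration, not part of TuxML or Kbuild, and treating an option that is absent from a file as `n` is a simplifying assumption.

```python
import re

# "CONFIG_FOO=value" lines; value may be y, m, a number or a quoted string.
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
# "# CONFIG_FOO is not set" lines, written for explicitly disabled options.
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text: str) -> dict[str, str]:
    """Map each option to its value; explicitly disabled options map to "n".
    Other comment lines (e.g. the generated header) are ignored."""
    options: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if m := _SET.match(line):
            options[m.group(1)] = m.group(2)
        elif m := _UNSET.match(line):
            options[m.group(1)] = "n"
    return options


def changed_options(base: dict[str, str], target: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Options whose value differs between base and target, as (old, new) pairs.
    Assumption: an option missing from one side is treated as "n"."""
    names = sorted(base.keys() | target.keys())
    return {
        name: (base.get(name, "n"), target.get(name, "n"))
        for name in names
        if base.get(name, "n") != target.get(name, "n")
    }


# Usage:
# base = parse_config("CONFIG_A=y\n# CONFIG_B is not set\nCONFIG_C=m\n")
# target = parse_config("CONFIG_A=y\nCONFIG_B=y\n# CONFIG_C is not set\n")
# changed_options(base, target)  # {'CONFIG_B': ('n', 'y'), 'CONFIG_C': ('m', 'n')}
```

Knowing the changed options is only the entry point: the hard part, which this sketch does not attempt, is mapping those options to the source files, headers and generated files whose compilation they affect.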

**Society relies on software, but building software has an enormous cost: this project aims to mitigate this trend**

The usual compilation and build process works quite well when small modifications are made (e.g., modifying one source file), but building several configurations involves large modifications that span numerous source files. There are two extremes: (1) small modifications, with very low cost since incremental compilation is fast; (2) large modifications, with high cost since almost everything has to be recompiled. In between, we want to find a good trade-off between the diversity of the configurations and the cost of compiling them.
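One way to make this trade-off concrete: building N configurations c1, ..., cN from scratch costs the sum of their clean build times, whereas building them as a sequence costs one clean build of c1 plus the incremental rebuild from each c(i-1) to c(i), so the order in which configurations are built matters. The sketch below is a classic nearest-neighbour heuristic, not an algorithm from this project: it orders a fixed set of configurations so that consecutive ones differ in as few options as possible. It assumes configurations are dictionaries from option name to value (missing options count as `n`) and uses the number of differing options as a crude, hypothetical proxy for rebuild cost; the real cost depends on which files those options touch. Choosing *which* configurations to include, i.e. diversity, is a separate question that it does not address.

```python
def distance(a: dict[str, str], b: dict[str, str]) -> int:
    """Number of options whose values differ (an option missing from a dict counts as "n")."""
    return sum(a.get(name, "n") != b.get(name, "n") for name in a.keys() | b.keys())


def greedy_build_order(configs: list[dict[str, str]], start: int = 0) -> list[int]:
    """Nearest-neighbour ordering: after building a configuration, build next the
    not-yet-built configuration closest to it, so each incremental step stays small."""
    if not configs:
        return []
    remaining = set(range(len(configs))) - {start}
    order = [start]
    while remaining:
        last = configs[order[-1]]
        # Ties are broken by index so the result is deterministic.
        nxt = min(remaining, key=lambda i: (distance(last, configs[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def total_distance(configs: list[dict[str, str]], order: list[int]) -> int:
    """Proxy cost of a build sequence: sum of distances between consecutive builds."""
    return sum(distance(configs[i], configs[j]) for i, j in zip(order, order[1:]))
```

Nearest-neighbour is greedy and can be far from the best ordering; it is shown only because it is short and makes the ordering problem visible.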

In a sense, we want to explore the configuration space in a smart, efficient way. There are at least four research questions:
RQ1: Is incremental compilation of configurations safe? (i.e., do we obtain exactly the same binary as with a standard, from-scratch compilation?)
RQ2: What is the gain of applying incremental compilation? (gain: the time saved when compiling, e.g., Linux kernel configurations)
RQ3: Can we explore a diverse set of configurations with incremental compilation?
RQ4: Is there a compilation strategy that reduces the cost of compilation without sacrificing diversity?
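RQ1 can be checked empirically by comparing the artifacts of an incremental build with those of a clean build of the same configuration. The sketch below hashes each artifact with SHA-256 and reports byte-for-byte equality; the function names and the idea of passing a list of relative artifact paths (e.g. `vmlinux`) are illustrative assumptions, not an existing tool. A mismatch does not by itself prove the incremental build is wrong: build outputs can embed timestamps or paths, so this comparison presumes reproducible-build settings on both sides.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large images need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_artifacts(clean_dir: Path, incremental_dir: Path, artifacts: list[str]) -> dict[str, bool]:
    """For each relative artifact path, report whether the clean build and the
    incremental build produced byte-for-byte identical files."""
    return {
        rel: file_digest(clean_dir / rel) == file_digest(incremental_dir / rel)
        for rel in artifacts
    }
```

Hashing rather than comparing files directly makes it cheap to store one digest per artifact and configuration, so that many incremental results can later be checked against a single clean reference.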

Several subject systems will be considered, with different languages, compilers, and build properties. We will start with the configuration space of Linux. We will instrument incremental compilation on top of TuxML, a tool dedicated to the large-scale build of configurations. The ultimate goal is to integrate our idea into mainstream testing infrastructure (e.g., KernelCI), in order to explore more configurations at lower cost. We will then consider other systems, such as JHipster or Chromium.

The expected outcome of this research is to lay the foundations of incremental build, to invent new algorithms integrated into mainstream compilers and build systems, and to assess the solution on widely used software projects.
 


Thesis supervisors

Name
Jean-Marc Jézéquel
Supervision role
Thesis director
Research unit
IRISA

Name
Mathieu Acher
Supervision role
Second co-director
Research unit
IRISA

Name
Djamel Khelladi
Supervision role
Co-supervisor
Research unit
IRISA
Keywords
software, languages, compilation, build, configuration, variability, software product line, sampling