Learning for Distributed Control

Published on
Team
Thesis start date (if known)
Autumn 2021
Location
Rennes
Research unit
IRISA - UMR 6074
Thesis subject description

Machine learning techniques have achieved spectacular breakthroughs in recent years, in domains like complex pattern recognition and classification, or the design of optimal strategies in large games. These successes have also renewed interest in reinforcement learning approaches to design control laws for complex systems that can easily be simulated. The overall objective of the L4DisCo project is to capitalize on these successes and extend their application range to the control of complex cyber-physical systems. The challenges lie in the size and complexity of the systems we target, in the extension to a multi-agent framework (distributed control), and in the theoretical connection with traditional concepts of control theory such as convergence speed and stability.

One of our target use cases, on smart transportation systems, illustrates these goals. We consider the regulation problem on a subway line of arbitrary topology, where each train is governed by its own control law. By controlling speeds and dwell times at stations, one would like to balance the spacing of trains, minimize energy consumption by synchronizing stops and departures, catch up quickly on delays (which tend to accumulate and propagate, a known instability of loaded lines), recover from jammed situations, etc. Such systems can rather easily be simulated with formal (quantitative) models of different granularities (a deliberately simplified toy model is sketched after the list below), but centralized, and a fortiori distributed, control laws are extremely difficult to design and analyze. While in principle reinforcement learning could yield a centralized controller, one is quickly faced with several difficulties, among which:

  • the need to mix symbolic decisions with quantitative ones,
  • the huge size of the underlying model, and hence the need to select an appropriate model granularity and to understand its influence,
  • the slow convergence of learning methods,
  • the complexity of a centralized control law, which is less practical and less readable than a distributed one,
  • the lack of guarantees on the performance and on the safety of the synthesized controllers.
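
To make the use case concrete, here is a minimal, purely illustrative sketch of such a simulation model: a circular line carrying a few trains, where the only control decision is whether each train is held in place for one time step, and the reward penalizes uneven spacing (headway variance). All names and constants (`ToyMetroLine`, `headway_variance`, the speed and noise levels) are assumptions made for this example, not models used in the project.

```python
import numpy as np

class ToyMetroLine:
    """Toy circular subway line: n trains on a ring of unit length.

    State: train positions in [0, 1). Control: per-train hold decision
    (0 = move this step, 1 = hold in place). The reward penalizes unevenly
    spaced trains, a crude proxy for the regulation objective.
    """

    def __init__(self, n_trains=6, speed=0.01, seed=0):
        self.n = n_trains
        self.speed = speed
        self.rng = np.random.default_rng(seed)
        self.pos = np.sort(self.rng.uniform(0.0, 1.0, self.n))

    def step(self, hold):
        hold = np.asarray(hold)
        noise = self.rng.normal(0.0, 0.1 * self.speed, self.n)  # travel-time jitter
        self.pos = (self.pos + (1 - hold) * (self.speed + noise)) % 1.0
        return self.pos.copy(), -self.headway_variance()

    def headway_variance(self):
        # Gaps between consecutive trains around the ring; evenly spaced
        # trains give variance 0.
        p = np.sort(self.pos)
        gaps = np.diff(np.concatenate([p, p[:1] + 1.0]))
        return float(np.var(gaps))

if __name__ == "__main__":
    env = ToyMetroLine()
    for _ in range(500):
        # Naive centralized heuristic: hold a train whenever the gap to the
        # train ahead shrinks below 80% of the ideal (even) spacing.
        order = np.argsort(env.pos)
        p = env.pos[order]
        gaps = np.diff(np.concatenate([p, p[:1] + 1.0]))
        hold = np.empty(env.n, dtype=int)
        hold[order] = (gaps < 0.8 / env.n).astype(int)
        env.step(hold)
    print("final headway variance:", env.headway_variance())
```

Even this toy version hints at the difficulties listed above: the action is symbolic (hold or not) while the dynamics are quantitative, and the centralized rule already needs the full position vector; the real models add line topology, dwell times, energy and delays on top of it.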

We plan to address the above challenges by capitalizing on our experience in the approximation of models and algorithms for large-scale systems, in particular dynamical systems handling quantitative features like time, probabilities and costs. Depending on the skills and interests of the selected candidate, the research will explore some of the following topics:

  • understanding the influence of model granularity on the performance of the synthesized controllers; this covers the design of model approximation schemes tailored to target performance levels of the controlled system,
  • designing learning algorithms for a distributed controller, where each agent operates on fine-scale knowledge of its local state and coarser-scale knowledge of the overall system state (see the sketch after this list),
  • learning control laws with guaranteed safety properties, or with upper bounds on the likelihood of their violation,
  • providing explainable strategies,
  • characterizing the performance of the designed controllers, in terms of stability and convergence speed.
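
As an illustration of the second topic (and, in a crude form, the third), the sketch below outlines a per-train learning agent that observes a fine-grained local feature (its own gap to the train ahead) together with a coarse global feature (a bucketed summary of line-wide spacing), plus a simple safety shield that overrides unsafe departures. This is only a hypothetical sketch under strong simplifications: tabular Q-learning, two actions, hand-picked discretizations; none of the names (`LocalAgent`, `safety_filter`) or bounds come from the project.

```python
import numpy as np
from collections import defaultdict

class LocalAgent:
    """Per-train Q-learning agent mixing a fine local view with a coarse global one."""

    def __init__(self, n_local_bins=10, n_global_bins=3,
                 alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_local, self.n_global = n_local_bins, n_global_bins
        self.q = defaultdict(lambda: np.zeros(2))  # (local_bin, global_bin) -> Q(., action)
        self.rng = np.random.default_rng(seed)

    def observe(self, local_gap, global_spread):
        # Fine discretization of the agent's own headway, coarse discretization
        # of a line-wide spacing indicator (e.g. headway variance).
        local_bin = min(int(local_gap * self.n_local), self.n_local - 1)
        global_bin = min(int(global_spread * 100), self.n_global - 1)
        return (local_bin, global_bin)

    def act(self, obs):
        # Epsilon-greedy choice between 0 = depart/move and 1 = hold.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(2))
        return int(np.argmax(self.q[obs]))

    def update(self, obs, action, reward, next_obs):
        # Standard one-step Q-learning update.
        target = reward + self.gamma * np.max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])

def safety_filter(action, local_gap, min_gap=0.05):
    """Crude shield: force a hold whenever the headway drops below a minimum,
    regardless of what the learned policy proposes."""
    return 1 if local_gap < min_gap else action
```

Plugged into a simulator such as the toy line above, each train would keep its own Q-table while sharing the same coarse global observation; at realistic scales the table would give way to function approximation, and the hard-coded shield to safety mechanisms with the kind of formal guarantees targeted by this thesis.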

Context: this thesis will be partly supported by the ANR project Maveriq (2021-2025). Maveriq aims at developing a unified framework to deal with quantitative discrete-event models, involving time, probabilities, costs, etc., under an operator-theoretic umbrella. A specific focus is on approximation methods, both for the systems themselves and for verification/control/estimation algorithms on such models. For the motivating use case, we will rely on knowledge acquired through our long-standing collaboration with Alstom.


Bibliography
  • Borja Balle, Xavier Carreras, Franco M. Luque, Ariadna Quattoni. Spectral Learning of Weighted Automata: A Forward-Backward Perspective. Machine Learning (2014), 96:33–63.
  • Damien Busatto-Gaston, Debraj Chakraborty, Jean-François Raskin. Monte Carlo Tree Search guided by Symbolic Advice for MDPs. July 2020. arXiv:2006.04712v2.
  • Taolue Chen, Vojtech Forejt, Marta Z. Kwiatkowska, Aistis Simaitis, Clemens Wiltsche. On stochastic games with multiple objectives. In proceedings of MFCS 2013, LNCS 8087, pp. 266–277.
  • Bruno Adeline, Pierre Dersin, Eric Fabre, Loïc Hélouët, Karim Kecir. An efficient evaluation scheme for KPIs in regulated urban train systems. In proceedings of RSSRail 2017, LNCS 10598, pp. 195–211.
  • Matthieu Pichené, Sucheendra Palaniappan, Eric Fabre, Blaise Genest. Modeling Variability in Populations of Cells using Approximated Multivariate Distributions. IEEE/ACM Transactions on Computational Biology and Bioinformatics 17(5), Sept.–Oct. 2020, pp. 1691–1702.
List of thesis supervisors

Last name, first name
Eric Fabre
Type of supervision
Thesis director
Research unit
Rennes
Contact(s)
Keywords
reinforcement learning, formal methods, multi-agent systems, model abstractions, optimal control