Local search for combinatorial bandits

Submitted by Romaric GAUDEL on
Team
Date of the beginning of the PhD (if already known)
September 2024
Place
Rennes
Laboratory
IRISA - UMR 6074
Description of the subject

The goal of the project is to solve combinatorial optimization problems in the presence of bandit feedback using local search approaches.

Multi-armed bandits address optimization problems in which some parameters must be inferred through repeated interactions. This area has been studied in machine learning since the pioneering work of P. Auer [A02], which led to a class of algorithms applicable to real-world problems. That early work was limited to settings where the possible configurations of the optimization problem (called arms) are independent. Several later works extended the classes of problems covered, notably to combinatorial problems [C15, A21], including optimization on matroids and polymatroids [K14, P19]. However, these approaches rely on global optimization, which comes at a cost in both regret and computation time. Recently, we adapted an algorithm based on local search [C14] to two matching problems, thereby reducing these costs [G21, G22]. However, the approaches we proposed rely on properties specific to these two matching problems, in particular the existence of a small set of parameters sufficient to identify the best matching.
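To make the basic setting concrete, here is a minimal sketch of the UCB1 strategy from [A02] on independent arms: play each arm once, then repeatedly pull the arm maximizing its empirical mean plus an exploration bonus. The Bernoulli environment, arm means, and horizon below are illustrative assumptions, not part of the project description.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal UCB1 sketch on a toy Bernoulli environment:
    returns how many times each arm was pulled."""
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n    # number of pulls per arm
    sums = [0.0] * n    # cumulative reward per arm
    for t in range(horizon):
        if t < n:
            a = t  # initialization: play each arm once
        else:
            # empirical mean + exploration bonus (UCB1 index)
            a = max(range(n), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t + 1) / counts[i]))
        # toy feedback: Bernoulli reward with the arm's mean
        r = 1.0 if rng.random() < arm_means[a] else 0.0
        counts[a] += 1
        sums[a] += r
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

After 2000 rounds the pull counts concentrate on the best arm, which is exactly the behavior whose regret is analyzed in [A02].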

The PhD project will extend the idea of local search to a larger class of combinatorial optimization problems. It will first focus on optimization problems on weighted matroids (e.g., task scheduling) and on the intersection of weighted matroids (e.g., the assignment problem). Subsequently, it will study other classes of problems, such as optimization problems on greedoids.
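As an illustration of the local-search idea on a matroid (a toy sketch, not the algorithms of [G21, G22]), consider the bases of a uniform matroid of rank k, connected by single-element exchanges: instead of optimizing globally, the learner only samples the exchange neighbors of its current base and moves when one looks empirically better. The weight vector, phase length, and stopping rule are assumptions made for the example.

```python
import random

def local_search_bandit(weights, k, rounds_per_phase=100, max_phases=40, seed=0):
    """Toy local-search bandit on bases of a uniform matroid of rank k.
    Each phase samples the current base and its exchange neighbors
    under noisy semi-bandit feedback, then moves to the best candidate."""
    rng = random.Random(seed)
    n = len(weights)
    current = tuple(range(k))  # start from the first k elements
    for _ in range(max_phases):
        # exchange neighborhood: swap one base element for one outside element
        candidates = [current] + [
            tuple(sorted((set(current) - {e}) | {f}))
            for e in current for f in range(n) if f not in current
        ]
        best, best_mean = None, -1.0
        for cand in candidates:
            # semi-bandit feedback: Bernoulli reward per selected element
            est = sum(
                sum(1.0 for _ in range(rounds_per_phase)
                    if rng.random() < weights[e])
                for e in cand
            ) / rounds_per_phase
            if est > best_mean:
                best, best_mean = cand, est
        if best == current:
            break  # empirical local optimum: stop moving
        current = best
    return current
```

Each phase only compares the current base against O(k(n-k)) neighbors, rather than all bases, which is the cost reduction the project aims to obtain with provable regret guarantees.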

Bibliography

[A02] Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256.

[A21] Agarwal, M., Aggarwal, V., Umrawal, A. K., and Quinn, C. (2021). DART: Adaptive Accept Reject Algorithm for Non-Linear Combinatorial Bandits. In Proc. of the 35th AAAI Conference on Artificial Intelligence (AAAI'21), 35(8):6557–6565.

[C14] Combes, R. and Proutiere, A. (2014). Unimodal bandits: Regret lower bounds and optimal algorithms. In Proc. of the 31st International Conference on Machine Learning (ICML'14).

[C15] Combes, R., Talebi, M. S., Proutiere, A., and Lelarge, M. (2015). Combinatorial Bandits Revisited. In Proc. of Advances in Neural Information Processing Systems 28 (NeurIPS'15).

[G21] Gauthier, C.-S., Gaudel, R., Fromont, É., and Lompo, B. A. (2021). Parametric Graph for Unimodal Ranking Bandit. In Proc. of the 38th International Conference on Machine Learning (ICML'21), pp. 3630–3639.

[G22] Gauthier, C.-S., Gaudel, R., and Fromont, É. (2022). UniRank: Unimodal Bandit Algorithms for Online Ranking. In Proc. of the 39th International Conference on Machine Learning (ICML'22), pp. 7279–7309.

[K14] Kveton, B., Wen, Z., Ashkan, A., Eydgahi, H., and Eriksson, B. (2014). Matroid bandits: fast combinatorial optimization with learning. In Proc. of the 30th Conference on Uncertainty in Artificial Intelligence (UAI'14), pp. 420–429.

[P19] Perrault, P., Perchet, V., and Valko, M. (2019). Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits. In Proc. of the 36th International Conference on Machine Learning (ICML'19).

Researchers

Lastname, Firstname
Gaudel, Romaric
Type of supervision
Director
Laboratory
IRISA (UMR 6074) and Inria
Team

Lastname, Firstname
Fromont, Élisa
Type of supervision
Co-director
Laboratory
IRISA (UMR 6074) and Inria
Team
Contact·s
Name
Gaudel, Romaric
Email
Romaric.gaudel@irisa.fr
Name
Fromont, Élisa
Email
elisa.fromont@irisa.fr
Keywords
Combinatorial Multi-armed Bandits, Unimodal Bandits, Learning to Rank