Optimal transport for the classification of structured data

Team and supervisors
Department / Team: 
Team website: 
Thesis supervisor
Nicolas Courty
Co-supervisor(s)
Laetitia Chapel
Romain Tavenard
Thesis subject
In numerous applications, data are not in vectorial form but are structured: they are described by a set of parts linked by relationships or constraints. For instance, an image can be represented at different scales by a hierarchical representation, and time series have an intrinsic internal structure that must be taken into account. As a consequence of this structure, classical machine learning techniques cannot be applied directly. Two solutions are usually implemented to solve this problem:
i) data are first transformed to bring them back to a vectorial form (e.g. through a feature extraction step in the time series context, or by stacking all the node attributes when dealing with a tree). Nevertheless, providing meaningful features is not straightforward;
ii) similarity measures between the different subparts are computed, then combined (with a convolutional kernel, for instance). This approach usually suffers from high computational costs, which prevent it from being used in a large-scale context.
In both cases, the solution is problem-dependent: it varies with the type of structure, the type of features, etc.
Meanwhile, optimal transport (OT) has emerged as a powerful tool for computing distances (a.k.a. Wasserstein or earth mover's distances) between empirical distributions of data, thanks to new computational schemes that make the transport computation tractable. It has wide applications in computer vision, statistics and imaging, and has recently been introduced in the machine learning community to efficiently solve classification and transfer learning problems. The advantage of OT is that it can compare empirical probability measures, possibly high-dimensional, while taking into account the geometry of the underlying metric spaces and handling discrete measures.
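As an illustration of how a transport cost between two discrete empirical measures can be computed, here is a minimal sketch. The text does not name a specific solver, so the choice of entropic regularisation solved by Sinkhorn iterations is an assumption, as are the function name `sinkhorn`, the parameters `reg` and `n_iter`, and the toy data.

```python
import math


def sinkhorn(a, b, cost, reg=1.0, n_iter=500):
    """Entropy-regularised OT between histograms a and b (illustrative sketch).

    a, b : lists of non-negative weights, each summing to 1
    cost : n x m list of lists, cost[i][j] = cost of moving mass from i to j
    reg  : strength of the entropic regularisation (assumed value)
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows so that the row sums of the plan match a
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column sums of the plan match b
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan T_ij = u_i * K_ij * v_j
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    # Transport cost <T, C> of the regularised plan
    value = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, value


# Toy example: two empirical measures on the real line with uniform weights
x = [0.0, 1.0, 2.0]
y = [0.5, 1.5, 2.5]
a = [1.0 / len(x)] * len(x)
b = [1.0 / len(y)] * len(y)
# Squared Euclidean ground cost between support points
cost = [[(xi - yj) ** 2 for yj in y] for xi in x]

plan, value = sinkhorn(a, b, cost, reg=1.0)
print(value)
```

The ground cost matrix is where the geometry of the underlying space enters. The returned value is the transport cost of the regularised plan, which upper-bounds the unregularised optimum and approaches it as `reg` decreases.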
The objective of the PhD is to define a new unified paradigm for the classification of structured data by leveraging the theory of optimal transport. Two directions will be explored:
i) integrating the information carried by the structure directly into the OT problem. In particular, the definition of a dedicated regularization term will be explored;
ii) integrating the structure directly into the distance matrix between the data, building upon the notion of Gromov-Wasserstein distances, for instance.
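For reference, these two directions can be read against the standard discrete formulations below. The notation is an assumption and does not come from the original text: $a$ and $b$ are the weight vectors of the two measures, $C$ is a ground cost matrix, $T$ is the transport plan, $\Omega$ stands for a structure-aware regularizer still to be designed, $\lambda \ge 0$ is its weight, and $C^{(1)}$, $C^{(2)}$ are the intra-domain distance matrices of the two structured objects. The first problem is a regularized OT problem (direction i), the second is the Gromov-Wasserstein problem with a squared loss (direction ii).

```latex
\[
\Pi(a,b) = \left\{ T \in \mathbb{R}_{+}^{n \times m} \;:\; T \mathbf{1}_m = a,\ T^{\top} \mathbf{1}_n = b \right\}
\]
\[
\text{(i)} \quad \min_{T \in \Pi(a,b)} \ \langle T, C \rangle + \lambda\, \Omega(T)
\]
\[
\text{(ii)} \quad \min_{T \in \Pi(a,b)} \ \sum_{i,k=1}^{n} \sum_{j,l=1}^{m}
\left| C^{(1)}_{ik} - C^{(2)}_{jl} \right|^{2} T_{ij}\, T_{kl}
\]
```

In (i) the structure would enter through the shape of $\Omega$, whereas in (ii) it enters through the pairwise matrices $C^{(1)}$ and $C^{(2)}$, which only compare distances within each object and therefore do not require the two objects to live in the same space.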
The aim is to produce a unified framework for many types of structured data, integrating problem specificities into the shape of the regularization or of the distances. Particular emphasis will be put on the development of efficient solutions able to deal with large datasets.
From an application point of view, particular attention will be given to remote sensing datasets. Indeed, hierarchical representations are increasingly used to model the content of an image, providing an effective framework for image classification. In addition, with the launch of new satellites, the spatial and temporal resolution of remote sensing images has increased considerably, calling for the development of efficient algorithms.
OBELIX is a research team at IRISA (http://www.irisa.fr/) dedicated to environmental observation problems, which involve advanced image processing and machine learning techniques. The team is co-located in Rennes and Vannes, two beautiful cities in Brittany, France.
More information is available in the following PDF file.
Start date: 
Autumn 2017
Keywords: 
structured data, machine learning, optimal transport, regularisation
IRISA - Campus de Tohannic - Vannes and Université Rennes 2