Multimodal detection of Fake News

Published Mon 24/01/2022 - 11:21
Category
Research engineer
Research team
The position will be supervised by Ewa Kijak and Vincent Claveau. It is to be filled as soon as possible, for a maximum duration of 18 months.
EU citizens only.

The objective of the project is to propose and implement robust approaches to detect manipulations in documents combining text and images, adapted in particular to social networks. To this end, we wish to exploit multimodal information together with contextual information (a knowledge base of public figures, the publication context).

The research questions that arise are the following:

  • How to leverage users' reactions to detect fake news?
  • Which multimodal representations are most suitable to deal with text/image news [2]?
  • How can advances on multimodal tasks, such as generating one modality from another, be exploited to extract new multimodal representations?
  • What database search strategies should be used? How should the returned results be exploited?
  • How can attention layers be used to highlight the words or regions of the image that contributed the most to the decision?
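As a rough illustration of the last question, a single scaled dot-product attention step yields one weight per token that can be read as that token's contribution to the decision. A minimal sketch with toy one-hot embeddings follows; the token names, dimensions, and query construction are illustrative only, not part of the project:

```python
import numpy as np

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a set of keys."""
    scores = keys @ query / np.sqrt(len(query))
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

# Toy example: a "decision" query attending over 4 caption tokens.
tokens = ["breaking", "photo", "shows", "event"]
keys = np.eye(4, 8)                     # one-hot stand-ins for token embeddings
query = keys[1]                         # decision vector aligned with "photo"

w = attention_weights(query, keys)
print(tokens[int(np.argmax(w))])        # -> photo
```

The same read-out applies to image regions: with patch embeddings as keys, the highest-weighted patches indicate which regions drove the decision.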


These questions will be explored through two use cases which, although frequent in practice, are rarely addressed in the literature.


Use case 1: image reuse (repurposing)

The first case is that of image reuse, in which the image is authentic but the accompanying metadata (usually the caption) has been manipulated. The problem naturally calls for representations of text and image, but goes beyond the simpler task of text-image matching. The few recent attempts to detect such reuse [6, 10] have been evaluated on small and unrealistic datasets.

The detection will exploit contextual information in the form of a knowledge base, such as the full set of publications of major news media. Multimodal approaches will be implemented to retrieve information and to characterize its semantic consistency: joint or common representations of documents [2], but also representations of one modality conditioned on the other [1, 4, 9].
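To make this concrete, the sketch below scores a caption both against its image and against captions of matching images retrieved from the knowledge base, assuming all documents have already been embedded in a single joint space (for instance by a CLIP-style encoder). The toy embeddings, function names, and equal weighting are hypothetical choices for illustration:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def consistency_score(image_emb, caption_emb, retrieved_caption_embs):
    """Semantic consistency of a caption with its image and with captions of
    near-duplicate images retrieved from the knowledge base. All embeddings
    are assumed to live in one common (joint) space."""
    direct = cosine(image_emb, caption_emb)
    context = max(cosine(caption_emb, r) for r in retrieved_caption_embs)
    return 0.5 * (direct + context)       # illustrative equal weighting

# Toy embeddings: the repurposed caption disagrees with the archived one.
image = np.array([1.0, 0.0, 0.0])
original_caption = np.array([0.9, 0.1, 0.0])
repurposed_caption = np.array([0.0, 0.0, 1.0])
archive = [original_caption]

print(consistency_score(image, original_caption, archive) >
      consistency_score(image, repurposed_caption, archive))  # True
```

A low score flags a caption that matches neither the image content nor what trusted sources published alongside near-duplicates of that image.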


Use case 2: videos of manipulated people (reenactment)

The second case is that of videos in which a person is manipulated by expression transfer without identity modification [5]. Current deepfake databases and the corresponding detection algorithms focus on face-swapping modifications (such as the recent Facebook challenge). These algorithms, based on deep networks, generalize poorly to new examples [11]. Here again, a detection approach based on multimodality makes sense and is independent of the techniques used to generate the content: modified videos often make a person speak differently from the original. In connection with the first use case, the aim is to jointly represent visual characteristics (typically, a face) with linguistic information.
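A hedged sketch of this idea: if visual (face) features and the speech or linguistic features aligned to the same frames are embedded comparably, a simple frame-wise agreement score already separates a face that tracks the audio from one animated independently. All features below are synthetic toys; a real system would obtain them from learned encoders:

```python
import numpy as np

def reenactment_score(visual_feats, speech_feats):
    """Mean frame-wise cosine agreement between visual (face) features and
    the speech/linguistic features aligned to the same frames. Low agreement
    hints that the face was re-animated independently of the audio."""
    num = (visual_feats * speech_feats).sum(axis=1)
    den = (np.linalg.norm(visual_feats, axis=1) *
           np.linalg.norm(speech_feats, axis=1))
    return float((num / den).mean())

rng = np.random.default_rng(1)
speech = rng.normal(size=(30, 16))                  # 30 frames, 16-dim feats
genuine = speech + 0.1 * rng.normal(size=(30, 16))  # face tracks the speech
forged = rng.normal(size=(30, 16))                  # face animated separately

print(reenactment_score(genuine, speech) > reenactment_score(forged, speech))  # True
```

Because the score only compares modalities, it does not depend on which generation technique produced the video, which is the point of the multimodal approach outlined above.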


Profile / Skills
The candidate should have:
- a PhD in Computer Science, in relation with AI/machine learning
- experience of deep learning for NLP and/or IR and/or Computer Vision
- programming skills: Python, PyTorch or TensorFlow, HuggingFace
- fluency in English is required; French would be appreciated

The agency funding this project may require EU citizenship.
Required degree
PhD in computer science; engineering degree or equivalent
Place of work
IRISA Rennes
Gross monthly salary
Salary according to the CNRS pay scale (minimum annual gross salary: €31,000 including charges, i.e. about €2,000 net per month)
To apply, visit: