Computational models of visual attention aided by geo-localized and time stamp information


PERCEPT team (IRISA): the word "percept" comes from the Latin perceptum. It refers to the perception of an object or, more generally, of visual phenomena. The human visual system is one of the most important parts of our central nervous system. It gives us the ability to detect and interpret visual information from our environment. Understanding the mechanisms at work in the visual system, and reproducing them computationally, is a huge challenge. The PERCEPT team aims to understand the complex phenomena taking place in our visual system and to express them algorithmically.

About the ANR DISSOCIE project

DISSOCIE stands for “Détection automatIque des SaillanceS du point de vue des Opérateurs et Compression Intelligente des vidéos de dronEs” or “Automated Detection of SaliencieS from Operators' Point of View and Intelligent Compression of DronE videos”.

Home page: https://sites.google.com/insa-rennes.fr/projetanrastrid-dissocie

Aerial surveillance, monitoring and observation with drones present major challenges for defence, security and the environment. For example, France and Britain have agreed to invest 2 billion euros in a project to build next-generation multi-role drones capable of carrying out surveillance and observation missions, identifying targets and launching strikes on enemy territory, with operational capacity planned beyond 2030. However, observation, target identification and surveillance missions are currently carried out by human operators, who cannot fully and effectively exploit all the available drone videos. Advances in eye-tracking, visual attention modelling, human operator modelling and intelligent compression open up new possibilities to meet these challenges.

In this context, the DISSOCIE project aims to develop automatic and semi-automatic operator models capable of detecting salient areas from the point of view of human operators. These models will take into account the low-level characteristics of the salient content in the videos, geo-temporally localized contextual information, and the expertise and detection strategies of human operators. Machine learning can be used at different levels of this modelling process. The new HEVC video compression standard and scalable coding will also be exploited in this project to make it more efficient for experts to re-watch the videos. The originality of the project lies in addressing these challenges jointly, building on the complementary scientific expertise gathered in the consortium, especially in eye-tracking analysis, visual fixation prediction, visual attention modelling, salient object detection and segmentation, human observer modelling, and video compression. The project is broken down into 4 tasks: construction of a ground truth (Task T1), development of models and algorithms of geo-temporally localized saliency (Task T2), human operator modelling via machine learning and its integration with the geo-temporally localized saliency (Task T3), and intelligent compression based on salient regions and metadata insertion (Task T4). The DISSOCIE consortium, formed by three academic members (IETR/VADDER, IRISA/PERCEPT, LS2N/IPI), will carry out an applied research program.


The postdoc will contribute mainly to the design of new computational models and algorithms of geo-temporally localized saliency (Task T2 of the project).
The main objective is to design a computational model of visual attention for the specific case of drone video sequences. This model would be based on a deep network architecture and trained on existing datasets (such as [1]) as well as on a proprietary dataset coming from Task T1 of the project. In addition, we aim to use prior knowledge, namely time and geographical information when available, to help the model detect salient areas.

The postdoc will contribute to the definition and development of this model. They will also contribute to the life of the project (production of the corresponding deliverables, meetings) and to the dissemination of the scientific results.

Skills and profile:

  • PhD in computer science, data science, signal/image processing, computer vision or applied maths.
  • Background in visual attention modeling, eye-tracking, computational modeling, image/video coding, image/video processing and computer vision.
  • Excellent programming skills.
  • Basic (or proven) knowledge of deep learning (Python / Keras).
  • Fluency in English.
Workplace: 
IRISA Rennes, Campus de Beaulieu
Contract type: 
Contract duration (in months): 
Gross monthly salary (€): 
Around 2200 euros
Expected start date: 
As soon as possible

Send your CV and motivation letter to olemeur@irisa.fr

Nationality: the candidate must be European (EU or Swiss)


[1] Robicquet, A., Sadeghian, A., Alahi, A., & Savarese, S. (2016, October). Learning social etiquette: Human trajectory understanding in crowded scenes. In European conference on computer vision (pp. 549-565). Springer, Cham. http://cvgl.stanford.edu/projects/uav_data/