TEXMEX Research Team
Efficient Exploitation of Multimedia Documents
Exploration, Indexing, Navigation, and Access to Very Large Databases

PhD thesis subject proposed for fall 2010

Multimodal content-based video analysis for video on demand

Key words

Video analysis, multimedia, video-on-demand, statistical models, Bayesian networks

Description

Understanding video content is fundamental to many video analysis applications. Such understanding requires extracting high-level semantic information from video documents. As video databases grow, manual extraction of this information is no longer practical, which motivates automatic techniques. This PhD thesis will therefore focus on studying and developing new techniques for producing a semantic description of multimedia documents.

The work will be carried out as a collaboration between an industrial partner, Thomson, and a public research institute, IRISA/INRIA. In this context, the targeted applications will belong to the innovation axis for Thomson products.

The PhD work will focus on two applications in the context of video-on-demand (VoD). In this domain, any content-related information that helps an end user choose from the video catalogue is welcome. The objective is therefore to automatically generate metadata at a high semantic level, which can be attached to the videos and offered to a user navigating a content database.

For instance, a criterion for detecting 'sensitive scenes' would allow a user to apply parental control to films in the catalogue that were not tagged as 'adult content'. For these films, the 'most sensitive' scenes can be shown to the user for review. This search for sensitive scenes, which may be restricted to 'violent scenes', is the first application that will be investigated.

The video documents offered in VoD applications also include report programs and documentaries. To help the user choose content and browse it in a non-linear manner, it is of interest to study which semantic information can be extracted from such documents to describe their different parts and reports. Extracting semantic information from documentaries and report programs will be the second application of this PhD.

Both applications rely on automatic video content analysis to extract knowledge from low-level video features. One difficulty of this task comes from the multimodal nature of videos, where information is embedded in both the audio and visual modalities and, possibly, in associated textual data (e.g., a program description from a program guide). The challenge is therefore to develop truly multimodal approaches to video analysis. Recently, Bayesian networks have been used successfully to design joint statistical models that integrate multimodal inputs, but their use has been limited to rather simple tasks such as event detection in soccer programs. In this PhD, we will start from such joint models, identify their limits, and propose new multimodal analysis paradigms. In particular, we will investigate the notion of collaborative analysis, where each modality is analyzed on its own but in collaboration with the analyses of the other modalities.
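As an illustration of the kind of joint model mentioned above, the following minimal sketch fuses one audio cue and one visual cue in a two-observation Bayesian network (an event node with two conditionally independent children). It is not the model used in the cited work: the network structure, the cue names, and all probability values are hypothetical and chosen only to show how evidence from two modalities combines into a single posterior.

```python
# Hypothetical network: Event -> AudioCue, Event -> VisualCue.
# All probabilities below are illustrative assumptions, not measured values.

PRIOR_EVENT = 0.1  # P(event is present in the shot)

# P(cue is observed | event state), indexed by the event state.
P_AUDIO = {True: 0.8, False: 0.2}
P_VISUAL = {True: 0.7, False: 0.1}


def likelihood(p_cue_given_state: float, observed: bool) -> float:
    """Probability of the observation (cue present or absent) given the state."""
    return p_cue_given_state if observed else 1.0 - p_cue_given_state


def posterior_event(audio_cue: bool, visual_cue: bool) -> float:
    """Return P(event | audio_cue, visual_cue) by Bayes' rule.

    The two cues are assumed conditionally independent given the event,
    which is the simplest possible joint model of the two modalities.
    """
    joint = {}
    for state in (True, False):
        prior = PRIOR_EVENT if state else 1.0 - PRIOR_EVENT
        joint[state] = (
            prior
            * likelihood(P_AUDIO[state], audio_cue)
            * likelihood(P_VISUAL[state], visual_cue)
        )
    return joint[True] / (joint[True] + joint[False])


if __name__ == "__main__":
    for audio, visual in [(True, True), (True, False), (False, False)]:
        p = posterior_event(audio, visual)
        print(f"audio={audio}, visual={visual}: P(event)={p:.3f}")
```

With these assumed values, the posterior is markedly higher when both cues agree than when only the audio cue is present, which is the benefit that joint multimodal models aim to capture. The conditional independence assumption is also one of the limits such simple models run into.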

The PhD candidate will join the Thomson project "Content Production Workflow" (Thomson Corporate Research). The aim of this project is to develop a software library called the Content Preparation Library (CPL). The CPL is a software module that integrates easily with final system products to provide automatic content analysis, automatic metadata generation, and automatic content preparation for repurposing (i.e. web & mobile publishing, content search). All core technologies developed during the PhD will therefore be included in the Content Preparation Library. The algorithms developed will thus have to satisfy constraints in terms of efficiency (computational efficiency, robustness, and result quality) and innovation (patents and publications). In collaboration with the R&D engineers of the SAP lab, the algorithms will be contextualized and tested in key applications. The goal is to provide prototypes to other Thomson entities in charge of product development.

Supervision

The research work will be done in partnership with the Content Production Workflow project of Thomson and the Texmex and Metiss projects of IRISA/INRIA. The PhD candidate will be based mainly at Thomson Corporate Research Rennes but will have full access to IRISA/INRIA and will be strongly encouraged to visit IRISA/INRIA frequently.

The main part of the source code development will be done under the responsibility of a Thomson R&D engineer, while the theoretical part of the PhD will be supervised jointly by the IRISA/INRIA and Thomson PhD directors.

Advisors